Compare Two DataFrames Row by Row (2024)

The pandas DataFrame.compare() function compares two DataFrames and shows their differences, aligned along the axis given by align_axis. It is useful when two DataFrames hold mostly the same data with slight changes and we need to see exactly where they differ. By default (align_axis=1), compare() places the differing values side by side in columns; with align_axis=0, it stacks them row by row. It can only compare DataFrames that are identically labeled, that is, DataFrames with the same shape, the same row indexes, and the same column labels.
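To illustrate the labeling requirement, here is a minimal sketch using two small made-up DataFrames (the names left and right are only for illustration) whose row indexes differ. In the pandas versions I am aware of, compare() raises a ValueError in this case, though the exact message text may vary by version. Restricting both DataFrames to their shared labels first is one possible workaround, not the only one.

```python
import pandas as pd

# Two hypothetical DataFrames with the same shape but different row labels
left = pd.DataFrame({"Fee": [20000, 25000]}, index=[0, 1])
right = pd.DataFrame({"Fee": [20000, 24000]}, index=[1, 2])

try:
    left.compare(right)
    raised = False
except ValueError as err:
    raised = True
    print("compare() refused the inputs:", err)

# One possible workaround: keep only the labels both DataFrames share
common = left.index.intersection(right.index)
diff = left.loc[common].compare(right.loc[common])
print(diff)
```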

In this article, I will explain the compare() function, its syntax, and its parameters, and show how to compare two DataFrames row by row with examples.

1. Quick Examples of Compare Two DataFrames Row by Row

If you are in a hurry, below are some quick examples of comparing two DataFrames row by row.

# Below are quick examples

# Example 1: Compare two DataFrames row by row
diff = df.compare(df1, align_axis=0)

# Example 2: Keep equal values instead of NaN by setting keep_equal=True
diff = df.compare(df1, keep_equal=True, align_axis=0)

# Example 3: Set keep_shape=True to keep all rows and columns
diff = df.compare(df1, keep_shape=True, align_axis=0)

# Example 4: Get differences while keeping equal values and shape
diff = df.compare(df1, keep_equal=True, keep_shape=True, align_axis=0)

2. Syntax of Pandas df.compare()

Following is the syntax of the Pandas compare() function.

# Following is the syntax of compare() function
DataFrame.compare(other, align_axis=1, keep_shape=False, keep_equal=False, result_names=('self', 'other'))

2.1 Parameters

Following are the parameters of the compare() function.

  • other: The DataFrame object to compare with the given DataFrame.
  • align_axis: The axis on which to align the comparison. The default value is 1 (or 'columns'), which aligns the differences horizontally, with columns drawn alternately from self and other. If it is set to 0 (or 'index'), the differences are stacked vertically, with rows drawn alternately from self and other.
  • keep_shape: (bool) Default value is False. If True, all rows and columns are kept. Otherwise, only the rows and columns with different values are kept.
  • keep_equal: (bool) Default value is False. If True, equal values are kept in the result. Otherwise, equal values are shown as NaN.
  • result_names: (tuple) Default value is ('self', 'other'). Sets the names used for the two DataFrames in the result. Available in pandas 1.5.0 and later.
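The effect of align_axis and result_names is easier to see on a tiny example. The sketch below uses two made-up DataFrames named old and new (hypothetical names, not from this article's data) and assumes pandas 1.5.0 or later, since result_names is not available in earlier versions.

```python
import pandas as pd

# Hypothetical data: only the Fee in row 1 differs
old = pd.DataFrame({"Fee": [20000, 25000], "Discount": [1000, 2500]})
new = pd.DataFrame({"Fee": [20000, 24000], "Discount": [1000, 2500]})

# align_axis=1 (default): 'self'/'other' become inner column labels
by_columns = old.compare(new)

# align_axis=0: 'self'/'other' become inner row labels
by_rows = old.compare(new, align_axis=0)

# result_names replaces the default 'self'/'other' labels (pandas 1.5.0+)
renamed = old.compare(new, align_axis=0, result_names=("old", "new"))

print(by_columns)
print(by_rows)
print(renamed)
```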

2.2 Return Value

It returns a DataFrame that shows the differences between the given DataFrames. The result has a MultiIndex with 'self' and 'other' at the innermost level: on the columns when align_axis=1 (the default), or on the row index when align_axis=0.
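Because 'self' and 'other' sit at the innermost level, each side of the comparison can be pulled out with xs(). This is a small sketch on made-up DataFrames a and b (illustrative names), not part of the original examples.

```python
import pandas as pd

# Hypothetical data: row 1 differs in both columns
a = pd.DataFrame({"Courses": ["Spark", "NumPy"], "Fee": [20000, 25000]})
b = pd.DataFrame({"Courses": ["Spark", "Hadoop"], "Fee": [20000, 24000]})

# Row-wise result: 'self'/'other' form the inner row index level (level 1)
rows = a.compare(b, align_axis=0)
self_rows = rows.xs("self", level=1)     # differing values taken from `a`
other_rows = rows.xs("other", level=1)   # differing values taken from `b`

# Column-wise result (default): 'self'/'other' form the inner column level
cols = a.compare(b)
self_cols = cols.xs("self", axis=1, level=1)

print(self_rows)
print(other_rows)
print(self_cols)
```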

Create DataFrame

Now, let’s create two pandas DataFrames using data from Python dictionaries, where the columns are Courses, Fee, Duration, and Discount.

# Create DataFrame
import pandas as pd

technologies = {
    'Courses': ["Spark", "NumPy", "pandas", "Java", "PySpark"],
    'Fee': [20000, 25000, 30000, 22000, 26000],
    'Duration': ['30days', '40days', '35days', '60days', '50days'],
    'Discount': [1000, 2500, 1500, 1200, 3000]
}
technologies1 = {
    'Courses': ["Spark", "Hadoop", "pandas", "Java", "PySpark"],
    'Fee': [20000, 24000, 30000, 22000, 21000],
    'Duration': ['30days', '40days', '35days', '60days', '50days'],
    'Discount': [1000, 2500, 1500, 1200, 3000]
}

df = pd.DataFrame(technologies)
print("DataFrame1:\n", df)

df1 = pd.DataFrame(technologies1)
print("DataFrame2:\n", df1)

3. Usage of Pandas DataFrame.compare() Function.

The pandas DataFrame.compare() function compares two identically labeled DataFrames and returns a DataFrame containing only their unequal values. By default, it aligns the differences column by column; with align_axis=0, it stacks them row by row. If we want the result to keep the same rows and columns as the originals, we can use the keep_shape parameter, and we can use the keep_equal parameter to show equal values instead of NaN in the result.

Let’s use this function on given DataFrames along with align_axis=0 to find the difference between two DataFrames row by row.

# Comparing the two DataFrames row by row
diff = df.compare(df1, align_axis=0)
print(" After comparing two DataFrames:\n", diff)

Yields below output.

# Output:
# After comparing two DataFrames:
        Courses      Fee
1 self    NumPy  25000.0
  other  Hadoop  24000.0
4 self      NaN  26000.0
  other     NaN  21000.0

As we can see from the above, the differences are stacked row by row: for each row label that differs, the self row is followed by the other row.
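A common follow-up is to list which rows and columns actually changed. The sketch below is one way to read that off the row-wise result, using a two-column subset of this article's data; the variable names changed_rows and changed_cols are my own.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "NumPy", "pandas", "Java", "PySpark"],
    "Fee": [20000, 25000, 30000, 22000, 26000],
})
df1 = pd.DataFrame({
    "Courses": ["Spark", "Hadoop", "pandas", "Java", "PySpark"],
    "Fee": [20000, 24000, 30000, 22000, 21000],
})

diff = df.compare(df1, align_axis=0)

# The outer index level holds the original row labels (each appears twice)
changed_rows = diff.index.get_level_values(0).unique().tolist()

# Only columns with at least one difference survive when keep_shape=False
changed_cols = diff.columns.tolist()

print(changed_rows)
print(changed_cols)
```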

4. Pass keep_equal into compare() & Compare

As we can see from the above, equal values appear as NaN in the resulting DataFrame. To show the actual values instead, set keep_equal=True when calling the compare() function. The equal values of the given DataFrames then replace the NaN placeholders.

# Keep equal values instead of NaN by passing keep_equal=True
diff = df.compare(df1, keep_equal=True, align_axis=0)
print(" After comparing two DataFrames:\n", diff)

Yields below output.

# Output:
# After comparing two DataFrames:
         Courses    Fee
1 self     NumPy  25000
  other   Hadoop  24000
4 self   PySpark  26000
  other  PySpark  21000
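To make the effect of keep_equal concrete, this sketch counts the NaN placeholders with and without it on a small made-up pair of DataFrames. Note that compare() treats NaN values in the same location as equal, so genuine missing data can still appear as NaN even with keep_equal=True; this example has none.

```python
import pandas as pd

# Hypothetical data: Courses is equal in row 1, Fee differs in both rows
df = pd.DataFrame({"Courses": ["NumPy", "PySpark"], "Fee": [25000, 26000]})
df1 = pd.DataFrame({"Courses": ["Hadoop", "PySpark"], "Fee": [24000, 21000]})

masked = df.compare(df1, align_axis=0)                 # equal cells become NaN
kept = df.compare(df1, align_axis=0, keep_equal=True)  # equal cells are kept

masked_nan_count = int(masked.isna().sum().sum())
kept_nan_count = int(kept.isna().sum().sum())
kept_value = kept.loc[(1, "self"), "Courses"]

print(masked_nan_count, kept_nan_count, kept_value)
```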

5. Pass keep_shape into compare() & Compare Pandas Row by Row

If we want to get the same-sized resulting DataFrame, we can set keep_shape as True and then pass it to the compare() function. It will return the same-sized DataFrame where equal values are treated as NaN values. For example,

# Set keep_shape=True to keep all rows and columns
diff = df.compare(df1, keep_shape=True, align_axis=0)
print(" After comparing two DataFrames:\n", diff)

Yields below output.

# Output:
# After comparing two DataFrames:
        Courses      Fee Duration  Discount
0 self      NaN      NaN      NaN       NaN
  other     NaN      NaN      NaN       NaN
1 self    NumPy  25000.0      NaN       NaN
  other  Hadoop  24000.0      NaN       NaN
2 self      NaN      NaN      NaN       NaN
  other     NaN      NaN      NaN       NaN
3 self      NaN      NaN      NaN       NaN
  other     NaN      NaN      NaN       NaN
4 self      NaN  26000.0      NaN       NaN
  other     NaN  21000.0      NaN       NaN
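The difference keep_shape makes can also be checked through the shape of the result. In this small sketch on made-up data, the row-wise result with keep_shape=True has two rows (self and other) for every original row and keeps every column.

```python
import pandas as pd

# Hypothetical data: only the Fee in row 1 differs
df = pd.DataFrame({"Fee": [20000, 25000, 30000], "Discount": [1000, 2500, 1500]})
df1 = pd.DataFrame({"Fee": [20000, 24000, 30000], "Discount": [1000, 2500, 1500]})

trimmed = df.compare(df1, align_axis=0)                # differing rows/columns only
full = df.compare(df1, align_axis=0, keep_shape=True)  # every row and column

print(trimmed.shape)
print(full.shape)
```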

6. Pass keep_equal & keep_shape into compare()

Set both keep_shape and keep_equal to True and pass them into the compare() function. It will return a result that keeps every row and column, filled with the actual values of the given DataFrames.

# Get differences of DataFrames, keeping equal values and shape
diff = df.compare(df1, keep_equal=True, keep_shape=True, align_axis=0)
print(" After comparing two DataFrames:\n", diff)

Yields below output.

# Output:
# After comparing two DataFrames:
         Courses    Fee Duration  Discount
0 self     Spark  20000   30days      1000
  other    Spark  20000   30days      1000
1 self     NumPy  25000   40days      2500
  other   Hadoop  24000   40days      2500
2 self    pandas  30000   35days      1500
  other   pandas  30000   35days      1500
3 self      Java  22000   60days      1200
  other     Java  22000   60days      1200
4 self   PySpark  26000   50days      3000
  other  PySpark  21000   50days      3000
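With both flags set, the result no longer marks which rows changed. One possible pattern, sketched here on made-up data, is to take the changed row labels from a plain compare() call and use them to filter the full result, so each changed row is shown with all of its columns for context. The variable names are illustrative.

```python
import pandas as pd

# Hypothetical data: only row 1 differs (Courses and Fee)
df = pd.DataFrame({
    "Courses": ["Spark", "NumPy", "pandas"],
    "Fee": [20000, 25000, 30000],
    "Duration": ["30days", "40days", "35days"],
})
df1 = pd.DataFrame({
    "Courses": ["Spark", "Hadoop", "pandas"],
    "Fee": [20000, 24000, 30000],
    "Duration": ["30days", "40days", "35days"],
})

# Full-shaped result with real values everywhere
full = df.compare(df1, align_axis=0, keep_shape=True, keep_equal=True)

# Row labels that actually differ, taken from the default (trimmed) result
changed = df.compare(df1, align_axis=0).index.get_level_values(0).unique()

# Keep only the self/other pairs for the changed rows
context = full[full.index.get_level_values(0).isin(changed)]
print(context)
```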

7. Conclusion

In this article, I have explained the DataFrame.compare() function, its syntax, and its parameters, and shown how to compare two DataFrames row by row with multiple examples.

Article information

Author: Laurine Ryan
