Python Programming Challenge 3: Comparing Two CSV Files


Today’s challenge is very straightforward: we need to write a simple Python program that compares two CSV files to determine whether there are any differences between them.

For each pair of corresponding lines that differ, we output the line number followed by the contents of that line from both files.

For instance, if the content of the first CSV file (marks1.csv) is

Peter,30
Kevin,12
Sam,15
Oliver,45


and the content of the second CSV file (marks2.csv) is

Peter,30
Kevin,32
Sam,15
Oliver,49

The program should give us the following output:

Differences Found
Line 2 (Kevin,12 vs Kevin,32)
Line 4 (Oliver,45 vs Oliver,49)
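
To try the challenge yourself, the two sample files need to exist first. Here is a minimal sketch that writes them. The helper name write_sample_files and its directory parameter are made up for this illustration, and the trailing newline on the last line of each file is an assumption.

```python
import os

# Contents of the two sample files shown above.
MARKS1 = "Peter,30\nKevin,12\nSam,15\nOliver,45\n"
MARKS2 = "Peter,30\nKevin,32\nSam,15\nOliver,49\n"


def write_sample_files(directory="."):
    """Write marks1.csv and marks2.csv into the given directory (hypothetical helper)."""
    for name, content in (("marks1.csv", MARKS1), ("marks2.csv", MARKS2)):
        with open(os.path.join(directory, name), "w") as sample:
            sample.write(content)


if __name__ == "__main__":
    # Creates the files in the current working directory.
    write_sample_files()
```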

The suggested solution and a run-through of it can be found below.

Suggested Solution:

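# Open both files for reading.
f = open('marks1.csv', 'r')
g = open('marks2.csv', 'r')
line = 0  # number of the line currently being compared

print('Differences Found')

while True:
    # Read one line from each file; strip() drops the trailing newline
    # and any other leading or trailing whitespace.
    lineF = f.readline().strip()
    lineG = g.readline().strip()
    line += 1

    if (lineF or lineG):
        # At least one file still has content on this line.
        if (lineF != lineG):
            print("Line %d (%s vs %s)" % (line, lineF, lineG))
    else:
        # Both reads came back empty: close the files and stop.
        f.close()
        g.close()
        break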

Main Concepts Used:

  • File operations
  • while loop

Run Through:

We begin by opening two files (marks1.csv and marks2.csv) and storing the resulting file objects in the variables f and g. Next, we declare a variable called line and initialize it to 0. We also use the print() function to print the line ‘Differences Found’.

Now we are ready to loop through the files.

We use a while True loop to do that. This is an infinite loop that will keep running until we somehow end it. 

Inside the loop, we use the readline() method to read the two files line by line. Notice that we add the strip() method after the readline() method for both files.
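
To see what readline() returns on its own, here is a small sketch. It uses io.StringIO as a stand-in for a real file so that it runs without any files on disk; the variable names are made up for the illustration.

```python
import io

# io.StringIO behaves like an open text file, so readline() works the same way.
fake_file = io.StringIO("Peter,30\nKevin,12\n")

first = fake_file.readline()   # the newline at the end of the line is kept
second = fake_file.readline()
third = fake_file.readline()   # nothing left to read: an empty string

print(repr(first))   # 'Peter,30\n'
print(repr(second))  # 'Kevin,12\n'
print(repr(third))   # ''
```

The empty string returned at the end of the file is what the loop relies on to know when to stop.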

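The strip() method is a built-in Python string method that removes all leading and trailing whitespace (spaces, tabs and newline characters) from a string. (We can also pass it an argument to remove other characters, but by default it removes whitespace.) This matters here because readline() keeps the newline character at the end of each line it reads.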

We do that because we don’t want the program to conclude that two lines are different simply because one line has a space at the end while the other does not. For our purposes, we want the program to treat ‘Hello’ and ‘Hello   ’ as the same.
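
A quick sketch of the effect of strip() on this kind of comparison. The sample strings are made up for the illustration.

```python
# Two lines with the same visible content but different trailing whitespace.
line_with_newline = "Kevin,12\n"
line_with_spaces = "Kevin,12   "

same_before = line_with_newline == line_with_spaces
same_after = line_with_newline.strip() == line_with_spaces.strip()

print(same_before)  # False
print(same_after)   # True

# Passing an argument strips those characters instead of whitespace.
trimmed = "xxHelloxx".strip("x")
print(trimmed)  # Hello
```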

After we read one line from each file, we add 1 to the variable line.

Next, we use if (lineF or lineG) to determine whether any content was read. As long as at least one of the files still has content, lineF or lineG will be non-empty and the condition will evaluate to True.

When that happens, we compare lineF with lineG and display a message showing the difference between lineF and lineG if they are different.
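
The check and the message can be tried in isolation. The sketch below relies on the fact that an empty string counts as False in Python while any non-empty string counts as True; the variable names are made up for the illustration.

```python
# Both lines empty: nothing was read from either file.
both_empty = bool("" or "")

# Only one line empty: one file is longer than the other.
only_f_has_content = bool("Kevin,12" or "")
only_g_has_content = bool("" or "Kevin,32")

print(both_empty)          # False
print(only_f_has_content)  # True
print(only_g_has_content)  # True

# %d is filled with the line number, each %s with one of the two lines.
message = "Line %d (%s vs %s)" % (2, "Kevin,12", "Kevin,32")
print(message)  # Line 2 (Kevin,12 vs Kevin,32)
```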

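On the other hand, if the end of both files is reached, lineF and lineG will both be empty. In that case, we close both files and use the break statement to break out of the loop. Note that a line that is blank in both files at the same position also gives two empty strings after strip(), so the loop stops there as well.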
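
For comparison, here is one possible alternative, not the suggested solution. It is a sketch that uses a with statement to close the files automatically and itertools.zip_longest to pair up the lines. The function names compare_lines and compare_files are made up for this illustration. Unlike the loop above, it keeps going past lines that are blank in both files.

```python
from itertools import zip_longest


def compare_lines(lines_f, lines_g):
    """Return one message per differing line (hypothetical helper)."""
    differences = []
    # zip_longest pads the shorter input with "" so extra lines are still reported.
    pairs = zip_longest(lines_f, lines_g, fillvalue="")
    for number, (line_f, line_g) in enumerate(pairs, start=1):
        line_f, line_g = line_f.strip(), line_g.strip()
        if line_f != line_g:
            differences.append("Line %d (%s vs %s)" % (number, line_f, line_g))
    return differences


def compare_files(path_f, path_g):
    """Open both files, compare them, and close them when done."""
    with open(path_f, "r") as f, open(path_g, "r") as g:
        return compare_lines(f, g)


# Example with in-memory lines instead of files:
sample_f = ["Peter,30\n", "Kevin,12\n", "Sam,15\n", "Oliver,45\n"]
sample_g = ["Peter,30\n", "Kevin,32\n", "Sam,15\n", "Oliver,49\n"]
for message in compare_lines(sample_f, sample_g):
    print(message)
```

With the sample files on disk, compare_files('marks1.csv', 'marks2.csv') would return the same two messages as a list.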

Written by Jamie | Last Updated September 18, 2019
