Today’s challenge is straightforward: write a simple Python program that compares two CSV files and reports any differences between them.
For each line that differs, the program outputs the line number followed by the contents of that line from both files.
For instance, if the content of the first CSV file is
Peter,30
Kevin,12
Sam,15
Oliver,45
and the content of the second CSV file is
Peter,30
Kevin,32
Sam,15
Oliver,49
then the program should give us the following output:
Differences Found
Line 2 (Kevin,12 vs Kevin,32)
Line 4 (Oliver,45 vs Oliver,49)
The suggested solution and a run-through of it can be found below.
Suggested Solution:
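f = open('marks1.csv', 'r')
g = open('marks2.csv', 'r')
line = 0
print('Differences Found')
while True:
    # Read the next line from each file and trim surrounding whitespace
    lineF = f.readline().strip()
    lineG = g.readline().strip()
    line += 1
    if (lineF or lineG):
        # At least one file still has content, so compare the two lines
        if (lineF != lineG):
            print("Line %d (%s vs %s)" %(line, lineF, lineG))
    else:
        # Both lines are empty: close the files and stop
        f.close()
        g.close()
        break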
Main Concepts Used:
- File operations
- while loop
Run Through:
We begin by opening two files (marks1.csv and marks2.csv) and storing them in the variables f and g. Next, we declare a variable called line and initialize it to 0. We also use the print() function to print the line ‘Differences Found’.
Now we are ready to loop through the files.
We use a while True loop to do that. This is an infinite loop that will keep running until we somehow end it.
Inside the loop, we use the readline() method to read the two files line by line. Notice that we add the strip() method after the readline() method for both files.
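To see what readline() returns, here is a minimal sketch that uses io.StringIO as an in-memory stand-in for an open file (the sample text is made up for illustration). Each call returns the next line, including its trailing newline character, and an empty string once the end is reached.

```python
import io

# io.StringIO behaves like an open text file, so no real file is needed.
sample = io.StringIO("Peter,30\nKevin,12\n")

first = sample.readline()   # 'Peter,30\n' (newline included)
second = sample.readline()  # 'Kevin,12\n'
at_end = sample.readline()  # '' (an empty string signals the end of the file)

print(repr(first), repr(second), repr(at_end))
```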
The strip() method is a built-in Python string method that removes all leading and trailing whitespace (spaces, tabs and newline characters) from a string. (We can also use it to remove other characters, but the default is to remove whitespace.) This also removes the newline character that readline() leaves at the end of each line.
We do that because we don’t want the program to conclude that two lines are different simply because one line has a space at the end while the other does not. For our purposes, we want the program to treat ‘Hello’ and ‘Hello ’ as the same.
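A quick illustration of strip() on a few made-up strings. With no argument, it trims whitespace from both ends and leaves the middle of the string untouched.

```python
# With no argument, strip() trims whitespace from both ends of the string.
trimmed_space = "Hello ".strip()
trimmed_line = "  Kevin,12\n".strip()
inner_kept = "a b".strip()

print(repr(trimmed_space))  # 'Hello'
print(repr(trimmed_line))   # 'Kevin,12'
print(repr(inner_kept))     # 'a b' (the inner space is kept)

# The two spellings from the text compare equal once stripped.
same = "Hello".strip() == "Hello ".strip()
print(same)                 # True
```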
After we read one line from each file, we add 1 to the variable line.
Next, we use if (lineF or lineG) to determine whether any content was read. As long as at least one of the files still has content, lineF or lineG will be non-empty and the condition will evaluate to True.
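The condition relies on the fact that Python treats an empty string as false and any non-empty string as true. A minimal sketch with made-up values, using a hypothetical helper named has_content that is not part of the suggested solution:

```python
def has_content(line_f, line_g):
    """Hypothetical helper: True if at least one of the two lines is non-empty."""
    return bool(line_f or line_g)


# An empty string is falsy; any non-empty string is truthy.
print(bool(""))          # False
print(bool("Kevin,12"))  # True

print(has_content("Kevin,12", "Kevin,32"))  # True  (both files have a line)
print(has_content("Kevin,12", ""))          # True  (only the first file has a line)
print(has_content("", "Kevin,32"))          # True  (only the second file has a line)
print(has_content("", ""))                  # False (nothing left in either file)
```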
When that happens, we compare lineF with lineG and display a message showing the difference between lineF and lineG if they are different.
On the other hand, if the end of both files is reached, lineF and lineG will both be empty. In that case, we close both files and use the break statement to break out of the loop.
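One caveat of the loop above: because each line is stripped before the check, a blank line at the same position in both files also leaves lineF and lineG empty, so the comparison stops there. A minimal sketch of one possible variation follows, assuming a hypothetical helper named compare_files (not part of the suggested solution). It uses a with statement to close the files automatically and itertools.zip_longest to keep reading until the longer file is exhausted.

```python
from itertools import zip_longest


def compare_files(path_a, path_b):
    """Return a list of (line_number, text_a, text_b) for lines that differ."""
    differences = []
    # `with` closes both files automatically, even if an error occurs.
    with open(path_a, "r") as file_a, open(path_b, "r") as file_b:
        # zip_longest keeps going until the longer file is exhausted;
        # the shorter file contributes "" for its missing lines.
        pairs = zip_longest(file_a, file_b, fillvalue="")
        for number, (line_a, line_b) in enumerate(pairs, start=1):
            line_a, line_b = line_a.strip(), line_b.strip()
            if line_a != line_b:
                differences.append((number, line_a, line_b))
    return differences
```

Calling compare_files('marks1.csv', 'marks2.csv') on the sample files from the challenge should return the entries for lines 2 and 4, which can then be printed in the same "Line %d (%s vs %s)" format as before.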