Today’s challenge requires some high school statistics.
Specifically, here’s what we need to do.
First, we need to read from a text file (marks.txt) that consists of integers separated by commas.
Next, we need to construct a frequency distribution table for these integers.
A frequency distribution table is a table that summarizes values and their frequencies.
For instance, suppose we have the following data representing students' marks for a particular test:
1, 1, 0, 1, 4, 5, 2, 3, 3, 4, 6
A frequency distribution table summarizes the data as follows:
Marks = 0, Frequency = 1
Marks = 1, Frequency = 3
Marks = 2, Frequency = 1
Marks = 3, Frequency = 2
Marks = 4, Frequency = 2
Marks = 5, Frequency = 1
Marks = 6, Frequency = 1
After summarizing the data into a table, we need to calculate and display the sum and mean of the data (correct to 2 decimal places).
To calculate the mean, we divide the sum by the number of numbers.
Using the data above, the sum is 1 + 1 + 0 + 1 + 4 + 5 + 2 + 3 + 3 + 4 + 6 = 30. The mean is 30/11 = 2.73.
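As a quick check of the arithmetic above, here is a minimal sketch in basic Python. The variable names sample_marks, total and mean are chosen for illustration only.

```python
# Sample data from the example above (variable names are illustrative).
sample_marks = [1, 1, 0, 1, 4, 5, 2, 3, 3, 4, 6]

total = sum(sample_marks)          # add up all the marks
mean = total / len(sample_marks)   # divide by the number of marks

print(total)            # 30
print(round(mean, 2))   # 2.73
```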
For those of you who are familiar with using Python for data analysis, you would probably be familiar with using Pandas to perform these calculations. However, we won’t be using Pandas here. Instead, we’ll be using basic Python.
In summary, the task today is to write a program in Python that does the following:
- Read from a text file (marks.txt) that consists of integers separated by commas
- Construct the frequency distribution table of all the integers in the file
- Calculate the sum and mean of all the integers in the file
Here’s a video showing how the program works.
The suggested solution and a run-through of it can be found below.
Suggested Solution:
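# Read every line of marks.txt and collect the values as strings
marks = []
f = open('marks.txt', 'r')
for line in f:
    line = line.rstrip()                  # drop the trailing newline
    marks = marks + line.split(', ')      # add this line's values to marks
f.close()

# Convert the strings to integers
marks = list(map(int, marks))

# Print the frequency of every value from the smallest to the greatest mark
smallest = min(marks)
greatest = max(marks)
for i in range(smallest, greatest + 1):
    frequency = marks.count(i)
    print('Marks = %d, Freq = %d' % (i, frequency))

print('\nThe sum is %d.' % (sum(marks)))
print('The mean is %4.2f.' % (sum(marks) / len(marks)))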
Main Concepts Used:
- File operations
- for loops
- Lists
- String functions
- map() function
Run Through:
First, we declare an empty list called marks. Next, we open the marks.txt file and assign it to f.
In the demo video above, the marks.txt file used consists of the following content:
1, 2, 0, 10, 7
3, 4, 1, 5, 6
8, 9, 10, 1, 4
We first use a for loop to loop through this file line by line.
Within the for loop, we use the rstrip() method to remove the trailing newline character (along with any other trailing whitespace) from each line.
The reason for this is that each line read from the file has a newline character (\n) at the end of it. This is the character we get when we press ‘Enter’ on our keyboard to move the cursor to the next line.
For instance, the first line in marks.txt is read in as '1, 2, 0, 10, 7\n'.
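To see the trailing newline for yourself, here is a small sketch that writes the demo content to a temporary file and prints each line with repr(). The temporary file and the names demo_text, path and raw_lines are assumptions made so the example runs on its own; they are not part of the suggested solution.

```python
import os
import tempfile

# Write the demo content to a temporary file so the example is self-contained.
demo_text = '1, 2, 0, 10, 7\n3, 4, 1, 5, 6\n8, 9, 10, 1, 4\n'
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(demo_text)
    path = tmp.name

raw_lines = []
with open(path, 'r') as f:
    for line in f:
        raw_lines.append(line)
        print(repr(line))   # repr() makes the trailing '\n' visible

os.remove(path)  # clean up the temporary file
```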
The rstrip() method removes the '\n' character at the end of the string. Without it, we'll get the list ['1', '2', '0', '10', '7\n'] when we use the split() method. Strictly speaking, int() ignores surrounding whitespace, so '7\n' would still convert to 7 later on. However, if we were to count or compare the strings directly, '7\n' would not match '7', so stripping the newline keeps the data clean.
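The difference is easiest to see side by side. The sketch below splits the same line with and without rstrip(); the names without_strip and with_strip are illustrative only.

```python
line = '1, 2, 0, 10, 7\n'

without_strip = line.split(', ')
with_strip = line.rstrip().split(', ')

print(without_strip)  # ['1', '2', '0', '10', '7\n']
print(with_strip)     # ['1', '2', '0', '10', '7']

# As strings, '7\n' and '7' are different values.
print(without_strip.count('7'))  # 0
print(with_strip.count('7'))     # 1
```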
After we remove the '\n' character, we use the split() method, with ', ' (a comma followed by a space) as the separator, to split each line into a list of strings.
For instance, '1, 2, 0, 10, 7' becomes ['1', '2', '0', '10', '7'].
Once we have split the line correctly, we use the + operator to add the resulting list to marks. The + operator joins two lists into a new list. For instance, if
firstList = ['1', '2', '3', '4', '5']
secondList = ['0', '4', '5', '6', '9']
then firstList + secondList gives us ['1', '2', '3', '4', '5', '0', '4', '5', '6', '9'].
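Here is a short sketch of how this plays out. Note that + builds a new list and leaves the originals untouched, which is why the solution assigns the result back to marks on every pass through the loop. The name combined is illustrative only.

```python
firstList = ['1', '2', '3', '4', '5']
secondList = ['0', '4', '5', '6', '9']

combined = firstList + secondList   # builds a new list; the originals are unchanged
print(combined)    # ['1', '2', '3', '4', '5', '0', '4', '5', '6', '9']
print(firstList)   # ['1', '2', '3', '4', '5']

# In the solution, the result is assigned back to the same name each time:
marks = []
marks = marks + ['1', '2']
marks = marks + ['3']
print(marks)       # ['1', '2', '3']
```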
After we read all the lines from marks.txt into the marks list, we need to convert marks (which is a list of strings) into a list of integers. To do that, we use the map() function.
The map() function is a built-in Python function that applies a function to every element of an iterable (such as a list). It takes the function and the iterable as arguments and returns the results as a map object.
Here, we use the map() function (map(int, marks)) to apply the int() function to all elements in marks.
As the map() function returns a map object, we use the built-in list() function to convert the result back into a list.
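The two steps can be sketched separately, as below. The intermediate name converted and the list comprehension at the end are additions for illustration; the suggested solution does both steps in one line.

```python
marks = ['1', '2', '0', '10', '7']

converted = map(int, marks)   # a map object; nothing is a list yet
marks = list(converted)       # turn the results into a list of integers

print(marks)  # [1, 2, 0, 10, 7]

# An equivalent alternative, shown for comparison only:
same_result = [int(m) for m in ['1', '2', '0', '10', '7']]
print(same_result == marks)  # True
```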
marks is now a list of integers. With that, we are ready to print the frequency distribution table.
For this, we first need to get the smallest and greatest numbers in marks. We use the built-in min() and max() functions to do that and assign the results to the variables smallest and greatest respectively.
Next, we use the range() function to loop from the smallest to the greatest number. Recall that range(a, b) gives us consecutive integers from a to b - 1. So to loop from 1 to 5 inclusive, we need to write for i in range(1, 6). Hence, in our case, we write for i in range(smallest, greatest + 1).
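A brief sketch of the off-by-one behaviour of range(), using the smallest and greatest values from the demo file (0 and 10). The name values is illustrative only.

```python
print(list(range(1, 6)))   # [1, 2, 3, 4, 5]

smallest, greatest = 0, 10   # min and max of the demo data
values = list(range(smallest, greatest + 1))
print(values[0], values[-1])   # 0 10
```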
Within the for loop, we use the count() method to count the number of occurrences of the variable i in marks.
For instance, when i = 2, marks.count(2) gives us the number of occurrences of the integer 2 in marks. We assign this to a variable called frequency.
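For example, with the integers from the demo file, count() behaves as sketched below. A value that never appears simply gives 0.

```python
# The integers from the demo marks.txt file.
marks = [1, 2, 0, 10, 7, 3, 4, 1, 5, 6, 8, 9, 10, 1, 4]

print(marks.count(2))    # 1
print(marks.count(1))    # 3
print(marks.count(10))   # 2
print(marks.count(11))   # 0, because 11 does not appear in the list
```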
Next, we print out the values of i and frequency. This gives us the frequency of each integer from smallest to greatest.
Once the frequency table is complete, we simply use the built-in sum() function to calculate the sum of the elements in marks and print it out. Next, we divide the sum by the number of elements in marks to calculate the mean and print it out.
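As a final sketch, here are the sum and mean for the demo data, formatted with the same % operators as in the solution. The names total, mean, sum_line and mean_line are illustrative only.

```python
# The integers from the demo marks.txt file.
marks = [1, 2, 0, 10, 7, 3, 4, 1, 5, 6, 8, 9, 10, 1, 4]

total = sum(marks)
mean = total / len(marks)   # true division in Python 3

sum_line = 'The sum is %d.' % total
mean_line = 'The mean is %4.2f.' % mean   # .2f rounds to 2 decimal places

print(sum_line)    # The sum is 71.
print(mean_line)   # The mean is 4.73.
```

This assumes Python 3, where the / operator always performs true division. In Python 2, dividing two integers with / would discard the fractional part.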