In today’s post, we’ll learn two different ways to check whether two strings are equal in Python. We’ll also learn to do other forms of comparison (such as checking whether one string is greater than another).
Last but not least, we’ll work on a practice question that checks whether two strings are equal, ignoring non-letters and differences between uppercase and lowercase.
Here are some topics we’ll cover in today’s post:
- How to compare if two strings are equal in Python
- How does the Python interpreter compare two strings
- What is string interning
- What is the difference between the == and is operators
Key Concepts
How to determine if two strings are equal in Python
First, let’s talk about two different ways to determine if strings are equal in Python. The two ways are:
- using the == operator
- using the is operator
Using the == operator
The most straightforward method to determine if two strings are equal in Python is to use the == operator. This operator checks whether two strings have the same value and returns True if they do. Otherwise, it returns False.
Comparison is done one character at a time, using the Unicode code points of each character. Let’s look at some examples to illustrate what this means:
mainStr = 'abcd'
str1 = 'abce'
str2 = 'abcde'
str3 = 'ABCD'
str4 = 'abcd'
str5 = ''.join(['a', 'b', 'c', 'd'])

print(mainStr == str1)
print(mainStr == str2)
print(mainStr == str3)
print(mainStr == str4)
print(mainStr == str5)
Here, we first declare and initialise a string called mainStr with the value 'abcd'.
Next, we declare and initialise five other strings to compare with mainStr.
If you run the code above, you’ll get the following output:
False
False
False
True
True
When we compare mainStr with str1 (on line 8), Python compares the first character in mainStr with the first character in str1. Since both characters are the same ('a'), it moves on to the second character.
It keeps doing that (comparing character by character) for all the characters in both strings. If it encounters a character that is different, it returns False. This happens at the fourth character. As 'e' is not equal to 'd', Python returns False.
Next, we compare mainStr with str2. As the two strings have different lengths, they cannot be equal. Hence, Python returns False.
On line 10, we compare mainStr with str3 and get False as the output. This is because string comparison is done using Unicode code points.
Unicode is a universal standard for encoding characters and symbols. It does that by assigning a code point to every character and symbol in every language in the world.
'a' is assigned the code point 97, while 'A' is assigned the code point 65. Hence, 'a' is not equal to 'A'. The same applies to the rest of the characters in mainStr vs str3. As a result, we get False when we compare the two strings. This example demonstrates the fact that string comparison using the == operator is case sensitive.
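To see the code points behind this result, here is a minimal sketch using the built-in ord() function. The variable names are purely illustrative and are not part of the example above.

```python
# Look up the code points of a lowercase and an uppercase letter.
lower_a = ord('a')
upper_a = ord('A')
print(lower_a)  # 97
print(upper_a)  # 65

# List the code points of each string, character by character.
main_points = [ord(ch) for ch in 'abcd']
upper_points = [ord(ch) for ch in 'ABCD']
print(main_points)   # [97, 98, 99, 100]
print(upper_points)  # [65, 66, 67, 68]

# The code points differ, so the two strings are not equal.
print('abcd' == 'ABCD')  # False
```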
On line 11, we compare mainStr with str4. As both strings are identical, we get True as the output.
Finally, on line 12, we compare mainStr with str5.
str5 is formed by applying the join() method on the elements in the list ['a', 'b', 'c', 'd']. This method joins all the elements in the list using a specified string as the separator. This specified string refers to the string that is used to call the method (i.e. the string before the . operator).
Since we use the empty string ('') to call the join() method in our example, the elements in ['a', 'b', 'c', 'd'] are joined without any separator. This gives us 'abcd' as the result.
Hence, when we compare mainStr with str5, we get True as the output.
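To make the role of the separator clearer, here is a short sketch that calls join() with a few different separator strings. The variable names are made up for illustration.

```python
letters = ['a', 'b', 'c', 'd']

# The string before the dot is placed between the elements.
no_separator = ''.join(letters)
dash_separated = '-'.join(letters)
comma_separated = ', '.join(letters)

print(no_separator)     # abcd
print(dash_separated)   # a-b-c-d
print(comma_separated)  # a, b, c, d

# Only the version joined with the empty string equals 'abcd'.
print(no_separator == 'abcd')    # True
print(dash_separated == 'abcd')  # False
```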
Using the is operator
Besides using the == operator, we can use the is operator to compare two strings. This operator is stricter than the == operator.
While the == operator only requires both strings to have the same value, the is operator requires both strings to be identical. In other words, the is operator evaluates to True only if the variables on either side of the operator point to the same object.
To appreciate what this means, one has to understand how Python handles strings. Suppose we have the following code:
msg1 = 'Hello'
msg2 = 'Hello'
msg3 = 'Hello'
msg4 = 'Hello'
msg5 = 'Hello'
Here, we assign the same string literal (i.e. a string that is literally typed into the program source code, surrounded by quotation marks) to five different variables – msg1, msg2, msg3, msg4 and msg5.
When we do that, CPython (the standard Python implementation) does not create five strings. Instead, it creates a single string and assigns a reference to that string to all five variables. This reuse is known as string interning and is done to save memory. It is an implementation detail rather than a guarantee of the language.
As a result of string interning, instead of needing memory space for five strings, we only need one.
msg1, msg2, msg3, msg4 and msg5 all point to the same memory location. This is illustrated in the diagram below:
We can verify that all variables point to the same string using the id() function. This function returns an integer that uniquely identifies an object (such as a string) for as long as that object exists. In CPython, this integer is the memory address of the object.
If you run the following code,
msg1 = 'Hello'
msg2 = 'Hello'
msg3 = 'Hello'
msg4 = 'Hello'
msg5 = 'Hello'
msg6 = ''.join(['H', 'e', 'l', 'l', 'o'])
print(id(msg1))
print(id(msg2))
print(id(msg3))
print(id(msg4))
print(id(msg5))
print(id(msg6))
you’ll get an output similar to what is shown below:
140586103726320
140586103726320
140586103726320
140586103726320
140586103726320
140586100793264
The exact values you get will be different from what is shown above, as the id of a string changes each time the program runs. However, regardless of the values you get, you’ll see that the first five values are the same, while the last value is different from the rest.
This is because when Python sees lines 1 to 5 in the code above, it creates a single string 'Hello' and assigns a reference to that string to all five variables.
In contrast, when it sees line 6, it creates a new string at runtime using the join() method and assigns a reference to that string to msg6.
String interning is not done automatically for strings created using the join() method.
As a result, msg1, msg2, msg3, msg4 and msg5 point to the same memory location while msg6 points to a different location.
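The link between id() and the is operator can be sketched as follows. The names first, alias and rebuilt are hypothetical. The last result is left without an expected value because it depends on the Python implementation (in CPython it is typically False).

```python
first = 'Hello'
alias = first  # a second name for the same object
rebuilt = ''.join(['H', 'e', 'l', 'l', 'o'])  # built at runtime

# Two names for one object: same id, so "is" is True.
print(id(alias) == id(first))  # True
print(alias is first)          # True

# Same value, so == is True regardless of where the string is stored.
print(rebuilt == first)        # True

# Whether this is the same object depends on the implementation.
print(rebuilt is first)
```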
A full discussion of when Python interns strings and when it doesn’t is beyond the scope of this post. If you are interested in finding out more, you can check out this useful post here.
Now that we understand how Python handles strings, let’s look at an example of how the == operator differs from the is operator.
str1 = 'abcd'
str2 = 'abcd'
str3 = ''.join(['a', 'b', 'c', 'd'])
print('ID of strings')
print(id(str1))
print(id(str2))
print(id(str3))
print()
print('== operator')
print(str1 == str2)
print(str1 == str3)
print()
print('is operator')
print(str1 is str2)
print(str1 is str3)
If you run the code above, you’ll get the following output:
ID of strings
140652046683824
140652046683824
140652046683696

== operator
True
True

is operator
True
False
If you study the output above, you’ll see that the == operator returns True as long as the values of the two strings are equal. Hence, it returns True for both str1 == str2 and str1 == str3.
In contrast, the is operator returns True only when the two variables point to the same memory location (i.e. they point to the same string). Hence, it returns True for str1 is str2 and False for str1 is str3.
Since whether two equal strings share a memory location depends on how they were created, use the == operator whenever you want to compare values.
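If you ever need two equal strings to be the same object, the standard library offers sys.intern(), which returns the interned copy of a string. The snippet below is only a sketch of that idea with illustrative variable names. It is not something the examples above rely on.

```python
import sys

literal = 'abcd'
joined = ''.join(['a', 'b', 'c', 'd'])  # created at runtime

# sys.intern() returns the single interned copy for a given value.
interned_literal = sys.intern(literal)
interned_joined = sys.intern(joined)

print(joined == literal)                    # True
print(interned_joined is interned_literal)  # True
```

For everyday equality checks, == remains the simpler choice.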
Other comparisons for strings (!=, is not, >, >=, <, <=)
Next, let’s proceed to discuss other comparisons for strings.
These comparisons include checking whether two strings are not equal (!=, is not), or whether one string is greater than (>) or smaller than (<) another.
Let’s look at some examples. Suppose we have a string called mainStr declared as follows:
mainStr = 'pqrs'
We’ll use this string to do various comparisons.
!= and is not
First, let’s look at how the != and is not operators work:
str1 = 'pqrs'
str2 = ''.join(['p', 'q', 'r', 's'])
#!= operator
print('!= operator')
print(mainStr != str1)
print(mainStr != str2)
print()
#is not operator
print('is not operator')
print(mainStr is not str1)
print(mainStr is not str2)
print()
If you run the code above, you’ll get the following output:
!= operator
False
False

is not operator
False
True
Here, we compare mainStr with str1 and str2. As mainStr, str1 and str2 all have the same value, mainStr != str1 and mainStr != str2 both return False.
In contrast, due to string interning, mainStr and str1 are the same string while str2 is a different string from mainStr (i.e. they are stored in different memory locations). Hence, mainStr is not str1 returns False while mainStr is not str2 returns True.
> and < operators
Next, let’s look at the > and < operators:
str3 = 'paxy'
print(mainStr > str3)
print(mainStr < str3)
If you run the code above, you’ll get the following output:
True
False
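Here, we compare mainStr ('pqrs') with str3 ('paxy').
The first characters of both strings are the same ('p'), so Python moves on to the second character. As the second character in mainStr ('q') has a greater Unicode code point than the second character in str3 ('a'), mainStr is considered greater than str3. The remaining characters are not examined. Hence, mainStr > str3 returns True.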
> vs >= operators
Next, let’s compare the > and >= operators.
The > operator returns True when the string on the left is greater than the string on the right. In contrast, the >= operator returns True when the string on the left is greater than or equal to the string on the right.
If you run the code below:
print(mainStr > str1)
print(mainStr == str1)
print(mainStr >= str1)
you’ll get the following output:
False
True
True
Here, mainStr >= str1 is True (the last output) as mainStr equals str1 (even though it is not greater than str1).
Uppercase vs lowercase
After the > and >= operators, let’s look at the difference between uppercase and lowercase letters.
If you run the code below:
str4 = 'PQRS'
print(mainStr > str4)
print(mainStr < str4)
you’ll get the following output:
True
False
As the lowercase letters 'a' to 'z' (code points 97 to 122) have greater Unicode code points than the uppercase letters 'A' to 'Z' (code points 65 to 90), 'p' is considered to be greater than 'P'. Hence, mainStr ('pqrs') is considered to be greater than str4 ('PQRS'). Therefore, mainStr > str4 returns True.
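If you want an ordering that ignores case, one common approach is to convert both strings to the same case before comparing them. The sketch below assumes plain English text. The names main_str, upper_str and words are illustrative.

```python
main_str = 'pqrs'
upper_str = 'PQRS'

# Case-sensitive: 'p' (112) is greater than 'P' (80).
print(main_str > upper_str)                   # True

# Case-insensitive: both become 'pqrs', so neither is greater.
print(main_str.lower() == upper_str.lower())  # True
print(main_str.lower() > upper_str.lower())   # False

# The same idea applies when sorting a list of strings.
words = ['banana', 'apple', 'Cherry']
default_order = sorted(words)
ignore_case_order = sorted(words, key=str.lower)
print(default_order)      # ['Cherry', 'apple', 'banana']
print(ignore_case_order)  # ['apple', 'banana', 'Cherry']
```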
Comparing strings of different length
Last but not least, let’s look at how Python compares strings of different lengths.
If you run the code below:
str5 = 'pqrst'
str6 = 'z'
print(mainStr > str5)
print(mainStr < str5)
print(mainStr > str6)
print(mainStr < str6)
you’ll get the following output:
False
True
False
True
The output shows that mainStr is smaller than both str5 and str6.
This is because the four characters in mainStr ('pqrs') match the first four characters of str5 ('pqrst'). However, str5 has an extra character 't'. When one string is a prefix of the other, the shorter string is considered smaller. Hence, mainStr is considered to be smaller than str5.
Next, mainStr is considered to be smaller than str6 as its first character 'p' is smaller than the first (and only) character of str6 ('z'). The comparison is decided at the first differing character, so the fact that mainStr is longer does not matter.
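The rules described in this section can be put together in a small function. This is only a sketch written to mirror the explanation. compare_strings() is a hypothetical helper, and Python’s built-in operators do not need it.

```python
def compare_strings(a, b):
    """Return -1 if a < b, 0 if a == b and 1 if a > b."""
    # Walk through both strings one position at a time.
    for ch_a, ch_b in zip(a, b):
        if ord(ch_a) != ord(ch_b):
            # The first differing character decides the result.
            return -1 if ord(ch_a) < ord(ch_b) else 1
    # Every shared position matched, so the shorter string is smaller.
    if len(a) == len(b):
        return 0
    return -1 if len(a) < len(b) else 1


print(compare_strings('pqrs', 'paxy'))   # 1
print(compare_strings('pqrs', 'pqrst'))  # -1
print(compare_strings('pqrs', 'z'))      # -1
print(compare_strings('pqrs', 'pqrs'))   # 0
```

Each result agrees with what the >, < and == operators gave in the examples above.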
Clear?
That’s it. We have now covered all the string comparison operators in Python. Let’s move on to the practice question for today.
Practice Question
Today’s practice question requires us to write a function called compareStr() that has two parameters – a and b, both of which are strings.
The function compares the two strings and returns True if the two strings are equal, False otherwise.
However, the function is case-insensitive. In addition, it ignores any non-letters in the string.
For instance, 'Hello' and 'hello' are considered to be equal. In addition, 'Hello, how are you?' and 'hello how are you???' are also considered to be equal.
Expected Results
To test your function, you can use the code below:
print(compareStr('ABCD', 'abcd'))
print(compareStr('!@%$#@ABCD', 'ABCD'))
print(compareStr('Good morning! How are you feeling?', 'Good morning... how are you feeling???'))
print(compareStr('abc', 'abcd'))
print(compareStr('abcd', 'pq'))
You should get the following output:
True
True
True
False
False
Suggested Solution
Here’s the suggested solution for today’s practice:
def compareStr(a, b):
    a = a.lower()
    b = b.lower()

    new_a = ''
    new_b = ''

    for i in range(len(a)):
        if ord(a[i]) >= 97 and ord(a[i]) <= 122:  # 'a' to 'z'
            new_a += a[i]

    for i in range(len(b)):
        if ord(b[i]) >= 97 and ord(b[i]) <= 122:  # 'a' to 'z'
            new_b += b[i]

    if new_a == new_b:
        return True
    else:
        return False
Here, we first define a function called compareStr() that has two parameters – a and b.
Inside the function, we convert a and b to lowercase (using the lower() string method) and assign them back to their respective variables. This is necessary as the compareStr() function is case-insensitive. When we convert both a and b to lowercase, we will not get a different answer due to any case differences. (We could have converted both of them to uppercase too.)
Next, we use two for loops to check for non-letters in a and b. The first for loop (from lines 8 to 10) checks for non-letters in a.
ord() is a built-in Python function that gives the Unicode code point for a character. For instance, ord('p') gives the code point for the character 'p'. As the Unicode code points for the lowercase English letters 'a' to 'z' are from 97 to 122, the if condition on line 9 evaluates to True if the current character in a is one of these letters. When that happens, we append the character to a string called new_a (on line 10).
We repeat the same procedure for b using the for loop from lines 12 to 14.
After the two for loops, new_a and new_b will contain all the letters in a and b respectively (without any other characters). For instance, if a equals '123hello!!!', new_a equals 'hello'.
Next, we simply use the == operator to check if new_a equals new_b. If they are equal to each other, we return True. Else, we return False.
With that, the function is complete. That’s all for today’s practice question.
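For comparison, the same logic can be written more compactly. The version below is an alternative sketch, not a replacement for the suggested solution. compare_str_compact() and letters_only() are hypothetical names, and the filter keeps only 'a' to 'z', just like the code point check above.

```python
def compare_str_compact(a, b):
    """Compare two strings, ignoring case and non-letters."""
    def letters_only(text):
        # Lowercase the text, then keep only the letters 'a' to 'z'.
        return ''.join(ch for ch in text.lower() if 'a' <= ch <= 'z')

    # The == operator already gives True or False.
    return letters_only(a) == letters_only(b)


print(compare_str_compact('ABCD', 'abcd'))        # True
print(compare_str_compact('!@%$#@ABCD', 'ABCD'))  # True
print(compare_str_compact('abc', 'abcd'))         # False
```

Returning the result of the comparison directly avoids the if/else at the end, since new_a == new_b is itself a Boolean value.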