How to compare strings in Python – A complete guide


In today’s post, we’ll learn two different ways to check whether two strings are equal in Python. We’ll also look at other forms of comparison (such as whether one string is greater than another).

Last but not least, we’ll work on a practice question that checks whether two strings are equal, ignoring non-letters and differences between upper and lower case.

Key Concepts

How to determine if two strings are equal in Python

First, let’s talk about two different ways to determine if strings are equal in Python. The two ways are:

  • using the == operator
  • using the is operator

Using the == operator

The most straightforward method to determine if two strings are equal in Python is to use the == operator. This operator checks whether two strings have the same value, returning True if they do and False otherwise.

Comparison is done one character at a time, using the Unicode code points of each character. Let’s look at some examples to illustrate what this means:

mainStr = 'abcd'
str1 = 'abce'
str2 = 'abcde'
str3 = 'ABCD'
str4 = 'abcd'
str5 = ''.join(['a', 'b', 'c', 'd'])

print(mainStr == str1)
print(mainStr == str2)
print(mainStr == str3)
print(mainStr == str4)
print(mainStr == str5)

Here, we first declare and initialise a string called mainStr with the value 'abcd'.

Next, we declare and initialise five other strings to compare with mainStr.

If you run the code above, you’ll get the following output:

False
False
False
True
True

When we compare mainStr with str1 (on line 8), Python compares the first character in mainStr with the first character in str1. Since both characters are the same ('a'), it moves on to the second character.

It keeps doing that (comparing character by character) for all the characters in both strings. If it encounters a character that is different, it returns False. This happens at the fourth character. As 'e' is not equal to 'd', Python returns False.

Next, we compare mainStr with str2. As the two strings have different lengths, they cannot be equal. Hence, Python returns False.

On line 10, we compare mainStr with str3 and get False as the output. This is because string comparison is done using Unicode code points.

Unicode is a universal standard for encoding characters and symbols. It does that by assigning a code point to every character and symbol in every language in the world.

'a' is assigned the code point 97, while 'A' is assigned the code point 65. Hence, 'a' is not equal to 'A'. The same applies to the rest of the characters in mainStr vs str3. As a result, we get False when we compare the two strings. This example demonstrates that string comparison using the == operator is case sensitive.
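To see the code points behind this result, here is a minimal sketch using the built-in ord() function. The variable names are only illustrative and are not part of the example above. It also shows one common way to get a case-insensitive check, which is to lowercase both sides before comparing.

```python
# Inspect the code points that the == operator ends up comparing.
lower_a = ord('a')   # code point of lowercase a
upper_a = ord('A')   # code point of uppercase A
print(lower_a, upper_a)   # 97 65

# == is case sensitive, so these two strings are not equal.
case_sensitive = 'abcd' == 'ABCD'

# Lowercasing both sides first gives a case-insensitive check.
case_insensitive = 'abcd'.lower() == 'ABCD'.lower()

print(case_sensitive)     # False
print(case_insensitive)   # True
```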

On line 11, we compare mainStr with str4. As both strings are identical, we get True as the output.

Finally, on line 12, we compare mainStr with str5.

str5 is formed by applying the join() method on the elements in the list ['a', 'b', 'c', 'd']. This method joins all the elements in the list using a specified string as the separator. This specified string refers to the string that is used to call the method (i.e. the string before the . operator).

Since we use the empty string ('') to call the join() method in our example, the elements in ['a', 'b', 'c', 'd'] are joined without any separator. This gives us 'abcd' as the result.

Hence, when we compare mainStr with str5, we get True as the output.
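To make the role of the separator clearer, the short sketch below calls join() on three different strings. The variable names are made up for illustration.

```python
letters = ['a', 'b', 'c', 'd']

# The string before the dot is placed between the elements.
no_separator = ''.join(letters)
with_dash = '-'.join(letters)
with_comma = ', '.join(letters)

print(no_separator)            # abcd
print(with_dash)               # a-b-c-d
print(with_comma)              # a, b, c, d
print(no_separator == 'abcd')  # True
```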

Using the is operator

Besides using the == operator, we can use the is operator to compare two strings. This operator is stricter than the == operator.

While the == operator only requires both strings to have the same value, the is operator requires both strings to be the very same object. In other words, the is operator evaluates to True only if the variables on either side of the operator point to the same object in memory.

To appreciate what this means, one has to understand how Python handles strings. Suppose we have the following code:

msg1 = 'Hello'
msg2 = 'Hello'
msg3 = 'Hello'
msg4 = 'Hello'
msg5 = 'Hello'

Here, we assign the same string literal (i.e. strings that are literally typed into the program source code, surrounded by quotation marks) to five different variables – msg1, msg2, msg3, msg4 and msg5.

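When we do that, Python does not create five strings. Instead, it creates a single string and assigns a reference to that string to all five variables. This is known as string interning and is done to save memory. Note that interning is an implementation detail of CPython (the standard Python interpreter), so it is not something your code should rely on.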

As a result of string interning, instead of needing memory space for five strings, we only need one.

msg1, msg2, msg3, msg4 and msg5 all point to the same memory location. This is illustrated in the diagram below:

We can verify that all variables point to the same string using the id() function. This function returns the unique numeric identifier associated with an object (such as a string), based on the memory location of that object.

If you run the following code,

msg1 = 'Hello'
msg2 = 'Hello'
msg3 = 'Hello'
msg4 = 'Hello'
msg5 = 'Hello'
msg6 = ''.join(['H', 'e', 'l', 'l', 'o'])

print(id(msg1))
print(id(msg2))
print(id(msg3))
print(id(msg4))
print(id(msg5))
print(id(msg6))

you’ll get an output similar to what is shown below:

140586103726320
140586103726320
140586103726320
140586103726320
140586103726320
140586100793264

The exact values you get will be different from what is shown above, as the id of a string changes each time the program runs. However, regardless of the values you get, you’ll see that the first five values are the same, while the last value is different from the rest.

This is because when Python sees lines 1 to 5 in the code above, it creates a single string 'Hello' and assigns a reference to that string to all five variables.

In contrast, when it sees line 6, it creates a new string using the join() method and assigns a reference to that string to msg6.

String interning is generally not done for strings built at run time, such as those created using the join() method.

As a result, msg1, msg2, msg3, msg4 and msg5 point to the same memory location while msg6 points to a different location.

A full discussion of when Python interns strings and when it doesn’t is beyond the scope of this post. If you are interested in finding out more, you can check out this useful post here.
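If you ever need two equal strings to be the same object, the standard library offers sys.intern(). The sketch below is only an illustration of that function, not something the examples in this post depend on, and the variable names are hypothetical.

```python
import sys

literal = 'Hello'
built = ''.join(['H', 'e', 'l', 'l', 'o'])

# The values are equal regardless of how the strings were created.
print(literal == built)   # True

# sys.intern() returns the canonical interned copy of a string,
# so two interned strings with the same value are the same object.
interned_literal = sys.intern(literal)
interned_built = sys.intern(built)
print(interned_literal is interned_built)   # True
```

In everyday code this is rarely needed. Comparing with == is enough when you only care about the value.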

Now that we understand how Python handles strings, let’s look at an example of how the == operator differs from the is operator.

str1 = 'abcd'
str2 = 'abcd'
str3 = ''.join(['a', 'b', 'c', 'd'])

print('ID of strings')
print(id(str1))
print(id(str2))
print(id(str3))
print()

print('== operator')
print(str1 == str2)
print(str1 == str3)
print()

print('is operator')
print(str1 is str2)
print(str1 is str3)

If you run the code above, you’ll get the following output:

ID of strings
140652046683824
140652046683824
140652046683696

== operator
True
True

is operator
True
False

If you study the output above, you’ll see that the == operator returns True as long as the values of the two strings are equal. Hence, it returns True for both str1 == str2 and str1 == str3.

In contrast, the is operator returns True only when the two variables point to the same memory location (i.e. they point to the same string). Hence, it returns True for str1 is str2 and False for str1 is str3.

Other comparisons for strings (!=, is not, >, >=, <, <=)

Next, let’s proceed to discuss other comparisons for strings.

These comparisons include comparing if two strings are not equal (!=, is not), or if one string is greater than (>) or smaller than another (<).

Let’s look at some examples. Suppose we have a string called mainStr declared as follows:

mainStr = 'pqrs'

We’ll use this string to do various comparisons.

!= and is not

First, let’s look at how the != and is not operators work:

str1 = 'pqrs'
str2 = ''.join(['p', 'q', 'r', 's'])

# != operator
print('!= operator')
print(mainStr != str1)
print(mainStr != str2)
print()

# is not operator
print('is not operator')
print(mainStr is not str1)
print(mainStr is not str2)
print()

If you run the code above, you’ll get the following output:

!= operator
False
False

is not operator
False
True

Here, we compare mainStr with str1 and str2. As mainStr, str1 and str2 all have the same value, mainStr != str1 and mainStr != str2 both return False.

In contrast, due to string interning, mainStr and str1 are the same string while str2 is a different string from mainStr (i.e. they are stored in different memory locations). Hence, mainStr is not str1 returns False while mainStr is not str2 returns True.

> and < operators

Next, let’s look at the > and < operators:

str3 = 'paxy'

print(mainStr > str3)
print(mainStr < str3)

If you run the code above, you’ll get the following output:

True
False

Here, we compare mainStr ('pqrs') with str3 ('paxy').

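The first characters of both strings are the same ('p'), so Python moves on to the second character. As the second character in mainStr ('q') has a greater Unicode code point than the second character in str3 ('a'), mainStr is considered greater than str3. Hence, mainStr > str3 returns True. The remaining characters are not examined.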
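To see which character decides the result, here is a small sketch. The helper first_difference() is a hypothetical function written for this illustration. It is not a Python built-in.

```python
def first_difference(a, b):
    """Return (index, char_a, char_b) for the first position where a and b differ.

    Returns None if no differing position exists within the shorter string.
    """
    for index, (char_a, char_b) in enumerate(zip(a, b)):
        if char_a != char_b:
            return index, char_a, char_b
    return None


result = first_difference('pqrs', 'paxy')
print(result)               # (1, 'q', 'a')
print(ord('q'), ord('a'))   # 113 97
```

The tuple shows that the strings first differ at index 1, and since 113 is greater than 97, 'pqrs' is the greater string.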

> vs >= operators

Next, let’s compare the > vs >= operators.

The > operator returns True when the string on the left is greater than the string on the right. In contrast, the >= operator returns True when the string on the left is greater than or equal to the string on the right.

If you run the code below:

print(mainStr > str1)
print(mainStr == str1)
print(mainStr >= str1)

you’ll get the following output:

False
True
True

Here, mainStr >= str1 is True (the last output) as mainStr equals str1 (even though it is not greater than str1).

Uppercase vs lowercase

After the > and >= operators, let’s look at the difference between uppercase and lowercase letters.

If you run the code below:

str4 = 'PQRS'

print(mainStr > str4)
print(mainStr < str4)

you’ll get the following output:

True
False

As lowercase English letters have greater Unicode code points than their uppercase counterparts ('p' is 112 while 'P' is 80), 'p' is considered to be greater than 'P'. Hence, mainStr ('pqrs') is considered to be greater than str4 ('PQRS'). Therefore, mainStr > str4 returns True.
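This rule also affects sorting, because sorted() uses the same comparisons. The sketch below uses a made-up list of words to show the default order and one way to ignore case, by passing str.lower as the key.

```python
words = ['banana', 'apple', 'Cherry']

# Default ordering compares code points, so 'C' (67) sorts before 'a' (97).
default_order = sorted(words)

# Comparing lowercase copies of the words gives a case-insensitive ordering.
ignore_case_order = sorted(words, key=str.lower)

print(default_order)       # ['Cherry', 'apple', 'banana']
print(ignore_case_order)   # ['apple', 'banana', 'Cherry']
```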

Comparing strings of different length

Last but not least, let’s look at how Python compares strings of different lengths.

If you run the code below:

str5 = 'pqrst'
str6 = 'z'

print(mainStr > str5)
print(mainStr < str5)
print(mainStr > str6)
print(mainStr < str6)

you’ll get the following output:

False
True
False
True

The output shows that mainStr is smaller than both str5 and str6.

This is because all the characters in mainStr ('pqrs') match the first four characters of str5 ('pqrst'). However, str5 has an extra character 't'. When one string is a prefix of another, the shorter string is considered smaller. Hence, mainStr is considered to be smaller than str5.

Next, mainStr is considered to be smaller than str6 as its first character 'p' is smaller than the first (and only) character of str6 ('z').
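The rules covered in this section can be put together in a short sketch. The function compare_strings() below is a hypothetical helper that imitates the comparison by hand. Python’s own operators do this work for you, so it is meant for understanding only.

```python
def compare_strings(a, b):
    """Return -1 if a < b, 0 if a == b and 1 if a > b."""
    # Walk through both strings one character at a time.
    for char_a, char_b in zip(a, b):
        if char_a != char_b:
            # The first differing character decides the result.
            return -1 if ord(char_a) < ord(char_b) else 1
    # All shared positions match, so the shorter string is the smaller one.
    if len(a) == len(b):
        return 0
    return -1 if len(a) < len(b) else 1


print(compare_strings('pqrs', 'pqrst'))   # -1
print(compare_strings('pqrs', 'z'))       # -1
print(compare_strings('pqrs', 'paxy'))    # 1
print(compare_strings('pqrs', 'pqrs'))    # 0
```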

Clear?

That’s it. We have now covered all the string comparison operators in Python. Let’s move on to the practice question for today.

Practice Question

Today’s practice question requires us to write a function called compareStr() that has two parameters – a and b, both of which are strings.

The function compares the two strings and returns True if the two strings are equal, False otherwise.

However, the function is case-insensitive. In addition, it ignores any non-letters in the string.

For instance, 'Hello' and 'hello' are considered to be equal. In addition, 'Hello, how are you?' and 'hello how are you???' are also considered to be equal.

Expected Results

To test your function, you can use the code below:

print(compareStr('ABCD', 'abcd'))
print(compareStr('!@%$#@ABCD', 'ABCD'))
print(compareStr('Good morning! How are you feeling?', 'Good morning... how are you feeling???'))
print(compareStr('abc', 'abcd'))
print(compareStr('abcd', 'pq'))

you should get the following output:

True
True
True
False
False

Suggested Solution

Here’s the suggested solution for today’s practice:

def compareStr(a, b):

    a = a.lower()
    b = b.lower()
    new_a = ''
    new_b = ''
    
    for i in range(len(a)):
        if ord(a[i]) >= 97 and ord(a[i]) <= 122:
            new_a += a[i]

    for i in range(len(b)):
        if ord(b[i]) >= 97 and ord(b[i]) <= 122:
            new_b += b[i]

    if new_a == new_b:
        return True
    else:
        return False

Here, we first define a function called compareStr() that has two parameters – a and b.

Inside the function, we convert a and b to lowercase (using the string method lower()) and assign the results back to their respective variables. This is necessary as the compareStr() function is case-insensitive. Once both a and b are in lowercase, case differences can no longer affect the result. (We could have converted both of them to uppercase too.)

Next, we use two for loops to check for non-letters in a and b. The first for loop (from lines 8 to 10) checks for non-letters in a.

ord() is a built-in Python function that gives the Unicode code point for a character. For instance, ord('p') gives the code point for the character 'p'. As the Unicode code points for the lowercase English letters 'a' to 'z' run from 97 to 122, the if condition on line 9 evaluates to True if the current character in a is one of these letters. When that happens, we append the character to a string called new_a (on line 10).

We repeat the same procedure for b using the for loop from lines 12 to 14.

After the two for loops, new_a and new_b will contain all the letters in a and b respectively (without any other characters). For instance, if a equals '123hello!!!', new_a equals 'hello'.

Next, we simply use the == operator to check if new_a equals new_b. If they are equal to each other, we return True. Else, we return False.
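As an aside, the same logic can be written more compactly. The version below is an alternative sketch, not the suggested solution itself. The names compare_str_compact() and letters_only() are made up for this illustration. It keeps the same rule of accepting only the letters 'a' to 'z' after lowercasing.

```python
def compare_str_compact(a, b):
    """Compare two strings, ignoring case and anything that is not a-z."""

    def letters_only(text):
        # Lowercase the text, then keep only the letters 'a' to 'z'.
        return ''.join(ch for ch in text.lower() if 'a' <= ch <= 'z')

    # The comparison already evaluates to True or False.
    return letters_only(a) == letters_only(b)


print(compare_str_compact('ABCD', 'abcd'))         # True
print(compare_str_compact('!@%$#@ABCD', 'ABCD'))   # True
print(compare_str_compact('abc', 'abcd'))          # False
```

Returning the result of == directly avoids the if-else at the end, since the comparison is already a Boolean value.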

With that, the function is complete. That’s all for today’s practice question.

Written by Jamie | Last Updated October 19, 2020
