There are three main methods to find the intersection of two lists in Python:
- Using sets
- Using list comprehension
- Using the built-in filter() function
This post explains the code for each method and discusses various factors to consider when deciding which method to use.
For our practice question, we’ll work on a function that finds the intersection of two lists with duplicate items in each list. For instance, if we pass the lists [1, 1, 1, 2, 3, 3, 5] and [1, 1, 3, 6, 7] to the function, it should return the list [1, 1, 3].
Using sets to find intersection of two lists in Python
The first method to find the intersection of two lists in Python is to convert the lists to sets and use either the & operator or the built-in intersection() method.
But first, what is a set?
A set is similar to a list, in that it can be used to store a collection of items (such as a collection of numbers).
For instance, we can create a set to store the numbers 1, 22, 43, 64 and 57.
To do that, we use curly braces, as shown in the example below:
mySet = {1, 22, 43, 64, 57}
As you can see, creating a set is very similar to creating a list, except that the former uses curly braces while the latter uses square brackets.
A set is indeed very similar to a list. However, there are a few key differences between them, as explained in the table below:
Key differences between a set and a list
List | Set |
---|---|
Items are ordered (i.e. the order of the items is important). | Items are not ordered (i.e. the order of the items is disregarded). |
Items need not be distinct. | Items must be distinct. |
Items need not be hashable* | Items must be hashable*. Examples of hashable items include integers, floating-point numbers and strings. Examples of items that are not hashable include nested lists and dictionaries. |
* Hashable items are items whose hash value never changes during their lifetime. A full discussion of this is beyond the scope of this tutorial. If you are interested, you can check out the official documentation at https://docs.python.org/3/glossary.html#term-hashable.
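The three differences in the table above can be checked directly in the interpreter. The snippet below is only a minimal sketch; the variable names (lists_equal, sets_equal, unique_items, error_message) are made up for illustration.

```python
# 1. Order: lists are compared item by item in order; sets ignore order.
lists_equal = [1, 2, 3] == [3, 2, 1]   # False
sets_equal = {1, 2, 3} == {3, 2, 1}    # True

# 2. Distinct items: a set silently drops repeated values.
unique_items = {1, 1, 2, 2, 3}         # same as {1, 2, 3}

# 3. Hashability: a list cannot be stored inside a set.
try:
    bad_set = {1, 2, [3, 4]}
    error_message = None
except TypeError as err:
    error_message = str(err)

print(lists_equal, sets_equal)
print(len(unique_items))
print(error_message)
```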
To find the intersection of two lists in Python, we can convert the lists to sets and use the built-in & operator or the intersection() method.
Let’s look at some examples:
Converting lists to sets and using the & operator
list1 = [1, 2, 3, 4, 5]
list2 = [3, 4, 5, 6, 7]

set1 = set(list1)
set2 = set(list2)

intersect = list(set1 & set2)

print(intersect)
Here, we first declare two lists – list1 and list2 – on lines 1 and 2.
Next, on lines 4 and 5, we use the set() function to convert the lists to sets.
On line 7, we use the & operator to find the intersection between the two sets. This operator returns a new set with elements common to both set1 and set2. We pass this resulting set to the list() function to convert it back to a list and assign the resulting list to intersect.
Finally, we print the value of intersect on line 9.
If you run the code above, you’ll get the following output:
[3, 4, 5]
Converting lists to sets and using the intersection() method
Next, let’s look at an example that uses the intersection() method.
While the & operator requires both operands to be sets, the intersection() method also accepts other types of iterables, such as a list or tuple.
Its syntax is as follows:
<name of set>.intersection(<names of iterables>)
Let’s look at an example:
list1 = [1, 2, 3, 4, 5]
list2 = [3, 4, 5, 6, 7]

set1 = set(list1)

intersect = list(set1.intersection(list2))

print(intersect)
Here, we only convert list1 to a set. This is because we can pass a list (e.g. list2) directly to the intersection() method, without having to convert the list to a set.
We do that on line 6, where we use set1 to call the intersection() method and pass list2 as an argument to the method.
This method returns a set. We convert the set back to a list using the list() function, and assign the result to intersect.
Finally, we print the value of intersect on line 8.
If you run the code above, you’ll get the following output:
[3, 4, 5]
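To make the difference between the two approaches concrete, here is a small sketch. It assumes a third iterable, tuple3, which is not part of the examples above and is only there to show that intersection() can take more than one argument.

```python
set1 = {1, 2, 3, 4, 5}
list2 = [3, 4, 5, 6, 7]
tuple3 = (4, 5, 8)   # hypothetical extra iterable, for illustration only

# The & operator needs a set on both sides, so mixing a set and a list fails.
try:
    set1 & list2
    operator_error = None
except TypeError as err:
    operator_error = type(err).__name__

# intersection() accepts any iterable, and more than one at a time.
common_two = set1.intersection(list2)
common_three = set1.intersection(list2, tuple3)

print(operator_error)          # TypeError
print(sorted(common_two))      # [3, 4, 5]
print(sorted(common_three))    # [4, 5]
```

Because sets are unordered, sorted() is used here so that the printed order is predictable.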
Error in converting lists to sets
Last but not least, let’s look at an example where we are unable to convert a list to a set. That happens when one or more of the items in the list is not hashable. For instance, if the list has a nested list (which is not hashable), we’ll get an error when we try to convert the list to a set.
list1 = [1, 2, 3, 4, 5, [7, 8, 9]]
set1 = set(list1)
If you run the code above, you’ll get the following error:
Traceback (most recent call last):
  File "...", line ..., in <module>
    set1 = set(list1)
TypeError: unhashable type: 'list'
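If you would rather detect this situation than let the program stop, one option is to catch the TypeError. This is only a sketch of that idea; the names set1 and error_text are illustrative.

```python
list1 = [1, 2, 3, 4, 5, [7, 8, 9]]

try:
    set1 = set(list1)
    error_text = None
except TypeError as err:
    # At least one item (here, the nested list) is unhashable.
    set1 = None
    error_text = str(err)

print(set1)
print(error_text)
```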
Using list comprehension
In the previous section, we saw how we can convert lists to sets and use the & operator or intersection() method to find the intersection of two lists in Python.
However, we also saw how that can fail when our lists contain unhashable items (such as a nested list).
In cases like that, we can use list comprehension to find the intersection of two lists. This does not require us to convert any list to a set.
To do that, we need to first understand how list comprehension works. If you are unfamiliar with list comprehension, you may want to refer to previous posts on the topic.
To recap, one possible way to use list comprehension is to use the following syntax:
[<item to add to new list> <for loop to iterate through existing list> <if condition>]
Using list comprehension to create a new intersection list
Suppose we have two lists – list1 and list2.
If we want to add items in list2 to a new list called intersect, only when the item in list2 is also in list1, we can use the code below:
list1 = [1, 2, 3, 4]
list2 = [3, 4, 5, 6]

intersect = [x for x in list2 if x in list1]

print(intersect)
Line 4 in the code above says that for each item in list2 (for x in list2), add the item (x) to intersect if the item is in list1 (if x in list1).
If you run the code above, you’ll get the following output:
[3, 4]
This gives us the intersection of the two lists – list1 and list2.
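If the comprehension above feels compact, it may help to see it written out as an ordinary loop. The sketch below should behave the same way; intersect_loop is just an illustrative name.

```python
list1 = [1, 2, 3, 4]
list2 = [3, 4, 5, 6]

# Equivalent to: intersect = [x for x in list2 if x in list1]
intersect_loop = []
for x in list2:                    # <for loop to iterate through existing list>
    if x in list1:                 # <if condition>
        intersect_loop.append(x)   # <item to add to new list>

print(intersect_loop)  # [3, 4]
```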
Using list comprehension to find intersection of two lists with nested lists
Next, let’s look at another example where the lists contain nested lists.
list1 = [1, 2, [12, 5], [6, 7], 9, 10]
list2 = [2, [6, 7], 15, 17]

intersect2 = [x for x in list2 if x in list1]

print(intersect2)
If you run the code above, you’ll get the following output:
[2, [6, 7]]
This shows that list comprehension works even with nested lists.
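The reason this works is that the in operator matches items that are equal, so no hashing is needed. A short sketch, where candidate is a hypothetical list created separately from list1:

```python
list1 = [1, 2, [12, 5], [6, 7], 9, 10]
candidate = [6, 7]   # a new list object with the same contents

# `in` finds an item that is equal (==) to candidate,
# even though it is not the same object as the one stored in list1.
found = candidate in list1
same_object = any(candidate is item for item in list1)

print(found)        # True
print(same_object)  # False
```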
Using the built-in filter() function
Last but not least, let’s look at the third method to find the intersection of two lists in Python – using the filter() function.
The filter() function is a built-in function in Python that accepts two arguments – a function that defines the criteria for filtering and an iterable to be filtered.
Suppose we have a list called numbers, defined as follows:
numbers = [1, 2, 3, 4, 5, 6, 7, 8]
If we want to filter all the even numbers in the list, we can define a function that returns True when it is passed an even number.
We can then pass this function and the numbers list to the filter() function.
Let’s look at an example of how this works.
numbers = [1, 2, 3, 4, 5, 6, 7, 8]

def isEven(x):
    if x%2 == 0:
        return True

even_numbers = list(filter(isEven, numbers))

print(even_numbers)
print(numbers)
Here, we first define the numbers list on line 1.
Next, we define a function called isEven() that accepts one argument x and returns True if x is even (i.e. if x gives a remainder of zero when divided by 2). For an odd number, the function returns None, which filter() treats as false.
Next, we call the filter() function on line 7, passing the function name (isEven) and the iterable name (numbers) to the function.
The filter() function returns a filter object which can be converted to a list, using the built-in list() function.
We do that on line 7 and assign the resulting list to a variable called even_numbers.
Finally, we print the values of even_numbers and numbers on lines 9 and 10.
If you run the code above, you’ll get the following output:
[2, 4, 6, 8]
[1, 2, 3, 4, 5, 6, 7, 8]
even_numbers only contains even numbers as these are the items in numbers that “passed” the filter criteria, as defined by the isEven() function.
numbers, on the other hand, is not changed after we pass it to the filter() function.
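One detail worth knowing about the filter object is that it is an iterator, so it can only be consumed once. The sketch below illustrates this; is_even is a hypothetical variant of isEven() that returns the result of the comparison directly.

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8]

def is_even(x):
    # Returns True for even numbers and False for odd numbers.
    return x % 2 == 0

evens = filter(is_even, numbers)   # nothing has been filtered yet

first_pass = list(evens)    # consumes the iterator
second_pass = list(evens)   # the iterator is now exhausted

print(first_pass)   # [2, 4, 6, 8]
print(second_pass)  # []
```

This is why the examples in this post convert the filter object to a list straight away.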
As mentioned previously, we can use the filter() function to find the intersection of two lists in Python. Let’s look at some examples.
Using the filter() function to find intersection of two lists in Python
list1 = [1, 2, 3, 4]
list2 = [3, 4, 5, 6]

def intersectList1(x):
    if x in list1:
        return True

intersect = list(filter(intersectList1, list2))

print(intersect)
Here, we first define a function called intersectList1() on lines 4 to 6. This function returns True if x is in list1.
Next, we use the function to filter list2 on line 8. We also pass the resulting filter object to the list() function to convert it to a list.
Finally, we print the value of the resulting list on line 10.
If you run the code above, you’ll get the following output:
[3, 4]
Using the filter() function with a lambda function
Next, let’s look at one more example of using the filter() function to find the intersection of two lists in Python. This time, instead of passing a named function to filter(), we’ll pass a lambda function. If you are unfamiliar with lambda functions, you may want to refer to an earlier post on how lambda functions work.
list1 = [1, 2, 3, 4]
list2 = [3, 4, 5, 6]

intersect = list(filter(lambda x: x in list1, list2))

print(intersect)
Here, the lambda function is lambda x: x in list1.
The function evaluates the expression x in list1 for each input x and returns True if x is in list1 (and False otherwise).
We pass this lambda function to the filter() function on line 4 to filter the items in list2.
If you run the code above, you’ll get the following output:
[3, 4]
Not surprisingly, this example gives the same output as the preceding example, as its lambda function does the same thing as the intersectList1() function in the preceding example.
Which Method to Use?
Now that we’ve covered three different methods to find the intersection of two lists in Python, let’s discuss some factors to consider when choosing which method to use.
The first factor to consider is whether your list contains un-hashable items (such as a nested list).
If it does, you’ll have to use either list comprehension or the filter() function.
Next, the second factor to consider is whether your list has duplicate items. If your list has duplicate items and you want the intersection to account for these items, none of the methods above is ideal.
Let’s look at some examples:
list1 = [1, 2, 3, 3, 5, 5, 6, 7, 8]
list2 = [1, 3, 3, 3, 5, 6, 6, 9]

# Example 1: Using sets
set1 = set(list1)
set2 = set(list2)
intersect_1 = list(set1 & set2)
intersect_2 = list(set1.intersection(list2))
print('Using sets')
print(intersect_1)
print(intersect_2)

# Example 2: Using list comprehension
intersect_3 = [x for x in list1 if x in list2]
intersect_4 = [x for x in list2 if x in list1]
print('Using list comprehension')
print(intersect_3)
print(intersect_4)

# Example 3: Using filter()
intersect_5 = list(filter(lambda x : x in list1, list2))
intersect_6 = list(filter(lambda x : x in list2, list1))
print('Using filter function')
print(intersect_5)
print(intersect_6)
If you run the code above, you’ll get the following output:
Using sets
[1, 3, 5, 6]
[1, 3, 5, 6]
Using list comprehension
[1, 3, 3, 5, 5, 6]
[1, 3, 3, 3, 5, 6, 6]
Using filter function
[1, 3, 3, 3, 5, 6, 6]
[1, 3, 3, 5, 5, 6]
When we convert a list to a set, duplicate items are removed. For instance, if myList = [1, 1, 1, 2, 3], set(myList) gives us the set {1, 2, 3}.
Therefore, in the first example above, when we convert list1 and list2 to sets, and use the & operator or the intersection() method to find the intersection of the two sets, the resulting set does not contain any duplicate items.
Hence, we get [1, 3, 5, 6] as the output for both intersect_1 and intersect_2.
Next, for the second example, notice that we get different answers for intersect_3 and intersect_4.
This is because for intersect_3, we iterate through list1 to check if the item is also in list2. On the other hand, for intersect_4, we iterate through list2 to check if the item is also in list1.
This gives us different results, as the result for intersect_3 is based on list1. Hence, the number 3 appears twice in intersect_3 as there are two 3s in list1. Similarly, the number 5 appears twice as there are two 5s in list1.
In contrast, the result for intersect_4 is based on list2. Hence, the number 3 appears thrice in intersect_4 while the number 5 appears once.
The same difference in answers can be found when we use the filter() function to find the intersection of list1 and list2 (refer to example 3).
If we do not want such discrepancy in our answers, we’ll need to write our own function.
This brings us to the practice question for today.
Practice Question
The practice question for today is to write a function called intersectWithDup() that accepts two lists as arguments. We can assume that the two lists only contain hashable items.
The function finds the intersection of the two lists, based on the number of times an item appears in both lists.
For instance, suppose we have the lists [1, 1, 2, 2, 2, 3] and [1, 1, 1, 2, 3, 3, 4, 4]. Our function should return [1, 1, 2, 3] as the result.
Although 1 appears three times in the second list, it only appears twice in the first. Hence, 1 appears twice in the intersection.
Similarly, although 2 appears three times in the first list, it only appears once in the second. Hence, 2 appears once in the intersection.
As you can see, the intersection is not based on the first or the second list. Instead, it is based on the number of times an item appears in both lists.
Expected Results
To test your function, you can run the following statements:
list1 = [1, 1, 2, 2, 2, 3]
list2 = [1, 1, 1, 2, 3, 3, 4, 4]
print(intersectWithDup(list1, list2))

list1 = [1, 2, 3, 4, 5, 5, 5]
list2 = [2, 2, 2, 2, 3, 4, 7, 7, 8]
print(intersectWithDup(list1, list2))
If you run the code above, you should get the following output:
[1, 1, 2, 3]
[2, 3, 4]
Hints
There is more than one way to complete the practice question for today. The suggested solution uses the Counter class and the extend() method for lists.
Suggested Solution
Here’s the suggested solution for today’s question:
from collections import Counter

def intersectWithDup(list1, list2):

    a = Counter(list1)
    b = Counter(list2)

    set1 = set(list1)
    set2 = set(list2)

    list3 = list(set1 & set2)

    list4 = []
    for i in list3:
        list4.extend([i]*min(a[i], b[i]))

    return list4
Here, we first import the Counter class on line 1.
Next, we define the intersectWithDup() function from lines 3 to 17.
Within the function, we pass list1 and list2 to the Counter() constructor and assign the resulting Counter objects to a and b.
A Counter object gives the number of times an item appears in an iterable.
For instance, Counter(['p', 'p', 'q']) gives us the Counter object Counter({'p': 2, 'q': 1}), as 'p' appears twice ('p': 2) in ['p', 'p', 'q'] while 'q' appears once ('q': 1).
If we assign this object to a variable, say myCounter, we can access the values in myCounter like how we access values in a dictionary.
For instance, myCounter['p'] gives us the value 2.
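The behaviour described above can be tried out with a few lines. This is a minimal sketch; the key 'z' is an arbitrary item that is not in the list, used to show what happens for a missing item.

```python
from collections import Counter

myCounter = Counter(['p', 'p', 'q'])

print(myCounter)        # Counter({'p': 2, 'q': 1})
print(myCounter['p'])   # 2
print(myCounter['z'])   # 0: a missing item counts as zero instead of raising KeyError
```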
In our suggested solution, a and b give us the frequency of each item in list1 and list2.
After we get this frequency, we are ready to find the intersection of the two lists. We first find the intersection without duplicates. To do that, we convert list1 and list2 to sets and use the & operator to get the intersection.
Next, we convert this intersection to a list and assign the resulting list to list3 (on line 11).
Next, we initialize an empty list called list4.
We then use a for loop to iterate through list3 (lines 14 and 15).
On line 15, a[i] and b[i] give the number of times i appears in list1 and list2 respectively.
For instance, suppose
list1 = [1, 1, 2, 3]
list2 = [1, 1, 1, 2, 2, 3]
list3 equals [1, 2, 3].
The first time the for loop runs, i equals 1.
a[1] gives us 2 while b[1] gives us 3.
min(a[1], b[1]) gives us the lower of the two numbers, which is 2.
[1]*min(a[1], b[1]) gives us [1]*2, which equals [1, 1].
We then pass this list to the extend() method to add it to list4.
We keep doing this until we finish iterating through all the items in list3.
When that happens, list4 contains all the duplicate items that we need for the intersection. Hence, we simply return list4 on line 17.
With that, the function is complete.
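As mentioned in the hints, there is more than one way to solve this. One alternative, shown here only as a sketch and not as the suggested solution, relies on the fact that two Counter objects can be combined with the & operator, which keeps each item with the lower of its two counts. The function name intersect_with_dup_alt is hypothetical.

```python
from collections import Counter

def intersect_with_dup_alt(list1, list2):
    """Hypothetical alternative to intersectWithDup() using Counter's & operator."""
    # Counter & Counter keeps each common item with the minimum of its two counts.
    common = Counter(list1) & Counter(list2)
    # elements() repeats each item as many times as its count.
    return list(common.elements())

print(intersect_with_dup_alt([1, 1, 2, 2, 2, 3], [1, 1, 1, 2, 3, 3, 4, 4]))
print(intersect_with_dup_alt([1, 2, 3, 4, 5, 5, 5], [2, 2, 2, 2, 3, 4, 7, 7, 8]))
```

Note that the suggested solution iterates over a set, and sets are unordered, so the order of items in its result is not something to rely on. If the order matters for your use case, sorting the returned list is a simple way to make it predictable.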