Set Operations
Union, intersection, difference, symmetric difference, set comprehensions, and fast membership testing.
Yesterday I got excited about sets removing duplicates. But you said the "real power" was something else — operations between two sets?
That's where it gets fun. Sets support mathematical set operations — union, intersection, difference, and symmetric difference. If you've ever stared at two spreadsheets wondering "which customers are on both lists?" or "who's on list A but not list B?" — sets answer those questions in one line.
Let's say you manage two training programs:
python_students = {"Alice", "Bob", "Charlie", "Diana"}
java_students = {"Charlie", "Diana", "Eli", "Fay"}
Union — everyone across both programs:
python_students | java_students
# {"Alice", "Bob", "Charlie", "Diana", "Eli", "Fay"} (sets are unordered, so the printed order may differ)
# Or use the method:
python_students.union(java_students)
Intersection — people enrolled in both:
python_students & java_students
# {"Charlie", "Diana"}
python_students.intersection(java_students)
That's the "who's on both lists" question! What about "who's on one but not the other"?
Difference — in the first set but not the second:
python_students - java_students
# {"Alice", "Bob"} — Python-only students
java_students - python_students
# {"Eli", "Fay"} — Java-only students
Order matters here. A - B gives you what's in A but not in B. Flip it and you get the opposite.
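Difference has a method form too, just like union and intersection. Here is a minimal sketch using the same two sets; waitlist is a hypothetical list added only for illustration. It also shows one practical distinction: the method forms accept any iterable, while the operators need a set on both sides.

```python
python_students = {"Alice", "Bob", "Charlie", "Diana"}
java_students = {"Charlie", "Diana", "Eli", "Fay"}

# Method form of the - operator
python_only = python_students.difference(java_students)
print(python_only == {"Alice", "Bob"})  # True

# Hypothetical list, not part of the lesson's data
waitlist = ["Bob", "Zed"]

# Methods accept any iterable, so passing a list works
enrolled_not_waiting = python_students.difference(waitlist)
print(sorted(enrolled_not_waiting))  # ['Alice', 'Charlie', 'Diana']

# Operators require a set on both sides; a list raises TypeError
try:
    python_students - waitlist
except TypeError:
    operator_failed = True
else:
    operator_failed = False
print(operator_failed)  # True
```

If the right-hand side is a list, either call the method or wrap it in set() first.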
Symmetric difference — in one or the other, but not both:
python_students ^ java_students
# {"Alice", "Bob", "Eli", "Fay"} — students taking exactly one language
python_students.symmetric_difference(java_students)
Think of it as the union minus the intersection. Everyone except the people who appear on both lists.
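The "union minus the intersection" description can be checked directly. This is a small sketch with the same two sets; the variable names (either, both, exactly_one) are chosen here for illustration.

```python
python_students = {"Alice", "Bob", "Charlie", "Diana"}
java_students = {"Charlie", "Diana", "Eli", "Fay"}

either = python_students | java_students       # union
both = python_students & java_students         # intersection
exactly_one = python_students ^ java_students  # symmetric difference

# Symmetric difference equals union minus intersection
union_minus_intersection = either - both
print(exactly_one == union_minus_intersection)  # True

# Equivalent view: combine the two one-sided differences
one_sided = (python_students - java_students) | (java_students - python_students)
print(exactly_one == one_sided)  # True
```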
This is like doing VLOOKUP comparisons between two spreadsheets, except it's one line instead of an hour.
That's exactly what it is. And it scales. Whether your sets have 4 items or 400,000, the syntax is identical, and the work grows roughly in step with the size of the sets instead of comparing every item against every other item.
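To make the spreadsheet comparison concrete, here is a rough sketch. The two lists and their email addresses are made up for illustration, standing in for columns pulled out of two spreadsheets. Results are sorted only so the printed order is predictable.

```python
# Hypothetical customer columns, as they might come out of two spreadsheets
list_a = ["ana@example.com", "ben@example.com", "ana@example.com", "cy@example.com"]
list_b = ["cy@example.com", "dee@example.com", "ben@example.com"]

# Converting to sets drops the duplicate entry in list_a
customers_a = set(list_a)
customers_b = set(list_b)

on_both = customers_a & customers_b  # "which customers are on both lists?"
only_a = customers_a - customers_b   # "who's on list A but not list B?"

print(sorted(on_both))  # ['ben@example.com', 'cy@example.com']
print(sorted(only_a))   # ['ana@example.com']
```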
You can also check relationships between sets:
beginners = {"Alice", "Bob"}
all_students = {"Alice", "Bob", "Charlie", "Diana"}
beginners.issubset(all_students) # True — every beginner is in all_students
all_students.issuperset(beginners) # True — all_students contains every beginner
beginners.isdisjoint(java_students) # True: beginners and java_students have no names in common
.isdisjoint() returns True when two sets share nothing. It's your "do these groups overlap at all?" check.
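These relationship checks also have operator forms. The sketch below reuses the sets from this lesson and shows <= and >= alongside the stricter < (proper subset), plus a case where isdisjoint() returns False.

```python
beginners = {"Alice", "Bob"}
all_students = {"Alice", "Bob", "Charlie", "Diana"}
java_students = {"Charlie", "Diana", "Eli", "Fay"}

print(beginners <= all_students)     # True, same as issubset()
print(all_students >= beginners)     # True, same as issuperset()

# < means "proper subset": a subset that is not equal to the other set
print(beginners < all_students)      # True
print(all_students <= all_students)  # True, every set is a subset of itself
print(all_students < all_students)   # False

# isdisjoint() is True only when nothing is shared
print(beginners.isdisjoint(java_students))     # True
print(all_students.isdisjoint(java_students))  # False, Charlie and Diana overlap
```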
We have list comprehensions. Is there a set comprehension too?
Same syntax, curly braces instead of square brackets:
numbers = [1, 2, 2, 3, 3, 3, 4, 5, 5]
squares = {n ** 2 for n in numbers}
print(squares) # {1, 4, 9, 16, 25} — unique squares only
# Filter with a condition
even_squares = {n ** 2 for n in numbers if n % 2 == 0}
print(even_squares) # {4, 16}
Set comprehensions automatically deduplicate. You get unique values with zero extra effort.
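Deduplication gets more useful when the comprehension also cleans each value. Here is a small sketch, assuming a hypothetical raw_names list with inconsistent spacing and capitalization. It also shows a common gotcha: empty curly braces create a dict, not a set.

```python
# Hypothetical messy input, e.g. names typed in by hand
raw_names = ["Alice", "alice ", "BOB", "Bob", " Charlie"]

# Normalize each name, then let the set drop the repeats
unique_names = {name.strip().title() for name in raw_names}
print(sorted(unique_names))  # ['Alice', 'Bob', 'Charlie']

# {} is an empty dict; use set() for an empty set
empty_braces = {}
empty_set = set()
print(type(empty_braces).__name__)  # dict
print(type(empty_set).__name__)     # set
```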
When should I reach for set operations instead of just looping through lists?
Anytime you're asking a question about membership across collections: Who's in both? Who's only here? Do these overlap? Compare the two approaches:
# With loops (slow, verbose)
common = []
for student in python_students:
    if student in java_students:
        common.append(student)
# With sets (fast, clear)
common = python_students & java_students
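One line. Reads like English. And when your data starts out as lists, it can be orders of magnitude faster on large collections: an "in" check on a list scans it item by item, while a set membership check is O(1) on average, so lookup time barely changes as the set grows. The loop above only stays quick because java_students is already a set.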
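One way to see the difference is to time both approaches on list data. This is a rough sketch, not a benchmark: the size n, the ID ranges, and the function names are arbitrary choices made here, and the actual timings depend on your machine.

```python
import timeit

n = 2_000  # arbitrary size; raise it to widen the gap
ids_a = list(range(n))
ids_b = list(range(n // 2, n + n // 2))

def common_with_lists():
    # Each "in" check scans ids_b from the start
    return [x for x in ids_a if x in ids_b]

def common_with_sets():
    # Build two sets once, then intersect them
    return set(ids_a) & set(ids_b)

list_time = timeit.timeit(common_with_lists, number=5)
set_time = timeit.timeit(common_with_sets, number=5)

print(f"lists: {list_time:.4f}s  sets: {set_time:.4f}s")

# Both approaches find the same items
print(set(common_with_lists()) == common_with_sets())  # True
```

The set version includes the cost of converting both lists, so for very small inputs the two can be close. The gap shows up as n grows.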
Let's put these operations to work on a real problem.