Day 25 · ~13m

Set Operations

Union, intersection, difference, symmetric difference, set comprehensions, and fast membership testing.

🧑‍💻

Yesterday I got excited about sets removing duplicates. But you said the "real power" was something else — operations between two sets?

👩‍🏫

That's where it gets fun. Sets support mathematical set operations — union, intersection, difference, and symmetric difference. If you've ever stared at two spreadsheets wondering "which customers are on both lists?" or "who's on list A but not list B?" — sets answer those questions in one line.

Let's say you manage two training programs:

python_students = {"Alice", "Bob", "Charlie", "Diana"}
java_students = {"Charlie", "Diana", "Eli", "Fay"}

Union — everyone across both programs:

python_students | java_students
# {"Alice", "Bob", "Charlie", "Diana", "Eli", "Fay"}

# Or use the method:
python_students.union(java_students)
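If you want to try this in a Python session, here's a small sketch that reuses the two sets above. It adds a hypothetical waitlist stored as a plain list to show one real difference between the two spellings: the | operator needs a set on both sides, while .union() accepts any iterable, and several of them at once.

```python
python_students = {"Alice", "Bob", "Charlie", "Diana"}
java_students = {"Charlie", "Diana", "Eli", "Fay"}

# Operator and method produce equal sets
everyone = python_students | java_students
assert everyone == python_students.union(java_students)

# Hypothetical waitlist stored as a list (duplicates and all)
waitlist = ["Fay", "Gus", "Gus"]

# .union() accepts any iterables, so the list can be passed directly
with_waitlist = python_students.union(java_students, waitlist)
print(sorted(with_waitlist))
# ['Alice', 'Bob', 'Charlie', 'Diana', 'Eli', 'Fay', 'Gus']

# The | operator only works between sets, so this raises TypeError
try:
    python_students | waitlist
except TypeError as err:
    print("TypeError:", err)
```

sorted() is used only to get a predictable display order, since sets don't keep one.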

Intersection — people enrolled in both:

python_students & java_students
# {"Charlie", "Diana"}

python_students.intersection(java_students)

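A quick sketch of two properties of intersection: swapping the operands doesn't change the result, and you can chain it to find people in every group. The third set, sql_students, is made up for illustration.

```python
python_students = {"Alice", "Bob", "Charlie", "Diana"}
java_students = {"Charlie", "Diana", "Eli", "Fay"}
# Hypothetical third program, for illustration only
sql_students = {"Diana", "Eli", "Bob"}

# Intersection is symmetric: swapping the operands gives the same set
assert (python_students & java_students) == (java_students & python_students)

# Chain & (or pass several sets to .intersection) for "enrolled in all three"
in_all_three = python_students & java_students & sql_students
assert in_all_three == python_students.intersection(java_students, sql_students)
print(in_all_three)  # {'Diana'}
```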
🧑‍💻

That's the "who's on both lists" question! What about "who's on one but not the other"?

👩‍🏫

Difference — in the first set but not the second:

python_students - java_students
# {"Alice", "Bob"} — Python-only students

java_students - python_students
# {"Eli", "Fay"} — Java-only students

Order matters here. A - B gives you what's in A but not in B. Flip it to B - A and you get what's in B but not in A.
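Here's a short sketch, using the same two sets, that shows the two directions side by side along with the equivalent .difference() method:

```python
python_students = {"Alice", "Bob", "Charlie", "Diana"}
java_students = {"Charlie", "Diana", "Eli", "Fay"}

python_only = python_students - java_students
java_only = java_students - python_students

print(sorted(python_only))  # ['Alice', 'Bob']
print(sorted(java_only))    # ['Eli', 'Fay']

# Unlike & and |, difference is not symmetric
assert python_only != java_only

# Method form: "python_students minus java_students"
assert python_only == python_students.difference(java_students)
```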

Symmetric difference — in one or the other, but not both:

python_students ^ java_students
# {"Alice", "Bob", "Eli", "Fay"} — students taking exactly one language

python_students.symmetric_difference(java_students)

Think of it as the union minus the intersection. Everyone except the people who appear on both lists.
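You can check that description directly. This sketch confirms that ^ matches "union minus intersection" and also "Python-only combined with Java-only":

```python
python_students = {"Alice", "Bob", "Charlie", "Diana"}
java_students = {"Charlie", "Diana", "Eli", "Fay"}

exactly_one = python_students ^ java_students

# "Union minus intersection"
assert exactly_one == (python_students | java_students) - (python_students & java_students)

# Equivalently: the two one-sided differences combined
assert exactly_one == (python_students - java_students) | (java_students - python_students)

print(sorted(exactly_one))  # ['Alice', 'Bob', 'Eli', 'Fay']
```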

🧑‍💻

This is like doing VLOOKUP comparisons between two spreadsheets, except it's one line instead of an hour.

👩‍🏫

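That's exactly what it is. And it scales. Whether your sets have 4 items or 400,000, the syntax is identical. Because each membership check is a hash lookup rather than a scan, the work grows roughly in proportion to the size of the sets instead of multiplying the way nested loops over two lists do.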
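As a rough illustration of the same syntax at a larger scale, this sketch builds two sets from range() (made-up integers standing in for customer IDs) and intersects them:

```python
# Made-up data: 200,000 even numbers and 133,334 multiples of 3
evens = set(range(0, 400_000, 2))
multiples_of_3 = set(range(0, 400_000, 3))

# Same one-line syntax as with the four-name sets
multiples_of_6 = evens & multiples_of_3

print(len(multiples_of_6))  # 66667
```

A number is both even and a multiple of 3 exactly when it's a multiple of 6, so the result is every multiple of 6 from 0 up to 399,996.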

You can also check relationships between sets:

beginners = {"Alice", "Bob"}
all_students = {"Alice", "Bob", "Charlie", "Diana"}

beginners.issubset(all_students)     # True — every beginner is in all_students
all_students.issuperset(beginners)   # True — all_students contains every beginner
beginners.isdisjoint(java_students)  # True: no beginner is enrolled in Java

.isdisjoint() returns True when two sets share nothing. It's your "do these groups overlap at all?" check.
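Python also spells the subset and superset checks as comparison operators. This is a small sketch that reuses the example sets; every line is an assertion, so if it runs without an error, each check held.

```python
beginners = {"Alice", "Bob"}
all_students = {"Alice", "Bob", "Charlie", "Diana"}
java_students = {"Charlie", "Diana", "Eli", "Fay"}

# Operator forms of the subset/superset checks
assert beginners <= all_students   # same as beginners.issubset(all_students)
assert all_students >= beginners   # same as all_students.issuperset(beginners)

# < and > mean *proper* subset/superset: the sets must also differ
assert beginners < all_students
assert not (all_students < all_students)

# isdisjoint() is True exactly when the intersection is empty
assert beginners.isdisjoint(java_students)
assert beginners.isdisjoint(java_students) == (len(beginners & java_students) == 0)
assert not all_students.isdisjoint(java_students)  # they share Charlie and Diana
```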

🧑‍💻

We have list comprehensions. Is there a set comprehension too?

👩‍🏫

Same syntax, curly braces instead of square brackets:

numbers = [1, 2, 2, 3, 3, 3, 4, 5, 5]
squares = {n ** 2 for n in numbers}
print(squares)  # {1, 4, 9, 16, 25} — unique squares only

# Filter with a condition
even_squares = {n ** 2 for n in numbers if n % 2 == 0}
print(even_squares)  # a set containing 4 and 16 (display order isn't guaranteed)

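Set comprehensions automatically deduplicate, so you get unique values with zero extra effort. Keep in mind that sets are unordered: Python may display the elements in a different order than the comments here show.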
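Deduplication pairs well with cleaning up each value as you go. Here's a sketch with a made-up sign-up list whose entries differ only in capitalization and stray spaces:

```python
# Hypothetical sign-up list with inconsistent capitalization and spacing
signups = ["ann@example.com", "Ann@Example.com ", "bob@example.com", "BOB@example.com"]

# Normalize each address; the set keeps one copy of each normalized result
unique_emails = {email.strip().lower() for email in signups}
print(sorted(unique_emails))  # ['ann@example.com', 'bob@example.com']

# Same result as building a list first and converting it, minus the intermediate list
assert unique_emails == set([email.strip().lower() for email in signups])
```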

🧑‍💻

When should I reach for set operations instead of just looping through lists?

👩‍🏫

Anytime you're asking a question about membership across collections: Who's in both? Who's only here? Do these overlap? Compare the two approaches:

# With a loop (verbose, and slow if java_students were a list)
common = []
for student in python_students:
    if student in java_students:
        common.append(student)

# With sets (fast, clear)
common = python_students & java_students

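One line. Reads like English. And for large collections it can be orders of magnitude faster than working with lists: checking x in some_list scans the list item by item, while checking x in some_set is a hash lookup that takes constant time on average (O(1)), no matter how large the set grows.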
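To see the gap on your own machine, here's a rough timing sketch using the standard timeit module. The data (two overlapping ranges of integers posing as customer IDs) is made up, the sizes are kept small so the list version finishes quickly, and the timings you see will depend on your machine.

```python
import timeit

# Made-up data: two overlapping ranges of "customer IDs"
list_a = list(range(0, 5_000))
list_b = list(range(2_500, 7_500))
set_a, set_b = set(list_a), set(list_b)

def common_with_lists():
    # Each "x in list_b" scans list_b item by item
    return [x for x in list_a if x in list_b]

def common_with_sets():
    # Intersection uses hash lookups instead of scans
    return set_a & set_b

list_time = timeit.timeit(common_with_lists, number=1)
set_time = timeit.timeit(common_with_sets, number=1)
print(f"lists: {list_time:.4f}s   sets: {set_time:.6f}s")
```

Both functions find the same 2,500 shared IDs. Only the lookup strategy differs.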

Let's put these operations to work on a real problem.
