The warehouse team gave me a CSV with 800 product rows and Diane's first question was: how many unique categories do we carry? Walk me through exactly how you would have solved that in Track 1.
I'd loop through every row, check if the category is already in a list, append it if not, then count at the end. I literally wrote that exact code for the sales region report. Six lines, one helper function.
Does it work?
It worked. It also ran noticeably slowly when the list of seen categories got long. I noticed but I blamed the CSV reading, not my code.
The CSV reading was fine. You were checking membership in a list, which requires scanning every element one by one. On 800 products with 15 categories, Python was checking up to 15 values per row. Now multiply that by however many times you run it.
So the issue is if category not in seen_list — that's a linear scan every time. I need something that checks membership in constant time.
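For comparison, here is a sketch of the list-based loop described above next to the set version. The lesson never shows the actual Track 1 code, so `count_categories_with_list` and `count_categories_with_set` are hypothetical reconstructions. The rows are assumed to be tuples with the category at index 4.

```python
def count_categories_with_list(rows: list[tuple]) -> int:
    # Hypothetical Track 1-style version: "not in" scans the whole list every time
    seen: list[str] = []
    for row in rows:
        category = row[4]
        if category not in seen:  # linear scan, O(len(seen)) per row
            seen.append(category)
    return len(seen)


def count_categories_with_set(rows: list[tuple]) -> int:
    # The set enforces uniqueness itself, using a hash lookup per value
    return len(set(row[4] for row in rows))


rows = [
    ("Widget-A", "SKU-1001", 24.99, 150, "Hardware"),
    ("Safety-Vest", "SKU-2011", 14.99, 60, "Safety"),
    ("Drill-Bit", "SKU-3042", 9.99, 200, "Hardware"),
]
print(count_categories_with_list(rows))  # 2
print(count_categories_with_set(rows))   # 2
```

Both functions return the same count. The difference lies in how much work each membership check costs as the number of distinct categories grows.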
That something is a set. A set in Python is an unordered collection that stores only unique values — and checks membership using a hash table, not a scan. Let me show you the simplest version:
categories = {"Hardware", "Safety", "Hardware", "Electronics", "Safety", "Hardware"}
print(categories)
# {'Hardware', 'Safety', 'Electronics'}  (display order may vary)
print(len(categories))  # 3
Six values in, three out. It silently dropped the duplicates?
Silent and automatic. That's the set contract. Every value appears at most once, and Python doesn't warn you about dropped duplicates — it just enforces uniqueness. Think of it like the shipping manifest at the receiving dock. Each SKU appears once on the manifest regardless of how many times the scanner beeped when a pallet came through.
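The "no warning" part is easy to confirm. Adding a value that is already present simply does nothing. This is a minimal sketch, and the SKU values are illustrative.

```python
manifest = {"SKU-1001", "SKU-2011"}

# The scanner "beeps" SKU-1001 a second time: add() raises no error and returns None
result = manifest.add("SKU-1001")
print(result)         # None
print(len(manifest))  # 2, unchanged

# A genuinely new SKU does grow the set
manifest.add("SKU-3042")
print(len(manifest))  # 3
```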
Okay. So how do I actually build a set from real warehouse data? My categories aren't typed out as a literal — they're in a list of product tuples.
One line:
products = [
("Widget-A", "SKU-1001", 24.99, 150, "Hardware"),
("Safety-Vest","SKU-2011", 14.99, 60, "Safety"),
("Drill-Bit", "SKU-3042", 9.99, 200, "Hardware"),
("Hard-Hat", "SKU-4005", 19.99, 45, "Safety"),
]
category_set = set(product[4] for product in products)
print(category_set)  # {'Hardware', 'Safety'}  (display order may vary)
The six-line deduplication loop collapses to one line. And I didn't write a single if not in check.
The set handles all of that internally. Membership testing on a set is also near-instant on average, regardless of how many elements it holds, because the hash table lets Python jump directly to the right slot instead of scanning. Checking "Hardware" in category_set takes roughly the same time on a set with a million items as on a set with ten.
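You can get a rough look at the difference with the standard-library `timeit` module. Treat this as a sketch rather than a benchmark. The collection size and repeat count are arbitrary, the absolute timings depend on your machine, and the lookup deliberately targets the last element, which is the worst case for a list scan.

```python
import timeit

size = 200_000
items_list = [f"SKU-{i}" for i in range(size)]
items_set = set(items_list)
target = f"SKU-{size - 1}"  # last element: the list has to scan everything first

# Both checks give the same answer; only the cost differs
assert (target in items_list) == (target in items_set)

list_time = timeit.timeit(lambda: target in items_list, number=50)
set_time = timeit.timeit(lambda: target in items_set, number=50)

print(f"list: {list_time:.4f}s for 50 lookups")
print(f"set:  {set_time:.6f}s for 50 lookups")
```

On a typical machine the list time grows with `size`, while the set time stays roughly flat.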
That's the thing I was missing. My approved-supplier check in Track 1 had a list of four hundred entries and it was noticeably sluggish. A set would have fixed that.
Up to four hundred comparisons per lookup versus one hash calculation. Diane's supplier report would have loaded faster. Don't tell her.
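For an approved-supplier check, the fix is to build the set once, before the loop, and then reuse it for every lookup. If you converted the list inside the loop, you would rebuild the set on every iteration and lose the benefit. The supplier names, order IDs, and the `flag_unapproved` helper below are all hypothetical.

```python
def flag_unapproved(orders: list[tuple[str, str]], approved_suppliers: list[str]) -> list[str]:
    """Return order IDs whose supplier is not approved (hypothetical helper)."""
    approved = set(approved_suppliers)  # built once, O(n)
    # Each "not in" below is an average O(1) hash lookup, not a scan
    return [order_id for order_id, supplier in orders if supplier not in approved]


approved_suppliers = ["Acme Tools", "Northwind Safety", "Bolt & Co"]
orders = [
    ("ORD-1", "Acme Tools"),
    ("ORD-2", "Shady Supply"),
    ("ORD-3", "Northwind Safety"),
]
print(flag_unapproved(orders, approved_suppliers))  # ['ORD-2']
```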
She thinks everything I build runs at the speed of light. I'm not changing that perception.
Now — the misconception that bites almost everyone. What do you think happens if you try to add a list to a set?
bad_set = {["Hardware", "Safety"]}
The list is mutable... and if sets use hashing, you can't hash something that can change. So this fails?
TypeError: unhashable type: 'list'. A set can only hold hashable values, meaning values whose hash can never change during their lifetime. Strings, numbers, and tuples are hashable, as long as everything inside the tuple is hashable too. Lists, dicts, and sets are not.
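You can probe hashability directly with the built-in `hash()`, which raises the same `TypeError` a set would. The `is_hashable` helper below is hypothetical. Pay attention to the last case: a tuple that contains a list is also unhashable, because a tuple's hash is computed from its elements.

```python
def is_hashable(value: object) -> bool:
    # hash() raises TypeError for unhashable values, just as adding to a set would
    try:
        hash(value)
    except TypeError:
        return False
    return True


print(is_hashable("Hardware"))                     # True
print(is_hashable(42))                             # True
print(is_hashable(("Widget-A", "SKU-1001")))       # True
print(is_hashable(["Hardware", "Safety"]))         # False
print(is_hashable({"sku": "SKU-1001"}))            # False
print(is_hashable(("Widget-A", ["red", "blue"])))  # False: tuple holding a list

try:
    bad_set = {["Hardware", "Safety"]}
except TypeError as err:
    print(err)  # unhashable type: 'list'
```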
Tuples are hashable? So I could have a set of product spec tuples — the ones we built yesterday?
Exactly. {("Widget-A", "SKU-1001"), ("Drill-Bit", "SKU-3042")} is a valid set. Each tuple is immutable and holds only strings, so its hash is stable. This is the practical payoff of yesterday's immutability lesson: tuples can live in sets and serve as dict keys. A list version of the same data cannot.
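Here is that payoff in code: a set of (name, SKU) tuples, and a dict keyed by the same tuples. The stock figures come from the product rows earlier in this lesson. The names `specs` and `stock_by_spec` are only illustrative.

```python
specs = {("Widget-A", "SKU-1001"), ("Drill-Bit", "SKU-3042")}

# Tuples compare by value, so an equal tuple built separately is still found
print(("Widget-A", "SKU-1001") in specs)  # True

# Adding an equal tuple again does not create a duplicate
specs.add(("Drill-Bit", "SKU-3042"))
print(len(specs))  # 2

# The same tuples work as dict keys
stock_by_spec = {
    ("Widget-A", "SKU-1001"): 150,
    ("Drill-Bit", "SKU-3042"): 200,
}
print(stock_by_spec[("Drill-Bit", "SKU-3042")])  # 200
```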
So immutability wasn't just a philosophical property. It's the price of admission to a data structure that checks membership in constant time.
That's the connection I wanted you to make. Immutability is what makes a value reliable enough to identify. Now — sets are unordered, which means you can't depend on them for sorted output. Diane's report needs categories alphabetically. Here's the pattern:
category_set = {"Hardware", "Electronics", "Safety"}
sorted_cats = sorted(category_set)
print(sorted_cats)  # ['Electronics', 'Hardware', 'Safety']
Collect into a set to deduplicate, sort to get a predictable list for output. That's the full pipeline.
That's exactly what unique_categories does. Try writing it before I show you — category is at index 4 in each product tuple.
Generator expression to extract the categories, wrap in set(), then sorted().
def unique_categories(products: list[tuple]) -> list[str]:
    return sorted(set(product[4] for product in products))
One line. Six lines in Track 1, one now. And every piece of it is readable: set() deduplicates, sorted() orders, the generator extracts.
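Here is a quick usage check of `unique_categories`, run on the product rows from earlier and on an empty input. The function body is copied from the lesson. Only the sample call and the checks are new.

```python
def unique_categories(products: list[tuple]) -> list[str]:
    # generator extracts index 4, set() deduplicates, sorted() orders
    return sorted(set(product[4] for product in products))


products = [
    ("Widget-A", "SKU-1001", 24.99, 150, "Hardware"),
    ("Safety-Vest", "SKU-2011", 14.99, 60, "Safety"),
    ("Drill-Bit", "SKU-3042", 9.99, 200, "Hardware"),
    ("Hard-Hat", "SKU-4005", 19.99, 45, "Safety"),
]

print(unique_categories(products))       # ['Hardware', 'Safety']
print(len(unique_categories(products)))  # 2, the answer to "how many unique categories?"
print(unique_categories([]))             # []
```

One design note: by default, sorted() compares strings by code point, so every uppercase letter sorts before any lowercase one. If category names might arrive with inconsistent casing, you could pass `key=str.lower` to sorted(). That changes only the ordering, not the deduplication.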
I want to go rewrite the sales region code. I'm not going to, but I want to.
Before you file this away — one trap that catches everyone. What does {} create?
Curly braces... a dict? But sets also use curly braces in the literal syntax. Which one wins?
Dict wins. {} is always an empty dict in Python. Sets were added after dicts, so {} was already claimed. If you want an empty set, you must write set() — there is no empty-set literal. Write this on a sticky note and put it somewhere visible.
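A short sketch of the empty-braces trap and the correct way to start a set you plan to fill incrementally. The category values are illustrative.

```python
empty_dict = {}
empty_set = set()

print(type(empty_dict))  # <class 'dict'>
print(type(empty_set))   # <class 'set'>

# Building a set incrementally must start from set(), not {}
seen = set()
for category in ["Hardware", "Safety", "Hardware"]:
    seen.add(category)
print(sorted(seen))  # ['Hardware', 'Safety']

# Starting from {} fails as soon as you call a set method on it
try:
    wrong = {}
    wrong.add("Hardware")
except AttributeError as err:
    print(err)
```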
That's genuinely inconsistent. Empty list is [], empty dict is {}, and empty set is... the constructor call. Okay.
It's the most common gotcha in Python set work. You'll hit it exactly once, and after that you'll never forget it. Tomorrow we make unpacking intentional — you've been doing it since Track 1 without naming it. And we cover what the * operator does when it appears on the left side of an assignment: first, *rest = items. You guessed at that yesterday without being shown it.
I saw it in a Stack Overflow answer once. Grabbed the first element from a list and collected the rest. I never actually used it on purpose.
Tomorrow you will.
A set is an unordered collection of unique, hashable values. Python implements sets using hash tables: each value is hashed to a slot, and membership checking requires only computing the hash and looking up the slot — not scanning all elements.
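To make the "hash to a slot" idea concrete, here is a deliberately simplified toy hash set. It is not how CPython implements `set`: CPython uses open addressing and resizes its table as it fills. This sketch instead uses a fixed number of buckets with chaining. The `ToyHashSet` class is purely illustrative.

```python
class ToyHashSet:
    """Illustrative only: a fixed-size table with chaining and no resizing."""

    def __init__(self, num_buckets: int = 8) -> None:
        self._buckets: list[list[object]] = [[] for _ in range(num_buckets)]

    def _bucket_for(self, value: object) -> list[object]:
        # hash() picks the slot, so only this one bucket is ever examined
        return self._buckets[hash(value) % len(self._buckets)]

    def add(self, value: object) -> None:
        bucket = self._bucket_for(value)
        if value not in bucket:  # scans one short bucket, not every element
            bucket.append(value)

    def __contains__(self, value: object) -> bool:
        return value in self._bucket_for(value)

    def __len__(self) -> int:
        return sum(len(bucket) for bucket in self._buckets)


toy = ToyHashSet()
for category in ["Hardware", "Safety", "Hardware", "Electronics"]:
    toy.add(category)

print(len(toy))           # 3
print("Safety" in toy)    # True
print("Plumbing" in toy)  # False
```

Because this toy never resizes, its buckets grow as you add elements, and lookups slowly degrade. Real set implementations grow the table to keep the work per lookup small, which is where the average O(1) behavior comes from.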
The two core use cases for sets:
- Deduplication: pass an iterable to set() to get its unique elements.
- Fast membership testing: value in my_set is O(1) on average regardless of set size; value in my_list is O(n).
sku_list = ["SKU-1001", "SKU-2042", "SKU-1001", "SKU-3300"]
unique_skus = set(sku_list)  # {'SKU-1001', 'SKU-2042', 'SKU-3300'}  (display order may vary)
approved = {"SKU-1001", "SKU-2042", "SKU-4010"}
if "SKU-2042" in approved:  # hash lookup, not scan
    print("approved")
Sets also support set operations: | (union), & (intersection), - (difference), ^ (symmetric difference). These are useful for comparing product lists between warehouses or finding SKUs that appear in one shipment but not another.
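Here is a sketch of the four operators, used to compare stock at two warehouses. The warehouse names and SKU sets are made up for illustration. Results go through sorted() because set display order is not guaranteed.

```python
north = {"SKU-1001", "SKU-2042", "SKU-3300"}
south = {"SKU-2042", "SKU-3300", "SKU-4010"}

# Union: stocked at either warehouse
print(sorted(north | south))  # ['SKU-1001', 'SKU-2042', 'SKU-3300', 'SKU-4010']

# Intersection: stocked at both
print(sorted(north & south))  # ['SKU-2042', 'SKU-3300']

# Difference: stocked at north but not at south
print(sorted(north - south))  # ['SKU-1001']

# Symmetric difference: stocked at exactly one of the two
print(sorted(north ^ south))  # ['SKU-1001', 'SKU-4010']
```

Each operator also has a method form, such as north.union(south) or north.difference(south). The methods accept any iterable, such as a plain list of SKUs, while the operators require both sides to be sets.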
Pitfall 1: {} creates a dict, not an empty set. An empty set must be written set(). {"a", "b"} is a set literal with elements, but {} with no elements defaults to dict.
Pitfall 2: Mutable values cannot be set elements. Lists and dicts are not hashable, so attempting {["a", "b"]} raises TypeError: unhashable type: 'list'. If you need a composite set element, use a tuple whose contents are themselves hashable.
Pitfall 3: Sets have no guaranteed order. sorted() on a set returns a new list in sorted order, but the set itself has no order you can rely on. Never rely on the iteration order of a set.
Set comprehensions ({x for x in items}) produce sets using the same syntax as list comprehensions. frozenset is the immutable variant — it is hashable and can therefore be used as a dict key or as a set element. For large membership-testing workloads, replacing a list with a set or frozenset is one of the highest-leverage performance improvements in Python.
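The example below shows both ideas side by side: a set comprehension over product tuples, and a frozenset used as a dict key. The bundle-pricing dict and its 29.99 value are hypothetical and serve only to show that a frozenset can be hashed.

```python
products = [
    ("Widget-A", "SKU-1001", 24.99, 150, "Hardware"),
    ("Safety-Vest", "SKU-2011", 14.99, 60, "Safety"),
    ("Drill-Bit", "SKU-3042", 9.99, 200, "Hardware"),
]

# Set comprehension: same shape as a list comprehension, but with braces
categories = {product[4] for product in products}
print(sorted(categories))  # ['Hardware', 'Safety']

# frozenset is immutable, so it is hashable and can be a dict key (hypothetical bundle prices)
bundle_price = {
    frozenset({"SKU-1001", "SKU-3042"}): 29.99,
}

# Element order doesn't matter: the same members produce an equal frozenset
print(bundle_price[frozenset({"SKU-3042", "SKU-1001"})])  # 29.99

# frozensets can also be elements of an ordinary set
bundles = {frozenset({"SKU-1001"}), frozenset({"SKU-1001", "SKU-3042"})}
print(len(bundles))  # 2

# Being immutable, a frozenset has no add() method
print(hasattr(frozenset(categories), "add"))  # False
```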