Dict and Set Comprehensions in Python: Filter Data in One Shot
Yesterday's list comprehension filtered orders into a list. Today you build lookup tables and deduplicated sets — same syntax, completely different powers.
So yesterday I wrote [order['total'] for order in orders if order['status'] == 'paid'] and it worked. Now I have a new problem. I need to look up any order by its ID. I started writing a loop but it's already four lines and I haven't even added the lookup yet.
Show me what you have so far.
It's like this:
order_lookup = {}
for order in orders:
    order_lookup[order['id']] = order['total']
That works. And four lines is not a crime. But there is a pattern here — you are iterating over a collection and building a dictionary. That is exactly what a dict comprehension does.
A dict comprehension? Like a list comprehension but for dicts?
Exactly like that. Same idea, different brackets and a colon. Here are your four lines in one:
order_lookup = {order['id']: order['total'] for order in orders}
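As a quick check, the loop and the comprehension can be run side by side. This is only a minimal sketch: the sample `orders` list below is made up for illustration and carries just the two fields the lookup needs.

```python
# Hypothetical sample data, just enough to exercise the lookup.
orders = [
    {"id": 101, "total": 124.50},
    {"id": 102, "total": 89.00},
]

# Loop version: start with an empty dict, add one key per order.
loop_lookup = {}
for order in orders:
    loop_lookup[order["id"]] = order["total"]

# Comprehension version: same keys, same values, one expression.
comp_lookup = {order["id"]: order["total"] for order in orders}

print(loop_lookup == comp_lookup)  # True
print(comp_lookup[102])            # 89.0
```

Both versions produce the same dictionary, so switching between them is a readability choice rather than a change in behavior.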
Wait. Where did the curly braces come from? I thought curly braces were dicts. But yesterday my list comprehension used square brackets.
Right. Python uses the brackets to tell you what collection you get back. Square brackets give you a list. Curly braces with a colon — key: value — give you a dict. Curly braces without a colon give you something else, and we will get there in a minute.
So the syntax is {key: value for item in collection}? And the key and value can be any expression, same as the left side of a list comprehension?
Exactly. And you can add a filter too. Let's say you only want paid orders in your lookup:
orders = [
    {"id": 101, "customer": "Alice Chen", "total": 124.50, "status": "paid"},
    {"id": 102, "customer": "Bob Kumar", "total": 89.00, "status": "pending"},
    {"id": 103, "customer": "Carol Santos", "total": 340.00, "status": "paid"},
]
paid_lookup = {order['id']: order['total'] for order in orders if order['status'] == 'paid'}
print(paid_lookup) # {101: 124.5, 103: 340.0}
Oh. That's the same if at the end from yesterday. I can filter in the comprehension itself.
Three minutes ago you told me you were going to write four lines. Now you are doing it in one with a filter.
I'm starting to see why Amir never writes loops.
He does write loops. He writes loops when the logic is complex enough that a comprehension would be hard to read. The goal is not to avoid loops — it is to know when a comprehension is cleaner.
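For a sense of where that line might fall, here is a sketch of a case that arguably reads better as a loop. The refund rule and the `refunded` field are invented for this example; they are not part of the order data used in this lesson.

```python
# Hypothetical data: the "refunded" field is an assumption for this example.
orders = [
    {"id": 101, "total": 124.50, "status": "paid", "refunded": 0.0},
    {"id": 102, "total": 89.00, "status": "pending", "refunded": 0.0},
    {"id": 103, "total": 340.00, "status": "paid", "refunded": 340.00},
]

net_lookup = {}
for order in orders:
    if order["status"] != "paid":
        continue  # unpaid orders never enter the lookup
    net = order["total"] - order["refunded"]
    if net <= 0:
        continue  # fully refunded orders are dropped too
    net_lookup[order["id"]] = net

print(net_lookup)  # {101: 124.5}
```

Two conditions plus an intermediate value could be squeezed into one comprehension, but each step would be harder to name and comment on.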
Okay, you said curly braces without a colon give you something else. What's the something else?
A set. In Python, {1, 2, 3} is a set — an unordered collection with no duplicates. And a set comprehension looks like this:
unique_customers = {order['customer'] for order in orders}
print(unique_customers)  # {'Alice Chen', 'Bob Kumar', 'Carol Santos'} (sets are unordered, so the printed order may vary)
No colon. No key-value pair. Just the expression. But then how does Python know it's a set and not a dict?
Because there is no colon. A dict needs a colon to separate keys from values. If you write {expr for item in collection} with no colon, Python knows it is a set. If you write {key: value for item in collection}, it is a dict. And here is the honest part — I know that is subtle. Python is reusing the same bracket syntax for two different things, and you have to look for the colon.
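One way to see the colon rule in action is to ask Python what type each comprehension produced. A minimal sketch, using a cut-down, made-up `orders` list:

```python
# Hypothetical sample data for illustration.
orders = [
    {"id": 101, "customer": "Alice Chen"},
    {"id": 102, "customer": "Bob Kumar"},
]

# With a colon: key on the left, value on the right, so the result is a dict.
with_colon = {order["id"]: order["customer"] for order in orders}

# Without a colon: a single expression, so the result is a set.
without_colon = {order["customer"] for order in orders}

print(type(with_colon).__name__)     # dict
print(type(without_colon).__name__)  # set
```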
So {} by itself — is that an empty dict or an empty set?
Empty dict. Python decided {} would be a dict, and set() is how you make an empty set. I agree it is inconsistent. The rule to remember: {} is dict, set() is empty set, {1, 2} is a set, {1: 'a', 2: 'b'} is a dict.
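The four cases in that rule can be checked directly. This is just a small sketch; the variable names are chosen for the example.

```python
empty_braces = {}               # empty dict, not an empty set
empty_set = set()               # the only way to spell an empty set
set_literal = {1, 2}            # values only, so a set
dict_literal = {1: "a", 2: "b"}  # key: value pairs, so a dict

print(type(empty_braces).__name__)  # dict
print(type(empty_set).__name__)     # set
print(type(set_literal).__name__)   # set
print(type(dict_literal).__name__)  # dict
```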
When would I actually use a set comprehension? I've barely used sets at all.
Any time you want a unique collection. Say the same customer ordered multiple times, and you want every customer who has at least one paid order, each appearing only once:
orders = [
    {"id": 201, "customer": "Alice Chen", "total": 50.00, "status": "paid"},
    {"id": 202, "customer": "Alice Chen", "total": 30.00, "status": "paid"},
    {"id": 203, "customer": "Bob Kumar", "total": 89.00, "status": "pending"},
]
paid_customers = {order['customer'] for order in orders if order['status'] == 'paid'}
print(paid_customers) # {'Alice Chen'} — Bob is pending, Alice appears once
The set just deduplicates Alice automatically. I would have had to track that myself with a separate variable.
That is the entire use case. When you want uniqueness without managing it yourself.
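To make the comparison concrete, here is a sketch of what tracking uniqueness by hand might look like next to the set comprehension. The `seen` list is a name picked for this example.

```python
orders = [
    {"id": 201, "customer": "Alice Chen", "total": 50.00, "status": "paid"},
    {"id": 202, "customer": "Alice Chen", "total": 30.00, "status": "paid"},
    {"id": 203, "customer": "Bob Kumar", "total": 89.00, "status": "pending"},
]

# Manual version: a separate list, checked before every append.
seen = []
for order in orders:
    if order["status"] == "paid" and order["customer"] not in seen:
        seen.append(order["customer"])

# Set comprehension: the set drops the repeated name on its own.
paid_customers = {order["customer"] for order in orders if order["status"] == "paid"}

print(seen)                         # ['Alice Chen']
print(paid_customers == set(seen))  # True
```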
Okay, I want to try this. We have a list of products and I need a dict that maps product ID to product name so I can look up names quickly. Something like {product['id']: product['name'] for product in products}?
You just wrote it. That is correct.
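Run against some data, that comprehension might look like the sketch below. The `products` list, its IDs, and its names are all invented for illustration.

```python
# Hypothetical product data; only "id" and "name" are assumed to exist.
products = [
    {"id": "P-1", "name": "Desk Lamp"},
    {"id": "P-2", "name": "Notebook"},
]

product_names = {product["id"]: product["name"] for product in products}

print(product_names["P-2"])                 # Notebook
print(product_names.get("P-9", "unknown"))  # unknown
```

Using `.get()` with a default avoids a `KeyError` when an ID is not in the lookup.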
I didn't even realize I was doing it. I just read the syntax and said it back.
That is the point where something moves from syntax you are copying to syntax you are thinking in. Try the challenge and build the order lookup dict.
One question before I do — the three comprehension types, is that all of them? List, dict, set?
There is one more, the generator expression. It looks like a list comprehension with parentheses instead of brackets, but it behaves very differently. That comes later this week. For now: square brackets for lists, key: value in curly braces for dicts, just an expression in curly braces for sets.
Square brackets — list. Curly braces with colon — dict. Curly braces without colon — set. Got it.
One more thing. A dict can hold each key only once. If two orders had the same ID and you built a dict from them, the later value would silently overwrite the earlier one. That is a bug that does not raise an error. Worth knowing before you rely on a lookup table in production.
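A small sketch of that silent overwrite, using made-up orders that share an ID:

```python
# Hypothetical data with a repeated ID.
orders = [
    {"id": 101, "total": 124.50},
    {"id": 101, "total": 999.00},  # same ID again
]

lookup = {order["id"]: order["total"] for order in orders}

print(len(orders))  # 2
print(lookup)       # {101: 999.0}
```

Two orders went in, one entry came out, and nothing was raised along the way.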
So if I had duplicate IDs in my order data, I would not know. The dict would just look correct and be wrong.
Correct. Validate your data before you build the lookup, for example by collecting the IDs in a set and checking that there are as many distinct IDs as there are orders. Tomorrow: what if you need to loop over a list but you also need the index? Or what if you need to loop over two lists at the same time, matching them up? That is what enumerate and zip are for, and once you see them, you will stop writing index-tracking variables forever.
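Before moving on, the validation idea could be sketched roughly like this. `build_order_lookup` is a hypothetical helper written for this example, and raising `ValueError` is just one reasonable way to react to bad data.

```python
def build_order_lookup(orders):
    """Hypothetical helper: build an id -> total lookup, refusing duplicate IDs."""
    unique_ids = {order["id"] for order in orders}
    if len(unique_ids) != len(orders):
        raise ValueError("duplicate order IDs found; refusing to build lookup")
    return {order["id"]: order["total"] for order in orders}


clean_orders = [
    {"id": 101, "total": 124.50},
    {"id": 102, "total": 89.00},
]
dirty_orders = clean_orders + [{"id": 101, "total": 999.00}]

print(build_order_lookup(clean_orders))  # {101: 124.5, 102: 89.0}

try:
    build_order_lookup(dirty_orders)
except ValueError as error:
    print(error)  # duplicate order IDs found; refusing to build lookup
```

The set comprehension does the counting here: if any ID repeats, the set ends up smaller than the list.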