Generator Expressions in Python: The Lazy List Comprehension
Parentheses instead of brackets: that's a generator expression. Learn sum, any, all, next with generators, and when to keep the list comprehension instead.
Yesterday you showed me a generator function — yield, stepping through values one at a time. That was cool. But you said there's a one-line version?
Same power. One character difference.
One character.
One character.
Okay show me.
You wrote this on Day 3:
paid_totals = [order["total"] for order in orders if order["status"] == "paid"]
Now change the brackets to parentheses:
paid_totals = (order["total"] for order in orders if order["status"] == "paid")
That's it? Square brackets to round brackets and now it's a generator? You're telling me the entire difference between "build the list now" and "compute values lazily on demand" is one character on each end?
That's the whole syntax change. What happens under the hood is different — but yes, the source code difference is two characters.
So what does it actually give me? If I print it, do I get the values?
Try it:
orders = [
    {"id": 101, "customer": "Alice Chen", "total": 200.00, "status": "paid"},
    {"id": 102, "customer": "Bob Kumar", "total": 80.00, "status": "pending"},
    {"id": 103, "customer": "Carol Santos", "total": 500.00, "status": "paid"},
    {"id": 104, "customer": "Dev Patel", "total": 45.00, "status": "cancelled"},
]
paid_totals = (order["total"] for order in orders if order["status"] == "paid")
print(paid_totals)
# <generator object <genexpr> at 0x10a3b2c40>
A memory address. That's not the values. It hasn't run yet?
Nothing has run yet. That's the point. The generator expression is a promise: "when you ask me for values, I'll produce them one at a time." Compare that with the list comprehension, which runs immediately and hands you the complete list right now.
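One way to see the laziness for yourself is to put a function with a visible side effect inside the expression. This is only a minimal sketch: `record` and `seen` are names made up for the illustration, not part of the lesson's code.

```python
seen = []  # hypothetical log of every value that actually gets processed

def record(value):
    """Note that a value was processed, then pass it through unchanged."""
    seen.append(value)
    return value

totals = [200.00, 80.00, 500.00]

lazy = (record(t) for t in totals)
after_creation = list(seen)    # snapshot: [] because nothing has been processed yet

first = next(lazy)             # pulling one value runs the expression exactly once
after_one_pull = list(seen)    # snapshot: [200.0]

eager = [record(t) for t in totals]  # the list comprehension runs to completion right away

print(after_creation)   # []
print(after_one_pull)   # [200.0]
print(eager)            # [200.0, 80.0, 500.0]
```

Creating the generator logs nothing, one `next()` call logs one value, and the list comprehension logs all three at once.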
Okay so when does it actually run? If I can't print it, what do I do with it?
You feed it to a function that pulls values from it. That's where generator expressions earn their place — they work directly with sum(), any(), all(), and next().
Watch:
# Calculate total revenue from paid orders — one line, no list built
revenue = sum(order["total"] for order in orders if order["status"] == "paid")
print(revenue) # 700.0
OH. sum() just... accepts a generator? It doesn't need a list?
sum() accepts anything iterable. A generator is iterable. So sum() pulls values from it one at a time — adds the first, discards it, pulls the second, adds that, never holds all of them in memory at once.
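To make the memory point concrete, here is a rough sketch that compares the size of a list object with the size of a generator object over the same 100,000 values. The variable names are illustrative, and `sys.getsizeof` reports only the container itself (not the numbers inside it), so exact byte counts will vary by Python version and platform.

```python
import sys

numbers_list = [n * 2 for n in range(100_000)]   # every value exists in memory now
numbers_gen = (n * 2 for n in range(100_000))    # only the recipe exists so far

list_size = sys.getsizeof(numbers_list)  # grows with the number of elements
gen_size = sys.getsizeof(numbers_gen)    # small, and independent of the range length

print(list_size > gen_size)   # True

list_total = sum(numbers_list)
gen_total = sum(numbers_gen)  # values are produced and added one at a time
print(list_total == gen_total)  # True: same answer either way
```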
So I didn't need [order["total"] for order in orders if ...] — I could have passed the generator expression directly?
Exactly. And notice something: when a generator expression is the only argument to a function, you can drop one set of parentheses:
# These are identical:
revenue = sum((order["total"] for order in orders if order["status"] == "paid"))
revenue = sum(order["total"] for order in orders if order["status"] == "paid")
So the function call parentheses count as the generator's parentheses. Python is very... efficient with punctuation.
The person who designed this says it's elegant. I would describe it differently but he's right and I've made peace with it.
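The shortcut only applies when the generator expression is the sole argument. As a small sketch (the 25.00 starting value is invented for the example), passing `sum()` its optional start value means the generator needs its own parentheses again:

```python
totals = [200.00, 500.00]

# sum() takes an optional second argument: the starting value.
# With two arguments, the generator expression must keep its own parentheses.
with_fee = sum((t for t in totals), 25.00)
print(with_fee)  # 725.0

# Without them, Python rejects the code before it ever runs.
# compile() is used here only so the failure can be caught and shown.
try:
    compile("sum(t for t in totals, 25.00)", "<example>", "eval")
except SyntaxError as err:
    rejected = True
    print("SyntaxError:", err.msg)
else:
    rejected = False

print(rejected)  # True
```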
What about any() and all()? How do those work with generators?
Same pattern. any() asks: "does at least one value come out as True?" all() asks: "are all values True?"
# Is any order over 400?
has_big_order = any(order["total"] > 400 for order in orders)
print(has_big_order) # True — Carol's 500.00
# Are all paid orders above 50?
all_big_paid = all(
    order["total"] > 50
    for order in orders
    if order["status"] == "paid"
)
print(all_big_paid) # True — 200.00 and 500.00 both qualify
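One edge case is worth checking before relying on these in real code: what happens when the filter lets nothing through. The sketch below uses a cut-down, hypothetical order list with no paid orders at all.

```python
orders = [
    {"id": 102, "total": 80.00, "status": "pending"},
    {"id": 104, "total": 45.00, "status": "cancelled"},
]

# No order is "paid", so each generator produces no values at all.
all_big_paid = all(o["total"] > 50 for o in orders if o["status"] == "paid")
any_big_paid = any(o["total"] > 50 for o in orders if o["status"] == "paid")

print(all_big_paid)  # True: no paid order broke the rule
print(any_big_paid)  # False: no paid order satisfied it either
```

So a `True` from `all()` can mean "every paid order qualified" or "there were no paid orders"; if that distinction matters, check for emptiness separately.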
And these are better than building a list because...?
any() can stop the moment it finds a True. all() can stop the moment it finds a False. With a list comprehension, you build the entire list first — even if the answer was obvious after the first element. With a generator, it short-circuits.
# With 100,000 orders, this stops at the first unpaid one:
all_paid = all(order["status"] == "paid" for order in orders)
# This builds the full 100,000-element list first; all() can only stop early after that:
all_paid = all([order["status"] == "paid" for order in orders])
So on big data, the generator version can be dramatically faster.
On big data, the generator version might not even touch most of the data. That's the whole idea from Day 6 — laziness as a feature.
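The short-circuit can be observed by counting how many orders each version actually examines. This is a sketch under assumptions: `is_paid` and `checked` are helper names invented here, and the order list is a trimmed copy of the lesson's data.

```python
checked = []  # hypothetical log of which order ids were examined

def is_paid(order):
    """Record the order id, then report whether the order is paid."""
    checked.append(order["id"])
    return order["status"] == "paid"

orders = [
    {"id": 101, "status": "paid"},
    {"id": 102, "status": "pending"},
    {"id": 103, "status": "paid"},
    {"id": 104, "status": "cancelled"},
]

lazy_result = all(is_paid(o) for o in orders)
lazy_checked = list(checked)     # stopped at the first unpaid order

checked.clear()
eager_result = all([is_paid(o) for o in orders])
eager_checked = list(checked)    # the list was built in full before all() saw it

print(lazy_checked)   # [101, 102]
print(eager_checked)  # [101, 102, 103, 104]
```

Both versions return `False`; the difference is only in how much work was done to get there.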
What about next()? You mentioned it.
next() pulls one value from a generator. Useful when you want the first match and nothing else:
# First paid order — stop as soon as we find one
first_paid = next(
    order for order in orders if order["status"] == "paid"
)
print(first_paid["customer"]) # Alice Chen
# Safe version with a default if nothing matches
first_refunded = next(
    (order for order in orders if order["status"] == "refunded"),
    None  # default if generator is exhausted with no match
)
print(first_refunded)  # None
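The default matters because of what `next()` does without one. A minimal sketch, using a shortened order list made up for the example:

```python
orders = [
    {"id": 101, "status": "paid"},
    {"id": 102, "status": "pending"},
]

# No default: an empty generator makes next() raise StopIteration.
try:
    first_refunded = next(o for o in orders if o["status"] == "refunded")
except StopIteration:
    first_refunded = None
    raised = True
else:
    raised = False

print(raised)          # True
print(first_refunded)  # None
```

Passing `None` as the second argument to `next()` gets the same end result without the `try`/`except`.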
So instead of building a filtered list and then grabbing index zero, I can just next() the generator. Hmm. Amir does this. I've seen it in the order processing code and I never knew why.
When did you start reading the order processing code?
After Day 6. Once generators made sense, a bunch of stuff I'd been skipping actually became readable.
That's exactly what's supposed to happen. The patterns unlock the code.
Okay so I want to make sure I have the decision rule right. When do I use a generator expression versus a list comprehension?
Three questions.
First: do you need to use the result more than once, or access it by index?
# Need index 0? Need to loop through it twice? Use a list:
paid_orders = [order for order in orders if order["status"] == "paid"]
first = paid_orders[0] # works
count = len(paid_orders) # works
for o in paid_orders: ... # works again — generators exhaust after one pass
Second: are you passing the result directly to sum(), any(), all(), next(), or similar?
# Feeding to a function once? Use a generator expression:
revenue = sum(order["total"] for order in orders if order["status"] == "paid")
Third: is the dataset large enough that memory matters?
# 10 orders? Either is fine. 10 million orders? Generator.
total = sum(row["amount"] for row in massive_export if row["type"] == "sale")
So the generator expression is specifically for the case where I'm building a value from the sequence, not keeping the sequence itself.
That's the most precise way I've heard it put. You're computing a result, not storing a collection.
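The single-pass behaviour behind the first question is easy to demonstrate. In this sketch the variable names are illustrative only:

```python
totals = [200.00, 80.00, 500.00]

big_totals = (t for t in totals if t > 100)

first_pass = sum(big_totals)    # consumes the generator: 200.0 + 500.0
second_pass = sum(big_totals)   # nothing left, so sum() returns its start value

print(first_pass)   # 700.0
print(second_pass)  # 0

# A list can be walked as many times as needed.
big_list = [t for t in totals if t > 100]
list_passes = (sum(big_list), sum(big_list))
print(list_passes)  # (700.0, 700.0)
```

The second `sum()` does not raise an error; it quietly returns 0, which is why reusing a spent generator is an easy bug to miss.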
And if I need the full generator function with yield from Day 6?
When the generation logic is too complex for one line, or when you need state between yields:
# This is fine as a generator expression:
revenue = sum(order["total"] for order in orders if order["status"] == "paid")
# This needs a generator function — logic is too complex for one line:
def orders_with_discount(orders, threshold):
    running_total = 0
    for order in orders:
        if order["status"] == "paid":
            running_total += order["total"]
            discount = 0.1 if running_total > threshold else 0
            yield {**order, "discount": discount}
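To see the state carrying over between yields, here is one way the function might be called. The sketch assumes the `yield` sits inside the `if` block, so only paid orders are produced, and the threshold of 300 is an arbitrary value picked for the illustration.

```python
def orders_with_discount(orders, threshold):
    """Yield each paid order, adding a discount once paid revenue passes threshold."""
    running_total = 0
    for order in orders:
        if order["status"] == "paid":
            running_total += order["total"]
            discount = 0.1 if running_total > threshold else 0
            yield {**order, "discount": discount}

orders = [
    {"id": 101, "total": 200.00, "status": "paid"},
    {"id": 102, "total": 80.00, "status": "pending"},
    {"id": 103, "total": 500.00, "status": "paid"},
]

# threshold=300 is a made-up value for this example
discounts = [(o["id"], o["discount"]) for o in orders_with_discount(orders, 300)]
print(discounts)  # [(101, 0), (103, 0.1)]
```

Order 101 brings the running total to 200, still under the threshold; order 103 pushes it to 700, so the discount switches on. A generator expression has no place to keep `running_total`, which is why this one needs a function.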
So the three tools are: list comprehension when you need the collection, generator expression when you're feeding it to a function once, generator function when the logic doesn't fit on a line.
Week 1 in one sentence. That's the whole week — expressions instead of loops, and three levels of how lazy you want to be.
The revenue calculation is going to be one line now. I have five places in the codebase that build a list, loop through it to sum a field, and throw the list away. Generator expression plus sum. Done.
Five places. I believe you.
Actually, wait. The orders in our codebase — they're dicts. Every field is a string key lookup. order["total"], order["status"], order["customer"]. If I typo a key name, I don't find out until runtime. There's no... checking.
You're about to invent the argument for classes.
I mean — what if the order knew what it was? Like, order.total instead of order["total"]? Autocomplete would work. The typo would be caught.
Next week, we build that. An order that knows it's an order. A product that validates its own price. Data that talks back when you give it nonsense. Dicts are flexible — but flexible means anything goes, including things that shouldn't.
So we go from "a dict with keys" to "an object with attributes and rules."
And when you understand that transition, you'll finally understand why half the code in your codebase is structured the way it is. Amir didn't write classes because he likes ceremony. He wrote classes because at some point a dict accepted a negative total and it caused a problem at 2 AM.