Two ways to compute the sum of squares of 0..N-1:
```python
# List version
def sum_squares_list(n):
    return sum([x * x for x in range(n)])

# Generator version
def sum_squares_gen(n):
    return sum(x * x for x in range(n))

print(sum_squares_list(5))  # 30
print(sum_squares_gen(5))   # 30
```

Same answer. Different cost.
What's the cost difference?
Memory. The list version builds a full list of N squared values before sum even starts iterating. For N=10_000_000, that's roughly 80 MB for the list's pointer array alone (the int objects it points to add more), all held in RAM at once. The generator version produces one value at a time — sum consumes each before the next is generated. Peak memory: a few KB.
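You can measure this yourself. A minimal sketch using tracemalloc (exact numbers vary by Python version and platform):

```python
import tracemalloc

def peak_kib(make_sum, n):
    """Run make_sum(n) and report the peak traced allocation in KiB."""
    tracemalloc.start()
    make_sum(n)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak / 1024

n = 1_000_000
list_peak = peak_kib(lambda m: sum([x * x for x in range(m)]), n)
gen_peak = peak_kib(lambda m: sum(x * x for x in range(m)), n)
print(f"list version peak:      {list_peak:,.0f} KiB")
print(f"generator version peak: {gen_peak:,.0f} KiB")
```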
And if I add a break?
That's the second win. A generator only computes a value when something asks for one. So next(x * x for x in range(10_000_000)) computes exactly one value — 0 — and stops. The list version would have already built all 10 million.
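Here's a small sketch that makes the laziness visible; the calls counter exists only so we can see how many squares were actually computed:

```python
calls = 0

def square(x):
    global calls
    calls += 1
    return x * x

# break after the first value: the generator computes exactly one square
for value in (square(x) for x in range(10_000_000)):
    print(value)  # 0
    break

print(calls)  # 1: the other 9,999,999 squares were never computed
```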
Are lists ever the right choice?
Three cases. (1) You'll iterate the result more than once — generators are one-shot, so you'd have to recreate them. (2) You need len(result) — generators don't know their length. (3) You need indexed access — result[42] requires a list. For "compute and consume once," use a generator. For "compute and reuse," use a list.
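The one-shot behaviour is easy to demonstrate; a minimal sketch:

```python
squares = (x * x for x in range(5))  # a generator expression bound to a name

print(sum(squares))    # 30
print(sum(squares))    # 0, because the generator is already exhausted
print(list(squares))   # [], nothing left to iterate
```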
And sum(), max(), any(), all() — they all accept generators?
All of them. Anything in the standard library that takes an iterable will happily take a generator. The shape sum(x * x for x in source) (no brackets) is the cheapest version of "sum of squares."
```python
# List comprehension — builds a list
result = sum([x * x for x in range(N)])

# Generator expression — no list, lazy
result = sum(x * x for x in range(N))  # note: no [ ]
```

When the consumer (sum, max, any, for ...) only iterates once, drop the brackets. The generator version uses constant memory regardless of N.
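The same bracket-free shape works with the other one-pass consumers too; a quick sketch (the words list is just illustrative data):

```python
words = ["generator", "expressions", "are", "lazy"]

print(max(len(w) for w in words))             # 11
print(any(w.startswith("l") for w in words))  # True
print(all(w.islower() for w in words))        # True
print(", ".join(w.upper() for w in words))    # GENERATOR, EXPRESSIONS, ARE, LAZY
```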
| N | List version peak memory | Generator version peak memory |
|---|---|---|
| 1_000 | ~8 KB | constant (a few hundred bytes) |
| 1_000_000 | ~8 MB | constant (a few hundred bytes) |
| 100_000_000 | ~800 MB or more (OOM crash on small machines) | constant (a few hundred bytes) |
For N up to ~1000, the difference is invisible. Past 1M it starts to matter; past 10M it's the difference between "works" and "crashes."
```python
# List version — computes ALL one million squares before any() runs
any([x * x > 100 for x in range(1_000_000)])

# Generator version — stops at the first x where x * x > 100
any(x * x > 100 for x in range(1_000_000))
```

any short-circuits at the first True. With a generator, the rest are never computed — you get the answer in microseconds even for huge N.
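A rough timing sketch (a minimal illustration; absolute numbers depend on your machine):

```python
import timeit

N = 1_000_000

list_time = timeit.timeit(
    "any([x * x > 100 for x in range(N)])", globals=globals(), number=10)
gen_time = timeit.timeit(
    "any(x * x > 100 for x in range(N))", globals=globals(), number=10)

print(f"list version:      {list_time / 10 * 1e3:.2f} ms per call")
print(f"generator version: {gen_time / 10 * 1e6:.2f} µs per call")
```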
Cases where a list is the right choice:

1. Multi-pass. If you'll iterate the result twice:

```python
squares = [x * x for x in range(10)]
first_pass = sum(squares)
second_pass = max(squares)  # a generator would already be exhausted here
```

2. Length needed.
```python
result = [x for x in source if matches(x)]
print(f"got {len(result)} matches")  # generators don't know their length
```

3. Indexing.
```python
first = result[0]
last = result[-1]
result[5] = "new value"
```

4. Membership tests against the result.
```python
if 42 in result:
    ...  # for a generator, this consumes it up to the match — usually wrong
```

Cases where a generator is the right choice:

1. One-pass aggregations. sum, max, min, any, all, ''.join(...) — all happy with generators.
2. Pipelines. f(g(h(source))) — if every layer is a generator, no intermediate list is ever built (there's a sketch of this after the list below).
3. Streams. Reading lines from a file, bytes from a socket, events from a queue — sources you can't materialise into a list anyway.
4. Possibly-infinite sequences.
```python
def counter():
    i = 0
    while True:
        yield i
        i += 1

for x in counter():
    if x > 5:
        break
    print(x)  # prints 0 through 5
```
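As mentioned under the pipelines point above, here is a minimal sketch of a generator pipeline; the file name and the filtering steps are placeholders for illustration:

```python
def read_lines(path):
    # stream lines from a file, one at a time
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")

def non_blank(lines):
    # keep only non-empty lines
    return (line for line in lines if line.strip())

def lengths(lines):
    # map each line to its length
    return (len(line) for line in lines)

# Each layer hands one value at a time to the next; no intermediate list is ever built.
total = sum(lengths(non_blank(read_lines("access.log"))))
```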
Dropping the [ ] from sum([x * x for x in source]) to sum(x * x for x in source) is the simplest, no-cost performance win in Python. Default to generator expressions; reach for lists only when you need their specific properties.

Write both versions — sum_squares_list(n) using a list comp and sum_squares_gen(n) using a generator expression. Verify they return the same result for n=10. Print which version uses constant memory.