The cProfile session found the bottleneck in your inventory script. You fixed it. Runtime dropped from 28 seconds to 4. Did you look at memory?
I did not. I assumed once the time was fixed the memory was fine. The script runs in 4 seconds now. I closed the ticket.
Runtime and memory are separate dimensions. Your fixed script creates 40,000 objects before it processes any of them. That does not show up as a slow script — it shows up as a script that crashes at 100,000 records, or gets killed by the OS on a constrained server, or slows down because the garbage collector runs constantly under memory pressure. cProfile measures time. tracemalloc measures memory. But today I want to show you the design pattern that eliminates the problem before you need to measure it.
Generators. You mentioned them in the anti-patterns lesson.
The comparison in concrete numbers:
import sys
# List: all 40,000 records in memory at once
records = [{f: item[f] for f in fields} for item in raw_data]
print(sys.getsizeof(records)) # ~320KB for the list structure
# plus ~14MB for the 40,000 dict objects it holds
# Generator: one record at a time
records = ({f: item[f] for f in fields} for item in raw_data)
print(sys.getsizeof(records)) # ~112 bytes, always, regardless of data size
112 bytes versus 14+ megabytes. The generator object is always the same size. It is a suspended function with local state. It yields one dict, the caller processes it, that dict is collected, and the generator advances to the next item.
112 bytes. Regardless of how many records the data has?
The generator object itself, yes. The dict it yields at each step is also in memory, but only until the caller is done with it. Peak memory at any moment: one record. This is why generators are the standard tool for processing files and database cursors — not because they are clever, because they are the only way to process more data than you have RAM. But they have a constraint you named correctly: sequential, single-pass only. If you need random access — record 10,000 while processing record 3 — you need a list.
The inventory export processes each record once, writes to a report, moves on. I was building a 14-megabyte list to traverse it sequentially once. The generator version would have the same output with a fraction of the memory.
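A sketch of what that generator-based export could look like. The field names, the `iter_records` helper, and the output file are all hypothetical, assuming a CSV report:

```python
import csv

FIELDS = ["sku", "warehouse", "quantity"]  # hypothetical schema

def iter_records(raw_data):
    """Yield one trimmed record at a time; peak memory is a single dict."""
    for item in raw_data:
        yield {f: item[f] for f in FIELDS}

def export(raw_data, out_path):
    with open(out_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        for record in iter_records(raw_data):  # one record alive at a time
            writer.writerow(record)

# Tiny stand-in for the real raw data
raw = [{"sku": f"SKU-{i:03}", "warehouse": "WH-A", "quantity": i, "extra": None}
       for i in range(3)]
export(raw, "report.csv")
```

The list version would differ by one character pair: `[...]` instead of a generator. The output is identical; only the peak memory changes.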
The second tool is __slots__. Instead of reducing how many objects you hold, it reduces how much each object costs. A regular Python class instance carries a __dict__ — a full hash table for attribute storage:
class InventoryItem:
def __init__(self, sku, warehouse, quantity):
self.sku = sku
self.warehouse = warehouse
self.quantity = quantity
item = InventoryItem("SKU-001", "WH-A", 100)
print(sys.getsizeof(item.__dict__)) # ~232 bytes just for the dict structure
That 232-byte overhead exists on every instance, even if the class only has three attributes. __slots__ replaces the per-instance dict with fixed descriptors:
class InventoryItem:
__slots__ = ("sku", "warehouse", "quantity")
def __init__(self, sku, warehouse, quantity):
self.sku = sku
self.warehouse = warehouse
self.quantity = quantity
No __dict__ at all. Instance size drops from ~280 bytes to ~88 bytes. 40,000 instances: 11 megabytes down to 3.5 megabytes.
And the trade-off: you cannot add attributes dynamically. item.new_field = "x" raises AttributeError. No __dict__ to fall back to. Appropriate for data objects with a fixed schema — your OrderSummary class, your InventoryItem — not for framework base classes or objects whose attributes depend on runtime configuration. If you want __slots__ with dataclass syntax, dataclass(slots=True) generates it automatically (Python 3.10+):
from dataclasses import dataclass
@dataclass(frozen=True, slots=True)
class InventoryItem:
sku: str
warehouse: str
quantity: int
I have an OrderSummary class with twelve attributes. I create 80,000 of them in a batch export. I have never thought about the memory cost. 28 megabytes in __dict__ overhead alone, before counting the attribute values.
Swap to __slots__ and that drops to about 7 megabytes. Same output, 75% less overhead. The doctor analogy: you do not prescribe __slots__ without running tracemalloc first. Measure which lines allocate the most, decide whether the trade-off is worth it. But for a class you create tens of thousands of with a fixed schema, it almost always is.
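The measure-first workflow with tracemalloc looks roughly like this. The record shape and count are illustrative, not the real inventory data:

```python
import tracemalloc

tracemalloc.start()

# Allocate the kind of bulk data the lesson discusses.
records = [{"sku": f"SKU-{i}", "warehouse": "WH-A", "quantity": i}
           for i in range(10_000)]

snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Group allocations by source line; the comprehension above dominates.
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)
```

The top entries point at the lines doing the allocating; only then do you decide whether a generator or __slots__ is the right fix.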
Two tools for memory efficiency: generators for sequential data that does not need to live in memory all at once, and __slots__ for objects created in bulk with a fixed attribute schema. Both decisions follow measurement — tracemalloc shows where the allocations are, then you decide which pattern eliminates the problem.
Exactly the frame. Tomorrow is string performance — measuring the actual ratio between += in a loop and str.join. You mentioned the billing report generator. Bring it.
How Generators and __slots__ Work in CPython
Generator objects as suspended function frames. When Python encounters a function with a yield statement, it compiles it as a generator function. Calling a generator function does not execute any code — it creates a generator object, which is a C struct holding a reference to the code object, a frame (local variable state), and a position indicator. The sys.getsizeof() of a generator is always the same because it measures the struct itself, not the data the generator will produce. When next() is called, CPython resumes execution at the suspended frame, runs until the next yield, and suspends again. This is why generators provide O(1) memory for the iterator — the data is produced on demand, not pre-computed and stored.
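The "no code runs at call time" claim is easy to verify; a minimal sketch:

```python
def gen():
    print("body started")   # only runs on the first next(), not at call time
    yield 1
    print("resumed")
    yield 2

g = gen()                   # creates the generator object; prints nothing
print("object created")
print(next(g))              # now the body runs up to the first yield
print(next(g))              # resumes at the suspended frame
```

The output order is "object created", "body started", 1, "resumed", 2: the body is entered lazily and suspended at each yield.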
Generator protocol and one-pass constraint. A generator is an iterator that implements __iter__ (returning itself) and __next__ (advancing to the next yielded value). Once a value is yielded and the generator advances, there is no way to go back. Functions that need multiple passes — like str.join, which needs to measure lengths before filling a buffer — must materialize the generator into a list. This is why list(gen) and [x for x in gen] produce the same result: both exhaust the generator into a concrete list.
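The one-pass constraint in three lines:

```python
squares = (n * n for n in range(5))

# First pass consumes the generator...
assert list(squares) == [0, 1, 4, 9, 16]

# ...and a second pass yields nothing: the frame is already exhausted.
assert list(squares) == []

# To iterate twice, materialize once and reuse the list.
data = [n * n for n in range(5)]
total, count = sum(data), len(data)   # two passes over the same values
```

The silent empty second pass is the classic generator bug: no error, just missing data.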
How __slots__ replaces __dict__. A regular Python class stores instance attributes in a __dict__ — a full Python dict object attached to each instance. This costs roughly 200-300 bytes per instance just for the hash table structure, regardless of how many attributes are stored. When __slots__ is defined, CPython instead creates slot descriptor objects at the class level — one per declared attribute. Each descriptor holds a fixed offset into the instance's memory layout. Attribute access becomes a direct memory read at that offset, not a dict lookup. The instance does not get a __dict__ at all, which eliminates the 200-300 byte overhead.
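The class-level descriptors are visible from Python; a small CPython-specific sketch (the descriptor type name is an implementation detail):

```python
class InventoryItem:
    __slots__ = ("sku", "warehouse", "quantity")

# Each declared name becomes a descriptor object on the class itself.
print(type(InventoryItem.sku))   # member_descriptor in CPython

# The descriptor mediates reads and writes at a fixed offset;
# the instance carries no __dict__.
item = InventoryItem()
item.sku = "SKU-001"
print(item.sku)
print(hasattr(item, "__dict__"))
```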
__slots__ and inheritance. If a __slots__ class inherits from a regular class (one without __slots__), the subclass still has a __dict__ because the parent class does. To get the memory savings throughout a hierarchy, every class in the chain must declare __slots__. If a subclass omits __slots__, Python adds a __dict__ to it, defeating the purpose. dataclass(slots=True) handles this correctly for the common case: it generates __slots__ on the dataclass itself, and if you do not subclass it, you get the savings.
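The inheritance rule, demonstrated on two tiny hierarchies:

```python
class Base:                      # no __slots__: instances get a __dict__
    pass

class Slotted(Base):             # declares __slots__, but inherits the dict
    __slots__ = ("sku",)

class SlottedBase:
    __slots__ = ()               # empty __slots__ still suppresses __dict__

class FullySlotted(SlottedBase):
    __slots__ = ("sku",)

# Inheriting from a regular class reintroduces the per-instance dict...
print(hasattr(Slotted(), "__dict__"))        # True: savings lost

# ...while an all-__slots__ chain keeps the savings.
print(hasattr(FullySlotted(), "__dict__"))   # False
```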