Day 21 · ~17m

dataclasses Advanced: field(), frozen=True, __post_init__, and __slots__

Master field() with default_factory, frozen=True for immutable dataclasses, __post_init__ for validation, and slots=True for memory efficiency.

student (curious)

You just sent me the code for an order system. Three PendingOrder instances, all created with the same default: items=[]. Then we append to orders[0].items. And I expected only orders[0] to have the item. But all three have it. The shared list bug. I've seen it before.

teacher (serious)

You have. This is one of the most common traps in Python: mutable default arguments. The default list is created once, when the function is defined, and then reused for every call. Three orders, one list. This week is about preventing that.
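
Here is a minimal sketch of that bug in a hand-written class. PlainOrder is a hypothetical name used only for illustration; it is not the code you sent me, just the smallest version that shows the same behavior.

```python
class PlainOrder:
    """Hypothetical hand-written class (not a dataclass)."""

    def __init__(self, order_id, items=[]):  # the list is built once, at definition time
        self.order_id = order_id
        self.items = items  # every instance that uses the default points at that one list


orders = [PlainOrder(f"ORD-00{n}") for n in range(1, 4)]
orders[0].items.append("widget")

shared = [order.items for order in orders]
print(shared)                              # [['widget'], ['widget'], ['widget']]
print(orders[0].items is orders[2].items)  # True
```

The identity check on the last line is the giveaway: there is only one list object behind all three orders.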

student (thinking)

We learned basic dataclasses in Intermediate. Now we're fixing the real-world problems with them. field() is about this, right?

teacher (neutral)

field() is the answer. But also frozen=True for immutability, __post_init__ for validation, and slots=True for memory. Four weapons that make dataclasses production-grade.

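Let me start with the mutable default problem. Suppose you write the obvious thing in a dataclass:

from dataclasses import dataclass

@dataclass
class PendingOrder:
    order_id: str
    items: list = []  # ValueError at class definition: mutable default is not allowed

Dataclasses refuse to build this class. A bare list, dict, or set as a default raises ValueError and tells you to use default_factory. In a hand-written class, where items=[] is a default argument of __init__, nothing stops you: every instance shares the same list object. You modify it in one order, all orders see the change.

student (amused)

That is hilariously bad. How does anyone write classes without hitting this?

teacher (amused)

They do not know better, or they learn the hard way. Dataclasses at least fail loudly. The fix is field(default_factory=...):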

from dataclasses import dataclass, field

@dataclass
class PendingOrder:
    order_id: str
    items: list = field(default_factory=list)  # Each instance gets its own list

Now dataclasses calls list() for each instance. Three orders, three separate lists.
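
A quick way to convince yourself, as a sketch: build three orders, append to the first, and look at all three. The second half checks what the dataclass decorator does with a bare [] default; the class name Broken is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PendingOrder:
    order_id: str
    items: list = field(default_factory=list)  # list() is called once per instance


orders = [PendingOrder(f"ORD-00{n}") for n in range(1, 4)]
orders[0].items.append("widget")

item_lists = [order.items for order in orders]
print(item_lists)  # [['widget'], [], []]

# A bare mutable default is rejected when the class is defined.
try:
    @dataclass
    class Broken:  # hypothetical name, for illustration only
        items: list = []
except ValueError as exc:
    rejected = str(exc)
    print(rejected)
```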

student (thinking)

So field(default_factory=list) means "call list() every time a new PendingOrder is created, and use that new list as the default."

teacher (focused)

Exactly. default_factory is a zero-argument callable that produces a fresh default. Any callable that takes no arguments works:

from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Order:
    order_id: str
    items: list = field(default_factory=list)
    tags: set = field(default_factory=set)
    metadata: dict = field(default_factory=dict)
    created_at: datetime = field(default_factory=datetime.now)

Each field gets a fresh object. The list is new, the set is new, the dict is new, the timestamp is fresh.
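
The factory does not have to be a built-in type. As a sketch, assuming you want defaults that start out non-empty, a named function or a lambda works too. The function default_metadata and the values inside it are made up for illustration.

```python
from dataclasses import dataclass, field


def default_metadata() -> dict:
    """Hypothetical factory: builds a new dict on every call."""
    return {"source": "web", "priority": "normal"}


@dataclass
class Order:
    order_id: str
    metadata: dict = field(default_factory=default_metadata)
    tags: set = field(default_factory=lambda: {"new"})


a = Order("ORD-001")
b = Order("ORD-002")
a.metadata["priority"] = "rush"
a.tags.add("gift")

print(b.metadata)  # {'source': 'web', 'priority': 'normal'}
print(b.tags)      # {'new'}
```

Changing a's metadata and tags leaves b untouched, because each instance received the result of its own factory call.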

student (excited)

This solves the shared mutable default problem.

teacher (encouraging)

Completely. Now move to immutability. An order is placed, payment is collected. It should never change. If you need to modify an order, you create a new one. This is where frozen=True comes in:

from dataclasses import dataclass, field
from datetime import datetime

@dataclass(frozen=True)
class OrderSnapshot:
    order_id: str
    amount: float
    timestamp: str = field(default_factory=lambda: str(datetime.now()))

A frozen dataclass generates __setattr__ and __delattr__ methods that raise FrozenInstanceError if you try to assign to or delete a field:

snapshot = OrderSnapshot('ORD-001', 99.99)
snapshot.amount = 50.0  # raises FrozenInstanceError: cannot assign to field 'amount'

student (thinking)

So frozen=True is for audit logs, historical records. Things that should never change once created.

teacher (neutral)

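Exactly. And frozen dataclasses have a bonus: they are hashable by default, as long as every field value is itself hashable. You can use them as dict keys or in sets, which you cannot do with a regular dataclass, because its generated __eq__ sets __hash__ to None: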

frozen_order = OrderSnapshot('ORD-001', 99.99)
order_set = {frozen_order}  # Works

# Mutable dataclass
@dataclass
class PendingOrder:
    order_id: str
    items: list = field(default_factory=list)

pending = PendingOrder('ORD-002', [])
order_set = {pending}  # TypeError: unhashable type: 'PendingOrder'

student (focused)

Frozen dataclasses are safer. Immutable, hashable, and they prevent accidental modification. But what if I need to validate data when it is created? Like, an OrderSnapshot with a negative amount should raise an error?

teacher (serious)

That is __post_init__. The generated __init__ calls it as its last step, after all fields are assigned. Use it for validation:

from dataclasses import dataclass, field
from datetime import datetime

@dataclass(frozen=True)
class OrderSnapshot:
    order_id: str
    amount: float
    timestamp: str = field(default_factory=lambda: str(datetime.now()))
    
    def __post_init__(self):
        if self.amount <= 0:
            raise ValueError(f'Order amount must be positive, got {self.amount}')
        if not self.order_id.startswith('ORD-'):
            raise ValueError(f'Order ID must start with ORD-, got {self.order_id}')

Now creating an OrderSnapshot with invalid data fails immediately:

OrderSnapshot('INVALID', 99.99)  # raises ValueError: Order ID must start with ORD-
OrderSnapshot('ORD-001', -10.0)   # raises ValueError: Order amount must be positive

student (curious)

__post_init__ runs at the end of __init__, after the field values are set. So I can inspect the fields and raise an error if they are invalid.

teacher (encouraging)

And the error means the caller never gets an instance. No half-valid object escapes the constructor. Either it is fully valid, or __post_init__ raises and the constructor fails.
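
To make that concrete, here is a small sketch. try_snapshot is a hypothetical helper written only for this example; it returns either the snapshot or the error text, never both.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderSnapshot:
    order_id: str
    amount: float

    def __post_init__(self):
        if self.amount <= 0:
            raise ValueError(f"Order amount must be positive, got {self.amount}")


def try_snapshot(order_id, amount):
    """Hypothetical helper: (snapshot, None) on success, (None, message) on failure."""
    try:
        return OrderSnapshot(order_id, amount), None
    except ValueError as exc:
        return None, str(exc)


good, good_error = try_snapshot("ORD-001", 99.99)
bad, bad_error = try_snapshot("ORD-002", -10.0)

print(good)       # OrderSnapshot(order_id='ORD-001', amount=99.99)
print(bad)        # None
print(bad_error)  # Order amount must be positive, got -10.0
```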

student (thinking)

But frozen=True. If __post_init__ runs after fields are set and the dataclass is frozen, how does __post_init__ modify fields? Or does it not modify them?

teacher (focused)

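Great question. There is no moment when the instance "locks itself". Frozen is a property of the class: its __setattr__ always raises FrozenInstanceError, even inside __post_init__. The generated __init__ gets around that by assigning each field with object.__setattr__(). So a plain self.amount = ... inside __post_init__ fails just like it does from outside. If you really need to set a field there, you can call object.__setattr__(self, 'amount', value) yourself to bypass the freeze, but validation alone is cleaner: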

@dataclass(frozen=True)
class OrderSnapshot:
    order_id: str
    amount: float
    
    def __post_init__(self):
        # Validate, not modify
        if self.amount <= 0:
            raise ValueError('Order amount must be positive')

You validate. If validation fails, you raise. If validation passes, the instance is complete and frozen.
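
For the rarer case where a frozen dataclass does need to adjust a value during construction, here is a sketch of the object.__setattr__ route. The normalization rule (strip and uppercase the ID) is an assumption made up for this example.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class OrderSnapshot:
    order_id: str
    amount: float

    def __post_init__(self):
        # A plain `self.order_id = ...` would raise FrozenInstanceError here,
        # so the normalized value is written through object.__setattr__.
        object.__setattr__(self, "order_id", self.order_id.strip().upper())
        if self.amount <= 0:
            raise ValueError("Order amount must be positive")


snapshot = OrderSnapshot("  ord-001 ", 99.99)
print(snapshot.order_id)  # ORD-001

try:
    snapshot.amount = 50.0
    still_frozen = False
except FrozenInstanceError:
    still_frozen = True

print(still_frozen)  # True
```

Treat this as an escape hatch. If you find yourself using it often, the class probably should not be frozen, or the normalization belongs in whatever code builds the snapshot.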

student (excited)

So the pattern is: frozen=True for immutability, __post_init__ for validation. Create an OrderSnapshot, either it is valid and frozen, or __post_init__ raises and the instance never exists.

teacher (proud)

You just described the entire pattern. Immutable, validated, safe. This is production-grade dataclass design.
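
And when you do need a changed order, "create a new one" can be sketched with dataclasses.replace(), which builds a new instance from an existing one. Because it goes through __init__ again, the __post_init__ checks run on the copy as well. The amounts here are illustrative.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class OrderSnapshot:
    order_id: str
    amount: float

    def __post_init__(self):
        if self.amount <= 0:
            raise ValueError(f"Order amount must be positive, got {self.amount}")


original = OrderSnapshot("ORD-001", 99.99)
discounted = replace(original, amount=79.99)  # new instance; original is untouched

print(original.amount)    # 99.99
print(discounted.amount)  # 79.99

# replace() calls __init__, so validation still applies to the new instance.
try:
    replace(original, amount=-5.0)
    replace_error = None
except ValueError as exc:
    replace_error = str(exc)

print(replace_error)  # Order amount must be positive, got -5.0
```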

student (curious)

One more thing. I see Kai mention slots=True in the hints. That is from basic Python, right? Something about __slots__ reducing memory overhead?

teacher (neutral)

You remember. __slots__ tells Python to store a fixed set of attributes directly on the instance instead of in a dict. By default, every instance carries a __dict__ that holds all the attribute names and values. With __slots__, Python reserves space for just those attributes and creates no dict. How much you save per instance depends on the Python version and the number of fields, so measure it rather than trust a fixed number, but it adds up when you create many instances.

Dataclasses (Python 3.10+) support slots=True:

from dataclasses import dataclass, field

@dataclass(frozen=True, slots=True)
class OrderSnapshot:
    order_id: str
    amount: float

Now OrderSnapshot instances are smaller in memory, and attribute access is usually a little faster because there is no per-instance dict lookup.

student (thinking)

So frozen=True for immutability, __post_init__ for validation, field(default_factory=...) for safe defaults, and slots=True for memory efficiency. These are four separate features that work together.

teacher (serious)

They are. And they let you build dataclasses that are:

  1. Immutable: frozen=True prevents modification
  2. Safe defaults: field(default_factory) prevents the mutable default bug
  3. Validated: __post_init__ enforces invariants
  4. Efficient: slots=True reduces memory

You also get repr=False and compare=False on individual fields if you want to exclude them from the string representation or equality checks:

from dataclasses import dataclass, field

@dataclass
class User:
    name: str
    email: str
    password_hash: str = field(repr=False)  # Left out of repr(user) and str(user)
    internal_id: int = field(compare=False)  # Ignored by == and !=

student (amused)

So repr=False keeps the password hash out of the repr, and compare=False means two users with the same name, email, and password hash are equal even if they have different internal IDs. Useful for privacy and for custom equality logic.

teacher (amused)

Exactly. The field() function is your control panel. repr, compare, hash, init, default, default_factory — each flag controls one aspect of how the field behaves.
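
Here is a sketch of those two flags in action, using the User class from above. The names, email address, and hash strings are placeholder values.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    email: str
    password_hash: str = field(repr=False)   # hidden from repr()
    internal_id: int = field(compare=False)  # skipped by == and !=


a = User("Ada", "ada@example.com", "hash-1", internal_id=1)
b = User("Ada", "ada@example.com", "hash-1", internal_id=2)
c = User("Ada", "ada@example.com", "hash-2", internal_id=1)

print(a)       # User(name='Ada', email='ada@example.com', internal_id=1)
print(a == b)  # True: only internal_id differs, and it is not compared
print(a == c)  # False: password_hash still takes part in equality
```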

student (focused)

All right. I need to build two dataclasses. An OrderSnapshot that is frozen, with __post_init__ validation, and a PendingOrder with field(default_factory=list) for items so each order gets its own list.

teacher (encouraging)

Perfect. OrderSnapshot is the audit log — immutable, validated, hashable. PendingOrder is the work-in-progress — mutable, collecting items, not frozen. Two different patterns for two different use cases.

Next week we will compose these patterns with the type system features you learned this week: Protocols, TypedDict, ABCs. The type system is solid, so it is time to put the pieces together.