Day 14 · ~18m

Python Dataclasses: OOP Without the Boilerplate

Classes validate data. But writing __init__ is tedious. Learn how @dataclass generates it for you — and why you still need to write your own sometimes.

student (thinking)

Wait. We're back to classes. I thought we just spent a week learning how to write __init__ and __repr__ and __eq__. Why are we doing this again?

teacher (neutral)

Because you wrote enough boilerplate that you're starting to notice the pattern. Watch — here's an Order class the way you write it:

class Order:
    def __init__(self, order_id, customer, total, status="pending"):
        self.id = order_id
        self.customer = customer
        self.total = total
        self.status = status

    def __repr__(self):
        return f"Order({self.id}, {self.customer}, {self.total}, {self.status})"

    def __eq__(self, other):
        if not isinstance(other, Order):
            return False
        return (self.id == other.id and
                self.customer == other.customer and
                self.total == other.total and
                self.status == other.status)

student (focused)

Okay, that's... about fifteen lines of boilerplate just to store four attributes. And I haven't even added validation yet.

teacher (amused)

Exactly. Now watch the same class with a dataclass:

from dataclasses import dataclass

@dataclass
class Order:
    id: int
    customer: str
    total: float
    status: str = "pending"

student (surprised)

That's it? That's the entire class?

teacher (focused)

That's the entire class. The @dataclass decorator generates __init__, __repr__, and __eq__ automatically. All of it.

student (excited)

OH. That's what the decorator is for. It sees the attributes and builds the boilerplate.

teacher (encouraging)

Exactly. A class decorator is a function that takes a class, modifies it, and returns it. @dataclass looks at your class definition, sees id: int, customer: str, and so on, and generates the methods you would have written by hand.
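
To make "takes a class, modifies it, and returns it" concrete, here is a minimal sketch of a toy class decorator. The name add_repr is made up for illustration, and this is not how @dataclass is actually implemented; it only shows the general shape.

```python
def add_repr(cls):
    """Toy class decorator: attach a __repr__ built from the class's annotations."""
    field_names = list(cls.__annotations__)  # names declared with type hints

    def __repr__(self):
        parts = ", ".join(f"{name}={getattr(self, name)!r}" for name in field_names)
        return f"{cls.__name__}({parts})"

    cls.__repr__ = __repr__  # modify the class...
    return cls               # ...and hand the same class back


@add_repr
class Point:
    x: int
    y: int

    def __init__(self, x, y):
        self.x = x
        self.y = y


print(Point(1, 2))  # Point(x=1, y=2)
```

Writing @add_repr above the class is shorthand for Point = add_repr(Point) after the class body.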

student (thinking)

So type hints aren't just for type checking — the dataclass actually uses them?

teacher (serious)

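Yes. In a regular class, type hints are optional, and Python doesn't check them at runtime; they exist for type checkers and for people reading the code. In a dataclass, the type hints are the definition: every annotated name becomes a field. The decorator records each declared type but does not enforce it at runtime, so catching a wrong type is still the type checker's job.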
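
A short sketch of both points, using the standard fields() helper from the dataclasses module. The sloppy variable is only there for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class Order:
    id: int
    customer: str
    total: float
    status: str = "pending"


# The annotations become the field list, in declaration order.
field_names = [f.name for f in fields(Order)]
print(field_names)  # ['id', 'customer', 'total', 'status']

# The declared types are recorded but not checked at runtime:
# this call succeeds even though "oops" is not a float.
sloppy = Order(1, "Alice", "oops")
print(sloppy.total)  # oops
```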

student (curious)

What does the generated __init__ look like?

teacher (neutral)

It looks exactly like what you would write. def __init__(self, id, customer, total, status="pending"): followed by self.id = id, etc. The decorator generates it for you so you don't have to.
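
If you want to see it for yourself rather than take my word for it, one way is to ask the standard inspect module for the signature. This is a sketch for exploring, not something you need in production code.

```python
import inspect
from dataclasses import dataclass


@dataclass
class Order:
    id: int
    customer: str
    total: float
    status: str = "pending"


# inspect.signature reports the parameters of the generated __init__
# (self is omitted because we ask through the class).
signature = inspect.signature(Order)
print(signature)

param_names = list(signature.parameters)
print(param_names)  # ['id', 'customer', 'total', 'status']
print(signature.parameters["status"].default)  # pending
```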

student (focused)

Let me test this. If I create two Order instances with the same data, does == work?

teacher (encouraging)

Yes. The generated __eq__ compares all the attributes. Watch:

from dataclasses import dataclass

@dataclass
class Order:
    id: int
    customer: str
    total: float
    status: str = "pending"

order1 = Order(101, "Alice", 200.0)
order2 = Order(101, "Alice", 200.0)
order3 = Order(102, "Bob", 150.0)

print(order1 == order2)  # True — same data
print(order1 == order3)  # False — different data
print(repr(order1))      # Order(id=101, customer='Alice', total=200.0, status='pending')

student (thinking)

So the __repr__ is generated too, and it shows the field names. That's actually useful for debugging.

teacher (focused)

Much more useful than the default <Order object at 0x...>. When you're debugging and print an order, you see the actual data.

student (curious)

What if I want a default value? Like status defaulting to "pending"?

teacher (neutral)

You saw it in the example. Just set it in the class definition:

from dataclasses import dataclass

@dataclass
class Order:
    id: int
    customer: str
    total: float
    status: str = "pending"

order = Order(101, "Alice", 200.0)  # status defaults to "pending"
print(order.status)  # pending

student (focused)

What if the default is a list?

teacher (serious)

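Don't do this:

from dataclasses import dataclass

@dataclass
class Order:
    id: int
    items: list = []  # WRONG: mutable default, raises ValueError when the class is defined

Dataclasses refuse this outright: Python raises a ValueError as soon as the class is defined and tells you to use default_factory. The check exists because a shared default would otherwise leak between instances: append to one order's list and every order would see it. It's the mutable default problem we mentioned on Day 9.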
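
A quick check, assuming a current Python 3 release: wrapping the class definition above in try/except shows that @dataclass stops at class definition time with a ValueError, before any instance could share the list. The error_message variable is just scaffolding for the demo.

```python
from dataclasses import dataclass

error_message = None
try:
    @dataclass
    class Order:
        id: int
        items: list = []  # mutable default
except ValueError as exc:
    # Raised while the class is being defined, before any instance exists.
    error_message = str(exc)

print(error_message)
```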

student (confused)

Wait, mutable defaults break dataclasses too?

teacher (encouraging)

They cause trouble everywhere, which is why dataclasses reject them outright. The fix dataclasses give you is field(default_factory=...):

from dataclasses import dataclass, field

@dataclass
class Order:
    id: int
    customer: str
    items: list = field(default_factory=list)

order1 = Order(101, "Alice")
order2 = Order(102, "Bob")
order1.items.append("Widget")

print(order1.items)  # ['Widget']
print(order2.items)  # [] (not shared)

student (thinking)

So field(default_factory=list) creates a new empty list for each instance.

teacher (focused)

Yes. default_factory takes a function that needs no arguments and calls it once per instance to build the default. list creates a new empty list. dict creates a new empty dict. Any callable that takes no arguments works.
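
Here is a sketch of a few kinds of factory side by side. The default_tags helper and the extra fields (tags, metadata, created_at) are invented for this example and are not part of the Order class we have been building.

```python
from dataclasses import dataclass, field
from datetime import datetime


def default_tags():
    """Hypothetical helper: builds a fresh starting list for every instance."""
    return ["new"]


@dataclass
class Order:
    id: int
    items: list = field(default_factory=list)         # built-in type as factory
    tags: list = field(default_factory=default_tags)  # your own function
    metadata: dict = field(default_factory=lambda: {"source": "web"})  # lambda
    created_at: datetime = field(default_factory=datetime.now)  # called per instance


first = Order(1)
second = Order(2)
first.tags.append("priority")

print(first.tags)   # ['new', 'priority']
print(second.tags)  # ['new']
```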

student (curious)

So now I can write a class in a few lines instead of a dozen or more. But what about validation? I still need to check that the total is positive.

teacher (neutral)

Good question. Dataclasses don't prevent you from adding custom logic. Use __post_init__:

from dataclasses import dataclass

@dataclass
class Order:
    id: int
    customer: str
    total: float
    status: str = "pending"
    
    def __post_init__(self):
        if self.total <= 0:
            raise ValueError("Order total must be positive")

order = Order(101, "Alice", 200.0)  # Works
bad_order = Order(102, "Bob", -50.0)  # raises ValueError in __post_init__

student (excited)

__post_init__ runs after __init__ is done?

teacher (encouraging)

Exactly. The dataclass generates __init__, and if you define __post_init__, it calls it automatically at the end. Perfect for validation.
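
To watch the ordering happen, here is a small sketch with a trimmed-down Order. The calls list is scaffolding added for the demo; it records when __post_init__ runs and shows that the fields are already set by then.

```python
from dataclasses import dataclass

calls = []  # records what __post_init__ sees each time it runs


@dataclass
class Order:
    id: int
    total: float

    def __post_init__(self):
        # By the time this runs, the generated __init__ has already set the fields.
        calls.append(f"validating order {self.id} with total {self.total}")
        if self.total <= 0:
            raise ValueError("Order total must be positive")


good = Order(101, 200.0)

try:
    Order(102, -50.0)
    rejected = False
except ValueError as exc:
    rejected = True
    print(f"Rejected: {exc}")  # Rejected: Order total must be positive

print(calls)
```

Catching the ValueError is what a caller would normally do; without the try/except, the bad order stops the program.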

student (thinking)

So the decorator generates __init__, then my __post_init__ validates. That's clean.

teacher (proud)

That's exactly how it works. And if you need __repr__ to be different, you can override it. But most of the time, the generated one is fine.
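
A minimal sketch of a custom __repr__ on a dataclass. The output format here is my own choice for illustration; use whatever reads well in your logs.

```python
from dataclasses import dataclass


@dataclass
class Order:
    id: int
    customer: str
    total: float
    status: str = "pending"

    def __repr__(self):
        # Defined in the class body, so @dataclass leaves it alone.
        return f"<Order #{self.id} for {self.customer}: ${self.total:.2f} ({self.status})>"


order = Order(101, "Alice", 200.0)
print(order)  # <Order #101 for Alice: $200.00 (pending)>
```

The generated __init__ and __eq__ are still there; only __repr__ is yours.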

student (focused)

What if I want two Order instances to count as equal whenever they have the same ID, even if the status is different?

teacher (neutral)

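You write your own __eq__. When the class body already defines one, @dataclass keeps yours instead of generating its own: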

from dataclasses import dataclass

@dataclass
class Order:
    id: int
    customer: str
    total: float
    status: str = "pending"

    def __eq__(self, other):
        if not isinstance(other, Order):
            return False
        return self.id == other.id  # Only compare ID

order1 = Order(101, "Alice", 200.0, status="pending")
order2 = Order(101, "Alice", 200.0, status="shipped")
print(order1 == order2)  # True: same ID, status ignored

student (amused)

So I can have it both ways. Dataclass generates the boilerplate, and I override the parts I need.

teacher (amused)

That's the idea. A dataclass is a shortcut that works 80% of the time. When the 20% case comes up, you add the methods you need.
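
Sometimes you don't even need to write the method. As a sketch of a lighter alternative, field(compare=False) tells the generated __eq__ to skip one field. Note that this is not the same rule as comparing only the ID: customer and total are still compared here.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    id: int
    customer: str
    total: float
    # compare=False leaves this field out of the generated __eq__
    status: str = field(default="pending", compare=False)


pending = Order(101, "Alice", 200.0)
shipped = Order(101, "Alice", 200.0, status="shipped")
other = Order(102, "Alice", 200.0)

print(pending == shipped)  # True: status is ignored
print(pending == other)    # False: id differs
```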

student (curious)

What's frozen=True? I saw that in the codebase.

teacher (focused)

@dataclass(frozen=True) makes instances read-only. After one is created, assigning to any of its fields raises FrozenInstanceError:

from dataclasses import dataclass

@dataclass(frozen=True)
class OrderSnapshot:
    id: int
    total: float
    created_at: str

snap = OrderSnapshot(101, 200.0, "2025-01-15")
print(snap.total)  # 200.0

snap.total = 250.0  # raises dataclasses.FrozenInstanceError: can't modify

student (thinking)

So frozen dataclasses are like... a record that can't be changed after creation.

teacher (encouraging)

Exactly. When you need to capture a snapshot of data that should never change — a billing record, an order receipt, a historical transaction — frozen makes sure nobody accidentally mutates it.
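
A sketch of what that looks like in practice: catching the error, and using replace() from the dataclasses module when you genuinely need a corrected copy. The blocked and corrected names are just for the demo.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class OrderSnapshot:
    id: int
    total: float
    created_at: str


snap = OrderSnapshot(101, 200.0, "2025-01-15")

try:
    snap.total = 250.0
    blocked = False
except FrozenInstanceError:
    blocked = True  # assignment to a field of a frozen instance is refused

# replace() builds a new instance with some fields changed; the original is untouched.
corrected = replace(snap, total=250.0)

print(blocked)          # True
print(snap.total)       # 200.0
print(corrected.total)  # 250.0
```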

student (focused)

When would I use a regular class instead of a dataclass?

teacher (serious)

When you need complex __init__ logic that dataclasses don't support. If you're doing a lot of computation, or calling other methods during setup, or building something from components — write a regular class.

student (thinking)

What's an example?

teacher (neutral)

A User class that hashes a password in __init__, or an APIClient that connects to a server during setup, or a DataFrame that computes column statistics. Those are too complex for a dataclass's simple attribute assignment.
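
Here is a rough sketch of the first case. Everything in it (the User class, its attribute names, the iteration count) is hypothetical and chosen for illustration; real projects usually rely on a vetted password-hashing library.

```python
import hashlib
import hmac
import os


class User:
    """Hypothetical User whose __init__ does real work, not just assignment."""

    def __init__(self, username, password):
        self.username = username.strip().lower()
        # Derive a salted hash instead of storing the password itself.
        self._salt = os.urandom(16)
        self._password_hash = self._hash(password)

    def _hash(self, password):
        # 100_000 iterations is an illustrative figure, not a recommendation.
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), self._salt, 100_000)

    def check_password(self, candidate):
        # compare_digest avoids leaking timing information.
        return hmac.compare_digest(self._password_hash, self._hash(candidate))


user = User("  Alice ", "s3cret")
print(user.username)                  # alice
print(user.check_password("s3cret"))  # True
print(user.check_password("guess"))   # False
```

The arguments to __init__ (a raw password) are not the attributes that end up stored (a salt and a hash), which is the kind of mismatch that makes a hand-written class the better fit.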

student (curious)

So dataclasses are for data holders. Regular classes are for... everything else?

teacher (encouraging)

More or less. Dataclasses are for classes whose main job is to hold data. Regular classes are for anything that does something.

student (proud)

Got it. So I use dataclasses to replace all that boilerplate, and I fall back to regular classes when I need control.

teacher (proud)

And this is the exact moment when you stop thinking about syntax and start thinking about design. You're not memorizing anymore — you're choosing tools that fit the problem.

student (focused)

So what's the code challenge?

teacher (neutral)

Define a Product dataclass with three attributes: name, price, and category. Add an is_affordable method that returns True if the price is less than 50. The entry point is get_product_info(name, price, category) — it creates a Product and returns a dict with name, price, and is_affordable.

student (thinking)

So the dataclass holds the data, the method checks affordability, and the function ties it together.

teacher (focused)

Exactly. Your dataclass doesn't need __post_init__ or validation for this one. Just the three fields and the method.

student (curious)

And is_affordable should be a method on the dataclass?

teacher (encouraging)

Yes. The Product owns the logic to determine if it's affordable. That's better design than checking it outside the class.
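
So you can see the shape without my handing you the answer, here is the same pattern on a different, made-up class. Shipment, is_heavy, get_shipment_info, and the 20 kg threshold are all invented for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Shipment:
    destination: str
    weight_kg: float

    def is_heavy(self):
        # The 20 kg threshold is an arbitrary value chosen for this example.
        return self.weight_kg > 20


def get_shipment_info(destination, weight_kg):
    """Entry point: build the object, then report on it as a plain dict."""
    shipment = Shipment(destination, weight_kg)
    return {"destination": shipment.destination, "is_heavy": shipment.is_heavy()}


print(get_shipment_info("Lisbon", 32.5))  # {'destination': 'Lisbon', 'is_heavy': True}
print(get_shipment_info("Oslo", 4.0))     # {'destination': 'Oslo', 'is_heavy': False}
```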

student (thinking)

Okay. I think I can write this. Define the dataclass, add the method, write the entry point function.

teacher (serious)

And remember: the type hints tell the dataclass decorator what to generate. If a name in the class body has no type hint, it's not a field, and it won't appear in the generated __init__.
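
A small sketch of that rule, using a made-up Invoice class so it doesn't overlap with your challenge. The currency attribute deliberately has no annotation.

```python
from dataclasses import dataclass, fields


@dataclass
class Invoice:
    number: int       # annotated: becomes a field and an __init__ parameter
    amount: float     # annotated: becomes a field and an __init__ parameter
    currency = "USD"  # no annotation: a plain class attribute, not a field


field_names = [f.name for f in fields(Invoice)]
print(field_names)  # ['number', 'amount']

invoice = Invoice(1, 9.99)
print(invoice.currency)  # USD (read from the class, shared by every instance)

try:
    Invoice(1, 9.99, "EUR")
    accepted = True
except TypeError:
    accepted = False  # the generated __init__ only takes number and amount
print(accepted)  # False
```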

student (proud)

This is Week 2. I can model any data now. Dicts for simple stuff, dataclasses for structured data with behavior, regular classes when I need full control.