Data classes¶
A lot of classes exist for one reason: to bundle a few values together under a name. Each one needs the same pieces: the fields, an __init__ that assigns each one to self, an __eq__ that compares them, and a __repr__ that lists them. It's all boilerplate, and you write it the same way every time.
@dataclass (added in Python 3.7) generates all of that from a class body that just lists the field names with their types. This notebook covers @dataclass in depth, then introduces NamedTuple and TypedDict as lighter-weight alternatives. The decision recipe ties the three together.
A worked example: Point as a data class¶
The hand-rolled Point from the previous notebook was about fifteen lines. The dataclass version is four:
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

p1 = Point(3, 4)
p2 = Point(3, 4)
print(p1)        # __repr__ for free
print(p1 == p2)  # __eq__ for free
From a list of annotated fields, @dataclass generated:
- __init__ that takes x and y as parameters and assigns them to self.
- __repr__ that prints Point(x=3, y=4).
- __eq__ that compares all the fields in order.
The annotations (x: float, y: float) aren't enforced at runtime — they're hints, the same as anywhere else in Python. But they're required for @dataclass to recognise the field. A bare x = 0 line wouldn't be picked up.
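One way to check which names @dataclass actually picked up is dataclasses.fields(). The sketch below uses a hypothetical Marker class (not part of this notebook) with one un-annotated attribute, to show that it stays an ordinary class attribute:

```python
from dataclasses import dataclass, fields

@dataclass
class Marker:  # hypothetical class, for illustration only
    x: float          # annotated: becomes a field
    y: float = 0.0    # annotated with a default: still a field
    label = "none"    # no annotation: a plain class attribute, not a field

field_names = [f.name for f in fields(Marker)]
print(field_names)    # ['x', 'y']

m = Marker(1.0)
print(m)              # label does not appear in the repr
print(m.label)        # none (read from the class, shared by all instances)
```

Because label is not a field, it is also not a parameter of the generated __init__ and plays no part in __eq__.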
Defaults and field()¶
Simple defaults work as you'd expect:
@dataclass
class Page:
    title: str
    word_count: int = 0
    published: bool = False

print(Page("Untitled"))
print(Page("Hello", word_count=42, published=True))
Just like ordinary functions, fields with defaults must come after fields without. If you try to put an undefaulted field after a defaulted one, @dataclass raises a TypeError.
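A minimal sketch of that failure, using a hypothetical Draft class. The error is raised when the class is defined, not when an instance is created, and the exact message wording may vary between Python versions:

```python
from dataclasses import dataclass

error_message = None
try:
    @dataclass
    class Draft:  # hypothetical class, for illustration only
        word_count: int = 0
        title: str            # no default, but follows a defaulted field
except TypeError as e:
    error_message = str(e)
    print(f"{type(e).__name__}: {e}")
```

Moving title above word_count is enough to make the definition succeed.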
Mutable defaults need default_factory¶
You can't use a mutable object such as [], {}, or set() as a default value, because every instance would share the same object. @dataclass catches this when the class is defined and raises a ValueError:
from dataclasses import field

try:
    @dataclass
    class Bag:
        items: list = []
except ValueError as e:
    print(f"{type(e).__name__}: {e}")
Use field(default_factory=...) instead. The factory is called fresh for each new instance:
@dataclass
class Bag:
    items: list = field(default_factory=list)

a = Bag()
b = Bag()
a.items.append("apple")
print(a.items, b.items)  # b is unaffected
frozen=True — immutable instances¶
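Pass frozen=True to the decorator and you get an immutable dataclass. Assigning to a field after construction raises dataclasses.FrozenInstanceError (a subclass of AttributeError). As a bonus, because the default eq=True is still in effect, a frozen dataclass also gets a generated __hash__, so instances can go in sets and be used as dict keys, provided every field value is itself hashable.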
@dataclass(frozen=True)
class Coord:
    lat: float
    lon: float

home = Coord(51.5, -0.1)
try:
    home.lat = 52.0
except Exception as e:
    print(f"{type(e).__name__}: {e}")

print({Coord(51.5, -0.1), Coord(51.5, -0.1)})  # hashable; one survives
Use frozen=True for value-like types: coordinates, money, configuration records — anything where two instances with the same fields are the same thing.
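Since a frozen instance can't be changed in place, the usual way to get a "modified" value is to build a new one. A short sketch using dataclasses.replace(), reusing the Coord shape from above; the labels dict is only there to illustrate hashability:

```python
from dataclasses import dataclass, replace, FrozenInstanceError

@dataclass(frozen=True)
class Coord:
    lat: float
    lon: float

home = Coord(51.5, -0.1)

# Assignment is refused with FrozenInstanceError.
try:
    home.lat = 52.0
except FrozenInstanceError as e:
    print(f"{type(e).__name__}: {e}")

# replace() builds a new instance; the original is left untouched.
moved = replace(home, lat=52.0)
print(home)
print(moved)

# Equal field values hash equally, so an equal instance finds the same key.
labels = {home: "home", moved: "elsewhere"}
print(labels[Coord(51.5, -0.1)])  # home
```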
slots=True — smaller, stricter instances¶
Pass slots=True (added in Python 3.10) and the generated class uses __slots__ instead of an instance __dict__. Two practical effects:
- Instances use noticeably less memory — useful when you're creating millions of them.
- Setting an attribute that wasn't declared raises AttributeError, instead of silently adding it.
@dataclass(slots=True)
class Pixel:
    x: int
    y: int
    colour: str

p = Pixel(0, 0, "red")
try:
    p.alpha = 0.5  # not declared, so refused
except AttributeError as e:
    print(f"{type(e).__name__}: {e}")
order=True — free comparisons¶
Pass order=True and @dataclass adds __lt__, __le__, __gt__, and __ge__. Comparison is tuple-style: it compares fields left-to-right, in declaration order.
@dataclass(order=True)
class Task:
    priority: int
    title: str

tasks = [Task(2, "write tests"), Task(1, "fix bug"), Task(2, "update docs")]
for t in sorted(tasks):
    print(t)
Field declaration order matters here. Task(1, ...) sorts before any Task(2, ...) because priority is the first field. If you want to sort by title first, declare title first.
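If a field should not take part in comparisons at all, field(compare=False) excludes it from both __eq__ and the ordering methods. A small sketch, extending Task with a hypothetical notes field that is not in the original example:

```python
from dataclasses import dataclass, field

@dataclass(order=True)
class Task:
    priority: int
    title: str
    # Hypothetical extra field, ignored by ==, <, and friends.
    notes: str = field(default="", compare=False)

a = Task(1, "fix bug", notes="urgent")
b = Task(1, "fix bug", notes="see ticket")
c = Task(1, "update docs")

print(a == b)  # True: notes is excluded from comparison
print(a < c)   # True: priorities tie, so the titles decide
```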
__post_init__ — derived fields and validation¶
When the generated __init__ isn't enough (you need a derived field, or you want to validate the inputs), define __post_init__. The generated __init__ calls it as its last step, after every field has been assigned.
@dataclass
class Rectangle:
    width: float
    height: float

    def __post_init__(self):
        if self.width <= 0 or self.height <= 0:
            raise ValueError("sides must be positive")
        self.area = self.width * self.height

r = Rectangle(3, 4)
print(r.area)

try:
    Rectangle(-1, 4)
except ValueError as e:
    print(f"{type(e).__name__}: {e}")
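In the version above, area is an ordinary instance attribute rather than a dataclass field, so it does not show up in the generated __repr__ or __eq__. If you want it treated as a field, one option is to declare it with field(init=False), which keeps it out of the __init__ parameters. A sketch of that variant:

```python
from dataclasses import dataclass, field

@dataclass
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)  # a real field, but not an __init__ parameter

    def __post_init__(self):
        if self.width <= 0 or self.height <= 0:
            raise ValueError("sides must be positive")
        self.area = self.width * self.height

r = Rectangle(3, 4)
print(r)  # area now appears in the repr alongside width and height
```

This works for a regular dataclass; with frozen=True the plain assignment in __post_init__ would be refused.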
NamedTuple — when you really do want a tuple¶
If your record-like type is genuinely tuple-shaped — small, immutable, sometimes unpacked — typing.NamedTuple is even lighter than a frozen dataclass. It is a tuple, with attribute access added on top.
from typing import NamedTuple

class Coord(NamedTuple):
    lat: float
    lon: float

home = Coord(51.5, -0.1)
print(home.lat, home.lon)    # attribute access
lat, lon = home              # tuple unpacking
print(lat, lon)
print(home == (51.5, -0.1))  # equal to a plain tuple with same contents!
Two things that follow from "it's a tuple":
- Always immutable. No frozen=True decision to make.
- Equal to plain tuples with the same contents: Coord(51.5, -0.1) == (51.5, -0.1) is True. Sometimes useful, sometimes a footgun.
Reach for NamedTuple when the type is small (two or three fields), immutable, and you genuinely want tuple-like behaviour. For anything bigger or mutable, prefer @dataclass.
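A few of the tuple-flavoured helpers that come with a NamedTuple, sketched with the same Coord shape. The _replace and _asdict methods are part of the named tuple API despite the leading underscore:

```python
from typing import NamedTuple

class Coord(NamedTuple):
    lat: float
    lon: float

home = Coord(51.5, -0.1)

# Fields are read-only: assignment raises AttributeError.
try:
    home.lat = 52.0
except AttributeError as e:
    print(f"{type(e).__name__}: {e}")

moved = home._replace(lat=52.0)  # returns a new tuple, original unchanged
print(moved)
print(home._asdict())            # {'lat': 51.5, 'lon': -0.1}
print(home[0], len(home))        # indexing and len() work as for any tuple
```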
TypedDict — typed dicts, not classes¶
If your data really is a dict — comes from JSON, gets passed to a library that expects a dict — but you want type-checker support for the keys, use typing.TypedDict. It's not a class in the runtime sense; it's a hint for type checkers like mypy and pyright. The type hints guide covers it in detail.
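As a rough illustration, here is a minimal sketch with a hypothetical PageInfo shape (the name and keys are assumptions, not from this notebook). At runtime the value is an ordinary dict; the declared key types only matter to a static type checker:

```python
import json
from typing import TypedDict

class PageInfo(TypedDict):  # hypothetical shape, for illustration only
    title: str
    word_count: int

# Data that arrives as JSON is already dict-shaped.
raw = '{"title": "Hello", "word_count": 42}'
page: PageInfo = json.loads(raw)

print(type(page))          # <class 'dict'>
print(page["title"])       # Hello
print(page["word_count"])  # 42
```

Nothing here validates the JSON against PageInfo at runtime; the annotation just tells the type checker which keys and value types to expect.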
Choosing between the three¶
A quick decision tree:
- Reach for @dataclass for almost everything. Mutable record types, value types (with frozen=True), anything with more than three fields, anything that needs __post_init__ validation.
- Use NamedTuple when the type is small, immutable, and tuple-like: coordinate pairs, key-value entries, points in time.
- Use TypedDict when the data is genuinely a dict and you only need static type information.
- Hand-write the class when you need behaviour that's hard to express through dataclass field declarations: heavy custom dunders, descriptors, complex constructor logic.
The choose-between recipe goes deeper into the trade-offs.
Exercise¶
Define an Order dataclass for a small e-commerce system:
- id: str
- customer: str
- items: list[str] (default to an empty list; remember default_factory)
- discount: float = 0.0
- A __post_init__ that raises ValueError if discount is outside 0.0 <= discount <= 1.0.
- Make it frozen=True and slots=True.
Test that an order with a 0.1 discount works, that the items field is independent across instances, and that a discount of 1.5 raises.
# Your code here
Solution
from dataclasses import dataclass, field

@dataclass(frozen=True, slots=True)
class Order:
    id: str
    customer: str
    items: list[str] = field(default_factory=list)
    discount: float = 0.0

    def __post_init__(self):
        if not 0.0 <= self.discount <= 1.0:
            raise ValueError("discount must be between 0 and 1")

# Note: with frozen=True you can't mutate `items` by reassigning it,
# but you can still mutate the list contents: frozen freezes the
# attribute binding, not the objects it points to.
Recap¶
- @dataclass generates __init__, __repr__, and __eq__ from an annotated field list.
- Mutable defaults must use field(default_factory=...).
- frozen=True makes instances immutable and hashable.
- slots=True saves memory and refuses undeclared attributes.
- order=True adds __lt__ and friends; comparison follows declaration order.
- __post_init__ runs after the generated __init__; use it for validation or derived fields.
- For small immutable tuple-like types, typing.NamedTuple is even lighter.
- For dict-shaped data, typing.TypedDict gives static type support.
Next: Inheritance and composition, where we'll see why Python programmers reach for inheritance less often than programmers coming from other languages expect.