Type a data structure¶

The question. You have some named-field data — a user record, an address, a config block — and you want to annotate its shape. Python has four candidates: plain dict[K, V], TypedDict, NamedTuple, and @dataclass. They overlap, and each earns its place for different reasons.

The short answer: default to @dataclass. Reach for TypedDict when the data stays a dict (JSON, YAML, config files); reach for NamedTuple for tiny immutable records you'll unpack; reach for dict[K, V] when the keys are data, not named fields.

# The default: @dataclass for records with named fields
from dataclasses import dataclass, field


@dataclass
class Address:
    street: str
    postcode: str


@dataclass
class User:
    id: int
    name: str
    address: Address                                  # nested dataclass
    active: bool = True
    tags: list[str] = field(default_factory=list)     # mutable default

    def display(self) -> str:
        return f'{self.name} (id={self.id}) @ {self.address.postcode}'


alice = User(
    id=1,
    name='Alice',
    address=Address('42 High St', 'SW1A 1AA'),
    tags=['admin'],
)

print(alice.display())
print(alice)                          # auto-generated __repr__

# Typed collections of records compose naturally
users: list[User] = [alice]
by_id: dict[int, User] = {u.id: u for u in users}
print(by_id[1].name)

# Variant: TypedDict — when the data is already a dict (JSON, YAML, CSV rows)
from typing import TypedDict
try:
    from typing import NotRequired           # Python 3.11+
except ImportError:
    from typing_extensions import NotRequired

class UserRecord(TypedDict):
    id: int
    name: str
    active: bool
    email: NotRequired[str]           # this key may be missing

alice: UserRecord = {'id': 1, 'name': 'Alice', 'active': True}
print(alice['name'])      # still a dict at runtime — bracket access

# Variant: NamedTuple — small, immutable, unpackable
from typing import NamedTuple

class Point(NamedTuple):
    x: float
    y: float

p = Point(3.0, 4.0)
print(p.x, p.y)             # attribute access
x, y = p                     # tuple unpacking works
print('sum:', x + y)
print('== tuple:', p == (3.0, 4.0))    # equal to regular tuples — feature or footgun

# Variant: plain dict[K, V] — when the keys are DATA, not named fields
# Counters, caches, indexes, lookups — all best expressed as a dict[K, V]
char_counts: dict[str, int] = {}
for ch in 'hello':
    char_counts[ch] = char_counts.get(ch, 0) + 1
print(char_counts)

Why `@dataclass` is the default¶

A dataclass gives you __init__, __repr__, and __eq__ for free, plus attribute access (user.name, not user['name']), methods, inheritance, and per-field type hints, all with almost no boilerplate. Nested records need nothing special, mutable defaults go through field(default_factory=list), and immutability is a single decorator parameter (frozen=True). The IDE experience is also better: rename-symbol works on attributes, but not on dict keys.
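
As a minimal sketch of the immutability option mentioned above, the cell below uses frozen=True. The Money class and its fields are hypothetical, chosen only for illustration.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Money:  # hypothetical record, for illustration only
    amount: int
    currency: str = 'GBP'


price = Money(500)

try:
    price.amount = 600  # type: ignore[misc]  # frozen instances reject assignment
except FrozenInstanceError:
    mutated = False
else:
    mutated = True

# replace() builds a modified copy instead of mutating in place
discounted = replace(price, amount=400)

# frozen=True together with the default eq=True also generates __hash__,
# so equal instances collapse to one entry in a set
seen = {price, Money(500)}

print(price, discounted, mutated, len(seen))
```

The same class without frozen=True would accept the assignment and would not be hashable, so the parameter is worth setting whenever a record is meant to be a value rather than a mutable object.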

TypedDict earns its place when the data is already a dict at the boundary — parsed JSON, csv.DictReader rows, yaml.safe_load output. Converting it to a class means two places for the types to live (the class and the JSON schema) and runtime cost on every conversion. TypedDict adds type-checking without changing the runtime shape.
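
A small sketch of that boundary case, assuming a hypothetical ConfigBlock shape and an inline JSON string standing in for a real file. Note that cast() only informs the type checker and validates nothing at runtime.

```python
import json
from typing import TypedDict, cast


class ConfigBlock(TypedDict):  # hypothetical shape, for illustration only
    host: str
    port: int


raw = '{"host": "localhost", "port": 8080}'

# cast() is a type-checker instruction only; no runtime validation happens here
config = cast(ConfigBlock, json.loads(raw))

print(type(config) is dict)   # the runtime object is still a plain dict
print(config['port'] + 1)     # a checker treats this as int arithmetic
print(json.dumps(config))     # serialises back out with no conversion step
```

Because the value never stops being a dict, it round-trips through json.dumps unchanged; the cost is that a malformed payload is only caught if you validate it yourself.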

NamedTuple earns its place for small, genuinely immutable records where tuple-like behaviour (unpacking, equality with plain tuples, hashability) is an asset rather than a surprise. It suits records of two or three fields: coordinates, RGB values, (key, value) pairs with names.
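
The tuple-like behaviours listed above can be sketched with a hypothetical RGB record (the class and the colour names are illustrative only):

```python
from typing import NamedTuple


class RGB(NamedTuple):  # hypothetical record, for illustration only
    r: int
    g: int
    b: int


red = RGB(255, 0, 0)

# Hashable like any tuple of hashable items, so it can key a dict
names: dict[RGB, str] = {red: 'red'}

# Equality (and hashing) match plain tuples, so a bare tuple finds the entry
found_by_plain_tuple = (255, 0, 0) in names

# Immutable: _replace returns a new instance and leaves the original alone
dim_red = red._replace(r=128)

# Unpacking works positionally, in field order
r, g, b = dim_red

print(found_by_plain_tuple, red, dim_red, (r, g, b))
```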

Plain dict[K, V] is the right call when the keys are data — word counts, user-by-id lookups, request caches. When you find yourself typing dict[str, str | int | bool | list[...]], the keys have become named fields and you should promote to a class.

Trade-offs¶

TypedDict can't do methods. It's a hint, not a runtime class. You can't attach behaviour to it, can't inherit from non-TypedDict bases, and the IDE sees bracket access, not attribute access. For anything more than wire-format documentation, convert to a dataclass once you've validated the payload.
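
One way to do that conversion, sketched under assumptions: UserPayload, User, and parse_user are hypothetical names, and the isinstance checks stand in for whatever validation your payload actually needs.

```python
from dataclasses import dataclass
from typing import TypedDict


class UserPayload(TypedDict):  # hypothetical wire format
    id: int
    name: str


@dataclass
class User:  # hypothetical domain object with behaviour attached
    id: int
    name: str

    def display(self) -> str:
        return f'{self.name} (id={self.id})'


def parse_user(payload: UserPayload) -> User:
    """Validate once at the boundary, then hand back a real class."""
    # TypedDict is not enforced at runtime, so check the types explicitly
    if not isinstance(payload['id'], int) or not isinstance(payload['name'], str):
        raise TypeError('payload does not match UserPayload')
    return User(id=payload['id'], name=payload['name'])


user = parse_user({'id': 1, 'name': 'Alice'})
print(user.display())

# A TypedDict cannot be used with isinstance(); the call raises TypeError
try:
    isinstance({}, UserPayload)
except TypeError:
    supports_isinstance = False
else:
    supports_isinstance = True
print(supports_isinstance)
```

Keeping the conversion in a single function means the TypedDict describes the wire format and the dataclass describes everything after it, so the two never compete for the same role.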

NamedTuple's tuple-ness leaks. Point(3, 4) == (3, 4) is True, which is occasionally useful and occasionally a silent bug when you're comparing heterogeneous records. Iteration, indexing, and slicing all work; that's the design.
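
The heterogeneous-record case is easy to reproduce. In this sketch, Size and the two dataclass counterparts are hypothetical types added only to show the contrast.

```python
from dataclasses import dataclass
from typing import NamedTuple


class Point(NamedTuple):
    x: float
    y: float


class Size(NamedTuple):  # hypothetical second record with the same field types
    width: float
    height: float


@dataclass(frozen=True)
class PointDC:  # hypothetical dataclass counterpart
    x: float
    y: float


@dataclass(frozen=True)
class SizeDC:  # hypothetical dataclass counterpart
    width: float
    height: float


# Tuple equality compares values only, so unrelated record types compare equal
tuples_equal = Point(3.0, 4.0) == Size(3.0, 4.0)

# The generated dataclass __eq__ only compares instances of the same class
dataclasses_equal = PointDC(3.0, 4.0) == SizeDC(3.0, 4.0)

print(tuples_equal, dataclasses_equal)
```

If a point ever being equal to a size would be a bug in your code, that is a reason to prefer a frozen dataclass over NamedTuple.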

Don't convert between shapes casually. Each conversion is code you have to maintain and a place where types can drift from runtime. If a TypedDict and a @dataclass of the same shape exist side by side, one of them is almost certainly in the wrong role.

Decision flow, checked in order:

Keys are data? → dict[K, V]
Already a dict at the boundary? → TypedDict
Tiny, immutable, want to unpack? → NamedTuple
Everything else → @dataclass

Related reading¶

Choose between @dataclass, NamedTuple, and a plain class — the class-side view of the same decision.
@dataclass parameters reference — every decorator option in one place.
Type a function signature — how the same types look in parameter and return positions.
