{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Type a data structure\n",
    "\n",
    "**The question.** You have some named-field data — a user record, an address, a config block — and you want to annotate its shape. Python has four candidates: plain `dict[K, V]`, `TypedDict`, `NamedTuple`, and `@dataclass`. They overlap, and each earns its place for different reasons.\n",
    "\n",
    "The short answer: **default to `@dataclass`**. Reach for `TypedDict` when the data stays a dict (JSON, YAML, config files); reach for `NamedTuple` for tiny immutable records you'll unpack; reach for `dict[K, V]` when the keys are data, not named fields."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The default: @dataclass for records with named fields\n",
    "from dataclasses import dataclass, field\n",
    "\n",
    "\n",
    "@dataclass\n",
    "class Address:\n",
    "    street: str\n",
    "    postcode: str\n",
    "\n",
    "\n",
    "@dataclass\n",
    "class User:\n",
    "    id: int\n",
    "    name: str\n",
    "    address: Address                                  # nested dataclass\n",
    "    active: bool = True\n",
    "    tags: list[str] = field(default_factory=list)     # mutable default\n",
    "\n",
    "    def display(self) -> str:\n",
    "        return f'{self.name} (id={self.id}) @ {self.address.postcode}'\n",
    "\n",
    "\n",
    "alice = User(\n",
    "    id=1,\n",
    "    name='Alice',\n",
    "    address=Address('42 High St', 'SW1A 1AA'),\n",
    "    tags=['admin'],\n",
    ")\n",
    "\n",
    "print(alice.display())\n",
    "print(alice)                          # auto-generated __repr__\n",
    "\n",
    "# Typed collections of records compose naturally\n",
    "users: list[User] = [alice]\n",
    "by_id: dict[int, User] = {u.id: u for u in users}\n",
    "print(by_id[1].name)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Variant: TypedDict — when the data is already a dict (JSON, YAML, CSV rows)\n",
    "from typing import TypedDict\n",
    "try:\n",
    "    from typing import NotRequired           # Python 3.11+\n",
    "except ImportError:\n",
    "    from typing_extensions import NotRequired\n",
    "\n",
    "class UserRecord(TypedDict):\n",
    "    id: int\n",
    "    name: str\n",
    "    active: bool\n",
    "    email: NotRequired[str]           # this key may be missing\n",
    "\n",
    "alice: UserRecord = {'id': 1, 'name': 'Alice', 'active': True}\n",
    "print(alice['name'])      # still a dict at runtime — bracket access\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Variant: NamedTuple — small, immutable, unpackable\n",
    "from typing import NamedTuple\n",
    "\n",
    "class Point(NamedTuple):\n",
    "    x: float\n",
    "    y: float\n",
    "\n",
    "p = Point(3.0, 4.0)\n",
    "print(p.x, p.y)             # attribute access\n",
    "x, y = p                     # tuple unpacking works\n",
    "print('sum:', x + y)\n",
    "print('== tuple:', p == (3.0, 4.0))    # equal to regular tuples — feature or footgun\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Variant: plain dict[K, V] — when the keys are DATA, not named fields\n",
    "# Counters, caches, indexes, lookups — all best expressed as a dict[K, V]\n",
    "char_counts: dict[str, int] = {}\n",
    "for ch in 'hello':\n",
    "    char_counts[ch] = char_counts.get(ch, 0) + 1\n",
    "print(char_counts)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Why `@dataclass` is the default\n",
    "\n",
    "A dataclass gives you `__init__`, `__repr__`, and `__eq__` for free, plus attribute access (`user.name`, not `user['name']`), methods, inheritance, and type hints per field — with almost no boilerplate. It handles nested records, mutable defaults (`field(default_factory=list)`), and immutability (`frozen=True`) through decorator parameters. The IDE experience is also better: rename-symbol works on attributes, but not on dict keys.\n",
    "\n",
    "`TypedDict` earns its place when the data is **already a dict** at the boundary — parsed JSON, `csv.DictReader` rows, `yaml.safe_load` output. Converting it to a class means two places for the types to live (the class *and* the JSON schema) and runtime cost on every conversion. `TypedDict` adds type-checking without changing the runtime shape.\n",
    "\n",
    "`NamedTuple` earns its place for small, genuinely immutable records where tuple-like behaviour (unpacking, equality with plain tuples, hashability) is an asset rather than a surprise. Two or three fields; coordinates, RGB values, `(key, value)` pairs with names.\n",
    "\n",
    "Plain `dict[K, V]` is the right call when the keys are **data** — word counts, user-by-id lookups, request caches. When you find yourself typing `dict[str, str | int | bool | list[...]]`, the keys have become named fields and you should promote to a class."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Trade-offs\n",
    "\n",
    "**`TypedDict` can't do methods.** It's a hint, not a runtime class. You can't attach behaviour to it, can't inherit from non-TypedDict bases, and the IDE sees bracket access, not attribute access. For anything more than wire-format documentation, convert to a dataclass once you've validated the payload.\n",
    "\n",
    "**`NamedTuple`'s tuple-ness leaks.** `Point(3, 4) == (3, 4)` is `True`, which is occasionally useful and occasionally a silent bug when you're comparing heterogeneous records. Iteration, indexing, and slicing all work; that's the design.\n",
    "\n",
    "**Don't convert between shapes casually.** Each conversion is code you have to maintain and a place where types can drift from runtime. If a `TypedDict` and a `@dataclass` of the same shape exist side by side, one of them is almost certainly in the wrong role.\n",
    "\n",
    "**Decision flow, in one line:**\n",
    "\n",
    "1. Keys are data? → `dict[K, V]`\n",
    "2. Already a dict at the boundary? → `TypedDict`\n",
    "3. Tiny, immutable, want to unpack? → `NamedTuple`\n",
    "4. Everything else → `@dataclass`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Related reading\n",
    "\n",
    "- [Choose between @dataclass, NamedTuple, and a plain class](https://agilearn.co.uk/guides/classes-and-objects/recipes/choose-between-dataclass-namedtuple-class) — the class-side view of the same decision.\n",
    "- [`@dataclass` parameters reference](https://agilearn.co.uk/guides/classes-and-objects/reference/dataclass-parameters) — every decorator option in one place.\n",
    "- [Type a function signature](https://agilearn.co.uk/guides/type-hints/recipes/type-a-function-signature) — how the same types look in parameter and return positions.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}