{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# defaultdict\n",
    "\n",
    "A recurring annoyance with plain dicts: before you can append to `d[key]` or increment it, the key has to exist, so your code fills up with `if key not in d` guards. `collections.defaultdict` removes them. You give it a **factory** — a function that produces the default value — and any missing key is created on first access by calling that factory.\n",
    "\n",
    "It's a `dict` subclass, so it behaves like a dict everywhere else. The only addition is what happens when a key is missing."
   ]
  },
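  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch of the difference (the `pets` data and the variable names are made up for illustration), here is the guard a plain dict needs, next to a `defaultdict` version that drops it:"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from collections import defaultdict\n",
    "\n",
    "# hypothetical sample data, used only for this illustration\n",
    "pets = [('dog', 'Rex'), ('cat', 'Tom'), ('dog', 'Fido')]\n",
    "\n",
    "# plain dict: the key must exist before you can append to it\n",
    "plain = {}\n",
    "for kind, name in pets:\n",
    "    if kind not in plain:\n",
    "        plain[kind] = []\n",
    "    plain[kind].append(name)\n",
    "\n",
    "# defaultdict: the factory (list) supplies the missing value\n",
    "auto = defaultdict(list)\n",
    "for kind, name in pets:\n",
    "    auto[kind].append(name)\n",
    "\n",
    "print(plain == auto)              # True: same contents either way\n",
    "print(isinstance(auto, dict))     # True: defaultdict is a dict subclass"
   ],
   "execution_count": null,
   "outputs": []
  },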
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The grouping pattern (factory: `list`)\n",
    "\n",
    "This is the use case that justifies the type on its own: grouping items into buckets. With `defaultdict(list)`, the first time you touch a key its value is a fresh empty list, ready to append to."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from collections import defaultdict\n",
    "\n",
    "words = ['apple', 'avocado', 'banana', 'cherry', 'blueberry', 'apricot']\n",
    "\n",
    "by_letter = defaultdict(list)\n",
    "for word in words:\n",
    "    by_letter[word[0]].append(word)   # no 'if key not in by_letter' guard needed\n",
    "\n",
    "print(dict(by_letter))\n",
    "# {'a': ['apple', 'avocado', 'apricot'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The counting pattern (factory: `int`)\n",
    "\n",
    "`int()` returns `0`, so `defaultdict(int)` gives you a counter where every new key starts at zero. (For pure counting, `Counter` from the [last notebook](https://agilearn.co.uk/guides/collections/learn/01-counter) is usually nicer — but this pattern generalises to sums and other accumulations.)"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from collections import defaultdict\n",
    "\n",
    "totals = defaultdict(int)\n",
    "sales = [('north', 100), ('south', 50), ('north', 75), ('east', 30)]\n",
    "for region, amount in sales:\n",
    "    totals[region] += amount          # starts from 0 automatically\n",
    "\n",
    "print(dict(totals))    # {'north': 175, 'south': 50, 'east': 30}"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Other factories: `set`, and nesting\n",
    "\n",
    "Any zero-argument callable works. `set` collects unique items per key; a `lambda` returning another `defaultdict` builds nested structures."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from collections import defaultdict\n",
    "\n",
    "# group into sets — duplicates collapse\n",
    "seen = defaultdict(set)\n",
    "for value in (1, 1, 2):\n",
    "    seen['a'].add(value)   # the second add(1) changes nothing\n",
    "print(dict(seen))      # {'a': {1, 2}}\n",
    "\n",
    "# nested: a dict of dicts of ints\n",
    "grid = defaultdict(lambda: defaultdict(int))\n",
    "grid['x']['y'] += 5\n",
    "print(grid['x']['y'])  # 5"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The gotcha: reading a missing key *creates* it\n",
    "\n",
    "This is the one surprise. The factory fires whenever `d[key]` is evaluated for a missing key, including a plain read, so merely looking can mutate the dictionary. That is the opposite of `Counter` (which returns `0` without inserting) and of `dict.get`; membership tests with `in` do not insert either."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from collections import defaultdict\n",
    "\n",
    "d = defaultdict(list)\n",
    "print(d['never_set'])      # [] — looks harmless...\n",
    "print(dict(d))             # {'never_set': []} — ...but the key now exists!\n",
    "\n",
    "# to check without creating, use 'in' or .get:\n",
    "d2 = defaultdict(list)\n",
    "print('x' in d2)           # False, and nothing is created\n",
    "print(d2.get('x'))         # None, and nothing is created"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## `defaultdict` versus `dict.setdefault`\n",
    "\n",
    "A plain dict can do the same job with `setdefault(key, default)`, which returns the existing value or inserts and returns the default. The difference is partly style and partly a subtle efficiency point: Python evaluates the `default` argument before `setdefault` runs, so a fresh default is built on *every* call (even when the key already exists and the default is discarded), while `defaultdict` only calls the factory when the key is actually missing."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# setdefault — works, but constructs a new [] on every iteration\n",
    "groups = {}\n",
    "for word in ['apple', 'avocado', 'banana']:\n",
    "    groups.setdefault(word[0], []).append(word)\n",
    "print(groups)              # {'a': ['apple', 'avocado'], 'b': ['banana']}"
   ],
   "execution_count": null,
   "outputs": []
  },
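  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the efficiency point concrete, here is a rough sketch that counts how often a default list gets built. `make_list`, `built` and `demo_words` are hypothetical names introduced only for this demonstration; they are not part of `collections`."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from collections import defaultdict\n",
    "\n",
    "built = 0\n",
    "\n",
    "def make_list():\n",
    "    # hypothetical factory that records how many lists it creates\n",
    "    global built\n",
    "    built += 1\n",
    "    return []\n",
    "\n",
    "demo_words = ['apple', 'avocado', 'apricot', 'banana']\n",
    "\n",
    "# setdefault: the default argument is evaluated on every call\n",
    "built = 0\n",
    "via_setdefault = {}\n",
    "for word in demo_words:\n",
    "    via_setdefault.setdefault(word[0], make_list()).append(word)\n",
    "print(built)    # 4: one list per word, two of them thrown away\n",
    "\n",
    "# defaultdict: the factory runs only when a key is missing\n",
    "built = 0\n",
    "via_factory = defaultdict(make_list)\n",
    "for word in demo_words:\n",
    "    via_factory[word[0]].append(word)\n",
    "print(built)    # 2: one list per distinct first letter"
   ],
   "execution_count": null,
   "outputs": []
  },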
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Reach for `defaultdict` when one dictionary is built up with the same default throughout a function; reach for `setdefault` for a one-off, or when you want the dict to stay a plain `dict` with no lingering factory."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Converting back to a plain dict\n",
    "\n",
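    "A `defaultdict` keeps its factory, so it will keep auto-creating keys. When you're done building and want a normal dict (e.g. before returning it, or to make missing-key access raise `KeyError` again), wrap it in `dict()`. This is a shallow copy: the values are shared with the original, and any nested `defaultdict` values stay `defaultdict`s."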
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from collections import defaultdict\n",
    "\n",
    "dd = defaultdict(list)\n",
    "dd['a'].append(1)\n",
    "plain = dict(dd)           # a regular dict — no more auto-creation\n",
    "print(plain)               # {'a': [1]}\n",
    "print(type(plain).__name__)  # dict"
   ],
   "execution_count": null,
   "outputs": []
  },
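  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A short sketch of what the conversion changes, with illustrative variable names: missing-key access on the plain copy raises `KeyError` again, while the original `defaultdict` still auto-creates."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from collections import defaultdict\n",
    "\n",
    "dd = defaultdict(list)\n",
    "dd['a'].append(1)\n",
    "plain = dict(dd)\n",
    "\n",
    "try:\n",
    "    plain['missing']\n",
    "except KeyError as exc:\n",
    "    print('KeyError:', exc)       # KeyError: 'missing'\n",
    "\n",
    "print(dd['missing'])              # [] (the original still auto-creates)\n",
    "print('missing' in plain)         # False: the copy is unaffected"
   ],
   "execution_count": null,
   "outputs": []
  },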
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Recap\n",
    "\n",
    "- `defaultdict(factory)` calls `factory()` to supply a value for any missing key.\n",
    "- `list` for grouping, `int` for counting/summing, `set` for unique grouping, a `lambda` for nesting.\n",
    "- **Reading** a missing key with `d[key]` creates it; use `in` or `.get` to check without inserting.\n",
    "- `dict.setdefault` is the plain-dict alternative; `defaultdict` avoids rebuilding the default each time.\n",
    "- Wrap in `dict()` to copy it back into an ordinary dictionary that no longer auto-creates keys.\n",
    "\n",
    "Next: [deque](https://agilearn.co.uk/guides/collections/learn/03-deque) — fast appends and pops at both ends."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}