{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Group items with defaultdict\n",
    "\n",
    "**The question.** You have a flat sequence and you want it bucketed into a `{key: [items]}` mapping — records by category, words by first letter, files by extension, students by grade.\n",
    "\n",
    "The answer is `defaultdict(list)`: append each item to its bucket and let missing keys create themselves. Below are the variations — lists, sets, computed keys, an index, and nesting — plus when a different tool fits."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Group into lists\n",
    "\n",
    "The core pattern. Pick the grouping key for each item and append."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from collections import defaultdict\n",
    "\n",
    "people = [\n",
    "    ('engineering', 'Ada'), ('sales', 'Bo'),\n",
    "    ('engineering', 'Cleo'), ('sales', 'Dev'), ('engineering', 'Eve'),\n",
    "]\n",
    "\n",
    "by_team = defaultdict(list)\n",
    "for team, name in people:\n",
    "    by_team[team].append(name)\n",
    "\n",
    "print(dict(by_team))\n",
    "# {'engineering': ['Ada', 'Cleo', 'Eve'], 'sales': ['Bo', 'Dev']}"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Group by a *computed* key\n",
    "\n",
    "The key can be anything derived from the item — its length, a category, the result of a function."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from collections import defaultdict\n",
    "\n",
    "words = ['hi', 'cat', 'dog', 'a', 'tree', 'ok', 'sun']\n",
    "\n",
    "by_length = defaultdict(list)\n",
    "for word in words:\n",
    "    by_length[len(word)].append(word)\n",
    "\n",
    "print(dict(by_length))        # {2: ['hi', 'ok'], 3: ['cat', 'dog', 'sun'], 1: ['a'], 4: ['tree']}"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Group into sets to drop duplicates\n",
    "\n",
    "Swap the factory to `set` when each bucket should hold *unique* members."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from collections import defaultdict\n",
    "\n",
    "pairs = [('a', 1), ('a', 2), ('a', 1), ('b', 3)]\n",
    "\n",
    "unique = defaultdict(set)\n",
    "for key, value in pairs:\n",
    "    unique[key].add(value)\n",
    "\n",
    "print(dict(unique))           # {'a': {1, 2}, 'b': {3}}"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Build an index (item to positions)\n",
    "\n",
    "A grouping where the value accumulates *where* each item appeared — the basis of a search index."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from collections import defaultdict\n",
    "\n",
    "text = 'the cat sat on the mat'.split()\n",
    "index = defaultdict(list)\n",
    "for position, word in enumerate(text):\n",
    "    index[word].append(position)\n",
    "\n",
    "print(dict(index))            # {'the': [0, 4], 'cat': [1], 'sat': [2], 'on': [3], 'mat': [5]}"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Nested grouping\n",
    "\n",
    "A `lambda` factory that returns another `defaultdict` builds two-level groups — here, names grouped by team and then by role."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from collections import defaultdict\n",
    "\n",
    "records = [\n",
    "    ('eng', 'senior', 'Ada'), ('eng', 'junior', 'Bo'),\n",
    "    ('eng', 'senior', 'Cleo'), ('sales', 'junior', 'Dev'),\n",
    "]\n",
    "\n",
    "grouped = defaultdict(lambda: defaultdict(list))\n",
    "for team, role, name in records:\n",
    "    grouped[team][role].append(name)\n",
    "\n",
    "print(grouped['eng']['senior'])   # ['Ada', 'Cleo']"
   ],
   "execution_count": null,
   "outputs": []
  },
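  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To inspect the whole two-level structure rather than a single leaf, one option is to copy it into plain dicts first. The sketch below assumes the same `records` as the nested example; the name `plain` is only illustrative."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from collections import defaultdict\n",
    "\n",
    "records = [\n",
    "    ('eng', 'senior', 'Ada'), ('eng', 'junior', 'Bo'),\n",
    "    ('eng', 'senior', 'Cleo'), ('sales', 'junior', 'Dev'),\n",
    "]\n",
    "\n",
    "grouped = defaultdict(lambda: defaultdict(list))\n",
    "for team, role, name in records:\n",
    "    grouped[team][role].append(name)\n",
    "\n",
    "# Copy each inner defaultdict into a plain dict so the result prints cleanly\n",
    "# and no longer creates buckets on lookup.\n",
    "plain = {team: dict(roles) for team, roles in grouped.items()}\n",
    "\n",
    "print(plain)\n",
    "# {'eng': {'senior': ['Ada', 'Cleo'], 'junior': ['Bo']}, 'sales': {'junior': ['Dev']}}"
   ],
   "execution_count": null,
   "outputs": []
  },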
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## When another tool fits better\n",
    "\n",
    "- **`itertools.groupby`** groups *consecutive* equal keys, so it only matches `defaultdict` grouping if the data is already sorted by the key. For unsorted data, `defaultdict` is simpler and doesn't need a sort.\n",
    "- **`dict.setdefault`** does the same job for a one-off without leaving a factory on the dict: `groups.setdefault(key, []).append(item)`.\n",
    "- For grouping *and counting* rather than collecting, reach for `Counter` ([recipe](https://agilearn.co.uk/guides/collections/recipes/count-and-tally-items))."
   ]
  },
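  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A rough side-by-side of the first two alternatives, reusing the `words` list from the computed-key example. Treat it as a sketch rather than a recommendation; the names `runs`, `sorted_groups`, and `groups` are illustrative."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from itertools import groupby\n",
    "\n",
    "words = ['hi', 'cat', 'dog', 'a', 'tree', 'ok', 'sun']\n",
    "\n",
    "# groupby on unsorted input: the same key can start more than one run.\n",
    "runs = [(length, list(group)) for length, group in groupby(words, key=len)]\n",
    "print(runs)\n",
    "# [(2, ['hi']), (3, ['cat', 'dog']), (1, ['a']), (4, ['tree']), (2, ['ok']), (3, ['sun'])]\n",
    "\n",
    "# Sorting by the same key first yields exactly one group per key.\n",
    "sorted_groups = {\n",
    "    length: list(group)\n",
    "    for length, group in groupby(sorted(words, key=len), key=len)\n",
    "}\n",
    "print(sorted_groups)\n",
    "# {1: ['a'], 2: ['hi', 'ok'], 3: ['cat', 'dog', 'sun'], 4: ['tree']}\n",
    "\n",
    "# setdefault builds the same buckets in a plain dict, with no sort needed.\n",
    "groups = {}\n",
    "for word in words:\n",
    "    groups.setdefault(len(word), []).append(word)\n",
    "print(groups)\n",
    "# {2: ['hi', 'ok'], 3: ['cat', 'dog', 'sun'], 1: ['a'], 4: ['tree']}"
   ],
   "execution_count": null,
   "outputs": []
  },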
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## In short\n",
    "\n",
    "- `defaultdict(list)` + `append` is the grouping workhorse; the key can be any value you compute.\n",
    "- Use `set` to dedupe within buckets, a nested `lambda` factory for multi-level groups.\n",
    "- `itertools.groupby` needs sorted input; `setdefault` is the plain-dict one-off."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}