{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Counter\n",
    "\n",
    "Counting how often things occur is one of the most common small tasks in programming — words in a document, votes per candidate, hits per page. You *can* do it with a plain `dict` and a few lines of `if key in counts` bookkeeping, but `collections.Counter` does it in one line and adds a toolkit on top: ranking by frequency, combining counts, and treating tallies like the multisets they are.\n",
    "\n",
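    "A `Counter` is a `dict` subclass that maps each item to its count. Almost everything you know about dicts still works (the main exceptions are `update`, which adds counts instead of replacing them, and `fromkeys`, which is not implemented); `Counter` layers counting-specific behaviour on top."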
   ]
  },
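  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the comparison concrete, here is a minimal sketch of the manual `dict` bookkeeping next to the `Counter` version. The `words` list and the names `manual` and `tally` are illustrative only."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from collections import Counter\n",
    "\n",
    "words = ['red', 'blue', 'red']      # illustrative sample data\n",
    "\n",
    "# manual tally with a plain dict and an `if` guard\n",
    "manual = {}\n",
    "for word in words:\n",
    "    if word in manual:\n",
    "        manual[word] += 1\n",
    "    else:\n",
    "        manual[word] = 1\n",
    "\n",
    "# the same tally with Counter\n",
    "tally = Counter(words)\n",
    "\n",
    "print(manual)                       # {'red': 2, 'blue': 1}\n",
    "print(dict(tally) == manual)        # True, both hold the same counts\n",
    "print(isinstance(tally, dict))      # True, Counter is a dict subclass"
   ],
   "execution_count": null,
   "outputs": []
  },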
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Building a Counter\n",
    "\n",
    "Pass any iterable of hashable items and `Counter` tallies it. That's the headline feature: no loop, no initialisation."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from collections import Counter\n",
    "\n",
    "counts = Counter('mississippi')        # count the letters\n",
    "print(counts)                          # Counter({'i': 4, 's': 4, 'p': 2, 'm': 1})\n",
    "\n",
    "votes = Counter(['red', 'blue', 'red', 'green', 'red', 'blue'])\n",
    "print(votes)                           # Counter({'red': 3, 'blue': 2, 'green': 1})"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also build one from a mapping or from keyword arguments, when you already know the counts."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from collections import Counter\n",
    "\n",
    "print(Counter({'a': 3, 'b': 1}))       # from a dict\n",
    "print(Counter(a=3, b=1))               # from keyword arguments"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Missing keys count as zero\n",
    "\n",
    "Look up an item that was never counted and you get `0`, not a `KeyError`. Crucially, the lookup does **not** add the key — the `Counter` stays clean. This makes counting code safe without any `if` guards."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from collections import Counter\n",
    "\n",
    "counts = Counter('aab')\n",
    "print(counts['a'])        # 2\n",
    "print(counts['z'])        # 0 — missing, but no error\n",
    "print(counts)             # Counter({'a': 2, 'b': 1}) — 'z' was NOT added"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Ranking with `most_common`\n",
    "\n",
    "`most_common(n)` returns the `n` highest-count items as `(item, count)` pairs, already sorted. With no argument it returns them all, most-frequent first. This is the method you'll use constantly."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from collections import Counter\n",
    "\n",
    "counts = Counter('mississippi')\n",
    "print(counts.most_common(2))      # [('i', 4), ('s', 4)]\n",
    "print(counts.most_common())       # all, ranked: [('i', 4), ('s', 4), ('p', 2), ('m', 1)]\n",
    "\n",
    "# the least common item is at the tail, so index the end of the full list\n",
    "print(counts.most_common()[-1])   # ('m', 1)"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Ties are broken by insertion order (the order each item was first seen), so the ranking is stable and predictable."
   ]
  },
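  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A small sketch of that tie-breaking rule, using two made-up strings that contain the same letters in a different order. The variable names `b_first` and `a_first` are illustrative."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from collections import Counter\n",
    "\n",
    "# 'b' and 'a' both occur twice, but 'b' is seen first\n",
    "b_first = Counter('bbaac')\n",
    "print(b_first.most_common(2))     # [('b', 2), ('a', 2)]\n",
    "\n",
    "# same letters and counts, with 'a' seen first this time\n",
    "a_first = Counter('aabbc')\n",
    "print(a_first.most_common(2))     # [('a', 2), ('b', 2)]"
   ],
   "execution_count": null,
   "outputs": []
  },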
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Updating counts\n",
    "\n",
    "`update` *adds* counts (it doesn't replace them, the way `dict.update` would), accepting another iterable or mapping. `subtract` does the reverse and, unlike the `-` operator below, is happy to go negative."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from collections import Counter\n",
    "\n",
    "inventory = Counter(apple=3, pear=2)\n",
    "inventory.update(['apple', 'apple', 'banana'])   # add more\n",
    "print(inventory)            # Counter({'apple': 5, 'pear': 2, 'banana': 1})\n",
    "\n",
    "inventory.subtract(apple=6)                       # can go below zero\n",
    "print(inventory['apple'])   # -1"
   ],
   "execution_count": null,
   "outputs": []
  },
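  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The difference from `dict.update` is easiest to see side by side. This sketch feeds the same mapping to a plain `dict` and to a `Counter`; the names `plain` and `tally` are illustrative."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from collections import Counter\n",
    "\n",
    "plain = {'apple': 3}\n",
    "plain.update({'apple': 2})        # dict.update replaces the value\n",
    "print(plain)                      # {'apple': 2}\n",
    "\n",
    "tally = Counter(apple=3)\n",
    "tally.update({'apple': 2})        # Counter.update adds to the count\n",
    "print(tally)                      # Counter({'apple': 5})"
   ],
   "execution_count": null,
   "outputs": []
  },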
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Counter arithmetic: tallies as multisets\n",
    "\n",
    "Counters support `+`, `-`, `&`, and `|`, treating each counter as a multiset. This turns \"merge these tallies\" or \"what's common to both\" into a single operator. Note that all four operators **discard zero and negative counts** in their result."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from collections import Counter\n",
    "\n",
    "a = Counter(x=3, y=1)\n",
    "b = Counter(x=1, y=2, z=5)\n",
    "\n",
    "print(a + b)      # add counts:        Counter({'z': 5, 'x': 4, 'y': 3})\n",
    "print(a - b)      # subtract, keep >0: Counter({'x': 2})\n",
    "print(a & b)      # min (intersection): Counter({'x': 1, 'y': 1})\n",
    "print(a | b)      # max (union):        Counter({'z': 5, 'x': 3, 'y': 2})"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because `-` drops anything that isn't positive, `a - b` keeps only `x` (3−1=2); `y` (1−2=−1) and the absent `z` vanish. Use the `subtract` *method* when you need the negatives kept."
   ]
  },
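  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a sketch of that contrast, the cell below reuses the same `a` and `b` and applies `subtract` to a copy (the method works in place). The name `kept` is illustrative. Unary `+` is shown as one way to strip the non-positive counts afterwards."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from collections import Counter\n",
    "\n",
    "a = Counter(x=3, y=1)\n",
    "b = Counter(x=1, y=2, z=5)\n",
    "\n",
    "kept = a.copy()           # copy first, because subtract mutates in place\n",
    "kept.subtract(b)\n",
    "print(kept)               # Counter({'x': 2, 'y': -1, 'z': -5})\n",
    "print(a - b)              # Counter({'x': 2})\n",
    "print(+kept)              # unary plus drops non-positive counts: Counter({'x': 2})"
   ],
   "execution_count": null,
   "outputs": []
  },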
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## `total` and `elements`\n",
    "\n",
    "`total()` (added in Python 3.10) sums all the counts; `elements()` expands the `Counter` back into an iterator of items, each repeated as many times as its count and skipped if its count is below one (handy for re-feeding into other tools)."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from collections import Counter\n",
    "\n",
    "counts = Counter(a=2, b=1, c=0)\n",
    "print(counts.total())             # 3 — sum of all counts\n",
    "print(list(counts.elements()))    # ['a', 'a', 'b'] — c (count 0) is skipped"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Putting it together: word frequency\n",
    "\n",
    "The classic example, in three lines: split text into words, count them, rank them."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from collections import Counter\n",
    "\n",
    "text = 'the cat sat on the mat the cat purred'\n",
    "freq = Counter(text.split())\n",
    "print(freq.most_common(3))        # [('the', 3), ('cat', 2), ('sat', 1)]"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Recap\n",
    "\n",
    "- `Counter(iterable)` tallies in one line; it's a `dict` subclass, so nearly all dict operations still work.\n",
    "- Missing items read as `0` without raising or being added.\n",
    "- `most_common(n)` ranks by frequency (ties broken by insertion order).\n",
    "- `update`/`subtract` add and remove counts in place; `subtract` allows negatives.\n",
    "- `+`, `-`, `&`, `|` treat counters as multisets and drop non-positive results.\n",
    "- `total()` sums counts; `elements()` expands them back out.\n",
    "\n",
    "Next: [defaultdict](https://agilearn.co.uk/guides/collections/learn/02-defaultdict) — a dict that conjures a default for missing keys, ideal for grouping."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}