{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Count and tally items\n",
    "\n",
    "**The question.** You have a pile of items (words, log lines, events, purchases) and you want to know how often each occurs, which are the most common, or how two tallies compare.\n",
    "\n",
    "The answer is `collections.Counter`, a `dict` subclass in the standard library. Below are the patterns you'll use most: a basic frequency count, the top-N, counting only items that meet a condition, combining or comparing tallies with `Counter` arithmetic, and totals, distinct counts, and thresholds."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Frequency count from any iterable\n",
    "\n",
    "Pass the iterable straight to `Counter`. Anything iterable works — a list, a string, a generator, the lines of a file."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from collections import Counter\n",
    "\n",
    "orders = ['tea', 'coffee', 'tea', 'juice', 'coffee', 'tea']\n",
    "counts = Counter(orders)\n",
    "print(counts)                 # Counter({'tea': 3, 'coffee': 2, 'juice': 1})\n",
    "print(counts['tea'])          # 3\n",
    "print(counts['water'])        # 0 — unseen items are zero, not an error"
   ],
   "execution_count": null,
   "outputs": []
  },
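  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a small sketch of the other iterables mentioned above, the next cell counts the characters of a string and the lines of a file-like object. The `io.StringIO` buffer and its log lines are made-up stand-ins for a real file opened with `open()`."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from collections import Counter\n",
    "import io\n",
    "\n",
    "# a string is an iterable of characters\n",
    "print(Counter('banana'))      # Counter({'a': 3, 'n': 2, 'b': 1})\n",
    "\n",
    "# hypothetical log data; io.StringIO stands in for a file opened with open()\n",
    "log = io.StringIO('GET /home\\nPOST /login\\nGET /home\\n')\n",
    "methods = Counter(line.split()[0] for line in log)  # first word of each line\n",
    "print(methods)                # Counter({'GET': 2, 'POST': 1})"
   ],
   "execution_count": null,
   "outputs": []
  },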
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The top N\n",
    "\n",
    "`most_common(n)` gives the `n` highest, ranked. Leave out `n` for the full ranking; slice from the end for the rarest."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from collections import Counter\n",
    "\n",
    "text = 'the quick brown fox the lazy dog the end'\n",
    "freq = Counter(text.split())\n",
    "\n",
    "print(freq.most_common(2))    # [('the', 3), ('quick', 1)]\n",
    "print(freq.most_common()[-1]) # ('end', 1): the least common\n",
    "print(freq.most_common()[:-3:-1])  # [('end', 1), ('dog', 1)]: the two rarest, rarest first"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Counting only what matches a condition\n",
    "\n",
    "Feed `Counter` a generator expression to count a filtered or transformed view — no intermediate list needed."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from collections import Counter\n",
    "\n",
    "words = ['Apple', 'apricot', 'Banana', 'avocado', 'Cherry', 'almond']\n",
    "\n",
    "# count first letters, case-insensitively, for words longer than 5 letters\n",
    "first_letters = Counter(w[0].lower() for w in words if len(w) > 5)\n",
    "print(first_letters)          # Counter({'a': 3, 'b': 1, 'c': 1})"
   ],
   "execution_count": null,
   "outputs": []
  },
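  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Combining and comparing tallies\n",
    "\n",
    "`Counter` arithmetic merges counts (`+`), subtracts one tally from another (`-`), takes the per-item minimum (`&`), or takes the per-item maximum (`|`). Each operator returns a new `Counter` and drops any item whose resulting count is zero or negative, which is why `tea` and `juice` are missing from `tuesday - monday` in the next cell. This is useful for rolling up daily counts or diffing two inventories."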
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from collections import Counter\n",
    "\n",
    "monday = Counter(tea=8, coffee=5, juice=2)\n",
    "tuesday = Counter(tea=6, coffee=7, water=3)\n",
    "\n",
    "print(monday + tuesday)       # combined: Counter({'tea': 14, 'coffee': 12, 'water': 3, 'juice': 2})\n",
    "print(tuesday - monday)       # grew on Tuesday: Counter({'water': 3, 'coffee': 2})"
   ],
   "execution_count": null,
   "outputs": []
  },
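  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cell above exercises only `+` and `-`. As a sketch of the other two operators, the next cell reuses the same illustrative `monday` and `tuesday` tallies with `&` and `|`."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from collections import Counter\n",
    "\n",
    "monday = Counter(tea=8, coffee=5, juice=2)\n",
    "tuesday = Counter(tea=6, coffee=7, water=3)\n",
    "\n",
    "# per-item minimum: what was sold on both days\n",
    "print(monday & tuesday)       # Counter({'tea': 6, 'coffee': 5})\n",
    "# per-item maximum: the better day for each item\n",
    "print(monday | tuesday)       # Counter({'tea': 8, 'coffee': 7, 'water': 3, 'juice': 2})"
   ],
   "execution_count": null,
   "outputs": []
  },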
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Total, unique count, and items over a threshold\n",
    "\n",
    "`total()` (Python 3.10+) sums every count; `len()` gives the number of *distinct* items; filtering `.items()` (or `most_common()`) with a comprehension finds items at or above a cut-off."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "from collections import Counter\n",
    "\n",
    "counts = Counter('mississippi')\n",
    "print(counts.total())                              # 11 — total letters\n",
    "print(len(counts))                                 # 4  — distinct letters\n",
    "print([item for item, n in counts.items() if n >= 4])  # ['i', 's'] — appear 4+ times"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## In short\n",
    "\n",
    "- `Counter(iterable)` tallies anything iterable in one line.\n",
    "- `most_common(n)` ranks; slice the full list for the rarest.\n",
    "- A generator expression inside `Counter(...)` counts a filtered/transformed view.\n",
    "- `+ - & |` roll up and compare tallies; `total()` and `len()` give the grand total and the distinct count."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
