{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "285ac6b3",
   "metadata": {},
   "source": "# Generator functions\n\nIn the previous notebook we wrote an iterator class by hand \u2014 two classes, six methods, thirty-ish lines of boilerplate for what's really \"produce these values one at a time\". Generator functions replace all of that. They use one new keyword, `yield`, and a couple of rules about how functions that use it behave.\n\nBy the end of this notebook you'll be able to write an iterator in three lines instead of three classes."
  },
  {
   "cell_type": "markdown",
   "id": "cc065cbd",
   "metadata": {},
   "source": "## `yield` turns a function into a generator\n\nAny function with a `yield` anywhere in its body is a **generator function**. Calling it doesn't run the body \u2014 it returns a **generator object**, which is an iterator."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d6438bea",
   "metadata": {},
   "outputs": [],
   "source": "def count_up_to(n):\n    i = 1\n    while i <= n:\n        yield i\n        i += 1\n\n\ng = count_up_to(3)\nprint(type(g))    # generator\nprint(next(g))\nprint(next(g))\nprint(next(g))\n\ntry:\n    next(g)\nexcept StopIteration:\n    print('done')"
  },
  {
   "cell_type": "markdown",
   "id": "37f4626a",
   "metadata": {},
   "source": "Notice what happened: the first `next(g)` ran the function body up to the first `yield`, paused, and returned `1`. The next `next(g)` resumed from where we paused, ran one more loop iteration, hit `yield` again, and paused once more. When control falls off the end of the function, Python raises `StopIteration` automatically."
  },
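  {
   "cell_type": "markdown",
   "id": "e41a7c02",
   "metadata": {},
   "source": "To make the pausing visible, here is a small sketch that records what the body does in a list. The names `traced` and `events` are made up for this illustration. Calling the function records nothing; each `next` runs the body only as far as the next `yield`."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f0b3d915",
   "metadata": {},
   "outputs": [],
   "source": "events = []    # hypothetical log, used only for this illustration\n\n\ndef traced():\n    events.append('body started')\n    yield 'a'\n    events.append('resumed after first yield')\n    yield 'b'\n    events.append('falling off the end')\n\n\nt = traced()\nprint(events)             # [] because calling the function has not run the body\n\nprint(next(t), events)    # the body ran up to the first yield\nprint(next(t), events)    # resumed, then paused at the second yield\n\ntry:\n    next(t)\nexcept StopIteration:\n    print('done', events)"
  },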
  {
   "cell_type": "markdown",
   "id": "c720ea70",
   "metadata": {},
   "source": "This pausing is the magic. The function's local state \u2014 `i` here \u2014 is preserved between `yield`s. You don't have to manage it in `self` attributes like you would with an iterator class.\n\nCompare the generator version to the equivalent class:\n\n```python\nclass CountUpTo:\n    def __init__(self, n):\n        self.n = n\n\n    def __iter__(self):\n        return _CountUpToIterator(self.n)\n\nclass _CountUpToIterator:\n    def __init__(self, n):\n        self.n = n\n        self.i = 1\n    def __iter__(self):\n        return self\n    def __next__(self):\n        if self.i > self.n:\n            raise StopIteration\n        value = self.i\n        self.i += 1\n        return value\n```\n\nSame behaviour \u2014 but `count_up_to` reads like a normal function with one keyword change."
  },
  {
   "cell_type": "markdown",
   "id": "fceacd4c",
   "metadata": {},
   "source": "## Generators are iterators, so they plug into everything\n\nBecause a generator object is already an iterator, it works directly with `for`, `list`, `sum`, `max`, comprehensions, `itertools`, and so on."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "05352264",
   "metadata": {},
   "outputs": [],
   "source": "def even_numbers(stop):\n    for n in range(stop):\n        if n % 2 == 0:\n            yield n\n\n\nprint(list(even_numbers(10)))\nprint(sum(even_numbers(100)))\nprint(max(even_numbers(20)))"
  },
  {
   "cell_type": "markdown",
   "id": "9d2e305e",
   "metadata": {},
   "source": "## One-shot semantics \u2014 same rule as before\n\nA generator is an iterator, so it has the same consumed-once behaviour. Each *call* to the generator function produces a new, fresh iterator; but an existing generator object, once exhausted, stays exhausted."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "87fcd908",
   "metadata": {},
   "outputs": [],
   "source": "g = count_up_to(3)\nprint(sum(g))    # 6 \u2014 consumes the generator\nprint(sum(g))    # 0 \u2014 already exhausted"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "29d93394",
   "metadata": {},
   "outputs": [],
   "source": "# You want a fresh generator each time? Re-call the function.\nprint(sum(count_up_to(3)))\nprint(sum(count_up_to(3)))"
  },
  {
   "cell_type": "markdown",
   "id": "59babd0b",
   "metadata": {},
   "source": "This is exactly the iterable/iterator distinction: the *function* `count_up_to` behaves like an iterable (call it to get an iterator); the *generator object* `g` is the iterator itself."
  },
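  {
   "cell_type": "markdown",
   "id": "a93c5e71",
   "metadata": {},
   "source": "If you want an object that can be looped over repeatedly, one common approach is to write `__iter__` itself as a generator function, so every loop gets a fresh generator. A minimal sketch follows; the class name `CountUpTo` mirrors the earlier comparison and is illustrative only."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b27d40f8",
   "metadata": {},
   "outputs": [],
   "source": "class CountUpTo:\n    '''Re-iterable: each iter() call starts a new generator.'''\n\n    def __init__(self, n):\n        self.n = n\n\n    def __iter__(self):\n        # A generator function as __iter__: calling it returns a fresh iterator.\n        i = 1\n        while i <= self.n:\n            yield i\n            i += 1\n\n\nnumbers = CountUpTo(3)\nprint(sum(numbers))    # 6\nprint(sum(numbers))    # 6 again, because sum() asked for a new iterator"
  },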
  {
   "cell_type": "markdown",
   "id": "ae20b13b",
   "metadata": {},
   "source": "## Lazy evaluation \u2014 the headline feature\n\nGenerators compute values one at a time, on demand. This means you can work with sequences that are enormous or even infinite, as long as you don't materialise them all at once."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a7a4d27f",
   "metadata": {},
   "outputs": [],
   "source": "def integers_from(start):\n    '''An infinite stream of integers.'''\n    n = start\n    while True:\n        yield n\n        n += 1\n\n\ng = integers_from(1)\nprint(next(g), next(g), next(g))   # 1 2 3\n# We'd never call list(integers_from(1)) \u2014 it would never return."
  },
  {
   "cell_type": "markdown",
   "id": "b254e064",
   "metadata": {},
   "source": "You consume just as many values as you need. `itertools.islice` (next notebook) is the usual way to take a bounded slice from an unbounded generator."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3d1f064c",
   "metadata": {},
   "outputs": [],
   "source": "from itertools import islice\nprint(list(islice(integers_from(10), 5)))   # [10, 11, 12, 13, 14]"
  },
  {
   "cell_type": "markdown",
   "id": "bb47c975",
   "metadata": {},
   "source": "## Memory matters: generators vs lists\n\nA generator holds *one value at a time*. A list holds *all of them*. For large collections the difference is dramatic."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "263b2986",
   "metadata": {},
   "outputs": [],
   "source": "import sys\n\n# A list of a million ints\nbig_list = [x * 2 for x in range(1_000_000)]\n# A generator of a million ints\nbig_gen = (x * 2 for x in range(1_000_000))    # generator expression \u2014 notebook 3\n\nprint(f'list:      {sys.getsizeof(big_list):>12,} bytes')\nprint(f'generator: {sys.getsizeof(big_gen):>12,} bytes')"
  },
  {
   "cell_type": "markdown",
   "id": "e2136475",
   "metadata": {},
   "source": "The generator's size is basically constant \u2014 it's just a small control object that knows where it is in the loop. The list's size scales with the number of elements.\n\nIf you only need to *consume* the values once (`sum`, `max`, filter-and-print, write-to-file), a generator is both faster (no intermediate list allocation) and cheaper on memory."
  },
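  {
   "cell_type": "markdown",
   "id": "c58e1a36",
   "metadata": {},
   "source": "`sys.getsizeof` only measures the container object itself. To compare what each approach actually allocates while it runs, one option is the standard library's `tracemalloc`. The helper `peak_bytes` below is a name invented for this sketch, and the exact numbers will vary by Python version and platform."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d6f92b04",
   "metadata": {},
   "outputs": [],
   "source": "import tracemalloc\n\n\ndef peak_bytes(fn):\n    '''Run fn() and return (result, peak traced allocation in bytes).'''\n    tracemalloc.start()\n    try:\n        result = fn()\n        _, peak = tracemalloc.get_traced_memory()\n    finally:\n        tracemalloc.stop()\n    return result, peak\n\n\nn = 200_000\nlist_total, list_peak = peak_bytes(lambda: sum([x * 2 for x in range(n)]))\ngen_total, gen_peak = peak_bytes(lambda: sum(x * 2 for x in range(n)))\n\nprint(list_total == gen_total)    # both approaches compute the same sum\nprint(f'list peak:      {list_peak:>12,} bytes')\nprint(f'generator peak: {gen_peak:>12,} bytes')"
  },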
  {
   "cell_type": "markdown",
   "id": "2d15c777",
   "metadata": {},
   "source": "## `yield from` \u2014 delegating to another iterable\n\nInside a generator function, `yield from some_iterable` yields every value from that iterable in turn. It's equivalent to `for x in some_iterable: yield x` but more concise, and it also forwards `.send()` and exceptions cleanly when you start using generator-based coroutines."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "37a38a78",
   "metadata": {},
   "outputs": [],
   "source": "def first_half(xs):\n    yield from xs[:len(xs)//2]\n\ndef second_half(xs):\n    yield from xs[len(xs)//2:]\n\ndef halves(xs):\n    yield from first_half(xs)\n    yield from second_half(xs)\n\n\nprint(list(halves([1, 2, 3, 4, 5, 6])))"
  },
  {
   "cell_type": "markdown",
   "id": "efa17db8",
   "metadata": {},
   "source": "`yield from` is especially handy for flattening or composing generators \u2014 you'll see it again when we talk about pipelines in the recipes section."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "63ad6b67",
   "metadata": {},
   "outputs": [],
   "source": "def flatten(nested):\n    for item in nested:\n        if isinstance(item, list):\n            yield from flatten(item)     # recursion works naturally\n        else:\n            yield item\n\nprint(list(flatten([1, [2, [3, 4], 5], 6])))"
  },
  {
   "cell_type": "markdown",
   "id": "9848e299",
   "metadata": {},
   "source": "## What happens when you `return` from a generator\n\nA bare `return` (or reaching the end of the function) stops iteration \u2014 same effect as raising `StopIteration`. A `return value` *attaches* `value` to the `StopIteration` exception, but plain `for` loops discard it. You'll rarely use it outside `yield from` contexts."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7311ddd4",
   "metadata": {},
   "outputs": [],
   "source": "def up_to_zero(xs):\n    for x in xs:\n        if x == 0:\n            return                 # stops the generator\n        yield x\n\n\nprint(list(up_to_zero([1, 2, 3, 0, 4, 5])))"
  },
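  {
   "cell_type": "markdown",
   "id": "1a7e3c59",
   "metadata": {},
   "source": "To see where a returned value goes, the sketch below catches `StopIteration` by hand and then captures the same value with `yield from`. The function names `collect_until_zero` and `report` are hypothetical, chosen for this example."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2b8f4d6a",
   "metadata": {},
   "outputs": [],
   "source": "def collect_until_zero(xs):\n    count = 0\n    for x in xs:\n        if x == 0:\n            break\n        yield x\n        count += 1\n    return count                       # becomes StopIteration.value\n\n\ng = collect_until_zero([1, 2, 0, 4])\nprint(next(g), next(g))                # 1 2\ntry:\n    next(g)\nexcept StopIteration as stop:\n    print('returned:', stop.value)     # returned: 2\n\n\ndef report(xs):\n    # yield from re-yields every item, then evaluates to the return value\n    n = yield from collect_until_zero(xs)\n    yield f'{n} values before the first zero'\n\n\nprint(list(report([1, 2, 0, 4])))      # [1, 2, '2 values before the first zero']"
  },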
  {
   "cell_type": "markdown",
   "id": "642e6abb",
   "metadata": {},
   "source": "## A small pipeline, functional-style\n\nGenerators compose. Each stage reads from the previous one lazily, so the whole pipeline still uses constant memory \u2014 no intermediate lists."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f279da3a",
   "metadata": {},
   "outputs": [],
   "source": "def read_lines(text):\n    '''Stage 1: yield one line at a time.'''\n    for line in text.splitlines():\n        yield line\n\n\ndef only_nonempty(lines):\n    '''Stage 2: drop blank lines.'''\n    for line in lines:\n        if line.strip():\n            yield line\n\n\ndef parse_int(lines):\n    '''Stage 3: convert to int.'''\n    for line in lines:\n        yield int(line)\n\n\nsample = '''\n10\n\n20\n 30\n\n'''\n\npipeline = parse_int(only_nonempty(read_lines(sample)))\nprint(sum(pipeline))   # 60"
  },
  {
   "cell_type": "markdown",
   "id": "db56131b",
   "metadata": {},
   "source": "Nothing has been computed yet when you *build* the pipeline \u2014 each stage is just a paused generator. The `sum(pipeline)` call drives the chain: it pulls one integer, which pulls one non-empty line, which pulls one raw line. One value flows through end-to-end, then the next. This is the pattern we'll return to in the recipes."
  },
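  {
   "cell_type": "markdown",
   "id": "3c90a5e7",
   "metadata": {},
   "source": "One way to convince yourself of the interleaving is to log each stage as it handles an item. In this sketch, `source`, `doubled`, and `log` are invented names rather than part of the pipeline above."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4da1b6f8",
   "metadata": {},
   "outputs": [],
   "source": "log = []    # hypothetical event log, used only for this illustration\n\n\ndef source(items):\n    for item in items:\n        log.append(f'source -> {item}')\n        yield item\n\n\ndef doubled(items):\n    for item in items:\n        log.append(f'double {item}')\n        yield item * 2\n\n\nstage = doubled(source([1, 2]))\nprint(log)            # []: building the pipeline ran nothing\n\nprint(list(stage))    # [2, 4]\nfor entry in log:\n    print(entry)\n# The entries alternate: source -> 1, double 1, source -> 2, double 2"
  },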
  {
   "cell_type": "markdown",
   "id": "d354fde1",
   "metadata": {},
   "source": "## Quick check \u2014 moving average\n\nWrite a generator function `moving_average(iterable, window)` that yields the mean of the most recent `window` values as it streams through the iterable. After fewer than `window` values have been seen it yields the mean of whatever is available.\n\nRequirements:\n\n- Don't materialise the whole input into a list.\n- Use a `collections.deque` with `maxlen=window` to keep a fixed-size window.\n- The last element of the window should always be the most recent input."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "446dcb24",
   "metadata": {},
   "outputs": [],
   "source": "from collections import deque\n\n# Your turn:\n\ndef moving_average(iterable, window):\n    ...\n\n\n# Expected:\n# list(moving_average([1, 2, 3, 4, 5], 3))\n# -> [1.0, 1.5, 2.0, 3.0, 4.0]"
  },
  {
   "cell_type": "markdown",
   "id": "fe3490fc",
   "metadata": {},
   "source": "### Working solution"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f9b121e4",
   "metadata": {},
   "outputs": [],
   "source": "from collections import deque\n\ndef moving_average(iterable, window):\n    buf = deque(maxlen=window)\n    for x in iterable:\n        buf.append(x)\n        yield sum(buf) / len(buf)\n\n\nprint(list(moving_average([1, 2, 3, 4, 5], 3)))\n# With an infinite source:\nfrom itertools import islice\nprint(list(islice(moving_average(integers_from(1), 4), 6)))"
  },
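  {
   "cell_type": "markdown",
   "id": "5eb2c709",
   "metadata": {},
   "source": "The solution above re-sums the window on every step, which costs time proportional to `window`. A possible variant keeps a running total instead. This is a sketch under two assumptions: `window` is at least 1, and the inputs are ints or you can tolerate a little floating-point drift in the running total. The name `moving_average_running` is made up here."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6fc3d81a",
   "metadata": {},
   "outputs": [],
   "source": "from collections import deque\n\n\ndef moving_average_running(iterable, window):\n    # Assumes window >= 1 (an assumption of this sketch, not checked here).\n    buf = deque(maxlen=window)\n    total = 0\n    for x in iterable:\n        if len(buf) == window:\n            total -= buf[0]        # the value append() is about to evict\n        buf.append(x)\n        total += x\n        yield total / len(buf)\n\n\nprint(list(moving_average_running([1, 2, 3, 4, 5], 3)))    # [1.0, 1.5, 2.0, 3.0, 4.0]"
  },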
  {
   "cell_type": "markdown",
   "id": "9623627d",
   "metadata": {},
   "source": "## Summary\n\n- A function with `yield` becomes a generator function. Calling it returns a generator object \u2014 an iterator that's paused between `yield`s.\n- Local variables survive across `yield`s, so you get stateful iteration for free without the iterator class boilerplate.\n- Generators are lazy: they produce one value at a time, enabling work with very large or infinite sequences.\n- `yield from` delegates to another iterable cleanly.\n- Generators compose into pipelines that use constant memory regardless of input size.\n\nNext: **generator expressions** (`(x*2 for x in xs)`) \u2014 the inline version of all this \u2014 and the `itertools` module, which is basically a toolkit of generator combinators you'd otherwise write yourself."
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}