Generator functions
In the previous notebook we wrote an iterator class by hand — two classes, six methods, thirty-ish lines of boilerplate for what's really "produce these values one at a time". Generator functions replace all of that. They use one new keyword, yield, and a couple of rules about how functions that use it behave.
By the end of this notebook you'll be able to replace those two classes with a five-line function.
yield turns a function into a generator
Any function with a yield anywhere in its body is a generator function. Calling it doesn't run the body — it returns a generator object, which is an iterator.
def count_up_to(n):
    i = 1
    while i <= n:
        yield i
        i += 1
g = count_up_to(3)
print(type(g))  # <class 'generator'>
print(next(g))  # 1
print(next(g))  # 2
print(next(g))  # 3
try:
    next(g)
except StopIteration:
    print('done')
Notice what happened: the first next(g) ran the function body up to the first yield, paused, and returned 1. The second next(g) resumed right after that yield, ran i += 1, went round the loop once more, hit yield again, and paused. When control falls off the end of the function (here, once i exceeds n), Python raises StopIteration automatically.
This pausing is the magic. The function's local state — i here — is preserved between yields. You don't have to manage it in self attributes like you would with an iterator class.
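To see exactly when the body runs, you can instrument it. This sketch uses a hypothetical traced_count_up_to, the same logic as count_up_to plus an events list that records each step. It shows that calling the function executes nothing, and each next() runs only as far as the following yield.

```python
events = []  # every step the generator body executes, in order

def traced_count_up_to(n):
    # Same logic as count_up_to, but it logs what it is doing.
    events.append('body started')
    i = 1
    while i <= n:
        events.append(f'about to yield {i}')
        yield i
        events.append(f'resumed after {i}')
        i += 1
    events.append('falling off the end')

g = traced_count_up_to(2)
print(events)           # [] : calling the function ran none of the body
print(next(g), events)  # runs up to the first yield
print(next(g), events)  # resumes, loops once, pauses at the next yield
try:
    next(g)             # resumes, leaves the loop, falls off the end
except StopIteration:
    print('done', events)
```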
Compare the generator version to the equivalent pair of classes:
class CountUpTo:
    def __init__(self, n):
        self.n = n
    def __iter__(self):
        return _CountUpToIterator(self.n)
class _CountUpToIterator:
    def __init__(self, n):
        self.n = n
        self.i = 1
    def __iter__(self):
        return self
    def __next__(self):
        if self.i > self.n:
            raise StopIteration
        value = self.i
        self.i += 1
        return value
Same behaviour — but count_up_to reads like a normal function with one keyword change.
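One way to see why the short version behaves the same: a generator object already provides the two protocol methods that _CountUpToIterator spells out by hand. This sketch repeats count_up_to so it runs on its own.

```python
def count_up_to(n):
    i = 1
    while i <= n:
        yield i
        i += 1

g = count_up_to(3)
# The generator object supplies __iter__ and __next__ for you.
print(hasattr(g, '__iter__'), hasattr(g, '__next__'))  # True True
# __iter__ returns the generator itself, just like _CountUpToIterator.__iter__.
print(iter(g) is g)  # True
# next(g) is shorthand for calling g.__next__().
print(g.__next__(), next(g), next(g))  # 1 2 3
```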
Generators are iterators, so they plug into everything
Because a generator object is already an iterator, it works directly with for, list, sum, max, comprehensions, itertools, and so on.
def even_numbers(stop):
    for n in range(stop):
        if n % 2 == 0:
            yield n
print(list(even_numbers(10)))  # [0, 2, 4, 6, 8]
print(sum(even_numbers(100)))  # 2450
print(max(even_numbers(20)))   # 18
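The same generator works with the other consumers mentioned above. This sketch repeats even_numbers and feeds fresh generator objects to a for loop, a list comprehension, enumerate, and itertools.takewhile; each consumer just keeps calling next() until it has what it needs.

```python
from itertools import takewhile

def even_numbers(stop):
    # Same generator as above: yields 0, 2, 4, ... below stop.
    for n in range(stop):
        if n % 2 == 0:
            yield n

# A for loop calls next() for us and stops cleanly on StopIteration.
for n in even_numbers(7):
    print(n, end=' ')
print()

# Comprehensions accept any iterable, including a generator object.
squares = [n * n for n in even_numbers(10)]
print(squares)  # [0, 4, 16, 36, 64]

# enumerate pulls from the generator one item at a time.
pairs = list(enumerate(even_numbers(6)))
print(pairs)  # [(0, 0), (1, 2), (2, 4)]

# takewhile stops asking for values at the first one that fails the test.
small = list(takewhile(lambda n: n < 5, even_numbers(100)))
print(small)  # [0, 2, 4]
```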
One-shot semantics — same rule as before
A generator is an iterator, so it has the same consumed-once behaviour. Each call to the generator function produces a new, fresh iterator; but an existing generator object, once exhausted, stays exhausted.
g = count_up_to(3)
print(sum(g)) # 6 — consumes the generator
print(sum(g)) # 0 — already exhausted
# You want a fresh generator each time? Re-call the function.
print(sum(count_up_to(3)))
print(sum(count_up_to(3)))
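This is exactly the iterable/iterator distinction: the function count_up_to plays the role of the iterable (call it to get a fresh iterator), and the generator object g is the iterator itself. The analogy is loose in one respect: the function is not itself iterable, so for x in count_up_to: raises a TypeError. You always have to call it first.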
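If you want an object you can loop over more than once, you can combine the two approaches: write a class whose __iter__ is itself a generator function. Every for loop (or sum, list, and so on) calls __iter__ again and gets a fresh generator, and the separate iterator class disappears. ReusableCountUpTo is a hypothetical name for this sketch.

```python
class ReusableCountUpTo:
    '''An iterable: each iter() call returns a brand-new generator.'''

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # This method contains yield, so calling it (which iter() does)
        # returns a new generator object instead of running the loop.
        i = 1
        while i <= self.n:
            yield i
            i += 1

c = ReusableCountUpTo(3)
print(sum(c))  # 6
print(sum(c))  # 6 again: the second sum got a fresh generator
print(iter(c) is iter(c))  # False: two separate generator objects
```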
Lazy evaluation — the headline feature
Generators compute values one at a time, on demand. This means you can work with sequences that are enormous or even infinite, as long as you don't materialise them all at once.
def integers_from(start):
    '''An infinite stream of integers.'''
    n = start
    while True:
        yield n
        n += 1
g = integers_from(1)
print(next(g), next(g), next(g)) # 1 2 3
# We'd never call list(integers_from(1)) — it would never return.
You consume just as many values as you need. itertools.islice (next notebook) is the usual way to take a bounded slice from an unbounded generator.
from itertools import islice
print(list(islice(integers_from(10), 5))) # [10, 11, 12, 13, 14]
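A plain for loop with break is the other common way to stop early: the generator produces values only until the loop stops asking. The target below (the first perfect square above 1000) is just an illustration; integers_from is repeated so the block runs on its own.

```python
def integers_from(start):
    '''An infinite stream of integers (same as above).'''
    n = start
    while True:
        yield n
        n += 1

# Pull values only until the condition is met; break ends consumption,
# so the loop terminates even though the generator never would.
for n in integers_from(1):
    if n * n > 1000:
        first = n * n
        break
print(first)  # 1024
```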
Memory matters: generators vs lists
A generator holds one value at a time. A list holds all of them. For large collections the difference is dramatic.
import sys
# A list of a million ints
big_list = [x * 2 for x in range(1_000_000)]
# A generator of a million ints
big_gen = (x * 2 for x in range(1_000_000)) # generator expression — notebook 3
print(f'list: {sys.getsizeof(big_list):>12,} bytes')
print(f'generator: {sys.getsizeof(big_gen):>12,} bytes')
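The generator's size stays small and fixed: it's just a control object that remembers where it is in the loop. The list's size grows with the number of elements, and sys.getsizeof only counts the list's own array of references, not the int objects they point to, so the real gap is larger than the printed numbers suggest.
If you only need to consume the values once (sum, max, filter-and-print, write-to-file), a generator avoids allocating the intermediate list, so memory use stays flat. Don't assume it's faster per item, though: in CPython a list comprehension is often slightly quicker to iterate, so reach for a generator for memory and laziness rather than raw speed.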
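Because sys.getsizeof only sizes the container, a more end-to-end check is to measure peak allocation while the values are actually consumed. This sketch uses the standard-library tracemalloc module; the helpers peak_bytes and doubled are hypothetical names. Exact byte counts vary by Python version and platform, so only the relative sizes are meaningful.

```python
import tracemalloc

def doubled(n):
    # Generator: produces x * 2 for x in range(n), one value at a time.
    for x in range(n):
        yield x * 2

def peak_bytes(fn):
    '''Call fn() while tracemalloc is tracing; return (result, peak bytes traced).'''
    tracemalloc.start()
    try:
        result = fn()
        _, peak = tracemalloc.get_traced_memory()  # (current, peak)
    finally:
        tracemalloc.stop()
    return result, peak

N = 1_000_000
# The list version materialises every element before sum() sees the first one.
list_total, list_peak = peak_bytes(lambda: sum([x * 2 for x in range(N)]))
# The generator version hands sum() one value at a time.
gen_total, gen_peak = peak_bytes(lambda: sum(doubled(N)))
print(list_total == gen_total)  # True
print(f'list peak:      {list_peak:>12,} bytes')
print(f'generator peak: {gen_peak:>12,} bytes')
```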
yield from — delegating to another iterable
Inside a generator function, yield from some_iterable yields every value from that iterable in turn. It's equivalent to for x in some_iterable: yield x but more concise, and it also forwards .send() and exceptions cleanly when you start using generator-based coroutines.
def first_half(xs):
    yield from xs[:len(xs)//2]
def second_half(xs):
    yield from xs[len(xs)//2:]
def halves(xs):
    yield from first_half(xs)
    yield from second_half(xs)
print(list(halves([1, 2, 3, 4, 5, 6])))
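A common use of yield from is writing your own chain: one generator that walks several iterables back to back. In this sketch, chain_all and chain_all_loop are hypothetical names; the check confirms that the yield from version, the explicit-loop version, and the standard itertools.chain produce the same values for plain iteration. The two hand-written versions only differ once you use .send() or .throw(), which the loop version does not forward.

```python
from itertools import chain

def chain_all(*iterables):
    # yield from passes every value of each iterable straight through.
    for it in iterables:
        yield from it

def chain_all_loop(*iterables):
    # Explicit-loop equivalent for ordinary iteration.
    for it in iterables:
        for x in it:
            yield x

print(list(chain_all([1, 2], 'ab', range(3))))  # [1, 2, 'a', 'b', 0, 1, 2]
```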
yield from is especially handy for flattening or composing generators — you'll see it again when we talk about pipelines in the recipes section.
def flatten(nested):
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)  # recursion works naturally
        else:
            yield item
print(list(flatten([1, [2, [3, 4], 5], 6])))
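flatten only descends into lists. A possible variation, sketched here under the assumption that you want to descend into any iterable (tuples, sets, ranges, other generators), checks against collections.abc.Iterable instead. flatten_any is a hypothetical name. Strings need special handling: iterating a string yields one-character strings, which are iterable again, so recursing into them would never bottom out.

```python
from collections.abc import Iterable

def flatten_any(nested):
    for item in nested:
        # Treat str and bytes as atoms; everything else iterable gets flattened.
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
            yield from flatten_any(item)
        else:
            yield item

print(list(flatten_any([1, (2, [3, 'ab']), {4}, range(5, 7)])))  # [1, 2, 3, 'ab', 4, 5, 6]
```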
What happens when you return from a generator
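A bare return (or reaching the end of the function) ends the generator: the next call to next() raises StopIteration. Don't raise StopIteration yourself inside a generator body, though; since Python 3.7 (PEP 479) that is converted into a RuntimeError. Writing return some_value stores some_value on the resulting StopIteration exception as its .value attribute, but plain for loops discard it. You'll rarely use it outside yield from contexts, where the yield from expression evaluates to that value.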
def up_to_zero(xs):
    for x in xs:
        if x == 0:
            return  # stops the generator
        yield x
print(list(up_to_zero([1, 2, 3, 0, 4, 5])))
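Here is where a return value does show up. This sketch uses two hypothetical functions: up_to_zero_counted, a variant of up_to_zero that also returns how many values it yielded, and collect, which receives that count through yield from. The last part drives the generator by hand to expose the value on the StopIteration exception.

```python
def up_to_zero_counted(xs):
    '''Like up_to_zero, but also returns how many values it yielded.'''
    count = 0
    for x in xs:
        if x == 0:
            return count  # becomes StopIteration.value
        yield x
        count += 1
    return count

def collect(xs):
    # yield from re-yields every value, then evaluates to the return value.
    n = yield from up_to_zero_counted(xs)
    print('inner generator yielded', n, 'values')

print(list(collect([1, 2, 3, 0, 4, 5])))  # [1, 2, 3]

# Driving the generator by hand exposes the value on the exception.
g = up_to_zero_counted([7, 0])
print(next(g))  # 7
try:
    next(g)
except StopIteration as exc:
    print('return value:', exc.value)  # 1
```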
A small pipeline, functional-style
Generators compose. Each stage reads from the previous one lazily, so the whole pipeline still uses constant memory — no intermediate lists.
def read_lines(text):
    '''Stage 1: yield one line at a time.'''
    for line in text.splitlines():
        yield line
def only_nonempty(lines):
    '''Stage 2: drop blank lines.'''
    for line in lines:
        if line.strip():
            yield line
def parse_int(lines):
    '''Stage 3: convert to int.'''
    for line in lines:
        yield int(line)
sample = '''
10
20
30
'''
pipeline = parse_int(only_nonempty(read_lines(sample)))
print(sum(pipeline)) # 60
Nothing has been computed yet when you build the pipeline — each stage is just a paused generator. The sum(pipeline) call drives the chain: it pulls one integer, which pulls one non-empty line, which pulls one raw line. One value flows through end-to-end, then the next. This is the pattern we'll return to in the recipes.
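To watch values flow through one at a time, you can instrument each stage. This sketch redefines the three stages with logging added (the log list and the short labels 'read', 'keep', 'parse', 'sum' are illustrative additions, not part of the pipeline above) and feeds them a two-number input. The log shows each value travelling the whole chain before the next raw line is read.

```python
log = []  # (stage, value) pairs in the order they happen

def read_lines(text):
    for line in text.splitlines():
        log.append(('read', line))
        yield line

def only_nonempty(lines):
    for line in lines:
        if line.strip():
            log.append(('keep', line))
            yield line

def parse_int(lines):
    for line in lines:
        value = int(line)
        log.append(('parse', value))
        yield value

pipeline = parse_int(only_nonempty(read_lines('\n10\n20\n')))
print(log)  # [] : building the chain ran no stage bodies

total = 0
for value in pipeline:
    log.append(('sum', value))
    total += value

for event in log:
    print(event)
print(total)  # 30
```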
Quick check — moving average
Write a generator function moving_average(iterable, window) that yields the mean of the most recent window values as it streams through the iterable. After fewer than window values have been seen it yields the mean of whatever is available.
Requirements:
- Don't materialise the whole input into a list.
- Use a collections.deque with maxlen=window to keep a fixed-size window.
- The last element of the window should always be the most recent input.
from collections import deque
# Your turn:
def moving_average(iterable, window):
    ...
# Expected:
# list(moving_average([1, 2, 3, 4, 5], 3))
# -> [1.0, 1.5, 2.0, 3.0, 4.0]
Working solution
from collections import deque
def moving_average(iterable, window):
    buf = deque(maxlen=window)  # once full, each append drops the oldest value
    for x in iterable:
        buf.append(x)
        yield sum(buf) / len(buf)
print(list(moving_average([1, 2, 3, 4, 5], 3)))  # [1.0, 1.5, 2.0, 3.0, 4.0]
# With an infinite source:
from itertools import islice
print(list(islice(moving_average(integers_from(1), 4), 6)))  # [1.0, 1.5, 2.0, 2.5, 3.5, 4.5]
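The solution re-sums the whole window on every step, which costs O(window) work per value. One possible refinement, sketched here as a hypothetical moving_average_running, keeps a running total instead: subtract the value about to be evicted, add the new one.

```python
from collections import deque

def moving_average_running(iterable, window):
    '''Variant of moving_average that updates a running total
    instead of calling sum(buf) each time.'''
    buf = deque(maxlen=window)
    total = 0
    for x in iterable:
        if len(buf) == window:
            # The append below will evict buf[0]; remove it from the total first.
            total -= buf[0]
        buf.append(x)
        total += x
        yield total / len(buf)

print(list(moving_average_running([1, 2, 3, 4, 5], 3)))  # [1.0, 1.5, 2.0, 3.0, 4.0]
```

With float inputs, repeated adding and subtracting over a long stream can accumulate rounding error, which is one reason the sum(buf) version is the safer default; the running total mainly pays off when window is large.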
Summary
- A function with yield becomes a generator function. Calling it returns a generator object: an iterator that's paused between yields.
- Local variables survive across yields, so you get stateful iteration for free without the iterator class boilerplate.
- Generators are lazy: they produce one value at a time, enabling work with very large or infinite sequences.
- yield from delegates to another iterable cleanly.
- Generators compose into pipelines that use constant memory regardless of input size.
Next: generator expressions ((x*2 for x in xs)) — the inline version of all this — and the itertools module, which is basically a toolkit of generator combinators you'd otherwise write yourself.