Custom iterators¶
Generator functions cover most of what you need. So when would you write an iterator class by hand?
The honest answer is: rarely. But "rarely" isn't "never". This notebook walks through the cases where a class is the right shape, the mechanics of the protocol, and a few patterns (restartable iteration, attaching state, and integrating with len() or indexing) that don't fit comfortably into a generator function.
The protocol, recap¶
Two methods:
- __iter__(self): return something with a __next__. For a class that is the iterator, return self. For a class that just makes iterators, return a fresh iterator object.
- __next__(self): return the next value, or raise StopIteration when there are no more.
That's it. There is no other contract — no length, no indexing, no rewind. Anything more is something you're choosing to add.
class Counter:
    '''A simple iterator that counts from 1 to stop.'''

    def __init__(self, stop):
        self.stop = stop
        self.current = 0

    def __iter__(self):
        return self  # this object IS its own iterator

    def __next__(self):
        if self.current >= self.stop:
            raise StopIteration
        self.current += 1
        return self.current

for x in Counter(3):
    print(x)  # prints 1, 2, 3 on separate lines
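To make the protocol concrete, here is roughly what a for loop does with those two methods, spelled out by hand. This is a sketch rather than the interpreter's actual implementation; manual_for is a hypothetical helper name, and Counter is repeated only so the block runs on its own.

```python
class Counter:
    """Same Counter as above: counts from 1 to stop."""

    def __init__(self, stop):
        self.stop = stop
        self.current = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.stop:
            raise StopIteration
        self.current += 1
        return self.current


def manual_for(iterable):
    """Collect values the way a for loop would (hypothetical helper)."""
    it = iter(iterable)        # calls iterable.__iter__()
    seen = []
    while True:
        try:
            value = next(it)   # calls it.__next__()
        except StopIteration:
            break              # the loop ends quietly
        seen.append(value)
    return seen


counter = Counter(3)
first_pass = manual_for(counter)
second_pass = manual_for(counter)  # same object, already exhausted
print(first_pass, second_pass)     # [1, 2, 3] []
```

The second pass is empty because Counter returns itself from __iter__, so there is only ever one cursor. That is the one-shot behaviour the re-iterable patterns later in this notebook are designed to avoid.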
When to reach for a class instead of a generator¶
Three situations make a class the better choice.
1. You want indexing or len() alongside iteration¶
A generator function is purely sequential. If callers also want to ask "how many items?" or "give me the third one", you need a class that implements __len__ and __getitem__ as well as iteration.
class Range3D:
    '''A 3D grid of (x, y, z) coordinates: iterable, indexable, sized.'''

    def __init__(self, nx, ny, nz):
        self.nx, self.ny, self.nz = nx, ny, nz

    def __len__(self):
        return self.nx * self.ny * self.nz

    def __getitem__(self, i):
        # reject indices outside the grid instead of decoding garbage
        if not 0 <= i < len(self):
            raise IndexError(i)
        # decode flat index into (x, y, z)
        z, rem = divmod(i, self.nx * self.ny)
        y, x = divmod(rem, self.nx)
        return (x, y, z)

    def __iter__(self):
        for i in range(len(self)):
            yield self[i]

grid = Range3D(2, 2, 2)
print(len(grid))   # 8
print(grid[3])     # (1, 1, 0)
print(list(grid))
Notice the trick in __iter__: even when you have __getitem__, you can still write __iter__ as a generator function inside the class. The two approaches aren't mutually exclusive.
(In fact, if you don't define __iter__, Python will fall back to calling __getitem__ with integer keys starting at 0 and stopping at the first IndexError. That fallback is brittle and best avoided. Define __iter__ explicitly.)
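A small sketch of that fallback, and of one reason it counts as brittle. Letters is a hypothetical class made up for this illustration; it defines __getitem__ only.

```python
from collections.abc import Iterable


class Letters:
    """Hypothetical example: __getitem__ only, no __iter__."""

    def __init__(self, text):
        self._text = text

    def __getitem__(self, i):
        return self._text[i]  # str raises IndexError past the end


letters = Letters("abc")

# Iteration works: Python calls letters[0], letters[1], ... until IndexError.
fallback_result = list(letters)

# But the object does not advertise itself as iterable, because the
# Iterable check only looks for an __iter__ method.
looks_iterable = isinstance(letters, Iterable)

print(fallback_result, looks_iterable)  # ['a', 'b', 'c'] False
```

Code that checks isinstance(obj, Iterable) before looping would reject this object even though a for loop over it works, which is a good argument for writing __iter__ yourself.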
2. You want naturally re-iterable behaviour¶
A generator function returns a one-shot iterator. If you want a for loop over the same object to work multiple times, you need something to hold the configuration and produce a fresh iterator each time. The cleanest version is two classes: an outer "iterable" and an inner "iterator". The outer's __iter__ returns a new instance of the inner.
class Chunks:
    '''Iterable: split an iterable into chunks of size n. Re-iterable.'''

    def __init__(self, source, size):
        self.source = source
        self.size = size

    def __iter__(self):
        return _ChunksIterator(self.source, self.size)

class _ChunksIterator:
    def __init__(self, source, size):
        self.source = iter(source)  # store the underlying iterator
        self.size = size

    def __iter__(self):
        return self

    def __next__(self):
        chunk = []
        for _ in range(self.size):
            try:
                chunk.append(next(self.source))
            except StopIteration:
                if chunk:
                    return chunk  # final, shorter chunk
                raise
        return chunk

c = Chunks([1, 2, 3, 4, 5, 6, 7], 3)
print(list(c))  # [[1, 2, 3], [4, 5, 6], [7]]
print(list(c))  # [[1, 2, 3], [4, 5, 6], [7]] (works again)
You could almost do this with a generator function:
def chunks(source, size):
    chunk = []
    for x in source:
        chunk.append(x)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk
…but chunks(my_list, 3) returns a one-shot generator. The first list(...) call on it consumes everything, so a second call returns an empty list. The class form is naturally re-iterable because each for loop calls __iter__ and gets a fresh _ChunksIterator.
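The one-shot behaviour is easy to see by running the generator version twice. This sketch repeats the chunks function so it runs on its own; the variable names are only for illustration.

```python
def chunks(source, size):
    """Generator version: yields lists of up to `size` items."""
    chunk = []
    for x in source:
        chunk.append(x)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


gen = chunks([1, 2, 3, 4, 5, 6, 7], 3)
first = list(gen)    # consumes the generator
second = list(gen)   # nothing left to yield

print(first)   # [[1, 2, 3], [4, 5, 6], [7]]
print(second)  # []

# Calling the function again is the only way to start over.
again = list(chunks([1, 2, 3, 4, 5, 6, 7], 3))
```

A generator is its own iterator (iter(gen) is gen), so there is no fresh cursor to hand out. Calling the function again works, but the caller then has to carry the arguments around, and that is exactly the job the Chunks class takes over.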
3. The iterator owns external state — files, sockets, database cursors¶
If your iterator wraps a resource that needs explicit setup or teardown (open a file, dial a connection), the class form lets you add __enter__ / __exit__ (or, as a last resort, __del__) to manage that resource. Generators can do this with try/finally, but a class makes the lifecycle visible. The example below takes the simplest route and closes the file once the lines run out.
class LineReader:
    '''Iterate over the lines of a file. Closes the file on exhaustion
    (not if the loop is abandoned early).'''

    def __init__(self, path):
        self.path = path
        self._file = None

    def __iter__(self):
        # open lazily so that constructing a LineReader doesn't open the file;
        # skip the open if a file is already in use, so a repeated iter()
        # call doesn't leak the first handle
        if self._file is None:
            self._file = open(self.path)
        return self

    def __next__(self):
        if self._file is None:
            raise StopIteration
        line = self._file.readline()
        if not line:
            self._file.close()
            self._file = None
            raise StopIteration
        return line.rstrip('\n')

# (Skipping the live demo: it would need a real file. The pattern is what matters.)
print('LineReader defined')
For most file work in Python you'd just write with open(path) as f: for line in f: ... — files are already iterable. The pattern above is what you'd reach for when you're wrapping something that isn't already a file but feels like one (an HTTP stream, a custom protocol parser).
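For a resource that must be released even when the loop stops early, one option is to give the iterator __enter__ and __exit__. The sketch below is a minimal illustration under stated assumptions: ManagedLines and its opener argument are hypothetical names, and io.StringIO stands in for the HTTP stream or parser you would really be wrapping.

```python
import io


class ManagedLines:
    """Hypothetical wrapper: yields lines from any stream that has
    readline() and close(), and closes it when the with block ends."""

    def __init__(self, opener):
        self._opener = opener   # zero-argument callable returning a stream
        self._stream = None

    def __enter__(self):
        self._stream = self._opener()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False            # do not swallow exceptions

    def close(self):
        if self._stream is not None:
            self._stream.close()
            self._stream = None

    def __iter__(self):
        return self

    def __next__(self):
        if self._stream is None:
            raise StopIteration
        line = self._stream.readline()
        if not line:
            self.close()
            raise StopIteration
        return line.rstrip("\n")


stream = io.StringIO("alpha\nbeta\ngamma\n")  # stand-in for a real resource
with ManagedLines(lambda: stream) as lines:
    first_line = next(lines)    # stop early on purpose

closed_after_with = stream.closed
print(first_line, closed_after_with)  # alpha True
```

The with block releases the stream even though only one line was read, which the exhaustion-only approach cannot promise.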
Patterns that come up often¶
A peekable iterator¶
Sometimes you want to look at the next value without consuming it — for parsing, for instance. A class lets you cache the look-ahead in an attribute.
class Peekable:
    '''Wraps any iterable; adds a peek() that returns the next value
    without advancing.'''

    _SENTINEL = object()

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._cache = self._SENTINEL

    def __iter__(self):
        return self

    def __next__(self):
        if self._cache is not self._SENTINEL:
            v, self._cache = self._cache, self._SENTINEL
            return v
        return next(self._it)

    def peek(self, default=_SENTINEL):
        if self._cache is self._SENTINEL:
            try:
                self._cache = next(self._it)
            except StopIteration:
                if default is self._SENTINEL:
                    raise
                return default
        return self._cache

p = Peekable([10, 20, 30])
print(p.peek())  # 10, non-destructive
print(p.peek())  # 10, still
print(next(p))   # 10
print(next(p))   # 20
A counting iterator¶
When you want to know "how many of those did I just process?" without doing a second pass, wrap the iterator in something that tracks it.
class Counted:
    '''Wraps an iterable and exposes how many items have been yielded.'''

    def __init__(self, iterable):
        self._it = iter(iterable)
        self.count = 0

    def __iter__(self):
        return self

    def __next__(self):
        v = next(self._it)  # propagates StopIteration
        self.count += 1
        return v

nums = Counted(range(100))
total = sum(x for x in nums if x % 7 == 0)
print(f'sum={total}, processed={nums.count}')  # sum=735, processed=100
Restartable iterator over a callable source¶
If the data isn't already a sequence — say it comes from calling a function each time — you can build a re-iterable around the function. Each __iter__ call creates a fresh iterator that calls the function again.
import random

class Sampled:
    '''Re-iterable: each iteration draws a fresh sample of the same shape.'''

    def __init__(self, sample_fn, n):
        self.sample_fn = sample_fn
        self.n = n

    def __iter__(self):
        for _ in range(self.n):
            yield self.sample_fn()

rnd = random.Random(0)
s = Sampled(lambda: rnd.randint(1, 10), 5)
print(list(s))
print(list(s))  # different sample, but same shape and source
Generator-as-method — the hybrid¶
Often the cleanest approach is a class that holds the configuration and a generator method. You get:
- A re-iterable object (because __iter__ is a generator function, so calling it returns a fresh generator each time).
- Tidy initialisation in __init__.
- Other methods on the same object for related behaviour.
This is by far the most common shape in real code.
class FibUpTo:
    '''Iterable: Fibonacci numbers up to a cap. Re-iterable.
    Not sized, since nothing is precomputed.'''

    def __init__(self, cap):
        self.cap = cap

    def __iter__(self):
        a, b = 0, 1
        while a <= self.cap:
            yield a
            a, b = b, a + b

    def first(self):
        '''Convenience: return just the first value.'''
        return next(iter(self))

f = FibUpTo(50)
print(list(f))    # works
print(list(f))    # still works
print(f.first())  # 0
This pattern is the "right" answer most of the time you find yourself wanting a custom iterator. Skip the boilerplate __next__ unless you need it.
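The distinction this pattern relies on, an iterable that is not itself an iterator, can be checked with the abstract base classes in collections.abc. Evens is a hypothetical class invented for this sketch; any class with a generator __iter__ would behave the same way.

```python
from collections.abc import Iterable, Iterator


class Evens:
    """Hypothetical iterable using the generator-method pattern."""

    def __init__(self, limit):
        self.limit = limit

    def __iter__(self):
        for n in range(0, self.limit, 2):
            yield n


evens = Evens(7)
it = iter(evens)

is_iterable = isinstance(evens, Iterable)  # has __iter__
is_iterator = isinstance(evens, Iterator)  # has no __next__
fresh_each_time = iter(evens) is not iter(evens)

print(is_iterable, is_iterator, fresh_each_time)  # True False True
print(list(evens), list(evens))  # [0, 2, 4, 6] [0, 2, 4, 6]
```

The object holds only configuration, and every iter() call builds a new generator with its own position. That is why re-iteration works without any bookkeeping.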
Quick check — sliding-window iterator¶
Implement a class Window(iterable, size) that, when iterated, yields tuples representing a sliding window of size elements over the source. So Window([1,2,3,4,5], 3) yields (1,2,3), (2,3,4), (3,4,5).
Requirements:
- The class is re-iterable: list(w) should work twice. (Accept any iterable; re-iteration can only work when the source itself can be iterated more than once.)
- Use the generator-method pattern.
- Use a collections.deque(maxlen=size) to maintain the window.
from collections import deque

class Window:
    def __init__(self, iterable, size):
        ...

    def __iter__(self):
        ...

# Expected:
# w = Window([1, 2, 3, 4, 5], 3)
# print(list(w))  # [(1, 2, 3), (2, 3, 4), (3, 4, 5)]
# print(list(w))  # same (re-iterable)
Working solution¶
from collections import deque

class Window:
    def __init__(self, iterable, size):
        self.iterable = iterable
        self.size = size

    def __iter__(self):
        buf = deque(maxlen=self.size)
        for x in self.iterable:
            buf.append(x)
            if len(buf) == self.size:
                yield tuple(buf)

w = Window([1, 2, 3, 4, 5], 3)
print(list(w))
print(list(w))
print(list(Window(range(6), 4)))
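One caveat from the requirements is worth seeing in action: the class can only be as re-iterable as its source. The sketch below repeats the solution so it runs on its own, then feeds it a list and a one-shot iterator; the variable names are only for illustration.

```python
from collections import deque


class Window:
    """Same shape as the solution above, repeated so this runs alone."""

    def __init__(self, iterable, size):
        self.iterable = iterable
        self.size = size

    def __iter__(self):
        buf = deque(maxlen=self.size)
        for x in self.iterable:
            buf.append(x)
            if len(buf) == self.size:
                yield tuple(buf)


over_list = Window([1, 2, 3, 4], 2)
list_passes = (list(over_list), list(over_list))

over_iterator = Window(iter([1, 2, 3, 4]), 2)  # one-shot source
iterator_passes = (list(over_iterator), list(over_iterator))

print(list_passes[1])      # [(1, 2), (2, 3), (3, 4)]
print(iterator_passes[1])  # []
```

If the source might be a one-shot iterator and re-iteration matters, one option is to copy it into a list or tuple in __init__, at the cost of holding everything in memory.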
Summary¶
- The iterator protocol needs only __iter__ and __next__. Anything else (__len__, __getitem__, peeking) is something you choose to add.
- For most cases a generator function is shorter and clearer than an iterator class.
- Reach for a class when you need indexing/sizing alongside iteration, when you want re-iterable behaviour without an outer wrapper, or when iteration owns external state (files, connections).
- The most common practical pattern is a class with a generator method: config in __init__, behaviour in __iter__ defined with yield.
That closes the Learn track. The Recipes section has worked examples — streaming a large file, building pipelines, common iterator mistakes — and the Reference is where to look for the protocol, generator syntax, and the full itertools table.