Write recursive functions¶
The question. You've got a problem with a naturally self-similar shape — walking a tree of files, flattening nested lists, turning a mathematical recurrence into code — and you want a recursive function that terminates, stays under the recursion limit, and reads at least as clearly as the iterative version would.
Two parts. A base case that returns without recursing. A recursive case that calls the same function with a strictly smaller or simpler input. Get those right and the rest falls out.
def flatten(data):
"""Flatten an arbitrarily nested list."""
result = []
for item in data:
if isinstance(item, list):
result.extend(flatten(item)) # recursive case
else:
result.append(item) # base case (per item)
return result
def total_size(tree: dict) -> int:
"""Sum the file sizes in a nested directory-like dict."""
total = 0
for value in tree.values():
if isinstance(value, dict):
total += total_size(value) # recurse into subfolder
else:
total += value # leaf: a file size
return total
nested = [1, [2, 3], [4, [5, 6, [7]]], 8]
print('flatten:', flatten(nested))
file_tree = {
'src': {'main.py': 120, 'utils': {'helpers.py': 45, 'validators.py': 80}},
'tests': {'test_main.py': 95},
'README.md': 30,
}
print('total size:', total_size(file_tree), 'lines')
Why it works¶
flatten shrinks its input on every recursive call. When it meets a non-list item, it takes the base path and appends. When it meets a list, it calls itself on the inner list — which is structurally smaller than the outer one. Every call is guaranteed to make progress, so the call stack is guaranteed to unwind.
total_size is the same shape applied to a dict tree. The base case is hiding in plain sight: when value isn't a dict, we don't recurse. Only the inner dicts trigger another total_size call, and each of those receives a strictly smaller sub-tree.
Recursion fits naturally where the data is recursive. A file system is a tree of folders of folders; JSON is objects inside objects; a mathematical recurrence like f(n) = f(n-1) + f(n-2) is a recursion in your code because it was a recursion on paper first. Writing the iterative version of these problems means re-inventing the call stack as an explicit list — often more code, usually worse.
Trade-offs and the traps¶
- Python's default recursion limit is 1000. Anything deeper raises
RecursionError. For bounded depths (trees of filesystems, JSON documents) that's fine. For unbounded or very-deep recursions — walking a million-element linked list, say — rewrite as a loop. - Python has no tail-call optimisation. A "tail-recursive" function that looks like it should reuse one stack frame still allocates N of them. Don't rewrite a working loop as a tail-recursive function expecting a performance win — there isn't one.
- Missing base case is the number one bug. Your recursive function loops until Python kills it. Second most common: the input doesn't actually get smaller. Both show up as
RecursionErrorwith a 1000-deep traceback — stare at the recursive case and ask, "is the argument I'm passing in strictly smaller than what I was called with?" - Memoise or rewrite if the same sub-problem appears many times. Naive
fibonaccirecomputes the same values exponentially many times. Decorate with@functools.lru_cache(maxsize=None)and the running time collapses from exponential to linear; or rewrite bottom-up as a loop and avoid the stack entirely.
Related¶
- Learn — Defining functions for the call-stack mental model recursion relies on.
- Recipe — Create decorators for
@functools.lru_cacheused to tame recursive recomputation. - Guide — Iterators and generators for the iterative equivalents when depth becomes a problem.
- Reference — Built-in functions for
sys.setrecursionlimit,sys.getrecursionlimit, and the rest of thesystoolkit.