defaultdict
A recurring annoyance with plain dicts: before you can append to d[key] or increment it, the key has to exist, so your code fills up with if key not in d guards. collections.defaultdict removes them. You give it a factory — a function that produces the default value — and any missing key is created on first access by calling that factory.
It's a dict subclass, so it behaves like a dict everywhere else. The only addition is what happens when a key is missing.
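A minimal sketch of the point above: a defaultdict passes isinstance checks against dict, supports the usual dict methods, and keeps its factory on the default_factory attribute. The variable names here are illustrative only.

```python
from collections import defaultdict

dd = defaultdict(list)

# It is a real dict subclass, so dict APIs and isinstance checks work.
is_dict = isinstance(dd, dict)      # True
factory = dd.default_factory        # the callable that was passed in: list

dd['a'].append(1)                   # missing key, so list() is called
dd.update({'b': [2, 3]})            # ordinary dict method
keys = sorted(dd.keys())

print(is_dict, factory is list, keys)  # True True ['a', 'b']
```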
The grouping pattern (factory: list)
This is the use case that justifies the type on its own: sorting items into buckets. With defaultdict(list), the first time you touch a key its value is a fresh empty list, ready to append to.
from collections import defaultdict
words = ['apple', 'avocado', 'banana', 'cherry', 'blueberry', 'apricot']
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)  # no 'if key in dict' needed
print(dict(by_letter))
# {'a': ['apple', 'avocado', 'apricot'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
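For comparison, this is roughly what the same loop looks like with a plain dict and an explicit guard. It is a sketch of the pattern that defaultdict replaces, using the same words list; by_letter_plain is a name introduced here for illustration.

```python
words = ['apple', 'avocado', 'banana', 'cherry', 'blueberry', 'apricot']

by_letter_plain = {}
for word in words:
    key = word[0]
    if key not in by_letter_plain:   # the guard that defaultdict makes unnecessary
        by_letter_plain[key] = []
    by_letter_plain[key].append(word)

print(by_letter_plain)
# {'a': ['apple', 'avocado', 'apricot'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```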
The counting pattern (factory: int)
int() returns 0, so defaultdict(int) gives you a counter where every new key starts at zero. (For pure counting, Counter from the last notebook is usually nicer — but this pattern generalises to sums and other accumulations.)
from collections import defaultdict
totals = defaultdict(int)
sales = [('north', 100), ('south', 50), ('north', 75), ('east', 30)]
for region, amount in sales:
    totals[region] += amount  # starts from 0 automatically
print(dict(totals)) # {'north': 175, 'south': 50, 'east': 30}
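As one example of an accumulation beyond a plain sum, suppose you want a per-region average. A possible sketch keeps two defaultdict(int) objects side by side, one for running totals and one for counts; the counts and averages names are assumptions made for this example.

```python
from collections import defaultdict

sales = [('north', 100), ('south', 50), ('north', 75), ('east', 30)]

totals = defaultdict(int)   # running sum per region
counts = defaultdict(int)   # number of sales per region
for region, amount in sales:
    totals[region] += amount
    counts[region] += 1

# Iterate over existing keys only, so nothing new is created here.
averages = {region: totals[region] / counts[region] for region in totals}
print(averages)  # {'north': 87.5, 'south': 50.0, 'east': 30.0}
```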
Other factories: set, and nesting
Any zero-argument callable works. set collects unique items per key; a lambda returning another defaultdict builds nested structures.
from collections import defaultdict
# group into sets — duplicates collapse
seen = defaultdict(set)
seen['a'].add(1); seen['a'].add(1); seen['a'].add(2)
print(dict(seen)) # {'a': {1, 2}}
# nested: a dict of dicts of ints
grid = defaultdict(lambda: defaultdict(int))
grid['x']['y'] += 5
print(grid['x']['y']) # 5
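Because the factory is just a zero-argument callable, a lambda can also supply a non-empty default. The sketch below uses a hypothetical status mapping to show this, and also shows that passing a bare value instead of a callable is rejected with a TypeError.

```python
from collections import defaultdict

# A lambda lets the default be any value, not just an "empty" one.
status = defaultdict(lambda: 'unknown')
status['db'] = 'up'
print(status['db'], status['cache'])  # up unknown

# The factory must be callable (or None); a bare value is rejected.
error = None
try:
    defaultdict(0)
except TypeError as exc:
    error = exc
print(type(error).__name__)  # TypeError
```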
The gotcha: reading a missing key creates it
This is the one surprise. The factory fires whenever a missing key is looked up with d[key], including a plain read, so merely looking can mutate the dictionary. This is the opposite of Counter (which returns 0 without inserting) and of dict.get (which returns None or a supplied default without inserting).
from collections import defaultdict
d = defaultdict(list)
print(d['never_set']) # [] — looks harmless...
print(dict(d)) # {'never_set': []} — ...but the key now exists!
# to check without creating, use 'in' or .get:
d2 = defaultdict(list)
print('x' in d2) # False, and nothing is created
print(d2.get('x')) # None, and nothing is created
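The contrast with Counter can be checked directly. The following sketch reads the same missing key from both types, then shows how a loop that only reads can still grow a defaultdict; the lookups list is made up for the example.

```python
from collections import Counter, defaultdict

c = Counter()
d = defaultdict(int)

_ = c['missing']   # Counter returns 0 and stores nothing
_ = d['missing']   # defaultdict returns 0 and stores the key

print(len(c), len(d))  # 0 1

# A loop that only reads with d[key] still inserts every key it touches.
lookups = ['a', 'b', 'c']
for key in lookups:
    if d[key] > 0:
        pass
print(len(d))  # 4
```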
defaultdict versus dict.setdefault
A plain dict can do the same job with setdefault(key, default), which returns the existing value or inserts and returns the default. The difference is style and a subtle efficiency point: Python evaluates the default argument before every setdefault call, so the default value is built each time even when the key already exists, while defaultdict calls the factory only when the key is actually missing.
# setdefault — works, but constructs a new [] on every iteration
groups = {}
for word in ['apple', 'avocado', 'banana']:
    groups.setdefault(word[0], []).append(word)
print(groups) # {'a': ['apple', 'avocado'], 'b': ['banana']}
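One way to see the efficiency difference is to count how often the default gets built. The sketch below uses a hypothetical helper, make_list, that records each call; it is an illustration of the evaluation order, not a benchmark.

```python
from collections import defaultdict

calls = {'setdefault': 0, 'defaultdict': 0}

def make_list(label):
    """Hypothetical helper: returns a new list and records that it was called."""
    calls[label] += 1
    return []

words = ['apple', 'avocado', 'apricot', 'banana']

groups = {}
for word in words:
    # make_list(...) is evaluated before setdefault runs, on every iteration
    groups.setdefault(word[0], make_list('setdefault')).append(word)

dd = defaultdict(lambda: make_list('defaultdict'))
for word in words:
    dd[word[0]].append(word)  # the factory runs only for new keys

print(calls)  # {'setdefault': 4, 'defaultdict': 2}
```

For a cheap default such as an empty list the cost is negligible; it matters more when the default is expensive to construct.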
Reach for defaultdict when one dictionary is built up with the same default throughout a function; reach for setdefault for a one-off, or when you want the dict to stay a plain dict with no lingering factory.
Converting back to a plain dict
A defaultdict keeps its factory, so it will keep auto-creating keys. When you're done building and want a normal dict (e.g. before returning it, or to make missing-key access raise again), pass it to dict(), which makes a shallow copy as a plain dict.
from collections import defaultdict
dd = defaultdict(list)
dd['a'].append(1)
plain = dict(dd) # a regular dict — no more auto-creation
print(plain) # {'a': [1]}
print(type(plain).__name__) # dict
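Two details are worth checking in a sketch: dict() converts only the top level, so nested defaultdicts stay as they are, and setting default_factory to None is an in-place alternative that makes missing keys raise KeyError again. The flag names (raised, disabled) are introduced here for illustration.

```python
from collections import defaultdict

dd = defaultdict(list)
dd['a'].append(1)

plain = dict(dd)
try:
    plain['missing']
except KeyError:
    raised = True    # the plain dict raises again
else:
    raised = False

# dict() copies only the top level: inner defaultdicts stay defaultdicts.
grid = defaultdict(lambda: defaultdict(int))
grid['x']['y'] += 5
outer = dict(grid)
inner_type = type(outer['x']).__name__

# Alternative: disable auto-creation in place by clearing the factory.
dd.default_factory = None
try:
    dd['missing']
except KeyError:
    disabled = True
else:
    disabled = False

print(raised, inner_type, disabled)  # True defaultdict True
```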
Recap
- defaultdict(factory) calls factory() to supply a value for any missing key.
- list for grouping, int for counting/summing, set for unique grouping, a lambda for nesting.
- Reading a missing key creates it; use in or .get to check without inserting.
- dict.setdefault is the plain-dict alternative; defaultdict avoids rebuilding the default each time.
- Wrap in dict() to freeze it back into an ordinary dictionary.
Next: deque — fast appends and pops at both ends.