Group items with defaultdict
The question. You have a flat sequence and you want it bucketed into a {key: [items]} mapping — records by category, words by first letter, files by extension, students by grade.
The answer is defaultdict(list): append each item to its bucket and let missing keys create themselves. Below are the variations — lists, sets, computed keys, an index, and nesting — plus when a different tool fits.
Group into lists
The core pattern. Pick the grouping key for each item and append.
from collections import defaultdict
people = [
    ('engineering', 'Ada'), ('sales', 'Bo'),
    ('engineering', 'Cleo'), ('sales', 'Dev'), ('engineering', 'Eve'),
]
by_team = defaultdict(list)
for team, name in people:
    by_team[team].append(name)  # a team seen for the first time starts as []
print(dict(by_team))
# {'engineering': ['Ada', 'Cleo', 'Eve'], 'sales': ['Bo', 'Dev']}
Group by a computed key
The key can be anything derived from the item — its length, a category, the result of a function.
from collections import defaultdict
words = ['hi', 'cat', 'dog', 'a', 'tree', 'ok', 'sun']
by_length = defaultdict(list)
for word in words:
    by_length[len(word)].append(word)
print(dict(by_length))  # {2: ['hi', 'ok'], 3: ['cat', 'dog', 'sun'], 1: ['a'], 4: ['tree']}
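When the same loop appears in several places, it can be wrapped in a small helper that takes the key as a function. The sketch below assumes a hypothetical `group_by` helper (it is not part of the standard library) and an illustrative word list; it returns a plain dict so callers do not inherit the auto-creating behaviour.

```python
from collections import defaultdict

def group_by(items, key):
    """Bucket items into lists keyed by key(item).

    Hypothetical helper for illustration, not a standard-library function.
    """
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)  # plain dict: missing keys raise KeyError again

# Illustrative data, grouped by first letter
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']
by_first_letter = group_by(words, key=lambda w: w[0])
print(by_first_letter)
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```

Passing `len` as the key gives a by-length grouping like the loop above.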
Group into sets to drop duplicates
Swap the factory to set when each bucket should hold unique members.
from collections import defaultdict
pairs = [('a', 1), ('a', 2), ('a', 1), ('b', 3)]
unique = defaultdict(set)
for key, value in pairs:
    unique[key].add(value)
print(dict(unique))  # {'a': {1, 2}, 'b': {3}}
Build an index (item to positions)
A grouping where each value is the list of positions at which the item appeared, which is the basis of a search index.
from collections import defaultdict
text = 'the cat sat on the mat'.split()
index = defaultdict(list)
for position, word in enumerate(text):
    index[word].append(position)
print(dict(index))  # {'the': [0, 4], 'cat': [1], 'sat': [2], 'on': [3], 'mat': [5]}
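One thing to watch when querying such an index: looking up a word with square brackets on a defaultdict inserts an empty list for every word that is absent. A minimal sketch of a read-only lookup follows, using `.get` so that misses leave the index unchanged. The `positions_of` name is illustrative only.

```python
from collections import defaultdict

text = 'the cat sat on the mat'.split()
index = defaultdict(list)
for position, word in enumerate(text):
    index[word].append(position)

def positions_of(word):
    # .get does not trigger the default factory, so a miss adds no key
    return index.get(word, [])

hits = positions_of('the')
misses = positions_of('dog')
print(hits, misses, 'dog' in index)  # [0, 4] [] False
```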
Nested grouping
A lambda factory that returns another defaultdict builds two-level groups — here, names grouped by team and then by role.
from collections import defaultdict
records = [
    ('eng', 'senior', 'Ada'), ('eng', 'junior', 'Bo'),
    ('eng', 'senior', 'Cleo'), ('sales', 'junior', 'Dev'),
]
# Outer missing keys create an inner defaultdict(list)
grouped = defaultdict(lambda: defaultdict(list))
for team, role, name in records:
    grouped[team][role].append(name)
print(grouped['eng']['senior'])  # ['Ada', 'Cleo']
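A nested defaultdict keeps creating keys on every bracket lookup, and its repr includes the factory, which can be awkward once grouping is finished. One way to freeze the result, sketched here with a hypothetical `to_plain` helper, is to convert each level to a regular dict.

```python
from collections import defaultdict

records = [
    ('eng', 'senior', 'Ada'), ('eng', 'junior', 'Bo'),
    ('eng', 'senior', 'Cleo'), ('sales', 'junior', 'Dev'),
]
grouped = defaultdict(lambda: defaultdict(list))
for team, role, name in records:
    grouped[team][role].append(name)

def to_plain(value):
    """Recursively turn dicts (including defaultdicts) into plain dicts.

    Illustrative helper, not a standard-library function.
    """
    if isinstance(value, dict):
        return {k: to_plain(v) for k, v in value.items()}
    return value

plain = to_plain(grouped)
print(plain)
# {'eng': {'senior': ['Ada', 'Cleo'], 'junior': ['Bo']}, 'sales': {'junior': ['Dev']}}
```

After conversion, `plain['hr']` raises `KeyError` instead of silently adding an empty team.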
When another tool fits better
- itertools.groupby groups consecutive equal keys, so it only matches defaultdict grouping if the data is already sorted by the key. For unsorted data, defaultdict is simpler and doesn't need a sort.
- dict.setdefault does the same job for a one-off without leaving a factory on the dict: groups.setdefault(key, []).append(item).
- For grouping and counting rather than collecting, reach for Counter (see the Counter recipe).
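The three alternatives can be compared side by side. The sketch below reuses the team data from the first example; the variable names are illustrative.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter

people = [
    ('engineering', 'Ada'), ('sales', 'Bo'),
    ('engineering', 'Cleo'), ('sales', 'Dev'), ('engineering', 'Eve'),
]
team_of = itemgetter(0)

# itertools.groupby: sort by the key first, otherwise each run of
# equal keys becomes its own separate group
via_groupby = {
    team: [name for _, name in members]
    for team, members in groupby(sorted(people, key=team_of), key=team_of)
}

# dict.setdefault: a plain dict, no factory left behind
via_setdefault = {}
for team, name in people:
    via_setdefault.setdefault(team, []).append(name)

# Counter: when only the size of each group matters
team_sizes = Counter(team for team, _ in people)

print(via_groupby)     # {'engineering': ['Ada', 'Cleo', 'Eve'], 'sales': ['Bo', 'Dev']}
print(via_setdefault)  # {'engineering': ['Ada', 'Cleo', 'Eve'], 'sales': ['Bo', 'Dev']}
print(team_sizes)      # Counter({'engineering': 3, 'sales': 2})
```

Because `sorted` is stable, names keep their original relative order within each team in the `groupby` version.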
In short
- defaultdict(list) + append is the grouping workhorse; the key can be any value you compute.
- Use set to dedupe within buckets, a nested lambda factory for multi-level groups.
- itertools.groupby needs sorted input; setdefault is the plain-dict one-off.