Group items with defaultdict

The question. You have a flat sequence and you want it bucketed into a {key: [items]} mapping — records by category, words by first letter, files by extension, students by grade.

The answer is defaultdict(list): append each item to its bucket and let missing keys create themselves. Below are the variations — lists, sets, computed keys, an index, and nesting — plus when a different tool fits.

Group into lists

The core pattern. Pick the grouping key for each item and append.

from collections import defaultdict

people = [
    ('engineering', 'Ada'), ('sales', 'Bo'),
    ('engineering', 'Cleo'), ('sales', 'Dev'), ('engineering', 'Eve'),
]

by_team = defaultdict(list)
for team, name in people:
    by_team[team].append(name)

print(dict(by_team))
# {'engineering': ['Ada', 'Cleo', 'Eve'], 'sales': ['Bo', 'Dev']}
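One side effect of letting missing keys create themselves is that any subscript read, not only an append, inserts an empty bucket. A small sketch of the difference; the 'marketing' and 'hr' keys are made up for illustration:

```python
from collections import defaultdict

by_team = defaultdict(list)
by_team['engineering'].append('Ada')

# Membership tests and .get() do not call the factory.
print('marketing' in by_team)          # False
print(by_team.get('marketing', []))    # []

# A plain subscript read does: it inserts an empty list for the missing key.
print(by_team['marketing'])            # []
print('marketing' in by_team)          # True

# Once grouping is done, a plain dict makes missing keys raise KeyError again.
frozen = dict(by_team)
try:
    frozen['hr']
except KeyError:
    print('no hr bucket')
```

Converting with dict() after the loop, as the printed examples do, is a cheap way to stop accidental bucket creation further downstream.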

Group by a computed key

The key can be anything derived from the item — its length, a category, the result of a function.

from collections import defaultdict

words = ['hi', 'cat', 'dog', 'a', 'tree', 'ok', 'sun']

by_length = defaultdict(list)
for word in words:
    by_length[len(word)].append(word)

print(dict(by_length))        # {2: ['hi', 'ok'], 3: ['cat', 'dog', 'sun'], 1: ['a'], 4: ['tree']}
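When the same pattern shows up with different keys, the key computation can be passed in as a function. A minimal sketch of a reusable helper; group_by is a hypothetical name, not a standard-library function, and the annotations assume Python 3.9+:

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, TypeVar

T = TypeVar('T')
K = TypeVar('K', bound=Hashable)


def group_by(items: Iterable[T], key: Callable[[T], K]) -> dict[K, list[T]]:
    """Hypothetical helper: bucket items by key(item), keeping input order within each bucket."""
    groups: defaultdict[K, list[T]] = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)  # plain dict, so later lookups don't create buckets


words = ['apple', 'Avocado', 'banana', 'cherry', 'blueberry']
print(group_by(words, key=lambda w: w[0].lower()))
# {'a': ['apple', 'Avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```

Lowercasing inside the key function is what puts 'apple' and 'Avocado' in the same bucket; the key can normalize however the grouping requires.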

Group into sets to drop duplicates

Swap the factory to set when each bucket should hold unique members.

from collections import defaultdict

pairs = [('a', 1), ('a', 2), ('a', 1), ('b', 3)]

unique = defaultdict(set)
for key, value in pairs:
    unique[key].add(value)

print(dict(unique))           # {'a': {1, 2}, 'b': {3}}
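A set does not keep members in the order they arrived. If a bucket should be deduplicated but still list members in first-seen order, one alternative (a different technique from the set version above) is to use dict keys as an ordered set, since dicts preserve insertion order from Python 3.7 on:

```python
from collections import defaultdict

pairs = [('a', 2), ('a', 1), ('a', 2), ('b', 3)]

# Dict keys are unique and remember insertion order, so each inner dict acts as an ordered set.
ordered = defaultdict(dict)
for key, value in pairs:
    ordered[key][value] = None  # re-adding an existing value keeps its original position

result = {key: list(values) for key, values in ordered.items()}
print(result)   # {'a': [2, 1], 'b': [3]}
```

As with the set version, the values must be hashable.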

Build an index (item to positions)

A grouping in which each word's bucket collects the positions where that word appeared. This is the core of a search (inverted) index.

from collections import defaultdict

text = 'the cat sat on the mat'.split()
index = defaultdict(list)
for position, word in enumerate(text):
    index[word].append(position)

print(dict(index))            # {'the': [0, 4], 'cat': [1], 'sat': [2], 'on': [3], 'mat': [5]}
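The same idea extends to several documents: map each word to the set of documents containing it, and an AND query becomes a set intersection. A toy sketch; the docs mapping and the search function are hypothetical, and this version does no case folding, punctuation stripping, or ranking:

```python
from collections import defaultdict

docs = {
    'd1': 'the cat sat on the mat',
    'd2': 'the dog sat on the log',
    'd3': 'a cat and a dog',
}

# Inverted index: word -> set of ids of the documents that contain it.
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        inverted[word].add(doc_id)


def search(*terms):
    """Hypothetical AND query: ids of documents containing every term, sorted."""
    # .get() avoids inserting empty buckets for unknown terms.
    postings = [inverted.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []


print(search('cat'))          # ['d1', 'd3']
print(search('sat', 'dog'))   # ['d2']
print(search('unicorn'))      # []
```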

Nested grouping

A lambda factory that returns another defaultdict builds two-level groups — here, names grouped by team and then by role.

from collections import defaultdict

records = [
    ('eng', 'senior', 'Ada'), ('eng', 'junior', 'Bo'),
    ('eng', 'senior', 'Cleo'), ('sales', 'junior', 'Dev'),
]

grouped = defaultdict(lambda: defaultdict(list))
for team, role, name in records:
    grouped[team][role].append(name)

print(grouped['eng']['senior'])   # ['Ada', 'Cleo']
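Printing grouped itself shows nested defaultdict reprs with the lambda inside them, and a defaultdict whose factory is a lambda cannot be pickled. A minimal sketch, reusing the records from the example, that converts both levels to plain dicts before display or storage:

```python
import pickle
from collections import defaultdict

records = [
    ('eng', 'senior', 'Ada'), ('eng', 'junior', 'Bo'),
    ('eng', 'senior', 'Cleo'), ('sales', 'junior', 'Dev'),
]

grouped = defaultdict(lambda: defaultdict(list))
for team, role, name in records:
    grouped[team][role].append(name)

# Convert both levels to plain dicts: the repr is readable, missing keys raise
# KeyError again, and nothing refers to the lambda factory any more.
plain = {team: dict(roles) for team, roles in grouped.items()}
print(plain)
# {'eng': {'senior': ['Ada', 'Cleo'], 'junior': ['Bo']}, 'sales': {'junior': ['Dev']}}

# Plain dicts of lists pickle without trouble.
restored = pickle.loads(pickle.dumps(plain))
print(restored == plain)   # True
```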

When another tool fits better

itertools.groupby groups consecutive equal keys, so it only matches defaultdict grouping if the data is already sorted by the key. For unsorted data, defaultdict is simpler and doesn't need a sort.
dict.setdefault does the same job for a one-off without leaving a factory on the dict: groups.setdefault(key, []).append(item).
When you only need how many items fall under each key, not the items themselves, reach for Counter (recipe).
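The alternatives above, side by side on the people list from the first example, as a sketch: groupby after a sort, setdefault on a plain dict, and Counter when only the sizes matter. The variable names are illustrative.

```python
from collections import Counter, defaultdict
from itertools import groupby
from operator import itemgetter

people = [
    ('engineering', 'Ada'), ('sales', 'Bo'),
    ('engineering', 'Cleo'), ('sales', 'Dev'), ('engineering', 'Eve'),
]

# defaultdict: one pass, no sorting needed.
with_defaultdict = defaultdict(list)
for team, name in people:
    with_defaultdict[team].append(name)

# groupby only merges *consecutive* equal keys, so sort by the key first.
# sorted() is stable, so names keep their original order within each team.
with_groupby = {
    team: [name for _, name in members]
    for team, members in groupby(sorted(people, key=itemgetter(0)), key=itemgetter(0))
}

# Without the sort, the same key comes back as several separate runs.
unsorted_runs = [team for team, _ in groupby(people, key=itemgetter(0))]
print(unsorted_runs)   # ['engineering', 'sales', 'engineering', 'sales', 'engineering']

# setdefault on a plain dict: same result, no factory left on the dict.
# Note that a fresh [] is built on every call, even when the key already exists.
with_setdefault = {}
for team, name in people:
    with_setdefault.setdefault(team, []).append(name)

# Counter when only the group sizes matter.
sizes = Counter(team for team, _ in people)
print(sizes)   # Counter({'engineering': 3, 'sales': 2})

print(dict(with_defaultdict) == with_groupby == with_setdefault)   # True
```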

In short

defaultdict(list) + append is the grouping workhorse; the key can be any value you compute.
Use set to dedupe within buckets, a nested lambda factory for multi-level groups.
itertools.groupby needs sorted input; setdefault is the plain-dict one-off.