Counter¶
Counting how often things occur is one of the most common small tasks in programming: words in a document, votes per candidate, hits per page. You can do it with a plain dict and a few lines of "if key in counts" bookkeeping, but collections.Counter does it in one line and adds a toolkit on top: ranking by frequency, combining counts, and treating tallies like the multisets they are.
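A Counter is a dict subclass that maps each item to its count. Almost everything you know about dicts still works (the exceptions are update, which adds counts instead of replacing them, and fromkeys, which is not implemented); Counter layers counting-specific behaviour on top.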
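To make the comparison concrete, here is a minimal sketch of the manual dict approach next to the Counter version. The sample string and the variable names are illustrative only.

```python
from collections import Counter

word = 'banana'  # illustrative sample data

# Manual tally: a plain dict needs a guard (or dict.get) on every step
manual = {}
for letter in word:
    if letter in manual:
        manual[letter] += 1
    else:
        manual[letter] = 1

# Counter does the same bookkeeping in a single call
counted = Counter(word)

print(manual)                     # {'b': 1, 'a': 3, 'n': 2}
print(dict(counted) == manual)    # True: same keys, same counts
print(isinstance(counted, dict))  # True: Counter is a dict subclass
```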
Building a Counter¶
Pass any iterable and Counter tallies it. That's the headline feature — no loop, no initialisation.
from collections import Counter
counts = Counter('mississippi') # count the letters
print(counts) # Counter({'i': 4, 's': 4, 'p': 2, 'm': 1})
votes = Counter(['red', 'blue', 'red', 'green', 'red', 'blue'])
print(votes) # Counter({'red': 3, 'blue': 2, 'green': 1})
You can also build one from a mapping or from keyword arguments, when you already know the counts.
from collections import Counter
print(Counter({'a': 3, 'b': 1})) # from a dict
print(Counter(a=3, b=1)) # from keyword arguments
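Because any iterable works, a generator expression lets you count a derived property instead of the items themselves. The sketch below uses a made-up word list, and also checks that the three construction styles produce equal Counters.

```python
from collections import Counter

words = ['red', 'blue', 'green', 'pink', 'tan']  # illustrative sample data

# Count word lengths rather than the words themselves
lengths = Counter(len(w) for w in words)
print(lengths)  # Counter({3: 2, 4: 2, 5: 1})

# Iterable, mapping, and keyword forms all describe the same tally
same = Counter('aaab') == Counter({'a': 3, 'b': 1}) == Counter(a=3, b=1)
print(same)     # True
```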
Missing keys count as zero¶
Look up an item that was never counted and you get 0, not a KeyError. Crucially, the lookup does not add the key — the Counter stays clean. This makes counting code safe without any if guards.
from collections import Counter
counts = Counter('aab')
print(counts['a']) # 2
print(counts['z']) # 0 — missing, but no error
print(counts) # Counter({'a': 2, 'b': 1}) — 'z' was NOT added
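One subtlety worth seeing in code: reading a missing key never stores it, but assigning a count of zero does. A short sketch of the difference, and of using del to remove an entry outright:

```python
from collections import Counter

counts = Counter('aab')

# Reading a missing key returns 0 and leaves the Counter unchanged
print(counts['z'])    # 0
print('z' in counts)  # False

# Assigning zero DOES store the key; it is not the same as removing it
counts['b'] = 0
print('b' in counts)  # True
print(counts)         # Counter({'a': 2, 'b': 0})

# del removes the entry entirely
del counts['b']
print('b' in counts)  # False
```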
Ranking with most_common¶
most_common(n) returns the n highest-count items as (item, count) pairs, already sorted. With no argument it returns them all, most-frequent first. This is the method you'll use constantly.
from collections import Counter
counts = Counter('mississippi')
print(counts.most_common(2)) # [('i', 4), ('s', 4)]
print(counts.most_common()) # all, ranked: [('i', 4), ('s', 4), ('p', 2), ('m', 1)]
# the least common are the tail: index (or slice) from the end of the full list
print(counts.most_common()[-1]) # ('m', 1)
Ties are broken by insertion order (the order each item was first seen), so the ranking is stable and predictable.
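The sketch below shows two things under that rule: the reversed-slice idiom for the n least common items, and how first-seen order decides ties. The short 'abab' and 'baba' inputs are illustrative.

```python
from collections import Counter

counts = Counter('mississippi')

# The n least common items: a reversed slice of the full ranking
n = 2
least = counts.most_common()[:-n - 1:-1]
print(least)  # [('m', 1), ('p', 2)]

# Ties follow first-seen order, so the input order decides who leads
first = Counter('abab').most_common()
second = Counter('baba').most_common()
print(first)   # [('a', 2), ('b', 2)]
print(second)  # [('b', 2), ('a', 2)]
```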
Updating counts¶
update adds counts (it doesn't replace them, the way dict.update would), accepting another iterable or mapping. subtract does the reverse and, unlike the - operator below, is happy to go negative.
from collections import Counter
inventory = Counter(apple=3, pear=2)
inventory.update(['apple', 'apple', 'banana']) # add more
print(inventory) # Counter({'apple': 5, 'pear': 2, 'banana': 1})
inventory.subtract(apple=6) # can go below zero
print(inventory['apple']) # -1
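A side-by-side sketch of the difference from dict.update, using illustrative stock figures. It also shows unary plus, which returns a new Counter containing only the positive counts and is a convenient way to clean up after subtract.

```python
from collections import Counter

stock = Counter(apple=3, pear=2)
plain = dict(apple=3, pear=2)

stock.update({'apple': 2})  # Counter.update adds: 3 + 2
plain.update({'apple': 2})  # dict.update replaces: 3 becomes 2
print(stock['apple'], plain['apple'])  # 5 2

stock.subtract({'pear': 5})  # pear drops to -3 and stays in the Counter
print(stock)                 # Counter({'apple': 5, 'pear': -3})

# Unary plus builds a new Counter with only the positive counts
positive = +stock
print(positive)              # Counter({'apple': 5})
```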
Counter arithmetic: tallies as multisets¶
Counters support +, -, &, and |, treating each operand as a multiset. This turns "merge these tallies" or "what's common to both" into a single operator. Note that all four operators discard zero and negative counts from their result.
from collections import Counter
a = Counter(x=3, y=1)
b = Counter(x=1, y=2, z=5)
print(a + b) # add counts: Counter({'z': 5, 'x': 4, 'y': 3})
print(a - b) # subtract, keep >0: Counter({'x': 2})
print(a & b) # min (intersection): Counter({'x': 1, 'y': 1})
print(a | b) # max (union): Counter({'z': 5, 'x': 3, 'y': 2})
Because - drops anything that isn't positive, a - b keeps only x (3−1=2); y (1−2=−1) and the absent z vanish. Use the subtract method when you need the negatives kept.
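That drop-the-non-positives behaviour makes - a natural containment test for multisets. As a minimal sketch, the hypothetical helper can_build (not part of collections) asks whether a word can be spelled from a pool of letters: subtract the pool from the word, and anything left over is a shortfall.

```python
from collections import Counter

def can_build(word, letters):
    """Return True if `word` can be spelled from the multiset `letters`.

    Hypothetical helper for illustration; not part of collections.
    """
    missing = Counter(word) - Counter(letters)  # only shortfalls survive
    return not missing                          # an empty Counter is falsy

print(can_build('pip', 'mississippi'))   # True
print(can_build('mass', 'mississippi'))  # False: no 'a' available
print(Counter('mass') - Counter('mississippi'))  # Counter({'a': 1})
```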
total and elements¶
total() (available from Python 3.10) sums all the counts; elements() expands the Counter back into an iterator of items, each repeated by its count and with anything below one skipped (handy for re-feeding into other tools).
from collections import Counter
counts = Counter(a=2, b=1, c=0)
print(counts.total()) # 3 — sum of all counts
print(list(counts.elements())) # ['a', 'a', 'b'] — c (count 0) is skipped
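On interpreters older than Python 3.10, where total() is not available, summing the values gives the same number. The sketch below shows that fallback and a round trip through elements(), which loses the zero-count key along the way.

```python
from collections import Counter

counts = Counter(a=2, b=1, c=0)

# Portable total: sum() over the values works on any Python 3 version
total = sum(counts.values())
print(total)     # 3

# elements() yields items grouped in first-seen order
expanded = list(counts.elements())
print(expanded)  # ['a', 'a', 'b']

# Re-counting the expansion drops the zero-count key
rebuilt = Counter(expanded)
print(rebuilt)             # Counter({'a': 2, 'b': 1})
print(rebuilt == +counts)  # True
```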
Putting it together: word frequency¶
The classic example, in three lines: split text into words, count them, rank them.
from collections import Counter
text = 'the cat sat on the mat the cat purred'
freq = Counter(text.split())
print(freq.most_common(3)) # [('the', 3), ('cat', 2), ('sat', 1)]
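Real text usually has capitals and punctuation, which a bare split() would count as separate words ('The' vs 'the', 'sat.' vs 'sat'). One possible normalisation, sketched here with an illustrative sentence and a deliberately simple regular expression, is to lowercase the text and extract runs of letters and apostrophes before counting.

```python
import re
from collections import Counter

text = 'The cat sat. The cat purred; the mat did not!'  # illustrative sample

# Lowercase first, then pull out runs of letters and apostrophes
words = re.findall(r"[a-z']+", text.lower())
freq = Counter(words)

print(freq.most_common(2))  # [('the', 3), ('cat', 2)]
```

The pattern is an assumption about what counts as a word; text with digits, hyphens, or non-ASCII letters would need a different one.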
Recap¶
- Counter(iterable) tallies in one line; it's a dict, so nearly all dict operations still work.
- Missing items read as 0 without raising or being added.
- most_common(n) ranks by frequency (ties broken by insertion order).
- update / subtract add and remove counts in place; subtract allows negatives.
- +, -, &, | treat counters as multisets and drop non-positive results.
- total() sums counts; elements() expands them back out.
Next: defaultdict — a dict that conjures a default for missing keys, ideal for grouping.