Hash tables and dictionaries¶
Dictionaries are one of the most powerful and frequently used data structures in Python. Understanding how they work under the hood helps you use them effectively and appreciate why certain rules exist.
What is a hash table?¶
A hash table is a data structure that maps keys to values using a hash function. The hash function converts a key into a number (the hash value), which determines where the value is stored in memory. This allows lookups to happen in constant time on average — you do not need to search through every item.
Python dictionaries are built on hash tables.
How hashing works¶
When you add a key-value pair to a dictionary, Python does the following:
- Calls
hash()on the key to produce an integer hash value - Uses that hash value to calculate a position (bucket) in an internal array
- Stores the key-value pair at that position
When you look up a key, the same process runs in reverse: Python hashes the key, finds the bucket, and retrieves the value.
# The hash() function returns an integer
print(hash("hello")) # A large integer
print(hash(42)) # 42
print(hash((1, 2, 3))) # A large integer
Why keys must be hashable¶
For this system to work, a key must produce the same hash value every time. If the key could change (that is, if it were mutable), its hash value could change, and the dictionary would not be able to find it again.
This is why only immutable types can be dictionary keys:
- Strings, numbers, and tuples — can be dictionary keys
- Lists, sets, and dictionaries — cannot be dictionary keys
# This works
locations = {(51.5, -0.1): "London"}
# This raises TypeError
# locations = {[51.5, -0.1]: "London"}
Why dictionary lookups are fast¶
In a list, finding an item requires checking each element one by one — this takes O(n) time. In a dictionary, the hash function jumps directly to the right position — this takes O(1) time on average.
This difference becomes significant with large collections. Looking up a key in a dictionary of one million items is roughly as fast as looking up a key in a dictionary of ten items.
Hash collisions¶
Occasionally, two different keys produce the same hash value. This is called a hash collision. Python handles collisions by probing — if the calculated bucket is already occupied by a different key, Python looks for the next available bucket using a deterministic algorithm.
Collisions slow things down slightly because Python needs to do extra comparisons, but they are uncommon enough that average-case performance remains O(1).
The __hash__ and __eq__ relationship¶
For an object to work as a dictionary key, it must implement both __hash__() (to compute the hash value) and __eq__() (to check equality when collisions occur).
The fundamental rule is: if two objects are equal, they must have the same hash value. The reverse does not need to hold — two objects with the same hash value can be unequal.
Dictionary ordering¶
Before Python 3.7¶
Dictionaries did not guarantee any particular order. Iterating over a dictionary could yield items in any order.
Python 3.7 and later¶
Dictionaries are guaranteed to preserve insertion order. Items are yielded in the order they were added.
person = {}
person["name"] = "Alice"
person["age"] = 30
person["city"] = "London"
# Always prints: name, age, city
for key in person:
print(key)
This ordering is a property of the dictionary implementation, not of the hash table concept in general.
Performance characteristics¶
| Operation | Average case | Worst case |
|---|---|---|
Lookup (d[key]) |
O(1) | O(n) |
Insert (d[key] = value) |
O(1) | O(n) |
Delete (del d[key]) |
O(1) | O(n) |
| Iteration | O(n) | O(n) |
Membership (key in d) |
O(1) | O(n) |
The worst case of O(n) occurs only when there are many hash collisions, which is rare in practice.
Memory trade-off¶
Dictionaries use more memory than lists because they need to store the hash table structure alongside the data. This is a deliberate trade-off: extra memory buys faster lookup performance.
For small collections (fewer than a dozen items), the overhead matters little. For large collections where fast lookups are important, the extra memory is well worth it.
When to use dictionaries¶
Dictionaries are the right choice when:
- You need to look up values by key
- You need fast membership testing for keys
- You are counting, grouping, or categorising data
- You need a mapping from identifiers to objects
Dictionaries are overkill when:
- You just need an ordered sequence of items (use a list)
- You only need to check membership without associated values (use a set)
- You have a small, fixed set of named fields (use a named tuple)
Summary¶
Python dictionaries are built on hash tables, which provide fast average-case lookups by converting keys into array positions using a hash function. This design requires keys to be immutable and hashable. Understanding these internals helps explain why dictionaries behave the way they do and guides you in choosing the right data structure for your needs.