Threads and futures
Threads are Python's most direct route to overlapping work. A thread is a separate line of execution inside your process that shares the process's memory. Because of the global interpreter lock (GIL), which in standard CPython builds lets only one thread execute Python bytecode at a time, threads won't speed up pure computation. For I/O-bound work, where each task spends most of its time waiting, they're ideal: while one thread waits on the network, the others run.
This notebook starts with the low-level threading.Thread to show what a thread is, then moves quickly to concurrent.futures.ThreadPoolExecutor, which is what you should actually use almost every time.
These examples spawn real OS threads and won't run inside the in-browser sandbox — run them locally to see the timings.
A raw thread
threading.Thread takes a target function and its arguments. .start() runs it on a new thread; .join() blocks until it finishes.
import threading
import time

def worker(name):
    print(f'{name} starting')
    time.sleep(0.5)  # stand-in for a blocking I/O call
    print(f'{name} done')

t = threading.Thread(target=worker, args=('A',))
t.start()  # runs worker('A') on a new thread
print('main thread keeps going while A works')
t.join()   # wait for the thread to finish
print('A has finished')
Run several at once and the waits overlap. Three threads each sleeping 0.5s finish in about 0.5s total, not 1.5s — that's the whole point.
threads = [threading.Thread(target=worker, args=(f'T{i}',)) for i in range(3)]

start = time.perf_counter()
for t in threads:
    t.start()  # start all three
for t in threads:
    t.join()   # then wait for all three
print(f'elapsed: {time.perf_counter() - start:.2f}s')
This works, but it's clumsy for real use. Two problems jump out: there's no clean way to get a return value back from worker, and managing lists of threads by hand gets fiddly fast. Both are solved by the thread pool.
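To make the first problem concrete, here is a minimal sketch of the usual workaround with raw threads: a Thread discards whatever its target returns, so each worker writes into a slot the main thread reads after join(). The names square_into, inputs and results are illustrative and not part of the notebook.

```python
import threading

def square_into(results, index, n):
    # A Thread throws away its target's return value, so the worker
    # stores the result in the slot it was given instead.
    results[index] = n * n

inputs = [2, 3, 4]
results = [None] * len(inputs)  # one slot per thread, so no two threads share a slot

threads = [
    threading.Thread(target=square_into, args=(results, i, n))
    for i, n in enumerate(inputs)
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # the list is only guaranteed complete after every join

print(results)  # [4, 9, 16]
```

It works, but the bookkeeping (slots, indices, start and join loops) is all yours, and an exception inside a worker would simply leave its slot as None.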
ThreadPoolExecutor: the tool you'll actually use
concurrent.futures.ThreadPoolExecutor manages a pool of worker threads for you, capped at max_workers. You hand it work; it gives you back a Future, an object representing a result that isn't ready yet. Call .result() on the future to get the value (blocking until it's ready); any exception raised in the worker is re-raised there.
Use it as a context manager so the pool is cleaned up automatically.
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    time.sleep(0.3)  # pretend this is a network request
    return f'{url} -> 200 OK'

with ThreadPoolExecutor(max_workers=4) as pool:
    future = pool.submit(fetch, 'https://example.com')  # returns immediately
    print('submitted; doing other things...')
    print(future.result())  # blocks until ready
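As a rough sketch of what the context manager saves you from writing, the version below manages the pool by hand. Leaving a with block is equivalent to calling shutdown(wait=True), which waits for pending work and then releases the threads. The function slow_double is a hypothetical stand-in for real I/O.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_double(n):
    time.sleep(0.1)  # stand-in for blocking I/O
    return n * 2

pool = ThreadPoolExecutor(max_workers=2)
try:
    future = pool.submit(slow_double, 21)
finally:
    # What leaving a `with` block does: wait for pending work, then free the threads.
    pool.shutdown(wait=True)

print(future.done())    # True: shutdown(wait=True) returned only after the task finished
print(future.result())  # 42

# A pool that has been shut down refuses new work.
rejected = False
try:
    pool.submit(slow_double, 1)
except RuntimeError:
    rejected = True
print('rejected after shutdown:', rejected)  # True
```

The with form is shorter and cannot forget the shutdown call, which is why it is the default choice.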
submit many, collect results
submit returns one future per call. To run many tasks, submit them all (they start running straight away, up to max_workers at a time), then collect the results.
urls = [f'https://site/{i}' for i in range(8)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(fetch, u) for u in urls]
    results = [f.result() for f in futures]  # results in submission order
print(f'{len(results)} fetched in {time.perf_counter() - start:.2f}s')
print(results[0])
Eight tasks of 0.3s each, four at a time: two waves, about 0.6s — versus 2.4s sequentially. The pool size caps how many run concurrently, which matters when you don't want to open 10,000 connections at once.
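One way to see the cap in action is to count how many tasks are running at the same moment. The sketch below is illustrative only: tracked_task, active, peak and guard are made-up names, and the counters are protected with a threading.Lock, which the shared-state section of this notebook explains.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

active = 0                # tasks currently running
peak = 0                  # highest value `active` ever reached
guard = threading.Lock()  # protects the two counters above

def tracked_task(_):
    global active, peak
    with guard:
        active += 1
        peak = max(peak, active)
    time.sleep(0.05)      # stand-in for a network request
    with guard:
        active -= 1

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(tracked_task, i) for i in range(20)]
    for f in futures:
        f.result()        # surface any exception from the task

print('peak concurrent tasks:', peak)  # never more than max_workers
```

Twenty tasks were submitted, but the peak cannot exceed the pool size, so max_workers doubles as a simple limit on open connections.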
map: simplest when inputs map to outputs
If you're applying one function to many inputs and want the results back in order, executor.map is the cleanest form. It mirrors the built-in map, but runs the calls concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    for result in pool.map(fetch, urls):  # yields in input order
        print(result)
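Like the built-in map, executor.map also accepts several iterables and passes one item from each as positional arguments. The following is a small sketch of that form; fetch_page, bases and pages are hypothetical names, and the function only builds a string rather than doing real I/O.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_page(base, page):
    # Stand-in for a request that needs two arguments.
    return f'{base}?page={page}'

bases = ['https://site/a', 'https://site/b', 'https://site/c']
pages = [1, 2, 3]

with ThreadPoolExecutor(max_workers=3) as pool:
    # Calls fetch_page(bases[0], pages[0]), fetch_page(bases[1], pages[1]), ...
    results = list(pool.map(fetch_page, bases, pages))

print(results)
# ['https://site/a?page=1', 'https://site/b?page=2', 'https://site/c?page=3']
```

If a call raises, the exception is re-raised when iteration reaches that result, so later results are not delivered by the same loop.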
as_completed: handle results as they arrive
Sometimes you don't want to wait for the slowest task before you start processing the fast ones. concurrent.futures.as_completed yields futures in the order they finish, not the order you submitted them. Keep a dict mapping each future back to its input so you know which one finished.
from concurrent.futures import as_completed
import random

def variable_fetch(url):
    time.sleep(random.uniform(0.1, 0.6))  # finishing times will vary
    return f'{url} done'

with ThreadPoolExecutor(max_workers=4) as pool:
    future_to_url = {pool.submit(variable_fetch, u): u for u in urls}
    for future in as_completed(future_to_url):
        url = future_to_url[future]
        print(f'{url}: {future.result()}')  # printed as each finishes
Exceptions surface at .result()
An exception raised inside a worker doesn't crash the pool and isn't printed — it's stored on the future and re-raised when you call .result(). This means you must call .result() (or otherwise inspect the future) to find out a task failed. A common bug is firing off tasks and never checking them, so failures vanish silently.
def flaky(n):
    if n == 3:
        raise ValueError(f'cannot process {n}')
    return n * 10

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = {pool.submit(flaky, n): n for n in range(5)}
    for future in as_completed(futures):
        n = futures[future]
        try:
            print(f'{n} -> {future.result()}')
        except ValueError as exc:
            print(f'{n} failed: {exc}')  # handle per-task failure
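If you would rather inspect failures than catch them, Future.exception() returns the stored exception, or None on success, without raising. The sketch below sorts finished futures into two dicts; flaky_task, succeeded and failed are illustrative names chosen for this example.

```python
from concurrent.futures import ThreadPoolExecutor

def flaky_task(n):
    if n == 3:
        raise ValueError(f'cannot process {n}')
    return n * 10

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = {pool.submit(flaky_task, n): n for n in range(5)}
# Leaving the `with` block waits for every task, so all futures are done here.

succeeded = {}
failed = {}
for future, n in futures.items():
    exc = future.exception()  # the stored exception, or None on success
    if exc is None:
        succeeded[n] = future.result()
    else:
        failed[n] = exc

print(succeeded)  # {0: 0, 1: 10, 2: 20, 4: 40}
print(failed)     # {3: ValueError('cannot process 3')}
```

Either style works; the point is that every future gets checked one way or the other.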
Shared state is the catch
Threads share memory, which is convenient and dangerous. When two threads modify the same object, operations that look atomic often aren't. counter += 1 is really read, add, write — three steps — and two threads can interleave so an update is lost. This is a race condition.
counter = 0

def increment():
    global counter
    for _ in range(100_000):
        counter += 1  # NOT atomic: read, +1, write

threads = [threading.Thread(target=increment) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print('expected:', 4 * 100_000)
print('got:     ', counter)  # can be LESS: lost updates (how often depends on Python version and timing)
The fix is a Lock: a flag that only one thread can hold at a time. Wrap the critical section in with lock: so updates can't interleave.
counter = 0
lock = threading.Lock()

def safe_increment():
    global counter
    for _ in range(100_000):
        with lock:  # only one thread in here at a time
            counter += 1

threads = [threading.Thread(target=safe_increment) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print('got:', counter)  # now exactly 400000
Locks fix correctness but cost speed — threads queue for the lock. The best defence is to avoid shared mutable state: have each task return a value and combine the results in the main thread (as the submit/map examples do), rather than having threads write into shared objects.
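Applied to the counter example, that advice might look like the sketch below: each task counts into a local variable and returns it, and only the main thread adds the pieces together. The names count_chunk, partials and total are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def count_chunk(n):
    local = 0            # private to this call, so no lock is needed
    for _ in range(n):
        local += 1
    return local

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(count_chunk, [100_000] * 4))

total = sum(partials)    # combined in the main thread only
print('got:', total)     # 400000
```

Nothing is shared between the workers, so there is nothing to race on and nothing to lock.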
Recap
- Threads overlap I/O-bound work; the GIL stops them helping CPU-bound work.
- Prefer ThreadPoolExecutor over raw Thread.
- submit returns a Future; .result() gives the value and re-raises exceptions.
- map for ordered results, as_completed for results as they finish.
- Always retrieve results or failures stay hidden.
- Shared mutable state needs a Lock or, better, should be designed away.
Next: Processes and parallelism, for the CPU-bound work threads can't accelerate.