Threads and futures¶

Threads are Python's most direct route to overlapping work. A thread is a separate line of execution inside your process that shares all the same memory. Because of the GIL, threads won't speed up pure computation — but for I/O-bound work, where each task spends its time waiting, they're ideal: while one thread waits on the network, the others run.

This notebook starts with the low-level threading.Thread to show what a thread is, then moves quickly to concurrent.futures.ThreadPoolExecutor, which is what you should actually use almost every time.

These examples spawn real OS threads and won't run inside the in-browser sandbox — run them locally to see the timings.

A raw thread¶

threading.Thread takes a target function and its arguments. .start() runs it on a new thread; .join() blocks until it finishes.

In [ ]:

no-run

Copied!





import threading
import time

def worker(name):
    print(f'{name} starting')
    time.sleep(0.5)        # stand-in for a blocking I/O call
    print(f'{name} done')

t = threading.Thread(target=worker, args=('A',))
t.start()      # runs worker('A') on a new thread
print('main thread keeps going while A works')
t.join()       # wait for the thread to finish
print('A has finished')
import threading
import time

def worker(name):
    print(f'{name} starting')
    time.sleep(0.5)        # stand-in for a blocking I/O call
    print(f'{name} done')

t = threading.Thread(target=worker, args=('A',))
t.start()      # runs worker('A') on a new thread
print('main thread keeps going while A works')
t.join()       # wait for the thread to finish
print('A has finished')

Run several at once and the waits overlap. Three threads each sleeping 0.5s finish in about 0.5s total, not 1.5s — that's the whole point.

In [ ]:

no-run

Copied!





threads = [threading.Thread(target=worker, args=(f'T{i}',)) for i in range(3)]

start = time.perf_counter()
for t in threads:
    t.start()      # start all three
for t in threads:
    t.join()       # then wait for all three
print(f'elapsed: {time.perf_counter() - start:.2f}s')
threads = [threading.Thread(target=worker, args=(f'T{i}',)) for i in range(3)]

start = time.perf_counter()
for t in threads:
    t.start()      # start all three
for t in threads:
    t.join()       # then wait for all three
print(f'elapsed: {time.perf_counter() - start:.2f}s')

This works, but it's clumsy for real use. Two problems jump out: there's no clean way to get a return value back from worker, and managing lists of threads by hand gets fiddly fast. Both are solved by the thread pool.

`ThreadPoolExecutor`: the tool you'll actually use¶

concurrent.futures.ThreadPoolExecutor manages a fixed pool of worker threads for you. You hand it work; it gives you back a Future — an object representing a result that isn't ready yet. Call .result() on the future to get the value (blocking until it's ready), and any exception raised in the worker is re-raised there.

Use it as a context manager so the pool is cleaned up automatically.

In [ ]:

no-run

Copied!





from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    time.sleep(0.3)              # pretend this is a network request
    return f'{url} -> 200 OK'

with ThreadPoolExecutor(max_workers=4) as pool:
    future = pool.submit(fetch, 'https://example.com')   # returns immediately
    print('submitted; doing other things...')
    print(future.result())                               # blocks until ready
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    time.sleep(0.3)              # pretend this is a network request
    return f'{url} -> 200 OK'

with ThreadPoolExecutor(max_workers=4) as pool:
    future = pool.submit(fetch, 'https://example.com')   # returns immediately
    print('submitted; doing other things...')
    print(future.result())                               # blocks until ready

`submit` many, collect results¶

submit returns one future per call. To run many tasks, submit them all (they start running straight away, up to max_workers at a time), then collect the results.

In [ ]:

no-run

Copied!





urls = [f'https://site/{i}' for i in range(8)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(fetch, u) for u in urls]
    results = [f.result() for f in futures]   # results in submission order

print(f'{len(results)} fetched in {time.perf_counter() - start:.2f}s')
print(results[0])
urls = [f'https://site/{i}' for i in range(8)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(fetch, u) for u in urls]
    results = [f.result() for f in futures]   # results in submission order

print(f'{len(results)} fetched in {time.perf_counter() - start:.2f}s')
print(results[0])

Eight tasks of 0.3s each, four at a time: two waves, about 0.6s — versus 2.4s sequentially. The pool size caps how many run concurrently, which matters when you don't want to open 10,000 connections at once.

`map`: simplest when inputs map to outputs¶

If you're applying one function to many inputs and want the results back in order, executor.map is the cleanest form. It mirrors the built-in map, but runs the calls concurrently.

In [ ]:

no-run

Copied!

with ThreadPoolExecutor(max_workers=4) as pool:
    for result in pool.map(fetch, urls):   # yields in input order
        print(result)
with ThreadPoolExecutor(max_workers=4) as pool:
    for result in pool.map(fetch, urls):   # yields in input order
        print(result)

`as_completed`: handle results as they arrive¶

Sometimes you don't want to wait for the slowest task to start processing the fast ones. concurrent.futures.as_completed yields futures in the order they finish, not the order you submitted them. Keep a dict mapping each future back to its input so you know what finished.

In [ ]:

no-run

Copied!





from concurrent.futures import as_completed
import random

def variable_fetch(url):
    time.sleep(random.uniform(0.1, 0.6))   # finishing times will vary
    return f'{url} done'

with ThreadPoolExecutor(max_workers=4) as pool:
    future_to_url = {pool.submit(variable_fetch, u): u for u in urls}
    for future in as_completed(future_to_url):
        url = future_to_url[future]
        print(f'{url}: {future.result()}')   # printed as each finishes
from concurrent.futures import as_completed
import random

def variable_fetch(url):
    time.sleep(random.uniform(0.1, 0.6))   # finishing times will vary
    return f'{url} done'

with ThreadPoolExecutor(max_workers=4) as pool:
    future_to_url = {pool.submit(variable_fetch, u): u for u in urls}
    for future in as_completed(future_to_url):
        url = future_to_url[future]
        print(f'{url}: {future.result()}')   # printed as each finishes

Exceptions surface at `.result()`¶

An exception raised inside a worker doesn't crash the pool and isn't printed — it's stored on the future and re-raised when you call .result(). This means you must call .result() (or otherwise inspect the future) to find out a task failed. A common bug is firing off tasks and never checking them, so failures vanish silently.

In [ ]:

no-run

Copied!





def flaky(n):
    if n == 3:
        raise ValueError(f'cannot process {n}')
    return n * 10

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = {pool.submit(flaky, n): n for n in range(5)}
    for future in as_completed(futures):
        n = futures[future]
        try:
            print(f'{n} -> {future.result()}')
        except ValueError as exc:
            print(f'{n} failed: {exc}')   # handle per-task failure
def flaky(n):
    if n == 3:
        raise ValueError(f'cannot process {n}')
    return n * 10

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = {pool.submit(flaky, n): n for n in range(5)}
    for future in as_completed(futures):
        n = futures[future]
        try:
            print(f'{n} -> {future.result()}')
        except ValueError as exc:
            print(f'{n} failed: {exc}')   # handle per-task failure

Shared state is the catch¶

Threads share memory, which is convenient and dangerous. When two threads modify the same object, operations that look atomic often aren't. counter += 1 is really read, add, write — three steps — and two threads can interleave so an update is lost. This is a race condition.

In [ ]:

no-run

Copied!





counter = 0

def increment():
    global counter
    for _ in range(100_000):
        counter += 1        # NOT atomic: read, +1, write

threads = [threading.Thread(target=increment) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()

print('expected:', 4 * 100_000)
print('got:     ', counter)   # usually LESS — updates were lost
counter = 0

def increment():
    global counter
    for _ in range(100_000):
        counter += 1        # NOT atomic: read, +1, write

threads = [threading.Thread(target=increment) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()

print('expected:', 4 * 100_000)
print('got:     ', counter)   # usually LESS — updates were lost

The fix is a Lock: a flag that only one thread can hold at a time. Wrap the critical section in with lock: so updates can't interleave.

In [ ]:

no-run

Copied!





counter = 0
lock = threading.Lock()

def safe_increment():
    global counter
    for _ in range(100_000):
        with lock:          # only one thread in here at a time
            counter += 1

threads = [threading.Thread(target=safe_increment) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()

print('got:', counter)      # now exactly 400000
counter = 0
lock = threading.Lock()

def safe_increment():
    global counter
    for _ in range(100_000):
        with lock:          # only one thread in here at a time
            counter += 1

threads = [threading.Thread(target=safe_increment) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()

print('got:', counter)      # now exactly 400000

Locks fix correctness but cost speed — threads queue for the lock. The best defence is to avoid shared mutable state: have each task return a value and combine the results in the main thread (as the submit/map examples do), rather than having threads write into shared objects.

Recap¶

Threads overlap I/O-bound work; the GIL stops them helping CPU-bound work.
Prefer ThreadPoolExecutor over raw Thread. submit returns a Future; .result() gives the value and re-raises exceptions.
map for ordered results, as_completed for results as they finish.
Always retrieve results or failures stay hidden.
Shared mutable state needs a Lock — or, better, design it away.

Next: Processes and parallelism, for the CPU-bound work threads can't accelerate.

Threads and futures¶

A raw thread¶

ThreadPoolExecutor: the tool you'll actually use¶

submit many, collect results¶

map: simplest when inputs map to outputs¶

as_completed: handle results as they arrive¶

Exceptions surface at .result()¶

Shared state is the catch¶

Recap¶

`ThreadPoolExecutor`: the tool you'll actually use¶

`submit` many, collect results¶

`map`: simplest when inputs map to outputs¶

`as_completed`: handle results as they arrive¶

Exceptions surface at `.result()`¶