Concurrency models

Before any syntax, one idea decides everything that follows: what is your program actually doing while it's slow? Almost all slowness is one of two kinds. Either your program is waiting — for a web server, a disk, a database — or it's computing — running a loop that pins a CPU core flat out. These are called I/O-bound and CPU-bound work, and they want completely different tools.

This notebook builds the mental model. By the end you'll be able to look at a slow piece of code and say which of Python's three concurrency tools — threads, processes, or async/await — is the right one, and why.

Concurrency is not parallelism

The two words get used interchangeably, but they mean different things, and the difference is the whole topic in miniature.

Concurrency is structuring a program as multiple tasks that can make progress independently. They take turns. At any single instant only one might be running — but because they hand off whenever one is stuck waiting, the whole set finishes sooner. One cook juggling four pans.
Parallelism is actually doing more than one thing at the same instant, on more than one CPU core. Four cooks, four pans, four hobs.

You can have concurrency without parallelism (one core, tasks taking turns), and parallelism is just one way to run concurrent tasks. The reason this matters: in standard CPython, async and threads give you concurrency but, for pure computation, not parallelism. Only processes give you real parallelism for Python code. The next sections explain why.

The distinction that drives everything: I/O-bound vs CPU-bound

I/O-bound work spends most of its time waiting for something outside the program — a network response, a disk read, a user. The CPU is idle during the wait. If you have a hundred such tasks, they can all wait at the same time; the waiting overlaps for free.

CPU-bound work spends its time computing — hashing, parsing, number-crunching, image processing. The CPU is busy the whole time. The only way to do more of it per second is to use more cores.

Let's make the distinction concrete with two functions that each take about the same wall-clock time but for opposite reasons.

import time

def io_task(n):
    """I/O-bound: mostly waiting. We simulate a network call with sleep."""
    time.sleep(0.2)          # the CPU does nothing during this 0.2s
    return n

def cpu_task(n):
    """CPU-bound: the core is busy the whole time."""
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total

start = time.perf_counter()
io_task(0)
print(f'one io_task : {time.perf_counter() - start:.2f}s')

start = time.perf_counter()
cpu_task(0)
print(f'one cpu_task: {time.perf_counter() - start:.2f}s')

Run four of each, one after another, and they both take roughly four times as long. Sequential code doesn't care why something is slow — it just does one thing, then the next.

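start = time.perf_counter()
for n in range(4):
    io_task(n)
print(f'4 io_tasks sequentially : {time.perf_counter() - start:.2f}s')
start = time.perf_counter()
for n in range(4):
    cpu_task(n)
print(f'4 cpu_tasks sequentially: {time.perf_counter() - start:.2f}s')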

That 0.8s is almost entirely waiting. Nothing was computed — the program just sat there four times. This is the slack that concurrency reclaims: while one io_task waits, another could be waiting too. Overlap the four waits and the whole batch takes about as long as one of them.
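
The overlap described above can be sketched with a thread pool. This is a minimal illustration, not a recommended configuration: it redefines the same simulated `io_task` so the cell stands alone, and the pool size of 4 is an arbitrary choice that happens to match the number of tasks.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def io_task(n):
    """I/O-bound stand-in: the thread sleeps, so the CPU is free."""
    time.sleep(0.2)
    return n

start = time.perf_counter()
# Four worker threads: each one starts its 0.2s wait right away,
# so the four waits overlap instead of queuing up.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(io_task, range(4)))
elapsed = time.perf_counter() - start

print(f'4 io_tasks in 4 threads : {elapsed:.2f}s')
print(f'results                 : {results}')
```

On most machines the batch should finish in roughly the time of a single wait, because the threads spend nearly all of their time sleeping rather than competing for the CPU.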

The CPU-bound batch is different. There's no slack to reclaim — the core is already busy. Running four of them "at once" on a single core just means slicing the same core four ways; the total work, and total time, is unchanged. To finish faster you need more cores.
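
As a rough sketch of what "more cores" looks like in practice, the cell below runs the same CPU-bound job sequentially and then in worker processes. Several things here are assumptions made for the demo: the workload is the built-in `sum` over a `range` rather than `cpu_task`, because functions defined in a notebook cell cannot always be sent to worker processes on platforms that use the spawn start method; `N`, `run_sequential`, and `run_in_processes` are hypothetical names; and the speed-up you see depends on how many cores are free.

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor

N = 10_000_000  # hypothetical workload size; tune for your machine

def run_sequential(chunks):
    """Sum each chunk one after another on a single core."""
    return [sum(chunk) for chunk in chunks]

def run_in_processes(chunks, workers):
    """Sum each chunk in a separate worker process."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # `sum` is importable from builtins, so workers can always find it.
        return list(pool.map(sum, chunks))

if __name__ == "__main__":  # needed when saved as a script on spawn-based platforms
    chunks = [range(N)] * 4
    workers = min(4, os.cpu_count() or 1)

    start = time.perf_counter()
    expected = run_sequential(chunks)
    print(f'4 sums sequentially   : {time.perf_counter() - start:.2f}s')

    start = time.perf_counter()
    results = run_in_processes(chunks, workers)
    print(f'4 sums in {workers} processes: {time.perf_counter() - start:.2f}s')

    assert results == expected  # same answers, computed on separate cores
```

Starting processes and shipping arguments between them has a cost, so for very small jobs the process version can be slower than the sequential one.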

The GIL, in one paragraph

CPython has a Global Interpreter Lock (the GIL): a mutex that lets only one thread execute Python bytecode at a time. So Python threads do not run Python code in parallel — even on a 16-core machine, two threads doing pure computation take turns on one core. This sounds fatal for threads, and for CPU-bound work it is. But there's a crucial exception: a thread releases the GIL while it waits on I/O (and inside many C extensions). So while one thread is blocked on a socket, another thread runs. That single fact is why threads are excellent for I/O-bound work and useless for CPU-bound work. The GIL concept essay goes deeper, including the new free-threaded builds that remove it.
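
One way to see the GIL's effect is to time the same CPU-bound batch sequentially and in threads. The sketch below assumes the `cpu_task` shape used earlier and a hypothetical `timed` helper. It also checks `sys._is_gil_enabled()`, which only exists on Python 3.13 and later, so the lookup is guarded with `getattr`. No particular timing is promised; on a standard build the two numbers are usually close.

```python
import sys
import time
from concurrent.futures import ThreadPoolExecutor

def cpu_task(n):
    """CPU-bound: a pure-Python loop that keeps one core busy."""
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total

def timed(fn):
    """Hypothetical helper: run fn() and return (result, seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

def in_threads():
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(cpu_task, range(4)))

sequential, t_seq = timed(lambda: [cpu_task(n) for n in range(4)])
threaded, t_thr = timed(in_threads)

# sys._is_gil_enabled is only present on Python 3.13+; older versions always have a GIL.
gil_check = getattr(sys, "_is_gil_enabled", None)
gil_on = True if gil_check is None else gil_check()

print(f'GIL enabled             : {gil_on}')
print(f'4 cpu_tasks sequentially: {t_seq:.2f}s')
print(f'4 cpu_tasks in 4 threads: {t_thr:.2f}s')
```

With the GIL enabled, the threaded run takes turns on one core, so it should not be meaningfully faster than the sequential run. On a free-threaded build the threaded figure may drop.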

Python's three tools, and the work each one fits

With that distinction in hand, the three tools sort themselves out:

| Tool | Gives you | GIL? | Best for |
| --- | --- | --- | --- |
| Threads (`threading`, `ThreadPoolExecutor`) | Concurrency, shared memory | Held: one thread runs Python at a time | I/O-bound work, a moderate number of tasks |
| Processes (`multiprocessing`, `ProcessPoolExecutor`) | True parallelism, isolated memory | Sidestepped: each process has its own | CPU-bound work |
| `async`/`await` (`asyncio`) | Concurrency, one thread, cooperative | Held, but irrelevant with only one thread | I/O-bound work at high volume (thousands of connections) |

Notice that two of the three tools target I/O-bound work. That's not redundancy: threads suit a few dozen blocking calls and existing blocking libraries; async suits thousands of connections and code written to cooperate. Only processes target CPU-bound work, because only processes give you more than one core's worth of Python.
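
To give a feel for the high-volume case, here is a minimal sketch of a thousand simulated waits on a single thread with `asyncio`. The names `async_io_task` and `main` are hypothetical, and `asyncio.sleep` stands in for a real non-blocking network call.

```python
import asyncio
import time

async def async_io_task(n):
    """Simulated network call: awaiting hands control back to the event loop."""
    await asyncio.sleep(0.2)
    return n

async def main(count):
    # gather schedules every coroutine at once and returns results in input order.
    return await asyncio.gather(*(async_io_task(n) for n in range(count)))

start = time.perf_counter()
# In a notebook an event loop is already running, so use `await main(1000)` there instead.
results = asyncio.run(main(1000))
elapsed = time.perf_counter() - start

print(f'{len(results)} waits on one thread: {elapsed:.2f}s')
```

A thousand threads would each need their own stack and a share of the scheduler's attention. Here each pending wait is just a small object parked in the event loop, which is why async scales to connection counts where threads become awkward.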

A decision in three questions

1. Is the work CPU-bound or I/O-bound? If the core is busy the whole time, it's CPU-bound → reach for processes. Everything else is I/O-bound.
2. For I/O-bound work, how many tasks at once? A handful to a few hundred, often using ordinary blocking libraries → threads (ThreadPoolExecutor) are simplest. Many hundreds to thousands of simultaneous connections → async.
3. Do you control the code being called? async only helps if the libraries you call are themselves async (or you push blocking calls to a thread). If you're stuck with blocking libraries, threads are the pragmatic choice.
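
The three questions can be written down as a small function. This is only a sketch of the reasoning, not a rule to apply mechanically: `choose_tool` and its parameters are hypothetical, and the `async_threshold` of 500 is an assumed stand-in for "many hundreds", not a measured cut-off.

```python
def choose_tool(cpu_bound, simultaneous_tasks=1, libraries_are_async=False,
                async_threshold=500):
    """Return 'processes', 'threads', or 'async' following the three questions.

    async_threshold is an assumed illustrative value, not a benchmarked limit.
    """
    # Question 1: CPU-bound work needs more cores, which means processes.
    if cpu_bound:
        return "processes"
    # Questions 2 and 3: async pays off at high volume, and only if the
    # libraries being called cooperate with the event loop.
    if simultaneous_tasks >= async_threshold and libraries_are_async:
        return "async"
    # Otherwise threads are the simplest fit, including for blocking libraries.
    return "threads"

print(choose_tool(cpu_bound=True))
print(choose_tool(cpu_bound=False, simultaneous_tasks=20))
print(choose_tool(cpu_bound=False, simultaneous_tasks=5000, libraries_are_async=True))
print(choose_tool(cpu_bound=False, simultaneous_tasks=5000, libraries_are_async=False))
```

The four calls print `processes`, `threads`, `async`, and `threads` in that order. The last case reflects the third question: high volume alone does not justify async when the libraries block.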

A frequent and correct answer is "none yet": if the program is fast enough, adding concurrency only adds bugs. Concurrency is a tool for a measured problem, not a default.
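
One hedged way to measure which kind of slow you have is to compare wall-clock time with CPU time for the same call. In the sketch below, `profile` and `classify` are hypothetical helpers, and the 0.5 threshold is an assumption chosen for illustration. `time.process_time` counts CPU time used by the whole process and excludes time spent sleeping, so the comparison is cleanest for single-threaded code.

```python
import time

def io_task(n):
    time.sleep(0.2)  # waiting: wall-clock time passes, CPU time barely moves
    return n

def cpu_task(n):
    total = 0
    for i in range(2_000_000):  # computing: wall-clock and CPU time advance together
        total += i * i
    return total

def profile(fn, *args):
    """Hypothetical helper: return (wall_seconds, cpu_seconds) for one call."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn(*args)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu

def classify(wall, cpu, threshold=0.5):
    """Label a call by the share of wall time spent on the CPU (assumed threshold)."""
    if wall <= 0:
        return "too fast to measure"
    return "CPU-bound" if cpu / wall >= threshold else "I/O-bound"

for task in (io_task, cpu_task):
    wall, cpu = profile(task, 0)
    print(f'{task.__name__:8s} wall={wall:.2f}s cpu={cpu:.2f}s -> {classify(wall, cpu)}')
```

If CPU time is close to wall time, the code is computing and more threads will not help. If CPU time is a small fraction of wall time, the code is mostly waiting and the waits are candidates for overlap.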

What's next

The remaining three notebooks take the tools in order of how often you'll reach for them:

Threads and futures — the everyday I/O-bound workhorse, via the friendly concurrent.futures interface.
Processes and parallelism — real multi-core computation, and the rules processes impose.
Async and await — single-threaded concurrency for high-volume I/O.

Keep the I/O-versus-CPU question in your head as you go; every design choice in the next three notebooks traces back to it.