multiprocessing: True Parallelism for CPU-Bound Python
The GIL blocks threading for CPU-bound work. Multiprocessing spawns separate processes, enabling true parallelism for CPU-bound tasks like batch order price calculations.
Yesterday I learned threading solves I/O-bound problems — you make requests, threads wait in parallel, GIL releases during the I/O block, requests come back. But this morning I ran the price recalculation on 100,000 orders with threading and it was actually slower than sequential.
That is the GIL at work. Your price calculation does not hit the network or disk. It is pure CPU: loop through items, apply discounts, add taxes, accumulate the result. That is Python bytecode executing continuously. The GIL will not release.
So you're saying the threads are all waiting to execute in the same process, fighting over one Python interpreter lock? Even though I have a multicore machine?
Exactly. The GIL ensures only one thread can execute Python bytecode at any moment. If threads are doing CPU work — not waiting on I/O — they just trade time slices and fight each other for the lock. You get context-switching overhead without any parallelism. The threaded version ends up slower than sequential, and it looks broken.
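The effect is easy to see with a quick benchmark. A sketch — the function cpu_work and the loop sizes are invented for illustration; on a standard CPython build the threaded run is typically no faster than sequential, and often slower:

```python
import time
from threading import Thread

def cpu_work(n):
    # Pure-Python arithmetic: the GIL never releases during this loop
    total = 0
    for i in range(n):
        total += i * i
    return total

def run_sequential(n, workers):
    for _ in range(workers):
        cpu_work(n)

def run_threaded(n, workers):
    # Four threads, but only one can hold the GIL at a time
    threads = [Thread(target=cpu_work, args=(n,)) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

if __name__ == '__main__':
    N = 2_000_000
    for label, fn in [("sequential", run_sequential), ("threads", run_threaded)]:
        start = time.perf_counter()
        fn(N, 4)
        print(f"{label}: {time.perf_counter() - start:.2f}s")
```

The timings vary by machine, but the point is the shape of the result: adding threads does not divide the CPU time by four.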
What if you do not use threads at all? What if you started a whole separate Python interpreter process for each batch of orders?
Now you understand the problem multiprocessing solves. Instead of threads in one interpreter, you spawn actual separate processes — each with its own Python interpreter, its own GIL. They can run in parallel on multiple cores. True parallelism.
But that sounds expensive. Starting a whole separate process just to calculate prices?
It is expensive. Each process carries overhead: startup time, memory for its own interpreter state, communication overhead to pass data between processes. But if the work is heavy enough — calculating prices for 10,000 orders, image processing, numerical computation — the CPU-bound work eventually pays back the overhead cost and then dominates.
So you only use multiprocessing if the workload is actually substantial. How do you know the breakeven point?
You measure. Benchmark your actual work on your actual machine. But the rule of thumb is: if a single order takes under a millisecond, multiprocessing overhead will kill you. If it takes ten or more milliseconds per order, multiprocessing likely wins. And Python's multiprocessing.Pool is built exactly for this problem.
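A minimal way to find your own breakeven is to time both paths on representative data. The heavy_price function and loop counts below are invented stand-ins for real per-order work, and the printed numbers depend entirely on your machine:

```python
import time
from multiprocessing import Pool

def heavy_price(order):
    # Invented stand-in for real per-order work: a few ms of pure CPU
    total = order['total']
    for _ in range(50_000):
        total = total * 1.000001
    return total

if __name__ == '__main__':
    orders = [{'total': 100.0} for _ in range(500)]

    start = time.perf_counter()
    sequential = [heavy_price(o) for o in orders]
    seq_time = time.perf_counter() - start

    start = time.perf_counter()
    with Pool(processes=4) as pool:
        parallel = pool.map(heavy_price, orders)
    pool_time = time.perf_counter() - start

    print(f"sequential: {seq_time:.2f}s  pool: {pool_time:.2f}s")
```

If you shrink the inner loop until each call is sub-millisecond, you will watch the pool version lose — that is the breakeven point made visible.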
Pool? Like a pool of worker processes?
Exactly. You create a Pool, give it a batch of work and a function, and it distributes the work across processes. Instead of you spawning processes manually, the Pool manages the workers, reuses them, and gives you back the results.
from multiprocessing import Pool

def calculate_order_price(order):
    subtotal = order['total']
    discount = subtotal * order['discount']
    after_discount = subtotal - discount
    tax = after_discount * 0.1
    final = after_discount + tax
    return final

if __name__ == '__main__':
    orders = [
        {'id': 'ORD-001', 'total': 100.0, 'discount': 0.1},
        {'id': 'ORD-002', 'total': 200.0, 'discount': 0.2},
        {'id': 'ORD-003', 'total': 50.0, 'discount': 0.05},
    ]
    with Pool(processes=4) as pool:
        prices = pool.map(calculate_order_price, orders)
    print(prices)  # [99.0, 176.0, 52.25]
Notice the if __name__ == '__main__': guard. That is not optional with multiprocessing. When a child process starts under the spawn start method (the default on Windows and macOS), it re-imports your module to bootstrap itself. Without the guard, you would spawn an infinite cascade of processes.
So pool.map() is like the built-in map() function, but it runs the function in multiple processes?
Exactly analogous. map(func, items) applies func to each item sequentially. pool.map(func, items) distributes the items across worker processes and returns results in the same order. The API is almost identical — the execution model is completely different.
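The symmetry is easy to check directly. A toy example — square is just a placeholder function:

```python
from multiprocessing import Pool

def square(n):
    return n * n

if __name__ == '__main__':
    nums = [1, 2, 3, 4]
    sequential = list(map(square, nums))   # built-in map: one process, one at a time
    with Pool(processes=2) as pool:
        parallel = pool.map(square, nums)  # same call shape, worker processes
    print(sequential == parallel)  # True
```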
What happens if one of the orders is missing the 'total' key?
The worker process will raise a KeyError. The Pool will re-raise that exception in your main process, and the entire pool.map() call fails. You handle it like any other error — try-except, validation before sending to the Pool, whatever is appropriate.
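One pattern is simply to wrap the pool.map call, since the worker's exception surfaces in the parent. A sketch reusing the price function from above, with a deliberately broken order:

```python
from multiprocessing import Pool

def calculate_order_price(order):
    subtotal = order['total']  # raises KeyError if 'total' is missing
    discount = subtotal * order['discount']
    after_discount = subtotal - discount
    tax = after_discount * 0.1
    return after_discount + tax

if __name__ == '__main__':
    orders = [
        {'total': 100.0, 'discount': 0.1},
        {'discount': 0.2},  # broken order: no 'total' key
    ]
    try:
        with Pool(processes=2) as pool:
            prices = pool.map(calculate_order_price, orders)
    except KeyError as e:
        # The exception raised in the worker is re-raised here, in the parent
        print(f"bad order reached a worker: missing key {e}")
```

Note that one bad item fails the whole pool.map call — if you need per-item error results, validate up front or return an error marker from the worker instead of raising.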
So the data has to be serializable. The order dict has to be passed to another process.
That is the hidden cost. Multiprocessing uses pickle to serialize every input item (the function itself is usually pickled by reference, as its module and name), send it through a pipe to the worker process, deserialize it, run the function, serialize the result, send it back, and deserialize again in the parent. This is the "carrying ingredients between kitchens" I mentioned. For simple dicts and numbers, it is fine. For complex objects with custom state, or unpicklable objects like file handles or network sockets, it breaks.
import pickle
import socket
# This will fail
s = socket.socket()
pickle.dumps(s) # TypeError: cannot pickle '_socket.socket' object
So you cannot pass a database connection to a worker process?
Not the way you think. The connection dies during pickling. What you do instead is have each worker open its own connection inside the worker process, after it starts. The parent sends only the data — orders, IDs — not the connections.
This is starting to sound complicated. "Do not pickle database connections. Reconnect inside the worker. Also do not forget the if __name__ == '__main__' guard or you will explode the process tree."
Welcome to multiprocessing. It is powerful and it has footguns. There is a reason people reach for async first — fewer moving parts. But when you have genuine CPU-bound work, this is the tool that actually solves it.
Okay — so my use case: 100,000 orders, each one a dict, each one needs discount + tax calculation. That should serialize fine and the work is heavy enough to pay back the overhead.
That is a textbook multiprocessing case. You would do something like:
from multiprocessing import Pool

def calculate_order_price(order):
    subtotal = order['total']
    discount = subtotal * order['discount']
    after_discount = subtotal - discount
    tax = after_discount * 0.1
    final = after_discount + tax
    return final

def calculate_batch_prices(orders, num_workers=4):
    if not orders:
        return []
    with Pool(processes=num_workers) as pool:
        return pool.map(calculate_order_price, orders)

if __name__ == '__main__':
    orders = [
        {'id': 'ORD-001', 'total': 100.0, 'discount': 0.1},
        {'id': 'ORD-002', 'total': 200.0, 'discount': 0.2},
        {'id': 'ORD-003', 'total': 50.0, 'discount': 0.05},
        {'id': 'ORD-004', 'total': 150.0, 'discount': 0.15},
    ]
    prices = calculate_batch_prices(orders)
    print(prices)  # [99.0, 176.0, 52.25, 140.25]
The Pool is a context manager — it automatically waits for all worker processes to finish and cleans them up when you exit the with block.
What is starmap()? I saw it in the docs.
map() passes each item as a single argument. starmap() unpacks each item as multiple arguments. If you had orders with nested structure, you might do:
def calculate_prices_for_customer(customer_id, order_ids, discount_rate):
    # Process all orders for one customer with a specific rate
    pass

with Pool(processes=4) as pool:
    results = pool.starmap(calculate_prices_for_customer, [
        ('CUST-001', ['ORD-001', 'ORD-002'], 0.1),
        ('CUST-002', ['ORD-003', 'ORD-004'], 0.2),
    ])
Each tuple in the list becomes separate arguments to the function. But for your use case — one order at a time — map() is what you want.
If I am using 4 worker processes and I have 100,000 orders, they all get processed eventually, right? The workers keep taking new orders until the queue is empty?
Exactly. The Pool queues all 100,000 items. Workers take an order, process it, return the result, and take the next one. The with block does not return until all items are processed. You get back a list of 100,000 prices in the same order as the input orders.
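For inputs that large, it is worth knowing about pool.map's chunksize parameter, which controls how many items travel per pickle round-trip. A sketch — the chunk size of 1,000 is just an illustration, and the right value is found by measuring:

```python
from multiprocessing import Pool

def calculate_order_price(order):
    after_discount = order['total'] - order['total'] * order['discount']
    return after_discount + after_discount * 0.1

if __name__ == '__main__':
    orders = [{'total': float(i), 'discount': 0.1} for i in range(100_000)]
    with Pool(processes=4) as pool:
        # chunksize batches items per round-trip: bigger chunks cut
        # serialization overhead, smaller chunks balance load more evenly
        prices = pool.map(calculate_order_price, orders, chunksize=1_000)
    print(len(prices))  # 100000
```

Pool picks a chunksize heuristically when you omit it, so this is a tuning knob, not a requirement.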
So the cost-benefit is: pay upfront overhead for process startup and pickle serialization, but get true parallelism. If the work is light, you lose. If the work is heavy, you win.
That is the precise tradeoff. And you have three concurrency models now: threading for I/O, multiprocessing for CPU, and async for I/O concurrency at scale without threads. Each one is the right answer to a different type of problem.
What about the third one? You mentioned it before.
Tomorrow we get to async and await — the syntax that lets you write code that looks sequential but is actually driven by event loops, managing I/O concurrency without threads. It is the most flexible model once you understand it. But it requires you to break your code into pieces and surrender control to the loop. Threading and multiprocessing let you write straight-line code and let Python handle the concurrency. Async requires you to think asynchronously from the start.
So I learn threading for I/O on Day 10 — got it. Day 11 is multiprocessing for CPU — got it. Day 12 is async for... both? But harder?
Harder to understand, more flexible once you do. But yes — three tools, three mental models, and by end of week you will know which hammer to reach for depending on the nail.