You have a function that processes 50,000 log entries. The ops team wants to know: is it fast enough? What do you do right now to measure it?
I record time.time() before and after the call and print the difference. start = time.time(); process_logs(entries); print(time.time() - start). It works but it's a single measurement — if the first run hits a cold cache it looks slower than it is.
timeit is the proper answer. It runs the statement many times, uses a high-resolution clock (time.perf_counter), disables garbage collection during the run, and gives you a reliable number:
import timeit
# Time a function call: number=10000 runs the statement 10,000 times
result = timeit.timeit(
    stmt="process_log_line(sample_line)",
    setup="from __main__ import process_log_line, sample_line",
    number=10000
)
print(f"Total: {result:.3f}s for 10000 runs")
print(f"Per call: {result/10000*1000:.3f}ms")
number=10000 means it runs the statement 10,000 times and reports the total. Dividing by 10,000 gives the average per-call time. But what's the setup parameter? Why do I need from __main__ import ...?
The setup runs once before the timed loop and is not included in the measurement. The stmt runs in a fresh namespace — your module's globals aren't automatically available. setup imports what the statement needs. In practice, using timeit.timeit() with a lambda is cleaner:
import timeit
# Using a callable — no import needed
def process_log_line(line):
    return len(line.split())

sample_line = "2026-04-07 ERROR auth: Token expired"
elapsed = timeit.timeit(lambda: process_log_line(sample_line), number=10000)
print(f"{elapsed/10000*1000:.4f}ms per call")
Lambda as the statement — the closure captures process_log_line and sample_line from the surrounding scope. No import dance needed. That's the version I'll actually use.
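If you do want the string form of stmt (say, to paste a snippet verbatim), passing globals=globals() (available since Python 3.5) avoids the setup import entirely. A minimal sketch, reusing the same hypothetical helper and sample line from above:

```python
import timeit

def process_log_line(line):
    return len(line.split())

sample_line = "2026-04-07 ERROR auth: Token expired"

# globals=globals() lets the stmt string see this module's names,
# so no setup="from __main__ import ..." is needed
elapsed = timeit.timeit(
    "process_log_line(sample_line)",
    globals=globals(),
    number=10000,
)
print(f"{elapsed/10000*1000:.4f}ms per call")
```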
And for comparing two implementations — old vs new — timeit.repeat() runs the timer multiple times and gives you the distribution:
import timeit
times = timeit.repeat(
    lambda: process_log_line(sample_line),
    number=10000,
    repeat=5
)
print(f"Best: {min(times)/10000*1000:.4f}ms")
print(f"Worst: {max(times)/10000*1000:.4f}ms")
Take the minimum across repeats — it's the most reliable estimate of actual performance. The worst value captures system interruptions, not your code's speed.
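Put together, an old-vs-new comparison looks like the sketch below. Both parsers are stand-ins for illustration; the point is the min-of-repeats comparison:

```python
import timeit

sample_line = "2026-04-07 ERROR auth: Token expired"

# Candidate A: the obvious str.split
def parse_split(line):
    return line.split()

# Candidate B: a hand-rolled character loop (deliberately naive)
def parse_manual(line):
    words, current = [], []
    for ch in line:
        if ch == " ":
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words

# Compare best-of-5 timings for each implementation
for name, fn in [("split", parse_split), ("manual", parse_manual)]:
    best = min(timeit.repeat(lambda: fn(sample_line), number=10000, repeat=5))
    print(f"{name}: {best/10000*1e6:.2f}us per call (best of 5)")
```

The decision rule is simple: both must produce the same output (verify first!), then the faster minimum wins.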
What about pprint? I've seen it but never understood when you'd use it over regular print.
pprint.pprint() is for inspecting complex data structures. When a nested dict or list spans multiple lines, regular print() gives you one undifferentiated wall of text. pprint formats it with indentation:
import pprint
log_summary = {
    "total": 50000,
    "errors": {"auth": 245, "api": 112, "db": 8},
    "warnings": {"auth": 78, "api": 1203, "db": 445},
    "slow_requests": [
        {"service": "db", "ms": 2341, "timestamp": "2026-04-07T09:14:33Z"},
        {"service": "api", "ms": 1205, "timestamp": "2026-04-07T09:15:01Z"},
    ]
}
pprint.pprint(log_summary, width=60, depth=2)
depth=2 limits how deep the nested structure is printed — useful for very deep structures where I only need the top-level shape. And width=60 controls the line width for wrapping. This is what I want when I'm debugging the output structure of a function, not for final output.
Exactly — pprint is a debugging microscope. You use it during development to see the shape of your data clearly. You don't ship it in production output. For production output, json.dumps(indent=2) or logging with formatted messages.
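A minimal sketch of that production-output path, reusing the shape of log_summary from above:

```python
import json

log_summary = {
    "total": 50000,
    "errors": {"auth": 245, "api": 112, "db": 8},
}

# indent=2 makes it human-legible; sort_keys makes output stable for diffs
print(json.dumps(log_summary, indent=2, sort_keys=True))
```

Unlike pprint output, this is valid JSON that downstream tools can parse back.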
pprint.pformat() returns the string instead of printing it — useful for logging: logger.debug(pprint.pformat(data)). That way the pretty-printed structure shows up in the debug log when I need it but doesn't pollute stdout.
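A runnable sketch of that logging pattern (the logger name and data here are illustrative):

```python
import logging
import pprint

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("demo")

data = {"errors": {"auth": 245, "api": 112}, "total": 50000}

# pformat returns a string, so it composes with any logger;
# the %s form defers formatting until the log level is active
logger.debug("summary:\n%s", pprint.pformat(data, width=40))
```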
pformat for logging, pprint for interactive debugging. You've just described professional debug workflow. Today's problem: use timeit to compare two implementations of the same parsing function, use pprint to inspect the output structure, and use the timing results to decide which to keep.
pprint and timeit are not general-purpose application modules — they are developer tools that belong in your workflow for understanding and measuring code, not in production output paths.
pprint.pprint(obj, width=80, depth=None, indent=1, compact=False) formats Python objects with indentation and line-breaking that makes nested structures legible. The default print() calls str() or repr() and outputs everything on one line. pprint inserts newlines and indentation based on the width constraint — if the object fits on one line within width characters, it stays on one line. If not, it's broken into multiple lines with indented children.
depth limits recursion depth: depth=2 shows the top two levels of nesting and replaces deeper structures with .... This is useful when you need the shape of a deeply nested object without seeing every leaf value.
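A quick demonstration of that truncation, with a made-up nested structure:

```python
import pprint

nested = {"a": {"b": {"c": {"d": 1}}}, "x": [1, [2, [3]]]}

# depth=2 keeps two levels of nesting; anything deeper collapses to '...'
s = pprint.pformat(nested, depth=2)
print(s)
```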
pprint.pformat(obj) returns the formatted string instead of printing it. This makes it composable with logger.debug(), assert, and any other string-consuming context.
time.time() before and after a call measures a single wall-clock interval, including OS scheduling jitter; for one-off interval timing, time.perf_counter() is the better clock because it is monotonic and high-resolution. timeit mitigates jitter by running the statement many times and reporting the total. The minimum of multiple repeat() runs is the standard recommendation — it represents the fastest your hardware can execute the code, with minimal interference from other processes.
The number parameter controls how many times stmt runs per timing call. For fast operations (microseconds), use number=100000 or more. For slow operations (milliseconds), number=100 is sufficient. Multiply total time by 1/number to get average per-call time; multiply by 1000 for milliseconds.
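If you'd rather not guess number, timeit.Timer.autorange() (Python 3.6+) picks it for you by increasing the count until the total run takes at least 0.2 seconds. A sketch with a throwaway workload:

```python
import timeit

def work():
    return sum(range(100))

# autorange picks a number large enough that the total time is >= 0.2s
timer = timeit.Timer(work)
number, total = timer.autorange()
print(f"number={number}, {total/number*1e9:.1f}ns per call")
```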
timeit.timeit(lambda: f(x), number=N) is the idiomatic form for timing function calls in interactive and test contexts. The lambda closure captures local variables, eliminating the setup parameter's import ceremony. The overhead of the lambda call itself is included in the measurement — it's consistent and negligible for calls that take more than a few nanoseconds.
Profile before optimizing. Identify the function that dominates runtime by measuring the whole pipeline, then the major stages, then individual functions. timeit is for comparing specific alternatives (regex vs string split, dict vs defaultdict) once you've identified the hot path. Optimize where the data tells you to, not where intuition suggests.
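A minimal sketch of that top-down workflow using cProfile from the standard library; pipeline and parse are stand-ins for your real stages:

```python
import cProfile
import io
import pstats

def parse(line):
    return len(line.split())

def pipeline(lines):
    return [parse(l) for l in lines]

lines = ["2026-04-07 ERROR auth: Token expired"] * 10000

# Profile the whole pipeline first, then read off the hot functions
pr = cProfile.Profile()
pr.enable()
pipeline(lines)
pr.disable()

out = io.StringIO()
pstats.Stats(pr, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```

Once the profile names the hot function, reach for timeit to compare specific alternative implementations of just that function.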