You brought the inventory script.
28 seconds to process the weekly export. I optimized the loops twice, added list comprehensions everywhere I could find them. Two seconds faster. I still have no idea where the other 26 went.
You diagnosed before you measured. That is the exact mistake we are fixing this week. When you decided to optimize the loops — what told you the loops were the problem?
Three nested loops in the main function. Long loops feel slow. I assumed that was where the time was going.
A doctor sees a patient with fatigue and a slow pulse. Before prescribing anything, they run tests. Blood panel. They do not start treatment based on a feeling about what is probably wrong. You felt the loops were slow. You treated the loops. The patient still runs for 28 seconds. Today you learn to take the temperature first. The tool is timeit — not the bottleneck finder, that is cProfile tomorrow. timeit is the instrument you use when you have two ways to do something and need to know which one is actually faster:
import timeit

data = list(range(10000))

def build_with_loop():
    s = set()
    for item in data:
        s.add(item)
    return s

time_a = timeit.timeit(build_with_loop, number=1000)
time_b = timeit.timeit(lambda: set(data), number=1000)
print(f"Loop: {time_a:.4f}s")
print(f"set(): {time_b:.4f}s")
print(f"Speedup: {time_a / time_b:.1f}x")

The number parameter runs the call that many times?
timeit.timeit runs your callable number times and returns the total elapsed time. A single call is too noisy — garbage collection, OS scheduling, cache state all add variance. Run it a thousand times and the noise averages out. The result: set(data) is roughly 4x faster than the loop version on 10,000 items. Because the loop calls .add ten thousand times — ten thousand Python method lookups, stack frames, attribute resolutions. set() hands the list to a C constructor that iterates it at native speed. You cannot see that from the syntax.
4x faster for something I would have called equivalent. I would have picked whichever version was more readable and assumed the performance was the same.
Most developers do. Intuition about Python performance is wrong more often than it is right. For more statistical confidence, use timeit.repeat:
import timeit, statistics

results = timeit.repeat(lambda: set(data), number=100, repeat=5)
print(f"Min: {min(results):.4f}s")  # most meaningful
print(f"Median: {statistics.median(results):.4f}s")
print(f"Stdev: {statistics.stdev(results):.4f}s")

Use the minimum. External interruptions only make a run slower, never faster. The fastest run is closest to the true cost of the code.
Does timeit disable garbage collection? I assumed the noise included GC pauses.
It does. timeit.timeit disables the GC during the run. That is one reason it is more reliable than wrapping code in time.time() calls — the approach you were using with your timing_utils.py module.
I have a file called timing_utils.py that wraps time.time() before and after a call. I thought it was professional.
It is professional in the same way a sundial is professional — works, on a clear day, if you are not in a hurry. Your wrapper includes GC pauses, function call overhead, and gives one noisy measurement. timeit excludes GC, runs in a tight loop, and averages hundreds of measurements. Now — the second use: calibration. You need a function to run in under 10ms for your API SLA. You write it, you timeit it, you discover it takes 40ms. You know you have a real problem before the function ships. Not because a customer complained — because you measured.
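The calibration use could look like this sketch. Everything here is a made-up stand-in: process_record, the record size, and the 10ms budget are assumptions for illustration, not part of any real SLA.

```python
import timeit

def process_record(record):
    # Hypothetical function under an (assumed) 10ms-per-call budget.
    return sorted(record)  # stand-in for the real work

record = list(range(50_000))

runs = 100
total = timeit.timeit(lambda: process_record(record), number=runs)
per_call_ms = total / runs * 1000

budget_ms = 10.0  # assumed SLA budget
print(f"per call: {per_call_ms:.2f}ms (budget {budget_ms}ms)")
if per_call_ms > budget_ms:
    print("over budget -- optimize before shipping")
```

The point is the workflow, not the numbers: the check runs before the function ships, so a budget violation surfaces as a measurement, not a customer complaint.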
Two uses: head-to-head comparison between implementations to pick the faster one, and calibration to verify a function meets a performance requirement before it ships. timeit needs a hypothesis — I have to know which functions to test. It cannot tell me where the 26 seconds are going without a suspect to benchmark.
Exactly the limit. timeit is a scalpel. You use it when you know which operation is the patient. Tomorrow is cProfile — the full blood panel. Run it on your entire script. Get a table of every function called, how many times, and how much time each took. No hypothesis required. Bring the inventory script.
timeit Eliminates Measurement Noise

timeit is not just a convenience wrapper around time.time(). It makes several deliberate choices to produce measurements that are repeatable and comparable.
GC disabling. By default, timeit.timeit calls gc.disable() before the timed loop and restores the GC state afterward. This prevents garbage collection pauses from inflating measurements. The tradeoff: if your code allocates heavily and the GC would normally run during the loop, you are measuring a slightly unrealistic scenario. For most microbenchmarks, the GC-disabled case is what you want — it measures the code itself, not the interaction between the code and the collector.
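That tradeoff can be measured directly. The timeit documentation notes that putting gc.enable() in the setup string re-enables the collector during the timed loop (gc is available in the default execution namespace). A sketch comparing the two modes on an allocation-heavy statement:

```python
import timeit

# Allocation-heavy statement, so the collector has work to do.
stmt = "[list(range(100)) for _ in range(100)]"

# Default behavior: GC disabled during the timed loop.
gc_off = timeit.timeit(stmt, number=200)

# Re-enable the collector via setup, per the timeit documentation.
gc_on = timeit.Timer(stmt, setup="gc.enable()").timeit(number=200)

print(f"GC disabled: {gc_off:.4f}s")
print(f"GC enabled:  {gc_on:.4f}s")
```

If the two numbers diverge noticeably, your code's real-world cost includes collector interaction that the default measurement hides.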
Why the minimum is the right statistic. The timeit.repeat function returns a list of total times, one per trial. The CPython documentation explicitly recommends using min() rather than mean() or median(). The reasoning: external noise (OS scheduling, cache misses, other processes) only adds time to a run, never removes it. The fastest run is the one where Python had the cleanest access to CPU and cache. That is the ground truth cost of the code. The slower runs are the code plus noise. Using mean() includes the noise in your reported number, which makes comparisons between two approaches less reliable — the noisier measurement looks slower than it really is.
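A toy model makes the one-sided-noise argument concrete. This is not a real benchmark: the 1.0s "true cost" and the noise distribution are invented purely to show why min() tracks the true cost while mean() absorbs the noise.

```python
import random
import statistics

random.seed(0)
true_cost = 1.0  # invented "ground truth" cost of the code

# Noise is one-sided: interruptions add time, never subtract it.
trials = [true_cost + random.expovariate(5) for _ in range(5)]

print(f"min:  {min(trials):.3f}s")              # closest to true_cost
print(f"mean: {statistics.mean(trials):.3f}s")  # inflated by noise
```

Because every noise term is non-negative, min() is always at least the true cost and never above the mean, which is exactly the property the documentation's recommendation relies on.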
The setup parameter and closure capture. The string-based interface (stmt="...", setup="...") evaluates the setup string once and the statement string number times, both in a clean namespace. The lambda interface captures variables from the enclosing scope. Both approaches avoid counting setup time, but the string interface has a subtle advantage: it avoids the lambda call overhead on each iteration. For tight microbenchmarks where the operation itself takes microseconds, the lambda's __call__ overhead can be a measurable fraction of the total. For real-world function benchmarks, the lambda interface is cleaner and the overhead is negligible.
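A sketch of the string interface, mirroring the earlier set(data) benchmark: setup is evaluated once and excluded from the timing, stmt runs number times, and there is no per-iteration lambda call.

```python
import timeit

total = timeit.timeit(
    stmt="set(data)",                   # timed `number` times
    setup="data = list(range(10000))",  # evaluated once, not timed
    number=1000,
)
print(f"{total:.4f}s total, {total / 1000 * 1e6:.1f}us per call")
```

The lambda equivalent would add one Python function call per iteration; for an operation this cheap, that overhead is visible, which is why the string form is preferred for the tightest microbenchmarks.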
When timeit is the wrong tool. timeit measures one isolated operation in a synthetic loop. It does not measure the interaction between parts of a real program — memory pressure, cache effects from preceding operations, I/O interleaving. A function that takes 2μs in a timeit loop may take 20μs in a real program if the cache is cold from unrelated operations. timeit answers "how fast is this operation under ideal conditions." cProfile answers "how much time does this operation consume in the actual program." Use both: cProfile to find where time goes, timeit to verify that your optimization made it faster.