Tell me exactly what you did when the script got slow. Walk me through the steps.
I read the code. I found the part that felt most expensive — the loop that builds the summary dict — and I rewrote it. Tighter logic, fewer assignments. Still thirty seconds. Then I moved the database call outside the loop because I had seen that pattern fix things before. Still thirty seconds. Then I switched the deduplication from a list to a set because sets have O(1) lookup. Thirty seconds. Three days of changes and I am exactly where I started.
You optimized three things. You measured none of them. That is the fundamental problem — you were doing surgery on organs you had not examined. You rewrote the dict loop because it felt expensive. You moved the database call because you had seen that fix things before. You switched to a set because sets are faster. None of those decisions came from data. They came from intuition. And the bottleneck is almost never where intuition points.
But I know from experience that those are common performance problems. Database calls in loops are slow. List lookups are O(n). These are real issues.
They are real issues in code that has those issues. The question is whether your code has them. cProfile would have answered that in one command. Sort by cumulative time. Look at the top five functions. The function taking 80% of the runtime is your bottleneck. Everything else is noise. You spent three days optimizing noise.
What if cProfile had shown me that the database call was the bottleneck? Would I have found the same answer eventually?
Possibly. But consider what you actually did: you spent three days making changes that did not help, and you still do not know what is slow. With cProfile you would have had the answer in five minutes. Even if your intuition had been right — even if the database call had been the bottleneck — the profiler would have confirmed it, and the confirmation matters. You need to stop doing optimization that you cannot measure, because optimization you cannot measure is not optimization. It is guessing with extra steps.
This week I learn timeit, cProfile, memory profiling, string performance, and caching. What I want by the end of it is to never spend three days guessing again. I want to be able to open a slow script and have a diagnosis in under ten minutes.
That is exactly the right goal. And it is achievable. By Day 22 you will have the full toolkit: timeit for micro-benchmarks, cProfile for application-level bottlenecks, generators and slots for memory, join patterns for string performance, and lru_cache for expensive repeated calls. The discipline is simpler than the tools: measure first, optimize second, measure again to confirm. Evidence, not intuition. Doctor, not fortune teller.
The most common performance mistake is not writing slow code. It is optimizing code without measuring it first. Developers who write fast code do not have better intuitions about what is slow — they have better tools for finding out, and they use those tools before writing a single optimization.
This distinction matters because human intuition about performance is systematically wrong. Studies of experienced developers asked to identify the bottleneck in slow programs show accuracy rates below 50%. The bottleneck is usually in a function the developer did not suspect, doing work the developer did not realize was being done. The string formatting function called inside a loop that is called inside a loop. The datetime parse happening on every row of a million-row file. The HTTP response deserialization happening synchronously in a hot path. These are not obvious from reading the code. They are obvious from reading a profiler output sorted by cumulative time.
timeit and cProfile are the two tools that change this picture. timeit is for micro-benchmarks: given two implementations of the same function, which is faster, and by how much? It handles the statistical noise problem by running the statement thousands of times and reporting aggregate timings, which averages out OS scheduling jitter and CPU cache effects. Use it when you have two candidate implementations and need data to choose between them.
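A minimal sketch of that workflow, using Priya's list-versus-set deduplication question as the example. The collection sizes and repetition counts here are invented for illustration:

```python
import timeit

# Shared setup: build one list and one set with the same contents.
setup = "items_list = list(range(1_000)); items_set = set(items_list)"

# Time a worst-case membership test (last element) against each structure.
list_time = timeit.timeit("999 in items_list", setup=setup, number=2_000)
set_time = timeit.timeit("999 in items_set", setup=setup, number=2_000)

print(f"list lookup: {list_time:.6f}s for 2,000 runs")
print(f"set lookup:  {set_time:.6f}s for 2,000 runs")
```

Note what this does and does not tell you: it confirms the set is faster at membership tests, but says nothing about whether membership tests are where your program spends its time. That second question belongs to cProfile.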
cProfile is for application-level profiling: given a slow program, where does the time go? It instruments every function call in the Python process, records call counts and cumulative time, and outputs a table sorted by any column you choose. Sorted by cumulative time, the top row is almost always your bottleneck. This is the tool Priya needed three days ago. Five seconds to run, five seconds to read, five seconds to know where to look.
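A self-contained sketch of that usage, driving cProfile programmatically and printing the top entries sorted by cumulative time. The functions here (`parse_row`, `summarize`) are hypothetical stand-ins for whatever your slow program actually calls:

```python
import cProfile
import io
import pstats

def parse_row(row):
    # Stand-in for per-row work; in real code this might be datetime parsing.
    return row * 2

def summarize(rows):
    # Calls parse_row once per row - the kind of hot path a profiler exposes.
    return sum(parse_row(r) for r in rows)

profiler = cProfile.Profile()
profiler.enable()
summarize(range(50_000))
profiler.disable()

# Sort by cumulative time and show the top five functions.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

For a whole script, the one-command equivalent is `python -m cProfile -s cumulative script.py`.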
Memory performance follows the same discipline but with different tools. Generator expressions versus list comprehensions is not a style preference — it is the difference between holding one element in memory at a time and holding all elements simultaneously. For a million-row file, this is the difference between constant memory and gigabytes. __slots__ on a class with millions of instances eliminates the per-instance dictionary overhead, saving hundreds of bytes per object. These are not micro-optimizations — they are structural decisions with measurable consequences, and tracemalloc can quantify them.
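Both claims are measurable with tracemalloc. A sketch of the measurement, with an arbitrary element count and invented class names (`PointDict`, `PointSlots`); the exact byte counts will vary by Python version, but the ordering should not:

```python
import tracemalloc

N = 200_000

class PointDict:
    def __init__(self, x, y):
        self.x, self.y = x, y

class PointSlots:
    __slots__ = ("x", "y")  # no per-instance __dict__
    def __init__(self, x, y):
        self.x, self.y = x, y

def peak_bytes(make):
    # Report peak allocated bytes while running make().
    tracemalloc.start()
    result = make()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

# List comprehension materializes all N values; the generator holds one at a time.
list_peak = peak_bytes(lambda: sum([i * i for i in range(N)]))
gen_peak = peak_bytes(lambda: sum(i * i for i in range(N)))

# Slotted instances skip the per-instance dict that plain classes carry.
dict_peak = peak_bytes(lambda: [PointDict(i, i) for i in range(N)])
slots_peak = peak_bytes(lambda: [PointSlots(i, i) for i in range(N)])

print(f"list comprehension peak: {list_peak:,} bytes")
print(f"generator peak:          {gen_peak:,} bytes")
print(f"plain-class peak:        {dict_peak:,} bytes")
print(f"slotted-class peak:      {slots_peak:,} bytes")
```

The same discipline applies here as with time: do not assume the savings, print them.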
The week ahead is not about learning to write faster code. It is about learning to find out what is actually slow before you touch anything. Measure first. The bottleneck will surprise you.