Last week you turned text into structured data. That's table stakes. Real log tools answer questions across thousands of entries — "how many unique users errored?", "which hosts appear in both streams?", "was the last five minutes abnormal?"
Those questions sound like they all reduce to grouping and counting. Is aggregation just a variation on the frequency map?
The frequency map is one recipe. This week adds two more: sets for uniqueness and cross-reference, and sliding windows for rolling analytics. Python's set type gives you average-case O(1) membership tests and automatic deduplication; paired with a generator, you can stream windowed analysis without loading the whole file.
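Here is a minimal sketch of the two set recipes, answering "how many unique users errored?" and "which hosts appear in both streams?". The (host, user, level) tuple layout and the sample entries are invented for illustration; your parsed records from last week may look different.

```python
# Hypothetical parsed entries as (host, user, level) tuples, invented for this sketch.
stream_a = [
    ("web-1", "alice", "ERROR"),
    ("web-2", "bob", "INFO"),
    ("web-1", "alice", "ERROR"),  # duplicate error from the same user
    ("web-3", "carol", "ERROR"),
]
stream_b = [
    ("web-2", "dave", "ERROR"),
    ("web-3", "carol", "INFO"),
    ("web-4", "erin", "INFO"),
]

# Uniqueness: a set comprehension keeps each errored user once.
errored_users = {user for _, user, level in stream_a if level == "ERROR"}

# Cross-reference: intersect the host sets of the two streams.
hosts_a = {host for host, _, _ in stream_a}
hosts_b = {host for host, _, _ in stream_b}
shared_hosts = hosts_a & hosts_b

print(len(errored_users))    # 2
print(sorted(shared_hosts))  # ['web-2', 'web-3']
```

The set does the deduplication as entries arrive, so no separate "have I seen this user?" bookkeeping is needed.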
Generators — that's the yield keyword? I've seen it but never used it.
A generator function produces values one at a time with yield, so the caller can pull them lazily. For log analytics that's perfect: millions of lines, one rolling window at a time, no giant lists in memory. You'll use it on Day 14 for the sliding window max.
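As a rough preview, a generator-based sliding window might look like the sketch below. The names window_max and latencies and the sample numbers are made up for illustration, and this is the simple version that rescans each window with max(); the Day 14 exercise may take a different approach.

```python
from collections import deque


def window_max(values, size):
    """Yield the maximum of each full window of `size` consecutive values."""
    window = deque(maxlen=size)  # the oldest value drops off automatically
    for value in values:
        window.append(value)
        if len(window) == size:
            yield max(window)


def latencies():
    # Stand-in for reading and parsing a log file one line at a time.
    for ms in [120, 95, 300, 110, 105, 400, 90]:
        yield ms


# Only `size` values are held in memory at any moment.
for peak in window_max(latencies(), 3):
    print(peak)  # prints 300, 300, 300, 400, 400 on separate lines
```

Because both functions are generators, nothing is computed until the for loop asks for the next value, which is what keeps memory use flat on large files.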
set, yield
Goal: by Friday you can aggregate, deduplicate, and window any log stream.
7 lessons this week