Two servers, two log streams, one question: which IPs talked to both? How do you find the overlap?
Pull the IPs out of each stream as two collections, then find what's common. Like the intersection of two sets.
Exactly right. Python's set type has a .intersection method and an & operator — both return a new set of elements present in both:
ips_a = {log["ip"] for log in stream_a}
ips_b = {log["ip"] for log in stream_b}
shared = ips_a & ips_b
Two set comprehensions, one &. O(n+m) total, where n and m are the lengths of the two streams.
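To make the two spellings concrete, here is a minimal sketch. The contents of `stream_a` and `stream_b` are made-up sample data, not real logs. It also shows one difference worth knowing: the `&` operator needs a set on both sides, while `.intersection` accepts any iterable.

```python
# Hypothetical sample data; real log dicts would carry more fields.
stream_a = [{"ip": "10.0.0.1"}, {"ip": "10.0.0.2"}, {"ip": "10.0.0.2"}]
stream_b = [{"ip": "10.0.0.2"}, {"ip": "10.0.0.3"}]

# Set comprehensions drop duplicate IPs automatically.
ips_a = {log["ip"] for log in stream_a}
ips_b = {log["ip"] for log in stream_b}

shared_op = ips_a & ips_b                  # operator form: both sides must be sets
shared_method = ips_a.intersection(ips_b)  # method form: same result

# The method form also accepts a non-set iterable, such as a generator.
shared_iter = ips_a.intersection(log["ip"] for log in stream_b)
```

All three names end up bound to the same one-element set containing "10.0.0.2".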
Why sets and not a nested loop? Wouldn't that also work?
It would — and it would be O(n·m). For each IP in stream A you'd scan all of stream B. With 10,000 on each side that's 100 million comparisons. Set intersection is O(n+m) because sets use hash lookups. At scale, the difference is massive.
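The comparison above can be sketched side by side. The function names `shared_nested` and `shared_sets` and the sample lists are illustrative assumptions; both versions return the same answer, and only the amount of work differs.

```python
def shared_nested(ips_a, ips_b):
    """Nested-loop version: for each IP in A, scan B until a match is found."""
    found = []
    for a in ips_a:
        for b in ips_b:
            # The `not in found` check avoids reporting a repeated IP twice.
            if a == b and a not in found:
                found.append(a)
                break
    return sorted(found)


def shared_sets(ips_a, ips_b):
    """Set version: build two hash sets, then intersect them."""
    return sorted(set(ips_a) & set(ips_b))


# Hypothetical sample data with a repeated IP in the first list.
list_a = ["10.0.0.1", "10.0.0.2", "10.0.0.2", "10.0.0.4"]
list_b = ["10.0.0.2", "10.0.0.4", "10.0.0.5"]

nested_result = shared_nested(list_a, list_b)
set_result = shared_sets(list_a, list_b)
```

With inputs this small the difference is invisible; the gap only opens up as both lists grow, since the nested version's work scales with the product of the lengths and the set version's with their sum (on average, given well-distributed hashes).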
Does the order matter in ips_a & ips_b?
No. Intersection is commutative: a & b equals b & a. When both operands are sets, CPython iterates over the smaller one internally whichever side it is on, so operand order makes no meaningful difference to speed either. The result is the same set either way:
return sorted(shared)
sorted(set) returns a list. Sets are unordered, so you convert to a sorted list whenever you need stable output.
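One detail to keep in mind when the elements are IP strings: `sorted` compares them as text, so "10.0.0.10" sorts before "10.0.0.2". The sketch below shows that, plus one possible way to get numeric order using the standard library's `ipaddress` module. Whether numeric order is wanted is an assumption here; the lesson's `sorted(shared)` is already deterministic.

```python
import ipaddress

# Hypothetical result of an intersection.
shared = {"10.0.0.10", "10.0.0.2", "10.0.0.1"}

# Plain sorted(): a list, ordered character by character as strings.
as_text = sorted(shared)

# Optional: order by numeric address value instead of by text.
as_addresses = sorted(shared, key=ipaddress.ip_address)
```

Both calls return a new list and leave the set untouched.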
So every cross-reference question reduces to: build two sets, intersect, sort. The same shape applies to shared users across days, shared error codes across services — anything.
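As a rough illustration of that shared shape, the helper below takes the key to compare on as a parameter. The name `shared_values`, the `user` key, and the sample records are all hypothetical, chosen only to show the pattern applied to something other than IPs.

```python
def shared_values(records_a, records_b, key):
    """Build two sets from the given key, intersect them, return a sorted list."""
    values_a = {record[key] for record in records_a}
    values_b = {record[key] for record in records_b}
    return sorted(values_a & values_b)


# Hypothetical data: users seen on two different days.
monday = [{"user": "ana"}, {"user": "bo"}, {"user": "cy"}]
tuesday = [{"user": "cy"}, {"user": "ana"}, {"user": "dee"}]

returning = shared_values(monday, tuesday, "user")
```

Here `returning` holds the users present on both days, in alphabetical order.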
Set algebra — intersection, union, difference — is the cleanest vocabulary for comparing collections. Once you see a problem through that lens, the code writes itself.
TL;DR: set_a & set_b returns elements present in both, in O(n+m).
- `&`: intersection (both)
- `|`: union (either)
- `-`: difference (in A, not in B)
- `^`: symmetric difference (in either, not both)

| Question | Operator |
|---|---|
| shared | `a & b` |
| combined | `a \| b` |
| only in A | `a - b` |
| exclusive | `a ^ b` |
Always wrap the result in sorted(...) for deterministic output.
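A quick sketch of all four operators on two small sets may help fix the vocabulary. The sets `a` and `b` are made-up sample values, and each result is wrapped in `sorted(...)` as recommended above.

```python
# Hypothetical sample sets of IPs.
a = {"10.0.0.1", "10.0.0.2", "10.0.0.3"}
b = {"10.0.0.2", "10.0.0.3", "10.0.0.4"}

shared = sorted(a & b)     # in both
combined = sorted(a | b)   # in either
only_in_a = sorted(a - b)  # in a, not in b
exclusive = sorted(a ^ b)  # in exactly one of the two
```

Note that difference is the one operator of the four where operand order changes the result: `a - b` and `b - a` are generally different sets.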
Write `shared_ips(stream_a, stream_b)` that takes two lists of log dicts (each with an `ip` key) and returns a sorted list of IPs that appear in both streams. Use set comprehensions and the `&` operator.