Five hundred errors in the last hour — but not all distinct. Some messages repeat dozens of times. Which ones are worth fixing first?
The ones that repeat the most. So I'd count each unique message, then sort by count descending, then take the top N. Classic frequency-rank-slice.
Exactly the shape. Build the count dict first, filtering only ERROR-level logs:
counts = {}
for log in logs:
    if log.get("level") == "ERROR":
        msg = log.get("message", "")
        counts[msg] = counts.get(msg, 0) + 1

dict.get(key, 0) + 1 is the canonical increment — safe when the key is missing.
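As an aside, the standard library can do this bookkeeping for you. A minimal sketch using collections.Counter (the sample logs here are invented for illustration):

```python
from collections import Counter

logs = [
    {"level": "ERROR", "message": "disk full"},
    {"level": "INFO", "message": "started"},
    {"level": "ERROR", "message": "disk full"},
    {"level": "ERROR", "message": "timeout"},
]

# Counter replaces the counts.get(msg, 0) + 1 pattern: it counts
# each item in the iterable, defaulting missing keys to 0.
counts = Counter(
    log.get("message", "") for log in logs if log.get("level") == "ERROR"
)
```

Either form produces the same dict-like mapping of message to count; the manual loop just makes the increment explicit.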
And then sorted() with a key function that ranks by count?
Right. counts.items() gives (message, count) tuples. A lambda picks the second element as the sort key. reverse=True flips the order so largest comes first:
ranked = sorted(counts.items(), key=lambda x: x[1], reverse=True)
return [[msg, count] for msg, count in ranked[:n]]

Why return a list of lists instead of the tuples directly? What breaks?
Nothing breaks — but JSON has no tuple type. If your function's output ever gets serialised, lists are the safer, explicit choice. The comprehension [[msg, count] for ...] makes the intent clear: pairs of [string, int].
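Putting the pieces from above together, a sketch of the full function, plus a check that the list-of-lists output round-trips through JSON unchanged (sample logs invented for illustration):

```python
import json

def top_error_messages(logs, n):
    # Step 1: count only ERROR-level messages.
    counts = {}
    for log in logs:
        if log.get("level") == "ERROR":
            msg = log.get("message", "")
            counts[msg] = counts.get(msg, 0) + 1
    # Step 2: sort (message, count) pairs by count, largest first.
    ranked = sorted(counts.items(), key=lambda x: x[1], reverse=True)
    # Step 3: slice the top n and return JSON-friendly [message, count] pairs.
    return [[msg, count] for msg, count in ranked[:n]]

logs = [
    {"level": "ERROR", "message": "disk full"},
    {"level": "ERROR", "message": "timeout"},
    {"level": "ERROR", "message": "disk full"},
    {"level": "INFO", "message": "ok"},
]
top = top_error_messages(logs, 1)
```

Note that json.dumps would also accept tuples (it serialises them as arrays), but after a load you get lists back either way; returning lists up front means the value you hold in Python matches the value on the wire.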
So the recipe is: count, sort by value, slice. Works for top IPs, top user IDs, top error codes — anything with repeats.
Swap the filter, swap the key, and the recipe carries. Every data analysis tool you'll ever build has this exact loop in it somewhere.
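To make the "swap the filter" point concrete, here is the same recipe generalised over an arbitrary field. The field name "ip" and the request records are assumptions for illustration, not part of the original exercise:

```python
def top_values(records, field, n):
    # Generic frequency-rank-slice: count any field's values,
    # sort by count descending, keep the top n.
    counts = {}
    for rec in records:
        val = rec.get(field)
        if val is not None:
            counts[val] = counts.get(val, 0) + 1
    ranked = sorted(counts.items(), key=lambda x: x[1], reverse=True)
    return [[val, count] for val, count in ranked[:n]]

requests = [{"ip": "10.0.0.1"}, {"ip": "10.0.0.2"}, {"ip": "10.0.0.1"}]
```

Calling top_values(requests, "ip", 1) ranks the hypothetical IPs the same way the error messages were ranked.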
TL;DR: count with a dict → sort .items() by value → slice top N.
- counts.get(k, 0) + 1 — safe increment on missing keys
- key=lambda x: x[1] — sort tuples by second element
- reverse=True — descending order
- [:n] — take the first N after sorting

| Shape | When to use |
|---|---|
| [(msg, n), ...] | internal use |
| [[msg, n], ...] | JSON-ready output |
| {msg: n, ...} | loses ordering |
Write `top_error_messages(logs, n)` that counts unique `message` values among ERROR-level log dicts and returns the top `n` as a list of `[message, count]` pairs sorted by count descending. Empty list if no ERROR logs.