How many distinct users hit your service in the last hour? You have a list of log dicts, each with a user_id field. What's the Python tool for "unique things"?
A set. Sets store only unique values. I could build one with a loop and .add(), or wrap the whole thing in set(...) around a comprehension.
Both work, and the set comprehension is the one-liner to reach for. It uses the same curly braces as a dict comprehension, but with a single expression instead of a key: value pair:
unique = {log["user_id"] for log in logs}
count = len(unique)
Curly braces plus one value per element: that's a set comprehension.
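To make the loop-versus-comprehension equivalence concrete, here is a minimal sketch that builds the same set both ways. The `logs` sample data and the `path` field are made up for illustration.

```python
# Hypothetical sample data: three log entries, two distinct users.
logs = [
    {"user_id": "alice", "path": "/home"},
    {"user_id": "bob", "path": "/login"},
    {"user_id": "alice", "path": "/settings"},
]

# Loop version: start with an empty set and add one value at a time.
unique_loop = set()
for log in logs:
    unique_loop.add(log["user_id"])

# Comprehension version: the same result in a single expression.
unique_comp = {log["user_id"] for log in logs}

print(unique_loop == unique_comp)  # True
print(len(unique_comp))            # 2
```

Note that `set()` is needed for the empty starting point, since a bare `{}` creates an empty dict rather than an empty set.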
Why a set and not a list? A list has len() too.
A list keeps duplicates. Ten logs from the same user would give you len == 10. A set collapses duplicates instantly — same input gives len == 1. For "how many distinct" questions, set is the right shape.
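A quick way to see the difference is to run the same comprehension with square brackets and with curly braces. This is a small sketch on invented data: ten entries that all share one hypothetical user ID.

```python
# Hypothetical data: ten log entries, all from the same user.
logs = [{"user_id": "alice"} for _ in range(10)]

as_list = [log["user_id"] for log in logs]  # square brackets: keeps every entry
as_set = {log["user_id"] for log in logs}   # curly braces: keeps one per value

print(len(as_list))  # 10
print(len(as_set))   # 1
```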
And what makes set fast? I've heard it's O(1) membership.
Sets are backed by hash tables. Adding an item or checking if one is present is roughly constant time, regardless of how many items are already in the set. The full pattern for cardinality is two lines:
unique = {log["user_id"] for log in logs}
return len(unique)
For a million-log scan, that can be the difference between seconds and minutes compared with deduplicating through a list, where every membership check rescans the list.
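The sketch below compares a list-based dedupe with the set-based one on a small synthetic workload. The function names, the data, and the sizes are assumptions chosen for illustration; actual timings depend on the machine, so none are shown as expected output.

```python
import time

# Hypothetical workload: 5,000 log entries spread over 1,000 users.
logs = [{"user_id": i % 1000} for i in range(5000)]

def count_with_list(logs):
    # Naive approach: `in` on a list scans it element by element.
    seen = []
    for log in logs:
        if log["user_id"] not in seen:
            seen.append(log["user_id"])
    return len(seen)

def count_with_set(logs):
    # Set approach: each insert is a hash lookup, roughly constant time.
    return len({log["user_id"] for log in logs})

start = time.perf_counter()
list_count = count_with_list(logs)
list_seconds = time.perf_counter() - start

start = time.perf_counter()
set_count = count_with_set(logs)
set_seconds = time.perf_counter() - start

print(list_count == set_count)  # True: both find 1000 distinct users
print(f"list: {list_seconds:.4f}s, set: {set_seconds:.4f}s")
```

Both functions return the same count. The gap between them tends to widen as the number of distinct values grows, because the list version does more scanning per entry while the set version does not.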
So a set comprehension plus len() is the one-two punch for cardinality questions — "how many distinct IPs, users, hosts" — whatever the key is.
Swap log["user_id"] for log["ip"] and you get unique IPs. Swap in log["host"] and you get unique hosts. The shape is reusable; only the key expression changes.
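One way to capture that reusable shape is a small helper that takes the key as a parameter. `count_distinct` is a hypothetical name, not something from the lesson, and the sample logs are invented.

```python
def count_distinct(logs, key):
    """Return how many distinct values appear under `key` across `logs`."""
    return len({log[key] for log in logs})

# Hypothetical sample data.
logs = [
    {"user_id": "alice", "ip": "10.0.0.1", "host": "web-1"},
    {"user_id": "bob", "ip": "10.0.0.2", "host": "web-1"},
    {"user_id": "alice", "ip": "10.0.0.3", "host": "web-1"},
]

print(count_distinct(logs, "user_id"))  # 2
print(count_distinct(logs, "ip"))       # 3
print(count_distinct(logs, "host"))     # 1
```

This sketch assumes every log dict contains the requested key; a missing key would raise `KeyError`.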
TL;DR: {expr for item in iterable} builds a set; len() of that set gives its cardinality.
set(): unordered, unique, hashable items only
{x for ...}: set comprehension (curly braces, no key:value)

| Need | Use |
|---|---|
| count distinct | len({x for x in ...}) |
| is x present | x in set_of_xs |
| dedupe, keep first-seen order | list(dict.fromkeys(xs)) |
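
The three rows of the table can be tried on a short list. The values in `xs` are arbitrary sample data; the last line relies on dicts preserving insertion order, which is guaranteed from Python 3.7 onward.

```python
xs = ["b", "a", "b", "c", "a"]  # hypothetical sample values

# Count distinct values.
distinct_count = len({x for x in xs})

# Membership test against a set.
set_of_xs = set(xs)
is_present = "c" in set_of_xs
is_missing = "z" in set_of_xs

# Deduplicate while keeping the order of first appearance.
ordered_unique = list(dict.fromkeys(xs))

print(distinct_count)  # 3
print(is_present)      # True
print(is_missing)      # False
print(ordered_unique)  # ['b', 'a', 'c']
```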
Sets work with any hashable value: strings, ints, and tuples whose elements are themselves hashable. Dicts and lists are not hashable, so they cannot be set elements.
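The hashability rule matters as soon as you want to count distinct combinations of fields. The sketch below uses a tuple as a composite key and shows that the equivalent list raises `TypeError`. The sample logs are invented.

```python
# Hypothetical logs where one user appears from several IPs.
logs = [
    {"user_id": "alice", "ip": "10.0.0.1"},
    {"user_id": "alice", "ip": "10.0.0.2"},
    {"user_id": "alice", "ip": "10.0.0.1"},
]

# A tuple of strings is hashable, so it works as a composite key.
pairs = {(log["user_id"], log["ip"]) for log in logs}
print(len(pairs))  # 2

# A list is not hashable, so putting one in a set raises TypeError.
try:
    {[log["user_id"], log["ip"]] for log in logs}
    list_failed = False
except TypeError:
    list_failed = True
print(list_failed)  # True
```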
Write `count_unique_users(logs)` that takes a list of log dicts (each with a `user_id` key) and returns the number of distinct user IDs as an int. Use a set comprehension to dedupe in one line.