Yesterday's setdefault pattern works, but it's a mouthful:

```python
result = {}
for word in words:
    result.setdefault(word, []).append(word)
```

And every time I want to count something I write `if key in d: d[key] += 1; else: d[key] = 1`.
The collections module has two specialized dicts that handle exactly these patterns. defaultdict lets you specify a default factory — what to use when a missing key is first accessed:
```python
from collections import defaultdict

result = defaultdict(list)  # missing keys default to []
for word in words:
    result[word].append(word)
# No setdefault, no `if key in result` check
```

The first time you access result[word] for a new word, defaultdict calls list() to create an empty list, and then your .append works on it. Later accesses return the existing list.
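Here is a small sketch to confirm the two versions agree. The sample `words` list and the grouping key (first letter) are illustrative assumptions, not part of the lesson:

```python
from collections import defaultdict

words = ["apple", "avocado", "banana", "blueberry", "cherry"]  # sample input (assumption)

# Yesterday's version: setdefault on a plain dict
by_letter_plain = {}
for word in words:
    by_letter_plain.setdefault(word[0], []).append(word)

# Today's version: defaultdict(list) creates the empty list on first access
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

print(by_letter == by_letter_plain)  # True: same keys, same lists
print(dict(by_letter))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```

Equality works across the two because a defaultdict compares like any other dict.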
And for counting?
defaultdict(int) works — int() returns 0:
```python
counts = defaultdict(int)
for c in "banana":
    counts[c] += 1
# defaultdict(<class 'int'>, {'b': 1, 'a': 3, 'n': 2})
```

But for counting specifically, Counter is even better. Same idea, plus useful methods:
```python
from collections import Counter

counts = Counter("banana")
# Counter({'a': 3, 'n': 2, 'b': 1})
counts.most_common(2)  # [('a', 3), ('n', 2)]
```

Counter takes any iterable directly?
Yes. Strings, lists, generators: any iterable. It counts the items. most_common(n) gives you the n most frequent items as a list of (item, count) pairs, highest count first. Both tools are in the standard library (from collections import Counter, defaultdict), so there is nothing third-party to install.
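As a sketch of the generator case, the sentence below and the choice to count word lengths are illustrative assumptions. The generator expression goes straight into Counter without building an intermediate list:

```python
from collections import Counter

sentence = "the quick brown fox jumps over the lazy dog"

# Generator expression: yields each word's length, consumed directly by Counter
length_counts = Counter(len(word) for word in sentence.split())

print(length_counts)                 # Counter({3: 4, 5: 3, 4: 2})
print(length_counts.most_common(2))  # [(3, 4), (5, 3)]
```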
### defaultdict(factory): auto-create on missing key

```python
from collections import defaultdict

d = defaultdict(list)               # factory list: list() returns []
d = defaultdict(int)                # factory int: int() returns 0
d = defaultdict(set)                # factory set: set() returns set()
d = defaultdict(lambda: "missing")  # any zero-argument callable works
```

The factory is called with no arguments the first time a missing key is accessed via d[key]. The result is stored at d[key] and returned.
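To see exactly when the factory runs, here is a sketch with a hypothetical factory, `make_default`, that records each call. It runs once per new key and is skipped for keys that already exist:

```python
from collections import defaultdict

calls = []  # one entry per factory call

def make_default():
    """Hypothetical factory: records the call, then returns a new empty list."""
    calls.append("called")
    return []

d = defaultdict(make_default)
d["x"].append(1)  # "x" is missing: factory runs, [] is stored, then appended to
d["x"].append(2)  # "x" exists: factory does not run
d["y"]            # "y" is missing: factory runs again

print(len(calls))  # 2
print(dict(d))     # {'x': [1, 2], 'y': []}
```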
### Grouping (setdefault replacement)

```python
pairs = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
grouped = defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)
# defaultdict(<class 'list'>, {'a': [1, 3], 'b': [2, 5], 'c': [4]})
```

### Counting with defaultdict(int)

```python
counts = defaultdict(int)
for word in ["a", "b", "a", "c", "a", "b"]:
    counts[word] += 1
# defaultdict(<class 'int'>, {'a': 3, 'b': 2, 'c': 1})
```

### defaultdict vs regular dict

A defaultdict is a dict (a subclass of it): you can pass it to json.dump, iterate over it, and look up keys with .get(key, default). The only difference is that bracket access d[key] on a missing key creates the entry instead of raising KeyError.
```python
d = defaultdict(list)
print(d["missing"])    # [] (and now d["missing"] EXISTS in the dict)
print("missing" in d)  # True
```

Bracket access has a side effect on defaultdict. Use .get() if you want a non-mutating lookup:

```python
d.get("other")  # None, and d is unchanged: "other" is not created
```

### Counter(iterable): count items

```python
from collections import Counter

Counter("banana")            # Counter({'a': 3, 'n': 2, 'b': 1})
Counter([1, 1, 2, 3, 3, 3])  # Counter({3: 3, 1: 2, 2: 1})
Counter(["a", "b", "a"])     # Counter({'a': 2, 'b': 1})
```

### most_common(n)

```python
c = Counter("the quick brown fox jumps over the lazy dog")
c.most_common(3)
# [(' ', 8), ('o', 4), ('e', 3)]
```

### Counter arithmetic

Counters add and subtract:
```python
a = Counter("abca")  # {'a': 2, 'b': 1, 'c': 1}
b = Counter("abc")   # {'a': 1, 'b': 1, 'c': 1}
a + b  # Counter({'a': 3, 'b': 2, 'c': 2})
a - b  # Counter({'a': 1}): only positive counts are kept
```

Useful for inventory and difference operations.
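A sketch of the inventory case, using hypothetical stock and order counts. The `-` operator drops zero and negative results, so it answers "what is left?", while the `subtract()` method keeps negatives and so exposes shortfalls:

```python
from collections import Counter

stock = Counter(apples=5, pears=2)  # hypothetical inventory
order = Counter(apples=3, pears=4)  # hypothetical order

remaining = stock - order           # pears would be -2, so the key is dropped
print(remaining)                    # Counter({'apples': 2})

balance = stock.copy()              # copy so stock itself is untouched
balance.subtract(order)             # in-place; negative counts are kept
print(balance)                      # Counter({'apples': 2, 'pears': -2})
```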
A Counter can also be filled one item at a time:

```python
c = Counter()
for item in items:
    c[item] += 1
```

(Same result as defaultdict(int) for this loop, with c.most_common(...) as the bonus.)
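One place the two still differ is reading a missing key. Both return 0, but a Counter does not store the key, while a defaultdict(int) does. A quick sketch:

```python
from collections import Counter, defaultdict

c = Counter()
d = defaultdict(int)

print(c["missing"])    # 0
print("missing" in c)  # False: Counter returned 0 but did not store the key
print(d["missing"])    # 0
print("missing" in d)  # True: defaultdict stored the new 0 entry
```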
### defaultdict vs Counter: which to use

| Need | Use |
|---|---|
| Group items into per-key lists | defaultdict(list) |
| Group items into per-key sets | defaultdict(set) |
| Count occurrences and call most_common | Counter |
| Just count, no top-N needed | either works |
| Custom default value (e.g. 0.0, dict) | defaultdict(...) |
Reach for Counter whenever the goal is "how many of each". Reach for defaultdict when accumulating into something other than counts.
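As a sketch of the "custom default value" row, assume hypothetical (category, amount) records: defaultdict(float) starts each new category at 0.0. Because a defaultdict is a dict, json.dumps serializes it like any other dict:

```python
import json
from collections import defaultdict

# Hypothetical expense records: (category, amount)
records = [("food", 12.5), ("rent", 800.0), ("food", 7.25)]

totals = defaultdict(float)  # float() returns 0.0 for each new category
for category, amount in records:
    totals[category] += amount

print(json.dumps(totals))  # {"food": 19.75, "rent": 800.0}
```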