Last code lesson of the track. Today's exercise composes five primitives from the last two weeks on a small generic input — no new concepts, just careful arrangement.
Given a text file words.txt whose content is "a b a c a b d c a" (a single line of space-separated words), produce a JSON file top.json containing the top 3 most common words as a list of [word, count] pairs, sorted by count descending.
Expected top.json:
[["a", 4], ["b", 2], ["c", 2]]
Read the file with pathlib, split on whitespace, count with Counter, get top-3, write as JSON.
Six steps, using five primitives (Path.write_text and Path.read_text count as one):
| Step | Primitive | Lesson |
|---|---|---|
| 1. Write the fixture file | Path.write_text | L22 |
| 2. Read the file back | Path.read_text | L22 |
| 3. Tokenize on whitespace | str.split() | foundations |
| 4. Count occurrences | Counter | L26 |
| 5. Get top 3 | .most_common(3) | L26 |
| 6. Write as JSON | json.dump | L16 |
most_common(3) returns tuples — [('a', 4), ('b', 2), ('c', 2)]. JSON writes them as lists?
Right — tuples aren't a JSON type, so json.dump serializes them as arrays. When you read the file back, you get lists: [["a", 4], ["b", 2], ["c", 2]]. That's the expected shape.
And the order — does most_common always return ties in the same order?
Within a tie, Counter.most_common returns items in insertion order (since Python 3.7, dicts preserve insertion order, and Counter inherits this). For our input "a b a c a b d c a", the word "b" appears before "c" in insertion order, and both have count 2, so most_common(3) returns [('a', 4), ('b', 2), ('c', 2)]. Predictable.
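The tie rule is easy to check directly. The sketch below runs the lesson's input next to a reordered variant (the second string is made up for illustration) and relies on the documented behavior that equal counts come back in first-encountered order.

```python
from collections import Counter

# The lesson's input: "b" is first seen before "c".
original = Counter("a b a c a b d c a".split())

# Hypothetical reordering of the same words: "c" is now first seen before "b".
reordered = Counter("a c a b a c d b a".split())

top_original = original.most_common(3)
top_reordered = reordered.most_common(3)

print(top_original)   # [('a', 4), ('b', 2), ('c', 2)]
print(top_reordered)  # [('a', 4), ('c', 2), ('b', 2)]
```

Same counts, different tie order: the result depends on where each word first appears, not on alphabetical order.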
| Primitive | From | Used for |
|---|---|---|
| Path(...) and .write_text / .read_text | L22 | one-shot file I/O without the with-block ceremony |
| str.split() | foundations | break the string on whitespace |
| Counter(iterable) | L26 | tally word counts |
| .most_common(n) | L26 | top-N as [(word, count), ...] |
| json.dump(...) | L16 | write the result as JSON |
```python
import json
from collections import Counter
from pathlib import Path

# Step 1: fixture
p = Path("words.txt")
p.write_text("a b a c a b d c a")

# Steps 2-5: read, tokenize, count, top 3
text = p.read_text()
tokens = text.split()        # ['a', 'b', 'a', ...]
counts = Counter(tokens)     # Counter({'a': 4, 'b': 2, ...})
top = counts.most_common(3)  # [('a', 4), ('b', 2), ('c', 2)]

# Step 6: write as JSON
with open("top.json", "w") as f:
    json.dump(top, f)
```

JSON has no tuple type. json.dump writes any tuple as a JSON array, and json.load reads JSON arrays back as Python lists. So [('a', 4)] in memory becomes [["a", 4]] on disk and [['a', 4]] when read back. If you need an exact tuple-vs-list round-trip, JSON is the wrong format.
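The tuple-to-list change can be seen without touching disk. This is a minimal sketch that uses the string variants json.dumps and json.loads in place of json.dump and json.load; the variable names are illustrative only.

```python
import json

top = [("a", 4), ("b", 2), ("c", 2)]  # same shape most_common(3) returns

encoded = json.dumps(top)      # tuples are written as JSON arrays
decoded = json.loads(encoded)  # JSON arrays come back as lists

print(encoded)         # [["a", 4], ["b", 2], ["c", 2]]
print(decoded == top)  # False: a list of lists is not a list of tuples

# If tuples are needed after loading, convert explicitly.
restored = [tuple(pair) for pair in decoded]
print(restored == top)  # True
```

The explicit conversion works here because every pair is known to be a tuple; JSON itself carries no record of which arrays started out as tuples.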
Counter.most_common already sorts by count descending and uses insertion order for ties.
This four-step shape (read raw input, tokenize, aggregate, write structured output) is the spine of countless real scripts: word-frequency counts, log analyzers, traffic summaries, error grouping. Today's lesson is one variant; you'll write the same shape with different tokenizers and aggregators a hundred times.
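To make the "same shape, different tokenizer" idea concrete, here is one possible sketch. The function top_n_report, the status_codes tokenizer, and the log lines are all hypothetical and not part of the lesson; only the tokenizer changes between this and the word-count exercise.

```python
import json
import re
import tempfile
from collections import Counter
from pathlib import Path
from typing import Callable, Iterable


def top_n_report(src: Path, dst: Path,
                 tokenize: Callable[[str], Iterable[str]],
                 n: int = 3) -> list:
    """Read src, tokenize, count, and write the top n pairs to dst as JSON."""
    text = src.read_text(encoding="utf-8")             # read raw input
    counts = Counter(tokenize(text))                   # tokenize and aggregate
    top = counts.most_common(n)
    dst.write_text(json.dumps(top), encoding="utf-8")  # write structured output
    return top


def status_codes(text: str) -> list:
    """Hypothetical tokenizer: pull three-digit codes out of made-up log lines."""
    return re.findall(r"\b\d{3}\b", text)


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "access.log"
    log.write_text(
        "GET / 200\nGET /x 404\nGET / 200\nPOST /y 500\nGET /z 404\nGET / 200\n",
        encoding="utf-8",
    )
    out = Path(tmp) / "top.json"
    result = top_n_report(log, out, status_codes, n=2)
    written = json.loads(out.read_text(encoding="utf-8"))

print(result)   # [('200', 3), ('404', 2)]
print(written)  # [['200', 3], ['404', 2]]
```

Passing str.split as the tokenizer would reproduce today's word-count exercise with the same function.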