Before we dig in — show me how you've been handling JSON in your scripts. Not what you think you should say. What you actually do.
I do import json and then json.loads() on the string. Or json.load() on a file. Got those from Stack Overflow six months ago, never looked further.
Good — you know the two big ones. But the json module is a translator with two directions. json.loads() converts JSON text to a Python dict. json.dumps() goes the other way — dict back to JSON string. That second direction is where most developers are missing half the toolkit:
import json
log_entry = {
    "timestamp": "2026-03-31T14:22:05Z",
    "level": "ERROR",
    "service": "auth",
    "message": "Token expired for user maya.patel"
}
pretty = json.dumps(log_entry, indent=2, sort_keys=True)
print(pretty)
# {
# "level": "ERROR",
# "message": "Token expired for user maya.patel",
# "service": "auth",
# "timestamp": "2026-03-31T14:22:05Z"
# }

sort_keys=True alphabetizes the keys? I've been scrolling through walls of JSON trying to find the "level" field — it's in a different position every time depending on which service wrote the log. Two keyword arguments and that problem is gone.
And json.loads() returns a regular Python dict — your Track 2 match/case, dict comprehensions, get() with defaults, all of it applies immediately:
import json
line = '{"timestamp": "2026-03-31T14:22:05Z", "level": "ERROR", "service": "auth"}'
entry = json.loads(line)
print(entry["level"]) # ERROR
print(entry.get("ip", "unknown"))  # unknown — no KeyError if missing

I've been treating import json like an incantation I copy without understanding. I didn't know it just handed me a regular dict. I thought there was something special about the returned object.
Regular dict. No magic. The incantation was correct — you just didn't know why. Now — what happens when a log line is malformed? The ops team's pipeline has corrupted entries. One bad line in 50,000 should not crash the whole run.
json.loads() would raise an exception on bad input. So I wrap it in try/except... and I should catch json.JSONDecodeError specifically, not Exception. Same surgical-catching principle from Track 2 — catch exactly what can fail here and nothing else.
Exactly right. Here's the pattern:
import json
def safe_parse(line: str) -> dict | None:
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        return None

Return None on failure, skip it in the main loop with if entry is None: continue. You process 49,999 valid lines and note the one bad one. Catching json.JSONDecodeError specifically means your own bugs elsewhere won't hide behind the same handler.
That's exactly what I need. Parse with json.loads(), extract "level" with .get() defaulting to "UNKNOWN", pretty-print with json.dumps(indent=2, sort_keys=True) for anything human-readable. Three functions, one module. I've been using it wrong for six months.
You've been using half of it. Now you have the full translator. Parse the log entry string, extract the level, produce the pretty-printed version — all in one function. Today's problem puts it together.
One thing — the problem calls for a specific return shape. What if the JSON is valid but the "level" key doesn't exist? Some services don't include severity.
entry.get("level", "UNKNOWN") — same dict skill from the inventory system. If the key exists you get its value. If not, you get "UNKNOWN". The module handles the format boundary; your Python handles the missing-field logic.
I've been manually formatting JSON output with f-strings when json.dumps(indent=2) was sitting right here. Past-Maya, I am disappointed in you.
Tomorrow the ops team's monitoring dashboard exports its metrics as CSV. You're going to discover Python has a module for that too — and you're going to have exactly this reaction when you see what csv.DictReader does compared to line.split(",").
The json module is one of Python's most-used standard library modules, and its design reflects a deliberate philosophy: be the universal translator between Python's rich object model and the text format that every modern system understands.
json.loads() (load from string) and json.load() (load from file object) convert JSON text into Python objects. json.dumps() (dump to string) and json.dump() (dump to file object) go the other direction. The s suffix is the only distinguishing mark — easy to miss, consequential to confuse. In log processing where every line is a string read from a file, json.loads() is the workhorse. When writing a report file, json.dump(data, f) writes straight to the open file handle, with no intermediate string to build and write yourself.
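A minimal sketch of the file-object pair; the filename and report contents are made up for illustration:

```python
import json
import os
import tempfile

report = {"service": "auth", "errors": 3}

# json.dump writes straight to an open file object — no string step.
path = os.path.join(tempfile.gettempdir(), "report.json")
with open(path, "w") as f:
    json.dump(report, f, indent=2)

# json.load reads it back from the file object.
with open(path) as f:
    restored = json.load(f)

print(restored == report)  # True
```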
JSON and Python have overlapping but not identical type systems. JSON objects become Python dicts. JSON arrays become lists. JSON strings become strings. JSON numbers become int or float depending on whether they have a decimal point. true/false become True/False. null becomes None. One asymmetry matters: Python tuples serialize to JSON arrays, but deserializing that array gives back a list. Round-trips through JSON may silently change tuple to list.
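The type mapping, and the tuple asymmetry, can be verified directly in a few lines:

```python
import json

# Each JSON type maps to a fixed Python type on the way back in.
data = json.loads('{"n": 3, "x": 2.5, "ok": true, "tags": ["a"], "ip": null}')
print(type(data["n"]), type(data["x"]))   # <class 'int'> <class 'float'>
print(data["ok"], data["ip"])             # True None

# The asymmetry: tuples serialize as JSON arrays, arrays come back as lists.
original = {"point": (1, 2)}
round_tripped = json.loads(json.dumps(original))
print(round_tripped["point"])             # [1, 2] — now a list, not a tuple
```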
The indent parameter is not just cosmetic. When log entries from different services arrive with keys in different orders, sort_keys=True produces consistent, diffable output. Config files and seed files generated with json.dumps(data, indent=2, sort_keys=True) can be version-controlled meaningfully — the canonical form is reproducible across any machine.
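One way to see the canonical-form point: two dicts with the same content but different insertion order serialize to byte-identical strings once sort_keys=True is applied.

```python
import json

a = {"level": "ERROR", "service": "auth"}
b = {"service": "auth", "level": "ERROR"}   # same content, different order

canonical_a = json.dumps(a, indent=2, sort_keys=True)
canonical_b = json.dumps(b, indent=2, sort_keys=True)
print(canonical_a == canonical_b)  # True — identical output, hence diffable
```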
json.JSONDecodeError is a subclass of ValueError, which means legacy code catching ValueError will also catch JSON parse failures. In new code, always catch json.JSONDecodeError specifically. The exception carries .msg, .doc, and .pos attributes that pinpoint exactly where in the string the parse failed — invaluable when you have 50,000 log lines and one corrupted entry.
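The exception attributes in action, on a deliberately broken string (the exact message text comes from CPython's parser):

```python
import json

bad = '{"level": "ERROR", "service": }'   # value missing after the colon
try:
    json.loads(bad)
except json.JSONDecodeError as exc:
    msg, pos = exc.msg, exc.pos
    is_value_error = isinstance(exc, ValueError)

print(msg, pos)          # e.g. "Expecting value" at character offset 30
print(is_value_error)    # True — why legacy ValueError handlers catch it too
```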
For high-volume log processing — millions of entries per run — consider orjson or ujson as drop-in replacements. The stdlib json module has a C accelerator in CPython but orjson is compiled Rust and can be 3-10x faster. The API is nearly identical. Start with stdlib; profile before switching. The distinction matters when you're parsing gigabytes, not thousands.
Many log pipelines use JSON Lines (.jsonl): one JSON object per line, no outer array. The stdlib handles this naturally — read each line, call json.loads(). No special parser needed. This is exactly why the per-line safe_parse() pattern is the canonical idiom: each line is independent, and one corrupt entry should never abort the batch.
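Putting the JSON Lines idiom together end to end — the file path and entries below are illustrative, with one corrupt line planted on purpose:

```python
import json
import os
import tempfile

# Write an illustrative .jsonl file: one JSON object per line, no outer array.
path = os.path.join(tempfile.gettempdir(), "sample.jsonl")
with open(path, "w") as f:
    f.write('{"level": "ERROR", "service": "auth"}\n')
    f.write('not json at all\n')                        # the corrupt entry
    f.write('{"level": "INFO", "service": "billing"}\n')

levels = []
with open(path) as f:
    for line in f:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue                                    # skip, never abort
        levels.append(entry.get("level", "UNKNOWN"))

print(levels)  # ['ERROR', 'INFO']
```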