The ops team dropped a new file this morning — a CSV export from the monitoring dashboard. How have you been parsing it?
I'm going to regret answering this. I've been doing line.split(",") and indexing the result. row[0] is the server, row[3] is the response time. It works. Past-Maya was proud of it.
Try this line: web-02,"/api/search?q=hello, world",200,189,"2026-03-31T14:23:11Z". Run your split and tell me what row[1] is.
It splits on every comma... so row[1] would be "/api/search?q=hello and row[2] would be world" instead of the status code. My whole index alignment breaks. That comma inside the quoted field poisons the entire row.
That is exactly why the csv module exists. split(",") fails silently — you don't get an error. You get world" where you expected 200, with no idea why your response times look wrong. Here is the fix:
import csv
import io
csv_text = "server,endpoint,status_code,response_ms\nweb-01,/api/users,200,342\nweb-02,\"/api/search?q=hello, world\",200,189"
reader = csv.DictReader(io.StringIO(csv_text))
for row in reader:
    print(row["server"], row["endpoint"], row["response_ms"])
# web-01 /api/users 342
# web-02 /api/search?q=hello, world 189
DictReader reads the first row as headers and makes every subsequent row a Python dict. Quoted fields with commas inside them are handled automatically.
It knows about the quotes. That's the entire problem I've been having. split(",") doesn't know CSV rules about quoted fields — and that failure looks like bad data from the ops team, not a bug in my parser.
io.StringIO turns your text into a file-like object — DictReader expects something it can call .readline() on. When you have a real file you'd use open("metrics.csv"). The pattern is the same either way. And because each row is a plain Python dict, all your Track 2 dict skills apply immediately.
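Here is the real-file version as a sketch. The file name metrics.csv comes from the conversation, but its contents here are invented for illustration, so the example writes them first to stay self-contained:

```python
import csv

# Invented sample data so the example runs on its own;
# in practice the ops team's export would already be on disk.
with open("metrics.csv", "w", newline="") as f:
    f.write('server,endpoint,status_code,response_ms\n'
            'web-02,"/api/search?q=hello, world",200,189\n')

# newline="" is the documented way to open CSV files:
# it lets the csv module handle line endings itself.
with open("metrics.csv", newline="") as f:
    rows = list(csv.DictReader(f))

print(rows[0]["endpoint"])  # the quoted comma survives intact
```

The only change from the string version is swapping io.StringIO(csv_text) for the open file object; everything downstream of DictReader is identical.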
Values come out as strings though, right? "342" not 342. If I want to do arithmetic on response times, I need to cast.
Right — CSV has no type information. You own the conversion:
import csv, io
reader = csv.DictReader(io.StringIO(csv_text))
rows = []
for row in reader:
    row["response_ms_int"] = int(row["response_ms"])
    row["is_slow"] = row["response_ms_int"] > 500
    rows.append(dict(row))
slow = [r for r in rows if r["is_slow"]]
print(f"{len(slow)} slow requests out of {len(rows)}")
The module owns the parsing. You own the enrichment. One rule: never split on commas for structured CSV data — not because it never works, but because it fails in ways that look like data problems instead of code problems. That is the worst kind of bug.
I sent Diane a report last month where three rows had impossible response times. I told her the data was fine. The data was fine. My parser had a comma in an endpoint path and was treating a URL fragment as a response time. I'm choosing to forgive myself now that I have csv.DictReader.
Past-Maya used the tools she had. Present-Maya has better ones. And notice: yesterday you used json.loads() and got a Python dict. Today you used csv.DictReader and got Python dicts. Different format, same data structure on the other side.
That's the pattern. Pick the right module for the format, get a Python object out, then work in pure Python. The module is just the door.
Exactly right. Tomorrow you'll discover pathlib — and you'll have the same reaction you just had about csv when you see what .stem and .parent replace in that os.path.splitext(os.path.basename(...)) chain you have memorized.
The csv module has been in Python since version 2.3, and yet developers rediscover the need for it every time a query string, address, or description field contains a comma. The reason split(",") works most of the time — and fails in ways that look like data problems — is what makes the csv module essential rather than optional.
CSV is deceptively complex. The format has no single authoritative standard (RFC 4180 is a proposal, not a requirement), but real-world CSV follows shared conventions: fields can be quoted with double-quotes, quoted fields can contain commas and embedded newlines, double-quotes inside quoted fields are escaped as two consecutive double-quotes. The csv module handles all of these. split(",") handles none of them.
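A minimal sketch exercising all three conventions at once (the field values are invented): a comma inside a quoted field, a double-quote escaped as two double-quotes, and an embedded newline.

```python
import csv
import io

# One record containing a quoted comma, an escaped quote (""),
# and a newline inside a quoted field -- all legal CSV.
tricky = 'name,note\nweb-01,"says ""hi"", then\nrestarts"\n'

row = next(csv.DictReader(io.StringIO(tricky)))
print(row["note"])
```

split(",") would shred this record into fragments; the csv module returns the note field with its comma, inner quote, and newline intact.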
csv.reader returns each row as a list. csv.DictReader reads the first row as a header and returns subsequent rows as dicts keyed by column name. For anything with a header row — virtually all monitoring exports — DictReader is the right choice. row["response_ms"] is readable and stable. row[3] breaks when columns are reordered.
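The difference side by side, on a one-row sample invented for illustration:

```python
import csv
import io

data = "server,endpoint,status_code,response_ms\nweb-01,/api/users,200,342\n"

# csv.reader: every row is a plain list, header included
rows = list(csv.reader(io.StringIO(data)))
print(rows[0])     # ['server', 'endpoint', 'status_code', 'response_ms']
print(rows[1][3])  # '342' -- positional, breaks if columns are reordered

# csv.DictReader: header consumed, rows keyed by column name
dict_rows = list(csv.DictReader(io.StringIO(data)))
print(dict_rows[0]["response_ms"])  # '342' -- survives reordering
```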
io.StringIO wraps a string and gives it a file-like interface: any object with a .readline() method works as a DictReader source. This is the correct pattern when CSV arrives as a string from an API response, a database field, or a network socket rather than a file on disk. io.BytesIO does the same for bytes. Both are in the standard library.
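When the payload arrives as bytes, note that csv itself wants text: either decode first, or wrap the bytes in a text layer. A sketch with an invented payload:

```python
import csv
import io

# CSV as raw bytes, e.g. an HTTP response body (invented sample).
payload = b"server,status_code\nweb-01,200\n"

# io.TextIOWrapper layers decoding on top of the bytes buffer,
# giving DictReader the text-mode file-like object it expects.
reader = csv.DictReader(io.TextIOWrapper(io.BytesIO(payload), encoding="utf-8"))
row = next(reader)
print(row["status_code"])  # '200'
```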
CSV stores everything as text. The module never guesses types — "200" is a string, not an integer. This is intentional: the module's job is parsing, not type inference. Explicit conversion in a post-processing loop — int(row["status_code"]), float(row["response_ms"]) — is more readable and reliable than any automatic coercion system.
For tab-separated values, semicolon-separated European locale exports, or pipe-delimited data, csv.reader and csv.DictReader accept a delimiter argument: csv.reader(f, delimiter="\t"). For full dialect control, define a csv.Dialect subclass and register it with csv.register_dialect. Most log processing never needs this — but it is reassuring to know the module handles real-world variation when the data surprises you.
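The delimiter argument in action, on an invented tab-separated sample:

```python
import csv
import io

# Same parsing rules, different delimiter: a tab-separated export.
tsv = "server\tresponse_ms\nweb-01\t342\n"

row = next(csv.DictReader(io.StringIO(tsv), delimiter="\t"))
print(row["response_ms"])  # '342'
```

Quoting, escaping, and header handling all work exactly as they do for comma-separated data; only the field separator changes.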