Parsing CSV Data: Split Lines into Fields
Split CSV lines into fields and parse them into dictionaries. Learn to strip whitespace and zip headers with values.
Yesterday you read the CSV file. You got lines. But look at one:
'Alice Chen,1250.50,West,confirmed\n'  # readlines() includes the newline
You have 7 lines of text. But each line is still just one long string. 'Alice Chen,1250.50,West,confirmed' isn't four values — it's one. You need to split it.
The Problem: Strings vs. Data
Maya: Wait, I have the data. It's right there in the string.
You have text that looks like data. But in your code, it's just a string. You can't ask: "What's the amount?" You'd have to do messy string slicing. We need structure.
# This is fragile:
line = 'Alice Chen,1250.50,West,confirmed'
amount = line[11:18]  # ❌ only correct when the name is exactly 10 characters long
# This is what we want:
record = {'name': 'Alice Chen', 'amount': '1250.50', 'region': 'West', 'status': 'confirmed'}
amount = record['amount']  # ✓ clear, safe
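To see why fixed positions are fragile, here is a small sketch that applies one slice to two lines from the sample data. The slice bounds are tuned to Alice's line, so they cut Bob's line in the wrong place.

```python
# Slice positions chosen to fit Alice's line exactly.
alice = 'Alice Chen,1250.50,West,confirmed'
bob = 'Bob Kumar,340.50,East,pending'

alice_amount = alice[11:18]
bob_amount = bob[11:18]

print(alice_amount)  # 1250.50
print(bob_amount)    # 40.50,E  (wrong: Bob's name and amount have different lengths)
```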
Splitting with .split(',')
The solution is the .split() method. It breaks a string at every occurrence of a delimiter and returns the pieces as a list.
line = 'Alice Chen,1250.50,West,confirmed'
fields = line.split(',')
print(fields)
# ['Alice Chen', '1250.50', 'West', 'confirmed']
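One detail worth checking for yourself: .split(',') only cuts at commas, so the newline that readlines() leaves on each line ends up inside the last field. A minimal sketch, assuming the line still carries its trailing newline:

```python
raw_line = 'Alice Chen,1250.50,West,confirmed\n'  # as returned by readlines()

fields_raw = raw_line.split(',')            # newline stays in the last field
fields_clean = raw_line.strip().split(',')  # strip first, then split

print(fields_raw)    # ['Alice Chen', '1250.50', 'West', 'confirmed\n']
print(fields_clean)  # ['Alice Chen', '1250.50', 'West', 'confirmed']
```

This is why the parsing code strips each line before splitting it.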
Now we have a list. But there's a problem: the header row.
Maya(thinking): The first line is different. It's the header. 'name,amount,region,status'. So I read that separately?
Exactly. The first line tells you what each field means. The rest are data.
Header-Driven Parsing
Here's the pattern:
- Read all lines from the file
- The first line becomes your headers (column names)
- Each remaining line becomes a record (a dict mapping header → value)
def parse_csv(filepath):
    with open(filepath) as f:
        lines = f.readlines()
    if not lines:
        return []
    # First line is headers
    header_line = lines[0].strip()  # strip() removes the trailing newline
    headers = header_line.split(',')
    records = []
    for line in lines[1:]:  # Skip the header, process the rest
        line = line.strip()  # Clean up whitespace
        if not line:  # Skip empty lines
            continue
        values = line.split(',')
        record = dict(zip(headers, values))
        records.append(record)
    return records
Maya: Wait, zip()? What's that doing?
zip() pairs up two lists element-by-element:
headers = ['name', 'amount', 'region', 'status']
values = ['Alice Chen', '1250.50', 'West', 'confirmed']
for h, v in zip(headers, values):
    print(f"{h}: {v}")
# name: Alice Chen
# amount: 1250.50
# region: West
# status: confirmed
# zip() creates tuples, dict() converts them to key-value pairs
record = dict(zip(headers, values))
# {'name': 'Alice Chen', 'amount': '1250.50', 'region': 'West', 'status': 'confirmed'}
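One behavior of zip() to keep in mind: it stops at the end of the shorter input. The sketch below uses a made-up row with two fields missing (it is not from sales.csv) to show that the resulting dict silently lacks those keys.

```python
headers = ['name', 'amount', 'region', 'status']
short_row = ['Bob Kumar', '340.50']  # hypothetical row with two fields missing

record = dict(zip(headers, short_row))
print(record)  # {'name': 'Bob Kumar', 'amount': '340.50'}

# A simple guard: compare lengths before zipping.
row_is_complete = len(short_row) == len(headers)
print(row_is_complete)  # False
```

Whether to skip, report, or pad an incomplete row depends on your data, so treat the length check as one possible guard rather than a required step.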
The Whitespace Problem
Real CSV files are messy. Look at the test data:
name,amount,region,status
Alice Chen,1250.50,West,confirmed
Bob Kumar,340.50,East,pending
...
 Eve Williams,520.00,North,confirmed
Eve's name has a leading space. If you don't strip it, you'll get ' Eve Williams' (with the space). That breaks lookups.
Maya(excited): So I strip each field?
Yes! After you split, strip each value:
values = [v.strip() for v in line.split(',')]
# ['Eve Williams', '520.00', 'North', 'confirmed']  ✓ space is gone
Now the dict has clean data.
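Here is a short sketch of what "breaks lookups" means in practice. It parses Eve's line twice, once without stripping and once with per-field stripping, and compares the results. The line is assumed to come straight from readlines(), so it still has its leading space and trailing newline.

```python
headers = ['name', 'amount', 'region', 'status']
line = ' Eve Williams,520.00,North,confirmed\n'

# No stripping at all: stray whitespace ends up inside the values.
raw = dict(zip(headers, line.split(',')))

# Strip each field after splitting.
clean = dict(zip(headers, [v.strip() for v in line.split(',')]))

print(repr(raw['name']))                # ' Eve Williams'
print(repr(raw['status']))              # 'confirmed\n'
print(raw['name'] == 'Eve Williams')    # False
print(clean['name'] == 'Eve Williams')  # True
```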
Your Challenge
Write parse_csv(filepath) that:
- Reads the file (you can reuse read_csv_lines() from yesterday, or inline it)
- Uses the first line as headers
- For each remaining line:
  - Strip the line
  - Skip empty lines
  - Split by comma
  - Strip each field
  - Zip with headers and create a dict
- Returns a list of dicts
Test it with sales.csv. You should get 6 records (header + empty line skipped). Each record is a dict. You can access record['name'], record['amount'], etc.
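If you want a quick way to sanity-check your result, something like the sketch below may help. check_records() is a hypothetical helper written for this illustration, not part of the lesson's code; it assumes every value is a string, which is what parse_csv produces. The sample list is hand-written and includes one deliberately unstripped name.

```python
def check_records(records, expected_headers):
    """Return a list of problem descriptions; an empty list means all checks passed.

    Hypothetical helper for illustration only.
    """
    problems = []
    for i, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"record {i} is not a dict")
            continue
        if list(record.keys()) != list(expected_headers):
            problems.append(f"record {i} has unexpected keys: {list(record.keys())}")
        for key, value in record.items():
            # A value that changes when stripped still has stray whitespace.
            if value != value.strip():
                problems.append(f"record {i} field {key!r} has stray whitespace")
    return problems


sample = [
    {'name': 'Alice Chen', 'amount': '1250.50', 'region': 'West', 'status': 'confirmed'},
    {'name': ' Eve Williams', 'amount': '520.00', 'region': 'North', 'status': 'confirmed'},
]
issues = check_records(sample, ['name', 'amount', 'region', 'status'])
print(issues)  # ["record 1 field 'name' has stray whitespace"]
```

Once your own function is written, you could pass its output in place of sample, for example check_records(parse_csv('sales.csv'), ['name', 'amount', 'region', 'status']).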
Next time: String cleaning. Stripping spaces is just the start. Next we handle case normalization, quoted fields, and other CSV gotchas that make the real world messy.