Day 21 · ~14m

Nested Data: Lists of Dictionaries for Real Records

Combine lists and dicts to structure real analytics data. Learn to group records by region and build mini data pipelines.

student (thinking)

So we have lists. We have dictionaries. We know how to work with each one. But in real data work—like when I load a CSV file—won't I get a list of records? And each record is a dictionary?

teacher (encouraging)

Exactly. A list of dictionaries is one of the most common data structures in everyday Python data work. Many APIs return JSON that parses into a list of dicts. Rows from a spreadsheet or CSV file are commonly loaded as a list of dicts, and many database libraries can hand back query results the same way. You've been building up to this all week.
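As a quick illustration, here is a minimal sketch of JSON text turning into a list of dicts with the standard-library json module. The payload string is made up for this example and does not come from any real API.

```python
import json

# Hypothetical JSON payload, shaped like something an API might return
payload = (
    '[{"name": "Alice Chen", "amount": 6800.0, "region": "West"},'
    ' {"name": "Bob Kumar", "amount": 340.5, "region": "East"}]'
)

# A JSON array becomes a list; each JSON object becomes a dict
records_from_json = json.loads(payload)

print(type(records_from_json).__name__)     # list
print(type(records_from_json[0]).__name__)  # dict
print(records_from_json[0]['name'])         # Alice Chen
```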

Let's say you have a list of sales records. Each record is a dictionary with name, amount, and region:

records = [
    {'name': 'Alice Chen', 'amount': 6800.00, 'region': 'West'},
    {'name': 'Bob Kumar', 'amount': 340.50, 'region': 'East'},
    {'name': 'Carol Santos', 'amount': 1250.00, 'region': 'West'},
    {'name': 'David Park', 'amount': 89.99, 'region': 'North'},
]

You can iterate through this list like any other, but now each item has named fields:

for record in records:
    print(f"{record['name']} from {record['region']}: ${record['amount']}")

# Alice Chen from West: $6800.0
# Bob Kumar from East: $340.5
# Carol Santos from West: $1250.0
# David Park from North: $89.99

student (curious)

So I can use list comprehensions with this too?

teacher (focused)

Yes. You remember list comprehensions from Day 20? They work perfectly here. Want to get all names? All amounts from the West region?

# List of all names
all_names = [record['name'] for record in records]
print(all_names)
# ['Alice Chen', 'Bob Kumar', 'Carol Santos', 'David Park']

# List of amounts only
amounts = [record['amount'] for record in records]

# Amounts from West region only
west_amounts = [record['amount'] for record in records if record['region'] == 'West']
print(west_amounts)
# [6800.0, 1250.0]

That third one—filtering by region—is the bridge to today's real problem. What if you want to group all the records by region? Not just filter West, but organize all regions into separate buckets?

student (focused)

That's when a dictionary of lists comes in, right? A dict where each key is a region and the value is a list of all records from that region?

teacher (proud)

Yes! Now you're thinking like a data engineer. Your function will take a list of sales records and return a dictionary where the keys are regions and the values are lists of records:

result = {
    'West': [record1, record3],   # 2 sales from West
    'East': [record2],             # 1 sale from East
    'North': [record4]             # 1 sale from North
}

Here's the pattern:

def group_by_region(records):
    groups = {}  # Empty dict to hold the groups
    for record in records:
        region = record['region']  # Get the region from this record
        
        # First time seeing this region? Create an empty list.
        if region not in groups:
            groups[region] = []
        
        # Add this record to the list for this region
        groups[region].append(record)
    
    return groups

Walk through it: Start with an empty dict. Loop through each record. If we haven't seen this region before, create a new list for it. Then append the record to that region's list. By the end, every region has its own list of records.
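To make the walkthrough concrete, here is a small sketch that runs the function on the four sample records and prints the names that land in each bucket. The region order in the output assumes Python 3.7 or later, where dicts keep insertion order.

```python
records = [
    {'name': 'Alice Chen', 'amount': 6800.00, 'region': 'West'},
    {'name': 'Bob Kumar', 'amount': 340.50, 'region': 'East'},
    {'name': 'Carol Santos', 'amount': 1250.00, 'region': 'West'},
    {'name': 'David Park', 'amount': 89.99, 'region': 'North'},
]

def group_by_region(records):
    groups = {}
    for record in records:
        region = record['region']
        if region not in groups:
            groups[region] = []        # first record for this region
        groups[region].append(record)
    return groups

groups = group_by_region(records)

# Show which names ended up in each region's list
for region, sales in groups.items():
    names = [sale['name'] for sale in sales]
    print(region, names)

# West ['Alice Chen', 'Carol Santos']
# East ['Bob Kumar']
# North ['David Park']
```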

student (thinking)

But what if a record doesn't have a 'region' key? Yesterday you said I should use .get() to be safe.

teacher (focused)

Sharp catch. Yes, use .get() with a default:

region = record.get('region', 'Unknown')

Now if 'region' is missing, it defaults to 'Unknown' instead of crashing. Real data is messy. Building defensive code is part of the job.
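Here is a short sketch of the difference between the two kinds of access. The incomplete record is hypothetical, invented only to show what happens when the key is absent.

```python
# Hypothetical record with no 'region' key
incomplete = {'name': 'Eve Novak', 'amount': 120.00}

# Square brackets raise KeyError when the key is missing
try:
    region = incomplete['region']
except KeyError:
    region = None
    print("KeyError: this record has no 'region' key")

# .get() returns the default instead of raising
safe_region = incomplete.get('region', 'Unknown')
print(safe_region)  # Unknown
```

One thing to keep in mind: the default only applies when the key is absent. If a record has the key but its value is empty, .get() returns that empty value as it is.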

def group_by_region(records):
    groups = {}
    for record in records:
        region = record.get('region', 'Unknown')  # Safe access
        
        if region not in groups:
            groups[region] = []
        
        groups[region].append(record)
    
    return groups

student (excited)

So now I can do things like count how many sales per region, or calculate the total amount per region?

teacher (encouraging)

Exactly. Once you have the groups, you can analyze them:

groups = group_by_region(records)

# How many sales per region?
for region, sales in groups.items():
    print(f"{region}: {len(sales)} sales")

# What's the total amount per region?
for region, sales in groups.items():
    total = sum(record['amount'] for record in sales)
    print(f"{region}: ${total:.2f}")

This pattern sits at the heart of a great many analytics scripts. Organize data → analyze by group → report results. You've just built the "organize" step.
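One possible way to put the three steps together is sketched below. The summarize helper and the shape of its result are assumptions made for this illustration, not something the lesson has defined.

```python
records = [
    {'name': 'Alice Chen', 'amount': 6800.00, 'region': 'West'},
    {'name': 'Bob Kumar', 'amount': 340.50, 'region': 'East'},
    {'name': 'Carol Santos', 'amount': 1250.00, 'region': 'West'},
    {'name': 'David Park', 'amount': 89.99, 'region': 'North'},
]

def group_by_region(records):
    """Organize: bucket records by region."""
    groups = {}
    for record in records:
        region = record.get('region', 'Unknown')
        if region not in groups:
            groups[region] = []
        groups[region].append(record)
    return groups

def summarize(groups):
    """Analyze: count and total per region (hypothetical helper)."""
    summary = {}
    for region, sales in groups.items():
        summary[region] = {
            'count': len(sales),
            'total': sum(sale['amount'] for sale in sales),
        }
    return summary

# Report: one line per region
summary = summarize(group_by_region(records))
for region, stats in summary.items():
    print(f"{region}: {stats['count']} sales, ${stats['total']:.2f}")

# West: 2 sales, $8050.00
# East: 1 sales, $340.50
# North: 1 sales, $89.99
```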

student (proud)

And next week we won't hardcode the records—they'll come from an actual CSV file, right?

teacher (amused)

Exactly. Today you're organizing records you wrote by hand. Next week, the records come from a file. But the grouping function stays the same. You're assembling the building blocks of a real data pipeline.
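As a small preview, here is a rough sketch of how that might look with the standard-library csv module. The CSV text is made up, and io.StringIO stands in for a real file so the example runs on its own. Note that values read from a CSV arrive as strings, so the amount is converted to a float before grouping.

```python
import csv
import io

# Stand-in for a file on disk; the contents are invented for this sketch
csv_text = """name,amount,region
Alice Chen,6800.00,West
Bob Kumar,340.50,East
Carol Santos,1250.00,West
"""

def group_by_region(records):
    groups = {}
    for record in records:
        region = record.get('region', 'Unknown')
        if region not in groups:
            groups[region] = []
        groups[region].append(record)
    return groups

# DictReader yields one dict per row, keyed by the header line
records = []
for row in csv.DictReader(io.StringIO(csv_text)):
    row['amount'] = float(row['amount'])  # CSV values are strings
    records.append(row)

groups = group_by_region(records)
print({region: len(sales) for region, sales in groups.items()})
# {'West': 2, 'East': 1}
```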