Day 28 · ~12m

Building a Data Pipeline: Read CSV → Clean → Process → Write Report

Assemble 5 days of work into one function: read CSV, parse, clean, group by region, write report. The payoff.

student (excited)

28 days. We have loops, dicts, file I/O, string methods, filtering. That's... a lot.

teacher (proud)

That's a data engineer. You have every piece. Today they snap together.

student (curious)

Snap into what?

teacher (focused)

Your boss lands a CSV on your desk Monday morning. 6 sales records. Needs a report by lunch: how much each region sold, who closed each deal, status of each one. You have two hours.

student (thinking)

...we wrote a CSV reader. Day 25. parse_csv().

teacher (encouraging)

Keep going.

student (focused)

Then we learned to clean messy data. Day 26—clean_record(). Titles names, floats amounts, uppercase regions.

teacher (neutral)

And?

student (thinking)

** We grouped records by a key back in... Day 21? That's this loop pattern where you build a dict and append to it.

teacher (proud)

You just laid out your entire pipeline. parse_csv, clean each record, group by region, write the report.

def run_pipeline(input_file, output_file):
    records = parse_csv(input_file)
    cleaned = [clean_record(r) for r in records if r.get('name')]
    
    # Group by region
    by_region = {}
    for record in cleaned:
        region = record['region']
        if region not in by_region:
            by_region[region] = []
        by_region[region].append(record)
    
    # Write report
    with open(output_file, 'w') as f:
        for region, sales in sorted(by_region.items()):
            total = sum(s['amount'] for s in sales)
            f.write(f"=== {region} === Total: ${total:,.2f}\n")
            for s in sales:
                f.write(f"  {s['name']}: ${s['amount']:,.2f} ({s['status']})\n")
student (excited)

So run_pipeline() is just... calling those in order?

teacher (amused)

Write it. See if Monday morning gets solved.

student (proud)

I can actually do this. This is the thing that felt impossible on Day 10.

teacher (encouraging)

You're not the same coder you were on Day 10. Tomorrow we test what stuck—and I think you know the answer.