You have two weeks of functions: format, clean, filter, loop, group. Every time a new wave arrives, you copy the script, change the filename, rerun. What happens when your co-author asks to replicate your analysis?
I send her the script. She opens it, tries to run it, and emails me asking what respondents_data refers to, because I named it differently in the wave-3 version than in the wave-2 version.
That's the reproducibility gap. A function with a name, parameters, a docstring, and a return value is a specification you can cite. This week you write those. You also read CSV and JSON — the formats your data actually arrives in — so the pipeline starts from real files, not typed-in test data.
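A minimal sketch of what that specification looks like in practice. The function name make_table_row comes from this week's lesson list; the exact parameters and formatting are illustrative assumptions, not the course's definitive version.

```python
def make_table_row(label, mean, n):
    """Return one formatted row for a summary table.

    Parameters
    ----------
    label : str
        Group name, e.g. "wave 2".
    mean : float
        Group mean to report, rendered to two decimal places.
    n : int
        Number of respondents in the group.
    """
    # The name, parameters, docstring, and return value together form
    # the citable specification: anyone can read what goes in and comes out.
    return f"{label:<12} {mean:>6.2f} (n={n})"

print(make_table_row("wave 2", 3.842, 240))
```

Because the function documents its own inputs and output, a co-author can call it on her own data without emailing you about variable names.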
Can I really pass a CSV file's content as a string argument and have Python parse it without actually opening a file?
The Pyodide environment has no pre-loaded files — so yes, you pass the CSV text as a string and parse it inside the function. That's actually more reproducible: the function is pure, testable, and works the same way on every wave you feed it. By Friday, the pipeline takes a CSV string in and returns a JSON-serialised summary table out.
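Here is a sketch of that string-in approach, using the parse_respondent_csv name from this week's lessons. The column names and sample data are illustrative assumptions; the parsing itself is plain split() and strip(), no file handling needed.

```python
def parse_respondent_csv(csv_text):
    """Parse a CSV-formatted string into a list of dicts, one per respondent."""
    # Strip surrounding whitespace, split into lines, and drop any blank lines.
    lines = [line.strip() for line in csv_text.strip().split("\n") if line.strip()]
    header = lines[0].split(",")
    rows = []
    for line in lines[1:]:
        values = [v.strip() for v in line.split(",")]
        rows.append(dict(zip(header, values)))
    return rows

# The CSV arrives as a string argument, not a file on disk.
wave = "id,age,score\n1,34,4.5\n2,29,3.8\n"
respondents = parse_respondent_csv(wave)
```

The function is pure: same string in, same list of dicts out, on every wave and on every machine.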
- make_table_row: a named, documented function wrapping the stats formatting from Week 1
- summarize_group: default args and tuple unpacking for flexible summaries
- parse_respondent_csv: read a CSV-formatted string with split() and strip()
- load_respondents_from_csv: upgrade to csv.DictReader for robust parsing
- respondents_to_json: serialise the summary dict to a JSON string with json.dumps

Goal: by Friday the pipeline takes a CSV string in and returns a JSON summary out — one function call your co-author can rerun.
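The last two steps above can be sketched together. The function names load_respondents_from_csv and respondents_to_json come from the lesson list; the summary fields and sample data are illustrative assumptions.

```python
import csv
import io
import json

def load_respondents_from_csv(csv_text):
    """Parse a CSV string with csv.DictReader, which handles quoted fields
    and embedded commas that a plain split(",") would break on."""
    # io.StringIO wraps the string in a file-like object for DictReader.
    return list(csv.DictReader(io.StringIO(csv_text)))

def respondents_to_json(summary):
    """Serialise a summary dict to a JSON string the co-author can rerun against."""
    return json.dumps(summary, indent=2)

# CSV string in → JSON summary out, in one rerunnable sequence.
wave = "id,age,score\n1,34,4.5\n2,29,3.8\n"
rows = load_respondents_from_csv(wave)
mean_score = sum(float(r["score"]) for r in rows) / len(rows)
print(respondents_to_json({"n": len(rows), "mean_score": mean_score}))
```

Swapping the hand-rolled parser for csv.DictReader costs nothing in readability and buys correct handling of quoting, which real survey exports eventually need.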
7 lessons this week