Five weeks of functions. load_responses_from_csv, safe_compute_avg, rank_groups_by_satisfaction, demographic_summary, categorize_satisfaction. How do you want to connect them?
safe_compute_avg from yesterday makes the averaging crash-safe. I parse the CSV, compute the overall average safely, group by demographic, rank the groups, and flag any group with fewer than 10 responses. That's the whole methodology section in one chain.
That chain is the capstone. Every function you've built this track has a specific role. The pipeline's job is to call them in the right order and wrap the result in a structured output your advisor can read:
def thesis_pipeline(csv_text: str) -> dict:
    responses = load_responses_from_csv(csv_text)
    overall_avg = safe_compute_avg(responses)
    ranked = rank_groups_by_satisfaction(responses, "year")
    low_sample = [g for g in ranked if g["count"] < 10]
    return {
        "ranked_groups": ranked,
        "low_sample_groups": [g["group"] for g in low_sample],
        "overall_avg_satisfaction": overall_avg
    }

Should the pipeline take the grouping field as a parameter, or hardcode "year"?
Excellent instinct. Parameterise it — your advisor might want the cross-tab by major next week. But for the capstone the default is "year" because that's the primary analysis dimension. Default args let you do both:
def thesis_pipeline(csv_text: str, field: str = "year") -> dict:
    """Full thesis analysis pipeline: parse → clean → group → rank → flag low-sample."""
    responses = load_responses_from_csv(csv_text)
    overall_avg = safe_compute_avg(responses)
    ranked = rank_groups_by_satisfaction(responses, field)
    low_sample = [g["group"] for g in ranked if g["count"] < 10]
    print(f"Pipeline: {len(responses)} responses, {len(ranked)} groups, overall avg {overall_avg:.2f}")
    return {"ranked_groups": ranked, "low_sample_groups": low_sample, "overall_avg_satisfaction": overall_avg}

I ran this on my actual Qualtrics export. Cross-tabs matched SPSS output. My advisor wants the script. That's a real thesis deliverable.
Your committee reviewer just saved a weekend.
Twenty-five functions, five weeks. The capstone calls them all. This is what a reproducible methodology section looks like.
The pipeline is as good as its weakest function. safe_compute_avg handles bad data. load_responses_from_csv handles quoted fields. is_valid_response guards against missing keys. Each function does one thing well — the pipeline just connects them. That's the architecture of reliable code.
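As a rough sketch of what two of those guards might look like (the real implementations from earlier lessons may differ in detail; these are illustrative):

```python
def is_valid_response(row: dict) -> bool:
    """Guard: a usable response has a 'satisfaction' key with a numeric value."""
    value = row.get("satisfaction")
    return isinstance(value, (int, float))

def safe_compute_avg(responses: list) -> float:
    """Average satisfaction over valid rows only; returns 0.0 if none are valid."""
    scores = [r["satisfaction"] for r in responses if is_valid_response(r)]
    return sum(scores) / len(scores) if scores else 0.0
```

Because each guard is small and single-purpose, the pipeline never has to re-check the same condition twice.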
A pipeline function chains specialised functions in sequence, each responsible for one transform:
csv_text → parse → safe_avg → group → rank → flag → output dict
field as an arg, not a string literal

Groups with fewer than 10 responses are statistically unreliable. Flag them in the output so reviewers can note the limitation — don't silently exclude them.
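To see the whole chain run end to end, here is a self-contained sketch with simplified stand-ins for the earlier functions (the real ones handle quoted fields and bad data; these stubs cover only the happy path, and the tiny CSV is invented for illustration):

```python
import csv
import io

def load_responses_from_csv(csv_text: str) -> list:
    """Stand-in parser: CSV text -> list of dicts, satisfaction coerced to float."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row["satisfaction"] = float(row["satisfaction"])
    return rows

def safe_compute_avg(responses: list) -> float:
    """Stand-in: mean satisfaction, 0.0 for an empty list."""
    scores = [r["satisfaction"] for r in responses]
    return sum(scores) / len(scores) if scores else 0.0

def rank_groups_by_satisfaction(responses: list, field: str) -> list:
    """Stand-in: group by `field`, average each group, sort by average descending."""
    groups = {}
    for r in responses:
        groups.setdefault(r[field], []).append(r["satisfaction"])
    ranked = [
        {"group": name, "count": len(vals), "avg": sum(vals) / len(vals)}
        for name, vals in groups.items()
    ]
    return sorted(ranked, key=lambda g: g["avg"], reverse=True)

def thesis_pipeline(csv_text: str, field: str = "year") -> dict:
    """Chain: parse -> overall average -> rank groups -> flag low-sample groups."""
    responses = load_responses_from_csv(csv_text)
    overall_avg = safe_compute_avg(responses)
    ranked = rank_groups_by_satisfaction(responses, field)
    low_sample = [g["group"] for g in ranked if g["count"] < 10]
    return {
        "ranked_groups": ranked,
        "low_sample_groups": low_sample,
        "overall_avg_satisfaction": overall_avg,
    }

sample = "year,satisfaction\nfirst,4\nfirst,5\nsecond,3\n"
result = thesis_pipeline(sample)
```

With only three rows, both groups fall under the 10-response threshold, so both appear in `low_sample_groups` — exactly the limitation a reviewer would want surfaced rather than hidden.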
You're submitting your thesis methodology section and need a single function that takes the raw Qualtrics CSV export as a string and returns the complete analysis: ranked demographic groups, names of low-sample groups (fewer than 10 responses), and the overall average satisfaction. Write `thesis_pipeline(csv_text, field='year')` that chains the full pipeline.