Four weeks of functions. filter_eligible, safe_compute_outcome, group_by_treatment, treatment_summary, rank_groups_by_outcome. What does the capstone do that none of them do alone?
safe_compute_outcome from Day 27 handles missing outcomes. rank_groups_by_outcome from Day 26 ranks the groups. The capstone chains them all — one function that takes raw respondents and returns the full journal-ready summary. Including outlier group flagging.
Exactly. The pipeline: apply safe_compute_outcome to every respondent to repair missing fields, then filter_eligible to apply the pre-registered inclusion criterion, then treatment_summary for the stats, then rank_groups_by_outcome for the ranked table, then flag any group whose mean outcome exceeds 1.25× the overall mean as an outlier:
overall_mean = sum(r["outcome"] for r in eligible) / len(eligible)
outlier_groups = [g["group"] for g in ranked if g["mean_outcome"] > overall_mean * 1.25]
Why 1.25×? Is that a standard threshold?
It's the threshold you pre-registered. In code it's just a constant — change it to 1.5 for a more liberal threshold or 1.1 for a stricter one. The point is that it's explicit and versioned in the script, not hidden in an SPSS dialog.
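One way to keep that threshold explicit is a named module-level constant. This is a minimal sketch; the name OUTLIER_THRESHOLD and the helper flag_outliers are illustrative, not part of the original script:

```python
# Pre-registered outlier threshold. Naming it makes the choice
# visible in code review and easy to change in one place.
OUTLIER_THRESHOLD = 1.25

def flag_outliers(ranked: list, overall_mean: float) -> list:
    """Return labels of groups whose mean outcome exceeds the threshold."""
    return [g["group"] for g in ranked
            if g["mean_outcome"] > overall_mean * OUTLIER_THRESHOLD]

groups = [{"group": "A", "mean_outcome": 10.0},
          {"group": "B", "mean_outcome": 14.0}]
print(flag_outliers(groups, 10.0))  # ['B'] -- 14.0 > 10.0 * 1.25
```

Switching to a more liberal or stricter criterion is then a one-line diff that shows up in version control.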
So the return value is {"ranked_groups": [...], "outlier_groups": [...], "overall_mean_outcome": float} — everything a reviewer needs to verify the analysis.
The entire methods section output, generated by one function call:
def analysis_pipeline(respondents: list) -> dict:
    # Repair missing outcomes, then apply the pre-registered inclusion criterion.
    repaired = [{**r, "outcome": safe_compute_outcome(r)} for r in respondents]
    eligible = filter_eligible(repaired, 18.0)
    ranked = rank_groups_by_outcome(eligible)
    # Guard against an empty eligible sample before dividing.
    overall_mean = round(sum(r["outcome"] for r in eligible) / len(eligible), 2) if eligible else 0.0
    # Pre-registered threshold: flag groups above 1.25x the overall mean.
    outlier_groups = [g["group"] for g in ranked if g["mean_outcome"] > overall_mean * 1.25]
    result = {"ranked_groups": ranked, "outlier_groups": outlier_groups, "overall_mean_outcome": overall_mean}
    print(f"Pipeline complete: {len(ranked)} groups, {len(outlier_groups)} outlier groups")
    return result
I ran this on the wave-3 data. Three treatment groups, all stats computed, one outlier flagged. My co-author said she wants to rerun it herself — I just send her the script.
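A minimal end-to-end run can be sketched with toy stand-ins for the track's helper functions. These stub implementations are illustrative only, assumed from the behavior described in the lesson, not the originals:

```python
def safe_compute_outcome(r: dict) -> float:
    # Stand-in: fall back to 0.0 when the outcome is missing.
    outcome = r.get("outcome")
    return outcome if outcome is not None else 0.0

def filter_eligible(rows: list, min_age: float) -> list:
    # Stand-in for the pre-registered inclusion criterion.
    return [r for r in rows if r["age"] >= min_age]

def rank_groups_by_outcome(rows: list) -> list:
    # Stand-in: group by treatment, then sort by mean outcome, descending.
    by_group: dict = {}
    for r in rows:
        by_group.setdefault(r["group"], []).append(r["outcome"])
    ranked = [{"group": g, "mean_outcome": sum(v) / len(v)} for g, v in by_group.items()]
    return sorted(ranked, key=lambda g: g["mean_outcome"], reverse=True)

def analysis_pipeline(respondents: list) -> dict:
    repaired = [{**r, "outcome": safe_compute_outcome(r)} for r in respondents]
    eligible = filter_eligible(repaired, 18.0)
    ranked = rank_groups_by_outcome(eligible)
    overall_mean = round(sum(r["outcome"] for r in eligible) / len(eligible), 2) if eligible else 0.0
    outlier_groups = [g["group"] for g in ranked if g["mean_outcome"] > overall_mean * 1.25]
    return {"ranked_groups": ranked, "outlier_groups": outlier_groups,
            "overall_mean_outcome": overall_mean}

respondents = [
    {"group": "control", "age": 25.0, "outcome": 4.0},
    {"group": "treated", "age": 30.0, "outcome": 9.0},
    {"group": "treated", "age": 17.0, "outcome": 99.0},  # excluded: under 18
    {"group": "control", "age": 40.0, "outcome": None},  # repaired to 0.0
]
print(analysis_pipeline(respondents))
```

With these stubs the eligible outcomes are 4.0, 9.0, and 0.0, so the overall mean is 4.33 and only the treated group (mean 9.0) exceeds 4.33 × 1.25 and gets flagged.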
The script is the reproducibility. Not the screenshot, not the SPSS output file, not the email attachment with "final_v3_revised.xlsx" in the name. The Python function, with its inputs documented and its constants named, is the analysis you can cite.
The capstone assembles every function from the track into one composable pipeline:
raw respondents
→ safe_compute_outcome (repair missing outcomes)
→ filter_eligible (apply inclusion criterion)
→ treatment_summary (N, mean outcome, mean age per group)
→ rank_groups_by_outcome (sorted by mean outcome)
→ flag outliers (groups > 1.25× overall mean)
→ return {ranked_groups, outlier_groups, overall_mean_outcome}
A pipeline function is a specification — it has a name, a signature, and a documented output shape. Co-authors can rerun it on new waves. Reviewers can read it. You can version it in Git. That's reproducibility.
The {**r, "outcome": value} pattern: {**r, "key": new_value} creates a new dict copying all of r's fields and overwriting "key" with new_value. Non-mutating — r is unchanged.
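A quick demonstration of the non-mutating update (the variable names here are illustrative):

```python
r = {"id": 7, "outcome": None}

# Build a new dict: copy every field of r, then overwrite "outcome".
repaired = {**r, "outcome": 3.5}

print(repaired)  # {'id': 7, 'outcome': 3.5}
print(r)         # {'id': 7, 'outcome': None} -- original is untouched
```

Because the original record is never mutated, rerunning the pipeline on the same input always starts from the same raw data.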
Hassan needs the full reproducible analysis pipeline. Write `analysis_pipeline(respondents)` that: (1) applies `safe_compute_outcome` to each respondent, (2) filters by min_age 18.0, (3) computes `rank_groups_by_outcome`, (4) computes overall mean outcome, (5) flags outlier groups whose mean exceeds 1.25× the overall mean. Return `{"ranked_groups": [...], "outlier_groups": [...], "overall_mean_outcome": float}`.