You have 300 eligible respondents and need to count how many are in each treatment group. In SPSS, that's a Frequencies table or a pivot. What's the Python equivalent?
find_first_outlier from Day 12 shows me how to walk the list. For grouping I'd need... a dict? With group names as keys and lists of respondents as values?
Exactly — a dict of lists. Each treatment group name becomes a key; each value is the list of respondents in that group. The trick is .get(key, []) to initialise a new group on first encounter:

    respondents = [{"id": "R_001", "treatment_group": "control", "outcome": 4.5}]
    groups = {}
    for r in respondents:
        g = r["treatment_group"]
        groups[g] = groups.get(g, []) + [r]

Why groups.get(g, []) and not just groups[g]? I thought dict access used brackets.
Bracket access raises KeyError when the key doesn't exist, and the first time a new group appears it hasn't been added yet. .get(g, []) returns an empty list as the default, so adding [r] creates the new entry cleanly. After the loop, groups.keys() lists all group names, groups.values() lists all respondent lists, and groups.items() gives both together:

    def group_by_treatment(respondents: list) -> dict:
        groups = {}
        for r in respondents:
            g = r["treatment_group"]
            # .get returns [] the first time a group is seen, so no KeyError
            groups[g] = groups.get(g, []) + [r]
        print(f"Groups found: {list(groups.keys())}")
        return groups

So groups["control"] is the list of all control respondents, and len(groups["control"]) is my N for that group. That's the entire pivot table, in one function.
The entire pivot table, plus you can pass each group's list directly into a mean calculation. No screenshot required.
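As a rough sketch of that idea, the snippet below takes the grouped dict and reports N and the mean outcome for each group. The three respondents are made-up illustration data, and `summarise_groups` is a hypothetical helper name, not something from an earlier lesson.

```python
from statistics import mean


def group_by_treatment(respondents: list) -> dict:
    # Same grouping logic as in the lesson, without the print
    groups = {}
    for r in respondents:
        g = r["treatment_group"]
        groups[g] = groups.get(g, []) + [r]
    return groups


def summarise_groups(groups: dict) -> dict:
    """Hypothetical helper: N and mean outcome per group."""
    summary = {}
    for name, members in groups.items():
        outcomes = [r["outcome"] for r in members]
        summary[name] = {"n": len(members), "mean_outcome": mean(outcomes)}
    return summary


# Made-up respondents, for illustration only
sample = [
    {"id": "R_001", "treatment_group": "control", "outcome": 4.0},
    {"id": "R_002", "treatment_group": "treatment_a", "outcome": 3.0},
    {"id": "R_003", "treatment_group": "control", "outcome": 5.0},
]

summary = summarise_groups(group_by_treatment(sample))
print(summary)
```

Each group's list goes straight into `mean` after pulling out the `outcome` values, so the per-group N and mean come from the same dict.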
A dict is just a codebook that builds itself from the data. I didn't expect to like this as much as I do.
One gotcha: if the treatment_group field is inconsistently capitalised — "Control" in some rows and "control" in others — they land in separate buckets. Run clean_group_label from Day 4 on r["treatment_group"] before using it as the key.
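Here is a minimal sketch of that gotcha. `normalise_label` is a hypothetical stand-in that only strips whitespace and lowercases; the real clean_group_label from Day 4 is not shown here and may do more. The rows are invented for illustration.

```python
def normalise_label(label: str) -> str:
    # Hypothetical stand-in for clean_group_label (Day 4)
    return label.strip().lower()


# Invented rows with inconsistent labels for the same group
rows = [
    {"id": "R_001", "treatment_group": "Control"},
    {"id": "R_002", "treatment_group": "control"},
    {"id": "R_003", "treatment_group": " control "},
]

raw_groups = {}
clean_groups = {}
for r in rows:
    # Grouping on the raw label: every spelling gets its own bucket
    raw_key = r["treatment_group"]
    raw_groups[raw_key] = raw_groups.get(raw_key, []) + [r]

    # Grouping on the normalised label: all spellings share one bucket
    clean_key = normalise_label(raw_key)
    clean_groups[clean_key] = clean_groups.get(clean_key, []) + [r]

print(list(raw_groups.keys()))
print(list(clean_groups.keys()))
```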
A dict maps keys to values — like a codebook maps variable names to their definitions.
| Operation | Example | Result |
|---|---|---|
| `d[k]` | `groups["control"]` | value or `KeyError` |
| `d.get(k, default)` | `groups.get("control", [])` | value or `[]` |
| `d.keys()` | `groups.keys()` | all group names |
| `d.values()` | `groups.values()` | all respondent lists |
| `d.items()` | `groups.items()` | (name, list) pairs |
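
To see the table's rows in action, here is a small sketch on a hand-built `groups` dict. The two respondents and the `treatment_b` key are invented for illustration.

```python
groups = {
    "control": [{"id": "R_001"}],
    "treatment_a": [{"id": "R_002"}],
}

# d[k]: bracket access raises KeyError when the key is missing
try:
    groups["treatment_b"]
    missing_raised = False
except KeyError:
    missing_raised = True

# d.get(k, default): returns the default instead of raising,
# and does not add the key to the dict
fallback = groups.get("treatment_b", [])

# keys(), values() and items() walk the dict in insertion order
names = list(groups.keys())
sizes = [len(members) for members in groups.values()]
pairs = [(name, len(members)) for name, members in groups.items()]

print(missing_raised, fallback, names, sizes, pairs)
```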

    groups = {}
    for item in items:
        key = item[field]
        groups[key] = groups.get(key, []) + [item]

This pattern appears in every grouping operation — by treatment group, by wave, by journal, by country.
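One way to write the pattern once and reuse it is a small generic helper. `group_by` and its `field` parameter are hypothetical names chosen for this sketch, and the response data is made up.

```python
def group_by(items: list, field: str) -> dict:
    """Group a list of dicts by the value stored under `field`."""
    groups = {}
    for item in items:
        key = item[field]
        groups[key] = groups.get(key, []) + [item]
    return groups


# Made-up responses, for illustration only
responses = [
    {"id": "R_001", "wave": 1, "country": "NL"},
    {"id": "R_002", "wave": 2, "country": "NL"},
    {"id": "R_003", "wave": 1, "country": "DE"},
]

by_wave = group_by(responses, "wave")
by_country = group_by(responses, "country")

# N per wave, read straight off the grouped dict
counts = {key: len(members) for key, members in by_wave.items()}
print(counts)
```

Because `.get(key, []) + [item]` builds a new list on every pass, very large inputs may be better served by appending in place, but for a few hundred respondents the difference is unlikely to matter.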
Rosa has a list of filtered respondent dicts, each with a `"treatment_group"` field. Write `group_by_treatment(respondents)` that groups the respondents into a dict keyed by treatment group — `{"control": [...], "treatment_a": [...]}` — using `.get()` to handle new groups safely.