Your cross-tab needs responses grouped by year-in-school: all Juniors together, all Seniors together, all Sophomores together. That's a pivot by category. How do you do it in Excel?
Pivot table — drag year-in-school to rows, drag satisfaction to values, set aggregate to average. Works until the columns change.
find_first_low_score showed you how to loop with conditions. Now instead of stopping, you accumulate. A dict is like your course syllabus: {topic: week}. Here it's {"Junior": [response1, response2], "Senior": [response3]}. .get(key, []) looks up the group, defaulting to an empty list if the group is new:

    responses = [{"year": "Junior", "satisfaction": "4"}, {"year": "Junior", "satisfaction": "3"}]
    groups = {}
    for r in responses:
        key = r["year"]
        groups[key] = groups.get(key, []) + [r]
    print(groups)  # {'Junior': [{...}, {...}]}

So groups.get(key, []) + [r] gets the existing list (or an empty list) and appends to it? That feels like it rebuilds the list every iteration.
Good instinct. Appending in place is more efficient than concatenation. Check whether the key already has a list, create an empty one if it doesn't, then call append(r) on that list:

    def group_by_demographic(responses: list, field: str) -> dict:
        groups = {}
        # Skip rows that are missing the grouping field.
        for r in filter_complete_responses(responses, [field]):
            key = r[field]
            if key not in groups:
                groups[key] = []  # first time this group appears
            groups[key].append(r)
        print(f"Grouped into {len(groups)} buckets by '{field}'")
        return groups

I pass field="year_in_school" today and field="major" tomorrow: same function, different grouping column.
Your pivot table just became a one-line argument change.
And filter_complete_responses runs first so I never group on a missing field. The Week 1 and Week 2 functions are all chaining together.
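To make the one-argument switch concrete, here is a minimal sketch. filter_complete_responses comes from an earlier lesson and isn't shown here, so the version below is a hypothetical stand-in that simply keeps rows where every required field is present and non-empty. The sample rows are made up for illustration, and the print inside the function is left out to keep the output short.

```python
def filter_complete_responses(responses, required_fields):
    # Hypothetical stand-in for the earlier lesson's helper (an assumption):
    # keep only rows where every required field is present and non-empty.
    return [r for r in responses if all(r.get(f) for f in required_fields)]


def group_by_demographic(responses, field):
    groups = {}
    for r in filter_complete_responses(responses, [field]):
        key = r[field]
        if key not in groups:
            groups[key] = []
        groups[key].append(r)
    return groups


# Made-up sample rows; the second one has an empty major.
sample = [
    {"year_in_school": "Junior", "major": "Biology", "satisfaction": "4"},
    {"year_in_school": "Senior", "major": "", "satisfaction": "5"},
    {"year_in_school": "Junior", "major": "History", "satisfaction": "3"},
]

by_year = group_by_demographic(sample, "year_in_school")
by_major = group_by_demographic(sample, "major")

print(sorted(by_year))   # ['Junior', 'Senior']
print(sorted(by_major))  # ['Biology', 'History']
```

The row with the empty major still counts when grouping by year, but it is filtered out before grouping by major, so no empty-string bucket appears.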
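.get(key, default) is the dict equivalent of IFERROR in Excel: it returns the default instead of raising a KeyError when the key doesn't exist yet. If you read groups[key] without checking, for example groups[key].append(r), a new demographic value crashes the script on its first occurrence. The "if key not in groups" check in group_by_demographic guards against the same error.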
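Here is a small sketch of that difference, using an empty dict and a made-up "Sophomore" key. It also shows one detail worth knowing: .get() hands back the default but does not store it in the dict.

```python
groups = {}

# Reading a key that has never been stored raises KeyError.
try:
    groups["Sophomore"].append({"year": "Sophomore"})
    crashed = False
except KeyError:
    crashed = True

# .get() returns the default instead of raising.
fallback = groups.get("Sophomore", [])

print(crashed)   # True
print(fallback)  # []
print(groups)    # {} because .get() does not store the default
```

Because the default is never stored, appending to the list that .get() returns would not add anything to groups for a new key. The grouping function has to assign the empty list to the key before appending.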
A dict maps keys to values. Use it as a grouping structure by accumulating items under a shared key:

    groups = {}
    for r in responses:
        key = r["year"]
        if key not in groups:
            groups[key] = []
        groups[key].append(r)

| Operation | Effect |
|---|---|
| d.get(k, default) | Return value or default (no KeyError) |
| d.keys() | All keys |
| d.values() | All values |
| d.items() | All (key, value) pairs |
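Once the responses are grouped, d.items() is the natural way to walk the result. The sketch below assumes the goal is an average satisfaction per group. It uses made-up rows and converts the satisfaction strings to int, since the sample data stores scores as text.

```python
# Made-up grouped data in the shape the grouping loop produces.
groups = {
    "Junior": [
        {"year": "Junior", "satisfaction": "4"},
        {"year": "Junior", "satisfaction": "3"},
    ],
    "Senior": [
        {"year": "Senior", "satisfaction": "5"},
    ],
}

averages = {}
for year, rows in groups.items():
    # Scores are stored as strings, so convert before doing arithmetic.
    scores = [int(r["satisfaction"]) for r in rows]
    averages[year] = sum(scores) / len(scores)

print(averages)  # {'Junior': 3.5, 'Senior': 5.0}
```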
Avoid groups[key] = groups.get(key, []) + [r]. List concatenation builds a new list on every iteration, which costs O(n) in the size of the group so far. append() mutates the list in place in amortized O(1). Prefer append.
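You can see the "new list" behavior directly by checking object identity with "is". This is only an illustrative sketch with made-up values, not a benchmark.

```python
first = []
groups = {"Junior": first}

# Concatenation builds a brand-new list and rebinds the key to it.
groups["Junior"] = groups.get("Junior", []) + [{"satisfaction": "4"}]
rebuilt = groups["Junior"] is not first

# append() grows the same list object in place.
before = groups["Junior"]
groups["Junior"].append({"satisfaction": "3"})
same_object = groups["Junior"] is before

print(rebuilt)      # True
print(same_object)  # True
print(len(first))   # 0, the original list was never touched
```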
Noor's thesis data has 500 survey responses and she needs to group them by year-in-school to compute per-group averages. Write `group_by_demographic(responses, field)` that iterates the responses and accumulates them into a dict keyed by the value of `field`. For example, `group_by_demographic(responses, 'year_in_school')` should return `{'Junior': [...], 'Senior': [...], ...}`.