The SurveyMonkey export arrived. The treatment group column has " Control ", "Treatment_A", and "TREATMENT_B" in different rows. What happens when you run your analysis on that?
format_respondent from Day 3 would print them as-is. And if I try to group by treatment, they'd all land in separate buckets because of the different casing and whitespace.
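A minimal sketch of the "separate buckets" problem, assuming a handful of hypothetical label values shaped like the ones in the export. `collections.Counter` stands in for a group-by here:

```python
from collections import Counter

# Hypothetical raw values, shaped like the messy treatment-group column
raw_labels = [" Control ", "control", "Treatment_A", "treatment_a ", "TREATMENT_B"]

# Grouping on the raw strings: every spelling variant gets its own bucket
raw_counts = Counter(raw_labels)
print(len(raw_counts))  # 5 keys for what are really 3 groups

# Stripping and lowercasing first collapses the variants into real groups
cleaned_counts = Counter(label.strip().lower() for label in raw_labels)
print(cleaned_counts)  # Counter({'control': 2, 'treatment_a': 2, 'treatment_b': 1})
```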
Exactly the problem. Three steps fix it: strip() removes leading and trailing whitespace, lower() makes everything lowercase, and replace(" ", "_") standardises separators. Chained left to right:
```python
raw = " Control "
clean = raw.strip().lower().replace(" ", "_")
# clean = "control"
```
Do all string methods return new strings? Or do they change the original?
String methods never change the original, because Python strings are immutable. The ones that transform text, such as strip(), lower(), and replace(), return a new string. raw.strip() never changes raw; it hands back a cleaned copy. That's why chaining works: each method takes the previous method's output as its input, and the original raw is untouched.
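To make the chaining concrete, here is the same pipeline unrolled into named intermediate steps (the `step` names are just for illustration). It also shows that `raw` is unchanged afterwards and that strings reject in-place edits:

```python
raw = " Control "

# Each call returns a new string; none of them modifies `raw`
step1 = raw.strip()              # "Control"
step2 = step1.lower()            # "control"
step3 = step2.replace(" ", "_")  # "control" (no interior spaces left to replace)

# The chained form is the same three calls, each fed the previous result
clean = raw.strip().lower().replace(" ", "_")

print(repr(raw))    # ' Control '  (the original is untouched)
print(repr(clean))  # 'control'

# Strings cannot be edited in place: item assignment raises TypeError
try:
    raw[0] = "c"
except TypeError as err:
    print("TypeError:", err)
```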
So I can call clean_group_label inside format_respondent to get a normalised label before formatting?
That's exactly the Week 2 pipeline taking shape. Clean first, format second:
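```python
def clean_group_label(raw: str) -> str:
    # Trim edge whitespace, lowercase, then turn interior spaces into underscores
    result = raw.strip().lower().replace(" ", "_")
    print(f"Cleaned: {result}")  # echo the cleaned label so you can eyeball it
    return result
```
Three methods and the entire codebook normalisation problem is solved. That would have taken me a macro in SPSS.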
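A sketch of "clean first, format second". The real Day 3 `format_respondent` isn't shown in this section, so the version below is a hypothetical stand-in with an assumed `(respondent_id, group)` signature; the debug `print` is left out of `clean_group_label` to keep the output quiet:

```python
def clean_group_label(raw: str) -> str:
    """Strip edge whitespace, lowercase, and replace spaces with underscores."""
    return raw.strip().lower().replace(" ", "_")


def format_respondent(respondent_id: int, group: str) -> str:
    # Hypothetical stand-in for the Day 3 function; signature is assumed
    label = clean_group_label(group)               # clean first...
    return f"Respondent {respondent_id}: {label}"  # ...format second


print(format_respondent(17, " Control "))    # Respondent 17: control
print(format_respondent(42, "TREATMENT B"))  # Respondent 42: treatment_b
```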
The silent trap: replace(" ", "_") only targets the ordinary space character, not tabs or non-breaking spaces. strip() removes those at the edges, but interior oddities survive. For production pipelines, add a re.sub pass; for typical SurveyMonkey exports, though, strip().lower().replace() is usually enough.
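One way that re.sub pass might look, as a sketch: `clean_group_label_robust` is a hypothetical name, and treating any whitespace run as a single separator is a design choice, not the lesson's prescribed method. In a `str` pattern, `\s` matches Unicode whitespace, so tabs and non-breaking spaces (U+00A0) are caught along with ordinary spaces:

```python
import re


def clean_group_label_robust(raw: str) -> str:
    """Hypothetical variant: any run of whitespace becomes one underscore."""
    # strip() with no arguments removes all Unicode whitespace at the edges,
    # including tabs and non-breaking spaces
    trimmed = raw.strip().lower()
    # \s+ matches interior tabs, non-breaking spaces, and runs of spaces
    return re.sub(r"\s+", "_", trimmed)


tabbed = "Treatment\tB"
nbsp = "Treatment\u00a0B "

# The plain chain leaves the interior tab in place
print(repr(tabbed.strip().lower().replace(" ", "_")))  # 'treatment\tb'

print(clean_group_label_robust(tabbed))  # treatment_b
print(clean_group_label_robust(nbsp))    # treatment_b
```

Note the trade-off: `\s+` collapses `"control  group"` (two spaces) into `"control_group"`, whereas `replace(" ", "_")` would give `"control__group"`.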
Three chained methods, one clean label.
| Method | What it does | Example |
|---|---|---|
| `.strip()` | removes leading and trailing whitespace | `" control "` → `"control"` |
| `.lower()` | lowercases all characters | `"Control"` → `"control"` |
| `.replace(a, b)` | replaces every occurrence of `a` with `b` | `"control group"` → `"control_group"` |
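Order matters between strip() and replace(): strip() has to run first. Reverse them and " Control " becomes "_control_", because the edge spaces are turned into underscores before strip() gets a chance to remove them. Where lower() sits makes no difference here, since replace(" ", "_") only targets spaces and ignores letter case; lowering first would matter only if the replace target contained letters.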
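To see which orderings actually change the output, here is a quick comparison on a hypothetical label that has both edge and interior spaces:

```python
raw = " Control Group "

# Recommended order: trim edges, lowercase, then replace interior spaces
strip_first = raw.strip().lower().replace(" ", "_")

# Replacing before stripping turns the edge spaces into underscores,
# which strip() then leaves alone
replace_first = raw.lower().replace(" ", "_").strip()

# Moving lower() to the end gives the same result, because
# replace(" ", "_") never looks at letter case
lower_last = raw.strip().replace(" ", "_").lower()

print(strip_first)    # control_group
print(replace_first)  # _control_group_
print(lower_last)     # control_group
```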
Strings never change in place. raw.strip() returns a new string — raw is always the original.
Elias is ingesting a SurveyMonkey export where the treatment_group column has inconsistent casing and whitespace — `" Control "`, `"Treatment_A "`, `"TREATMENT B"`. Write `clean_group_label(raw)` that returns the normalised label: stripped, lowercased, and with spaces replaced by underscores.