Wave 4 just arrived. Rows 12, 47, and 193 have blank outcome fields — `float("")` raises `ValueError`. Row 88 has no outcome key at all — bracket access raises `KeyError`. What does your pipeline do?
`rank_groups_by_outcome` from Day 26 chains through `treatment_summary`, which computes `mean(outcomes)`. If even one row causes a crash, the entire analysis fails.
`try/except` catches the crash and returns a safe default instead. Two exception types: `KeyError` for missing dict keys, `ValueError` for bad type conversions. Catch both in one `except` clause:
```python
def safe_compute_outcome(respondent: dict) -> float:
    try:
        return float(respondent["outcome"])
    except (KeyError, ValueError):
        return 0.0
```

Does `return 0.0` distort the mean? If a respondent's outcome is actually 0.0 and a respondent with a missing outcome also gets 0.0, the pipeline treats them the same.
Correct — and that's the methodological decision you document in the methods section. Common alternatives: return None and filter None values out before computing the mean, or use a sentinel like -1.0 that's outside the valid range. The choice depends on your pre-registration. The try/except structure handles it the same way regardless:
```python
def safe_compute_outcome(respondent: dict) -> float:
    try:
        value = respondent["outcome"]
        result = float(value)
        print(f"Outcome for {respondent.get('id', 'unknown')}: {result}")
        return result
    except (KeyError, ValueError):
        return 0.0
```

So I replace the raw `r["outcome"]` calls with `safe_compute_outcome(r)` throughout the pipeline and it runs cleanly on any wave dataset, even messy ones.
The cleaning protocol from Day 4, but for numeric fields. String methods cleaned text; try/except cleans numeric extraction.
I didn't realise how much of data analysis is just defensive programming.
Most of it is. The analysis is ten lines. The defensive layer that ensures the analysis runs on real data is twenty. Document your data-quality decisions — what you flagged, what you imputed, what you dropped — as transparently as the analysis itself.
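One lightweight way to make those data-quality decisions auditable is to tally what was imputed and why. This is a sketch, not part of the lesson's pipeline; the counter and function names are hypothetical:

```python
from collections import Counter

quality_log = Counter()

def safe_compute_outcome_logged(respondent: dict) -> float:
    """Like safe_compute_outcome, but records what was imputed and why."""
    try:
        result = float(respondent["outcome"])
        quality_log["valid"] += 1
        return result
    except KeyError:
        quality_log["imputed_missing_key"] += 1
        return 0.0
    except ValueError:
        quality_log["imputed_bad_value"] += 1
        return 0.0

for row in [{"outcome": "3.0"}, {"outcome": ""}, {}]:
    safe_compute_outcome_logged(row)

# These tallies go straight into the methods section.
print(dict(quality_log))
```

Separate `except` clauses make the log distinguish missing keys from bad values, which is exactly the flagged/imputed/dropped breakdown the methods section needs.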
```python
try:
    result = risky_operation()
except (KeyError, ValueError):
    result = safe_default
```

| Exception | When it fires |
|---|---|
| `KeyError` | dict access with missing key: `d["missing_key"]` |
| `ValueError` | bad type conversion: `float("")`, `int("abc")` |
| `TypeError` | wrong type: `float(None)` |

- `0.0`: imputes zero — keeps the full N but may distort the mean
- `None` + filter: excludes the respondent — reduces N honestly

Document the choice in the methods section.
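Note that `safe_compute_outcome` as written catches only `KeyError` and `ValueError`, so a row like `{"outcome": None}` would still crash, because `float(None)` raises `TypeError`. Extending the `except` tuple closes that gap — a sketch:

```python
def safe_compute_outcome(respondent: dict) -> float:
    """Impute 0.0 for missing keys, blank strings, and None values alike."""
    try:
        return float(respondent["outcome"])
    except (KeyError, ValueError, TypeError):
        return 0.0

print(safe_compute_outcome({"outcome": None}))   # 0.0 instead of a TypeError crash
print(safe_compute_outcome({"outcome": "2.5"}))  # 2.5
```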
Rosa's wave-4 dataset has rows with missing `outcome` keys and rows with blank outcome values. Write `safe_compute_outcome(respondent)` that wraps the `float(respondent['outcome'])` extraction in a `try/except (KeyError, ValueError)` block and returns `0.0` on any error.