top_campaigns_by_cpl handed you a clean list of dicts. But what happens when someone Slacks you the raw data instead — copied straight out of Excel, half-JSON, half-junk?
It happened this morning. Someone pasted something like "name": "Email Blast", "spend": 4200 three times in a row with no outer brackets. I copy-pasted it into Excel for twenty minutes before giving up.
That's exactly the case where re.findall earns its keep. The re module lets you describe a pattern instead of a fixed string. re.findall(pattern, text) scans the whole text and returns every substring that matches — as a list. Here's a minimal example:
import re
text = '"name": "Email Blast", "name": "Paid Search"'
names = re.findall(r'"name":\s*"([^"]+)"', text)
print(names)  # ['Email Blast', 'Paid Search']
I see the pattern but I have no idea how to read it. What does [^"]+ mean versus just .+?
Great catch, that's the key distinction. The dot matches any character except a newline, and + is greedy: .+ grabs as many characters as possible, including other " marks, so it can eat across two fields and return garbage. [^"]+ means "one or more characters, none of which is a "", so it stops the moment it hits the closing quote. For strings delimited by ", [^"]+ is almost always the right choice.
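To see the difference concretely, here is a small sketch that runs both patterns over the same sample text from the example above. The variable names greedy and bounded are just labels for this illustration.

```python
import re

text = '"name": "Email Blast", "name": "Paid Search"'

# Greedy .+ runs on to the last closing quote it can find on the line.
greedy = re.findall(r'"name":\s*"(.+)"', text)

# [^"]+ cannot cross a quote, so each match ends at its own closing quote.
bounded = re.findall(r'"name":\s*"([^"]+)"', text)

print(greedy)   # ['Email Blast", "name": "Paid Search']
print(bounded)  # ['Email Blast', 'Paid Search']
```

The greedy version returns one long string that swallows the second field, which is the "garbage" described above.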
So the parentheses around [^"]+ are what makes re.findall return just the name, not the whole "name": "Email Blast" match?
Exactly. Parentheses create a capturing group. When your pattern has one group, re.findall returns only the contents of that group — not the full match. No group means you get the full match; one group means you get the captured substring.
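A quick sketch of how the group count changes what re.findall hands back, using the sample line from the Slack paste. The two-group pattern is an extra case added here for illustration: with more than one group, each item becomes a tuple.

```python
import re

text = '"name": "Email Blast", "spend": 4200'

# No group: each item is the full matched text.
full = re.findall(r'"name":\s*"[^"]+"', text)

# One group: each item is just the captured part.
one = re.findall(r'"name":\s*"([^"]+)"', text)

# Two groups: each item is a tuple with one entry per group.
two = re.findall(r'"name":\s*"([^"]+)",\s*"spend":\s*(\d+)', text)

print(full)  # ['"name": "Email Blast"']
print(one)   # ['Email Blast']
print(two)   # [('Email Blast', '4200')]
```

Note that the captured spend is still a string; converting it to a number would be a separate step.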
import re
def extract_campaign_names(raw_text: str) -> list:
    # Capture the text between the quotes that follow each "name": key.
    names = re.findall(r'"name":\s*"([^"]+)"', raw_text)
    # clean_campaign_name is the existing helper written earlier; it is not defined here.
    result = [clean_campaign_name(n.strip()) for n in names]
    print(f"Found {len(result)} campaigns: {result}")
    return result
It calls clean_campaign_name on every match, so the Slack paste goes straight into the same pipeline as a proper HubSpot export. One function, all sources.
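To try the function end to end, here is a minimal runnable sketch. The clean_campaign_name below is only a hypothetical stand-in that collapses extra whitespace, since the real helper isn't shown in this lesson. The second and third pasted rows are made-up sample values.

```python
import re

def clean_campaign_name(name: str) -> str:
    # Hypothetical stand-in for the earlier helper: collapse runs of whitespace.
    return " ".join(name.split())

def extract_campaign_names(raw_text: str) -> list:
    names = re.findall(r'"name":\s*"([^"]+)"', raw_text)
    return [clean_campaign_name(n.strip()) for n in names]

# Three pasted lines with no outer brackets, as in the Slack message.
raw_paste = '''"name": "Email Blast", "spend": 4200
"name": "Paid  Search", "spend": 3100
"name": " Webinar ", "spend": 1500'''

print(extract_campaign_names(raw_paste))
# ['Email Blast', 'Paid Search', 'Webinar']
```

Swapping the stand-in for the real clean_campaign_name should leave extract_campaign_names unchanged, which is the point of routing every source through one helper.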
That's reuse doing its job. The messy paste doesn't know it's being cleaned by a function you wrote three weeks ago. One last thing to file away: re.search returns the first match object or None — useful when you just need to check whether a pattern exists. re.findall returns all matches as a list — right for bulk extraction. They're not interchangeable.
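Side by side, the two calls look like this. It is a small sketch on the same sample text; the string "no names" is just a placeholder input with nothing to match.

```python
import re

text = '"name": "Email Blast", "name": "Paid Search"'
pattern = r'"name":\s*"([^"]+)"'

first = re.search(pattern, text)
if first:                      # re.search gives None when nothing matches
    print(first.group(1))      # Email Blast

print(re.findall(pattern, text))         # ['Email Blast', 'Paid Search']
print(re.search(pattern, "no names"))    # None
print(re.findall(pattern, "no names"))   # []
```

Because re.search can return None, check the result before calling .group() on it. re.findall needs no such guard, since an empty list is safe to loop over.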
| Tool | Returns | When to use |
|---|---|---|
| re.search(pattern, text) | First match object or None | Existence check or extracting one value |
| re.findall(pattern, text) | List of all matches (strings) | Extracting every occurrence |
Wrap the part you want in (). With one group, re.findall returns the group's content, not the full match.
[^"]+ vs .+
- .+ is greedy: it matches any character, including ", so it can over-consume
- [^"]+ matches any character except ", so it stops at the closing quote
\s* between colon and quote
Matches zero or more whitespace characters, so it handles both "name":"X" and "name": "X".
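A short check of that last point, as a sketch. The third sample, with a tab and extra spaces, is an added case to show that \s* covers any run of whitespace.

```python
import re

pattern = r'"name":\s*"([^"]+)"'

# No space, one space, and a tab plus spaces after the colon.
samples = ['"name":"X"', '"name": "X"', '"name":\t  "X"']
for s in samples:
    print(re.findall(pattern, s))  # ['X'] each time
```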
Kenji receives a Slack message with campaign data pasted directly from Excel — not valid JSON, just repeated lines like `"name": "Email Blast", "spend": 4200`. Write `extract_campaign_names(raw_text)` that uses `re.findall(r'"name":\s*"([^"]+)"', raw_text)` to pull every campaign name from the text, then applies `clean_campaign_name` to each result and returns the cleaned list.