Sometimes client data arrives as raw text — a pasted Notion export, a Slack message, a raw JSON blob. Not clean CSV, not a dict. You need to pull the name values out with pattern matching.
re.findall from the skeleton? I have seen regex but never written it.
One pattern covers the "name": "ClientName" shape that appears in any JSON-like text: r'"name":\s*"([^"]+)"'. The parentheses capture the name value. re.findall returns a list of all matches:
import re
raw = '{"name": "Acme Corp", "rate": 120} {"name": "Brand_Agency", "rate": 80}'
names = re.findall(r'"name":\s*"([^"]+)"', raw)
print(names)  # ['Acme Corp', 'Brand_Agency']
What does [^"] mean? The bracket syntax looks like a list but it is inside a string.
In regex, [^"]+ means one or more characters that are NOT a double-quote. It matches everything inside the name string up to the closing quote. \s* matches zero or more whitespace characters between the colon and the opening quote. The outer parentheses create a capture group — findall returns what is inside them.
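To see each piece of the pattern at work, here is a small sketch. The sample strings are made up for illustration; only the pattern itself comes from the lesson.

```python
import re

PATTERN = r'"name":\s*"([^"]+)"'

# \s* tolerates zero, one, or many whitespace characters after the colon.
samples = [
    '"name":"Acme Corp"',
    '"name": "Acme Corp"',
    '"name":\t  "Acme Corp"',
]
results = [re.findall(PATTERN, s) for s in samples]
print(results)  # [['Acme Corp'], ['Acme Corp'], ['Acme Corp']]

# [^"]+ needs at least one character, so an empty value is skipped.
empty_value = re.findall(PATTERN, '"name": ""')
print(empty_value)  # []

# Without the parentheses, findall returns the whole match instead.
whole = re.findall(r'"name":\s*"[^"]+"', '"name": "Acme Corp"')
print(whole)  # ['"name": "Acme Corp"']
```

The last call is the reason the capture group matters: it is what strips the key and the quotes from each result.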
And I can pipe each extracted name through clean_client_name from Week 1 to normalise the casing and underscores. The full pipeline in two calls.
There is the Week 4 payoff — four weeks of functions all composing in one expression:
import re
def extract_client_names(raw_text: str) -> list:
    names = re.findall(r'"name":\s*"([^"]+)"', raw_text)
    cleaned = [n.strip() for n in names]
    print(f"Extracted {len(cleaned)} names")
    return cleaned
My copy-paste-from-Notion workflow just became a regex call.
One limitation: this pattern only matches the "name": key. If the export uses "client": or "company": instead, you need a different pattern — or a proper JSON parser. Regex is the right tool for unstructured text; json.loads is the right tool for valid JSON.
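Both routes can be sketched side by side. The alternation pattern below is one possible way to accept the other keys, not the only one, and the sample strings are invented for the example.

```python
import json
import re

# Assumed alternative keys, taken from the limitation described above.
# (?:...) groups the alternatives without adding a second capture group.
KEY_PATTERN = r'"(?:name|client|company)":\s*"([^"]+)"'

# Broken JSON (note the missing closing brace), so regex is the fallback.
messy = 'pasted: {"client": "Acme Corp"} and also {"company": "Bolt Media"'
from_regex = re.findall(KEY_PATTERN, messy)
print(from_regex)  # ['Acme Corp', 'Bolt Media']

# When the text is valid JSON, parse it instead of pattern matching.
valid = '[{"name": "Acme Corp", "rate": 120}, {"name": "Brand_Agency", "rate": 80}]'
from_json = [record["name"] for record in json.loads(valid)]
print(from_json)  # ['Acme Corp', 'Brand_Agency']
```

The non-capturing group keeps findall returning plain strings; a second capturing group would make it return tuples.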
re.findall for Structured Extraction
re.findall(pattern, string) returns a list of all non-overlapping matches:
import re
names = re.findall(r'"name":\s*"([^"]+)"', raw_text)
# ['Acme Corp', 'Bolt Media', 'Cura Health']
Pattern breakdown: "name": matches the literal key; \s* allows optional whitespace after the colon; "([^"]+)" captures the quoted value.
Post-process each match: call clean_client_name(n) on each extracted name to normalise casing:
return [clean_client_name(n) for n in names]
re.findall returns an empty list (not an error) when no matches are found.
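Put together, the post-processing step and the empty-list behaviour might look like the sketch below. The clean_client_name defined here is a hypothetical stand-in, since the Week 1 version is not shown in this lesson and may behave differently; extract_and_clean is likewise a name chosen just for this example.

```python
import re

def clean_client_name(name: str) -> str:
    # Hypothetical stand-in for the Week 1 helper: the real one may differ.
    return name.strip().replace("_", " ").title()

def extract_and_clean(raw_text: str) -> list:
    # Pull out every quoted value that follows a "name": key.
    names = re.findall(r'"name":\s*"([^"]+)"', raw_text)
    # An empty list of matches simply produces an empty result.
    return [clean_client_name(n) for n in names]

print(extract_and_clean('{"name": "brand_agency"} {"name": " acme corp "}'))
# ['Brand Agency', 'Acme Corp']
print(extract_and_clean("no client fields here"))  # []
```

Because findall never raises on a miss, the function needs no special case for text that contains no name fields.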
Sage pastes client data from Notion and Slack into a script as raw text. The data always has `"name": "ClientName"` fields but is not valid JSON. Write `extract_client_names(raw_text)` that uses `re.findall` to extract all name values and returns them as a stripped list of strings.