A parser that crashes on the first malformed line is worse than one that keeps going and tells you what was bad. How do you validate a batch of log lines without raising?
Compile a regex for the expected shape once, then call .match() on every line. Separate the valid and invalid lines into two collections so the caller can both see the count and drill into what failed.
Exactly. re.compile gives you a reusable pattern object with a .match method. Compiling once outside the loop beats recompiling on every call:
import re
time_pattern = re.compile(r"^\d{2}:\d{2}:\d{2}$")The ^ and $ anchors force the whole string to match, not just contain a match.
And the validation rule — what qualifies as valid?
At least three space-separated parts and parts[1] matches the HH:MM:SS pattern. Two checks chained with and. Then bucket each line into valid-count or invalid-list:
valid = 0
invalid = []
for line in lines:
parts = line.split()
if len(parts) >= 3 and time_pattern.match(parts[1]):
valid += 1
else:
invalid.append(line)
return [valid, invalid]Why return a list [valid, invalid] and not a tuple or dict?
Lists serialise to JSON cleanly. A tuple serialises as a list anyway, but being explicit about the shape is good hygiene. Unpacking still works: valid, invalid = validate_log_format(lines) — Python destructures any sequence.
So the function returns both the metric and the evidence. The count answers "how clean is my input" and the list answers "which rows do I need to fix."
That's the shape of a useful validation function. A boolean just says yes or no; structured output gives the caller both a dial and a trace.
re.compile + .matchTL;DR: compile once outside loops, anchor with ^ and $, return structured results.
re.compile(pattern) — returns a reusable Pattern object.match(s) — checks from position 0; returns Match or None^...$ — anchors force full-string matchand — short-circuits when the left side fails| Return | Signal |
|---|---|
bool | yes/no only |
[count, invalids] | metric + evidence |
{"valid": n, "bad": [...]} | labelled fields |
Structured results let callers both measure and remediate in one pass.
A parser that crashes on the first malformed line is worse than one that keeps going and tells you what was bad. How do you validate a batch of log lines without raising?
Compile a regex for the expected shape once, then call .match() on every line. Separate the valid and invalid lines into two collections so the caller can both see the count and drill into what failed.
Exactly. re.compile gives you a reusable pattern object with a .match method. Compiling once outside the loop beats recompiling on every call:
import re
time_pattern = re.compile(r"^\d{2}:\d{2}:\d{2}$")The ^ and $ anchors force the whole string to match, not just contain a match.
And the validation rule — what qualifies as valid?
At least three space-separated parts and parts[1] matches the HH:MM:SS pattern. Two checks chained with and. Then bucket each line into valid-count or invalid-list:
valid = 0
invalid = []
for line in lines:
parts = line.split()
if len(parts) >= 3 and time_pattern.match(parts[1]):
valid += 1
else:
invalid.append(line)
return [valid, invalid]Why return a list [valid, invalid] and not a tuple or dict?
Lists serialise to JSON cleanly. A tuple serialises as a list anyway, but being explicit about the shape is good hygiene. Unpacking still works: valid, invalid = validate_log_format(lines) — Python destructures any sequence.
So the function returns both the metric and the evidence. The count answers "how clean is my input" and the list answers "which rows do I need to fix."
That's the shape of a useful validation function. A boolean just says yes or no; structured output gives the caller both a dial and a trace.
re.compile + .matchTL;DR: compile once outside loops, anchor with ^ and $, return structured results.
re.compile(pattern) — returns a reusable Pattern object.match(s) — checks from position 0; returns Match or None^...$ — anchors force full-string matchand — short-circuits when the left side fails| Return | Signal |
|---|---|
bool | yes/no only |
[count, invalids] | metric + evidence |
{"valid": n, "bad": [...]} | labelled fields |
Structured results let callers both measure and remediate in one pass.
Write `validate_log_format(lines)` that returns `[valid_count, invalid_lines]`. A valid line has at least 3 space-separated parts and `parts[1]` matches `HH:MM:SS`. Use `re.compile` once, then `.match` per line.
Tap each step for scaffolded hints.
No blank-editor panic.