You have a giant string of log text — maybe a hundred megabytes, maybe ten gigs piped from tail. Calling .splitlines() builds the entire list in memory. How do you walk it one line at a time instead?
A generator again. Yield each line as the caller consumes it, and skip the empty ones, since blank lines crash the parser downstream.
Right. The body of the generator looks like a filter:
```python
def stream_lines(text):
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            yield stripped
```

`.splitlines()` still builds a full list inside the generator, so this isn't truly constant-memory for one giant string, but the caller only ever holds the current cleaned line, and swapping the string for a file object iterated line by line makes the whole chain lazy. Empty lines and whitespace-only lines are skipped at the source.
Why `yield stripped` instead of `yield line`? Does it matter if there's trailing whitespace?
It matters downstream. A line with a trailing `\r` or space can poison a `split` or `re.search`. Stripping at the streaming boundary means every consumer gets clean input — one cleanup point, not N.
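A quick sketch of the poisoning, using a hypothetical log line with a trailing carriage return (the exact line is illustrative, not from the original text):

```python
# A line read from a Windows-produced log can carry a trailing \r.
raw = "ERROR 2024-01-01 disk full\r"

# Without stripping, the \r travels into the last field of a split.
fields = raw.split(" ")
assert fields[-1] == "full\r"   # not the token you wanted

# Stripping at the boundary fixes every downstream consumer at once.
clean = raw.strip()
assert clean.split(" ")[-1] == "full"
```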
Where does `list(...)` come in? If I want to count the lines, do I drain the generator?
Only when you need the full collection — `total = len(list(stream_lines(text)))` materialises it. Most real code iterates directly:
```python
for line in stream_lines(text):
    process(line)
```

One allocation per line. The memory profile stays flat regardless of input size.
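If you only need the count, you can drain the generator without keeping the lines at all; a sketch (the sample text is illustrative):

```python
def stream_lines(text):
    """Yield non-empty, stripped lines lazily."""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            yield stripped

text = "a\n\n  \nb\nc\n"

# len(list(...)) materialises every line just to count them;
# sum(...) consumes the generator one item at a time instead.
total = sum(1 for _ in stream_lines(text))
assert total == 3
```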
So the streaming shape is the foundation for any real log tool — reading lazily, cleaning at the source, passing clean lines downstream.
Every serious data-processing CLI in Python is a chain of generators. Master this one pattern and you can build pipelines that don't care whether the input is 10 lines or 10 million.
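A minimal sketch of such a chain; the stage names and the `ERROR`/timestamp filters are hypothetical, chosen only to show generators composing:

```python
import re

def stream_lines(text):
    """Yield non-empty, stripped lines lazily."""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            yield stripped

def only_errors(lines):
    """Pass through only lines containing ERROR."""
    for line in lines:
        if "ERROR" in line:
            yield line

def timestamps(lines):
    """Extract a [HH:MM] timestamp from each line that has one."""
    for line in lines:
        m = re.match(r"\[(\d{2}:\d{2})\]", line)
        if m:
            yield m.group(1)

log = "[09:01] INFO boot\n[09:02] ERROR disk\n\n[09:03] ERROR net\n"

# Each stage pulls one line at a time from the stage before it;
# nothing is materialised until the final list(...).
result = list(timestamps(only_errors(stream_lines(log))))
assert result == ["09:02", "09:03"]
```

Each stage is oblivious to the others — it just iterates whatever iterable it is handed, which is why the same pipeline works on 10 lines or 10 million.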
TL;DR: `yield` lets the caller iterate lazily — memory stays flat regardless of input size.
- `text.splitlines()` — splits on `\n`, `\r\n`, `\r`
- `line.strip()` — removes leading/trailing whitespace
- `if stripped:` — empty string is falsy, filters blanks

| Shape | Memory | When |
|---|---|---|
| `[l for l in ...]` | O(n) | small input |
| `(l for l in ...)` | O(1) | streaming |
| generator `def` | O(1) | complex filters |
For testability, wrap a call in `list(...)` when the whole result is small.
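The three shapes in the table do the same filtering; a small comparison, with an illustrative sample string (note the O(1) claim covers only the generator's own buffering — `splitlines()` itself still returns a list):

```python
text = "alpha\n\nbeta\n  \ngamma\n"

# List comprehension: builds the whole result up front — O(n) memory.
as_list = [l.strip() for l in text.splitlines() if l.strip()]

# Generator expression: same filter, yields one line at a time.
as_gen = (l.strip() for l in text.splitlines() if l.strip())

assert as_list == ["alpha", "beta", "gamma"]
assert list(as_gen) == ["alpha", "beta", "gamma"]
```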
Write `stream_lines(text)` that returns a list of non-empty, stripped lines from the input string. Use a generator with `yield` internally and drain it with `list(...)` before returning. Empty or whitespace-only lines must be skipped.