Foundations covered token-based pagination across two pages. At scale you might page through 50, 100, or more. That brings two new concerns: a hard cap so a runaway loop doesn't drain your quota, and an early exit once you've found what you need.
```python
MAX_PAGES = 20
MAX_ITEMS = 500

all_items = []
page_token = None
for _ in range(MAX_PAGES):
    args = {"max_results": 25}
    if page_token:
        args["page_token"] = page_token
    result = toolset.execute_action(Action.GMAIL_FETCH_EMAILS, args)
    page = result.get("messages", [])
    all_items.extend(page)
    page_token = result.get("nextPageToken")
    if not page_token:
        break  # no more pages
    if len(all_items) >= MAX_ITEMS:
        break  # collected enough
else:
    print(f"warning: hit MAX_PAGES={MAX_PAGES} cap — there may be more")

print(f"collected {len(all_items)} items across pages")
```

Three exit conditions: no more pages (clean), enough items collected (early), and the max-page cap (warning).
Why both MAX_PAGES and MAX_ITEMS?
They catch different bugs. MAX_PAGES protects against an API that keeps returning a nextPageToken that never empties (a bug or pathological data). MAX_ITEMS is your intent: you only want 500. If the page size is 25, you'd hit 500 at page 20, so both caps fire near the same place, by design.
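A quick worked check of that arithmetic, using the numbers from the example above:

```python
import math

PAGE_SIZE = 25
MAX_ITEMS = 500
# The item cap lands on page ceil(500 / 25) = 20, the same page as
# MAX_PAGES, so both caps fire together.
print(math.ceil(MAX_ITEMS / PAGE_SIZE))  # 20
```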
And the for/else warning — that's the pattern from retry?
Same Python feature. The else runs only if no break fired. If all 20 pages were consumed without hitting a clean end, the warning prints. In production scripts that warning would alert (day 18); in lessons we just print it.
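Here's the feature in isolation, a minimal sketch with no API involved:

```python
# for/else: the else body runs only if the loop finished without a break.
for n in [1, 3, 5]:
    if n % 2 == 0:
        print("found an even number:", n)
        break
else:
    print("no even number found")  # no break fired, so this prints
```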
Generalized into a reusable template, with `SOME_FETCH` and `PAGE_SIZE` standing in for your API call and page size:

```python
MAX_PAGES = 20
MAX_ITEMS = 500

all_items = []
page_token = None
for _ in range(MAX_PAGES):
    args = {"max_results": PAGE_SIZE}
    if page_token:
        args["page_token"] = page_token
    result = SOME_FETCH(args)
    page = result.get("items", [])
    all_items.extend(page)
    page_token = result.get("nextPageToken")
    if not page_token:
        break
    if len(all_items) >= MAX_ITEMS:
        break
else:
    log("warn", reason="hit max_pages", max_pages=MAX_PAGES)
```

| Cap | Catches |
|---|---|
| MAX_PAGES (20-100) | runaway pagination bug |
| MAX_ITEMS (your intent) | over-collection beyond what you'll use |
| nextPageToken empty | clean end of input |
All three exit the loop. The for/else fires only when MAX_PAGES is the reason — log it loudly.
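If you don't have a structured `log` helper, the stdlib logging module is a reasonable stand-in (a sketch, with message wording of our choosing):

```python
import logging

logging.warning("hit MAX_PAGES=%d cap; collected %d items, there may be more",
                MAX_PAGES, len(all_items))
```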
```python
PAGE_SIZE = 25  # most APIs allow up to ~100
```

Larger page = fewer round trips, more memory per page, more bandwidth per call. Smaller page = more flexibility to early-exit, more round trips.
Default: pick a page size such that 1-2 pages cover a typical run. Optimize when you see actual numbers.
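A sketch of that sizing rule; `TYPICAL_RUN_ITEMS` and `API_MAX_PAGE_SIZE` are made-up numbers you'd replace with what you observe:

```python
import math

TYPICAL_RUN_ITEMS = 40    # hypothetical: measure your real runs
API_MAX_PAGE_SIZE = 100   # hypothetical: check your API's documented cap
# Size pages so a typical run fits in about 2 pages, clamped to the API max.
PAGE_SIZE = min(API_MAX_PAGE_SIZE, math.ceil(TYPICAL_RUN_ITEMS / 2))
print(PAGE_SIZE)  # 20
```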
If the goal is to find an item, not collect all:
```python
target_id = "abc-123"
found = None
for _ in range(MAX_PAGES):
    ...  # fetch the page and nextPageToken, as above
    for item in page:
        if item["id"] == target_id:
            found = item
            break  # exits inner loop
    if found:
        break  # exits outer loop
    if not page_token:
        break
```

Double-break to exit both loops. For deeper nesting, a `def find_item(...): return ...` makes the early exit cleaner.
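A sketch of that function version, reusing the `SOME_FETCH` and `PAGE_SIZE` placeholders from the template above; `return` exits every loop level at once, so no flag variable is needed:

```python
def find_item(target_id, max_pages=MAX_PAGES):
    page_token = None
    for _ in range(max_pages):
        args = {"max_results": PAGE_SIZE}
        if page_token:
            args["page_token"] = page_token
        result = SOME_FETCH(args)
        for item in result.get("items", []):
            if item["id"] == target_id:
                return item  # one return exits both loops
        page_token = result.get("nextPageToken")
        if not page_token:
            return None  # clean end: target not found
    return None  # hit the page cap without finding it
```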
Collecting all items into one list works for hundreds; struggles at thousands; fails at millions. The alternative — process each page as it arrives — is tomorrow's lesson.