Day 18 introduced result.usage().total_tokens — the per-call token count. In a batch, you sum across calls to know the total cost.
The pattern: track running total; print or log per-call; optionally stop early if you hit a budget:
from pydantic_ai import Agent

MAX_TOKENS = 10000
items = ["hello", "this is a longer item to classify", "short"]

agent = Agent("openai:gpt-4o-mini")  # any model string works; build the Agent once, not per call
results = []
total_tokens = 0

for item in items:
    if total_tokens > MAX_TOKENS:
        print(f"hit budget at {total_tokens}; stopping early")
        break
    result = agent.run_sync(f'Classify: "{item}". Reply: positive or negative.')
    cost = result.usage().total_tokens
    total_tokens += cost
    results.append((item, result.output.strip(), cost))
    print(f"  cost={cost} total={total_tokens}")

print(f"\nfinal total: {total_tokens} tokens across {len(results)} items")

What if I want a hard ceiling — never exceed N tokens?
Check before the call. Once a call has been made, the tokens are spent. The pre-check above (if total_tokens > MAX_TOKENS: break) stops before the next call, which is the right granularity.
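If the ceiling is strict, you can go one step further and estimate the upcoming call before making it. A sketch, assuming the rough rule of thumb of ~4 characters per token for English text; estimate_tokens is a hypothetical helper, not part of pydantic_ai, and the estimate ignores output tokens, so leave some headroom:

def estimate_tokens(text: str) -> int:
    return len(text) // 4  # crude heuristic: ~4 characters per token

for item in items:
    prompt = f'Classify: "{item}". Reply: positive or negative.'
    if total_tokens + estimate_tokens(prompt) > MAX_TOKENS:
        print(f"projected to exceed budget at {total_tokens}; stopping")
        break
    result = agent.run_sync(prompt)
    total_tokens += result.usage().total_tokens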
Why not skip the check and just run them all?
For tiny batches (5-10 items), no need. For large batches (100+), or scripts running on a schedule, a runaway prompt that's 10× longer than expected can blow your monthly budget in one run. The cost cap is cheap insurance.
Two questions come up in every batch script: how much am I spending so far, and how do I stop before the budget blows? First, the running total:
total_tokens = 0
for item in items:
    result = ask_call(item)  # ask_call: your own wrapper that returns an Agent run result
    total_tokens += result.usage().total_tokens
print(f"total: {total_tokens}")

The usage() object has:
.input_tokens — what you sent (prompt + system + history)
.output_tokens — what came back (the response)
.total_tokens — sum of input + output

Second, the cap:

MAX_TOKENS = 10000
for item in items:
    if total_tokens > MAX_TOKENS:
        break
    ...

The check is before the next call. Once a call has been made, the tokens are spent — no rolling back.
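Those same fields let you log the input/output split per call, which tells you which side of the bill dominates. A minimal sketch, using the same placeholder ask_call wrapper as above:

for item in items:
    result = ask_call(item)
    u = result.usage()
    print(f"in={u.input_tokens} out={u.output_tokens} total={u.total_tokens}")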
A softer version of the hard cap: warn at 80%, stop at 100%:
warned = False
for item in items:
    if total_tokens > MAX_TOKENS * 0.8 and not warned:
        print(f"warning: 80% of budget consumed ({total_tokens} tokens)")
        warned = True
    if total_tokens > MAX_TOKENS:
        break
    ...

Log the cost of each call to find expensive outliers:
for item in items:
    result = ask_call(item)
    cost = result.usage().total_tokens
    if cost > 500:  # "expensive" threshold; tune to your workload
        print(f"expensive call ({cost} tokens) on item: {item[:50]}")
    total_tokens += cost

If one item out of 100 takes 5× the average, your prompt is probably handling that case poorly. Either fix the prompt or skip those items, as in the sketch below.
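One way to skip them up front, assuming long inputs are the culprit (a sketch; the 2,000-character cutoff is an arbitrary placeholder, not a recommendation):

skipped = 0
for item in items:
    if len(item) > 2000:  # arbitrary length cutoff; tune to your data
        skipped += 1
        continue
    result = ask_call(item)
    total_tokens += result.usage().total_tokens
print(f"skipped {skipped} oversized items")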
Most LLM APIs charge differently for input vs output tokens (output usually 2-5× more expensive per token). For accurate cost forecasting:
def estimate_cost_usd(usage, input_per_million=0.5, output_per_million=1.5):
    # prices are dollars per million tokens
    return (
        (usage.input_tokens / 1e6) * input_per_million
        + (usage.output_tokens / 1e6) * output_per_million
    )

Numbers above are illustrative. Check your provider's pricing for the model you're using.
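Plugged into the batch loop (same placeholder ask_call as above; prices still illustrative):

total_usd = 0.0
for item in items:
    result = ask_call(item)
    total_usd += estimate_cost_usd(result.usage())
print(f"estimated spend: ${total_usd:.4f}")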
Retry-on-bad-output (yesterday) costs up to N× per item. Combined with self-consistency (5× per item) you get up to 5N× per item; with N=3 retries, that's 15×, which isn't unusual for high-stakes tasks. The cost tracker helps you notice when this stacks up.
total_tokens = 0
for item in items:
    label = classify(item, retry_attempts=3)       # up to 3 calls
    label_majority = classify_majority(item, n=5)  # 5 calls
    total_tokens += sum(...)                       # accumulate
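To make the accumulation concrete, one option is to have each helper return its token count alongside the label. A sketch, assuming classify and classify_majority (the placeholder helpers from yesterday) are written to return a (label, tokens_used) pair:

total_tokens = 0
for item in items:
    label, spent = classify(item, retry_attempts=3)
    majority_label, majority_spent = classify_majority(item, n=5)
    total_tokens += spent + majority_spent
print(f"total across retries and voting: {total_tokens}")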