Not every query needs your most expensive model. Routing is the pattern: a cheap first call classifies difficulty, then dispatches to the right model.
```python
def route(query):
    label = classify_difficulty(query)  # cheap call — single-word output
    if label == "easy":
        return "cheap-model"
    return "strong-model"
```

Two LLM calls instead of one — isn't that more expensive?
The classifier is tiny — one short prompt, single-word output, ~50 tokens total. The strong-model call you'd otherwise have made is 10× larger. Trading a small constant cost for skipping the big call on most queries is net cheaper.
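To make that arithmetic concrete, here is a back-of-envelope cost model. The per-call costs and the 70% easy-query share are illustrative assumptions, not measurements; only the ratios matter.

```python
def expected_cost(n, easy_frac, cheap=1.0, strong=10.0, classifier=0.5):
    """Total cost of n queries with routing vs. sending everything strong.

    Units are arbitrary. Assumes the classifier costs a small constant per
    query and the strong call is 10x the cheap one, as in the text.
    """
    routed = n * (classifier + easy_frac * cheap + (1 - easy_frac) * strong)
    unrouted = n * strong
    return routed, unrouted

routed, unrouted = expected_cost(1000, easy_frac=0.7)
# With 70% easy queries, routing costs less than half of always-strong.
```

The classifier overhead is paid on every query, but it only takes a modest easy-query share before skipping the big call dominates.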
And the routing logic — Python or LLM?
Both. The classifier is an LLM (it understands intent better than regex). The dispatch is Python (deterministic, fast, no quota). The split is the trick — let the LLM do what only an LLM can, then hand off to code.
```python
ALLOWED = {"easy", "hard"}

def classify(query):
    prompt = f'Classify difficulty as "easy" (factual recall) or "hard" (reasoning). Reply with one word.\n\nQuery: {query}'
    out = Agent(model).run_sync(prompt).output.strip().strip('.').lower()
    return out if out in ALLOWED else "hard"  # default to strong model on parse fail

def answer(query):
    label = classify(query)
    if label == "easy":
        # In production: switch to a cheaper model here.
        # Lessons run a single platform model; we record the routing decision.
        return label, Agent(model).run_sync(query).output
    else:
        return label, Agent(model).run_sync(query).output
```

OpenRouter / pydantic-ai accepts a model parameter — claude-haiku for easy, claude-sonnet-4 for hard, etc. In platform lessons the model is fixed per call (single quota slot per run_sync), so we log the routing decision rather than swap the actual model. The pedagogy — the gate — is identical.
Why only "easy" or "hard", nothing else?

Production routers often have 3+ tiers — tiny / cheap / strong / huge. Same pattern, more buckets. The classifier returns one of the labels and a dispatch table picks the model.
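With more tiers, the only change is the size of the table. A sketch with the four tiers named above; the model names are invented for illustration:

```python
# Tier → model dispatch table. Model names are hypothetical.
MODELS = {
    "tiny":   "tiny-model",
    "cheap":  "cheap-model",
    "strong": "strong-model",
    "huge":   "huge-model",
}

def pick_model(label, default="strong"):
    """Normalize the classifier's raw output, then dispatch.

    Anything outside the table falls back to a capable default, the same
    fail-safe as the two-label version.
    """
    label = label.strip().strip(".").lower()
    return MODELS.get(label, MODELS[default])
```

The normalization mirrors the two-label classifier: trim whitespace, drop a trailing period, lowercase, then gate against known labels.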