Search results come back in the engine's ranking. Sometimes that ranking is great; sometimes the top hit is technically on the right keywords but off-topic. How could you have an agent re-score each snippet for real semantic fit?
For each result, ask the agent how well the snippet matches the query, and have it return a rating like high, medium, or low?
Exactly. A Literal classifier with three levels, one call per result, collect the labels in a list:
```python
from typing import Literal

from pydantic_ai import Agent

agent = Agent(model, output_type=Literal["high", "medium", "low"])

scores = []
for r in results:
    prompt = f"Query: {query} | Snippet: {r['snippet']} | Rate fit"
    scores.append(agent.run_sync(prompt).output)

print(scores)
```
You're layering an opinion on top of the engine's mechanical ranking — the agent sees both pieces and judges fit in context.
So the agent sees both the query and the snippet on each call. It is not just classifying sentiment — it is comparing the two?
Exactly the shift. Week 2's classifier looked at one piece of text; today's classifier looks at two and asks "does the second answer the first?" The Literal still constrains the label to three values. The full function:
```python
from typing import Literal

from pydantic_ai import Agent

def score_result_semantic_match(query: str, count: int) -> list[str]:
    """Rate each search result's semantic fit to the query."""
    results = search(query, count=count)
    agent = Agent(model, output_type=Literal["high", "medium", "low"])
    scores = []
    for r in results:
        prompt = f"Query: {query} | Snippet: {r['snippet']} | Rate fit"
        scores.append(agent.run_sync(prompt).output)
    return scores
```
Returning a list of labels is nice — but how do I use it downstream? Ideally I'd know which snippet got which score.
Align by index. Because the loop walks results in order and append preserves order, scores[i] is the label for results[i]. If you wanted them zipped explicitly, a comprehension like list(zip(results, scores)) gives you pairs — or pass the scores into a later function that filters or sorts.
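To make the index pairing concrete, here is a small standalone sketch. The `results` and `scores` values below are hand-written stand-ins for what `search()` and the agent loop would actually produce:

```python
# Stand-in data: in the real flow these come from search() and the agent loop.
results = [
    {"snippet": "Install the package with pip install foo"},
    {"snippet": "A short history of package managers"},
    {"snippet": "Common pip install flags and usage"},
]
scores = ["high", "low", "medium"]  # scores[i] labels results[i]

# Explicit pairing: each tuple is (result, label).
pairs = list(zip(results, scores))

# Downstream filter: keep only the snippets the agent rated "high".
high_hits = [r["snippet"] for r, label in pairs if label == "high"]
print(high_hits)  # ['Install the package with pip install foo']
```

The same `pairs` list also feeds a sort or a threshold check later, so you never lose track of which label belongs to which snippet.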
So the list is the bridge from retrieval to ranking — I can sort, filter, or pick the highest-scoring snippet in the next step.
Exactly. Today's score list is tomorrow's ranking input. Three-value Literal scores are enough to separate signal from noise without overengineering the rubric.
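One way that ranking step could look, assuming a simple precedence map (the `RANK` dict and the sample data are illustrative, not part of the lesson's code):

```python
# Map the three Literal labels to sortable ranks; lower rank sorts first.
RANK = {"high": 0, "medium": 1, "low": 2}

# Stand-in outputs from the scoring step.
results = [
    {"snippet": "Loosely related background"},
    {"snippet": "Directly answers the query"},
    {"snippet": "Keyword match but off-topic"},
]
scores = ["medium", "high", "low"]

# Sort snippet/label pairs so "high" comes first, "low" last.
ranked = sorted(zip(results, scores), key=lambda pair: RANK[pair[1]])
best, best_label = ranked[0]
print(best["snippet"])  # Directly answers the query
```

Because `sorted` is stable, snippets with the same label keep the engine's original order, so the agent's opinion refines the mechanical ranking instead of replacing it.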
TL;DR: loop results through a Literal["high","medium","low"] agent, keeping score and snippet aligned by index.
- `Literal` three-level — high/medium/low is enough signal
- `agent.run_sync(prompt)` — prompt combines query + snippet
- `scores[i]` pairs with `results[i]`

| Input | Role |
|---|---|
| query | the target |
| snippet | the candidate |
Every call compares the two and returns one of three labels — the downstream code picks what to do with them.