Five lessons upgrading retrieval. Semantic scoring, depth ranking, query caching, domain dedup, and side-by-side comparison. Which one most changed how you think about "search"?
Probably the dedup. I never realized how easily five hits from the same domain drown out the rest — and how tiny the fix is. A set, a split on /, a loop. That's the whole pattern.
That shift — from "search gives me ten results" to "search gives me ten results I should filter" — is the Week 3 mindset. Let's confirm the corners are sharp before Week 4 chains these into full research agents.
One thing I want to double-check — when I sort by a Literal label, why does the alphabetical default fall apart?
"deep", "medium", "shallow" alphabetize as deep < medium < shallow — which means deep comes first in ascending order but shallow comes first in descending. Neither matches the real depth ordering. A label-to-int dict translates the categorical labels into numeric priority before sorted() reads them.
Five retrieval upgrades:
score_result_semantic_match — per-snippet Literal relevance labelrank_results_by_agent_depth — sort by agent depth via int-map dictcache_search_queries — in-function dict cache for dedupdeduplicate_results_by_domain — url.split("/")[2] + seen setcompare_two_queries — two retrievals feeding one agent synthesisKey patterns: agents can score, but sorting needs numbers — use a dict to translate. Sets for membership, lists for order. Caches are dicts with if key not in cache.
Create a free account to get started. Paid plans unlock all tracks.
Five lessons upgrading retrieval. Semantic scoring, depth ranking, query caching, domain dedup, and side-by-side comparison. Which one most changed how you think about "search"?
Probably the dedup. I never realized how easily five hits from the same domain drown out the rest — and how tiny the fix is. A set, a split on /, a loop. That's the whole pattern.
That shift — from "search gives me ten results" to "search gives me ten results I should filter" — is the Week 3 mindset. Let's confirm the corners are sharp before Week 4 chains these into full research agents.
One thing I want to double-check — when I sort by a Literal label, why does the alphabetical default fall apart?
"deep", "medium", "shallow" alphabetize as deep < medium < shallow — which means deep comes first in ascending order but shallow comes first in descending. Neither matches the real depth ordering. A label-to-int dict translates the categorical labels into numeric priority before sorted() reads them.
Five retrieval upgrades:
score_result_semantic_match — per-snippet Literal relevance labelrank_results_by_agent_depth — sort by agent depth via int-map dictcache_search_queries — in-function dict cache for dedupdeduplicate_results_by_domain — url.split("/")[2] + seen setcompare_two_queries — two retrievals feeding one agent synthesisKey patterns: agents can score, but sorting needs numbers — use a dict to translate. Sets for membership, lists for order. Caches are dicts with if key not in cache.