Before you call an LLM programmatically, the mental model. What does it actually do, technically?
Predicts text? Reads a question and writes an answer?
Closer than most first-time guesses. Stripped to one line: an LLM predicts the next token. Given some text in front of it, it asks "what's the most likely word to come next?" — then once that word is chosen, it asks again, and again. The whole "answer" is just thousands of next-token guesses in sequence.
So it's not really thinking?
It's pattern-matching at enormous scale. The model trained on a huge corpus of text — books, code, conversations — and learned the statistical shape of language. When you ask "what's the capital of France?", it produces "Paris" not because it knows facts, but because in training data the pattern "capital of France" → "Paris" appeared often enough to dominate its prediction.
What does this imply for using one?
A few things, each unpacked below: hallucination is baked into the mechanism, facts need independent validation, and the same prompt can produce different outputs each call.
Today's exercise demonstrates the core mechanism. You ask the model for ONE next word — the smallest possible task that exposes next-token prediction. Tomorrow we go deeper on the call shape itself.
A Large Language Model is a neural network trained on a huge text corpus to predict the next token. Given input, it produces tokens one at a time, conditioning each new token on everything that came before.
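To make that loop concrete, here's a toy sketch in Python. The hard-coded bigram table is a stand-in for the network (a real model scores its whole vocabulary against the full context); what carries over is the loop: predict, append, repeat.

```python
# Toy illustration of the generation loop. A hard-coded bigram table
# stands in for the neural network; the loop structure is the point.
BIGRAMS = {
    "the": "capital",
    "capital": "of",
    "of": "France",
    "France": "is",
    "is": "Paris",
    "Paris": "<eos>",
}

def predict_next_token(tokens: list[str]) -> str:
    # A real model scores every token in its vocabulary against the
    # FULL context; this toy only looks at the last token.
    return BIGRAMS.get(tokens[-1], "<eos>")

def generate(prompt: list[str], max_tokens: int = 20) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_tokens):
        nxt = predict_next_token(tokens)
        if nxt == "<eos>":  # stand-in for the model's stop token
            break
        tokens.append(nxt)  # the guess becomes part of the next context
    return tokens

print(" ".join(generate(["the", "capital"])))
# the capital of France is Paris
```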
A token is a chunk of text — usually ~3-4 characters or about 0.75 words for English. "Hello, world!" might tokenize as ["Hello", ",", " world", "!"] — four tokens. Different tokenizers split differently. The cost of an API call is measured in tokens, not words or characters.
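You can inspect real token boundaries yourself. One concrete tokenizer is OpenAI's tiktoken library; the IDs below come from its cl100k_base encoding and will differ under other tokenizers:

```python
# pip install tiktoken -- OpenAI's tokenizer library; other models
# use different tokenizers and split the same text differently.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode("Hello, world!")
print(ids)                             # [9906, 11, 1917, 0] -- four tokens
print([enc.decode([i]) for i in ids])  # ['Hello', ',', ' world', '!']
```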
When an LLM produces plausible-sounding but factually wrong output, we call it a hallucination. It's not a bug; it's how the system works. The model produces the most likely sequence of tokens given its training data. If your question's answer wasn't strongly represented in training, the model fills in something that looks right. That something might be wrong.
The defense: never let an LLM be the source of truth on facts that matter. Use it for language tasks, validate its facts via deterministic checks (a calculation, a database lookup, a regex against a known shape).
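A sketch of that pattern, with a date standing in for whatever fact your domain cares about. Note how the shape check alone would pass a hallucinated date; the deterministic parse catches it:

```python
import re
from datetime import date

def looks_like_iso_date(text: str) -> bool:
    # Shape check: YYYY-MM-DD by regex. Necessary, not sufficient.
    return re.fullmatch(r"\d{4}-\d{2}-\d{2}", text) is not None

def is_real_date(text: str) -> bool:
    # Deterministic check: does the calendar actually contain this date?
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False

model_claim = "2024-02-30"               # pretend the model produced this
print(looks_like_iso_date(model_claim))  # True  -- looks right
print(is_real_date(model_claim))         # False -- is wrong
```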
Most APIs sample tokens probabilistically. The same prompt can yield slightly different outputs each call. Plan for it. Code that asserts the response equals an exact string will be flaky; code that checks the response shape ("contains a label from {positive, negative}") will work.
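For example, with an assumed two-label classification task:

```python
ALLOWED = {"positive", "negative"}

def parse_label(response: str) -> str:
    # Normalize, then membership-test against the label set you asked for.
    label = response.strip().strip(".").lower()
    if label not in ALLOWED:
        raise ValueError(f"unexpected model output: {response!r}")
    return label

print(parse_label("Positive."))  # "positive" -- survives sampling noise
# The flaky version: assert response == "Positive"  (exact-match on sampled text)
```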
The practice below sends the model a prompt asking for exactly one word that should follow a given phrase. The whole point of asking for one word: a word is roughly the unit the model produces internally (a token). You're calling the model to do the smallest possible thing: pick the next likely token.
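If you want to peek ahead, here's a minimal sketch of that call using pydantic-ai. The model string and the prompt are assumptions; any model your account can reach will do:

```python
# Sketch only: assumes pydantic-ai is installed and OPENAI_API_KEY is set.
from pydantic_ai import Agent

agent = Agent("openai:gpt-4o-mini")
result = agent.run_sync(
    "Reply with exactly ONE word: the single word most likely to follow "
    '"The quick brown fox jumps over the lazy"'
)
print(result.output)  # most runs: "dog" -- the overwhelmingly likely next token
```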
Tomorrow's lesson formalizes the call shape (Agent, run_sync, .output, quota). Today: feel the mechanism.