Day 18 · ~13m

Parsing AI Responses

Validate and parse LLM output into structured data with Pydantic.

🧑‍💻

I can send prompts to an LLM, but the response is just a string. How do I turn that into structured data my API can use?

👩‍🏫

That's the parsing problem — and it's where most AI integrations break. LLMs return text, but your API needs JSON. The solution: ask for JSON and validate it with Pydantic.

import json
from pydantic import BaseModel, ValidationError

class Sentiment(BaseModel):
    label: str  # "positive", "negative", "neutral"
    confidence: float  # 0.0 to 1.0
    reasoning: str

def parse_sentiment(raw_response: str) -> Sentiment:
    data = json.loads(raw_response)
    return Sentiment(**data)

If the LLM returns {"label": "positive", "confidence": 0.92, "reasoning": "upbeat tone"}, Pydantic validates every field. If the LLM returns a wrong type or leaves out a field, you get a clear ValidationError instead of silent garbage.
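A quick check of both paths, reusing the model and parser above (the malformed payload is made up for illustration):

```python
import json
from pydantic import BaseModel, ValidationError

class Sentiment(BaseModel):
    label: str          # "positive", "negative", "neutral"
    confidence: float   # 0.0 to 1.0
    reasoning: str

def parse_sentiment(raw_response: str) -> Sentiment:
    data = json.loads(raw_response)
    return Sentiment(**data)

# Well-formed response: every field validates.
good = parse_sentiment('{"label": "positive", "confidence": 0.92, "reasoning": "upbeat tone"}')

# Wrong type for confidence, and reasoning is missing entirely.
try:
    parse_sentiment('{"label": "positive", "confidence": "very high"}')
    errors = 0
except ValidationError as exc:
    errors = exc.error_count()  # one error per bad or missing field
```

Pydantic reports every problem at once, so `error_count()` is 2 here: one for the unparseable float, one for the missing field.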

🧑‍💻

But LLMs don't always return clean JSON. What if the response has extra text around it?

👩‍🏫

Good instinct. In practice, you need to extract the JSON from the response:

import re

def extract_json(text: str) -> dict:
    # Try the whole string first
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Look for JSON in code blocks
    match = re.search(r'```(?:json)?\s*\n?(.*?)\n?```', text, re.DOTALL)
    if match:
        return json.loads(match.group(1))
    # Look for first { to last }
    start = text.find('{')
    end = text.rfind('}') + 1
    if start >= 0 and end > start:
        return json.loads(text[start:end])
    raise ValueError("No valid JSON found in response")

This handles three cases: clean JSON, JSON in a code block, and JSON buried in text. Layer by layer, from strict to lenient.
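The three layers can be exercised directly (the sample responses below are made up):

```python
import json
import re

def extract_json(text: str) -> dict:
    # Layer 1: the whole string is JSON.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Layer 2: JSON inside a fenced code block.
    match = re.search(r'```(?:json)?\s*\n?(.*?)\n?```', text, re.DOTALL)
    if match:
        return json.loads(match.group(1))
    # Layer 3: first '{' to last '}'.
    start = text.find('{')
    end = text.rfind('}') + 1
    if start >= 0 and end > start:
        return json.loads(text[start:end])
    raise ValueError("No valid JSON found in response")

fence = "`" * 3  # avoids a literal triple backtick in this example's source

clean = extract_json('{"label": "positive"}')
fenced = extract_json(f'Sure! {fence}json\n{{"label": "negative"}}\n{fence}')
buried = extract_json('The result is {"label": "neutral"}, as requested.')
```

Each sample falls through exactly as far as it needs to: the first parses whole, the second is pulled out of its code block, the third is sliced out of the surrounding prose.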

🧑‍💻

How do I handle the case where parsing fails entirely?

👩‍🏫

Retry with a fresh LLM call, or fall back to a safe default. Production AI services always have a plan B:

from typing import Callable

def safe_parse(get_response: Callable[[], str], retries: int = 2) -> dict:
    # Re-call the LLM on each attempt -- re-parsing the same string
    # would fail the same way every time.
    for _ in range(retries + 1):
        try:
            data = extract_json(get_response())
            return Sentiment(**data).model_dump()
        except (json.JSONDecodeError, ValidationError, ValueError):
            continue
    return {"label": "unknown", "confidence": 0.0, "reasoning": "Parse failed"}

Never let a parse failure crash your API. Return a default, log the error, and move on.
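To see the whole pattern end to end, here's a self-contained sketch: a variant of safe_parse that takes a zero-argument callable, so each retry triggers a fresh (stubbed) LLM call. The stubs and values are illustrative.

```python
import json
from typing import Callable
from pydantic import BaseModel, ValidationError

class Sentiment(BaseModel):
    label: str
    confidence: float
    reasoning: str

def safe_parse(get_response: Callable[[], str], retries: int = 2) -> dict:
    for _ in range(retries + 1):
        try:
            # extract_json would go here; plain json.loads keeps the sketch short.
            data = json.loads(get_response())
            return Sentiment(**data).model_dump()
        except (json.JSONDecodeError, ValidationError, ValueError):
            continue
    return {"label": "unknown", "confidence": 0.0, "reasoning": "Parse failed"}

# Stubbed LLM: garbage on the first call, valid JSON on the second.
replies = iter(["oops, not JSON", '{"label": "positive", "confidence": 0.9, "reasoning": "ok"}'])
result = safe_parse(lambda: next(replies))

# A stub that always fails hits the fallback instead of raising.
fallback = safe_parse(lambda: "still not JSON")
```

The first call survives one bad response and succeeds on the retry; the second exhausts its attempts and returns the default dict, so the caller never sees an exception.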

🧑‍💻

Is there a way to make the LLM more likely to return valid JSON?

👩‍🏫

Yes — include the schema in the prompt:

def build_json_prompt(text: str) -> str:
    return f"""Analyze the sentiment of this text. Respond with JSON only.

Schema: {{"label": "positive|negative|neutral", "confidence": 0.0-1.0, "reasoning": "..."}}

Text: {text}

JSON:"""

Most modern LLMs also support a response_format parameter that constrains output to valid JSON. But always validate on your end — trust but verify.
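One way to keep the prompt and the validator from drifting apart is to generate the schema string from the Pydantic model itself via model_json_schema(); this sketch assumes the same Sentiment model, and the prompt wording is just one option:

```python
import json
from pydantic import BaseModel

class Sentiment(BaseModel):
    label: str
    confidence: float
    reasoning: str

def build_json_prompt(text: str) -> str:
    # model_json_schema() emits a JSON Schema dict derived from the model,
    # so the prompt always matches what validation will enforce.
    schema = json.dumps(Sentiment.model_json_schema())
    return (
        "Analyze the sentiment of this text. Respond with JSON only.\n\n"
        f"Schema: {schema}\n\n"
        f"Text: {text}\n\n"
        "JSON:"
    )

prompt = build_json_prompt("Great product!")
```

Now adding a field to Sentiment updates the prompt automatically; with the hand-written schema string, the two could silently diverge.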
