
Pydantic in Production Deployment

Master production deployment of Pydantic applications. Learn containerization, monitoring, scaling, and disaster recovery.

3 modules · 12 lessons · free to read

What you'll learn

  • Deploy Pydantic applications with Docker to production environments
  • Monitor APIs with logging, metrics, and distributed tracing
  • Scale applications to handle millions of concurrent requests
  • Implement rate limiting and caching for performance
  • Design resilient architectures with automatic failover and disaster recovery

01 Deployment Strategies

Learn to containerize, deploy, configure, and manage Pydantic applications in production.

1. Containerizing Pydantic Applications

Docker containers ensure your Pydantic application runs the same everywhere. Create a Dockerfile:

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0"]
```

Build and run:

```shell
docker build -t my-api .
docker run -p 8000:8000 my-api
```

Optimizations:

  • Use slim base images (`python:3.11-slim`, not `python:3.11`)
  • Pin dependencies with requirements.txt
  • Use multi-stage builds for smaller images
  • Set environment variables at runtime

Constraints

  • Use slim base image
  • Set proper WORKDIR
  • Include EXPOSE and CMD
Practice Lesson 1

2. Deploying to Cloud Platforms

Cloud deployment is the process of packaging your Pydantic application and running it on managed infrastructure like AWS, Google Cloud, or Azure. Each platform offers container services, serverless functions, and CI/CD integrations that automate the path from code commit to live traffic.

Most cloud platforms accept Docker images directly. You push your container to a registry (ECR, GCR, or ACR), then point a service like AWS ECS, Cloud Run, or Azure Container Apps at that image. The platform handles scaling, networking, and restarts.

```python
def validate_deploy_config(config):
    required = ["region", "instance_type", "min_instances"]
    errors = []
    for field in required:
        if field not in config:
            errors.append(f"missing_{field}")
        elif field == "min_instances" and (
            not isinstance(config[field], int) or config[field] < 1
        ):
            errors.append("min_instances_must_be_positive_integer")
    return {"valid": len(errors) == 0, "errors": errors}
```

Serverless functions let you deploy individual endpoints without managing servers. AWS Lambda, Google Cloud Functions, and Azure Functions all support Python. You write a handler function that receives an event dict, validate it with Pydantic, and return a response. Cold starts add latency on the first request, but subsequent calls reuse the warm container.
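A minimal handler along those lines can be sketched like this, with hand-rolled validation standing in for a full Pydantic model (the `user_id` field, the event shape, and the response format are illustrative assumptions, not any platform's exact contract):

```python
import json

def lambda_handler(event, context):
    # The event body arrives as a JSON string; parse it and check
    # required fields before doing any work.
    body = json.loads(event.get("body", "{}"))
    if "user_id" not in body:
        return {"statusCode": 422,
                "body": json.dumps({"error": "missing_user_id"})}
    return {"statusCode": 200,
            "body": json.dumps({"user_id": body["user_id"]})}
```

In a real deployment the manual check would be replaced by a Pydantic model's validation, with a `ValidationError` mapped to the 422 response.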

Continuous deployment pipelines automate building, testing, and deploying on every push. A typical pipeline checks out code, runs tests, builds the Docker image, pushes it to the registry, and triggers a rolling deployment. Validating your deployment configuration before it reaches production prevents misconfigured services from going live.

```python
config = {
    "region": "us-east-1",
    "instance_type": "t3.micro",
    "min_instances": 2,
}
result = validate_deploy_config(config)
# {"valid": True, "errors": []}
```

Use deployment config validation whenever you automate infrastructure changes. Catching a missing region or an invalid instance count in your pipeline is far cheaper than debugging a failed deployment at 2 AM.

Constraints

  • The function must check for three required fields: region (string), instance_type (string), and min_instances (positive integer).
  • Missing fields should produce errors like 'missing_region'. Invalid min_instances (not int or < 1) should produce 'min_instances_must_be_positive_integer'.
  • Return a dictionary with 'valid' set to True only when errors is empty.
Practice Lesson 2

3. Configuration Management

Configuration management is the practice of separating application settings from code so the same build can behave differently across environments. Instead of hardcoding database URLs or API keys, you store them in environment variables, config files, or secrets managers and load them at startup.

The most common pattern is a layered config: a defaults dictionary holds safe fallback values, and an environment-specific overrides dictionary replaces only the keys that differ. Merging these two dictionaries gives you the final runtime config. Nested settings like database connection details need recursive merging so you can override db.host without losing db.port.

```python
def merge_config(defaults, overrides):
    result = {}
    all_keys = set(list(defaults.keys()) + list(overrides.keys()))
    for key in sorted(all_keys):
        if key in overrides and key in defaults:
            if isinstance(defaults[key], dict) and isinstance(overrides[key], dict):
                result[key] = merge_config(defaults[key], overrides[key])
            else:
                result[key] = overrides[key]
        elif key in overrides:
            result[key] = overrides[key]
        else:
            result[key] = defaults[key]
    return result
```

Environment variables should never hold secrets in plain text on shared machines. Use a secrets manager (AWS Secrets Manager, HashiCorp Vault, or even encrypted env files) for API keys and database passwords. Feature flags follow the same pattern: a boolean in your config dict controls whether a new feature is active, and you toggle it per environment without redeploying code.
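A feature flag check can be as small as a dictionary lookup with a safe default. This sketch assumes flags live under a `features` key in the merged config (the key name is an assumption):

```python
def is_feature_enabled(config, flag_name):
    # Missing flags default to off, so new code paths stay dark
    # until explicitly enabled for an environment.
    return config.get("features", {}).get(flag_name, False)

config = {"features": {"new_checkout": True}}
```

Toggling `new_checkout` in the production overrides enables the feature there without touching code or redeploying.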

Merging configs recursively means you only specify what changes. Your production override might be three lines while your defaults file is fifty. This keeps environment-specific files small, readable, and easy to audit.

```python
defaults = {"debug": False, "db": {"host": "localhost", "port": 5432}}
prod = {"db": {"host": "prod-db.example.com"}}
final = merge_config(defaults, prod)
# {"db": {"host": "prod-db.example.com", "port": 5432}, "debug": False}
```

Use layered config merging whenever your application runs in more than one environment. It eliminates copy-paste between config files and ensures production never accidentally inherits a development database URL.

Constraints

  • When both defaults and overrides have a dict value for the same key, merge them recursively instead of replacing.
  • Keys that exist only in overrides should appear in the result. Keys only in defaults should be preserved.
  • Return keys in sorted order at each level of nesting.
Practice Lesson 3

4. Database Integration and Migrations

Database migration is the process of evolving your database schema to match changes in your application models. When you add a field to a Pydantic model, the corresponding database column must be added too. Migrations track these changes as versioned steps that can be applied in order, rolled back, or audited.

The simplest way to compute a migration is to compare two schema snapshots. Each snapshot maps field names to their types. By diffing the old and new snapshots, you can identify three categories of change: added fields (in new but not old), removed fields (in old but not new), and changed fields (present in both but with different types).

```python
def compute_migration(old_schema, new_schema):
    added, removed, changed = [], [], []
    for field in sorted(new_schema):
        if field not in old_schema:
            added.append({"field": field, "type": new_schema[field]})
        elif old_schema[field] != new_schema[field]:
            changed.append({
                "field": field,
                "old_type": old_schema[field],
                "new_type": new_schema[field],
            })
    for field in sorted(old_schema):
        if field not in new_schema:
            removed.append({"field": field, "type": old_schema[field]})
    return {"added": added, "removed": removed, "changed": changed}
```

In production, migrations must preserve existing data. Adding a column is usually safe — you set a default value so existing rows are valid. Removing a column requires careful planning: first stop writing to it, then deploy the code change, then drop the column in a later migration. Changing a column type (like int to float) may need a data transformation step between the old and new type.

Schema versioning means tagging each migration with a sequential number or timestamp. Migration runners apply only the unapplied migrations in order, making deployments repeatable and safe across multiple environments.
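The runner's core selection step can be sketched in a few lines (the function name and version-string format are illustrative):

```python
def pending_migrations(applied, available):
    # Both lists hold version identifiers; apply only what is new,
    # in sorted order, so every environment converges on the same schema.
    applied_set = set(applied)
    return [m for m in sorted(available) if m not in applied_set]
```

Running `pending_migrations(["001", "002"], ["003", "001", "002"])` leaves only `"003"` to apply, regardless of how many environments have already run the earlier steps.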

```python
old = {"name": "str", "age": "int", "email": "str"}
new = {"name": "str", "age": "float", "phone": "str"}
steps = compute_migration(old, new)
# added:   [{"field": "phone", "type": "str"}]
# removed: [{"field": "email", "type": "str"}]
# changed: [{"field": "age", "old_type": "int", "new_type": "float"}]
```

Use schema diffing whenever you need to auto-generate migration scripts or validate that a model change has a corresponding migration. Catching schema drift early prevents data loss and runtime errors in production.

Constraints

  • Return a dictionary with three keys: 'added', 'removed', and 'changed', each containing a list of dicts.
  • Added and removed entries have 'field' and 'type'. Changed entries have 'field', 'old_type', and 'new_type'.
  • Process fields in sorted alphabetical order within each category.
Practice Lesson 4

02 Monitoring and Observability

Monitor and observe Pydantic applications in production with logging, metrics, tracing, and health checks.

1. Logging and Structured Output

Structured logging is the practice of emitting log entries as machine-readable records with consistent, queryable fields instead of free-form text strings. In production Pydantic applications, structured logs let you filter by user ID, trace ID, error type, or any custom field across millions of entries in seconds. Traditional print() debugging falls apart the moment your application runs on more than one server — structured logs are how you keep visibility.

A structured log entry typically contains a timestamp, a severity level, a human-readable message, and a context dictionary with arbitrary metadata. You build each entry as a Python dictionary and serialize it to JSON so that log aggregation tools (ELK, Datadog, CloudWatch) can index every field automatically.

```python
import json
from datetime import datetime, timezone

def make_log(level, message, **context):
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": level.upper(),
        "message": message,
        "context": context,
    }
    return json.dumps(entry)
```

When multiple services emit structured logs, you aggregate them into a central store and correlate events using shared fields like request_id or user_id. This turns isolated log lines into a timeline of what happened across your entire system for a single request.

```python
def enrich_log(log_dict, request_id):
    log_dict["request_id"] = request_id
    return json.dumps(log_dict)
```

Production debugging relies on this pattern: when a user reports a problem, you search for their user_id, find the request_id, and pull every log entry from every service that touched that request. Use structured logging from day one — retrofitting it after an outage is painful, and the cost of doing it up front is almost zero.

Constraints

  • Each output string must be valid JSON with keys `timestamp`, `level`, `message`, and `context`.
  • If `timestamp` or `level` is missing from an input dict, default to `"unknown"` and `"INFO"` respectively. The `level` value must always be uppercased.
  • The `context` dict must contain every key from the input that is not `timestamp`, `level`, or `message`.
Practice Lesson 1

2. Metrics and Performance Monitoring

Performance metrics are numerical measurements collected over time that describe how your application behaves under real traffic. For Pydantic-powered APIs, the most critical metrics are response latency (how long requests take), validation failure rate (how often incoming data is rejected), and throughput (requests per second). Without metrics, you are flying blind — you will not know your API is slow until users start complaining.

The foundation of latency monitoring is percentile analysis. The mean (average) response time hides outliers: if 99 requests take 10ms and one takes 5 seconds, the mean is 60ms, which looks fine. Percentiles tell the real story — p95 means 95% of requests were faster than this value. When p95 spikes, you have a real problem affecting real users.

```python
def percentile(sorted_data, p):
    idx = int(p / 100 * (len(sorted_data) - 1))
    return sorted_data[idx]
```

In production, you collect these metrics continuously and push them to a time-series database like Prometheus or Datadog. You set alert thresholds on key percentiles — for example, trigger an alert when p95 latency exceeds 500ms for five consecutive minutes.

```python
def should_alert(p95_values, threshold):
    return all(v > threshold for v in p95_values)
```

Metrics turn guesswork into evidence. When a deployment causes a regression, you see p95 jump immediately. When a downstream database slows down, your latency metrics catch it before your error rate climbs. Collect metrics from the start and review them after every deployment — they are the earliest warning system you have.
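Putting the percentile helper to work, a full latency summary along the lines the constraints describe might be sketched as:

```python
def compute_latency_metrics(times):
    # Empty input: return zeroed metrics rather than raising.
    if not times:
        return {"mean": 0.0, "p50": 0.0, "p95": 0.0, "p99": 0.0, "max": 0.0}
    data = sorted(times)

    def percentile(p):
        # Integer-index percentile on the sorted data.
        return data[int(p / 100 * (len(data) - 1))]

    return {
        "mean": round(sum(data) / len(data), 2),
        "p50": percentile(50),
        "p95": percentile(95),
        "p99": percentile(99),
        "max": data[-1],
    }
```

For `[10.0, 20.0, 30.0, 40.0]` this yields a mean of 25.0, a p50 of 20.0, and a max of 40.0 — the spread between mean and tail percentiles is exactly what the mean alone would hide.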

Constraints

  • If the input list is empty, return a dictionary with all values set to `0.0`.
  • Calculate percentiles using integer index: `idx = int(p / 100 * (len(data) - 1))` on the sorted data.
  • The `mean` value must be rounded to 2 decimal places using the built-in `round()` function.
Practice Lesson 2

3. Distributed Tracing

Distributed tracing is a technique for tracking a single request as it flows through multiple services in a microservices architecture. Each service records a "span" — a named, timed segment of work — and spans are linked together by parent-child relationships to form a trace tree. When a user reports that checkout is slow, distributed tracing tells you whether the bottleneck is in the API gateway, the payment service, or the database.

A span contains the service name, the duration in milliseconds, and a reference to its parent span (or None for the root). The root span represents the entry point — typically the API gateway. Child spans branch out from there, and the critical path is the longest chain from root to leaf.

```python
# Span structure
span = {
    "service": "payment-service",
    "duration_ms": 45,
    "parent_id": 0,  # index of parent span
}
```

To find performance bottlenecks, you need to compute the critical path: the longest sequence of dependent operations from root to leaf. This is the minimum possible latency for the entire request, because these spans run sequentially. Parallel branches do not add to the critical path — only the slowest branch at each fork matters.

```python
def trace_depth(spans, idx, children):
    if idx not in children:
        return 1
    return 1 + max(trace_depth(spans, c, children) for c in children[idx])
```

Distributed tracing is essential whenever your system has more than one service. Without it, you cannot tell whether a 2-second request spent 1.5 seconds in the database or 1.5 seconds in network transit between services. Instrument every service boundary and you gain the ability to pinpoint bottlenecks in minutes instead of hours.
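Combining the depth recursion above with span durations, a critical-path computation over the span structure might be sketched as follows (the exact function name matches the constraints, but the implementation is one possible approach):

```python
def find_critical_path(spans):
    # Empty trace: nothing to measure.
    if not spans:
        return {"total_duration_ms": 0, "depth": 0}

    # Group child indices under each parent index.
    children = {}
    for i, span in enumerate(spans):
        if span["parent_id"] is not None:
            children.setdefault(span["parent_id"], []).append(i)

    def walk(idx):
        # Returns (duration along the longest chain from idx, chain length).
        best = (0, 0)
        for child in children.get(idx, []):
            best = max(best, walk(child))
        return (spans[idx]["duration_ms"] + best[0], 1 + best[1])

    # A trace may have several roots; take the heaviest chain overall.
    roots = [i for i, s in enumerate(spans) if s["parent_id"] is None]
    duration, depth = max(walk(r) for r in roots)
    return {"total_duration_ms": duration, "depth": depth}
```

Parallel branches fall out naturally: at each fork, `max` keeps only the slowest child chain, which is exactly the critical-path rule described above.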

Constraints

  • Each span has `service` (string), `duration_ms` (int), and `parent_id` (int index or null for root spans). Use list indices as span IDs.
  • Return a dictionary with `total_duration_ms` (sum of durations along the longest chain) and `depth` (number of spans in the deepest chain).
  • If the input list is empty, return `{"total_duration_ms": 0, "depth": 0}`.
Practice Lesson 3

4. Health Checks and Readiness

A health check is an endpoint that reports whether your application is able to serve traffic by verifying the status of every dependency it relies on. Load balancers, container orchestrators like Kubernetes, and deployment pipelines all poll health check endpoints to decide whether to route traffic to an instance, restart it, or hold a rollout. Without health checks, a service with a dead database connection will silently accept requests and fail every one of them.

Health checks inspect each dependency — database, cache, message queue, external API — and classify the overall system as healthy, degraded, or unhealthy. A single unhealthy dependency makes the whole system unhealthy. A degraded dependency (slow but responding) makes it degraded. Only when everything is healthy does the system report healthy.

```python
def classify(dep_statuses):
    if "unhealthy" in dep_statuses:
        return "unhealthy"
    if "degraded" in dep_statuses:
        return "degraded"
    return "healthy"
```

Readiness probes go further than basic health checks. A liveness probe answers "is the process running?" while a readiness probe answers "is the process ready to handle requests?" During startup, an application might be alive but not ready — it is still loading configuration, warming caches, or running migrations. Separating liveness from readiness prevents the orchestrator from sending traffic to an instance that is not yet prepared.

```python
def readiness_response(status, details):
    code = 200 if status == "healthy" else 503
    return {"status_code": code, "body": {"status": status, "details": details}}
```

Implement health checks early and make them comprehensive. Every dependency your application touches should appear in the health check response. When a deployment goes wrong, the health check is what triggers automatic rollback. When a database failover happens, the health check is what tells the load balancer to stop sending traffic until the new primary is ready.
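A full aggregator tying the pieces together might be sketched as below. The input shape is an assumption: each dependency name maps to a dict carrying a `"status"` key.

```python
def check_health(dependencies):
    # Assumed input: {name: {"status": "healthy" | "degraded" | "unhealthy"}}.
    if not dependencies:
        return {"status": "healthy", "details": {}}
    details = {name: info["status"] for name, info in dependencies.items()}
    statuses = details.values()
    if "unhealthy" in statuses:
        overall = "unhealthy"
    elif "degraded" in statuses:
        overall = "degraded"
    else:
        overall = "healthy"
    return {"status": overall, "details": details}
```

Feeding this result to `readiness_response` gives a 200 only when every dependency is healthy, which is what the load balancer keys off.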

Constraints

  • If any dependency has status `"unhealthy"`, the overall status must be `"unhealthy"`. If any has `"degraded"` (and none are unhealthy), the overall status must be `"degraded"`. Otherwise it is `"healthy"`.
  • The `details` dict must map each dependency name to its status string extracted from the input.
  • If the input dictionary is empty, return `{"status": "healthy", "details": {}}`.
Practice Lesson 4

03 Scaling and Architecture

Design scalable architectures that handle millions of requests with load balancing, caching, rate limiting, and disaster recovery.

1. Load Balancing and Distribution

Load balancing is the technique of distributing incoming requests across multiple server instances so no single server becomes overwhelmed. In production Pydantic applications, a load balancer sits in front of your API servers and forwards each request to the next available instance. Without a balancer, a traffic spike hits one server and brings down the entire service.

The simplest strategy is round-robin — requests rotate through servers in order. Server 1 gets request 1, server 2 gets request 2, and so on, cycling back to the start. The modulo operator makes this trivial: the server index for request i is i % num_servers.

```python
def assign_request(servers, request_number):
    index = request_number % len(servers)
    return servers[index]
```

When your Pydantic models carry session state — like a multi-step form validation where step 2 depends on data validated in step 1 — you may need sticky sessions, where the same user always hits the same server. Without it, partial validation state is lost between requests. Sticky sessions are typically implemented by hashing the client's IP or session ID to a consistent server index.
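The hashing idea can be sketched with a stable hash so the mapping survives process restarts (Python's built-in `hash()` is salted per run, so `hashlib` is the safer stand-in):

```python
import hashlib

def sticky_server(servers, session_id):
    # Hash the session ID to a stable integer, then map it to a
    # server index; the same session always lands on the same server.
    digest = hashlib.sha256(session_id.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]
```

Note that plain modulo hashing reshuffles most sessions when the server list changes; consistent hashing is the usual refinement once instances scale up and down frequently.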

Fair distribution also means monitoring. If one server is slower due to heavy validation workloads, weighted balancing assigns fewer requests to it. Health checks run on a timer and remove unhealthy instances from the pool entirely, redirecting traffic to the remaining servers.

```python
def is_healthy(server_stats):
    return server_stats["cpu"] < 90 and server_stats["error_rate"] < 0.05
```

Use load balancing whenever you run more than one instance of your application. It improves availability, prevents single points of failure, and lets you scale horizontally by adding servers rather than upgrading hardware. Start with round-robin and add weighted or sticky strategies as your traffic patterns demand it.
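The full round-robin assignment the constraints describe follows directly from `assign_request` (this is one straightforward way to write it):

```python
def distribute_requests(servers, num_requests):
    # Request i goes to server i % len(servers), cycling through the pool.
    return [servers[i % len(servers)] for i in range(num_requests)]
```

With two servers and five requests, the assignment alternates: `["s1", "s2", "s1", "s2", "s1"]`.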

Constraints

  • Use the modulo operator (%) to cycle through the server list.
  • Return a list with exactly num_requests elements.
Practice Lesson 1

2. Caching Strategies

Caching is the practice of storing computed results so they can be reused without repeating the original work. In Pydantic applications, validating the same data structure repeatedly wastes CPU — a cache stores the validated result and returns it instantly on the next identical request. A single cached lookup is orders of magnitude faster than re-running field validators, type coercion, and constraint checks.

Multi-level caching uses layers: an in-memory cache (fastest, per-instance) backed by a distributed cache like Redis (shared across instances). The lookup order is local first, then Redis, then compute from scratch. This pattern means each instance avoids redundant work, and instances share results so a cold restart on one server still benefits from the shared cache.

```python
def check_cache(local, remote, key):
    if key in local:
        return local[key]
    if key in remote:
        local[key] = remote[key]
        return remote[key]
    return None
```

The LRU (Least Recently Used) eviction policy keeps the most-accessed items and discards the oldest when the cache is full. Every cache hit moves that item to the "most recent" position. Every miss adds a new item, evicting the least recent if the cache has reached capacity. This guarantees bounded memory usage while keeping frequently accessed data warm.

```python
def evict_lru(cache, capacity, new_key):
    if len(cache) >= capacity:
        cache.pop(0)  # remove least recent
    cache.append(new_key)
```

Cache validation results and serialized Pydantic models whenever the same inputs appear repeatedly — API endpoints with overlapping payloads, batch imports with duplicate records, or configuration objects parsed on every request. The tradeoff is memory for speed, and LRU keeps that memory bounded. Python's functools.lru_cache uses this same algorithm under the hood.
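A full LRU simulation along the lines of the constraints (a list as the recency order, least recent at the front) might be sketched as:

```python
def simulate_lru_cache(accesses, capacity):
    cache = []  # least recent at index 0, most recent at the end
    hits = misses = 0
    for key in accesses:
        if key in cache:
            hits += 1
            cache.remove(key)
            cache.append(key)  # move to most-recent position
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.pop(0)  # evict least recently used
            cache.append(key)
    return {"hits": hits, "misses": misses, "cache": cache}
```

A list keeps the sketch readable; production implementations use an ordered map (as `functools.lru_cache` effectively does) so hits and evictions are O(1) rather than O(n).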

Constraints

  • On a hit, move the accessed key to the end of the cache list (most recent).
  • On a miss, evict the first element (least recent) if the cache is at capacity before adding the new key.
  • Return a dictionary with keys "hits", "misses", and "cache".
Practice Lesson 2

3. Rate Limiting and Throttling

Rate limiting is a technique that controls how many requests a client can make within a time window, protecting your API from abuse, accidental floods, and denial-of-service attacks. Without it, a single misbehaving client can saturate your Pydantic validation pipeline and starve legitimate users.

The token bucket algorithm is the most widely used approach. A bucket holds a fixed number of tokens. Each request consumes one token. Tokens refill at a steady rate over time. When the bucket is empty, requests are denied until tokens accumulate again.

```python
def check_rate_limit(tokens, capacity, refill_rate, elapsed):
    tokens = min(capacity, tokens + elapsed * refill_rate)
    if tokens >= 1:
        return tokens - 1, True  # allowed
    return tokens, False  # denied
```

Throttling differs from hard rate limiting — instead of rejecting requests outright, throttling slows them down. You might queue excess requests or respond with a Retry-After header telling the client when to try again. This gives clients a graceful signal rather than a hard error.

For Pydantic APIs, rate limiting protects expensive validation endpoints. A complex nested model with custom validators can take significant CPU time — sometimes hundreds of milliseconds per request. Without limits, a single client sending thousands of requests per second can monopolize your validation pipeline and degrade service for everyone else. Limiting calls to those endpoints prevents resource starvation.

```python
def should_throttle(request_count, max_per_minute):
    return request_count > max_per_minute
```

Apply rate limiting at the API gateway level for broad protection, and at individual endpoint level for fine-grained control over heavy validation routes. The token bucket is preferred because it allows short bursts while enforcing a sustainable average rate.
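Simulating a sequence of timestamped requests against the bucket described above might be sketched as follows, matching the constraints (bucket starts full, refill on elapsed time, one token per allowed request):

```python
def simulate_token_bucket(request_times, capacity, refill_rate):
    tokens = float(capacity)  # bucket starts full
    last_time = 0
    results = []
    for t in request_times:
        # Refill based on time elapsed since the previous request,
        # capped at the bucket capacity.
        tokens = min(capacity, tokens + (t - last_time) * refill_rate)
        last_time = t
        if tokens >= 1:
            tokens -= 1
            results.append(True)   # allowed
        else:
            results.append(False)  # denied
    return results
```

With capacity 2 and one token per second, three simultaneous requests allow the first two and deny the third; a request two seconds later succeeds because the bucket has refilled.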

Constraints

  • Tokens refill based on elapsed time between requests, capped at the bucket capacity using the min() function.
  • Each allowed request consumes exactly 1 token. Requests are denied when tokens are less than 1.
  • The bucket starts full (tokens equal to capacity) and last_time starts at 0.
Practice Lesson 3

4. Disaster Recovery and Failover

Disaster recovery is the set of policies and procedures that restore a system to operation after a failure — whether a crashed server, corrupted database, or entire data center outage. In production Pydantic applications, planning for failure is not optional; it is part of the architecture.

Redundant systems are the foundation. A primary server handles writes and reads, while one or more replicas maintain copies of the data. Health checks monitor each server continuously. When the primary fails, a healthy replica is promoted to take over.

```python
def needs_failover(primary_healthy, replica_healthy):
    return not primary_healthy and replica_healthy
```

Automatic failover removes human delay from the recovery process. The system detects the failure via health checks, drains the unhealthy server (stops sending it traffic), and promotes a replica — all within seconds rather than waiting for an engineer to respond.

Unhealthy replicas are also drained. Only healthy replicas can be promoted. If no healthy replica exists, the system enters a degraded state and alerts the operations team.

```python
def pick_action(role, healthy, primary_down):
    if role == "primary" and not healthy:
        return "drain"
    if role == "replica" and healthy and primary_down:
        return "promote"
    if role == "replica" and not healthy:
        return "drain"
    return "keep" if role == "primary" else "standby"
```

Design your Pydantic services with redundancy from day one. Run at least two instances, replicate your data, test failover procedures regularly, and automate the switchover. Use Python's any() built-in to scan a fleet of servers for unhealthy primaries — it short-circuits on the first match, making it efficient even with large clusters. The cost of a tested recovery plan is always less than the cost of an unplanned outage.
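Applying the per-server logic across a fleet snapshot, with any() doing the primary-down scan, might be sketched as (the snapshot shape with "server", "role", and "healthy" keys is an assumption):

```python
def plan_failover(snapshot):
    # snapshot: list of {"server": name, "role": "primary" | "replica",
    #                    "healthy": bool}
    # any() short-circuits on the first unhealthy primary it finds.
    primary_down = any(
        s["role"] == "primary" and not s["healthy"] for s in snapshot
    )
    decisions = []
    for s in snapshot:
        if not s["healthy"]:
            action = "drain"
        elif s["role"] == "replica" and primary_down:
            action = "promote"
        elif s["role"] == "primary":
            action = "keep"
        else:
            action = "standby"
        decisions.append({"server": s["server"], "action": action})
    return decisions
```

Servers are processed in input order, so the decision list lines up with the snapshot for easy auditing.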

Constraints

  • Each decision dict must have "server" and "action" keys.
  • A healthy replica is promoted only if any primary in the snapshot list is unhealthy — use Python's any() function to check this.
  • Process servers in the order they appear in the input list.
Practice Lesson 4

Frequently Asked Questions

In the Dockerfile from the containerization lesson, what does the CMD directive do?
Runs the uvicorn server when the container starts. CMD specifies the default command that runs when the container starts. In the lesson's Dockerfile, CMD ["uvicorn", "main:app", "--host", "0.0.0.0"] launches the ASGI server.
What does the validate_deploy_config function return when the config dictionary is missing the 'region' field?
{"valid": False, "errors": ["missing_region"]}. The function iterates over the required fields and appends 'missing_{field}' to the errors list when a field is absent. Since errors is non-empty, valid is set to False.
When merge_config receives defaults = {"db": {"host": "localhost", "port": 5432}} and overrides = {"db": {"host": "prod.example.com"}}, what is the value of result["db"]["port"]?
5432 — nested dicts are merged recursively so unoverridden keys are preserved. When both the default and override values for a key are dicts, merge_config recurses into them. Since 'port' exists only in defaults, it is preserved alongside the overridden 'host'.
In compute_migration, how is a field classified as 'changed' rather than 'added' or 'removed'?
It exists in both schemas but the type string is different. A 'changed' field is one present in both the old and new schemas whose type value differs. The function records it with field, old_type, and new_type.
Why does the configuration management lesson recommend using a secrets manager instead of plain environment variables for API keys?
Plain env vars on shared machines can be exposed in logs or environment listings. The lesson states that environment variables should never hold secrets in plain text on shared machines because they can be exposed. Secrets managers (AWS Secrets Manager, HashiCorp Vault, or encrypted env files) keep sensitive data secure.
In a structured log entry, what goes into the `context` dictionary?
All fields from the input that are not timestamp, level, or message. The context dictionary holds arbitrary metadata -- every key that is not timestamp, level, or message -- so that log aggregation tools can index and query custom fields like user_id or request_id.
Why are percentiles (like p95) preferred over the mean for monitoring response latency?
The mean hides outliers: a few very slow requests barely move the average. If 99 requests take 10ms and one takes 5 seconds, the mean is ~60ms, which looks fine, but p95 reveals the real user experience for the slowest requests.
In distributed tracing, what does the critical path represent?
The longest chain of dependent spans from root to leaf, representing minimum possible latency. The critical path is the longest sequence of dependent (sequential) operations from root to leaf. Parallel branches do not add to it -- only the slowest branch at each fork matters.
A health check finds that the database is healthy, the cache reports `degraded`, and the message queue is healthy. What overall status should the system report?
degraded, because at least one dependency is degraded and none are unhealthy. A single unhealthy dependency makes the whole system unhealthy. If none are unhealthy but at least one is degraded, the overall status is degraded. Only when every dependency is healthy does the system report healthy.
What is the difference between a liveness probe and a readiness probe?
A liveness probe answers whether the process is running; a readiness probe answers whether it is ready to handle requests. A liveness probe confirms the process is running. A readiness probe confirms it is ready to handle requests -- during startup an application might be alive but not ready because it is still loading configuration or warming caches.
How do I write a function called `format_log_entries` that takes a list of log dictionaries and returns a list of JSON strings, each containing `timestamp`, `level` (uppercased), `message`, and a `context` dict with all remaining fields?
Structured logging is the practice of emitting log entries as machine-readable records with consistent, queryable fields instead of free-form text strings. In production Pydantic applications, structured logs let you filter by user ID, trace ID, error type, or any custom field across millions of entries in seconds. Traditional `print()` debugging falls apart the moment your application runs on more than one server — structured logs are how you keep visibility.
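A minimal sketch of `format_log_entries` as described above, using only the standard library:

```python
import json

def format_log_entries(entries):
    """Convert raw log dicts into structured JSON strings."""
    results = []
    for entry in entries:
        record = {
            "timestamp": entry.get("timestamp"),
            "level": str(entry.get("level", "")).upper(),
            "message": entry.get("message"),
            # Everything else becomes queryable context metadata.
            "context": {k: v for k, v in entry.items()
                        if k not in ("timestamp", "level", "message")},
        }
        results.append(json.dumps(record))
    return results
```

Emitting one JSON object per entry is what lets an aggregator index fields like `user_id` without parsing free-form text.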
How do I write a function called `compute_latency_metrics` that takes a list of response times (floats) and returns a dictionary with `mean` (rounded to 2 decimals), `p50`, `p95`, `p99`, and `max`?
Performance metrics are numerical measurements collected over time that describe how your application behaves under real traffic. For Pydantic-powered APIs, the most critical metrics are response latency (how long requests take), validation failure rate (how often incoming data is rejected), and throughput (requests per second). Without metrics, you are flying blind — you will not know your API is slow until users start complaining.
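One possible sketch of `compute_latency_metrics`; the nearest-rank percentile method used here is an assumption, and real systems usually delegate this to a metrics library:

```python
def compute_latency_metrics(times):
    """Summarize response times with mean, tail percentiles, and max."""
    ordered = sorted(times)
    n = len(ordered)

    def percentile(p):
        # Nearest-rank style: index of the value at the p-th percentile position.
        idx = min(n - 1, int(p / 100 * n))
        return ordered[idx]

    return {
        "mean": round(sum(ordered) / n, 2),
        "p50": percentile(50),
        "p95": percentile(95),
        "p99": percentile(99),
        "max": ordered[-1],
    }
```

Running it on 99 requests at 10ms and one at 5 seconds shows the point from the question above: the mean looks fine while p99 exposes the slow outlier.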
How do I write a function called `find_critical_path` that takes a list of span dictionaries (each with `service`, `duration_ms`, and `parent_id`) and returns the critical path's total duration and depth?
Distributed tracing is a technique for tracking a single request as it flows through multiple services in a microservices architecture. Each service records a "span" — a named, timed segment of work — and spans are linked together by parent-child relationships to form a trace tree. When a user reports that checkout is slow, distributed tracing tells you whether the bottleneck is in the API gateway, the payment service, or the database.
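A sketch of `find_critical_path` under two assumptions: the root span has `parent_id` of `None`, and `parent_id` references the parent's `service` name (the exact return shape is also an assumption):

```python
def find_critical_path(spans):
    """Walk the trace tree and return the slowest root-to-leaf chain."""
    children = {}
    root = None
    for span in spans:
        if span["parent_id"] is None:
            root = span
        else:
            children.setdefault(span["parent_id"], []).append(span)

    def walk(span):
        # Returns (total duration, depth) of the critical path below `span`.
        kids = children.get(span["service"], [])
        if not kids:
            return span["duration_ms"], 1
        # Parallel branches: only the slowest child chain counts.
        best_dur, best_depth = max(walk(k) for k in kids)
        return span["duration_ms"] + best_dur, best_depth + 1

    duration, depth = walk(root)
    return {"duration_ms": duration, "depth": depth}
```

At each fork, `max` keeps only the slowest branch, which is exactly why parallel work does not lengthen the critical path.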
How do I write a function called `check_health` that takes a dictionary of dependency statuses and returns an overall health result with `status` (healthy, degraded, or unhealthy) and a `details` dict mapping each dependency name to its status string?
A health check is an endpoint that reports whether your application is able to serve traffic by verifying the status of every dependency it relies on. Load balancers, container orchestrators like Kubernetes, and deployment pipelines all poll health check endpoints to decide whether to route traffic to an instance, restart it, or hold a rollout. Without health checks, a service with a dead database connection will silently accept requests and fail every one of them.
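A minimal sketch of `check_health` implementing the roll-up rule from the question above:

```python
def check_health(dependencies):
    """Roll up dependency statuses into one overall health result."""
    statuses = dependencies.values()
    if any(s == "unhealthy" for s in statuses):
        overall = "unhealthy"   # one dead dependency fails the whole check
    elif any(s == "degraded" for s in statuses):
        overall = "degraded"
    else:
        overall = "healthy"
    return {"status": overall, "details": dict(dependencies)}
```

The order of the checks encodes the precedence: unhealthy beats degraded, and healthy only wins when every dependency reports healthy.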
In a round-robin load balancer with 3 servers, which expression determines the server index for request number `i`?
`i % len(servers)`. The modulo operator wraps the request index back to 0 once it reaches the number of servers, cycling through them in order.
In an LRU cache, what happens when a key that is already in the cache is accessed again?
The key is moved to the most-recent position (end of the list). On a cache hit, the LRU algorithm removes the key from its current position and re-appends it to the end, marking it as the most recently used item.
In the token bucket rate limiter, what happens when a burst of 3 requests arrives at the same timestamp and the bucket capacity is 2?
The first 2 are allowed and the third is denied. The bucket starts full with 2 tokens. Each allowed request consumes 1 token. With no elapsed time between requests at the same timestamp, no tokens refill, so the third request is denied.
In the failover decision system, what action is assigned to a healthy replica when the primary server is unhealthy?
promote. A healthy replica is promoted to take over the primary's role when any primary in the cluster is unhealthy. The code uses `any()` to check for unhealthy primaries before deciding to promote.
What is the key difference between rate limiting and throttling?
Rate limiting rejects excess requests outright, while throttling slows them down or queues them. Instead of a hard error, throttling gives clients a graceful signal (like a Retry-After header).
How do I write a function `round_robin_balance` that takes a list of server names and a number of requests, and returns a list showing which server handles each request using round-robin distribution with the modulo operator?
Load balancing is the technique of distributing incoming requests across multiple server instances so no single server becomes overwhelmed. In production Pydantic applications, a load balancer sits in front of your API servers and forwards each request to the next available instance. Without a balancer, a traffic spike hits one server and brings down the entire service.
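A minimal sketch of `round_robin_balance` using the modulo expression from the question above:

```python
def round_robin_balance(servers, num_requests):
    """Assign each request to a server in rotating order."""
    # i % len(servers) wraps back to 0 after the last server.
    return [servers[i % len(servers)] for i in range(num_requests)]
```

With three servers, request 3 wraps back to the first server, request 4 to the second, and so on.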
How do I write a function `lru_cache_sim` that simulates an LRU cache? It takes a capacity and a list of key accesses, tracks hits and misses, and returns the counts along with the final cache state as a dictionary.
Caching is the practice of storing computed results so they can be reused without repeating the original work. In Pydantic applications, validating the same data structure repeatedly wastes CPU — a cache stores the validated result and returns it instantly on the next identical request. A single cached lookup is orders of magnitude faster than re-running field validators, type coercion, and constraint checks.
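One way to sketch `lru_cache_sim`; representing the final cache state as a list (least to most recently used) inside the returned dictionary is an assumption:

```python
def lru_cache_sim(capacity, accesses):
    """Simulate an LRU cache, counting hits and misses."""
    cache = []          # oldest at the front, most recent at the end
    hits = misses = 0
    for key in accesses:
        if key in cache:
            hits += 1
            cache.remove(key)   # hit: move the key to the most-recent position
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.pop(0)    # full: evict the least recently used key
        cache.append(key)
    return {"hits": hits, "misses": misses, "cache": cache}
```

Re-appending on every access is what keeps the front of the list pointing at the eviction candidate.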
How do I write a function `token_bucket_limiter` that simulates a token bucket rate limiter? It takes a bucket capacity, a refill rate (tokens per second), and a list of request timestamps, and returns a list of "allowed" or "denied" strings.
Rate limiting is a technique that controls how many requests a client can make within a time window, protecting your API from abuse, accidental floods, and denial-of-service attacks. Without it, a single misbehaving client can saturate your Pydantic validation pipeline and starve legitimate users.
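A minimal sketch of `token_bucket_limiter`, assuming the bucket starts full and timestamps arrive in order:

```python
def token_bucket_limiter(capacity, refill_rate, timestamps):
    """Simulate a token bucket: refill on elapsed time, spend one token per request."""
    tokens = float(capacity)   # the bucket starts full
    last = timestamps[0] if timestamps else 0
    results = []
    for ts in timestamps:
        # Refill for the time elapsed since the previous request, capped at capacity.
        tokens = min(capacity, tokens + (ts - last) * refill_rate)
        last = ts
        if tokens >= 1:
            tokens -= 1
            results.append("allowed")
        else:
            results.append("denied")
    return results
```

This reproduces the burst scenario from the question above: with capacity 2 and three requests at the same timestamp, no time elapses, no tokens refill, and the third request is denied.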
How do I write a function `failover_decisions` that takes a list of server health snapshots (each with name, role, and healthy status) and returns a list of failover decision dictionaries using conditional logic and the `any()` built-in?
Disaster recovery is the set of policies and procedures that restore a system to operation after a failure — whether a crashed server, corrupted database, or entire data center outage. In production Pydantic applications, planning for failure is not optional; it is part of the architecture.
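A sketch of `failover_decisions`: the `promote` rule matches the answer given earlier (a healthy replica is promoted when any primary is unhealthy), while the `restart` and `monitor` labels for the other cases are assumptions:

```python
def failover_decisions(servers):
    """Decide what to do with each server based on cluster health."""
    # any() checks whether some primary in the cluster is down.
    primary_down = any(s["role"] == "primary" and not s["healthy"] for s in servers)
    decisions = []
    for s in servers:
        if not s["healthy"]:
            action = "restart"   # assumed label for a failed node
        elif s["role"] == "replica" and primary_down:
            action = "promote"   # healthy replica takes over the primary role
        else:
            action = "monitor"   # assumed label for a healthy node needing no action
        decisions.append({"name": s["name"], "action": action})
    return decisions
```

Computing `primary_down` once before the loop means every replica sees the same cluster-wide view when its decision is made.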
