Day 25 · ~13m

Batch Processing

Validating lists of records, handling partial failures, and collecting errors without stopping.

🧑‍💻

In real systems, I'm usually validating hundreds or thousands of records at once. If one fails, I don't want to stop — I want to skip it, log the error, and keep going. What's the best pattern for that?

👩‍🏫

The batch validation pattern: loop through all records, try-except each one, and collect results into separate buckets:

from pydantic import BaseModel, Field, ValidationError

class Product(BaseModel):
    name: str = Field(min_length=1)
    price: float = Field(gt=0)
    quantity: int = Field(ge=0)

def validate_batch(records: list[dict]) -> dict:
    valid = []
    errors = []

    for i, record in enumerate(records):
        try:
            item = Product(**record)
            valid.append(item.model_dump())
        except ValidationError as e:
            errors.append({
                "index": i,
                "input": record,
                "error_count": e.error_count(),
                "messages": [err["msg"] for err in e.errors()]
            })

    return {"valid": valid, "errors": errors}

One bad record doesn't stop the whole batch. You process everything and report what succeeded and what failed.
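For instance, here's a quick run against a small made-up batch (the sample records are illustrative), with the model and `validate_batch` as defined above:

```python
from pydantic import BaseModel, Field, ValidationError

# Product and validate_batch, as defined above
class Product(BaseModel):
    name: str = Field(min_length=1)
    price: float = Field(gt=0)
    quantity: int = Field(ge=0)

def validate_batch(records: list[dict]) -> dict:
    valid, errors = [], []
    for i, record in enumerate(records):
        try:
            valid.append(Product(**record).model_dump())
        except ValidationError as e:
            errors.append({"index": i, "messages": [err["msg"] for err in e.errors()]})
    return {"valid": valid, "errors": errors}

records = [
    {"name": "Widget", "price": 9.99, "quantity": 5},  # valid
    {"name": "", "price": -1, "quantity": 2},          # fails name AND price
    {"name": "Gadget", "price": 3.50, "quantity": 0},  # valid
]
result = validate_batch(records)
# result["valid"] holds 2 products; result["errors"] reports index 1 with 2 messages
```

The bad record at index 1 is skipped and reported, while the records on either side of it still come through.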

🧑‍💻

That makes sense. But what if I want to know the success rate? Like "85% of records passed validation"?

👩‍🏫

Compute it from the counts:

result = validate_batch(records)
total = len(records)
valid_count = len(result["valid"])
error_count = len(result["errors"])
success_rate = round(valid_count / total * 100, 1) if total > 0 else 0

print(f"{success_rate}% success ({valid_count}/{total}, {error_count} failed)")

🧑‍💻

What about TypeAdapter? Can I validate a whole list at once instead of one by one?

👩‍🏫

Yes. TypeAdapter can validate a list[Product] in one call. It checks every item and collects all the errors, but if any item is invalid it raises a single ValidationError and you get no results back at all — it's all-or-nothing. For batch processing where you want partial results, the loop pattern is better:

from pydantic import TypeAdapter

adapter = TypeAdapter(list[Product])

# Validates every item; raises a single ValidationError if any item fails
try:
    products = adapter.validate_python(records)
except ValidationError as e:
    print(e)  # reports all invalid items at once, but no valid results survive

For partial-failure tolerance, stick with the explicit loop. For all-or-nothing validation (like a database transaction), TypeAdapter works well.
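To see the all-or-nothing behavior concretely, here's a sketch with two bad items in a made-up batch. Each entry in `e.errors()` has a `loc` tuple that starts with the list index, so you can still tell which items failed:

```python
from pydantic import BaseModel, Field, TypeAdapter, ValidationError

class Product(BaseModel):
    name: str = Field(min_length=1)
    price: float = Field(gt=0)
    quantity: int = Field(ge=0)

adapter = TypeAdapter(list[Product])

records = [
    {"name": "Widget", "price": 9.99, "quantity": 5},  # valid
    {"name": "", "price": 1.0, "quantity": 1},         # invalid name
    {"name": "Gadget", "price": -2, "quantity": 0},    # invalid price
]

try:
    adapter.validate_python(records)
except ValidationError as e:
    # One exception covers both bad items; loc = (list index, field name)
    locs = [err["loc"] for err in e.errors()]
    # locs == [(1, "name"), (2, "price")]
```

Even though item 0 was perfectly valid, nothing is returned — which is exactly what you want for transaction-style validation, and exactly what you don't want for tolerant batch processing.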

🧑‍💻

How do I handle a really common pattern — collecting unique error types across the whole batch?

👩‍🏫

Aggregate errors by type or field to see patterns:

from collections import Counter
from pydantic import ValidationError

def batch_error_summary(records, model_class):
    error_types = Counter()
    error_fields = Counter()

    for record in records:
        try:
            model_class(**record)
        except ValidationError as e:
            for err in e.errors():
                error_types[err["type"]] += 1
                error_fields[err["loc"][0]] += 1

    return {
        "by_type": dict(error_types.most_common()),
        "by_field": dict(error_fields.most_common())
    }

This tells you things like "42 records failed because of string_too_short on the name field" — invaluable for understanding data quality issues across a whole dataset.
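For example, running `batch_error_summary` as defined above against a small made-up batch (two blank names, one negative price):

```python
from collections import Counter
from pydantic import BaseModel, Field, ValidationError

class Product(BaseModel):
    name: str = Field(min_length=1)
    price: float = Field(gt=0)
    quantity: int = Field(ge=0)

# batch_error_summary, as defined above
def batch_error_summary(records, model_class):
    error_types = Counter()
    error_fields = Counter()
    for record in records:
        try:
            model_class(**record)
        except ValidationError as e:
            for err in e.errors():
                error_types[err["type"]] += 1
                error_fields[err["loc"][0]] += 1
    return {
        "by_type": dict(error_types.most_common()),
        "by_field": dict(error_fields.most_common()),
    }

records = [
    {"name": "", "price": 1.0, "quantity": 1},
    {"name": "", "price": 2.0, "quantity": 1},
    {"name": "ok", "price": -5, "quantity": 1},
]
summary = batch_error_summary(records, Product)
# summary["by_type"] == {"string_too_short": 2, "greater_than": 1}
# summary["by_field"] == {"name": 2, "price": 1}
```

The counters immediately surface that `name` is the problem field, concentrated in the `string_too_short` error type.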

🧑‍💻

This is exactly what ETL pipelines need.

👩‍🏫

Exactly. Extract, Transform, Load — and Validate in the middle. The batch pattern turns Pydantic from a single-record validator into a data quality tool for entire datasets.
