Batch Processing
Validating lists of records, handling partial failures, and collecting errors without stopping.
In real systems, I'm usually validating hundreds or thousands of records at once. If one fails, I don't want to stop — I want to skip it, log the error, and keep going. What's the best pattern for that?
The batch validation pattern: loop through all records, try-except each one, and collect results into separate buckets:
from pydantic import BaseModel, Field, ValidationError

class Product(BaseModel):
    name: str = Field(min_length=1)
    price: float = Field(gt=0)
    quantity: int = Field(ge=0)

def validate_batch(records: list[dict]) -> dict:
    valid = []
    errors = []
    for i, record in enumerate(records):
        try:
            item = Product(**record)
            valid.append(item.model_dump())
        except ValidationError as e:
            errors.append({
                "index": i,
                "input": record,
                "error_count": e.error_count(),
                "messages": [err["msg"] for err in e.errors()],
            })
    return {"valid": valid, "errors": errors}
One bad record doesn't stop the whole batch. You process everything and report what succeeded and what failed.
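Here is the pattern end to end on a small hypothetical batch (the sample records are invented for illustration; the model and function definitions are repeated so the snippet runs standalone):

```python
from pydantic import BaseModel, Field, ValidationError

class Product(BaseModel):
    name: str = Field(min_length=1)
    price: float = Field(gt=0)
    quantity: int = Field(ge=0)

def validate_batch(records: list[dict]) -> dict:
    valid = []
    errors = []
    for i, record in enumerate(records):
        try:
            item = Product(**record)
            valid.append(item.model_dump())
        except ValidationError as e:
            errors.append({
                "index": i,
                "input": record,
                "error_count": e.error_count(),
                "messages": [err["msg"] for err in e.errors()],
            })
    return {"valid": valid, "errors": errors}

records = [
    {"name": "Widget", "price": 9.99, "quantity": 5},  # valid
    {"name": "", "price": -1, "quantity": 2},          # fails name AND price
    {"name": "Gadget", "price": 4.50, "quantity": 0},  # valid
]
result = validate_batch(records)
print(len(result["valid"]))           # 2
print(result["errors"][0]["index"])   # 1
```

The bad record at index 1 produces two error messages but never interrupts the loop; the two good records still come through.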
That makes sense. But what if I want to know the success rate? Like "85% of records passed validation"?
Compute it from the counts:
result = validate_batch(records)
total = len(records)
valid_count = len(result["valid"])
error_count = len(result["errors"])
success_rate = round(valid_count / total * 100, 1) if total > 0 else 0.0
print(f"{success_rate}% success ({valid_count}/{total}, {error_count} failed)")
What about TypeAdapter? Can I validate a whole list at once instead of one by one?
Yes. TypeAdapter can validate a list[Product] in one call, but it's all-or-nothing: if any item is invalid, the whole call raises a single ValidationError, and you never get the valid items back. For batch processing where you want partial results, the loop pattern is better:

from pydantic import TypeAdapter

adapter = TypeAdapter(list[Product])

# All-or-nothing: any invalid item makes the whole call raise
try:
    products = adapter.validate_python(records)
except ValidationError as e:
    print(e)  # one exception covering every invalid item; no partial results
For partial-failure tolerance, stick with the explicit loop. For all-or-nothing validation (like a database transaction), TypeAdapter works well.
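One detail worth knowing if you do go the TypeAdapter route: the ValidationError it raises tags each error with the offending item's index as the first element of loc, so you can at least report which items failed, even though you can't recover the valid ones. A minimal sketch (sample records invented for illustration):

```python
from pydantic import BaseModel, Field, TypeAdapter, ValidationError

class Product(BaseModel):
    name: str = Field(min_length=1)
    price: float = Field(gt=0)
    quantity: int = Field(ge=0)

adapter = TypeAdapter(list[Product])

records = [
    {"name": "Widget", "price": 9.99, "quantity": 5},   # valid
    {"name": "", "price": 1.0, "quantity": 1},          # invalid name
    {"name": "Gadget", "price": -2.0, "quantity": 0},   # invalid price
]

try:
    adapter.validate_python(records)
except ValidationError as e:
    # loc looks like (index, field_name) for list-item errors
    bad_indices = sorted({err["loc"][0] for err in e.errors()})
    print(bad_indices)  # [1, 2]
```

This is useful for logging which rows to re-queue, but if you need the valid items themselves, the explicit loop is still the right tool.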
How do I handle a really common pattern — collecting unique error types across the whole batch?
Aggregate errors by type or field to see patterns:
from collections import Counter

def batch_error_summary(records, model_class):
    error_types = Counter()
    error_fields = Counter()
    for record in records:
        try:
            model_class(**record)
        except ValidationError as e:
            for err in e.errors():
                error_types[err["type"]] += 1
                # Model-level errors have an empty loc tuple, so guard the lookup
                field = err["loc"][0] if err["loc"] else "__root__"
                error_fields[field] += 1
    return {
        "by_type": dict(error_types.most_common()),
        "by_field": dict(error_fields.most_common()),
    }
This tells you things like "42 records failed because of string_too_short on the name field" — invaluable for understanding data quality issues across a whole dataset.
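A quick run on a hypothetical batch (sample records invented for illustration; definitions repeated so the snippet runs standalone) shows the shape of the summary:

```python
from collections import Counter
from pydantic import BaseModel, Field, ValidationError

class Product(BaseModel):
    name: str = Field(min_length=1)
    price: float = Field(gt=0)
    quantity: int = Field(ge=0)

def batch_error_summary(records, model_class):
    error_types = Counter()
    error_fields = Counter()
    for record in records:
        try:
            model_class(**record)
        except ValidationError as e:
            for err in e.errors():
                error_types[err["type"]] += 1
                # Guard against model-level errors, which have an empty loc
                field = err["loc"][0] if err["loc"] else "__root__"
                error_fields[field] += 1
    return {
        "by_type": dict(error_types.most_common()),
        "by_field": dict(error_fields.most_common()),
    }

records = [
    {"name": "", "price": 1.0, "quantity": 1},   # name too short
    {"name": "", "price": 2.0, "quantity": 1},   # name too short
    {"name": "Ok", "price": -5, "quantity": 1},  # price not > 0
]
summary = batch_error_summary(records, Product)
print(summary["by_type"])   # {'string_too_short': 2, 'greater_than': 1}
print(summary["by_field"])  # {'name': 2, 'price': 1}
```

The Counter.most_common() ordering puts the most frequent problems first, so the worst data-quality offenders surface at the top of each dict.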
This is exactly what ETL pipelines need.
Exactly. Extract, Transform, Load — and Validate in the middle. The batch pattern turns Pydantic from a single-record validator into a data quality tool for entire datasets.