One of the challenges of working with LLMs is getting them to respond with a consistent format, such as a given JSON schema. Anyone who has tried to solve this issue with prompt engineering knows how frustrating it can be. You add a ‘MUST’ here and an ‘always return JSON’ there, but still the output doesn’t reliably parse. Maybe you’re about to add a try-except block to handle parsing errors and start wondering if there’s a better way. There is: structured output techniques ensure a model always responds with a specific format.
This post will focus on using Pydantic AI, but other approaches are available. The advantage of Pydantic AI is that it’s model-agnostic and builds on the foundation of Pydantic. Maybe you’ve heard of Pydantic before and are wondering what a data validation library has to do with AI. To explain that we first need to talk about Python type hints and data validation.
Python type hints
Python is a dynamically typed language: you can assign a string to a variable and later assign an int to it without issue. While this is part of what we like about Python, there are cases where considering types is useful, particularly for writing software rather than scripts. PEP 484 introduced type hints to help here. Adapting an example from the Pydantic AI docs, let’s say we have a class to represent a city location.
class CityLocation:
    def __init__(self, city: str, country: str):
        self.city = city
        self.country = country
In the __init__ method the city and country arguments have type annotations to show they should be strings. By itself this is useful documentation for someone reading the code, but nothing stops you initialising the class with different types. There is no runtime type checking. CityLocation(1, 2) works just fine. However, you can use a static type checker like mypy to analyse your code for typing issues. Mypy would error and flag cases where the class was initialised with values that aren’t strings. Static type analysis is a powerful way to find errors and inconsistencies in your Python software before it’s run. Pydantic, on the other hand, provides a way to check for issues at runtime.
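To make the point concrete, here's the class again with a mistyped call that Python happily accepts at runtime (the mypy output shown in the comments is illustrative, not verbatim):

```python
class CityLocation:
    def __init__(self, city: str, country: str):
        self.city = city
        self.country = country

# No runtime type checking: this succeeds despite the annotations.
loc = CityLocation(1, 2)
print(loc.city)  # 1 (an int, not a str)

# A static checker would flag it instead, e.g.:
#   $ mypy example.py
#   error: Argument "city" to "CityLocation" has incompatible type "int"; expected "str"
```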
Pydantic
To better understand the syntax of Pydantic it’s worth looking at another way the above code could be written. Here the dataclass decorator is used to specify an equivalent class. That is, city and country are both instance attributes with type hints indicating the values should be strings. The dataclass decorator will generate an __init__ method like the one above to create the attributes. Crucially, the type annotations are required for this to work. Without the type annotations, city and country would be class attributes.
from dataclasses import dataclass

@dataclass
class CityLocation:
    city: str
    country: str
Creating the class using Pydantic looks much like the dataclass example. As above, city and country are instance attributes.
from pydantic import BaseModel
class CityLocation(BaseModel):
city: str
country: str
The big difference comes when creating an instance with incorrectly typed arguments. Whereas it worked fine before, now Pydantic will raise a validation error.
CityLocation(city=1, country=2)
ValidationError: 2 validation errors for CityLocation
city
Input should be a valid string [type=string_type, input_value=1, input_type=int]
For further information visit https://errors.pydantic.dev/2.12/v/string_type
country
Input should be a valid string [type=string_type, input_value=2, input_type=int]
For further information visit https://errors.pydantic.dev/2.12/v/string_type
Perhaps you’re beginning to see what all this has to do with AI and structured output. We started with the problem of getting consistent outputs from LLMs. Pydantic is a library for validating data that matches some definition.
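This is exactly the shape of the problem with LLM output: we receive text that should match a schema. As a sketch (assuming Pydantic v2), model_validate_json parses and validates a JSON string in one step, raising a ValidationError when it doesn't conform:

```python
from pydantic import BaseModel, ValidationError

class CityLocation(BaseModel):
    city: str
    country: str

# A well-formed response parses straight into the class.
loc = CityLocation.model_validate_json('{"city": "London", "country": "United Kingdom"}')
print(loc.city)  # London

# A malformed one raises a ValidationError we can catch.
try:
    CityLocation.model_validate_json('{"city": "London"}')
except ValidationError as e:
    print(f"{e.error_count()} validation error(s)")
```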
Pydantic AI
While there’s a lot more you can do with Pydantic, we’ve introduced enough to see why it’s relevant to structured outputs. Pydantic classes, with their in-built data validation, are a natural fit for validating the output from LLMs. Instead of asking more nicely in the prompt, the output is validated and a new request is made when validation fails.
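That validate-and-retry pattern might be sketched like this, where get_llm_response is a hypothetical stand-in for a model call (here faked so the example runs, failing once before returning valid JSON):

```python
from pydantic import BaseModel, ValidationError

class CityLocation(BaseModel):
    city: str
    country: str

# Hypothetical model call: returns free text first, then valid JSON.
_responses = iter(['London is the answer!', '{"city": "London", "country": "United Kingdom"}'])

def get_llm_response(prompt: str) -> str:
    return next(_responses)

def ask(prompt: str, retries: int = 3) -> CityLocation:
    for _ in range(retries):
        raw = get_llm_response(prompt)
        try:
            return CityLocation.model_validate_json(raw)
        except ValidationError:
            continue  # a real agent would also feed the error back to the model
    raise RuntimeError("No valid response after retries")

result = ask("Where were the olympics held in 2012?")
print(result)  # city='London' country='United Kingdom'
```

Frameworks like Pydantic AI implement a more sophisticated version of this loop for you, including sending the validation errors back to the model so it can correct itself.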
Continuing with the example above, consider this example from the Pydantic AI docs. We’ve seen the CityLocation before. The Agent object is used to define the model to use and the output class. When a question is asked via run_sync the output is automatically an instance of the CityLocation class. No system prompt is required here. Pydantic AI handles it all.
from pydantic import BaseModel
from pydantic_ai import Agent
class CityLocation(BaseModel):
city: str
country: str
agent = Agent('google-gla:gemini-3-flash-preview', output_type=CityLocation)
result = agent.run_sync('Where were the olympics held in 2012?')
print(result.output)
#> city='London' country='United Kingdom'
print(result.usage())
#> RunUsage(input_tokens=57, output_tokens=8, requests=1)
There may be cases where the model needs a bit more information about the attributes of the output. Here Field is useful for providing an additional description that is automatically passed to the model. Again, this is preferable to writing a separate prompt: everything is kept together in one place.
from pydantic import BaseModel, Field

class CityLocation(BaseModel):
    city: str = Field(description="The name of the city")
    country: str = Field(description="ISO 3166 country code")
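Those descriptions end up in the JSON schema derived from the model, which you can inspect directly (assuming Pydantic v2):

```python
from pydantic import BaseModel, Field

class CityLocation(BaseModel):
    city: str = Field(description="The name of the city")
    country: str = Field(description="ISO 3166 country code")

# The Field descriptions appear alongside the types in the schema.
schema = CityLocation.model_json_schema()
print(schema["properties"]["city"]["description"])     # The name of the city
print(schema["properties"]["country"]["description"])  # ISO 3166 country code
```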
So how does it work?
Exactly how structured output works will depend on the model being used. The advantage of Pydantic AI is that it abstracts away this detail. Users don’t need to know how different providers implement structured output. The same code will work whatever model you’re using. Notably, both OpenAI and Anthropic use Pydantic for structured output with their Python SDKs.
As I’m most familiar with accessing models via AWS Bedrock, I’ll show how it works there. Readers who aren’t interested in looking behind the curtain can skip to the conclusion.
Bedrock structured output
Adapting the example to use Bedrock is as simple as changing the model name for the Agent and running it with the right AWS permissions.
agent = Agent('bedrock:eu.amazon.nova-2-lite-v1:0', output_type=CityLocation)
result = agent.run_sync('Where were the olympics held in 2012?')
Looking at result.all_messages() shows three parts that make up the back-and-forth with Bedrock:
- A UserPromptPart containing the question ‘Where were the olympics held in 2012?’
- A ToolCallPart with args={'city': 'London', 'country': 'United Kingdom'}
- A ToolReturnPart
At this point, you may be a bit confused. What has tool calling got to do with anything? It comes down to one technique for getting structured outputs with Bedrock: tool use, as described in this blogpost. In brief, a dummy tool is created that the model will request to use, with input arguments that match the structured output schema. The tool input is then returned and used by Pydantic to create the output class. Running agent._get_toolset() shows more of the tool definition created for the structured output.
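You can see the raw material for that dummy tool by generating the model's JSON schema yourself. A Bedrock tool specification built from it would look roughly like this (the structure follows the Converse API's toolConfig format, but treat the name and description as a sketch rather than Pydantic AI's exact internals):

```python
from pydantic import BaseModel

class CityLocation(BaseModel):
    city: str
    country: str

# The dummy tool's input schema is just the Pydantic model's JSON schema.
tool_spec = {
    "toolSpec": {
        "name": "final_result",  # hypothetical tool name
        "description": "The final response to return to the user",
        "inputSchema": {"json": CityLocation.model_json_schema()},
    }
}
print(tool_spec["toolSpec"]["inputSchema"]["json"]["required"])  # ['city', 'country']
```

When the model "calls" this tool, its arguments are exactly the structured output we wanted, ready for Pydantic to validate.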
Interestingly, Bedrock has recently announced a new, more straightforward way to do structured output using the output config for a request. Here we see the advantage of using a framework like Pydantic AI rather than the AWS SDK directly: the Bedrock implementation of structured output can change without you needing to change your code. Instead of specifying an output schema in JSON, you can use Python classes directly, which makes testing, linting and type checking all easier and nicer to do.
Conclusion
Getting consistent output is essential to building robust LLM-powered software. Prompt engineering approaches are brittle and error prone, requiring frustrating trial-and-error. Structured output provides a better approach, where outputs are guaranteed to conform to a specified format. Pydantic AI provides a model-agnostic approach that leverages Python type hints and Pydantic validation, allowing you to stay where you want to be: writing Python code!