Introduction
Large Language Models (LLMs) excel at generating text but often struggle to produce structured output. By combining Pydantic's type validation with prompt engineering, we can enforce and validate the structure of LLM output.
All code examples in this blog post are written in Python. The LLM used is OpenAI’s gpt-3.5-turbo.
Query the LLM
To query the LLM, we use the following function:
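import openai
# This uses the pre-1.0 interface of the openai package (openai<1.0);
# openai>=1.0 removed openai.ChatCompletion in favor of a client object.
def query(prompt: str) -> str:
    """Query the LLM with the given prompt."""
    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        # A temperature of 0.0 makes replies as repeatable as possible
        temperature=0.0,
    )
    return completion.choices[0].message.content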
We then call the function with a simple question:
response = query("What is the largest planet in our solar system?")
print(response)
The largest planet in our solar system is Jupiter.
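To try the parsing and validation steps in this post without an API key, one option is to swap query() for a stub with the same signature. The sketch below is hypothetical (fake_query and CANNED_REPLY are illustrative names, not part of any library): it ignores the prompt and returns the JSON-shaped reply shown later in this post.

```python
# Hypothetical offline stand-in for query(): it ignores the prompt and returns
# a canned JSON reply, so parsing and validation can run without network access.
CANNED_REPLY = """{
  "thought": "This is a factual question that can be answered with scientific knowledge.",
  "answer": "The largest planet in our solar system is Jupiter."
}"""


def fake_query(prompt: str) -> str:
    """Return a fixed reply, mimicking query() without calling an LLM."""
    return CANNED_REPLY


print(fake_query("What is the largest planet in our solar system?"))
```

Because the stub always returns the same text, it is only useful for exercising the code paths, not for judging the model's behavior.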
Enforcing JSON output with a prompt
In our prompt, we can ask the LLM to respond in a certain format:
prompt = """
I will ask you questions and you will respond. Your response should be in the following format:
```json
{
"thought": "How you think about the question",
"answer": "The answer to the question"
}
```
"""
Then, we query the model:
question = "What is the largest planet in our solar system?"
response = query(prompt + question)
print(response)
{
"thought": "This is a factual question that can be answered with scientific knowledge.",
"answer": "The largest planet in our solar system is Jupiter."
}
This is great, because we can easily parse the structured output:
import json
parsed_response = json.loads(response)
print(parsed_response["answer"])
The largest planet in our solar system is Jupiter.
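Because the prompt shows the expected format inside a ```json fence, the model may echo that fence around its answer, and json.loads would then fail. A minimal sketch of a defensive parser follows; extract_json is a hypothetical helper, and the regular expression assumes at most one fenced block in the reply.

```python
import json
import re
from typing import Any

# Captures the body of a ```json ... ``` (or bare ```) fenced block.
FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL | re.IGNORECASE)


def extract_json(response: str) -> Any:
    """Parse JSON from an LLM reply, stripping a Markdown code fence if present."""
    match = FENCE_RE.search(response)
    payload = match.group(1) if match else response
    return json.loads(payload)


fenced = '```json\n{"thought": "Simple fact.", "answer": "Jupiter"}\n```'
print(extract_json(fenced)["answer"])  # Jupiter
```

Unfenced replies fall through to a plain json.loads call, so the helper can replace it directly in the code above.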
Validating the output
Parsing with json.loads only checks that the response is valid JSON. To also check that the expected fields are present and have the right types, we describe the response with a Pydantic model and validate against it:
from pydantic import BaseModel
class ThoughtAnswerResponse(BaseModel):
    thought: str
    answer: str
raw_response = query(prompt + question)
# Note: with pydantic<2.0, use parse_raw instead of model_validate_json
validated_response = ThoughtAnswerResponse.model_validate_json(raw_response)
print(validated_response)
thought='This is a factual question that can be answered with scientific knowledge.' answer='The largest planet in our solar system is Jupiter.'
print(type(validated_response))
<class '__main__.ThoughtAnswerResponse'>
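The difference between json.loads and Pydantic shows up when the reply is valid JSON but has the wrong shape. The sketch below, assuming Pydantic v2, feeds both a reply that omits the thought field; validation_errors is a hypothetical helper that summarizes what Pydantic reports.

```python
import json

from pydantic import BaseModel, ValidationError


class ThoughtAnswerResponse(BaseModel):
    thought: str
    answer: str


def validation_errors(reply: str) -> list:
    """Return 'field: error type' strings, or an empty list if the reply is valid."""
    try:
        ThoughtAnswerResponse.model_validate_json(reply)
    except ValidationError as e:
        return [f"{'.'.join(map(str, err['loc']))}: {err['type']}" for err in e.errors()]
    return []


# Valid JSON, but the required "thought" field is missing.
incomplete_reply = '{"answer": "Jupiter"}'
print(json.loads(incomplete_reply))  # {'answer': 'Jupiter'}
print(validation_errors(incomplete_reply))  # ['thought: missing']
```

json.loads accepts the reply without complaint, while Pydantic names the missing field.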
Using the Pydantic model in the prompt
At this moment, we describe our response format in two places:
- a JSON description in our prompt
- a corresponding Pydantic model
When we want to update the response format, we need to change both the prompt and the Pydantic model. This can cause inconsistencies.
We can solve this by generating a JSON schema from the Pydantic model and adding the schema to the prompt. This keeps the prompt and the Pydantic model consistent.
response_schema_dict = ThoughtAnswerResponse.model_json_schema()
response_schema_json = json.dumps(response_schema_dict, indent=2)
prompt = f"""
I will ask you questions, and you will respond.
Your response should be in the following format:
```json
{response_schema_json}
```
"""
The prompt will now look like this:
I will ask you questions, and you will respond. Your response should be in the following format:
```json
{
"properties": {
"thought": { "title": "Thought", "type": "string" },
"answer": { "title": "Answer", "type": "string" }
},
"required": ["thought", "answer"],
"title": "ThoughtAnswerResponse",
"type": "object"
}
```
A response may then look like this:
{
"thought": "The largest planet in our solar system is Jupiter.",
"answer": "Jupiter"
}
Now, whenever you change the Pydantic model, the prompt is updated with the matching schema. The schema is more verbose than the hand-written example, but it lets us state our requirements more precisely, such as which fields are required and what type each field must have.
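One thing the generated schema drops is the per-field hints from the hand-written prompt ("How you think about the question", and so on). A sketch, assuming Pydantic v2: passing description= to Field copies each hint into the schema, so it reaches the prompt together with the types.

```python
import json

from pydantic import BaseModel, Field


class ThoughtAnswerResponse(BaseModel):
    # Field(description=...) is exported into the JSON schema, carrying the
    # hints from the original hand-written prompt into the generated one.
    thought: str = Field(description="How you think about the question")
    answer: str = Field(description="The answer to the question")


schema = ThoughtAnswerResponse.model_json_schema()
print(json.dumps(schema["properties"], indent=2))
```

Since neither field has a default, both remain in the schema's required list.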
Error handling
Even with the schema in the prompt, the LLM may still produce output that does not match our model. In that case, Pydantic raises a ValidationError, which we can catch:
from pydantic import ValidationError
try:
    validated_response = ThoughtAnswerResponse.model_validate_json(raw_response)
except ValidationError as e:
    print("Unable to validate LLM response.")
    # Add your own error handling here
    raise
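One common form of "your own error handling" is to retry a few times before giving up. Below is a minimal sketch; query_validated is a hypothetical helper, and the LLM call is passed in as query_fn so that a stub can stand in for the real query() function.

```python
from typing import Callable

from pydantic import BaseModel, ValidationError


class ThoughtAnswerResponse(BaseModel):
    thought: str
    answer: str


def query_validated(
    query_fn: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> ThoughtAnswerResponse:
    """Query the LLM until a reply validates; re-raise the last error otherwise."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        raw_response = query_fn(prompt)
        try:
            return ThoughtAnswerResponse.model_validate_json(raw_response)
        except ValidationError as e:
            print(f"Attempt {attempt}: unable to validate LLM response.")
            last_error = e
    raise last_error  # every attempt failed; surface the last validation error


# Usage with a hypothetical stub that fails once, then returns valid JSON.
replies = iter(["not JSON at all", '{"thought": "A fact.", "answer": "Jupiter"}'])
result = query_validated(lambda prompt: next(replies), "What is the largest planet?")
print(result.answer)  # Jupiter
```

Because query() uses temperature=0.0, resending an identical prompt may well return the same invalid reply, so a retry is more useful when the follow-up prompt also tells the model what went wrong.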
Enforce specific values using a Literal
Sometimes you want a field to accept only specific values. Suppose we add a “difficulty” field to our response object, which the LLM should use to rate how difficult the question is. In a regular prompt, we would do the following:
prompt = """Your response should be in the following format:
```json
{
"thought": "How you think about the question",
"answer": "The answer to the question",
"difficulty": "How difficult the question was. One of easy, medium or hard"
}
```
"""
Of course, the model could still use other values, and catching them would require custom validation code.
With Pydantic, this is much easier. We create a new type called Difficulty using a Literal, which restricts a value to a fixed set of options. We then use Difficulty as the type hint for the difficulty field in our Pydantic model:
from typing import Literal
from pydantic import BaseModel
# We create a new type
Difficulty = Literal["easy", "medium", "hard"]
class ThoughtAnswerResponse(BaseModel):
    thought: str
    answer: str
    difficulty: Difficulty
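The Literal also pays off on the prompt side. Assuming Pydantic v2, the allowed values are exported as a JSON Schema enum, so a prompt built from model_json_schema() lists them for the LLM without any extra text. A quick check:

```python
from typing import Literal

from pydantic import BaseModel

Difficulty = Literal["easy", "medium", "hard"]


class ThoughtAnswerResponse(BaseModel):
    thought: str
    answer: str
    difficulty: Difficulty


# The Literal's values appear as an "enum" in the generated schema.
difficulty_schema = ThoughtAnswerResponse.model_json_schema()["properties"]["difficulty"]
print(difficulty_schema["enum"])  # ['easy', 'medium', 'hard']
```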
The LLM may still respond with a value we do not allow:
{
"thought": "The largest planet in our solar system is Jupiter.",
"answer": "Jupiter",
"difficulty": "Unknown"
}
When we parse this result, Pydantic validates the value of the difficulty field. Unknown does not match any of the values specified in the Literal type we defined, so we get the following error:
validated_response = ThoughtAnswerResponse.model_validate_json(response)
ValidationError: 1 validation error for ThoughtAnswerResponse
difficulty
Input should be 'easy', 'medium' or 'hard' [type=literal_error, input_value='Unknown', input_type=str]
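The structured error can also be turned back into instructions for the model. The sketch below assumes Pydantic v2's ValidationError.errors(); correction_prompt is a hypothetical helper, and its wording is only one possible way to phrase a follow-up prompt.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError

Difficulty = Literal["easy", "medium", "hard"]


class ThoughtAnswerResponse(BaseModel):
    thought: str
    answer: str
    difficulty: Difficulty


def correction_prompt(reply: str, error: ValidationError) -> str:
    """Build a follow-up prompt that lists each field Pydantic rejected."""
    problems = "\n".join(
        f"- {'.'.join(map(str, err['loc']))}: {err['msg']}" for err in error.errors()
    )
    return (
        "Your previous response was invalid:\n"
        f"{reply}\n"
        "Problems:\n"
        f"{problems}\n"
        "Respond again with corrected JSON only."
    )


bad_reply = (
    '{"thought": "The largest planet in our solar system is Jupiter.", '
    '"answer": "Jupiter", "difficulty": "Unknown"}'
)
try:
    ThoughtAnswerResponse.model_validate_json(bad_reply)
except ValidationError as e:
    print(e.errors()[0]["type"])  # literal_error
    print(correction_prompt(bad_reply, e))
```

Sending this text back through query() gives the model the exact field and allowed values it got wrong.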
Conclusion
By combining Pydantic with prompt engineering, you can enforce and validate the output of LLMs. This gives you greater control over the LLM output and allows you to build more robust AI systems.