Taming the AI Wild West: How to Get Predictable Data from OpenAI

By Sylvester Das
•October 8, 2025
•6 min read
Artificial intelligence is transforming how we interact with technology. Large Language Models (LLMs) like those from OpenAI offer incredible capabilities, from generating creative text to answering complex questions. However, getting these models to consistently return data in a usable format can feel like herding cats. You ask for a list, and you get a paragraph. You expect JSON, and you receive free-form text. This inconsistency makes it difficult to integrate AI into automated workflows.
Fortunately, OpenAI offers a powerful solution: Structured Output. This feature allows you to define the exact format you want the AI to use when responding, making it much easier to process and utilize the generated data. This article will guide you through the concept of structured output, demonstrate its practical applications, and show you how to implement it in your projects.
Why Structured Output Matters
Imagine you're building an application that recommends recipes based on ingredients provided by the user. You want the AI to analyze the ingredients and return a list of suitable recipes, including their names, cooking times, and a brief description. Without structured output, you might receive responses like:
"Okay, based on those ingredients, you could make a delicious pasta dish! It would take about 30 minutes and is super easy. Or maybe a stir-fry? That's even faster!"
While helpful, this response is difficult to parse programmatically. You'd need to write complex code to extract the recipe name, cooking time, and description. This is where structured output shines. By defining a specific format (e.g., a JSON object), you can ensure the AI returns data that's easily processed and integrated into your application.
Defining Your Desired Structure
The key to using structured output is clearly defining the format you expect. This is typically done using a schema, which specifies the data types and structure of the response. Here's an example using JSON Schema:
{
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "The name of the recipe"
},
"cooking_time": {
"type": "integer",
"description": "The cooking time in minutes"
},
"description": {
"type": "string",
"description": "A brief description of the recipe"
}
},
"required": [
"name",
"cooking_time",
"description"
]
}
}
This schema tells the AI to return an array of objects, where each object represents a recipe. Each recipe object must have a name
(string), a cooking_time
(integer), and a description
(string). The required
field ensures that all three properties are present in each recipe object.
Implementing Structured Output with OpenAI
While the exact implementation might vary depending on the specific OpenAI API client you're using, the general process involves:
Defining the schema: As shown above, create a JSON Schema that describes the desired output format.
Providing the schema to the API: Pass the schema to the OpenAI API when making your request. This tells the model how to structure its response.
Processing the response: The API will return a response that conforms to your schema. You can then easily parse the response and use the data in your application.
Here's a Python example using the OpenAI library:
import openai
import json
# Your OpenAI API key
openai.api_key = "YOUR_OPENAI_API_KEY"
# Define the schema
recipe_schema = {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "The name of the recipe"
},
"cooking_time": {
"type": "integer",
"description": "The cooking time in minutes"
},
"description": {
"type": "string",
"description": "A brief description of the recipe"
}
},
"required": [
"name",
"cooking_time",
"description"
]
}
}
# The prompt for the AI
prompt = "Suggest some recipes based on these ingredients: chicken, rice, vegetables."
# Make the API call with structured output requested
response = openai.ChatCompletion.create(
model="gpt-3.5-turbo", # Or any suitable model
messages=[{"role": "user", "content": prompt}],
functions=[{
"name": "get_recipes",
"description": "Returns a list of recipes based on the given ingredients.",
"parameters": recipe_schema,
}],
function_call={"name": "get_recipes"}
)
# Extract the structured data
message = response["choices"][0]["message"]
if message.get("function_call"):
function_name = message["function_call"]["name"]
function_args = json.loads(message["function_call"]["arguments"])
recipes = function_args # The recipes are now in a usable Python object (list of dictionaries)
print(recipes)
else:
print("No recipes found.")
# Example of accessing the data
if recipes:
for recipe in recipes:
print(f"Recipe Name: {recipe['name']}")
print(f"Cooking Time: {recipe['cooking_time']} minutes")
print(f"Description: {recipe['description']}")
print("-" * 20)
Technical Deep Dive:
openai.ChatCompletion.create()
: This is the core function for interacting with OpenAI's chat models.functions
: This parameter tells the model about the function it can call and the expected schema of the arguments.function_call
: This parameter forces the model to call theget_recipes
function. If you omit this, the model might choose to respond in free-form text instead of using the function call.json.loads()
: This function parses the JSON string returned by the API into a Python dictionary or list.gpt-3.5-turbo
: This is a powerful and relatively inexpensive model. You can experiment with other models likegpt-4
for potentially better results.
Step-by-Step Breakdown:
Import Libraries: Import the
openai
andjson
libraries.Set API Key: Replace
"YOUR_OPENAI_API_KEY"
with your actual OpenAI API key. You can obtain an API key from the OpenAI website.Define the Schema: The
recipe_schema
variable holds the JSON Schema that defines the structure of the recipe data.Create the Prompt: The
prompt
variable contains the text you want to send to the AI.Make the API Call: The
openai.ChatCompletion.create()
function sends the prompt and schema to the OpenAI API. Thefunctions
argument specifies the function the model can call, and thefunction_call
argument forces the model to call it.Extract the Data: The code extracts the structured data from the API response. It parses the JSON string in the
arguments
field into a Python object.Process the Data: The code iterates through the list of recipes and prints the name, cooking time, and description of each recipe.
Practical Implications and Benefits
Structured output offers numerous benefits for developers:
Simplified Data Processing: No more complex regular expressions or custom parsing logic! The AI returns data in a predictable format, making it easy to integrate into your applications.
Increased Reliability: By enforcing a specific schema, you reduce the risk of errors caused by inconsistent or unexpected AI responses.
Improved Automation: Structured output enables you to automate tasks that previously required manual intervention, such as data entry, report generation, and content creation.
Data Validation: The schema can include data type validation and restrictions, ensuring the AI returns valid and consistent data.
Conclusion
Structured output is a game-changer for working with OpenAI models. It allows you to tame the AI wild west and get predictable, usable data that seamlessly integrates into your applications. By defining a clear schema and using the OpenAI API, you can unlock the full potential of AI and automate tasks that were previously impossible. This leads to more efficient workflows, reduced development time, and increased data reliability. Embrace structured output and transform your AI projects from chaotic experiments into robust and reliable solutions.
Inspired by an article from https://hackernoon.com/structured-output-from-openai-how-to-turn-ai-responses-into-usable-data?source=rss
Advertisement
Shorten Your Links, Amplify Your Reach
Tired of long, clunky URLs? Create short, powerful, and trackable links with MiniFyn. It's fast, free, and easy to use.