How To Build Your First Production Ready Agent With OpenAI’s Agents SDK And Responses API (2026 Guide)
In early 2026, OpenAI’s Responses API and Agents SDK are the main path for building agents that actually take action: they search the web, look through your files, call APIs, and even operate a computer through a virtual browser.
If you are still using the old Assistants API or a pile of ad hoc prompt chains, this guide walks you step by step from “toy prototype” to a production ready agent built on the same primitives OpenAI uses internally.
Why Agents SDK + Responses Is The Default Stack In 2026
In March 2025, OpenAI launched the Responses API as a unified, stateful interface that combines chat style completion, function calling, and built in tools like web search, file search, and computer use. Since then, hundreds of thousands of developers have used it at scale for agentic workloads.
At the same time, OpenAI introduced the Agents SDK as a lightweight framework for building multi agent workflows on top of Responses and other model providers. The SDK focuses on a few primitives:
- Agents: LLMs configured with instructions and tools
- Tools: functions or hosted capabilities the agent can call
- Handoffs: structured delegation between agents
- Guardrails: validation hooks for inputs and outputs
- Tracing: automatic traces you can inspect in the OpenAI dashboard
The Python package openai-agents on PyPI and its TypeScript sibling are provider agnostic and already support OpenAI Responses, Chat Completions, and more than one hundred other LLMs via LiteLLM.
OpenAI has also started the clock on deprecating the legacy Assistants API, with a planned sunset in August 2026 now that Responses has reached full feature parity.
If you are starting something new today, you should build it with Responses and the Agents SDK.
The Mental Model: What You Are Actually Building
OpenAI’s own agent building track describes agent systems as four composable parts:
- Models: reasoning engines that follow instructions
- Tools: actions the model can take
- State and memory: what the agent remembers across steps
- Orchestration: how you coordinate multiple agents and tools
The Agents SDK and Responses API give you batteries included support for exactly those four pieces.
In this tutorial, you will build a single agent that:
- Receives a task description from a user
- Calls a custom Python tool to fetch mock data from a ticket system
- Uses that data plus its own reasoning to decide what to do
- Returns a structured JSON result that your backend can act on safely
Then you will see how to wrap it in guardrails and tracing so you can monitor it in production.
Stack Overview: Responses API, Agents SDK, Built In Tools
Responses API in one paragraph
The Responses endpoint (/v1/responses) is OpenAI’s most advanced interface for model responses. It:
- Supports text and image inputs
- Can produce free form text or structured JSON outputs
- Stores conversation state so you can chain calls
- Lets you define tools the model can call, including built in tools for web search, file search, and computer use
You control it with parameters like model, instructions, tools, tool_choice, and conversation IDs.
Agents SDK in one paragraph
The Agents SDK wraps Responses and other APIs in a higher level workflow:
- Define Agent objects with a name, instructions, model, tools, and settings
- Use function_tool decorators to turn Python functions into JSON schema aware tools
- Call Runner.run() to execute the agent loop: the SDK repeatedly calls the model, runs any tools it requested, and loops until done
- Attach guardrails, handoffs, and tracing without writing your own orchestration engine
Under the hood, the SDK is still sending Responses API calls; you just get a clean abstraction and automatic traces in the dashboard.
Tools and built in capabilities
Tools come in four main categories in the Agents SDK and Responses ecosystem:
- Hosted tools run on OpenAI’s side, for example web search, file search, computer use, code interpreter, and image generation
- Function tools wrap your own Python or TypeScript code with JSON schemas so the model can call it
- Agents as tools expose one agent as a callable tool of another agent
- MCP tools connect external systems through the Model Context Protocol, from Google Drive to custom servers
This is what turns a plain LLM into a real agent that can read your data and act inside your systems.
Step 1: Set Up Your Environment
The example below uses Python, but the same ideas map to the TypeScript SDK if you prefer Node.
Install dependencies
```shell
python -m venv .venv
source .venv/bin/activate  # On Windows use: .venv\Scripts\activate
pip install openai openai-agents
```
The openai-agents package currently requires Python 3.9 or newer and is published frequently; as of January 2026 the latest version is in the 0.6 series.
Configure your API key
Set an environment variable before you run your app:
```shell
export OPENAI_API_KEY="sk-your-key-here"
```
In production, you will usually mount this through a secrets manager instead of hardcoding it in your code or shell history.
Step 2: Design A Small But Real Workflow
For a first agent, pick something narrow and clear. Here is an example we will build:
Use case: support ticket triage for a SaaS product.
Goal: given a short description of a ticket, decide:
- What the issue type is, for example billing, bug, usage question
- What priority it should have
- Whether to auto reply with a template or escalate to a human
In a production system, the agent would call real APIs to your ticketing tool. For this tutorial, you will stub those calls with simple Python functions.
Step 3: Define Structured Output Types
One of the biggest differences between toy and production agents is structured output. Instead of plain text, you ask the model to return JSON that matches a schema. The Agents SDK integrates with Pydantic to make this easy.
```python
from typing import Literal

from pydantic import BaseModel


class TriageDecision(BaseModel):
    ticket_id: str
    issue_type: Literal["billing", "bug", "how_to", "other"]
    priority: Literal["low", "normal", "high", "urgent"]
    action: Literal["auto_reply", "escalate_human", "ask_clarifying_question"]
    reasoning: str
```
This schema becomes your contract between the agent and the rest of your stack.
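Because the contract is a plain Pydantic model, your backend can re-validate any JSON before acting on it, which is exactly what makes the agent's output safe to automate. A quick sketch (the sample JSON is made up):

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class TriageDecision(BaseModel):
    # Same model as defined in Step 3.
    ticket_id: str
    issue_type: Literal["billing", "bug", "how_to", "other"]
    priority: Literal["low", "normal", "high", "urgent"]
    action: Literal["auto_reply", "escalate_human", "ask_clarifying_question"]
    reasoning: str


raw = (
    '{"ticket_id": "123", "issue_type": "billing", "priority": "high",'
    ' "action": "escalate_human", "reasoning": "Enterprise customer, payment failed."}'
)

# Valid JSON parses into a typed object.
decision = TriageDecision.model_validate_json(raw)

# Invalid values are rejected instead of silently passed downstream.
try:
    TriageDecision.model_validate_json(raw.replace("high", "critical"))
except ValidationError:
    print("rejected")
```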
Step 4: Turn Business Logic Into Tools
Next you wrap a couple of Python functions as tools using the SDK’s decorator. The model can then call them with correctly typed arguments.
```python
from agents import function_tool


# A fake ticket lookup. Replace with real DB or API calls.
@function_tool
def get_ticket_context(ticket_id: str) -> dict:
    """
    Return extra context for a ticket.

    In production this could query your ticketing system.
    """
    sample = {
        "123": {
            "customer_tier": "enterprise",
            "monthly_value": 5000,
            "previous_tickets": 7,
        },
        "456": {
            "customer_tier": "self_serve",
            "monthly_value": 49,
            "previous_tickets": 1,
        },
    }
    return sample.get(ticket_id, {
        "customer_tier": "unknown",
        "monthly_value": 0,
        "previous_tickets": 0,
    })
```
The @function_tool decorator handles the JSON schema, argument parsing, and function calling so the model can safely invoke get_ticket_context when needed.
Step 5: Create Your First Agent
Now you create an Agent configured with:
- A name, useful for logs and handoffs
- Instructions that behave like a system prompt
- A model from the Responses family, for example gpt-4.1 or a reasoning model
- The tools it is allowed to use
- An output_type so the SDK requests structured output
```python
from agents import Agent

triage_agent = Agent(
    name="ticket_triage_agent",
    instructions=(
        "You are a cautious but efficient support triage assistant. "
        "Classify the ticket, pick an appropriate priority, and choose "
        "whether to auto reply, escalate, or ask a clarifying question. "
        "Use get_ticket_context when the ticket_id is provided."
    ),
    model="gpt-4.1",  # or a reasoning model like gpt-5 if available
    tools=[get_ticket_context],
    output_type=TriageDecision,
)
```
In the background, the SDK uses the Responses API’s structured outputs and tool calling features so the model returns a valid TriageDecision object whenever possible.
Step 6: Run The Agent With A Runner
To execute the agent, you use the Runner class, which manages the conversation and tool loop. Note that Runner.run is a classmethod; you do not instantiate a Runner yourself.

```python
from agents import Runner


async def triage_ticket(ticket_id: str, description: str) -> TriageDecision:
    # You can pass a context object here if your tools need dependencies.
    result = await Runner.run(
        triage_agent,
        input=f"Ticket {ticket_id}: {description}",
    )
    return result.final_output  # Parsed TriageDecision object
```
The runner:
- Sends your input and instructions to the model through Responses
- Executes any tool calls returned by the model, for example get_ticket_context
- Loops until the model signals completion or hits your limits
- Returns a rich result object with the final output and trace IDs
Behind the scenes, these runs create traces that you can inspect from the OpenAI dashboard, including which tools were called and how long each step took.
Step 7: Add Guardrails So It Is Safe In Production
OpenAI’s own guidance on production agents stresses guardrails: you want to validate inputs and outputs, especially when agents can touch money, people, or infrastructure.
The Agents SDK lets you plug in guardrails to check user input or agent output before you trust it.
```python
from agents import Agent, GuardrailFunctionOutput, RunContextWrapper, output_guardrail


@output_guardrail
async def priority_guardrail(
    context: RunContextWrapper[None],
    agent: Agent,
    output: TriageDecision,
) -> GuardrailFunctionOutput:
    # Example rule: only enterprise customers can be "urgent" by default.
    tripped = (
        output.priority == "urgent"
        and "enterprise" not in output.reasoning.lower()
    )
    return GuardrailFunctionOutput(
        output_info=(
            "Urgent priority requires enterprise context or explicit justification."
            if tripped
            else "ok"
        ),
        tripwire_triggered=tripped,
    )


triage_agent_with_guardrails = triage_agent.clone(
    name="triage_with_guardrails",
    output_guardrails=[priority_guardrail],
)
```
This pattern lets you enforce hard policies even if the model tries something creative. The official docs show similar patterns for input validation, PII checks, and relevance filters.
Step 8: Wire The Agent Into A Web API
To be useful, your agent should sit behind a web endpoint or background worker. Here is a minimal FastAPI example that exposes your triage agent as an HTTP endpoint.
```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class TriageRequest(BaseModel):
    ticket_id: str
    description: str


@app.post("/triage")
async def triage_endpoint(payload: TriageRequest):
    decision = await triage_ticket(
        ticket_id=payload.ticket_id,
        description=payload.description,
    )
    # FastAPI will automatically serialize TriageDecision to JSON.
    return decision
```
In production you would also:
- Add authentication and rate limiting
- Log trace IDs from the Agents SDK so you can correlate HTTP requests with agent traces
- Record cost and latency metrics in your observability stack
The official Agents SDK docs and community blueprints include examples of Docker, Kubernetes, and background worker deployments that you can adapt.
Going Further: Multi Agent Systems And Hosted Tools
Once you have a single production agent working, the same SDK scales to more complex setups.
Using hosted tools like web search and computer use
Through the Responses API you can attach hosted tools for:
- Web search: dynamic retrieval across the public web
- File search: retrieval over your private docs
- Code interpreter: secure Python execution
- Computer use: remote browser control via Operator style agents
These are exposed as built in tools in both the API reference and the Agents SDK tools guide.
Multi agent patterns with handoffs and agents as tools
When your workflows get larger, you can split them across several agents and connect them in two main ways:
- Manager pattern: a top level agent exposes specialist agents as tools and stays in control
- Handoffs: peer agents hand off control once and the specialist agent takes over the conversation
The Agents SDK has first class support for both patterns, which are documented in the multi agent design section and expanded in practical guides and tutorials.
Production Checklist For Your First Agent
Before you call your agent production ready, walk through this checklist.
- Clear scope: the agent has a narrow, well defined responsibility and hard boundaries
- Structured outputs: all critical paths return JSON that your backend validates
- Guardrails: key business and safety rules are enforced by code, not just prompts
- Observability: traces are turned on and you can see which tools were used for each run
- Testing: you have unit tests for tools and offline test suites for typical agent conversations
- Rollout strategy: you start in shadow mode or partial automation, escalating to full autonomy only after you have real world metrics
- Fallbacks: there is a safe path if the agent fails, for example routing to a human operator
OpenAI’s own production agent guides and community articles repeat the same lesson: start small, instrument heavily, and grow autonomy only after you trust the traces.
What To Do Next
If you follow the steps in this post you will have:
- A working Python agent built with the OpenAI Agents SDK
- Structured outputs suitable for backend automation
- Guardrails and traces that make debugging and compliance possible
- A simple HTTP wrapper that you can plug into your product
From here, good next experiments include:
- Adding a second agent and trying the manager or handoff pattern
- Using hosted tools like web search or file search for real knowledge retrieval
- Integrating the Azure OpenAI or other cloud Responses implementations if your infra lives there
- Connecting MCP tools so your agent can reach data from Google Drive, SharePoint, and internal APIs safely
You do not need a giant multi agent mesh to start. One small, reliable agent that does one job well is the real milestone on your path from prompts to production workflows.

