How To Build Your First Production Ready Agent With OpenAI’s Agents SDK And Responses API (2026 Guide)
In early 2026, OpenAI’s Responses API and Agents SDK are the main path for building agents that actually take action: they search the web, look through your files, call APIs, and even operate a computer through a virtual browser.
If you are still using the old Assistants API or a pile of ad hoc prompt chains, this guide walks you step by step from “toy prototype” to a production ready agent built on the same primitives OpenAI uses internally.
Why Agents SDK + Responses Is The Default Stack In 2026
In March 2025, OpenAI launched the Responses API as a unified, stateful interface that combines chat style completion, function calling, and built in tools like web search, file search, and computer use. Since then, hundreds of thousands of developers have used it at scale for agentic workloads.
At the same time, OpenAI introduced the Agents SDK as a lightweight framework for building multi agent workflows on top of Responses and other model providers. The SDK focuses on a few primitives:
- Agents: LLMs configured with instructions and tools
- Tools: functions or hosted capabilities the agent can call
- Handoffs: structured delegation between agents
- Guardrails: validation hooks for inputs and outputs
- Tracing: automatic traces you can inspect in the OpenAI dashboard
The Python package openai-agents on PyPI and its TypeScript sibling are provider agnostic and already support OpenAI Responses, Chat Completions, and more than one hundred other LLMs via LiteLLM.
OpenAI has also started the clock on deprecating the legacy Assistants API, with a planned sunset in August 2026 now that Responses has reached full feature parity.
If you are starting something new today, you should build it with Responses and the Agents SDK.
The Mental Model: What You Are Actually Building
OpenAI’s own agent building track describes agent systems as four composable parts:
- Models: reasoning engines that follow instructions
- Tools: actions the model can take
- State and memory: what the agent remembers across steps
- Orchestration: how you coordinate multiple agents and tools
The Agents SDK and Responses API give you batteries included support for exactly those four pieces.
In this tutorial, you will build a single agent that:
- Receives a task description from a user
- Calls a custom Python tool to fetch mock data from a ticket system
- Uses that data plus its own reasoning to decide what to do
- Returns a structured JSON result that your backend can act on safely
Then you will see how to wrap it in guardrails and tracing so you can monitor it in production.
Stack Overview: Responses API, Agents SDK, Built In Tools
Responses API in one paragraph
The Responses endpoint (/v1/responses) is OpenAI’s most advanced interface for model responses. It:
- Supports text and image inputs
- Can produce free form text or structured JSON outputs
- Stores conversation state so you can chain calls
- Lets you define tools the model can call, including built in tools for web search, file search, and computer use
You control it with parameters like model, instructions, tools, tool_choice, and conversation IDs.
Agents SDK in one paragraph
The Agents SDK wraps Responses and other APIs in a higher level workflow:
- Define Agent objects with a name, instructions, model, tools, and settings
- Use function_tool decorators to turn Python functions into JSON schema aware tools
- Call Runner.run() to execute the agent loop: the SDK repeatedly calls the model, runs any tools it requested, and loops until done
- Attach guardrails, handoffs, and tracing without writing your own orchestration engine
Under the hood, the SDK is still sending Responses API calls; you just get a clean abstraction and automatic traces in the dashboard.
Tools and built in capabilities
Tools come in four main categories in the Agents SDK and Responses ecosystem:
- Hosted tools run on OpenAI’s side, for example web search, file search, computer use, code interpreter, and image generation
- Function tools wrap your own Python or TypeScript code with JSON schemas so the model can call it
- Agents as tools expose one agent as a callable tool of another agent
- MCP tools connect external systems through the Model Context Protocol, from Google Drive to custom servers
This is what turns a plain LLM into a real agent that can read your data and act inside your systems.
Step 1: Set Up Your Environment
The example below uses Python, but the same ideas map to the TypeScript SDK if you prefer Node.
Install dependencies
```shell
python -m venv .venv
source .venv/bin/activate  # On Windows use: .venv\Scripts\activate
pip install openai openai-agents
```
The openai-agents package currently requires Python 3.9 or newer and is published frequently; as of January 2026 the latest version is in the 0.6 series.
Configure your API key
Set an environment variable before you run your app:
```shell
export OPENAI_API_KEY="sk-your-key-here"
```
In production, you will usually mount this through a secrets manager instead of hardcoding it in your code or shell history.
Step 2: Design A Small But Real Workflow
For a first agent, pick something narrow and clear. Here is an example we will build:
Use case: support ticket triage for a SaaS product.
Goal: given a short description of a ticket, decide:
- What the issue type is, for example billing, bug, usage question
- What priority it should have
- Whether to auto reply with a template or escalate to a human
In a production system, the agent would call real APIs to your ticketing tool. For this tutorial, you will stub those calls with simple Python functions.
Step 3: Define Structured Output Types
One of the biggest differences between toy and production agents is structured output. Instead of plain text, you ask the model to return JSON that matches a schema. The Agents SDK integrates with Pydantic to make this easy.
```python
from typing import Literal

from pydantic import BaseModel


class TriageDecision(BaseModel):
    ticket_id: str
    issue_type: Literal["billing", "bug", "how_to", "other"]
    priority: Literal["low", "normal", "high", "urgent"]
    action: Literal["auto_reply", "escalate_human", "ask_clarifying_question"]
    reasoning: str
```
This schema becomes your contract between the agent and the rest of your stack.
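Because the contract is a plain Pydantic model, your backend can re-validate any JSON before acting on it, which is exactly what makes the agent's output safe to automate. A quick sketch (the sample JSON is made up):

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class TriageDecision(BaseModel):
    # Same model as defined in Step 3.
    ticket_id: str
    issue_type: Literal["billing", "bug", "how_to", "other"]
    priority: Literal["low", "normal", "high", "urgent"]
    action: Literal["auto_reply", "escalate_human", "ask_clarifying_question"]
    reasoning: str


raw = (
    '{"ticket_id": "123", "issue_type": "billing", "priority": "high",'
    ' "action": "escalate_human", "reasoning": "Enterprise customer, payment failed."}'
)

# Valid JSON parses into a typed object.
decision = TriageDecision.model_validate_json(raw)

# Invalid values are rejected instead of silently passed downstream.
try:
    TriageDecision.model_validate_json(raw.replace("high", "critical"))
except ValidationError:
    print("rejected")
```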
Step 4: Turn Business Logic Into Tools
Next you wrap a couple of Python functions as tools using the SDK’s decorator. The model can then call them with correctly typed arguments.
```python
from agents import function_tool


# A fake ticket lookup. Replace with real DB or API calls.
@function_tool
def get_ticket_context(ticket_id: str) -> dict:
    """
    Return extra context for a ticket.

    In production this could query your ticketing system.
    """
    sample = {
        "123": {
            "customer_tier": "enterprise",
            "monthly_value": 5000,
            "previous_tickets": 7,
        },
        "456": {
            "customer_tier": "self_serve",
            "monthly_value": 49,
            "previous_tickets": 1,
        },
    }
    return sample.get(ticket_id, {
        "customer_tier": "unknown",
        "monthly_value": 0,
        "previous_tickets": 0,
    })
```
The @function_tool decorator handles the JSON schema, argument parsing, and function calling so the model can safely invoke get_ticket_context when needed.
Step 5: Create Your First Agent
Now you create an Agent configured with:
- A name, useful for logs and handoffs
- Instructions that behave like a system prompt
- A model from the Responses family, for example gpt-4.1 or a reasoning model
- The tools it is allowed to use
- An output_type so the SDK requests structured output
```python
from agents import Agent

triage_agent = Agent(
    name="ticket_triage_agent",
    instructions=(
        "You are a cautious but efficient support triage assistant. "
        "Classify the ticket, pick an appropriate priority, and choose "
        "whether to auto reply, escalate, or ask a clarifying question. "
        "Use get_ticket_context when the ticket_id is provided."
    ),
    model="gpt-4.1",  # or a reasoning model like gpt-5 if available
    tools=[get_ticket_context],
    output_type=TriageDecision,
)
```
In the background, the SDK uses the Responses API’s structured outputs and tool calling features so the model returns a valid TriageDecision object whenever possible.
Step 6: Run The Agent With A Runner
To execute the agent, you use the Runner class, which manages the conversation and tool loop. Note that Runner.run is a classmethod; you do not instantiate a Runner yourself.

```python
from agents import Runner


async def triage_ticket(ticket_id: str, description: str) -> TriageDecision:
    # You can pass a context object here if your tools need dependencies.
    result = await Runner.run(
        triage_agent,
        input=f"Ticket {ticket_id}: {description}",
    )
    return result.final_output  # Parsed TriageDecision object
```
The runner:
- Sends your input and instructions to the model through Responses
- Executes any tool calls returned by the model, for example get_ticket_context
- Loops until the model signals completion or hits your limits
- Returns a rich result object with the final output and trace IDs
Behind the scenes, these runs create traces that you can inspect from the OpenAI dashboard, including which tools were called and how long each step took.
Step 7: Add Guardrails So It Is Safe In Production
OpenAI’s own guidance on production agents stresses guardrails: you want to validate inputs and outputs, especially when agents can touch money, people, or infrastructure.
The Agents SDK lets you plug in guardrails to check user input or agent output before you trust it.
```python
from agents import Agent, GuardrailFunctionOutput, RunContextWrapper, output_guardrail


@output_guardrail
async def priority_guardrail(
    context: RunContextWrapper[None],
    agent: Agent,
    output: TriageDecision,
) -> GuardrailFunctionOutput:
    # Example rule: only enterprise customers can be "urgent" by default.
    tripped = (
        output.priority == "urgent"
        and "enterprise" not in output.reasoning.lower()
    )
    return GuardrailFunctionOutput(
        output_info=(
            "Urgent priority requires enterprise context or explicit justification."
            if tripped
            else "ok"
        ),
        tripwire_triggered=tripped,
    )


triage_agent_with_guardrails = triage_agent.clone(
    name="triage_with_guardrails",
    output_guardrails=[priority_guardrail],
)
```
This pattern lets you enforce hard policies even if the model tries something creative. The official docs show similar patterns for input validation, PII checks, and relevance filters.
Step 8: Wire The Agent Into A Web API
To be useful, your agent should sit behind a web endpoint or background worker. Here is a minimal FastAPI example that exposes your triage agent as an HTTP endpoint.
```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class TriageRequest(BaseModel):
    ticket_id: str
    description: str


@app.post("/triage")
async def triage_endpoint(payload: TriageRequest):
    decision = await triage_ticket(
        ticket_id=payload.ticket_id,
        description=payload.description,
    )
    # FastAPI will automatically serialize TriageDecision to JSON.
    return decision
```
In production you would also:
- Add authentication and rate limiting
- Log trace IDs from the Agents SDK so you can correlate HTTP requests with agent traces
- Record cost and latency metrics in your observability stack
The official Agents SDK docs and community blueprints include examples of Docker, Kubernetes, and background worker deployments that you can adapt.
Going Further: Multi Agent Systems And Hosted Tools
Once you have a single production agent working, the same SDK scales to more complex setups.
Using hosted tools like web search and computer use
Through the Responses API you can attach hosted tools for:
- Web search: dynamic retrieval across the public web
- File search: retrieval over your private docs
- Code interpreter: secure Python execution
- Computer use: remote browser control via Operator style agents
These are exposed as built in tools in both the API reference and the Agents SDK tools guide.
Multi agent patterns with handoffs and agents as tools
When your workflows get larger, you can split them across several agents and connect them in two main ways:
- Manager pattern: a top level agent exposes specialist agents as tools and stays in control
- Handoffs: peer agents hand off control once and the specialist agent takes over the conversation
The Agents SDK has first class support for both patterns, which are documented in the multi agent design section and expanded in practical guides and tutorials.
Production Checklist For Your First Agent
Before you call your agent production ready, walk through this checklist.
- Clear scope: the agent has a narrow, well defined responsibility and hard boundaries
- Structured outputs: all critical paths return JSON that your backend validates
- Guardrails: key business and safety rules are enforced by code, not just prompts
- Observability: traces are turned on and you can see which tools were used for each run
- Testing: you have unit tests for tools and offline test suites for typical agent conversations
- Rollout strategy: you start in shadow mode or partial automation, escalating to full autonomy only after you have real world metrics
- Fallbacks: there is a safe path if the agent fails, for example routing to a human operator
OpenAI’s own production agent guides and community articles repeat the same lesson: start small, instrument heavily, and grow autonomy only after you trust the traces.
What To Do Next
If you follow the steps in this post you will have:
- A working Python agent built with the OpenAI Agents SDK
- Structured outputs suitable for backend automation
- Guardrails and traces that make debugging and compliance possible
- A simple HTTP wrapper that you can plug into your product
From here, good next experiments include:
- Adding a second agent and trying the manager or handoff pattern
- Using hosted tools like web search or file search for real knowledge retrieval
- Integrating the Azure OpenAI or other cloud Responses implementations if your infra lives there
- Connecting MCP tools so your agent can reach data from Google Drive, SharePoint, and internal APIs safely
You do not need a giant multi agent mesh to start. One small, reliable agent that does one job well is the real milestone on your path from prompts to production workflows.

