Published on June 12, 2026

AI Agent Frameworks Compared in 2026: DSPy vs Claude Agent SDK vs OpenAI Agents SDK vs CrewAI vs AutoGen vs LangGraph vs Google ADK

I shipped agents on all seven. The honest matrix — ergonomics, multi-agent, MCP, lock-in — and which one I'd actually pick for your project.

Tags: ai-agents, framework, model-comparison, langgraph, crewai, python

If you're picking an AI agent framework in June 2026, here's the short version. Want the loop you control, with checkpoints and human-in-the-loop? LangGraph. Building on Claude with file-system and MCP tools? Claude Agent SDK. Want the smallest, most readable API to ship today? OpenAI Agents SDK. Spinning up a role-based team in an afternoon? CrewAI. Optimizing a prompt pipeline instead of hand-tuning it? DSPy. Deep into Microsoft/Azure or research-style multi-agent chat? AutoGen. All-in on Google Cloud and Gemini? Google ADK.

I built the same small agent — answer a question, call two tools, hand off to a second agent — on all seven. Below is the matrix, a minimal code snippet for each, and the part nobody warns you about: where each one breaks. Then the uncomfortable conclusion: in 2026 the framework you pick matters less than the layer above it — training, sandboxing, and observability. Code shapes are current as of June 12, 2026; verify versions before you commit, because this field re-rolls monthly.

Why this is even a question now

Two years ago "agent framework" meant LangChain or nothing. In 2026 there are dozens, and the lineup that actually ships in production has settled into seven: DSPy (Stanford), Claude Agent SDK (Anthropic), OpenAI Agents SDK (OpenAI), CrewAI, AutoGen (Microsoft), LangGraph (LangChain), and Google ADK (Google).

They are not interchangeable. They sit at different altitudes. DSPy is a compiler for LLM programs. LangGraph is a state machine you draw. CrewAI is a team metaphor. The Claude and OpenAI SDKs are thin loops around one provider's models. AutoGen and ADK are platforms with a multi-agent runtime underneath. Pick the wrong altitude and you'll either fight the framework or rebuild half of it.

The mistake I see most: teams reach for a heavy multi-agent platform when a single agent with three tools would have done the job — then spend a sprint debugging message routing instead of shipping. So before the matrix: most "agents" in production are one model, one loop, a handful of tools. Choose for that first.
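To make "one model, one loop, a handful of tools" concrete, here is a minimal sketch of that shape in plain Python. The `fake_model` stub and the two tools are hypothetical stand-ins I made up so the loop runs offline; a real agent would swap in an actual model call.

```python
import json
from typing import Callable, Dict, List

# Hypothetical tools for illustration; real ones would call an API or a database.
def add(a: float, b: float) -> float:
    return a + b

def shout(text: str) -> str:
    return text.upper()

TOOLS: Dict[str, Callable] = {"add": add, "shout": shout}

def fake_model(messages: List[dict]) -> dict:
    """Stand-in for an LLM call: request one tool, then give a final answer."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    return {"final": f"The result is {tool_results[-1]['content']}"}

def run_agent(question: str, model=fake_model, max_steps: int = 5) -> str:
    """The whole 'agent': call the model, run the tool it asks for, repeat."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = model(messages)
        if "final" in reply:
            return reply["final"]
        result = TOOLS[reply["tool"]](**reply["args"])
        messages.append(
            {"role": "tool", "name": reply["tool"], "content": json.dumps(result)}
        )
    raise RuntimeError("agent did not finish within max_steps")

print(run_agent("What is 2 + 3?"))
```

Every framework below is some elaboration of this loop. The `max_steps` cap is the one piece I would never leave out, since a model that keeps asking for tools will otherwise run forever.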

The matrix

Framework	Ergonomics	Multi-agent	MCP / streaming	Production features	Lock-in	Best for
DSPy	Declarative; you write signatures	Via modules (manual)	Tools yes; MCP via adapters	Optimizers, eval-driven, caching	Low (provider-agnostic)	Optimizing prompt/agent pipelines
Claude Agent SDK	Tiny, opinionated, async	Subagents	Native MCP; native streaming	Context mgmt, permissions, sessions	High (Claude / Anthropic)	Production Claude & coding agents
OpenAI Agents SDK	Smallest, most readable	Handoffs (first-class)	MCP yes; streaming yes	Guardrails, sessions, built-in tracing	Medium (best on OpenAI)	Ship-fast single & handoff agents
CrewAI	Role/goal/backstory metaphor	Crews + Flows	MCP via tools; streaming basic	Flows for determinism, memory, telemetry	Medium (own framework)	Role-based teams, fast prototyping
AutoGen	Verbose; actor/event model	Core strength	MCP via ext; streaming yes	Async runtime, distributed, Studio UI	Medium→Microsoft stack	Research & complex multi-agent chat
LangGraph	Explicit graph; more boilerplate	Supervisor/graph	MCP adapter; streaming yes	Checkpoints, durable exec, human-in-the-loop	Low–Medium (OSS runtime)	Stateful workflows you must control
Google ADK	Clean, code-first	Hierarchical agents	MCP yes; bidi streaming	Vertex Agent Engine, eval, deploy	High (Google Cloud)	Gemini + Google Cloud production

TL;DR: there is no "best framework." There's the right altitude for your problem and the right blast radius for your team's lock-in tolerance.

The seven, in their own words (and where each breaks)

Each snippet is the canonical minimal agent: one model, tools, go. They illustrate each API's shape rather than a full app. Pin the version, because the APIs drift.

DSPy — you compile prompts, you don't write them

DSPy treats an LLM program as something you optimize, not hand-tune. You declare a signature (question -> answer), pick a module (ReAct for tool use), and an optimizer tunes the prompts against a metric.

# pip install dspy
import dspy
 
dspy.configure(lm=dspy.LM("openai/gpt-5.1"))
 
def calculate(expression: str) -> str:
    """Evaluate basic arithmetic."""
    return str(eval(expression, {"__builtins__": {}}))  # NOT a sandbox: empty builtins can be bypassed; isolate this in real code
 
agent = dspy.ReAct("question -> answer", tools=[calculate])
print(agent(question="What is 4871 * 209?").answer)

Where it breaks: the optimizer magic needs a metric and a dataset. No eval set, no payoff — you're just using a clunkier prompt template. It's the wrong tool for a one-off agent and the best tool for a pipeline you'll run a million times.
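What "a metric and a dataset" means in practice can be sketched without DSPy installed. The metric below follows the `(example, pred, trace=None)` argument shape that DSPy metrics use; the dataset, the `evaluate` helper, and `stub_program` are made-up stand-ins for illustration, not DSPy APIs.

```python
from types import SimpleNamespace

# Tiny hand-labelled eval set (made-up examples for illustration).
devset = [
    SimpleNamespace(question="What is 2 + 2?", answer="4"),
    SimpleNamespace(question="What is 10 * 3?", answer="30"),
    SimpleNamespace(question="What is 7 - 9?", answer="-2"),
]

def exact_match(example, pred, trace=None) -> bool:
    """Metric: does the predicted answer equal the labelled answer?"""
    return example.answer.strip() == pred.answer.strip()

def evaluate(program, dataset, metric) -> float:
    """Fraction of examples where the program's prediction passes the metric."""
    hits = sum(metric(ex, program(question=ex.question)) for ex in dataset)
    return hits / len(dataset)

def stub_program(question: str):
    """Hypothetical stand-in for an agent; it gets the subtraction wrong."""
    answers = {"What is 2 + 2?": "4", "What is 10 * 3?": "30"}
    return SimpleNamespace(answer=answers.get(question, "unknown"))

score = evaluate(stub_program, devset, exact_match)
print(f"{score:.2f}")
```

If you cannot write these two pieces for your task, an optimizer has nothing to climb, which is the whole "no eval set, no payoff" point.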

Claude Agent SDK — the production loop, batteries included

Anthropic's SDK (the renamed, generalized successor to the Claude Code SDK) wraps the agent loop with MCP support, file-system tools, permissions, subagents, and automatic context compaction. Python and TypeScript.

# pip install claude-agent-sdk
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions
 
async def main():
    options = ClaudeAgentOptions(model="claude-opus-4-8")
    async for message in query(prompt="What time is it in Kyiv?", options=options):
        print(message)
 
asyncio.run(main())

Where it breaks: it's gorgeous on Claude and a wall everywhere else — this is the highest-lock-in option here. If "swap to a cheaper model next quarter" is on your roadmap, that's a real cost.
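One common hedge against that cost is to keep your own code behind a thin model interface. The sketch below shows the idea with a `Protocol`; `ChatModel`, `PrimaryModel`, and `CheaperModel` are hypothetical names, and the fake backends just echo their input so the example runs offline.

```python
from typing import Protocol

class ChatModel(Protocol):
    """The only surface application code is allowed to depend on."""
    def complete(self, prompt: str) -> str: ...

class PrimaryModel:
    """Hypothetical wrapper where your current provider's SDK call would live."""
    def complete(self, prompt: str) -> str:
        return f"[primary] {prompt}"

class CheaperModel:
    """Hypothetical wrapper around a cheaper provider."""
    def complete(self, prompt: str) -> str:
        return f"[cheaper] {prompt}"

def summarize(text: str, model: ChatModel) -> str:
    # Application logic sees only ChatModel, never a vendor SDK.
    return model.complete(f"Summarize: {text}")

print(summarize("quarterly report", PrimaryModel()))
print(summarize("quarterly report", CheaperModel()))
```

This only protects plain prompt-in, text-out calls. The features that make a provider SDK attractive (file-system tools, subagents, context compaction) do not fit behind an interface this small, so treat it as a way to limit lock-in rather than remove it.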

OpenAI Agents SDK — the smallest API that does the job

The lightweight successor to Swarm. Three primitives: agents, handoffs, guardrails, plus sessions and built-in tracing. It's the one I reach for when I want readable code today.

# pip install openai-agents
from agents import Agent, Runner, function_tool
 
@function_tool
def get_weather(city: str) -> str:
    return f"Sunny in {city}, 24°C."
 
agent = Agent(name="Assistant", instructions="Be concise.", tools=[get_weather])
result = Runner.run_sync(agent, "What's the weather in Lisbon?")
print(result.final_output)

Where it breaks: it runs against other providers via LiteLLM, but the ergonomics and tracing are tuned for OpenAI. The "lightweight" promise also means you'll hand-roll anything beyond handoffs: there's no graph and no durable state.

CrewAI — agents as a team with roles

CrewAI's metaphor is a crew: each agent has a role, goal, and backstory; tasks flow sequentially or hierarchically. For determinism it added Flows. It's standalone now (no LangChain underneath).

# pip install crewai
from crewai import Agent, Task, Crew
 
researcher = Agent(role="Researcher", goal="Find the answer",
                   backstory="A meticulous analyst.")
task = Task(description="What time is it in Kyiv?",
            agent=researcher, expected_output="A one-line answer.")
print(Crew(agents=[researcher], tasks=[task]).kickoff())

Where it breaks: the role/backstory framing is great for demos and brittle for control flow. The moment you need "if X then route to Y," you're reaching for Flows — and at that point LangGraph is the more honest tool.
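For reference, "if X then route to Y" is a few lines when written as explicit control flow. This is a framework-free sketch, not CrewAI Flows or LangGraph code; the three handlers and the keyword-based `classify` are hypothetical, and in a real system the classifier might itself be a model call.

```python
from typing import Callable, Dict

# Hypothetical downstream agents, reduced to plain functions.
def billing_agent(ticket: str) -> str:
    return f"billing: {ticket}"

def support_agent(ticket: str) -> str:
    return f"support: {ticket}"

def general_agent(ticket: str) -> str:
    return f"general: {ticket}"

ROUTES: Dict[str, Callable[[str], str]] = {
    "billing": billing_agent,
    "support": support_agent,
    "general": general_agent,
}

def classify(ticket: str) -> str:
    """Decide the route from keywords; deterministic and easy to test."""
    text = ticket.lower()
    if "refund" in text or "invoice" in text:
        return "billing"
    if "error" in text or "crash" in text:
        return "support"
    return "general"

def route(ticket: str) -> str:
    return ROUTES[classify(ticket)](ticket)

print(route("I need a refund"))
print(route("App crash on login"))
```

The routing decision lives in one testable function instead of being implied by role descriptions, which is the property you are really after when you reach for Flows or a graph.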

AutoGen — the multi-agent runtime

Microsoft's AutoGen v0.4+ is a ground-up rewrite: async, event-driven, an actor model under the high-level AgentChat API. It's converging with Semantic Kernel into the broader Microsoft Agent Framework, so check which entry point is current before you start.

# pip install autogen-agentchat "autogen-ext[openai]"
import asyncio
from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient
 
async def main():
    client = OpenAIChatCompletionClient(model="gpt-5.1")
    agent = AssistantAgent("assistant", model_client=client)
    print(await agent.run(task="What is 23 * 47?"))
 
asyncio.run(main())

Where it breaks: power has a price — it's the most verbose here, the API churned hard across the v0.2→v0.4→Agent Framework migrations, and a lot of tutorials online still target the old API. Budget time for "which version am I even reading."
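A cheap way to answer "which version am I even reading" is to check what is actually installed before following a tutorial. This sketch uses the standard library's `importlib.metadata`; the helper names are mine, and `major_minor` assumes a plain numeric "X.Y" prefix, so it will not handle every version string.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

def major_minor(version_string: str) -> Tuple[int, ...]:
    """Parse the leading 'X.Y' of a version string (assumes numeric parts)."""
    return tuple(int(part) for part in version_string.split(".")[:2])

# Package names taken from the pip install lines in this article.
for name in ("autogen-agentchat", "langgraph", "crewai"):
    found = installed_version(name)
    print(f"{name}: {found or 'not installed'}")
```

Compare the result against the version the tutorial states (or implies through its imports) before copying anything.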

LangGraph — draw the state machine

LangGraph models your agent as a graph of nodes and edges with explicit state, checkpointing, durable execution, and first-class human-in-the-loop. The create_react_agent prebuilt hides the graph until you need it.

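# pip install langgraph "langchain[anthropic]"
# Note: LangGraph 1.x marks this prebuilt as deprecated in favor of
# langchain.agents.create_agent; check which entry point is current.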
from langgraph.prebuilt import create_react_agent
 
def get_weather(city: str) -> str:
    """Get the weather for a city."""
    return f"Sunny in {city}."
 
agent = create_react_agent(model="anthropic:claude-sonnet-4-6", tools=[get_weather])
out = agent.invoke({"messages": [{"role": "user", "content": "Weather in Kyiv?"}]})
print(out["messages"][-1].content)

Where it breaks: the prebuilt is easy; the real graph API is verbose and has a learning curve. You also inherit LangChain's surface area and its habit of moving things between packages. The payoff — pause, resume, inspect, and edit state mid-run — is unmatched when you actually need it.
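To show what "pause, resume, inspect, and edit state mid-run" buys you, here is a deliberately tiny stand-in written in plain Python. It is not LangGraph's checkpointer API; the step names, the JSON file, and the `approved` flag are assumptions made up for the illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional

STEPS = ["draft", "review", "publish"]  # hypothetical workflow nodes

def run(state: dict, checkpoint: Path, pause_before: Optional[str] = None) -> dict:
    """Run the remaining steps, persisting state so the run can be resumed."""
    while state["next"] < len(STEPS):
        step = STEPS[state["next"]]
        if step == pause_before and not state["approved"]:
            checkpoint.write_text(json.dumps(state))  # persist before pausing
            return state
        state["log"].append(step)
        state["next"] += 1
        checkpoint.write_text(json.dumps(state))  # persist after every step
    return state

checkpoint = Path(tempfile.mkdtemp()) / "run.json"
paused = run({"next": 0, "log": [], "approved": False}, checkpoint,
             pause_before="publish")
print(paused["log"])

# Later, possibly in another process: reload, let a human approve, resume.
resumed = json.loads(checkpoint.read_text())
resumed["approved"] = True
final = run(resumed, checkpoint, pause_before="publish")
print(final["log"])
```

The real thing adds branching, concurrency, and storage backends, but the contract is the same: state is serializable, saved at every step, and editable while the run is paused.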

Google ADK — Gemini's production kit

The Agent Development Kit is the code-first framework behind Vertex AI Agent Engine. Model-agnostic in principle, Gemini-first in practice, with hierarchical multi-agent, bidirectional streaming, evaluation, and a deploy path into Google Cloud.

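# pip install google-adk
# Save as agent.py inside an agent package, then start it from the parent
# directory with the ADK CLI: `adk run <package_dir>` or `adk web`.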
from google.adk.agents import Agent
 
def get_weather(city: str) -> dict:
    """Return the weather for a city."""
    return {"status": "ok", "report": f"Sunny in {city}."}
 
root_agent = Agent(name="weather_agent", model="gemini-3-pro",
                   instruction="Answer weather questions.", tools=[get_weather])

Where it breaks: the smooth path runs through Vertex. Off Google Cloud you lose the deploy/eval story that's half the point, and the docs assume Gemini. Great inside the ecosystem, a strange choice outside it.

Two entrants worth watching

PraisonAI — the low-code/fast lane. YAML workflows, Python and JS SDKs, MCP across stdio/WebSocket/SSE/Streamable HTTP, and a viral claim of being "1,209× faster than LangGraph" to spin up an agent. Treat the headline number as marketing (it's measuring setup, not task throughput), but the ergonomics are genuinely quick for simple multi-agent jobs.
AgentScope — the open-source, China-origin entrant: agent-oriented programming, a visual builder, MCP, plus built-in memory and RAG. Worth a look if you want a batteries-included OSS platform that isn't tied to a US cloud.

I wouldn't bet a production system on either yet, but both are real signals of where the ergonomics are heading: faster to start, lower-code, MCP everywhere.
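A quick bit of arithmetic shows why a setup-time headline says little about real workloads. The numbers below are invented for illustration (they are not measurements of PraisonAI, LangGraph, or anything else): two hypothetical frameworks whose setup times differ by 1,209× but whose per-task time is dominated by the same model call.

```python
def total_seconds(setup: float, per_task: float, tasks: int) -> float:
    """End-to-end time: one-off setup plus the per-task cost for every task."""
    return setup + per_task * tasks

# Made-up numbers: setup differs by 1,209x, per-task latency is identical.
FAST_SETUP, SLOW_SETUP = 0.001, 1.209
PER_TASK = 2.0  # seconds, dominated by the model call in both cases

setup_ratio = SLOW_SETUP / FAST_SETUP
overall_ratio = (
    total_seconds(SLOW_SETUP, PER_TASK, 100)
    / total_seconds(FAST_SETUP, PER_TASK, 100)
)
print(f"setup ratio: {setup_ratio:.0f}x, ratio over 100 tasks: {overall_ratio:.3f}x")
```

Under these assumptions the gap shrinks to well under one percent after 100 tasks, which is why the question to ask of any benchmark is what was on the clock.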

The part that actually matters: frameworks are commoditizing

Here's what surprised me after shipping on all seven: the framework choice is becoming the least interesting decision. The minimal snippets above are nearly identical — define tools, define an agent, run a loop. That convergence is the tell. When everyone's "hello agent" looks the same, the framework is a commodity and the value moves to the layer above it:

Training/optimization. Microsoft's Agent Lightning trains any agent — LangChain, AutoGen, CrewAI, OpenAI SDK — with reinforcement learning, with near-zero code changes. Your agent's behavior becomes the asset, not the framework it's wired in.
Security/sandboxing. AVM (Agent Virtual Machine) and similar tools secure and sandbox any framework at the OS layer. An agent that runs tools is a remote-code-execution surface; the framework doesn't fix that, the isolation layer does. (See my self-hosted AI security checklist for the VPS side of this.)
Observability. Tracing, evals, cost and token accounting, replay. OpenAI's SDK bakes in tracing; everyone else bolts on LangSmith, Langfuse, Phoenix, or Weave. In production this is what keeps you alive at 3 AM — covered in production AI agents: monitoring.
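The observability item is the easiest of the three to start on yourself. As a minimal sketch, a decorator can record every tool call with its status and duration; the in-memory `TRACE` list and the two example tools are hypothetical, and a real setup would ship these records to one of the tracing backends named above.

```python
import functools
import time
from typing import Dict, List

TRACE: List[Dict] = []  # in-memory stand-in for a tracing backend

def traced(fn):
    """Record name, status, and duration for every call to a tool."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        status = "ok"
        try:
            return fn(*args, **kwargs)
        except Exception:
            status = "error"
            raise
        finally:
            TRACE.append({
                "tool": fn.__name__,
                "status": status,
                "seconds": time.perf_counter() - start,
            })
    return wrapper

@traced
def get_weather(city: str) -> str:
    return f"Sunny in {city}."

@traced
def flaky_tool() -> str:
    raise ValueError("upstream timeout")

get_weather("Kyiv")
try:
    flaky_tool()
except ValueError:
    pass

for record in TRACE:
    print(record["tool"], record["status"])
```

Because the record is written in `finally`, failed calls show up in the trace too, and those are the ones you will be looking for at 3 AM.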

A contrarian thread this spring put it well: most frameworks are racing to expand the context window when the real problem is context quality. The framework that wins your next project might be the one that helps you feed the model the right 4,000 tokens — not the one that lets you cram in 400,000.
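Here is a rough sketch of what "feed the model the right tokens" can look like: rank candidate chunks by relevance and keep only what fits a budget. The word-overlap score and the one-token-per-word estimate are crude assumptions chosen to keep the example dependency-free; a real system would use embeddings and a proper tokenizer.

```python
from typing import List

def estimate_tokens(text: str) -> int:
    """Crude assumption: one token per whitespace-separated word."""
    return len(text.split())

def relevance(query: str, chunk: str) -> int:
    """Number of distinct query words that appear in the chunk."""
    query_words = set(query.lower().split())
    return len(query_words & set(chunk.lower().split()))

def select_context(query: str, chunks: List[str], budget: int) -> List[str]:
    """Greedily keep the most relevant chunks that fit the token budget."""
    ranked = sorted(chunks, key=lambda c: relevance(query, c), reverse=True)
    picked, used = [], 0
    for chunk in ranked:
        cost = estimate_tokens(chunk)
        if relevance(query, chunk) == 0 or used + cost > budget:
            continue  # skip irrelevant chunks and ones that would overflow
        picked.append(chunk)
        used += cost
    return picked

chunks = [
    "refund policy allows returns within 30 days",
    "damaged items qualify for a full refund",
    "our office dog is named Biscuit",
    "shipping policy covers international orders and customs fees too",
]
context = select_context("refund policy for damaged items", chunks, budget=15)
print(context)
```

The budget forces a choice, and the choice is where quality comes from: the irrelevant chunk never reaches the model, and the marginal one is dropped rather than squeezed in.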

So pick a framework, yes. But spend your real engineering budget on the layer above it.

Common pitfalls (and how I fixed them)

Pitfall: reaching for a multi-agent platform for a single-agent job. Fix: start with one agent and tools. Add agents only when one agent provably can't hold the task. Most "multi-agent" demos are theater.
Pitfall: copy-pasting tutorials that target an old major version (AutoGen and LangChain are repeat offenders). Fix: read the version in the import path, not the blog's publish date. Pin it in requirements.txt.
Pitfall: under-pricing lock-in. Fix: before you build, answer "what's the cost of swapping the model/cloud in 6 months?" The Claude Agent SDK and Google ADK make that swap expensive — sometimes worth it, never free.
Pitfall: trusting a viral benchmark ("1,209× faster!"). Fix: read what was measured. Setup time ≠ task throughput ≠ cost-per-task.
Pitfall: no sandbox around tool execution. Fix: every tool an agent can call is an attack surface. Validate inputs, drop privileges, isolate at the OS layer.
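As one example of "validate inputs", a calculator tool can parse the expression and allow only arithmetic instead of calling `eval`. This is a minimal sketch using the standard `ast` module; the 100-character limit and the choice to leave out exponentiation are my assumptions. It covers input validation only and does not replace OS-level isolation.

```python
import ast
import operator

# Whitelist of allowed operators; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def safe_calculate(expression: str):
    """Evaluate +, -, *, / on numeric literals; raise ValueError otherwise."""
    if len(expression) > 100:
        raise ValueError("expression too long")

    def evaluate(node):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](evaluate(node.operand))
        raise ValueError("unsupported expression")

    # Malformed input raises SyntaxError from ast.parse before evaluation.
    return evaluate(ast.parse(expression, mode="eval").body)

print(safe_calculate("4871 * 209"))
try:
    safe_calculate("__import__('os').system('id')")
except ValueError as exc:
    print(f"rejected: {exc}")
```

Names, calls, and attribute access never reach evaluation because they are not on the whitelist, so the model can only ever make this tool do arithmetic.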

So which one should you pick?

A decision shortcut, honestly:

I want to understand agents first → none. Build the loop in pure Python, then come back.
Production app on Claude → Claude Agent SDK.
Ship something readable this week → OpenAI Agents SDK.
Complex, stateful, must-control flow → LangGraph.
Role-based team, fast prototype → CrewAI.
Optimizing a high-volume pipeline → DSPy.
Deep in Microsoft or doing research → AutoGen.
All-in on Google Cloud + Gemini → Google ADK.

Then wrap whatever you pick in the layer that actually keeps it alive: sandboxing, observability, and a way to improve its behavior over time.

What's next

Pick one framework from the shortlist, port your single best tool into it, and ship a single agent to a real user. You'll learn more in a week of production traffic than in a month of comparing READMEs. When that agent needs a teammate, that's your cue to look at multi-agent — and at the architecture trade-offs of going there.

Sources

DSPy documentation — signatures, ReAct module, and optimizers used for the DSPy snippet and "compile, don't tune" framing. https://dspy.ai Retrieved June 12, 2026.
Claude Agent SDK (Anthropic docs) — query/ClaudeAgentOptions, MCP and subagent support cited in the Claude section. https://platform.claude.com/docs/en/api/agent-sdk/overview Retrieved June 12, 2026.
OpenAI Agents SDK — agents, handoffs, guardrails, and built-in tracing referenced in the OpenAI section. https://openai.github.io/openai-agents-python/ Retrieved June 12, 2026.
CrewAI documentation — role/goal/backstory model and Flows, used for the CrewAI snippet and "where it breaks" note. https://docs.crewai.com Retrieved June 12, 2026.
AutoGen / Microsoft Agent Framework — v0.4 actor model, AgentChat API, and the Semantic Kernel convergence noted in the AutoGen section. https://microsoft.github.io/autogen/ Retrieved June 12, 2026.
LangGraph documentation — create_react_agent, checkpointing, durable execution, and human-in-the-loop claims. https://langchain-ai.github.io/langgraph/ Retrieved June 12, 2026.
Google Agent Development Kit (ADK) — code-first agents, hierarchical multi-agent, and Vertex Agent Engine deploy path. https://google.github.io/adk-docs/ Retrieved June 12, 2026.
Microsoft Agent Lightning — RL training for any agent framework, cited in the "frameworks are commoditizing" section. https://github.com/microsoft/agent-lightning Retrieved June 12, 2026.
