1. Why Agentic AI Is the Defining Trend of 2026
In 2023 and 2024, the world focused on large language models and what they know. In 2026, the focus has shifted to what AI can do. The shift from chatbots that answer questions to agents that complete tasks autonomously is the most significant transition in applied AI since the launch of ChatGPT.
Consider the numbers: OpenAI's Operator — a fully autonomous web agent — was launched in January 2025 and reached 10 million active users within its first six months. Google's Project Mariner, Anthropic's Computer Use API, and the open-weight Manus AI system all launched or reached production maturity within the same 12-month window. Enterprise adoption of agentic workflows surged 340% year-over-year according to Gartner's February 2026 AI Hype Cycle report.
Agentic AI is trending for a clear reason: it converts AI from a sophisticated autocomplete into a worker that can book a flight, file a bug report, analyse a financial spreadsheet, and send a follow-up email — all from a single natural language instruction. This is not a future promise; it is production software deployed today.
2. What Is an AI Agent — Precise Definition
An AI agent is a system that perceives its environment, reasons over a goal, selects and executes actions using tools, observes the outcomes, and iterates — autonomously and over multiple steps — until the goal is achieved or it determines the goal is unachievable.
The critical distinction from a standard LLM query: a chatbot performs a single input → output cycle. An agent performs a loop: perceive → plan → act → observe → repeat.
2.1 The Four Properties of a True Agent
- Goal-directedness: The agent works toward an objective, not just a single response.
- Tool use: The agent can call external tools (web search, code execution, APIs, databases, file systems).
- Memory: The agent maintains context across steps — short-term (in-context) and optionally long-term (external vector store or database).
- Autonomy: The agent decides its own next action rather than waiting for user prompts at each step, within defined boundaries.
2.2 Agent vs. Chatbot vs. Copilot
| System Type | Steps | Tool Use | Autonomy | Example |
|---|---|---|---|---|
| Chatbot | Single | None | None | ChatGPT basic Q&A |
| Copilot | Single | Limited | Suggestions only | GitHub Copilot code completion |
| AI Agent | Multi-step loop | Full | High | OpenAI Operator booking a flight |
| Multi-Agent System | Distributed loops | Full | Very high | Manus parallel research + writing pipeline |
3. How Autonomous Agents Work — Architecture Deep Dive
3.1 The Agent Loop
Every AI agent, regardless of framework, implements some variant of the following loop:
- Observe: Receive the current state of the environment (user goal, previous actions, tool outputs).
- Think: The LLM reasons over the state and decides the next action (which tool to call, with what arguments, or whether the goal is achieved).
- Act: Execute the chosen action via a tool call.
- Observe result: Receive the tool's output (search results, code output, API response).
- Repeat until done or a step limit is reached.
3.2 ReAct: Reasoning + Acting
The ReAct pattern (Yao et al., 2022) is the foundational prompting strategy for agents. The model alternates between Thought (reasoning about what to do next), Action (calling a tool), and Observation (receiving the result). This interleaving of reasoning with action is what makes agents far more reliable than single-step LLM calls on complex tasks.
Thought: I need to find the current price of NVIDIA stock to complete this report.
Action: web_search("NVIDIA stock price March 2026")
Observation: NVIDIA (NVDA) is trading at $138.42 as of market close March 2, 2026.
Thought: I now have the price. I should also check the 52-week range for context.
Action: web_search("NVIDIA 52-week high low 2026")
Observation: 52-week range: $85.12 – $149.77
Thought: I have all the data I need. I can now write the report section.
Action: finish("NVIDIA trades at $138.42, within its 52-week range of $85.12–$149.77.")
3.3 Tool Calling (Function Calling)
Modern LLMs (GPT-4, Claude 3.5, Gemini 1.5) support native function calling: the model outputs a structured JSON payload specifying which tool to invoke and with what arguments, rather than embedding tool calls in natural language. This makes parsing reliable and enables type-safe tool definitions.
3.4 Memory Architecture
- In-context memory: The conversation history within the current context window. Limited by token count (typically 128K–1M tokens for modern models).
- External memory (vector store): Documents, past interactions, and facts stored in a vector database (Pinecone, Qdrant, pgvector). Retrieved via semantic similarity when relevant.
- Episodic memory: Structured logs of past agent runs stored as retrievable summaries — the agent "remembers" that it already completed a similar task last week.
- Procedural memory: Stored workflows and code — the agent knows how to perform tasks it has learned from past executions.
3.5 Planning Strategies
- Chain-of-Thought (CoT): The model reasons step-by-step before acting. Improves accuracy on complex tasks.
- Tree of Thoughts (ToT): The model explores multiple reasoning branches simultaneously and selects the most promising path. Higher quality but more compute-intensive.
- Plan-and-Execute: A planner LLM generates a full task plan upfront; executor agents carry out each step. Better for well-structured tasks with predictable subtasks.
- Reflexion: The agent evaluates its own outputs, generates verbal feedback, and revises its approach — self-correction without human intervention.
4. Types of AI Agents
| Type | Description | Example | Best For |
|---|---|---|---|
| Task agent | Completes a single, well-defined task end-to-end | OpenAI Operator booking a hotel | Business process automation |
| Research agent | Searches, synthesises, and produces reports | Manus competitive analysis agent | Knowledge work, due diligence |
| Code agent | Writes, tests, and debugs code autonomously | Devin, SWE-agent, GitHub Copilot Workspace | Software development |
| Browser agent | Controls a web browser to interact with websites | Google Mariner, OpenAI Operator, Browser Use | Web automation, data extraction |
| Computer use agent | Controls desktop GUI (mouse, keyboard, screen) | Anthropic Computer Use, Claude Desktop | Legacy app automation |
| Multi-agent system | Network of specialised agents collaborating | AutoGen, LangGraph multi-agent, CrewAI | Complex workflows, parallel execution |
5. Key Products & Frameworks in 2026
5.1 OpenAI Operator
Launched in January 2025, Operator is a cloud-hosted browser agent accessible via ChatGPT. It can navigate websites, fill forms, complete checkouts, manage calendars, and interact with web applications using the same credentials a human would. Operator uses a vision-language model to "see" web pages as screenshots and a reasoning layer to decide what to click, type, or scroll. As of early 2026, Operator supports integrations with over 200 services including Instacart, Doordash, Uber, and StubHub. Its key limitation is that it cannot access private enterprise systems without explicit API connectors.
5.2 Google Project Mariner
Mariner is Google's browser automation agent, built on Gemini and integrated into Chrome. It can execute tasks described in natural language — booking reservations, filling forms, and conducting multi-step research — while showing its work in a transparent side panel. Mariner's tight integration with Google Search and Google Workspace gives it a unique advantage for productivity tasks. Available as a Google Labs experiment as of Q1 2026.
5.3 Manus AI
Manus, developed by the Chinese AI startup Monica, emerged in February 2025 as an open-weight multi-agent system capable of executing complex tasks in parallel across separate virtual machines. Each sub-agent handles a specialised domain (research, coding, writing, data analysis) while an orchestrator coordinates the workflow. Manus gained significant attention for completing tasks that took human teams hours in under 15 minutes. An MIT evaluation (January 2026) found Manus outperformed GPT-4o on the GAIA benchmark for general AI assistants by 23 percentage points. It runs on-premise or via cloud API.
5.4 Anthropic Computer Use
Available since October 2024, Claude's Computer Use API allows the model to take screenshots of a desktop environment, move the mouse, click, and type — operating any application with a GUI. Unlike browser agents limited to websites, Computer Use works with legacy software, internal tools, and desktop applications. It is primarily targeted at enterprise automation of workflows that lack APIs.
5.5 LangGraph
LangGraph (by LangChain) is a Python and JavaScript framework for building stateful, multi-actor agent applications as directed graphs. Nodes represent agents or functions; edges represent transitions based on state. Its key contribution is making agent control flow explicit, observable, and debuggable — solving a critical problem with purely LLM-driven agents where execution paths are opaque. LangGraph supports cycles (loops), parallel execution, checkpointing, and human-in-the-loop interruptions.
5.6 AutoGen (Microsoft)
AutoGen is Microsoft Research's framework for multi-agent conversation. Agents with defined roles (planner, coder, critic, executor) collaborate by sending messages, reviewing each other's outputs, and iterating until a task is complete. AutoGen Studio provides a no-code interface for building and testing multi-agent workflows. Particularly strong for software engineering tasks involving code generation and testing.
5.7 CrewAI
An open-source framework for orchestrating role-based AI agents. Developers define a "crew" of agents, each with a role, goal, and backstory, and assign them tasks with explicit dependencies. CrewAI handles coordination, tool sharing, and output passing between agents. Its simplicity and clear mental model have made it popular for rapid prototyping of multi-agent workflows.
6. Agent Framework Comparison Table
| Framework / Product | Type | Open Source | Language | Key Strength | Best For |
|---|---|---|---|---|---|
| OpenAI Operator | Browser agent | No | — | Consumer-grade simplicity; 200+ integrations | End-user task automation |
| Google Mariner | Browser agent | No | — | Chrome + Workspace integration | Google ecosystem users |
| Manus AI | Multi-agent | Partial | Python | Parallel specialised agents; GAIA SOTA | Complex research & coding |
| Anthropic Computer Use | Computer agent | No (API) | — | Full desktop GUI control | Legacy software automation |
| LangGraph | Framework | Yes | Python / JS | Explicit stateful graph; observable | Production agent pipelines |
| AutoGen | Framework | Yes | Python | Multi-agent conversation; code-focused | Software engineering agents |
| CrewAI | Framework | Yes | Python | Role-based crews; rapid prototyping | Narrative workflows |
| Smolagents (HuggingFace) | Framework | Yes | Python | Minimal, open-weight model support | Research, local models |
7. Practical Code — ReAct Agent from Scratch
Build a minimal ReAct agent in Python using the OpenAI API and two custom tools. This code strips away framework abstractions to show exactly how the agent loop works.
7.1 Setup
pip install openai requests
7.2 Define Tools
import json
import requests
import openai
client = openai.OpenAI() # uses OPENAI_API_KEY env var
# Tool 1: Web search (using DuckDuckGo instant answers API)
def web_search(query: str) -> str:
"""Search the web and return a brief answer."""
url = "https://api.duckduckgo.com/"
params = {"q": query, "format": "json", "no_html": 1, "skip_disambig": 1}
resp = requests.get(url, params=params, timeout=10)
data = resp.json()
# AbstractText is the instant answer; fall back to first related topic
result = data.get("AbstractText") or ""
if not result and data.get("RelatedTopics"):
result = data["RelatedTopics"][0].get("Text", "No result found.")
return result or "No instant answer found. Try a more specific query."
# Tool 2: Calculator
def calculator(expression: str) -> str:
"""Evaluate a safe mathematical expression."""
allowed = set("0123456789+-*/(). ")
if not all(c in allowed for c in expression):
return "Error: unsafe expression"
try:
return str(eval(expression, {"__builtins__": {}}))
except Exception as e:
return f"Error: {e}"
# Tool registry
TOOLS = {
"web_search": web_search,
"calculator": calculator,
}
# OpenAI tool definitions (JSON schema)
TOOL_DEFINITIONS = [
{
"type": "function",
"function": {
"name": "web_search",
"description": "Search the web for current information.",
"parameters": {
"type": "object",
"properties": {
"query": {"type": "string", "description": "The search query"}
},
"required": ["query"]
}
}
},
{
"type": "function",
"function": {
"name": "calculator",
"description": "Evaluate a mathematical expression.",
"parameters": {
"type": "object",
"properties": {
"expression": {"type": "string", "description": "Math expression, e.g. '2 ** 10 + 42'"}
},
"required": ["expression"]
}
}
}
]
7.3 The Agent Loop
def run_agent(user_goal: str, max_steps: int = 10) -> str:
"""Run the ReAct agent loop until completion or step limit."""
messages = [
{"role": "system", "content": (
"You are an autonomous AI agent. Use the available tools to "
"complete the user's goal step by step. "
"When you have enough information to give a final answer, "
"respond with plain text (no tool call)."
)},
{"role": "user", "content": user_goal}
]
for step in range(max_steps):
response = client.chat.completions.create(
model="gpt-4o",
messages=messages,
tools=TOOL_DEFINITIONS,
tool_choice="auto",
)
msg = response.choices[0].message
# If no tool call, the agent has finished reasoning
if not msg.tool_calls:
return msg.content
# Process all tool calls in this step
messages.append(msg) # append assistant message with tool_calls
for tc in msg.tool_calls:
fn_name = tc.function.name
fn_args = json.loads(tc.function.arguments)
print(f" Step {step+1} → {fn_name}({fn_args})")
if fn_name in TOOLS:
result = TOOLS[fn_name](**fn_args)
else:
result = f"Unknown tool: {fn_name}"
print(f" Observation: {result[:120]}...")
messages.append({
"role": "tool",
"tool_call_id": tc.id,
"content": result
})
return "Agent reached step limit without completing the task."
# Run it
print(run_agent("What is the GDP of Germany in 2024 divided by the population?"))
This loop is the essence of every production agent, whether built with LangGraph, LlamaIndex, or a custom framework. The framework adds state management, persistence, error recovery, and observability on top of this core loop.
8. Practical Code — Multi-Agent Pipeline with LangGraph
LangGraph represents agent workflows as stateful directed graphs, making control flow explicit and production-friendly. This example builds a two-agent research pipeline: a Researcher agent that searches the web and a Writer agent that synthesises a report.
8.1 Setup
pip install langgraph langchain-openai langchain-community
8.2 Define State and Agents
from typing import TypedDict, Annotated, List
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, AIMessage
import operator
# --- Shared state schema ---
class ResearchState(TypedDict):
topic: str
search_results: Annotated[List[str], operator.add] # Appended by researcher
report: str
steps_completed: int
# --- LLM ---
llm = ChatOpenAI(model="gpt-4o", temperature=0)
# --- Researcher node ---
def researcher(state: ResearchState) -> ResearchState:
"""Searches for information about the topic."""
topic = state["topic"]
# In production: call real search API (Tavily, SerpAPI, etc.)
prompt = f"Find 3 key facts about: {topic}. Return as a numbered list."
response = llm.invoke([HumanMessage(content=prompt)])
return {
"search_results": [response.content],
"steps_completed": state.get("steps_completed", 0) + 1
}
# --- Writer node ---
def writer(state: ResearchState) -> ResearchState:
"""Synthesises search results into a structured report."""
all_results = "\n\n".join(state["search_results"])
prompt = (
f"Write a clear, concise report on '{state['topic']}' "
f"based on these findings:\n\n{all_results}\n\n"
"Format: Introduction, Key Findings, Conclusion."
)
response = llm.invoke([HumanMessage(content=prompt)])
return {"report": response.content}
# --- Build graph ---
graph = StateGraph(ResearchState)
graph.add_node("researcher", researcher)
graph.add_node("writer", writer)
graph.set_entry_point("researcher")
graph.add_edge("researcher", "writer")
graph.add_edge("writer", END)
pipeline = graph.compile()
8.3 Run the Pipeline
# Run the multi-agent pipeline
result = pipeline.invoke({
"topic": "Agentic AI adoption in enterprise 2026",
"search_results": [],
"report": "",
"steps_completed": 0
})
print(result["report"])
LangGraph's strength is that every node transition, state change, and decision point is logged, checkpointable, and replayable. Adding a human-in-the-loop checkpoint is one line: graph.add_interrupt_before("writer") — the pipeline pauses and waits for human approval before the writer runs.
9. Real-World Use Cases
9.1 Enterprise Process Automation
Companies are deploying agents to automate knowledge work that previously required human judgment: invoice processing and exception handling, IT support ticket triage and resolution, procurement approval workflows, HR onboarding document preparation, and compliance report generation. Salesforce's Agentforce platform reported 100,000 enterprise deployments by Q4 2025, with average task completion times reduced by 60–80% compared to human-only workflows.
9.2 Software Engineering
Code agents (Devin, GitHub Copilot Workspace, SWE-agent) can take a GitHub issue, understand the codebase, implement a fix, write tests, and open a pull request — without developer intervention on routine or well-specified bugs. The SWE-bench benchmark, which measures agent performance on real GitHub issues, saw the top score rise from 12% (early 2024) to 49% (early 2026), driven largely by better planning and multi-agent debate strategies.
9.3 Research & Due Diligence
Financial firms and consulting companies use research agents (Manus, Perplexity Pro) to conduct competitive intelligence, gather regulatory filings, analyse earnings calls, and produce structured reports in minutes. Tasks that previously required analyst teams working for days can now be completed by agents in under an hour for initial drafts requiring human review and validation.
9.4 Customer Operations
Tier-1 customer support agents handle password resets, order tracking, refund processing, and account updates end-to-end — accessing CRM, order management, and payment systems through tool calls. These agents achieve resolution rates of 70–80% without human escalation on routine issues (Sierra AI customer benchmarks, Q1 2026).
9.5 Scientific Research Acceleration
AI agents are being deployed in drug discovery pipelines to search literature, identify candidate molecules, design experiments in simulation, and synthesise findings. AlphaFold 3 combined with orchestration agents can now propose and virtually test protein-binding candidates at a rate impossible for human researchers, compressing early-stage drug discovery timelines from years to months.
10. Deploying Agents in Production
10.1 Architecture Patterns
| Pattern | Description | Pros | Cons |
|---|---|---|---|
| Single agent + tools | One LLM with a tool belt | Simple; easy to debug | Bottleneck on complex parallel tasks |
| Supervisor + workers | Orchestrator delegates to specialised sub-agents | Clear division of labour | Orchestrator becomes a single point of failure |
| Peer-to-peer agents | Agents communicate directly via message passing | Decentralised; resilient | Harder to observe and control |
| Agent + human review | Agent proposes actions; human approves before execution | Safe for high-stakes actions | Introduces latency; requires human availability |
10.2 Observability
Agents executing dozens of steps make debugging opaque without proper instrumentation. Essential tools for production agents:
- LangSmith: Traces every LLM call, tool invocation, and token cost in a LangChain/LangGraph pipeline.
- Arize Phoenix: Open-source LLM observability with span tracing, latency analysis, and prompt regression testing.
- OpenTelemetry: Instrument agent traces using the emerging OpenTelemetry standard for AI (OpenInference) for vendor-agnostic observability.
- Step-level logging: Log every agent decision (tool selected, arguments, output) to a structured store for post-hoc debugging.
10.3 Cost Management
Agents make many more LLM calls than chatbots. A single agent task may invoke GPT-4o 15–30 times. Cost control strategies:
- Use cheaper models (GPT-4o mini, Gemini Flash) for routine tool-call steps; reserve the flagship model for complex reasoning steps.
- Cache frequent tool outputs (web search results, database queries) with a TTL appropriate to data freshness requirements.
- Set hard limits on maximum steps, token budget, and wall-clock time per agent run.
- Monitor cost-per-task continuously with alerting on anomalous spending (runaway loops are a real failure mode).
11. Safety, Reliability & Human-in-the-Loop
11.1 The Minimal Footprint Principle
Agents should request only the permissions they need, prefer reversible actions over irreversible ones, and avoid acquiring resources or capabilities beyond what the current task requires. This principle, articulated in Anthropic's agent safety guidelines, prevents a class of failures where agents take consequential actions the user did not intend.
11.2 Prompt Injection Attacks
When an agent browses the web or reads documents, adversarial content in those external sources can attempt to hijack the agent's instructions — a prompt injection attack. Defences include: sandboxing tool outputs before passing them to the LLM, using a separate validation model to screen tool results, and treating all external content as untrusted user input.
11.3 Human-in-the-Loop Checkpoints
For high-stakes actions (sending emails, executing financial transactions, deleting data, deploying code), agents must pause and request human approval. LangGraph's interrupt mechanism, AutoGen's human proxy agent, and custom confirmation prompts all implement this pattern. The key design decision: define upfront which action classes require approval and which can be auto-executed.
11.4 Action Sandboxing
Execute agent-generated code in isolated containers (Docker, E2B, Daytona). Restrict network access from code execution environments. Use read-only filesystem mounts where possible. Never execute agent-generated shell commands with root or administrator privileges.
12. Limitations & Current Challenges
- Compounding errors: In a 10-step agent chain, a 95%-accurate model makes at least one error 40% of the time. Errors early in the pipeline cascade through subsequent steps.
- Hallucinated tool calls: Agents sometimes call tools with arguments that sound plausible but are incorrect (wrong API parameters, non-existent function names), causing silent failures.
- Context window limits: Very long agent runs can exhaust the context window. Strategies: periodic summarisation, external memory, context compression.
- Planning brittleness: Current agents handle well-structured tasks reliably but struggle with ambiguous goals, unexpected intermediate states, and tasks requiring deep domain expertise.
- Verification difficulty: It is hard to verify that an agent completed a task correctly without significant human review — defeating the purpose of automation for some use cases.
- Latency: Multi-step agent runs take seconds to minutes. This rules out agents for real-time, sub-second response requirements.
- Trust and transparency: Users and organisations struggle to trust systems whose decision processes are not fully auditable, particularly in regulated industries (finance, healthcare, legal).
13. Future Directions
- Agent operating systems: Persistent, long-running agent processes that manage multiple concurrent tasks, maintain memory across weeks or months, and dynamically spawn sub-agents as needed. Research prototypes exist; production systems are 12–24 months away.
- Standardised agent protocols: Anthropic's Model Context Protocol (MCP) and Google's Agent-to-Agent (A2A) protocol are emerging standards for inter-agent communication and tool exposure, enabling agents from different vendors to collaborate safely.
- Self-improving agents: Agents that identify their own failure modes, generate synthetic training data from those failures, and fine-tune themselves — closing the loop between deployment and improvement.
- Embodied agents: Physical robots controlled by LLM-based reasoning agents that perceive real-world sensor data and manipulate physical objects. Early commercial systems include Figure 02 and Boston Dynamics' Spot with language interfaces.
- Formal verification: Emerging research into formally verifying agent behaviour against safety specifications — ensuring mathematical guarantees that certain actions will never be taken, regardless of input.
14. Frequently Asked Questions
Are AI agents replacing human workers?
The evidence from early adopters suggests augmentation rather than wholesale replacement. Agents are most effective at handling high-volume, well-defined subtasks within a broader workflow, freeing humans for judgment-intensive, creative, and relationship-driven work. The WEF's January 2026 Future of Jobs report projects that agentic AI will displace 14% of current tasks while creating new roles in agent oversight, workflow design, and output validation — net job creation in the near term, with significant displacement concentrated in specific functions like data entry, routine report writing, and basic customer support.
What is the difference between LangGraph and AutoGen?
LangGraph models agent workflows as explicit directed graphs, giving developers full control over state transitions and making the execution path transparent. It excels at production pipelines with complex routing logic. AutoGen models agent collaboration as multi-party conversations where agents exchange messages — more natural for tasks that genuinely involve dialogue and debate between agents, particularly code generation and review. AutoGen Studio adds a visual no-code interface. Many teams use both: AutoGen for agent-to-agent negotiation logic, LangGraph for workflow orchestration.
How much does running agents cost?
A single GPT-4o agent task averaging 20 LLM calls with 2,000 tokens each costs approximately $0.10–$0.30. At scale, this adds up quickly: 10,000 tasks/day = $1,000–$3,000/day. Cost optimisation using GPT-4o mini for routine steps reduces this by 80–90%. For high-volume use cases, self-hosted open-weight models (Llama 3.3 70B, Qwen 2.5 72B) running on dedicated GPUs can reduce marginal cost to near zero after hardware costs.
Is OpenAI Operator available internationally?
As of March 2026, Operator is available in the United States for ChatGPT Plus subscribers and in limited beta in the UK, EU (excluding some member states with stricter AI regulations), Canada, and Australia. Expansion to additional markets is planned for H2 2026.
What is the Model Context Protocol (MCP)?
MCP is an open standard proposed by Anthropic in November 2024 that defines how AI agents connect to external tools and data sources. Instead of each agent framework requiring custom integrations with every tool, MCP-compliant tools expose a standardised interface that any MCP-compatible agent can call. Major IDE extensions (Cursor, VS Code Copilot), data platforms (Notion, Linear, GitHub), and cloud services have adopted MCP, making it the emerging standard for agentic tool ecosystems. Think of it as USB-C for AI tools.
How do I start building AI agents today?
The fastest practical path: (1) Start with the bare ReAct loop in the code section above to understand the fundamentals. (2) Build your first production-ready agent with LangGraph — it has excellent documentation and a free hosted tracing tier via LangSmith. (3) Explore CrewAI for multi-agent workflows with a more intuitive API. (4) Move to Manus or AutoGen when you need parallel specialised agents for complex research or coding tasks. All frameworks have free tiers and active communities.
15. Glossary
- Agentic AI
- AI systems that autonomously plan, take actions using tools, and pursue goals over multiple steps without requiring human input at each step.
- Tool Calling (Function Calling)
- A capability of modern LLMs to output structured JSON requesting invocation of an external function or API, rather than describing the call in natural language.
- ReAct
- A prompting pattern (Reasoning + Acting) where an agent alternates between reasoning steps (Thought) and action execution (Action/Observation), improving reliability on multi-step tasks.
- LangGraph
- A Python/JavaScript framework by LangChain for building stateful multi-actor agent applications as directed graphs with explicit state transitions.
- AutoGen
- Microsoft Research's framework for multi-agent collaboration via structured conversation between agents with defined roles.
- Agent Orchestrator
- A high-level agent or system that breaks a complex goal into subtasks, delegates them to specialised worker agents, and combines their outputs.
- Human-in-the-Loop (HITL)
- A design pattern where an agent pauses at defined checkpoints and requires human review or approval before continuing execution.
- Model Context Protocol (MCP)
- An open standard by Anthropic defining how AI agents connect to external tools and data sources with a consistent, interoperable interface.
- Prompt Injection
- An attack where adversarial instructions embedded in agent-readable external content (web pages, documents) attempt to override the agent's original instructions.
- GAIA Benchmark
- A benchmark by Meta AI Research measuring general AI assistant performance on real-world tasks requiring multi-step reasoning, tool use, and factual accuracy.
- Reflexion
- A technique where an agent evaluates its own output, generates verbal feedback, and revises its approach without external human guidance.
- Minimal Footprint Principle
- A safety guideline stating that agents should acquire only the permissions and resources strictly necessary for the current task, preferring reversible actions over irreversible ones.
16. References & Further Reading
- Yao et al. — ReAct: Synergizing Reasoning and Acting in Language Models (2022)
- Yao et al. — Tree of Thoughts: Deliberate Problem Solving with Large Language Models (2023)
- Shinn et al. — Reflexion: Language Agents with Verbal Reinforcement Learning (2023)
- Wang et al. — Voyager: An Open-Ended Embodied Agent with Large Language Models (2023)
- SWE-bench: Can Language Models Resolve Real-World GitHub Issues? (2024)
- Anthropic — Building Effective Agents (2024)
- Anthropic — Model Context Protocol (MCP) Specification (2024)
- LangGraph Official Documentation
- Microsoft AutoGen Documentation
- Mialon et al. — GAIA: A Benchmark for General AI Assistants (2023)
Start building: implement the bare ReAct loop from section 7 with your OpenAI API key — it takes under 30 minutes and gives you an intuitive feel for how the agent reasoning loop works. Then graduate to LangGraph for a production-ready stateful pipeline. The gap between "I understand agentic AI conceptually" and "I have a working agent in production" is smaller than you think.