AI Integration Overview
Language-agnostic concepts: agents, tools, MCP, communication protocols, and how to choose the right framework for integrating AI into your applications.
Integrating AI into an application goes far beyond making an API call to a model. It means designing how the model interacts with your domain, what tools it has available, how it communicates with external systems, and how you maintain control and traceability over everything it does.
This section covers the foundational, language-agnostic concepts you need to understand before choosing a framework. Concrete implementations are in the specific pages for TypeScript, PHP, and Python.
Mental model: from LLM to application
A language model on its own is a function from text to text. To be useful inside a real application, you need:
Application
├── Orchestration layer – decides what to do with the model's response
├── Integration layer – connects the model with your systems
│   ├── Tools / Functions – actions the model can invoke
│   ├── Context – relevant information the model receives
│   └── Memory – persistence of information across turns
└── Observation layer – traces, metrics, evaluation
The combination of these layers is what an AI framework provides out of the box. Without one, you build them manually.
Universal primitives
Regardless of language or framework, every serious AI integration revolves around the same primitives:
1. Tool / Function calling
The model's ability to invoke functions in your code. The model decides when and with what arguments to call each tool; your code executes them.
LLM → "I need to look up order 42"
   ↓
tool: get_order(id: 42)
   ↓
App → executes the function, returns result
   ↓
LLM → "Order 42 is in 'shipped' status"
Each tool is described with:
- Name: unique identifier
- Description: natural language text explaining what it does (critical: the model uses this to decide whether to call it)
- Parameter schema: typically JSON Schema or a typed equivalent
- Handler: the actual function that executes the action
{
  "name": "get_order",
  "description": "Retrieves an order by its ID including status, items, and shipping info",
  "parameters": {
    "type": "object",
    "properties": {
      "id": { "type": "integer", "description": "The order ID" }
    },
    "required": ["id"]
  }
}
The description is the contract
A tool description is not documentation: it's the instruction the model reads to decide whether to use it and how. An ambiguous or incomplete description produces incorrect calls or unnecessary invocations. Treat it with the same care as a system prompt.
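The loop above can be sketched in a few lines. This is a minimal illustration with the LLM stubbed out: `fake_llm`, `get_order`, and the message shapes are illustrative, not any real SDK's API.

```python
# The app owns the loop: the model requests a tool, the app executes it and
# feeds the result back until the model produces a final answer.
TOOLS = {
    "get_order": lambda args: {"id": args["id"], "status": "shipped"},
}

def fake_llm(messages):
    """Stand-in for a real model call: first asks for a tool, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "get_order", "arguments": {"id": 42}}}
    return {"content": "Order 42 is in 'shipped' status"}

def run(user_message):
    messages = [{"role": "user", "content": user_message}]
    while True:
        reply = fake_llm(messages)
        if "tool_call" not in reply:
            return reply["content"]                      # final answer
        call = reply["tool_call"]
        result = TOOLS[call["name"]](call["arguments"])  # app executes the tool
        messages.append({"role": "tool", "content": result})

print(run("What is the status of order 42?"))  # Order 42 is in 'shipped' status
```

Note that the model never executes anything itself; it only emits a request, and your code decides whether and how to honor it.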
2. Structured output
Forcing the model to produce responses in a specific format (JSON, XML, etc.) rather than free text. Essential for integrating responses into business workflows.
Prompt: "Extract the order data from this email..."
Unstructured output: "The order is number 42, the customer is..."
Structured output: { "order_id": 42, "customer": "...", "items": [...] }
Modern frameworks expose this directly from your type schema (Zod in TS, Pydantic in Python, etc.), without needing to parse text manually.
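A stdlib-only sketch of the idea: in practice a framework derives the JSON schema from your Zod/Pydantic model and validates the response for you, but the essence is parsing the model's JSON into a typed object that fails loudly on bad output.

```python
import json
from dataclasses import dataclass

@dataclass
class OrderExtraction:
    order_id: int
    customer: str
    items: list

def parse_order(raw_json: str) -> OrderExtraction:
    data = json.loads(raw_json)        # fails loudly on malformed output
    return OrderExtraction(**data)     # fails loudly on missing/extra fields

raw = '{"order_id": 42, "customer": "Ada", "items": ["keyboard"]}'  # model output
order = parse_order(raw)
print(order.order_id)  # 42
```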
3. Streaming
LLM responses are generated token by token. Streaming lets you show text to the user as it's generated rather than waiting for the full response; it dramatically improves perceived speed.
Without streaming: user waits 8 seconds → receives complete text
With streaming:    user sees text appear progressively from the first token
In REST APIs, streaming typically uses Server-Sent Events (SSE) or chunked transfer encoding. Frameworks abstract this with iterators, streams, or reactive hooks.
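The consumer side usually looks like iterating a stream of chunks. A sketch with a fake token generator standing in for the SDK's stream:

```python
def fake_token_stream():
    """Stand-in for an SDK stream fed by SSE / chunked HTTP."""
    for token in ["Order ", "42 ", "is ", "shipped."]:
        yield token

def render_stream(stream):
    full = ""
    for token in stream:
        full += token
        # in a web app you would flush `token` to the client here
    return full

print(render_stream(fake_token_stream()))  # Order 42 is shipped.
```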
4. Memory and context
An LLM is stateless: it doesn't remember previous conversations. "Memory" is simply context that your application manages and sends with each call:
| Type | Description | Storage |
|---|---|---|
| In-context | Message history within the current context window | In-memory message array |
| Short-term | Summaries or fragments of recent conversations | Redis, database |
| Long-term | Persistent information about the user or entity | Vector DB, relational database |
| Episodic | Results and learnings from past executions | Database + embeddings |
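In-context memory, the simplest row of the table, is just the message list your app maintains, trimmed when it outgrows the context window. A sketch where the word-count "tokenizer" and the budget are toy stand-ins for real tokenization:

```python
def trim_history(messages, max_tokens=20):
    def cost(msg):
        return len(msg["content"].split())  # crude stand-in for real token counting
    trimmed = list(messages)
    # always keep the system prompt (index 0); drop the oldest turns after it
    while len(trimmed) > 2 and sum(cost(m) for m in trimmed) > max_tokens:
        trimmed.pop(1)
    return trimmed

history = [
    {"role": "system", "content": "You are a support assistant"},
    {"role": "user", "content": "My order 41 never arrived and I want a refund now"},
    {"role": "assistant", "content": "I am sorry to hear that, let me check order 41"},
    {"role": "user", "content": "Also check order 42"},
]
print(len(trim_history(history, max_tokens=20)))
```

Short-term and long-term memory follow the same principle, except the trimmed-out turns get summarized or embedded and stored externally instead of discarded.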
5. Agent
An agent is an LLM operating in a loop: observe, decide an action (tool call or response), execute, observe the result, repeat. See AI Agents for the full architecture.
What frameworks add over a manual loop:
- Agent state management across steps
- Error handling and retries
- Tracing of each decision
- Context injection at each iteration
- Types and interfaces for tools and responses
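A manual loop with two of those guardrails (a step limit and per-tool error handling) might look like this sketch, where `decide` stands in for the LLM:

```python
def run_agent(decide, tools, task, max_steps=5):
    state = {"task": task, "observations": []}
    for _ in range(max_steps):
        action = decide(state)               # LLM chooses a tool or finishes
        if action["type"] == "final":
            return action["answer"]
        try:
            result = tools[action["tool"]](action["args"])
        except Exception as exc:             # feed failures back instead of crashing
            result = {"error": str(exc)}
        state["observations"].append(result)
    raise RuntimeError("agent exceeded max_steps")

# Toy policy: look the order up once, then answer from the observation.
def decide(state):
    if not state["observations"]:
        return {"type": "tool", "tool": "get_order", "args": {"id": 42}}
    return {"type": "final", "answer": state["observations"][0]["status"]}

tools = {"get_order": lambda args: {"id": args["id"], "status": "shipped"}}
print(run_agent(decide, tools, "check order 42"))  # shipped
```

A framework replaces the hand-rolled `state`, retries, and step accounting with typed, traced equivalents.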
MCP: Model Context Protocol
MCP is an open protocol (originally from Anthropic, now under AAIF) that standardizes how agents connect with external tools. It is, essentially, the "USB-C of agent integrations".
Without MCP: Agent A ↔ Tool (Agent A's own format)
             Agent B ↔ Tool (Agent B's own format)   ← same tool, two implementations
With MCP:    Agent A ↔ MCP ↔ Tool
             Agent B ↔ MCP ↔ Tool                    ← same tool, one protocol
How MCP works
MCP defines a client-server protocol:
Application/Agent (MCP Client)
    ↕ JSON-RPC 2.0 over stdio / HTTP+SSE
MCP Server (exposes tools, resources, prompts)
    ↕
External system (database, API, filesystem, etc.)
An MCP Server exposes three types of capabilities:
| Capability | Description | Example |
|---|---|---|
| Tools | Functions the model can invoke | execute_sql, create_issue |
| Resources | Data the model can read | Files, DB records, URLs |
| Prompts | Reusable prompt templates | Analysis templates, summaries |
Implementing an MCP Server (language-agnostic pseudocode)
MCP Server "database-server":
tools:
- name: "query_database"
description: "Execute a read-only SQL query"
parameters:
sql: string (required)
limit: integer (optional, default: 100)
handler:
validate sql is SELECT only
execute against read replica
return results as JSON
resources:
- uri: "db://schema"
description: "Current database schema"
handler:
return INFORMATION_SCHEMA tables as markdown
transport: stdio // or HTTP+SSE for remote servers
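What actually travels over the wire is JSON-RPC 2.0; tool invocation uses the `tools/call` method. The sketch below dispatches one such request to the hypothetical "query_database" tool from the pseudocode above; note the result shape is simplified (real MCP servers wrap results in a `content` array).

```python
import json

def handle_request(raw, tools):
    req = json.loads(raw)
    if req["method"] == "tools/call":
        name = req["params"]["name"]
        args = req["params"].get("arguments", {})
        return {"jsonrpc": "2.0", "id": req["id"], "result": tools[name](args)}
    # standard JSON-RPC error for unknown methods
    return {"jsonrpc": "2.0", "id": req["id"],
            "error": {"code": -32601, "message": "Method not found"}}

tools = {"query_database": lambda args: {"rows": [], "sql": args["sql"]}}
request = json.dumps({"jsonrpc": "2.0", "id": 1, "method": "tools/call",
                      "params": {"name": "query_database",
                                 "arguments": {"sql": "SELECT 1"}}})
print(handle_request(request, tools)["result"]["sql"])  # SELECT 1
```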
MCP adoption
As of early 2026, there are over 10,000 public MCP servers. If you expose capabilities for agents, implementing MCP means any agent (Claude, GPT-4, Gemini, any framework) can use your tools without framework-specific integration code.
MCP vs direct tools
| Direct tools | MCP | |
|---|---|---|
| Integration | Custom code in the same process | Separate process, standard protocol |
| Reusability | One implementation per framework | One implementation, any framework |
| Security | Direct runtime access | Isolated process, controlled attack surface |
| Overhead | None (in-process) | IPC or HTTP (typically <5ms) |
| Best for | App-specific tools | Shared tools, infrastructure, third-parties |
Agent-to-agent communication protocols
When a system has multiple agents that need to coordinate:
A2A Protocol (Agent-to-Agent)
Google's proposal (2025) for communication between agents across different systems. Where MCP solves the agent↔tool link, A2A solves agent↔agent:
Orchestrator Agent ↔ A2A ↔ Specialist Agent A
                   ↔ A2A ↔ Specialist Agent B
Each agent publishes an Agent Card (JSON) describing its capabilities, allowing the orchestrator to discover and delegate dynamically.
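An Agent Card might look roughly like this; field names here are illustrative of the idea, so consult the A2A specification for the exact schema:

```json
{
  "name": "researcher-agent",
  "description": "Finds and summarizes sources on a given topic",
  "url": "https://agents.example.com/researcher",
  "skills": [
    { "id": "web_research", "description": "Search and synthesize web sources" }
  ]
}
```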
Direct handoff
For simpler systems, handoff can simply be a tool call where the result includes the complete state for the next agent:
orchestrator_tool: delegate_to_researcher(task, context)
   → researcher runs
   → returns { findings, sources, confidence }
orchestrator uses findings to call: delegate_to_writer(task, findings)
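As plain code, a direct handoff is just function composition: delegation is a tool call whose return value carries the state the next agent needs. Both "agents" in this sketch are stubs.

```python
def delegate_to_researcher(task, context):
    # stub: a real implementation would run a research agent here
    return {"findings": ["fact A", "fact B"], "sources": ["doc-1"], "confidence": 0.8}

def delegate_to_writer(task, findings):
    # stub: a real implementation would run a writing agent here
    return f"Report on '{task}' built from {len(findings)} findings"

def orchestrate(task):
    research = delegate_to_researcher(task, context={})
    return delegate_to_writer(task, research["findings"])

print(orchestrate("Q3 sales"))  # Report on 'Q3 sales' built from 2 findings
```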
Observability: agent tracing
Agents are hard to debug because errors chain together. You need traceability at the level of each decision, not just the final input/output.
The minimum unit of tracing is a span:
Span: process_user_request (root)
├── Span: llm_call (model=claude-3-5-sonnet, tokens=1847)
│     input: [messages array]
│     output: tool_call: search_orders(query="...")
├── Span: tool_execution (tool=search_orders)
│     input: { query: "..." }
│     output: [3 orders]
├── Span: llm_call (model=claude-3-5-sonnet, tokens=2103)
│     input: [messages + tool result]
│     output: final response
└── metadata: total_tokens=3950, duration=4.2s, cost=$0.012
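A minimal sketch of span collection; real systems use OpenTelemetry or an LLM-specific platform, but the core shape is the same, namely timed and nested records:

```python
import time
from contextlib import contextmanager

TRACE, _stack = [], []

@contextmanager
def span(name, **metadata):
    record = {"name": name, "metadata": metadata, "children": []}
    # attach to the current parent, or to the trace root
    (_stack[-1]["children"] if _stack else TRACE).append(record)
    _stack.append(record)
    start = time.perf_counter()
    try:
        yield record
    finally:
        record["duration_s"] = time.perf_counter() - start
        _stack.pop()

with span("process_user_request"):
    with span("llm_call", model="claude-3-5-sonnet"):
        pass
    with span("tool_execution", tool="search_orders"):
        pass

print([child["name"] for child in TRACE[0]["children"]])
# ['llm_call', 'tool_execution']
```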
Relevant standards
| Standard | Purpose |
|---|---|
| OpenTelemetry | Distributed traces, metrics, and logs. Natively supported by most frameworks. |
| LangSmith | LLM-specific platform for tracing, evals, and datasets. Integrates with LangChain/LangGraph but also works as an agnostic SDK. |
| Braintrust | LangSmith alternative, more focused on evals. |
How to choose a framework
The decision depends primarily on the project's language, but there are nuances:
What type of application are you building?
│
├── API/backend with AI responses (streaming, structured output)
│     ├── TypeScript → AI SDK (Vercel)
│     ├── PHP → Symfony AI or Neuron AI
│     └── Python → Pydantic AI
│
├── Complex agents with workflows and memory
│     ├── TypeScript → Mastra
│     ├── PHP → Neuron AI
│     └── Python → LangGraph
│
├── RAG + semantic search
│     ├── TypeScript → AI SDK + vector store
│     └── Python → LangChain / LlamaIndex
│
└── Observability / evals on existing LLM
      └── LangSmith (framework-agnostic)
Avoid over-engineering
You don't need an agent framework to add AI to your application. If you only need structured output or a streaming call, use the provider's official SDK directly. Agent frameworks add value when you have tool loops, multi-step workflows, or persistent memory.
Integration patterns by use case
Case 1: Data enrichment
The model processes entities from your domain and adds structured information.
Input: Order { id, description, items }
LLM: classifies category, extracts intent, suggests tags
Output: Order + { category, intent, tags, confidence }
Framework: Direct SDK with structured output. No tools, no agent.
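The flow above, with the model call stubbed out: the order goes in, a structured classification comes back, and the app merges the two. `classify_order` stands in for a single structured-output LLM call.

```python
def classify_order(order):
    # stub for one structured-output LLM call; values are illustrative
    return {"category": "electronics", "intent": "purchase",
            "tags": ["gadget"], "confidence": 0.92}

def enrich(order):
    # merge the original entity with the model's structured additions
    return {**order, **classify_order(order)}

order = {"id": 42, "description": "wireless keyboard", "items": ["kb-01"]}
print(enrich(order)["category"])  # electronics
```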
Case 2: Conversational interface over data
The user asks questions in natural language about your system.
User: "How many pending orders are there over €500?"
LLM: calls tool count_orders(status="pending", min_amount=500)
App: executes query, returns 12
LLM: "There are 12 pending orders over 500 euros"
Framework: SDK with tools or lightweight framework (AI SDK, Pydantic AI).
Case 3: Process automation
The agent completes a multi-step task without human intervention.
Task: "Review open PRs, comment on those with type errors, and close ones with no activity for over 30 days"
Agent: list_prs() β analyse_each() β comment() / close()
Framework: Full agent framework (Mastra, LangGraph, Neuron AI).
Case 4: Structured content generation
The model generates documents, reports, or code from context.
Input: project data, metrics, history
Output: executive report in Markdown / JSON
Framework: Direct SDK with structured output + prompt engineering.
Concrete implementations for each ecosystem are covered in the dedicated pages for TypeScript, PHP, and Python.