mirror of
https://github.com/coder/coder.git
synced 2026-06-03 21:18:24 +00:00
256 lines
12 KiB
Markdown
256 lines
12 KiB
Markdown
# Architecture
|
||
|
||
Coder's AI agent interacts with workspaces over the same
|
||
connection path as a developer's IDE, web terminal, and SSH session already
|
||
use. There is no sidecar process and no new network paths. If your developers
|
||
can already connect to their workspaces, the agent can too.
|
||
|
||
## Architecture at a glance
|
||
|
||
Three components are involved in every agent interaction:
|
||
|
||
1. **The control plane** runs the agent loop. It receives prompts, streams them
|
||
to the LLM provider, interprets tool calls, and dispatches them to
|
||
workspaces.
|
||
1. **The LLM provider** (Anthropic, OpenAI, Google, Azure, AWS Bedrock, or any
|
||
OpenAI-compatible endpoint) performs model inference. It never communicates
|
||
with the workspace directly.
|
||
1. **The workspace** is standard compute infrastructure. It runs shell commands,
|
||
reads and writes files, and executes processes — exactly what occurs when a
|
||
developer connects via their IDE.
|
||
|
||
<img src="../../images/guides/ai-agents/agent-loop-detailed.png" alt="Architecture diagram">
|
||
|
||
## The same connection your IDE uses
|
||
|
||
This is the key architectural insight: the agent reaches into a workspace
|
||
over the same Tailnet tunnel that a developer's tools already use.
|
||
|
||
When a developer opens a web terminal in the Coder dashboard, connects via
|
||
VS Code Remote, or runs `coder ssh`, the traffic follows this path:
|
||
|
||
1. The client connects to the control plane.
|
||
1. The control plane routes the connection through its internal Tailnet node.
|
||
1. The connection reaches the workspace daemon over a DERP relay or
|
||
direct peer-to-peer link.
|
||
1. The workspace daemon handles the request — spawning a shell,
|
||
forwarding a port, or serving a file.
|
||
|
||
When the agent executes a tool call — reading a file, running a command,
|
||
writing code — it follows the same tunnel:
|
||
|
||
1. The agent loop in the control plane issues a tool call.
|
||
1. The control plane routes the call through its internal Tailnet node.
|
||
1. The call reaches the workspace daemon over the same DERP relay or
|
||
peer-to-peer link.
|
||
1. The workspace daemon handles the request via its HTTP API — reading a file,
|
||
starting a process, or writing content.
|
||
|
||
The underlying tunnel is identical. IDE connections use SSH, web terminals use
|
||
a WebSocket protocol, and the agent uses the workspace daemon's HTTP API — but
|
||
all three traverse the same Tailnet connection and rely on the same security
|
||
boundary. No additional ports or network paths are introduced.
|
||
|
||
### No inbound ports
|
||
|
||
The workspace daemon always dials _out_ to the control plane — never
|
||
the reverse. The control plane then uses that established tunnel to reach back
|
||
in. This means:
|
||
|
||
- The workspace needs no inbound ports or exposed services.
|
||
- You can block all inbound traffic to the workspace.
|
||
- The only required outbound connection from the workspace is to the control
|
||
plane itself.
|
||
|
||
This is unchanged from how workspaces already operate in Coder. Enabling
|
||
Coder Agents does not change your workspace network requirements.
|
||
|
||
## The agent loop
|
||
|
||
When a user submits a prompt, the control plane processes it as a background
|
||
job:
|
||
|
||
1. The prompt is saved to the database and the chat is marked `pending`.
|
||
1. The control plane picks up the chat and marks it `running`.
|
||
1. The control plane streams the conversation to the configured LLM provider.
|
||
1. The model responds with text, reasoning, or tool calls.
|
||
1. If the response includes tool calls, the control plane executes them
|
||
(connecting to the workspace as needed) and returns the results to the model.
|
||
1. Steps 3–5 repeat until the model produces a final response with no further
|
||
tool calls.
|
||
1. The chat is marked `waiting` for the next user message.
|
||
|
||
This loop runs inside the control plane process. There is no separate service
|
||
to deploy — it is part of the same binary that serves the dashboard and API.
|
||
|
||
### Context compaction
|
||
|
||
As conversations grow, the agent automatically summarizes older context to stay
|
||
within the model's context window. When token usage exceeds a threshold, the
|
||
agent generates a compressed summary and inserts it as a new message. Earlier
|
||
messages remain in the database and are still visible to users, but are excluded
|
||
from the model's context window. This happens transparently and keeps
|
||
long-running sessions productive.
|
||
|
||
### Message queuing
|
||
|
||
Users can send follow-up messages while the agent is actively working. Messages
|
||
are queued in the database and delivered when the agent completes its current
|
||
turn — the full sequence of steps until the model stops calling tools. There is
|
||
no need to wait for a response before providing additional context or
|
||
redirecting the agent.
|
||
|
||
## Tool execution
|
||
|
||
Tools are how the agent takes action. Each tool call from the LLM translates to
|
||
a concrete operation — either inside a workspace or within the control plane
|
||
itself.
|
||
|
||
### Workspace connection lifecycle
|
||
|
||
The connection to a workspace is **lazy**. It is not established when a chat
|
||
starts — only when something needs to reach the workspace. This is typically
|
||
triggered by the first tool call that requires workspace access. Once
|
||
established, the connection is cached and reused for the duration of that chat
|
||
session.
|
||
|
||
Chats that don't need workspace access (answering questions, planning an
|
||
approach, discussing architecture) never provision or connect to a workspace.
|
||
|
||
### Workspace tools
|
||
|
||
These tools execute inside the workspace via the workspace daemon's HTTP API.
|
||
They traverse the same Tailnet tunnel used by web terminals and IDE connections.
|
||
|
||
| Tool | What it does |
|
||
|------------------|--------------------------------------------------------------------|
|
||
| `read_file` | Reads file contents with line-number pagination. |
|
||
| `write_file` | Writes content to a file. |
|
||
| `edit_files` | Performs atomic search-and-replace edits across one or more files. |
|
||
| `execute` | Runs a shell command (foreground or background). |
|
||
| `process_output` | Retrieves output from a background process. |
|
||
| `process_list` | Lists all tracked processes in the workspace. |
|
||
| `process_signal` | Sends a signal (SIGTERM or SIGKILL) to a background process. |
|
||
|
||
### Platform tools
|
||
|
||
These tools run entirely within the control plane. They do not require a
|
||
workspace connection.
|
||
|
||
| Tool | What it does |
|
||
|--------------------|-------------------------------------------------------------------|
|
||
| `list_templates` | Browses available workspace templates, sorted by popularity. |
|
||
| `read_template` | Gets template details and configurable parameters. |
|
||
| `create_workspace` | Creates a workspace from a template and waits for it to be ready. |
|
||
|
||
### Orchestration tools
|
||
|
||
These tools manage sub-agents — child chats that work on independent tasks in
|
||
parallel.
|
||
|
||
| Tool | What it does |
|
||
|-----------------|--------------------------------------------------------------|
|
||
| `spawn_agent` | Delegates a task to a sub-agent with its own context window. |
|
||
| `wait_agent` | Waits for a sub-agent to finish and collects its result. |
|
||
| `message_agent` | Sends a follow-up message to a running sub-agent. |
|
||
| `close_agent` | Stops a running sub-agent. |
|
||
|
||
## What runs where
|
||
|
||
Understanding the split between the control plane and the workspace is central
|
||
to the security model.
|
||
|
||
| Responsibility | Where it runs | Details |
|
||
|---------------------|---------------|---------------------------------------------------------------------------|
|
||
| Agent loop | Control plane | Prompt processing, tool dispatch, step iteration. |
|
||
| LLM inference | LLM provider | The control plane streams requests to the external provider. |
|
||
| Chat state | Control plane | All messages, token usage, and status stored in the database. |
|
||
| Git authentication | Control plane | Uses existing Coder external auth (GitHub, GitLab, Bitbucket). |
|
||
| User identity | Control plane | Every action is tied to the user who submitted the prompt. |
|
||
| Model/prompt config | Control plane | Administrators configure providers, models, and system prompts centrally. |
|
||
| File read/write | Workspace | The workspace file system is the source of truth for code. |
|
||
| Shell execution | Workspace | Commands run in the workspace's environment with its packages and tools. |
|
||
| Git operations | Workspace | Commits, pushes, and branch management happen inside the workspace. |
|
||
| Build and test | Workspace | Compilation, test suites, and dev servers run on workspace compute. |
|
||
|
||
The workspace has **zero AI awareness**. There are no LLM API keys, no agent
|
||
processes, and no AI-specific software installed. If you inspect a workspace
|
||
created by the agent, it looks identical to one a developer created
|
||
manually.
|
||
|
||
## Chat state and persistence
|
||
|
||
All chat data is stored in the control plane database, not in the workspace.
|
||
|
||
- **Chat metadata** — status, owner, associated workspace, timestamps, and
|
||
parent/child relationships for sub-agents.
|
||
- **Messages** — every message (user, assistant, tool calls, tool results) is
|
||
stored as a separate record with role, content, and token usage.
|
||
- **Compressed context** — when the agent compacts the conversation, summaries
|
||
are stored with a compression flag so the original context budget is
|
||
preserved.
|
||
- **Queued messages** — follow-up messages sent while the agent is working are
|
||
held in a queue and delivered in order.
|
||
|
||
Because state lives in the database:
|
||
|
||
- Chat history survives workspace stops, rebuilds, and deletions.
|
||
- An administrator can inspect any chat for audit or debugging.
|
||
- The agent can resume work by targeting a new workspace and continuing from the
|
||
last git branch or checkpoint.
|
||
|
||
## Security implications
|
||
|
||
The control plane architecture has direct consequences for how you secure AI
|
||
coding workflows.
|
||
|
||
### No API keys in workspaces
|
||
|
||
LLM provider credentials exist only in the control plane. The workspace never
|
||
sees them. There is nothing for a developer, a compromised dependency, or a
|
||
rogue process to exfiltrate.
|
||
|
||
### Workspaces can be fully network-isolated
|
||
|
||
Because the workspace does not need to reach any LLM provider, you can restrict
|
||
its network access to only:
|
||
|
||
- The control plane (required for the workspace daemon to function).
|
||
- Your git provider (for push/pull operations).
|
||
|
||
Everything else can be blocked. The AI functionality comes from the control
|
||
plane, not from the workspace's network.
|
||
|
||
> [!TIP]
|
||
> For sensitive environments, create dedicated templates for agent workloads
|
||
> with stricter egress rules than your standard developer templates. Because
|
||
> the AI comes from the control plane, these templates do not need any
|
||
> outbound access to LLM providers.
|
||
|
||
### Centralized enforcement
|
||
|
||
Administrators control which models are available, the system prompt, and tool
|
||
configuration from the control plane. Developers can select from the set of
|
||
admin-enabled models when starting or continuing a chat, but cannot add their
|
||
own providers or override system prompts or tool permissions. When an
|
||
administrator removes a model or modifies the system prompt, the change applies
|
||
to all agent sessions immediately.
|
||
|
||
### User identity on every action
|
||
|
||
Every action the agent takes — PRs opened, code committed, commands executed —
|
||
is tied to the user who submitted the prompt. There is no shared bot account or
|
||
anonymous identity. If a developer submits a prompt that results in a pull
|
||
request, that pull request is attributed to them via the git authentication
|
||
already configured in your Coder deployment.
|
||
|
||
## Scaling and resource impact
|
||
|
||
The control plane overhead for Coder Agents is minimal. The heavy computation
|
||
happens elsewhere:
|
||
|
||
- **LLM inference** runs on the external provider's infrastructure.
|
||
- **File I/O, builds, and tests** run on workspace compute.
|
||
- **The control plane** primarily proxies streaming responses and dispatches
|
||
tool calls over existing network connections.
|