Commit Graph

19 Commits

Author SHA1 Message Date
Steven Masley 51b531f5b3 chore: 'go generate' mockgen to use go tool wrapper (#25490)
Calling `mockgen` relies on the executable in the `$PATH`. Using `go
tool` uses the one defined in `go.mod`
2026-05-19 14:53:13 +00:00
Mathias Fredriksson 6b0bb02e5d fix: server-side diffs and stricter fuzzy splicing for edit_files (#24454)
Fixes three classes of edit_files bugs and adds structured per-file
diff output for tool callers:

- New IncludeDiff flag on FileEditRequest; when set, the agent
  returns FileEditResponse.Files[]{Path, Diff} with unified diffs
  computed via go-udiff v0.4.1 Lines + ToUnified (not Unified,
  which calls log.Fatalf on internal error).
- Fuzzy match comparators split each line into leading whitespace,
  body, trailing whitespace, and ending. The splice substitutes at
  each position: on agreement between search and replace the file's
  bytes win; on disagreement the replacement's bytes are spliced
  verbatim. Carve-outs for empty-body lines, multi-line EOF splices,
  and level-aware indent translation for inserted lines.
- Indent-unit detection (GCD for spaces, tab-priority) lets a 4sp
  LLM search insert correctly into tab or 2sp files. Falls back to
  the previous cLead-inheritance path when units can't be detected
  cleanly.
- Empty search is rejected with "search string must not be empty".
- Duplicate file paths in one request are rejected; symlink aliases
  resolved via api.resolvePath before the dedup check.
- Frontend EditFilesRenderer consumes the structured files array by
  explicit path (no label munging) with per-file synthetic fallback
  for older agents or mismatched paths. On error, no diff is
  rendered so the synthetic fallback doesn't misrepresent a
  rejected edit as applied.

Breaking change: AgentConn.EditFiles changes from (ctx, req) error
to (ctx, req) (FileEditResponse, error) in codersdk/workspacesdk.
Source-breaking for external Go consumers; no compat shim per plan
owner.

Out of scope (tracked in CODAGT-214): level-aware indent for
middle-substituted splice lines. Locked in
TestEditFiles_FuzzyIndent_InsertionLevelAware's Lock_* cases plus
TestEditFiles_ReplaceAll_FuzzyIndentGap.
2026-04-18 16:39:34 +03:00
Michael Suchacz 1cf0354f72 feat: add plan mode with restricted tool boundary (#24236)
> This PR was authored by Mux on behalf of Mike.

## Summary
- add persistent plan mode for chats and the chat-specific plan file
flow
- add structured planning tools such as `ask_user_question` and
`propose_plan`
- keep `write_file` and `edit_files` constrained to the chat-specific
plan file during plan turns
- allow shell exploration in plan mode, including subagents, via
`execute` and `process_output`
- block implementation-oriented, provider-native, MCP, dynamic, and
computer-use tools during plan turns
- update the chat UI, tests, and docs for the new planning flow
2026-04-16 11:12:01 +02:00
Hugo Dutka efb19eb748 feat: agents desktop recording thumbnail backend (#24022)
The agents chat interface displays thumbnails for videos recorded by the
computer use agent. Currently, to display a thumbnail, the frontend
downloads the entire video and shows the first frame. This PR starts
storing a new thumbnail file in the database for every recorded video,
and exposes the file id in the `wait_agent` tool result alongside the
recording file id, so the frontend can fetch just the thumbnail.
2026-04-09 13:47:54 +02:00
Hugo Dutka 17dec2a70f feat: agents desktop recordings backend (#23894)
This PR introduces screen recording of the computer use agent using the
virtual desktop.

- Screen recording is triggered by a `wait_agent` tool call. Recording
is stopped by a successful `wait_agent` tool call or when there hasn't
been any desktop activity for 10 minutes.
- Recordings are handled by the `portabledesktop` cli via the `record`
command. The videos are sped up in periods of inactivity.
- Recordings are saved to the database to the `chat_files` table.
There's a hard limit of 100MB per recording. Larger recordings are
dropped.
- A successful `wait_agent` on a computer use subagent tool call returns
a `recording_file_id`, later allowing the frontend to display the
corresponding video.
2026-04-02 17:23:27 +00:00
Kyle Carberry ee855f9618 feat: make agent context paths configurable via env vars (#23878)
Replace hardcoded paths for instruction files, skills, and MCP config
with
values read from `CODER_AGENT_EXP_*` environment variables. Template
authors
configure paths via the existing `coder_agent` `env` block. The agent
resolves `~`, relative, and absolute paths locally, then serves the
resolved config over `GET /api/v0/context-config`. `chatd` fetches this
once per workspace attach and falls back to today's defaults for older
agents.

All path env vars are comma-separated, allowing multiple directories:

| Env Var | Default | Controls |
|---|---|---|
| `CODER_AGENT_EXP_INSTRUCTIONS_DIRS` | `~/.coder` | Dirs containing the
instruction file |
| `CODER_AGENT_EXP_INSTRUCTIONS_FILE` | `AGENTS.md` | Instruction file
name |
| `CODER_AGENT_EXP_SKILLS_DIRS` | `.agents/skills` | Skills directories
|
| `CODER_AGENT_EXP_SKILL_META_FILE` | `SKILL.md` | Skill metadata file
name |
| `CODER_AGENT_EXP_MCP_CONFIG_FILES` | `.mcp.json` | MCP config files |

### Example

```hcl
resource "coder_agent" "main" {
  os   = "linux"
  arch = "amd64"
  env = {
    CODER_AGENT_EXP_INSTRUCTIONS_DIRS  = "/opt/company/agent-config,~/.coder"
    CODER_AGENT_EXP_INSTRUCTIONS_FILE  = "CLAUDE.md"
    CODER_AGENT_EXP_SKILLS_DIRS        = "/opt/company/ai-skills,.agents/skills"
    CODER_AGENT_EXP_MCP_CONFIG_FILES   = "/opt/company/mcp.json,.mcp.json"
  }
}
```

<details>
<summary>Implementation Details</summary>

### Architecture

Follows the same pattern as MCP tool discovery:
agent resolves locally → exposes via HTTP → chatd consumes.

**Agent-side** (`agent/agentcontextconfig/`):
- `ResolvePath` / `ResolvePaths` handle `~`, relative, and absolute path
forms; returns `""` for relative paths when baseDir is empty
- `Config` reads env vars, falls back to defaults, resolves all paths
- `GET /api/v0/context-config` serves the resolved config as JSON

**chatd-side** (`coderd/x/chatd/`):
- Calls `conn.ContextConfig()` once on first workspace attach
- Falls back to hardcoded defaults on 404 (older agents)
- Iterates instruction dirs, skills dirs using resolved absolute paths
- `LSRelativityRoot` everywhere — no more home/root juggling

### Key design decisions

- **`EXP_` prefix**: env vars use `CODER_AGENT_EXP_*` to indicate
experimental status
- **Plural names**: comma-separated vars use plural names (`DIRS`,
`FILES`); single-value vars use singular (`FILE`)
- **Defaults in `workspacesdk`**: default constants live in
`codersdk/workspacesdk/` so both agent and server reference them without
cross-layer imports
- **`skillMetaFile` persistence**: stored on context-file parts via
`ContextFileSkillMetaFile` and restored on subsequent chat turns so
custom values survive across turns
- **Working dir dedup**: `slices.Contains` guard prevents reading the
same instruction file from both `InstructionsDirs` and the working
directory
- **MCP server dedup**: first-occurrence-wins dedup prevents leaking
duplicate connections from overlapping config files
- **ResolvePath safety**: returns `""` for relative paths when `baseDir`
is empty, so `ResolvePaths` filters them out

### Files changed

| File | Change |
|---|---|
| `agent/agentcontextconfig/` | New package — path resolution + HTTP
endpoint |
| `codersdk/workspacesdk/agentconn.go` | `ContextConfigResponse` type,
default constants, client method |
| `agent/agent.go` + `agent/api.go` | Wire up endpoint, pass config to
MCP |
| `agent/x/agentmcp/manager.go` | Accept `[]string` MCP config paths,
dedup by name |
| `coderd/x/chatd/chatd.go` | Fetch config, thread through, named
returns |
| `coderd/x/chatd/instruction.go` | Accept configurable dir + file name,
`skillMetaFileFromParts` |
| `coderd/x/chatd/chattool/skill.go` | Accept configurable dirs + meta
file |
| `codersdk/chats.go` | `ContextFileSkillMetaFile` field for persistence
|

### Test coverage

- `TestConfig` (4 cases): defaults, custom env vars, whitespace
trimming, comma-separated dirs
- `TestResolvePath` / `TestResolvePaths`: including empty baseDir edge
case
- `TestPersistInstructionFilesFallbackOnOlderAgent`: backward-compat
path when `ContextConfig` returns 404
- `TestChatMessagePartVariantTags`: updated exclusion list for new
internal field

### Backward compatibility

Older agents return 404 for the new endpoint. `chatd` catches this and
falls back to today's defaults via `readHomeInstructionFile` (using
`LSRelativityHome`). Existing workspaces work with no changes.

</details>
2026-04-01 12:28:47 -04:00
Kyle Carberry 0f86c4237e feat: add workspace MCP tool discovery and proxying for chat (#23680)
Coder's chat (chatd) can now discover and use MCP servers configured in
a workspace's `.mcp.json` file. This brings project-specific tooling
(GitHub, databases, docs servers, etc.) into the chat without any manual
configuration.

## How it works

The workspace agent reads `.mcp.json` from the workspace directory (same
format Claude Code uses), connects to the declared MCP servers —
spawning child processes for stdio servers and connecting over the
network for HTTP/SSE — and caches their tool lists. Two new agent HTTP
endpoints expose this:

- `GET /api/v0/mcp/tools` returns the cached tool list (supports
`?refresh=true`)
- `POST /api/v0/mcp/call-tool` proxies calls to the correct server

On each chat turn, chatd calls `ListMCPTools` through the existing
`AgentConn` tailnet connection, wraps each tool as a
`fantasy.AgentTool`, and adds them to the LLM's tool set alongside
built-in and admin-configured MCP tools. Tool names are prefixed with
the server name (`github__create_issue`) to avoid collisions.

Failed server connections are logged and skipped — they never block the
agent or break the chat. Child stdio processes are terminated on agent
shutdown.
2026-03-26 19:57:02 +00:00
Mathias Fredriksson 41e15ae440 feat: make process output blocking-capable (#23312)
Replace the 200ms polling loop in chatd's execute and
process_output tools with server-side blocking via sync.Cond
on HeadTailBuffer.

The agent's GET /{id}/output endpoint accepts ?wait=true to
block until the process exits or a 5-minute server cap expires.
The process_output tool blocks by default for 10s (overridable
via wait_timeout), and falls back to a non-blocking snapshot on
timeout. The execute tool's foreground path makes a single
blocking call instead of polling.

Related #23316
2026-03-20 14:33:55 +02:00
Hugo Dutka 84527390c6 feat: chat desktop backend (#23005)
Implement the backend for the desktop feature for agents.

- Adds a new `/api/experimental/chats/$id/desktop` endpoint to coderd
which exposes a VNC stream from a
[portabledesktop](https://github.com/coder/portabledesktop) process
running inside the workspace
- Adds a new `spawn_computer_use_agent` tool to chatd, which spawns a
subagent that has access to the `computer` tool which lets it interact
with the `portabledesktop` process running inside the workspace
- Adds the plumbing to make the above possible

There's a follow up frontend PR here:
https://github.com/coder/coder/pull/23006
2026-03-13 19:49:34 +01:00
Hugo Dutka 48ab492f49 feat: agents git watch backend (#22565)
Adds real-time git status watching for workspace agents, so the frontend
can subscribe over WebSocket and show
git file changes in near real-time.

1. Subscription is scoped to a **chat** via `GET
/api/experimental/chats/{chat}/git/watch`.
2. The workspace agent automatically determines which paths to watch
based on tool calls made by the chat (and its ancestor chats).
3. Workspace agent polls subscribed repo working trees on a 30s
interval, on tools calls, and on explicit `refresh` from the client.
4. Scans are rate-limited to at most once per second.
5. Edited paths are tracked **in-memory** inside the workspace agent.
There is no database persistence — state is lost on agent restart. This
will be addresses in a future PR.
6. Messages sent over WebSocket include a full-repo snapshot (unified
diff, branch, origin). A new message is emitted only when the snapshot
changes.

This PR was implemented with AI with me closely controlling what it's
doing. The code follows a plan file that was updated continuously during
implementation. Here's the file if you'd like to see it:
[project.md](https://gist.github.com/hugodutka/8722cf80c92f8a56555f7bc595b770e2).
It reflects the current state of the PR.
2026-03-06 10:47:55 +01:00
Kyle Carberry a621c3cb13 feat(agent): add process execution API and rewrite execute tool (#22416)
## Summary

Adds a new agent-side process management HTTP API and rewrites the chat
execute tool to use it instead of SSH sessions.

## What changed

### New agent/agentproc/ package

- **headtail.go** — Thread-safe io.Writer with bounded memory (16KB head
+ 16KB tail ring buffer). Provides LLM-ready output with truncation
metadata and long-line truncation at 2048 bytes.
- **headtail_test.go** — 16 tests including race detector coverage for
concurrent writes.
- **process.go** — Manager + Process types for lifecycle management
using agentexec.Execer for proper OOM/nice scores.
- **api.go** — HTTP API following the agentfiles chi router pattern. 4
endpoints: start, list, output, signal.

### Agent wiring (agent/agent.go, agent/api.go)

Mounts the process API at /api/v0/processes, mirroring how agentfiles is
mounted.

### SDK (codersdk/workspacesdk/agentconn.go)

4 new AgentConn interface methods + 7 request/response types:
- StartProcess, ListProcesses, ProcessOutput, SignalProcess

### Execute tool rewrite (coderd/chatd/chattool/execute.go)

- SSH to Agent API: conn.StartProcess() + conn.ProcessOutput() polling
- New parameters: workdir, run_in_background
- Structured response: success, exit_code, wall_duration_ms, error,
truncated, note, background_process_id
- Non-interactive env vars: GIT_EDITOR=true, TERM=dumb, NO_COLOR=1,
PAGER=cat, etc.
- Output truncation: HeadTailBuffer caps at 32KB for LLM consumption
- File-dump detection with advisory notes suggesting read_file
- Default timeout: 60s to 10s
- Foreground polling: 200ms intervals until exit or timeout

## Architecture

State lives on the agent, surviving coderd failover and instance
changes. Any coderd replica can query any agent via HTTP over tailnet.
2026-02-28 12:33:52 -05:00
Kyle Carberry b65c0766d2 feat: add line-based read_file tool with safety limits (#22400)
## Summary

Adds a new line-based file reading endpoint to the workspace agent,
replacing the unbounded byte-based approach for the `read_file` chat
tool and `coder_workspace_read_file` MCP tool.

**Problem**: The current `read_file` tool returns the entire file
contents with no limits, which can blow up LLM context windows and cause
OOM issues with large files.

**Solution**: Inspired by [`coder/mux`](https://github.com/coder/mux)
and [`openai/codex`](https://github.com/openai/codex), implement a
line-based reader with safety limits.

## Changes

### Agent (`agent/agentfiles/`)
- New `/read-file-lines` endpoint with `HandleReadFileLines` handler
- Line-based `offset` (1-based line number, default: 1) and `limit`
(line count, default: 2000)
- Safety constants:
  | Constant | Value | Purpose |
  |---|---|---|
  | `MaxFileSize` | 1 MB | Reject files larger than this at stat |
| `MaxLineBytes` | 1,024 | Per-line truncation with `... [truncated]`
marker |
  | `MaxResponseLines` | 2,000 | Max lines per response |
  | `MaxResponseBytes` | 32 KB | Max total response size |
  | `DefaultLineLimit` | 2,000 | Default when no limit specified |
- Line numbering format: `1\tcontent` (tab-separated)
- Structured JSON response: `{ success, file_size, total_lines,
lines_read, content, error }`
- Hard errors when limits exceeded — tells the LLM to use
`offset`/`limit`
- Existing byte-based `/read-file` endpoint preserved (used by
`instruction.go`)

### SDK (`codersdk/workspacesdk/`)
- `ReadFileLinesResponse` type added
- `ReadFileLines` method added to `AgentConn` interface
- Mock regenerated

### Chat tool (`coderd/chatd/chattool/`)
- `read_file` tool now uses `conn.ReadFileLines()` instead of
`conn.ReadFile()`
- Updated tool description to document line-based parameters
- Response includes `file_size`, `total_lines`, `lines_read` metadata

### MCP tool (`codersdk/toolsdk/`)
- `coder_workspace_read_file` updated to use line-based reading
- Schema descriptions updated for line-based offset/limit
- Removed `maxFileLimit` constant (agent handles limits now)

### Tests
- 13 new test cases for `TestReadFileLines`:
- Path validation (empty, relative, non-existent, directory, no
permissions)
  - Empty file handling
  - Basic read, offset, limit, offset+limit combinations
  - Offset beyond file length
  - Long line truncation (>1024 bytes)
  - Large file rejection (>1MB)
- All existing tests pass unchanged

## Design decisions

| Decision | Rationale |
|---|---|
| Line-based, not byte-based | Both coder/mux and openai/codex use
line-based — matches how LLMs reason about code |
| Default limit of 2000 | Matches codex; prevents accidental full-file
dumps while being generous |
| 32 KB response cap | Compromise between mux (16 KB) and codex (no cap)
|
| 1024 byte/line truncation with marker | More generous than codex
(500), marker helps LLM know data is missing |
| Hard errors on overflow | Matches mux; forces LLM to paginate rather
than getting partial data |
| Preserve byte-based endpoint | `instruction.go` needs raw byte access
for AGENTS.md |
2026-02-27 15:12:56 -05:00
Spike Curtis 49b34a716a fix: fix slog to always use array of Fields (#21426)
Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptible call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder).

It also updates dependencies that also use slog and were updated.

I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule.

Other dependencies, I pushed new tags.
2026-01-08 10:29:41 +04:00
Danielle Maywood 05529139bc feat(coderd): support deleting dev containers (#21248)
Add an endpoint to coderd to support deleting dev containers
2025-12-24 12:34:39 +00:00
Asher be7aa58075 feat: add coder_workspace_ls MCP tool (#19652) 2025-09-12 15:57:15 -08:00
Asher 30330abaea feat: add coder_workspace_edit_file MCP tool (#19629) 2025-09-12 15:36:14 -08:00
Asher d5a02d570f feat: add coder_workspace_write_file MCP tool (#19591) 2025-09-11 12:17:15 -08:00
Asher 4bf63b4068 feat: add coder_workspace_read_file MCP tool (#19562)
Follows similarly to the bash tool (and some code to connect to an agent
was extracted from it).

There are two main parts: a new agent endpoint, and then a new MCP tool
that consumes that endpoint.
2025-09-09 15:12:24 -08:00
Danielle Maywood 5e84d257b7 refactor: convert workspacesdk.AgentConn to an interface (#19392)
Fixes https://github.com/coder/internal/issues/907

We convert `workspacesdk.AgentConn` to an interface and generate a mock
for it. This allows writing `coderd` tests that rely on the agent's HTTP
api to not have to set up an entire tailnet networking stack.
2025-08-20 10:00:44 +01:00