coder

mirror of https://github.com/coder/coder.git synced 2026-06-02 20:48:20 +00:00

Author	SHA1	Message	Date
Steven Masley	1afc6d4fd0	feat: structured disconnect attribution for agent logs (#25191 ) Implements [PLAT-60](https://linear.app/codercom/issue/PLAT-60/enhance-disconnect-logs-with-structured-reason-attribution): adds structured disconnect attribution to disconnect logs throughout the agent and tailnet packages. Every disconnect log site now carries structured slog fields. All existing logs remain; existing messages are preserved with the fields added alongside. New fields on disconnect log lines: - `connect_type` — which layer disconnected: `server_to_agent`, `agent_to_client`, or `client_to_server` - `disconnect_reason` — categorical reason: `graceful`, `network_error`, `server_shutdown`, etc. - `disconnect_expected` — whether the disconnect is normal operation (`true`) or should be investigated (`false`) - `disconnect_initiator` — who started it: `client`, `agent`, `server`, or `network` (control-plane sites only) - `disconnect_detail` — free-form supplemental info (where useful) ## What's covered Control plane (`server_to_agent`): coordination RPC, DERP map subscriber, agent runLoop, agent Close, `BasicCoordination.Close`, `Controller.run`. Data plane (`agent_to_client`): SSH sessions, reconnecting PTY, JetBrains port-forwarding. <details> <summary>Control-plane sites</summary> \| Site \| Reason \| Initiator \| \|---\|---\|---\| \| `agent/agent.go` `runLoop` EOF \| `network_error` \| `network` \| \| `agent/agent.go` `runCoordinator` deferred exit \| `server_shutdown` / `graceful` / `network_error` \| `agent` / `server` / `network` \| \| `agent/agent.go` `runDERPMapSubscriber` deferred exit \| same (shared `classifyCoordinatorRPCExit`) \| same \| \| `agent/agent.go` `Close` shutdown timeout \| `server_shutdown` + detail \| `agent` \| \| `agent/agent.go` `Close` clean coord disconnect \| `server_shutdown` \| `agent` \| \| `tailnet/controllers.go` `BasicCoordination.Close` \| `graceful` or `network_error` \| `c.initiator` \| \| `tailnet/controllers.go` `Controller.run` `net.ErrClosed` \| `network_error` \| `network` \| </details> <details> <summary>Data-plane sites</summary> \| Site \| Reason \| Notes \| \|---\|---\|---\| \| `agent/agentssh/agentssh.go` SSH session closed \| free-form (`graceful`, `process exited with error status: N`, etc.) \| Also sets `closeCause("normal exit")` for clean exits so coderd's `connection_log.DisconnectReason` is no longer empty \| \| `agent/reconnectingpty/server.go` PTY closed \| `server_shutdown`, error string, or `graceful` \| \| \| `agent/agentssh/jetbrainstrack.go` channel closed \| `normal close` or error string \| Previously passed empty reason \| </details> <details> <summary>Bug fix</summary> The deferred `disconnected from coordination RPC` log no longer fires when the initial `Coordinate()` RPC call fails before any connection is established. </details> Refs PLAT-60. --- _This PR was prepared by Coder Agents on behalf of @Emyrk._ Manually QA'd a lot of common disconnects --------- Co-authored-by: Coder Agents <noreply@coder.com>	2026-05-19 09:47:03 -05:00
Mathias Fredriksson	ca6450cf94	fix(agent): gate MCP tool discovery on startup (#25034 ) The first `/mcp/tools` request could race workspace startup and return an empty tool list before startup scripts had a chance to write `.mcp.json`. Chatd may only discover tools once for a turn, so that empty response could hide workspace MCP tools even though the agent loaded them later. Make the manager wait for startup to settle before treating missing MCP config files as a real empty state. Tool listing now goes through one manager-owned path that starts reload work independently of caller cancellation; caller contexts only bound that caller's wait. After the first reload body settles, transient reload errors return cached tools with the error so the HTTP handler can degrade to the last known tool set instead of returning `[]`. The handler is intentionally thin: it asks the manager for tools, logs any degraded path, and still returns the tool response shape callers already expect. Tests cover startup gating, caller-canceled waits, manager close, reload timeout via quartz, and cached-tool fallback after a later reload error.	2026-05-11 12:57:22 +03:00
Sas Swart	1ba7139f21	feat: add session correlation fields to BoundaryLog proto (#24809 ) 1 of 9 [next >>](https://github.com/coder/coder/pull/24811) RFC: [Bridge ↔ Boundaries Correlation RFC](https://www.notion.so/Bridge-Boundaries-Correlation-313d579be59281f3b4efdbfd6896775a) Adds three new proto fields for boundary session correlation. `ReportBoundaryLogsRequest` - `session_id` (string, field 2) — UUID generated by boundary at startup, shared across all batches from a single run. - `confined_process` (string, field 3) — name of the confined process (e.g. `claude-code`, `codex`, `copilot`). `BoundaryLog` - `sequence_number` (uint64, field 4) — monotonically increasing counter per session, primary ordering key when boundary is in use. `BoundaryLog.time` already existed at field 2; no change needed there. API version bumped to v2.9. No behaviour change in coderd or the agent. This is a pure schema bump that the boundary repo will consume in its own stack. > Generated by Coder Agents	2026-05-05 10:36:26 +02:00
Mathias Fredriksson	881df9a5b0	feat: reload MCP config on change via lazy stat-on-request (#24700 ) The MCP manager previously read .mcp.json exactly once at agent startup. Editing the file had no effect until workspace rebuild or agent restart. handleListTools now stats config file mtimes on every tool-list request and triggers a differential reload when any file changed. Unchanged servers keep their client pointer so in-flight tool calls survive. Concurrent reload requests coalesce via singleflight. MCP stdio subprocesses use the agent's execer for resource limits and receive the same enriched environment as SSH sessions via updateEnv. On the chatd side, WorkspaceMCPTool.Run detects 404 responses from CallMCPTool (indicating the server was removed) and drops the chat's cached tool list so the next turn refetches from the agent.	2026-04-28 19:47:14 +03:00
Mathias Fredriksson	3c450899ea	fix: pass agent context config explicitly instead of reading env (#24759 ) The CODER_AGENT_EXP_* env vars are agent-internal options. When set in the workspace environment they leak to MCP subprocesses and user shells. ReadEnvConfig() captures the values and ClearEnvVars() strips them before the reinit loop, so config survives agent restarts. NewAPI and ReadEnvConfig both use applyDefaults() to fill zero fields. The chatd test passes config via agenttest.WithContextConfigFromEnv().	2026-04-28 17:58:28 +03:00
Zach	72f35e1cd3	feat: runtime user secrets injection into workspaces (#24313 ) Injects user secrets into workspace agents at runtime via the agent manifest. Secrets with an environment variable name are set as environment variables in every agent session and startup script. Secrets with a file path are written to disk before startup scripts run. - Fetch user secrets in GetManifest and convert to proto - Defensively strip secrets from manifests received by the agent to avoid accidental leakage - Add WorkspaceSecret type and proto conversion helpers to agentsdk - Write secret files eagerly on manifest fetch (0600 perms, 0700 dirs) - Inject secret env vars per-session in updateCommandEnv - Expand ~/paths using caller-resolved home directory - Log file write errors without blocking workspace startup	2026-04-17 16:55:24 -06:00
Spike Curtis	4c1a32cd7c	feat: wire DERPTLSConfig through CLI, SDK, tailnet, VPN, agent, and health checks (#24435 ) Wire DERPTLSConfig through the CLI, SDK, tailnet, VPN client, agent, and health checks to allow custom TLS configuration for DERP connections. The main use case is to be able to set a custom CA and also present client certs (mTLS). See https://github.com/coder/tailscale/pull/105 for related changes. Adds three new global CLI flags: - `--client-tls-ca-file` / `CODER_CLIENT_TLS_CA_FILE` - `--client-tls-cert-file` / `CODER_CLIENT_TLS_CERT_FILE` - `--client-tls-key-file` / `CODER_CLIENT_TLS_KEY_FILE` Based on community PR #22695 by @ibdafna, with autogeneration issues fixed (protobuf version mismatches in .pb.go files, golden file regeneration, lint fixes). > [!NOTE] > This PR was authored by Coder Agents on behalf of a Coder team member. <details> <summary>Relationship to #22695</summary> This is a clean reimplementation of the changes from #22695 on top of current `main`, with the following differences: - Removed: Accidental protobuf version changes in `.pb.go` files (contributor had `protoc v6.33.4` vs project's `protoc v4.23.4`) - Added: Properly regenerated golden files and docs via `make gen` - Fixed: Lint issue (`var-declaration` revive warning on explicit type in `createHTTPClient`) - All meaningful code changes are identical to the original PR </details>	2026-04-16 12:46:52 -04:00
Garrett Delfosse	7b7baea851	feat: support disabling reverse/local port forwarding in agent SSH server (#24026 ) The agent SSH server unconditionally allows all four SSH forwarding paths (TCP local, TCP reverse, Unix local, Unix reverse). This is a sandbox escape vector when workspaces are used for AI agent containment — a reverse tunnel lets anything inside the workspace reach the user's local machine, bypassing network isolation. This adds two new agent CLI flags / environment variables: - `--block-reverse-port-forwarding` / `CODER_AGENT_BLOCK_REVERSE_PORT_FORWARDING` — blocks both TCP (`ssh -R`) and Unix socket reverse forwarding - `--block-local-port-forwarding` / `CODER_AGENT_BLOCK_LOCAL_PORT_FORWARDING` — blocks both TCP (`ssh -L`) and Unix socket local forwarding Template admins can set these via the `env` block on the container/VM resource that runs the agent (e.g. `docker_container`, `kubernetes_pod`), or via `coder_env` resources tied to the agent. Fixes https://github.com/coder/coder/issues/22275 <details> <summary>Implementation notes</summary> Follows the existing `BlockFileTransfer` pattern: 1. `agent/agentssh/agentssh.go` — New `BlockReversePortForwarding` and `BlockLocalPortForwarding` fields on `Config`. TCP callbacks check these before allowing forwarding. The `direct-streamlocal@openssh.com` channel handler is wrapped to reject Unix local forwards. 2. `agent/agentssh/forward.go` — `forwardedUnixHandler` gains a `blockReversePortForwarding` field to reject `streamlocal-forward@openssh.com` requests. 3. `agent/agent.go` — New fields on `Options` and `agent` struct, plumbed to SSH config. 4. `cli/agent.go` — New serpent flags with env vars. 5. Tests cover all four blocked paths: TCP local, TCP reverse, Unix local, Unix reverse. </details> > 🤖 Generated by Coder Agents	2026-04-08 10:41:55 -04:00
Kyle Carberry	919dc299fc	feat: agent reads context files and discovers skills locally (#23935 ) Piggybacks on #23878. Moves instruction file reading and skill discovery from `chatd` (server-side, via multiple `LS`/`ReadFile` round-trips through the agent connection) to the agent itself (local filesystem access). This intentionally drops backward compatibility with older agents that don't support the context-config endpoint. Agents and server are deployed together; there is no rolling-update contract to maintain here. ## What changed The agent's `GET /api/v0/context-config` response now returns `[]ChatMessagePart` directly — the same types chatd persists. This eliminates intermediate type conversions and makes the protocol extensible. \| Field \| Type \| Description \| \|---\|---\|---\| \| `parts` \| `[]ChatMessagePart` \| Context-file and skill parts, ready to persist \| \| `working_dir` \| `string` \| Agent's resolved working directory \| Removed from the response: `instructions_dirs`, `instructions_file`, `skills_dirs`, `skill_meta_file`, `mcp_config_files` — the agent reads files locally and returns their content as parts. Removed from chatd: all legacy `LS`/`ReadFile` fallback code (`readHomeInstructionFile`, `readInstructionDirFile`, `DiscoverSkills` via LS, etc). ## Why The previous architecture had the agent resolve paths, serve them over HTTP, then `chatd` make N+1 round-trips back through the agent connection to read files. The agent has direct filesystem access and should just read the files. ## Key design decisions - Agent returns `ChatMessagePart` directly — same types chatd persists. No intermediate `InstructionFileEntry`/`SkillEntry` types needed. - `SkillMeta.MetaFile` — persisted via `ContextFileSkillMetaFile` on the skill part, so custom meta file names (`CODER_AGENT_EXP_SKILL_META_FILE`) survive across chat turns. - No pre-read body — `read_skill` always dials the workspace to fetch the skill body on demand. Simpler than caching the body in the response. - MCP config paths kept agent-internal — `MCPConfigFiles()` getter, not sent over the wire. - No backward compat fallback — old agents that don't support context-config get no instruction files. This is acceptable since agent and server deploy together.	2026-04-04 12:45:46 -04:00
Hugo Dutka	17dec2a70f	feat: agents desktop recordings backend (#23894 ) This PR introduces screen recording of the computer use agent using the virtual desktop. - Screen recording is triggered by a `wait_agent` tool call. Recording is stopped by a successful `wait_agent` tool call or when there hasn't been any desktop activity for 10 minutes. - Recordings are handled by the `portabledesktop` cli via the `record` command. The videos are sped up in periods of inactivity. - Recordings are saved to the database to the `chat_files` table. There's a hard limit of 100MB per recording. Larger recordings are dropped. - A successful `wait_agent` on a computer use subagent tool call returns a `recording_file_id`, later allowing the frontend to display the corresponding video.	2026-04-02 17:23:27 +00:00
Cian Johnston	cd784c755a	fix(agent): exorcise data race haunting contextConfigAPI on reconnect (#23946 ) Fixes: coder/internal#1441 - Move `contextConfigAPI` init from `handleManifest` to `init()`, matching all other API fields - Change `agentcontextconfig.NewAPI` to accept `func() string` closure (lazy directory evaluation) - `Config()` and HTTP handler now compute on demand via `a.manifest.Load().Directory` - Widen `TestAgent_Reconnect` to loop 5 reconnections with a non-empty manifest directory - Add `TestContextConfigAPI_InitOnce` internal test verifying lazy eval across manifest changes - Add `TestNewAPI_LazyDirectory` unit test for the lazy contract > 🤖 Written by a Coder Agent. Reviewed by a human.	2026-04-02 09:00:13 +01:00
Kyle Carberry	ee855f9618	feat: make agent context paths configurable via env vars (#23878 ) Replace hardcoded paths for instruction files, skills, and MCP config with values read from `CODER_AGENT_EXP_` environment variables. Template authors configure paths via the existing `coder_agent` `env` block. The agent resolves `~`, relative, and absolute paths locally, then serves the resolved config over `GET /api/v0/context-config`. `chatd` fetches this once per workspace attach and falls back to today's defaults for older agents. All path env vars are comma-separated, allowing multiple directories: \| Env Var \| Default \| Controls \| \|---\|---\|---\| \| `CODER_AGENT_EXP_INSTRUCTIONS_DIRS` \| `~/.coder` \| Dirs containing the instruction file \| \| `CODER_AGENT_EXP_INSTRUCTIONS_FILE` \| `AGENTS.md` \| Instruction file name \| \| `CODER_AGENT_EXP_SKILLS_DIRS` \| `.agents/skills` \| Skills directories \| \| `CODER_AGENT_EXP_SKILL_META_FILE` \| `SKILL.md` \| Skill metadata file name \| \| `CODER_AGENT_EXP_MCP_CONFIG_FILES` \| `.mcp.json` \| MCP config files \| ### Example ```hcl resource "coder_agent" "main" { os = "linux" arch = "amd64" env = { CODER_AGENT_EXP_INSTRUCTIONS_DIRS = "/opt/company/agent-config,~/.coder" CODER_AGENT_EXP_INSTRUCTIONS_FILE = "CLAUDE.md" CODER_AGENT_EXP_SKILLS_DIRS = "/opt/company/ai-skills,.agents/skills" CODER_AGENT_EXP_MCP_CONFIG_FILES = "/opt/company/mcp.json,.mcp.json" } } ``` <details> <summary>Implementation Details</summary> ### Architecture Follows the same pattern as MCP tool discovery: agent resolves locally → exposes via HTTP → chatd consumes. Agent-side* (`agent/agentcontextconfig/`): - `ResolvePath` / `ResolvePaths` handle `~`, relative, and absolute path forms; returns `""` for relative paths when baseDir is empty - `Config` reads env vars, falls back to defaults, resolves all paths - `GET /api/v0/context-config` serves the resolved config as JSON chatd-side (`coderd/x/chatd/`): - Calls `conn.ContextConfig()` once on first workspace attach - Falls back to hardcoded defaults on 404 (older agents) - Iterates instruction dirs, skills dirs using resolved absolute paths - `LSRelativityRoot` everywhere — no more home/root juggling ### Key design decisions - `EXP_` prefix: env vars use `CODER_AGENT_EXP_` to indicate experimental status - Plural names: comma-separated vars use plural names (`DIRS`, `FILES`); single-value vars use singular (`FILE`) - Defaults in `workspacesdk`: default constants live in `codersdk/workspacesdk/` so both agent and server reference them without cross-layer imports - `skillMetaFile` persistence: stored on context-file parts via `ContextFileSkillMetaFile` and restored on subsequent chat turns so custom values survive across turns - Working dir dedup: `slices.Contains` guard prevents reading the same instruction file from both `InstructionsDirs` and the working directory - MCP server dedup: first-occurrence-wins dedup prevents leaking duplicate connections from overlapping config files - ResolvePath safety*: returns `""` for relative paths when `baseDir` is empty, so `ResolvePaths` filters them out ### Files changed \| File \| Change \| \|---\|---\| \| `agent/agentcontextconfig/` \| New package — path resolution + HTTP endpoint \| \| `codersdk/workspacesdk/agentconn.go` \| `ContextConfigResponse` type, default constants, client method \| \| `agent/agent.go` + `agent/api.go` \| Wire up endpoint, pass config to MCP \| \| `agent/x/agentmcp/manager.go` \| Accept `[]string` MCP config paths, dedup by name \| \| `coderd/x/chatd/chatd.go` \| Fetch config, thread through, named returns \| \| `coderd/x/chatd/instruction.go` \| Accept configurable dir + file name, `skillMetaFileFromParts` \| \| `coderd/x/chatd/chattool/skill.go` \| Accept configurable dirs + meta file \| \| `codersdk/chats.go` \| `ContextFileSkillMetaFile` field for persistence \| ### Test coverage - `TestConfig` (4 cases): defaults, custom env vars, whitespace trimming, comma-separated dirs - `TestResolvePath` / `TestResolvePaths`: including empty baseDir edge case - `TestPersistInstructionFilesFallbackOnOlderAgent`: backward-compat path when `ContextConfig` returns 404 - `TestChatMessagePartVariantTags`: updated exclusion list for new internal field ### Backward compatibility Older agents return 404 for the new endpoint. `chatd` catches this and falls back to today's defaults via `readHomeInstructionFile` (using `LSRelativityHome`). Existing workspaces work with no changes. </details>	2026-04-01 12:28:47 -04:00
Kyle Carberry	0f86c4237e	feat: add workspace MCP tool discovery and proxying for chat (#23680 ) Coder's chat (chatd) can now discover and use MCP servers configured in a workspace's `.mcp.json` file. This brings project-specific tooling (GitHub, databases, docs servers, etc.) into the chat without any manual configuration. ## How it works The workspace agent reads `.mcp.json` from the workspace directory (same format Claude Code uses), connects to the declared MCP servers — spawning child processes for stdio servers and connecting over the network for HTTP/SSE — and caches their tool lists. Two new agent HTTP endpoints expose this: - `GET /api/v0/mcp/tools` returns the cached tool list (supports `?refresh=true`) - `POST /api/v0/mcp/call-tool` proxies calls to the correct server On each chat turn, chatd calls `ListMCPTools` through the existing `AgentConn` tailnet connection, wraps each tool as a `fantasy.AgentTool`, and adds them to the LLM's tool set alongside built-in and admin-configured MCP tools. Tool names are prefixed with the server name (`github__create_issue`) to avoid collisions. Failed server connections are logged and skipped — they never block the agent or break the chat. Child stdio processes are terminated on agent shutdown.	2026-03-26 19:57:02 +00:00
Cian Johnston	c753a622ad	refactor(agent): move agentdesktop under x/ subpackage (#23610 ) - Move `agent/agentdesktop/` to `agent/x/agentdesktop/` to signal experimental/unstable status - Update import paths in `agent/agent.go` and `api_test.go` > 🤖 This mechanical refactor was performed by an agent. I made sure it didn't change anything it wasn't supposed to.	2026-03-25 18:23:52 +00:00
Mathias Fredriksson	147df5c971	refactor: replace sort.Strings with slices.Sort (#23457 ) The slices package provides type-safe generic replacements for the old typed sort convenience functions. The codebase already uses slices.Sort in 43 call sites; this finishes the migration for the remaining 29. - sort.Strings(x) -> slices.Sort(x) - sort.Float64s(x) -> slices.Sort(x) - sort.StringsAreSorted(x) -> slices.IsSorted(x)	2026-03-23 23:19:23 +02:00
Mathias Fredriksson	119030d795	fix(agent): default process working directory to agent dir or $HOME (#23224 ) Processes started via the agent process API inherited the agent's own working directory (/tmp/coder.xxx) when no WorkDir was specified. SSH sessions already use a fallback chain: configured agent directory > $HOME. This wires the same manifest directory closure into the process manager so the priority is now: explicit req.WorkDir > agent configured dir > $HOME The resolved directory is recorded on the process struct so ProcessInfo.WorkDir and pathStore notifications reflect where the process actually ran.	2026-03-18 16:46:26 +00:00
Hugo Dutka	fdb1205bdf	chore(agent): remove portabledesktop download logic (#23128 ) The new way to install portabledesktop in a workspace will be via a module: https://github.com/coder/registry/pull/805	2026-03-17 15:24:11 +01:00
Hugo Dutka	84527390c6	feat: chat desktop backend (#23005 ) Implement the backend for the desktop feature for agents. - Adds a new `/api/experimental/chats/$id/desktop` endpoint to coderd which exposes a VNC stream from a [portabledesktop](https://github.com/coder/portabledesktop) process running inside the workspace - Adds a new `spawn_computer_use_agent` tool to chatd, which spawns a subagent that has access to the `computer` tool which lets it interact with the `portabledesktop` process running inside the workspace - Adds the plumbing to make the above possible There's a follow up frontend PR here: https://github.com/coder/coder/pull/23006	2026-03-13 19:49:34 +01:00
Hugo Dutka	48ab492f49	feat: agents git watch backend (#22565 ) Adds real-time git status watching for workspace agents, so the frontend can subscribe over WebSocket and show git file changes in near real-time. 1. Subscription is scoped to a chat via `GET /api/experimental/chats/{chat}/git/watch`. 2. The workspace agent automatically determines which paths to watch based on tool calls made by the chat (and its ancestor chats). 3. Workspace agent polls subscribed repo working trees on a 30s interval, on tools calls, and on explicit `refresh` from the client. 4. Scans are rate-limited to at most once per second. 5. Edited paths are tracked in-memory inside the workspace agent. There is no database persistence — state is lost on agent restart. This will be addresses in a future PR. 6. Messages sent over WebSocket include a full-repo snapshot (unified diff, branch, origin). A new message is emitted only when the snapshot changes. This PR was implemented with AI with me closely controlling what it's doing. The code follows a plan file that was updated continuously during implementation. Here's the file if you'd like to see it: [project.md](https://gist.github.com/hugodutka/8722cf80c92f8a56555f7bc595b770e2). It reflects the current state of the PR.	2026-03-06 10:47:55 +01:00
Spike Curtis	7cc2b22568	chore: expose UpdateAppStatus on agentsocket (#22353 ) relates to #21335 Adds UpdateAppStatus on the agentsocket, wired up to forward to Coderd over the dRPC connection the agent maintains. Disclosure: I used AI to generate significant portions of this PR, but hand-reviewed and tweaked the code. I consider it approximately indistinguishable from what I would have done by hand.	2026-03-04 21:18:17 +04:00
Zach	5b7377c375	feat: add Prometheus metrics for boundary log drop reporting (#22521 ) Add Prometheus metrics to the boundary log proxy for observability: - batches_dropped_total (reason: buffer_full, forward_failed) - logs_dropped_total (reason: buffer_full, forward_failed, boundary_channel_full, boundary_batch_full) - batches_forwarded_total Also add BoundaryStatus to the BoundaryMessage envelope so boundary can report dropped log counts as a separate wire message. The agent records these as Prometheus metrics, making boundary-side data loss visible. Backwards compatibility for older versions of boundary is maintained. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 12:42:34 -07:00
Spike Curtis	56eb57caf4	chore: enable agent socket by default (#22352 ) relates to #21335 Enables the agent socket by default and updates docs to strike references to having to enable it. The PRs in this stack change the MCP server that Tasks use to update their status to rely on the agent socket, rather than directly dialing Coderd with the agent token. Default disable was a reasonable default when it was only used for the experimental script ordering features, but now that we want to use it for Tasks, it should be default on.	2026-03-03 21:23:59 +04:00
Kyle Carberry	fb6bf3a568	fix(agent): wire updateCommandEnv into process manager (#22451 ) ## Problem The `agentproc` process manager spawns processes with only `os.Environ()`, missing agent-level environment variables like `GIT_ASKPASS`, `CODER_*`, and `GIT_SSH_COMMAND` that are injected by the agent's `updateCommandEnv` function. This means processes started through the HTTP process API (used by chat tools) cannot authenticate git operations via the Coder gitaskpass helper. By contrast, SSH sessions get the full agent environment because the SSH server calls `updateCommandEnv` via its `UpdateEnv` config hook. ## Fix Wire the agent's `updateCommandEnv` hook into the process manager so all spawned processes receive the full agent environment. The hook is: - Passed as a parameter through `NewAPI` → `newManager` - Called in `manager.start()` with `os.Environ()` as the base, producing the same enriched env that SSH sessions get - Gracefully falls back to `os.Environ()` if the hook returns an error Request-level env vars (`req.Env`, set by chat tools) are still appended last and take precedence. ## Changes - `agent/agentproc/process.go`: Add `updateEnv` field to manager, call it when building process env - `agent/agentproc/api.go`: Accept `updateEnv` parameter in `NewAPI` - `agent/agent.go`: Pass `a.updateCommandEnv` when creating the process API - `agent/agentproc/api_test.go`: Add `UpdateEnvHook` and `UpdateEnvHookOverriddenByReqEnv` tests Co-authored-by: Coder <coder@coder.com>	2026-02-28 21:58:59 -05:00
Kyle Carberry	a621c3cb13	feat(agent): add process execution API and rewrite execute tool (#22416 ) ## Summary Adds a new agent-side process management HTTP API and rewrites the chat execute tool to use it instead of SSH sessions. ## What changed ### New agent/agentproc/ package - headtail.go — Thread-safe io.Writer with bounded memory (16KB head + 16KB tail ring buffer). Provides LLM-ready output with truncation metadata and long-line truncation at 2048 bytes. - headtail_test.go — 16 tests including race detector coverage for concurrent writes. - process.go — Manager + Process types for lifecycle management using agentexec.Execer for proper OOM/nice scores. - api.go — HTTP API following the agentfiles chi router pattern. 4 endpoints: start, list, output, signal. ### Agent wiring (agent/agent.go, agent/api.go) Mounts the process API at /api/v0/processes, mirroring how agentfiles is mounted. ### SDK (codersdk/workspacesdk/agentconn.go) 4 new AgentConn interface methods + 7 request/response types: - StartProcess, ListProcesses, ProcessOutput, SignalProcess ### Execute tool rewrite (coderd/chatd/chattool/execute.go) - SSH to Agent API: conn.StartProcess() + conn.ProcessOutput() polling - New parameters: workdir, run_in_background - Structured response: success, exit_code, wall_duration_ms, error, truncated, note, background_process_id - Non-interactive env vars: GIT_EDITOR=true, TERM=dumb, NO_COLOR=1, PAGER=cat, etc. - Output truncation: HeadTailBuffer caps at 32KB for LLM consumption - File-dump detection with advisory notes suggesting read_file - Default timeout: 60s to 10s - Foreground polling: 200ms intervals until exit or timeout ## Architecture State lives on the agent, surviving coderd failover and instance changes. Any coderd replica can query any agent via HTTP over tailnet.	2026-02-28 12:33:52 -05:00
Kacper Sawicki	f016d9e505	fix(coderd): add role param to agent RPC to prevent false connectivity (#22052 ) ## Summary coder-logstream-kube and other tools that use the agent token to connect to the RPC endpoint were incorrectly triggering connection monitoring, causing false connected/disconnected timestamps on the agent. This led to VSCode/JetBrains disconnections and incorrect dashboard status. ## Changes Add a `role` query parameter to `/api/v2/workspaceagents/me/rpc`: - `role=agent`: triggers connection monitoring (default for the agent SDK) - any other value (e.g. `logstream-kube`): skips connection monitoring - omitted: triggers monitoring for backward compatibility with older agents The agent SDK now sends `role=agent` by default. A new `Role` field on the `agentsdk.Client` allows non-agent callers to specify a different role. ## Required follow-up coder-logstream-kube needs to set `client.Role = "logstream-kube"` before calling `ConnectRPC20()`. Without that change, it will still send `role=agent` and trigger monitoring. Fixes #21625	2026-02-18 09:44:06 +01:00
Danielle Maywood	2de8cdf160	feat(agent): add subagent ID fields to devcontainers in manifest (#21848 ) Update the agent protobuf schema (agent/proto/agent.proto) to include: - subagent_id field in WorkspaceAgentDevcontainer message - id field in CreateSubAgentRequest message Bump the Agent API version from v2.7 to v2.8 and update all client references throughout the codebase (ConnectRPC27 -> ConnectRPC28, DRPCAgentClient27 -> DRPCAgentClient28).	2026-02-03 12:37:30 +00:00
Spike Curtis	3398833919	test: don't drop error on blank IP address in report (#21642 ) fixes https://github.com/coder/internal/issues/1286 We can get blank IP address from the net connection if the client has already disconnected, as was the case in this flake. Fix is to only log error if we get something non-empty we can't parse. --------- Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>	2026-01-23 10:26:44 +00:00
Asher	ff9ed91811	chore: move agent's file API into separate package (#21531 ) This makes it so we can test it directly without having to go through Tailnet, which appears to be causing flakes in CI where the requests time out and never make it to the agent. Takes inspiration from the container-related API endpoints. Would probably make sense to refactor the ls tests to also go through the API (rather than be internal tests like they are currently) but I left those alone for now to keep the diff minimal.	2026-01-16 17:03:17 -09:00
Spike Curtis	bddb808b25	chore: arrange imports in a standard way (#21452 ) Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example: ``` import ( "context" "time" "github.com/prometheus/client_golang/prometheus" "golang.org/x/xerrors" "gopkg.in/natefinch/lumberjack.v2" "cdr.dev/slog/v3" "github.com/coder/coder/v2/codersdk/agentsdk" "github.com/coder/serpent" ) ``` 3 groups: standard library, 3rd partly libs, Coder libs. This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.	2026-01-08 15:24:11 +04:00
Spike Curtis	49b34a716a	fix: fix slog to always use array of Fields (#21426 ) Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptible call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder). It also updates dependencies that also use slog and were updated. I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule. Other dependencies, I pushed new tags.	2026-01-08 10:29:41 +04:00
Zach	07924037e7	feat: add boundary log forwarding from agent to coderd (#21345 ) Add agent forwarding of boundary audit logs from workspaces to coderd via agent API, and re-emission of boundary logs to coderd stderr. This change adds a server to the workspace agent that always listens on a unix socket for boundary to connect and send audit logs. coderd log format example: ``` [API] 2025-12-23 18:31:46.755 [info] coderd.agentrpc: boundary_request owner=.. workspace_name=.. agent_name=.. decision=.. workspace_id=.. http_method=.. http_url=.. event_time=.. request_id=.. ``` Corresponding boundary PR: https://github.com/coder/boundary/pull/124 RFC: https://www.notion.so/coderhq/Agent-Boundary-Logs-2afd579be59280f29629fc9823ac41ba https://github.com/coder/coder/issues/21280	2025-12-31 16:38:19 -07:00
Zach	9d1493a13a	feat: add initial API for boundary log forwarding to coderd (#21293 ) Add the AgentAPI changes to support the feature that transmits boundary logs from workspaces to coderd via the agent API for eventual re-emission to stderr. The API handlers are stubs for now because I'm trying to land this feature from multiple smaller PRs. High level architecture: - Boundary records resource access in batches and sends proto message to agent - Agent proxies messages to coderd (captured by the API changes in this PR) - coderd re-emits logs to stderr RFC: https://www.notion.so/coderhq/Agent-Boundary-Logs-2afd579be59280f29629fc9823ac41ba	2025-12-19 10:41:39 -07:00
Spike Curtis	71c6dc4043	fix: stop disconnecting from coderd early and record disconnect correctly (#21250 ) fixes https://github.com/coder/internal/issues/1196 The above issue exposes two different bugs in Coder. In the agent, there is a race where if the agent is closed while starting up networking, it will erroneously disconnect from Coderd, which delays or breaks writing final status and logs. In Coderd, there is a bug where we don't properly record the latest agent disconnection time if the agent had previously disconnected. This causes us to report the agent status as "Connected" even after it has disconnected up until the inactivity timeout fires. This PR fixes both issues. It also slightly reworks when we send workspace updates based on connection and disconnection. Previously we would send two updates when the agent connected in certain circumstances, even though the status would be the same in both (only times changed). Now we universally only send one on connect, and then another on disconnect.	2025-12-15 12:04:01 +04:00
Spike Curtis	ce9e7ad909	fix(agent): ignore EOF errors during shutdown (#21187 ) fixes: https://github.com/coder/internal/issues/1179 The problem in that flake is that dRPC doensn't consistently return `context.Canceled` if you make an RPC call and then cancel it: sometimes it returns EOF. Without this PR, if we get an EOF on one of the routines that uses the agentapi connection, we tear down the whole connection and reconnect to coderd --- even if we are in the middle of a graceful shutdown. What happened in the linked flake is that writing stats failed with EOF, which then caused us to reconnect and write the lifecycle "SHUTTING DOWN" twice.	2025-12-09 17:32:38 +04:00
Spike Curtis	40df21ed62	fix: fixes use of possibly nil RemoteAddr() and LocalAddr() return values (#21076 ) fixes: https://github.com/coder/internal/issues/1143 Both gVisor and the Go standard library implementations of `net.Conn` can under certain circumstances return `nil` for `RemoteAddr()` and `LocalAddr()` calls. If we call their methods, we segfault. This PR fixes these calls and adds ruleguard rules. Note that `slog.F("remote_addr", conn.RemoteAddr())` is fine because slog detects the `nil` before attempting to stringify the type.	2025-12-03 15:06:00 +04:00
Sas Swart	ce627bf23f	feat: implement agent socket api, client and cli (#20758 ) closes: https://github.com/coder/coder/issues/10352 closes: https://github.com/coder/internal/issues/1094 closes: https://github.com/coder/internal/issues/1095 In this pull request, we enable a new set of experimental cli commands grouped under `coder exp sync`. These commands allow any process acting within a coder workspace to inform the coder agent of its requirements and execution progress. The coder agent will then relay this information to other processes that have subscribed. These commands are: ``` # Check if this feature is enabled in your environment coder exp sync ping # express that your unit depends on another coder exp sync want <unit> <dependency_unit> # express that your unit intends to start a portion of the script that requires # other units to have completed first. This command blocks until all dependencies have been met coder exp sync start <unit> # express that your unit has completes its work, allowing dependent units to begin their execution coder exp sync complete <unit> ``` Example: In order to automatically run claude code in a new workspace, it must first have a git repository cloned. The scripts responsible for cloning the repository and for running claude code would coordinate in the following way: ```bash # Script A: Claude code # Inform the agent that the claude script wants the git script. # That is, the git script must have completed before the claude script can begin its execution coder exp sync want claude git # Inform the agent that we would now like to begin execution of claude. # This command will block until the git script (and any other defined dependencies) # have completed coder exp sync start claude # Now we run claude code and any other commands we need claude ... # Once our script has completed, we inform the agent, so that any scripts that depend on this one # may begin their execution coder exp sync complete claude ``` ```bash # Script B: Git # Because the git script does not have any dependencies, we can simply inform the agent that we # intend to start coder exp sync start git git clone ssh://git@github.com/coder/coder # Once the repository have been cloned, we inform the agent that this script is complete, so that # scripts that depend on it may begin their execution. coder exp sync complete git ``` Notes: * Unit names (ie. `claude` and `git`) given as input to the sync commands are arbitrary strings. You do not have to conform to specific identifiers. We recommend naming your scripts descriptively, but succinctly. * Scripts unit names should be well documented. Other scripts will need to know the names you've chosen in order to depend on yours. Therefore, you --------- Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>	2025-11-28 08:33:50 +02:00
Sas Swart	1d726c81bb	fix: remove a sensitive field from an agent log line (#20968 ) This PR removes a log field that could expose sensitive information in agent logs for workspaces that pass such information to the agent via its manifest.	2025-11-27 16:12:03 +02:00
Spike Curtis	afd40436f0	fix: mock Agent querying OS for listening ports in tests (#20842 ) fixes https://github.com/coder/internal/issues/1123 We want to tests that ports are not included after they are no longer used, but this isn't safe on the real OS networking stack because there is no way to guarantee a port _won't_ be used. Instead, we introduce an interface and fake implementation for testing. On order to leave the filtering logic in the test path, this PR also does some refactoring. Caching logic is left in the real OS querying implementation and a new test case is added for it in this PR.	2025-11-25 14:25:24 +04:00
Dean Sheather	6c99d5eca2	fix: avoid connection logging crashes in agent (#20307 ) - Ignore errors when reporting a connection from the server, just log them instead - Translate connection log IP `localhost` to `127.0.0.1` on both the server and the agent Note that the temporary fix for converting invalid IPs to localhost is not required in main since the database no longer forbids NULL for the IP column since https://github.com/coder/coder/pull/19788 Relates to #20194	2025-10-16 01:56:43 +11:00
Spike Curtis	1354d84eb4	chore: refactor instance identity to be a SessionTokenProvider (#19566 ) Refactors Agent instance identity to be a SessionTokenProvider. Refactors the CLI to create Agent clients via a centralized function, rather than add-hoc via individual command handlers and their flags. This allows commands besides `coder agent`, but which still use the agent identity, to support instance identity authentication. Fixes #19111 by unifying all API requests to go thru the SessionTokenProvider for auth credentials.	2025-09-03 10:38:42 +04:00
Danielle Maywood	f41275eb39	feat(agent/agentcontainers): auto detect dev containers (#18950 ) Relates to https://github.com/coder/internal/issues/711 This PR implements a project discovery mechanism that searches for any dev container projects and makes them visible in the UI so that they can be started. To make the wording on the site more clear, "Rebuild" has been changed to "Start" when there is no container associated with a known dev container configuration. I've also made it so that site will show the dev container config path when there is no other name available. ### Design decisions Just want to ensure my explanation for a few design decisions are noted down: - We only search for dev container configurations inside git repositories - We only search for these git repositories if they're at the top level or a direct child of the agent directory. This limited approach is to reduce the amount of files we ultimately walk when trying to find these projects. It makes sense to limit it to only the agent directory, although I'm open to expanding how deep we search.	2025-07-22 19:02:43 +01:00
Dean Sheather	a1b87a67c6	fix: use client preferred URL for the default DERP (#18911 ) The agentsdk currently does a remap of the DERP map to change the EmbeddedRelay node's URL to match the agent's access URL. This PR makes changes to the `workspacesdk` (used by clients like the CLI) and `vpn` (used by Coder Desktop) to match this behavior. This enables us the ability to try Coder clients in dogfood over a VPN without changing the global access URL.	2025-07-17 20:17:44 +10:00
Danielle Maywood	0118e75009	fix(agent): disable dev container integration inside sub agents (#18781 ) It appears we accidentally broke this logic in a previous PR. This should now correctly disable the agent api as we'd expect.	2025-07-08 11:05:30 +01:00
Mathias Fredriksson	0f3a1e9849	fix(agent/agentcontainers): split Init into Init and Start for early API responses (#18640 ) Previously in #18635 we delayed the containers API `Init` to avoid producing errors due to Docker and `@devcontainers/cli` not yet being installed by startup scripts. This had an adverse effect on the UX via UI responsiveness as the detection of devcontainers was greatly delayed. This change splits `Init` into `Init` and `Start` so that we can immediately after `Init` start serving known devcontainers (defined in Terraform), improving the UX. Related #18635 Related #18640	2025-06-27 19:01:50 +03:00
Mathias Fredriksson	8ee2668b39	fix(agent): fix script filtering for devcontainers (#18635 )	2025-06-27 16:59:31 +03:00
Mathias Fredriksson	7e99fb7d7e	fix(agent): delay containerAPI init to ensure startup scripts run before (#18630 )	2025-06-27 14:10:35 +03:00
ケイラ	09cc906981	chore: remove unnecessary redeclarations in for loops (part 2) (#18593 )	2025-06-26 12:28:00 -06:00
Mathias Fredriksson	eca6381314	feat(agent/agentcontainers): add more envs to readconfig for app URL building (#18603 )	2025-06-26 09:33:58 +00:00
Danielle Maywood	c4e4fe85f9	fix(agent): start devcontainers through agentcontainers package (#18471 ) Fixes https://github.com/coder/internal/issues/706 Context for the implementation here https://github.com/coder/internal/issues/706#issuecomment-2990490282 Synchronously starts dev containers defined in terraform with our `DevcontainerCLI` abstraction, instead of piggybacking off of our `agentscripts` package. This gives us more control over logs, instead of being reliant on packages which may or may not exist in the user-provided image.	2025-06-25 11:52:50 +01:00
Mathias Fredriksson	99d124e276	feat(agent): enable devcontainers by default (#18533 )	2025-06-24 21:17:04 +03:00

1 2 3 4 5 ...

287 Commits