coder

mirror of https://github.com/coder/coder.git synced 2026-06-03 04:58:23 +00:00

Author	SHA1	Message	Date
Ethan	3a9080fff6	feat: tag chat-originating agent logs with chat_id (#25019 ) Workspace-agent logs emitted while serving chatd-driven requests were not correlated with the originating chat, making agent logs hard to attribute to the corresponding/originating chat. This adds agent-side chat context middleware that parses `Coder-Chat-Id` once, enriches agent access logs and structured handler/background logs, and adds a chatd bridge log when chat headers are attached to an agent connection. Closes CODAGT-324	2026-05-08 13:25:30 +10:00
Kyle Carberry	19e44f4136	fix: target specific chat in MarkStale instead of broadcasting to all workspace chats (#23883 ) ## Problem Subagent chats were receiving git context (branch, remote origin, PR status) from their parent or sibling chats' git operations. When a git operation triggers external auth, the workspace agent sends `chat_id` identifying which chat initiated it — but this was broken at two levels: 1. Agent side: `CODER_CHAT_ID` was never injected into process environments. `chatd` sets `Coder-Chat-Id` HTTP headers and the agent extracts them for process isolation, but never propagated `CODER_CHAT_ID` to `cmd.Env`. So `gitaskpass` always sent an empty `chat_id`. 2. Server side: `workspaceAgentsExternalAuth` ignored the `chat_id` query param. `MarkStale` broadcast git context to all chats on the workspace via `filterChatsByWorkspaceID`. ## Fix - Inject `CODER_CHAT_ID` into `cmd.Env` in `agentproc` when the chat ID is known, so `gitaskpass` can read and forward it. - Read `chat_id` from query params in `workspaceAgentsExternalAuth` and thread it through `chatGitRef`. - Refactor `MarkStale` to accept a `MarkStaleParams` struct. When `ChatID` is provided, target only that specific chat. When empty (legacy agents, non-chat git operations), fall back to the existing workspace-wide broadcast. - Extract `markStaleSingle` helper to deduplicate the upsert+publish logic. <details><summary>Investigation notes</summary> ### Data flow before fix ``` chatd → sets Coder-Chat-Id header on agent conn agent → extracts chatID, stores on process struct agent → does NOT set CODER_CHAT_ID in cmd.Env ← gap 1 gitaskpass → reads CODER_CHAT_ID (always empty), sends chat_id="" server handler → ignores chat_id query param ← gap 2 MarkStale → broadcasts to ALL workspace chats ``` ### Data flow after fix ``` chatd → sets Coder-Chat-Id header on agent conn agent → extracts chatID, stores on process struct agent → sets CODER_CHAT_ID in cmd.Env gitaskpass → reads CODER_CHAT_ID, sends chat_id=<uuid> server handler → reads chat_id, passes to MarkStale MarkStale → targets only that specific chat ``` </details>	2026-04-01 13:04:59 +00:00
Mathias Fredriksson	41e15ae440	feat: make process output blocking-capable (#23312 ) Replace the 200ms polling loop in chatd's execute and process_output tools with server-side blocking via sync.Cond on HeadTailBuffer. The agent's GET /{id}/output endpoint accepts ?wait=true to block until the process exits or a 5-minute server cap expires. The process_output tool blocks by default for 10s (overridable via wait_timeout), and falls back to a non-blocking snapshot on timeout. The execute tool's foreground path makes a single blocking call instead of polling. Related #23316	2026-03-20 14:33:55 +02:00
Mathias Fredriksson	119030d795	fix(agent): default process working directory to agent dir or $HOME (#23224 ) Processes started via the agent process API inherited the agent's own working directory (/tmp/coder.xxx) when no WorkDir was specified. SSH sessions already use a fallback chain: configured agent directory > $HOME. This wires the same manifest directory closure into the process manager so the priority is now: explicit req.WorkDir > agent configured dir > $HOME The resolved directory is recorded on the process struct so ProcessInfo.WorkDir and pathStore notifications reflect where the process actually ran.	2026-03-18 16:46:26 +00:00
Kyle Carberry	6972d073a2	fix: improve background process handling for agent tools (#23132 ) ## Problem Models frequently use shell `&` instead of `run_in_background=true` when starting long-running processes through `/agents`, causing them to die shortly after starting. This happens because: 1. No guidance in tool schema — The `ExecuteArgs` struct had zero `description` tags. The model saw `run_in_background: boolean (optional)` with no explanation of when/why to use it. 2. Shell `&` is silently broken — `sh -c "command &"` forks the process, the shell exits immediately, and the forked child becomes an orphan not tracked by the process manager. 3. No process group isolation — The SSH subsystem sets `Setsid: true` on spawned processes, but the agent process manager set no `SysProcAttr` at all. Signals only hit the top-level `sh`, not child processes. ## Investigation Compared our implementation against openai/codex and coder/mux: \| Aspect \| codex \| mux \| coder/coder (before) \| \|--------\|-------\|-----\|---------------------\| \| Background flag \| Yield/resume with `session_id` \| `run_in_background` with rich description \| `run_in_background` with no description \| \| `&` handling \| `setsid()` + `killpg()` \| `detached: true` + `killProcessTree()` \| Nothing — orphaned children escape \| \| Process isolation \| `setsid()` on every spawn \| `set -m; nohup ... setsid` for background \| No `SysProcAttr` at all \| \| Signal delivery \| `killpg(pgid, sig)` — entire group \| `kill -15 -\$pid` — negative PID \| `proc.cmd.Process.Signal()` — PID only \| ## Changes ### Fix 1: Add descriptions to `ExecuteArgs` (highest impact) The model now sees explicit guidance: "Use for long-running processes like dev servers, file watchers, or builds. Do NOT use shell & — it will not work correctly." ### Fix 2: Update tool description The top-level execute tool description now reinforces: "Use run_in_background=true for long-running processes. Never use shell '&' for backgrounding." ### Fix 3: Detect trailing `&` and auto-promote to background Defense-in-depth: if the model still uses `command &`, we strip the `&` and promote to `run_in_background=true` automatically. Correctly distinguishes `&` from `&&`. ### Fix 4: Process group isolation (`Setpgid`) New platform-specific files (`proc_other.go` / `proc_windows.go`) following the same pattern as `agentssh/exec_other.go`. Every spawned process gets its own process group. ### Fix 5: Process group signaling `signal()` now uses `syscall.Kill(-pid, sig)` on Unix to signal the entire process group, ensuring child processes from shell pipelines are also cleaned up. ## Testing All existing `agent/agentproc` tests pass. Both packages compile cleanly.	2026-03-16 16:22:10 -04:00
Kyle Carberry	0e1846fe2a	fix(agent): reap exited processes and scope process list by chat ID (#22944 )	2026-03-12 14:51:05 -07:00
Kyle Carberry	fb6bf3a568	fix(agent): wire updateCommandEnv into process manager (#22451 ) ## Problem The `agentproc` process manager spawns processes with only `os.Environ()`, missing agent-level environment variables like `GIT_ASKPASS`, `CODER_*`, and `GIT_SSH_COMMAND` that are injected by the agent's `updateCommandEnv` function. This means processes started through the HTTP process API (used by chat tools) cannot authenticate git operations via the Coder gitaskpass helper. By contrast, SSH sessions get the full agent environment because the SSH server calls `updateCommandEnv` via its `UpdateEnv` config hook. ## Fix Wire the agent's `updateCommandEnv` hook into the process manager so all spawned processes receive the full agent environment. The hook is: - Passed as a parameter through `NewAPI` → `newManager` - Called in `manager.start()` with `os.Environ()` as the base, producing the same enriched env that SSH sessions get - Gracefully falls back to `os.Environ()` if the hook returns an error Request-level env vars (`req.Env`, set by chat tools) are still appended last and take precedence. ## Changes - `agent/agentproc/process.go`: Add `updateEnv` field to manager, call it when building process env - `agent/agentproc/api.go`: Accept `updateEnv` parameter in `NewAPI` - `agent/agent.go`: Pass `a.updateCommandEnv` when creating the process API - `agent/agentproc/api_test.go`: Add `UpdateEnvHook` and `UpdateEnvHookOverriddenByReqEnv` tests Co-authored-by: Coder <coder@coder.com>	2026-02-28 21:58:59 -05:00
Kyle Carberry	a621c3cb13	feat(agent): add process execution API and rewrite execute tool (#22416 ) ## Summary Adds a new agent-side process management HTTP API and rewrites the chat execute tool to use it instead of SSH sessions. ## What changed ### New agent/agentproc/ package - headtail.go — Thread-safe io.Writer with bounded memory (16KB head + 16KB tail ring buffer). Provides LLM-ready output with truncation metadata and long-line truncation at 2048 bytes. - headtail_test.go — 16 tests including race detector coverage for concurrent writes. - process.go — Manager + Process types for lifecycle management using agentexec.Execer for proper OOM/nice scores. - api.go — HTTP API following the agentfiles chi router pattern. 4 endpoints: start, list, output, signal. ### Agent wiring (agent/agent.go, agent/api.go) Mounts the process API at /api/v0/processes, mirroring how agentfiles is mounted. ### SDK (codersdk/workspacesdk/agentconn.go) 4 new AgentConn interface methods + 7 request/response types: - StartProcess, ListProcesses, ProcessOutput, SignalProcess ### Execute tool rewrite (coderd/chatd/chattool/execute.go) - SSH to Agent API: conn.StartProcess() + conn.ProcessOutput() polling - New parameters: workdir, run_in_background - Structured response: success, exit_code, wall_duration_ms, error, truncated, note, background_process_id - Non-interactive env vars: GIT_EDITOR=true, TERM=dumb, NO_COLOR=1, PAGER=cat, etc. - Output truncation: HeadTailBuffer caps at 32KB for LLM consumption - File-dump detection with advisory notes suggesting read_file - Default timeout: 60s to 10s - Foreground polling: 200ms intervals until exit or timeout ## Architecture State lives on the agent, surviving coderd failover and instance changes. Any coderd replica can query any agent via HTTP over tailnet.	2026-02-28 12:33:52 -05:00

8 Commits