Files
coder/agent/agentproc/proc_windows.go
T
Kyle Carberry 6972d073a2 fix: improve background process handling for agent tools (#23132)
## Problem

Models frequently use shell `&` instead of `run_in_background=true` when
starting long-running processes through `/agents`, causing them to die
shortly after starting. This happens because:

1. **No guidance in tool schema** — The `ExecuteArgs` struct had zero
`description` tags. The model saw `run_in_background: boolean
(optional)` with no explanation of when/why to use it.
2. **Shell `&` is silently broken** — `sh -c "command &"` forks the
process, the shell exits immediately, and the forked child becomes an
orphan not tracked by the process manager.
3. **No process group isolation** — The SSH subsystem sets `Setsid:
true` on spawned processes, but the agent process manager set no
`SysProcAttr` at all. Signals only hit the top-level `sh`, not child
processes.

## Investigation

Compared our implementation against **openai/codex** and **coder/mux**:

| Aspect | codex | mux | coder/coder (before) |
|--------|-------|-----|---------------------|
| Background flag | Yield/resume with `session_id` | `run_in_background`
with rich description | `run_in_background` with **no description** |
| `&` handling | `setsid()` + `killpg()` | `detached: true` +
`killProcessTree()` | **Nothing** — orphaned children escape |
| Process isolation | `setsid()` on every spawn | `set -m; nohup ...
setsid` for background | **No `SysProcAttr` at all** |
| Signal delivery | `killpg(pgid, sig)` — entire group | `kill -15
-\$pid` — negative PID | `proc.cmd.Process.Signal()` — **PID only** |

## Changes

### Fix 1: Add descriptions to `ExecuteArgs` (highest impact)
The model now sees explicit guidance: *"Use for long-running processes
like dev servers, file watchers, or builds. Do NOT use shell & — it will
not work correctly."*

### Fix 2: Update tool description
The top-level execute tool description now reinforces: *"Use
run_in_background=true for long-running processes. Never use shell '&'
for backgrounding."*

### Fix 3: Detect trailing `&` and auto-promote to background
Defense-in-depth: if the model still uses `command &`, we strip the `&`
and promote to `run_in_background=true` automatically. Correctly
distinguishes `&` from `&&`.

### Fix 4: Process group isolation (`Setpgid`)
New platform-specific files (`proc_other.go` / `proc_windows.go`)
following the same pattern as `agentssh/exec_other.go`. Every spawned
process gets its own process group.

### Fix 5: Process group signaling
`signal()` now uses `syscall.Kill(-pid, sig)` on Unix to signal the
entire process group, ensuring child processes from shell pipelines are
also cleaned up.

## Testing
All existing `agent/agentproc` tests pass. Both packages compile
cleanly.
2026-03-16 16:22:10 -04:00

21 lines
565 B
Go

package agentproc
import (
"os"
"syscall"
)
// procSysProcAttr returns the SysProcAttr to use when spawning
// processes. On Windows, process groups are not supported in the
// same way as Unix, so this returns an empty struct.
func procSysProcAttr() *syscall.SysProcAttr {
return &syscall.SysProcAttr{}
}
// signalProcess sends a signal directly to the process. Windows
// does not support process group signaling, so we fall back to
// sending the signal to the process itself.
func signalProcess(p *os.Process, _ syscall.Signal) error {
return p.Kill()
}