Kyle Carberry 6972d073a2 fix: improve background process handling for agent tools (#23132)
## Problem

Models frequently use shell `&` instead of `run_in_background=true` when
starting long-running processes through `/agents`, causing them to die
shortly after starting. This happens because:

1. **No guidance in tool schema** — The `ExecuteArgs` struct had zero
`description` tags. The model saw `run_in_background: boolean
(optional)` with no explanation of when/why to use it.
2. **Shell `&` is silently broken** — `sh -c "command &"` forks the
process, the shell exits immediately, and the forked child becomes an
orphan not tracked by the process manager.
3. **No process group isolation** — The SSH subsystem sets `Setsid:
true` on spawned processes, but the agent process manager set no
`SysProcAttr` at all. Signals only hit the top-level `sh`, not child
processes.

## Investigation

Compared our implementation against **openai/codex** and **coder/mux**:

| Aspect | codex | mux | coder/coder (before) |
|--------|-------|-----|---------------------|
| Background flag | Yield/resume with `session_id` | `run_in_background`
with rich description | `run_in_background` with **no description** |
| `&` handling | `setsid()` + `killpg()` | `detached: true` +
`killProcessTree()` | **Nothing** — orphaned children escape |
| Process isolation | `setsid()` on every spawn | `set -m; nohup ...
setsid` for background | **No `SysProcAttr` at all** |
| Signal delivery | `killpg(pgid, sig)` — entire group | `kill -15
-\$pid` — negative PID | `proc.cmd.Process.Signal()` — **PID only** |

## Changes

### Fix 1: Add descriptions to `ExecuteArgs` (highest impact)
The model now sees explicit guidance: *"Use for long-running processes
like dev servers, file watchers, or builds. Do NOT use shell & — it will
not work correctly."*

### Fix 2: Update tool description
The top-level execute tool description now reinforces: *"Use
run_in_background=true for long-running processes. Never use shell '&'
for backgrounding."*

### Fix 3: Detect trailing `&` and auto-promote to background
Defense-in-depth: if the model still uses `command &`, we strip the `&`
and promote to `run_in_background=true` automatically. Correctly
distinguishes `&` from `&&`.

### Fix 4: Process group isolation (`Setpgid`)
New platform-specific files (`proc_other.go` / `proc_windows.go`)
following the same pattern as `agentssh/exec_other.go`. Every spawned
process gets its own process group.

### Fix 5: Process group signaling
`signal()` now uses `syscall.Kill(-pid, sig)` on Unix to signal the
entire process group, ensuring child processes from shell pipelines are
also cleaned up.

## Testing
All existing `agent/agentproc` tests pass. Both packages compile
cleanly.
2026-03-16 16:22:10 -04:00
2022-04-04 11:55:06 -05:00

Coder Logo Light Coder Logo Dark

Self-Hosted Cloud Development Environments

Coder Banner Light Coder Banner Dark

Quickstart | Docs | Why Coder | Premium

discord release godoc Go Report Card OpenSSF Best Practices OpenSSF Scorecard license

Coder enables organizations to set up development environments in their public or private cloud infrastructure. Cloud development environments are defined with Terraform, connected through a secure high-speed Wireguard® tunnel, and automatically shut down when not used to save on costs. Coder gives engineering teams the flexibility to use the cloud for workloads most beneficial to them.

  • Define cloud development environments in Terraform
    • EC2 VMs, Kubernetes Pods, Docker Containers, etc.
  • Automatically shutdown idle resources to save on costs
  • Onboard developers in seconds instead of days

Coder Hero Image

Quickstart

The most convenient way to try Coder is to install it on your local machine and experiment with provisioning cloud development environments using Docker (works on Linux, macOS, and Windows).

# First, install Coder
curl -L https://coder.com/install.sh | sh

# Start the Coder server (caches data in ~/.cache/coder)
coder server

# Navigate to http://localhost:3000 to create your initial user,
# create a Docker template and provision a workspace

Install

The easiest way to install Coder is to use our install script for Linux and macOS. For Windows, use the latest ..._installer.exe file from GitHub Releases.

curl -L https://coder.com/install.sh | sh

You can run the install script with --dry-run to see the commands that will be used to install without executing them. Run the install script with --help for additional flags.

See install for additional methods.

Once installed, you can start a production deployment with a single command:

# Automatically sets up an external access URL on *.try.coder.app
coder server

# Requires a PostgreSQL instance (version 13 or higher) and external access URL
coder server --postgres-url <url> --access-url <url>

Use coder --help to get a list of flags and environment variables. Use our install guides for a complete walkthrough.

Documentation

Browse our docs here or visit a specific section below:

  • Templates: Templates are written in Terraform and describe the infrastructure for workspaces
  • Workspaces: Workspaces contain the IDEs, dependencies, and configuration information needed for software development
  • IDEs: Connect your existing editor to a workspace
  • Administration: Learn how to operate Coder
  • Premium: Learn about our paid features built for large teams

Support

Feel free to open an issue if you have questions, run into bugs, or have a feature request.

Join our Discord to provide feedback on in-progress features and chat with the community using Coder!

Integrations

We are always working on new integrations. Please feel free to open an issue and ask for an integration. Contributions are welcome in any official or community repositories.

Official

Community

Contributing

We are always happy to see new contributors to Coder. If you are new to the Coder codebase, we have a guide on how to get started. We'd love to see your contributions!

Hiring

Apply here if you're interested in joining our team.

Languages
Go 74.4%
TypeScript 23.5%
Shell 0.8%
HCL 0.4%
PLpgSQL 0.3%
Other 0.2%