Ethan bd6cc1aaf2 feat(coderd): add stop_workspace chatd tool and recovery classification (#24997)
## Summary

Adds a `stop_workspace` tool to chatd so the model can recover from the
"workspace running but agent dead" failure mode (e.g. an OOM that leaves
the workspace running but the agent unreachable) by stopping and then
starting the workspace.

<img width="924" height="742" alt="image"
src="https://github.com/user-attachments/assets/279dedb6-6e29-4fe1-8754-3a1f01e538bf"
/>



## What changed

**New `stop_workspace` chatd tool**
(`coderd/x/chatd/chattool/stopworkspace.go`). Mirrors `start_workspace`:
shares `WorkspaceMu` to serialize with create/start, waits for any
in-progress build before issuing a stop, and is idempotent only after a
successful Stop transition. Failed stop builds re-attempt rather than
reporting success.

**New `chatStopWorkspace` coderd hook** (`coderd/exp_chats.go`). Mirrors
`chatStartWorkspace` minus the `RequireActiveVersion` gate. Stop should
not be blocked by template version policy.

**Differentiated recovery sentinels** (`coderd/x/chatd/chatd.go`).
`errChatAgentDisconnected` instructs the model to call `stop_workspace`
then `start_workspace`. `errChatDialTimeout` instructs a single retry,
then user escalation if it repeats. The previous single message
conflated transient and persistent failures.

**Two-signal recovery gate.** Recovery is only surfaced when a tool call
times out *and* a fresh DB read of the latest workspace agent says
`Disconnected`. The previous draft escalated on the DB read alone, which
would fire on a 30-second heartbeat blip (e.g. agent respawn) and prompt
a destructive stop/start unnecessarily.

**Cache-hit disconnected handling** now clears the cache and retries a
fresh dial before escalating, rather than returning the recovery
sentinel immediately. Latest-agent classification uses
`GetWorkspaceAgentsInLatestBuildByWorkspaceID` instead of the chat's
bound `AgentID`, so stale bindings after a rebuild don't misclassify.

**Shared chattool helpers** in `coderd/x/chatd/chattool/chattool.go`:
`latestWorkspaceBuildAndJob`, `publishBuildBinding`,
`provisionerJobTerminal`. Applied to both `start_workspace` and
`stop_workspace`.

## Notes

- Reverts an earlier draft that widened `ask_user_question` to root
standard turns. Plan-mode-only behavior is restored.
- The `stop_workspace` tool currently renders via the generic chat
tool-call UI. A follow-up frontend PR will prettify the `stop_workspace`
tool and style it like the `start_workspace` tool.
- Never-connected (`Timeout` status) agents are intentionally excluded
from recovery. They indicate template or startup failure, not the
running-but-dead case this PR targets.

Closes CODAGT-315
2026-05-11 16:23:07 +10:00
2022-04-04 11:55:06 -05:00

Coder Logo Light Coder Logo Dark

Self-Hosted Cloud Development Environments and AI Agents

Coder Banner Light Coder Banner Dark

Quickstart | Docs | Why Coder | Premium

discord release godoc Go Report Card OpenSSF Best Practices OpenSSF Scorecard license

Coder is a self-hosted platform for cloud development environments and AI coding agents. Workspaces are defined with Terraform, connected through a secure Wireguard® tunnel, and automatically shut down when not used. Coder Agents runs a native AI coding agent whose loop executes in the control plane on your infrastructure, with no API keys in workspaces.

  • Define cloud development environments in Terraform
    • EC2 VMs, Kubernetes Pods, Docker Containers, etc.
  • Automatically shutdown idle resources to save on costs
  • Onboard developers in seconds instead of days
  • Delegate coding work to AI agents on your infrastructure
    • Bring any model (Anthropic, OpenAI, Google, Bedrock, self-hosted)
    • No LLM credentials in workspaces, user identity on every action
    • Centralized model governance, cost tracking, and audit logging

Coder platform showing templates and a running workspace

Quickstart

The most convenient way to try Coder is to install it on your local machine and experiment with provisioning cloud development environments using Docker (works on Linux, macOS, and Windows).

# First, install Coder
curl -L https://coder.com/install.sh | sh

# Start the Coder server (caches data in ~/.cache/coder)
coder server

# Navigate to http://localhost:3000 to create your initial user,
# create a Docker template and provision a workspace

Install

The easiest way to install Coder is to use the install script for Linux and macOS. For Windows, use the latest ..._installer.exe file from GitHub Releases.

curl -L https://coder.com/install.sh | sh

You can run the install script with --dry-run to see the commands that will be used to install without executing them. Run the install script with --help for additional flags.

See install for additional methods.

Once installed, you can start a production deployment with a single command:

# Automatically sets up an external access URL on *.try.coder.app
coder server

# Requires a PostgreSQL instance (version 13 or higher) and external access URL
coder server --postgres-url <url> --access-url <url>

Use coder --help to get a list of flags and environment variables. See the install guides for a complete tutorial.

Documentation

Browse the documentation or visit a specific section below:

  • Workspaces: Workspaces contain the IDEs, dependencies, and configuration information needed for software development
  • Templates: Templates are written in Terraform and describe the infrastructure for workspaces
  • Coder Agents: Delegate coding work to AI agents running on your self-hosted infrastructure
  • Administration: Learn how to operate Coder
  • Premium: Learn about paid features built for large teams
  • IDEs: Connect your existing editor to a workspace

Support

Feel free to open an issue if you have questions, run into bugs, or have a feature request.

Join our Discord to provide feedback on in-progress features and chat with the community using Coder!

Integrations

New integrations are always in progress. Open an issue to request one. Contributions are welcome in any official or community repository.

Official

Community

Contributing

New contributors are always welcome. If you are new to the Coder codebase, see the contribution guide to get started.

Hiring

Apply on the careers page if you are interested in joining the team.

Languages
Go 74.4%
TypeScript 23.5%
Shell 0.8%
HCL 0.4%
PLpgSQL 0.3%
Other 0.2%