Files
coder/coderd/x
Ethan de9cdca77e fix(coderd): handle external-agent workspaces honestly in chat (#24969)
## Summary

Make Coder's chat agent honest about workspaces that use
`coder_external_agent`. Three behaviors change so the chat stops
pretending it can drive an external workspace through to a usable state
on its own.

<img width="859" height="537" alt="image"
src="https://github.com/user-attachments/assets/0561442b-95f1-4a2d-853c-7e3776114680"
/>


## Problem

External agents are not started by Coder. The user has to run `coder
agent` on their own host with a token Coder generates. Before this
change, the chat agent treated those workspaces like any other:

- `create_workspace` would enqueue a build for an external-agent
template and then wait minutes (~22 worst case) for an agent that was
never going to come up.
- When mid-turn tool calls dialed an external agent that was not
connected, the chat burned the full 30-second dial timeout and returned
generic "the workspace may need to be restarted from the Coder
dashboard" guidance, which is not the action the user can take.
- Nothing told the chat (or the user, through the chat) that the next
action lives outside Coder.

## Fix

Three changes scoped to `coderd/x/chatd/`:

1. **`create_workspace` blocks templates with external agents.** The
tool reads `template_versions.has_external_agent` for the template's
active version and refuses external-agent templates with a message
instructing the chat to pick a different template, or to have the user
create and start the workspace themselves and then attach it.

2. **Attaching an existing external workspace stays open.** No
selection-time gate on attachment; users can still bind a working
external workspace to a chat.

3. **External-agent-aware error handling on connection.** Two
complementary changes both predicated on proven connectivity failures
rather than every dial error:

- **`getWorkspaceConn` preflight and timeout handling.** Before opening
a connection, the cache-miss path reads the agent's status from the
already-loaded row. If the selected agent is external and clearly
offline according to the existing `isAgentUnreachable` helper
(`Disconnected` or `Timeout`, never `Connecting`), it returns an
external-agent-specific error immediately instead of waiting out the
30-second dial timeout. `Connecting` external agents fall through to the
dial so a user who just started the agent on their host can still
succeed in the same turn. The preflight only fires when the agent is
still the latest selected agent for the workspace, so stale-binding
recovery via `dialWithLazyValidation` is unaffected. The post-dial
rewrite is limited to the dial timeout sentinel; stale/no-agent bindings
and non-timeout dial failures preserve their original errors.

- **`waitForAgentReady` timeout-branch rewrite.** The 2-minute retry
loop used by `create_workspace` and `start_workspace` runs unchanged for
all agents. When the loop's outer deadline elapses, the timeout branch
substitutes the external-agent message in place of the raw dial error if
the agent belongs to an external resource.

This applies the same pattern that the cache-hit path of
`getWorkspaceConn` already used (`isAgentUnreachable` returning
`errChatAgentDisconnected`), extended to the cache-miss path and to the
readiness helper, with the external-agent-aware error rewrite layered
only on confirmed offline or timeout paths.

Closes CODAGT-314
2026-05-08 13:51:13 +10:00
..