mirror of
https://github.com/coder/coder.git
synced 2026-06-03 04:58:23 +00:00
de9cdca77e
## Summary Make Coder's chat agent honest about workspaces that use `coder_external_agent`. Three behaviors change so the chat stops pretending it can drive an external workspace through to a usable state on its own. <img width="859" height="537" alt="image" src="https://github.com/user-attachments/assets/0561442b-95f1-4a2d-853c-7e3776114680" /> ## Problem External agents are not started by Coder. The user has to run `coder agent` on their own host with a token Coder generates. Before this change, the chat agent treated those workspaces like any other: - `create_workspace` would enqueue a build for an external-agent template and then wait minutes (~22 worst case) for an agent that was never going to come up. - When mid-turn tool calls dialed an external agent that was not connected, the chat burned the full 30-second dial timeout and returned generic "the workspace may need to be restarted from the Coder dashboard" guidance, which is not the action the user can take. - Nothing told the chat (or the user, through the chat) that the next action lives outside Coder. ## Fix Three changes scoped to `coderd/x/chatd/`: 1. **`create_workspace` blocks templates with external agents.** The tool reads `template_versions.has_external_agent` for the template's active version and refuses external-agent templates with a message instructing the chat to pick a different template, or to have the user create and start the workspace themselves and then attach it. 2. **Attaching an existing external workspace stays open.** No selection-time gate on attachment; users can still bind a working external workspace to a chat. 3. **External-agent-aware error handling on connection.** Two complementary changes both predicated on proven connectivity failures rather than every dial error: - **`getWorkspaceConn` preflight and timeout handling.** Before opening a connection, the cache-miss path reads the agent's status from the already-loaded row. If the selected agent is external and clearly offline according to the existing `isAgentUnreachable` helper (`Disconnected` or `Timeout`, never `Connecting`), it returns an external-agent-specific error immediately instead of waiting out the 30-second dial timeout. `Connecting` external agents fall through to the dial so a user who just started the agent on their host can still succeed in the same turn. The preflight only fires when the agent is still the latest selected agent for the workspace, so stale-binding recovery via `dialWithLazyValidation` is unaffected. The post-dial rewrite is limited to the dial timeout sentinel; stale/no-agent bindings and non-timeout dial failures preserve their original errors. - **`waitForAgentReady` timeout-branch rewrite.** The 2-minute retry loop used by `create_workspace` and `start_workspace` runs unchanged for all agents. When the loop's outer deadline elapses, the timeout branch substitutes the external-agent message in place of the raw dial error if the agent belongs to an external resource. This applies the same pattern that the cache-hit path of `getWorkspaceConn` already used (`isAgentUnreachable` returning `errChatAgentDisconnected`), extended to the cache-miss path and to the readiness helper, with the external-agent-aware error rewrite layered only on confirmed offline or timeout paths. Closes CODAGT-314