Ethan de9cdca77e fix(coderd): handle external-agent workspaces honestly in chat (#24969)
## Summary

Make Coder's chat agent honest about workspaces that use
`coder_external_agent`. Three behaviors change so the chat stops
pretending it can drive an external workspace through to a usable state
on its own.

<img width="859" height="537" alt="image"
src="https://github.com/user-attachments/assets/0561442b-95f1-4a2d-853c-7e3776114680"
/>


## Problem

External agents are not started by Coder. The user has to run `coder
agent` on their own host with a token Coder generates. Before this
change, the chat agent treated those workspaces like any other:

- `create_workspace` would enqueue a build for an external-agent
template and then wait minutes (~22 worst case) for an agent that was
never going to come up.
- When a mid-turn tool call dialed an external agent that was not
connected, the chat burned the full 30-second dial timeout and then
returned the generic "the workspace may need to be restarted from the
Coder dashboard" guidance, which points the user at the wrong action.
- Nothing told the chat (or the user, through the chat) that the next
action lives outside Coder.

## Fix

Three changes scoped to `coderd/x/chatd/`:

1. **`create_workspace` blocks templates with external agents.** The
tool reads `template_versions.has_external_agent` for the template's
active version and refuses external-agent templates with a message
instructing the chat to pick a different template, or to have the user
create and start the workspace themselves and then attach it.

2. **Attaching an existing external workspace stays open.** There is no
selection-time gate on attachment, so users can still bind a working
external workspace to a chat.

3. **External-agent-aware error handling on connection.** Two
complementary changes, both triggered only by proven connectivity
failures rather than by every dial error:

- **`getWorkspaceConn` preflight and timeout handling.** Before opening
a connection, the cache-miss path reads the agent's status from the
already-loaded row. If the selected agent is external and clearly
offline according to the existing `isAgentUnreachable` helper
(`Disconnected` or `Timeout`, never `Connecting`), it returns an
external-agent-specific error immediately instead of waiting out the
30-second dial timeout. `Connecting` external agents fall through to the
dial so a user who just started the agent on their host can still
succeed in the same turn. The preflight only fires when the agent is
still the latest selected agent for the workspace, so stale-binding
recovery via `dialWithLazyValidation` is unaffected. The post-dial
rewrite is limited to the dial timeout sentinel; stale/no-agent bindings
and non-timeout dial failures preserve their original errors.

- **`waitForAgentReady` timeout-branch rewrite.** The 2-minute retry
loop used by `create_workspace` and `start_workspace` runs unchanged for
all agents. When the loop's outer deadline elapses, the timeout branch
substitutes the external-agent message in place of the raw dial error if
the agent belongs to an external resource.

This applies the same pattern that the cache-hit path of
`getWorkspaceConn` already used (`isAgentUnreachable` returning
`errChatAgentDisconnected`), extended to the cache-miss path and to the
readiness helper, with the external-agent-aware error rewrite layered
only on confirmed offline or timeout paths.

Closes CODAGT-314
2026-05-08 13:51:13 +10:00


package chattool

import (
	"context"

	"github.com/google/uuid"

	"github.com/coder/coder/v2/coderd/database"
)

// ExternalAgentResourceType is the Terraform resource type for externally
// managed agents.
const ExternalAgentResourceType = "coder_external_agent"

const createWorkspaceExternalAgentMessage = "create_workspace cannot create workspaces from templates with externally managed agents. " +
	"Use list_templates to choose a different template, or if the user wants " +
	"to use an external workspace, they should create it and start it up fully " +
	"themselves first, then attach it to this chat"

const externalAgentNotConnectedMessage = "workspace uses an externally managed agent that has not connected yet. " +
	"The user needs to start the workspace externally and make sure the " +
	"external agent is connected, then try again"

const externalAgentDisconnectedMessage = "workspace uses an externally managed agent that is currently offline. " +
	"The user needs to reconnect the external agent on its host, then try again"

// ExternalAgentUnavailableMessage explains how to make an externally managed
// agent usable based on its connection history.
func ExternalAgentUnavailableMessage(agent database.WorkspaceAgent) string {
	if agent.FirstConnectedAt.Valid {
		return externalAgentDisconnectedMessage
	}
	return externalAgentNotConnectedMessage
}

// IsExternalWorkspaceAgent reports whether agent belongs to an external
// resource.
func IsExternalWorkspaceAgent(ctx context.Context, db database.Store, agent database.WorkspaceAgent) (bool, error) {
	if db == nil || agent.ResourceID == uuid.Nil {
		return false, nil
	}
	resource, err := db.GetWorkspaceResourceByID(ctx, agent.ResourceID)
	if err != nil {
		return false, err
	}
	return resource.Type == ExternalAgentResourceType, nil
}