mirror of
https://github.com/coder/coder.git
synced 2026-06-02 20:48:20 +00:00
docs/cursor-self-hosted-workers
3825 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
e1b1c7ec5b |
feat: resize chat image attachments client-side for provider budgets (#24533)
Anthropic rejects inline images over 5,242,880 bytes, but our upload endpoint accepts images up to 10 MiB — so 5–10 MiB images were reaching the provider and failing. This adds two layers of protection: the browser resizes oversized images before upload, and the server rejects any that still slip through before an upstream request is issued. Client-side resizing uses `createImageBitmap` with `resizeWidth`/`resizeHeight` to clamp the decoded bitmap at decode time, then iteratively shrinks on an `OffscreenCanvas` (falling back to `HTMLCanvasElement`) until the output fits the applicable budget. Anthropic (and Bedrock-hosted Claude — fantasy's bedrock provider is a thin wrapper around the Anthropic client) uses a ~5 MiB budget; other providers use a ~10 MiB budget to stay under the server cap. Doing the resize in the browser avoids decoding attacker-controlled image bytes in `coderd` (image-bomb DoS surface). Server-side, `chatFileResolver` now takes a provider string and looks up the inline-image cap via a new `chatprovider.InlineImageByteCap` helper; oversized `image/*` files for capped providers are rejected with a pre-classified `chaterror` before the SDK call. The backstop fires for older clients, direct API callers, or any image that was committed to the composer before the user switched to a stricter provider. Attachments commit to composer state synchronously with a new `"processing"` `UploadState` so paste+Enter can't dispatch before the resize finishes; the `"uploading"` send gate now covers both states. Dismissed-while-resizing attachments are tracked in a `WeakSet` so a late swap can't resurrect a removed file. Closes CODAGT-215 |
||
|
|
03c5ae3f70 |
test(coderd/database): enhance FinalizeStaleChatDebugRows integration test (#24693)
Closes #24090 Enhances the existing `TestFinalizeStaleChatDebugRows` test with three missing coverage areas: 1. **Error JSON preservation**: verifies pre-existing error payloads are not overwritten by finalization 2. **Timestamp correctness**: verifies `updated_at` and `finished_at` match the `@now` parameter across all finalized row paths 3. **Null error preservation**: verifies finalized steps that had no error keep a null error column No production code changed. Test passes against Postgres. > 🤖 Generated by Coder Agents <details><summary>Review notes</summary> - Enhances existing test rather than adding a new one, the existing test was the right place - Covers stale, orphaned, and cascade finalization timestamp assertions - Preserves both pre-existing error JSON and null error values during finalization </details> |
||
|
|
b94a0aebcd |
fix(coderd/externalauth): isolate TestValidateToken transports to fix flake (#25015)
This change uses separate http clients/transports in TestValidateToken subtests. Previously parallel subtests of TestValidateToken shared a http.DefaultTransport. When one subtest's httptest.Server.Close() ran in t.Cleanup, it called http.DefaultTransport.CloseIdleConnections, which could interrupt connection(s) used in another subtest. |
||
|
|
273e828442 | fix: remove advisor reasoning configuration (#25030) | ||
|
|
8c08aa1f6c |
fix(coderd/x/chatd): wake after async chat-row UPDATEs commit (#25036)
The async title-generation and turn-summary goroutines launched from processChat run autocommit UPDATEs on the chat row after finishActiveChat has set the chat to pending and signalWake has fired. If the row lock from one of those UPDATEs is held while acquireLoop's processOnce runs, AcquireChats's FOR UPDATE SKIP LOCKED skips the freshly-pending chat and returns no rows. The wake is then consumed with no acquisition, and the chat sits in pending until the next acquireTicker (default 1s). Wake again after each UPDATE commits. The second wake covers the race window without changing the transaction semantics. Closes coder/internal#1500 |
||
|
|
6fa7e84761 |
test(coderd/x/chatd): skip TestExploreChatSendMessageCannotMutateMCPSnapshot (#25023)
Skips `TestExploreChatSendMessageCannotMutateMCPSnapshot` while the chatd redesign is in flight. The test exposes a self-interrupt race in `processChat`'s control-pubsub subscriber that is structurally fixed by the redesign in #24444; skipping until then matches the existing `TestSubscribeRelayEstablishedMidStream` skip in `enterprise/coderd/x/chatd/chatd_test.go`. Relates to https://github.com/coder/internal/issues/1493. |
||
|
|
2ff05608d2 |
test: stabilize chatdebug heartbeat threshold test (#25022)
`launchHeartbeat` could miss a stale-threshold update during startup if `SetStaleAfter` ran after the heartbeat ticker was created but before the goroutine subscribed to `thresholdChan`. In that case, the heartbeat kept the old interval until a future tick, and the mock-clock test could time out waiting for `Ticker.Reset` without advancing time. Subscribe to `thresholdChan` before reading the heartbeat interval so the channel consistently invalidates the interval. The regression test now changes the threshold while ticker creation is trapped, making the startup race deterministic. Closes https://github.com/coder/internal/issues/1513 |
||
|
|
100ebd9f3b |
test(coderd/x/chatd): deflake advisor chain mode snapshot (#25021)
`TestAdvisorChainMode_SnapshotKeepsFullHistory` was using the generic active chatd test server, which leaves periodic pending-chat polling enabled. That made the test inconsistent with the other OpenAI Responses API tests and allowed stale pending pubsub notifications to interrupt the second turn before the advisor request was observed. Use the existing OpenAI Responses test server helper so pending-chat acquisition is delayed and the test only starts processing after the SendMessage pending notification has been published. Closes https://github.com/coder/internal/issues/1510 |
||
|
|
ef0151601e |
feat: report insufficient quota build failures in chat tools (#24956)
## Summary When a workspace build fails because the user is over their group quota, the chat tools currently surface the failure as a bare `"workspace build failed: insufficient quota"` string with no machine-readable error code and no visibility into the user's current usage. Agents and the UI cannot distinguish quota failures from any other Terraform error, so users see an opaque message and have no clear path to recovery. This PR tags quota failures with a typed error code at the source and propagates it through the chat tool layer so callers can react to it explicitly. Relates to CODAGT-20 ## Changes **Provisioner runner** - Add `InsufficientQuotaErrorCode = "INSUFFICIENT_QUOTA"` and set it explicitly at the `commitQuota` failure site via a new `failedWorkspaceBuildfCode` helper, so `provisioner_jobs.error_code` is populated only on the genuine quota path. The substring matcher used for externally produced sentinels (e.g. `"missing parameter"`, `"required template variables"`) is intentionally not extended; provider errors that happen to mention "insufficient quota" stay classified as generic build failures. **SDK and API contract** - Add `JobErrorCodeInsufficientQuota` and a `JobIsInsufficientQuotaErrorCode` helper to `codersdk`. - Extend the swagger `enums` tag on `ProvisionerJob.ErrorCode` to include `INSUFFICIENT_QUOTA`. - Regenerate `coderd/apidoc`, `docs/reference/api/*`, and `site/src/api/typesGenerated.ts`. **chattool create_workspace / start_workspace** - `waitForBuild` now returns a typed `*workspaceBuildError` carrying both the message and the `JobErrorCode`, instead of a bare error string. - New `quotaerror.go` introduces a structured `quotaErrorResult` (with `error_code`, `title`, `message`, `build_id`, and optional `quota`) and a best-effort `workspaceQuotaDetails` lookup that wraps owner authorization internally and fetches `credits_consumed` and `budget` from the database. Quota lookup failures (including authorization failures) never block the failure payload. - On quota-coded build failures, both `create_workspace` and `start_workspace` now return the structured response (with the recovery guidance inlined into `message`) instead of the bare `"insufficient quota"` string. This applies to all three failure paths: post-creation, an in-progress existing build, and a freshly triggered start build. Non-quota build failures continue to use the existing `buildToolResponse` / `newBuildError` path. - Owner authorization is wrapped only on the call sites that need it (the `CreateFn` and `StartFn` invocations and the quota-detail lookup), so idempotent fast paths (already running, already in progress, existing-workspace early returns) do not pay for an extra RBAC round-trip or fail when role lookup is transient. ## Out of scope - No changes to quota math, allowances, or bypass behavior. - No automatic retries. - No new quota-inspection tools and no changes to MCP `coder_create_workspace` (which returns immediately and never observed the build outcome here). - No frontend UI changes; those will land in a follow-up PR that consumes the new `INSUFFICIENT_QUOTA` code. |
||
|
|
5e4647bb3a | fix: synchronize access to drpc Send (#24600) | ||
|
|
6a200a49d3 |
feat: refresh dynamic parameters on secret changes (#24786)
Publishes user secret create, update, and delete events and subscribes dynamic parameter websockets to authorized owner secret changes. Secret changes trigger fresh renders with monotonic response IDs, with backend tests covering subscription authorization and websocket refresh behavior. |
||
|
|
6b0518d051 |
fix: state-aware queued message promotion (#24819)
PromoteQueued now branches on chat status: synth tool results before the user message on requires_action, deferred reorder + Waiting on running so the worker's persist+auto-promote keeps partial output. Stale heartbeat falls through to the synchronous path; GetStaleChats picks up Waiting+queue to recover post-cleanup-crash. Endpoint returns 202. Closes CODAGT-119 |
||
|
|
0bfb9f6f13 |
feat: show agent turn summary in agents sidebar (#24942)
Persists the agent-generated turn-end summary on `chats` and shows it as the Agents sidebar subtitle when present, falling back to the model name. Errors still take precedence. > Mux is acting on Mike's behalf. ## What changes **Storage.** New nullable `last_turn_summary` column on `chats` (migration `000486`). New `UpdateChatLastTurnSummary` query normalizes blank/whitespace input to `NULL`, preserves `updated_at` (so the chat does not jump to the top of the sidebar on summary writes), and uses an `expected_updated_at` stale-write guard so an older async summary cannot overwrite a newer turn. **Backend.** `coderd/x/chatd/chatd.go` decouples summary generation from webpush. Generated summaries persist for completed parent turns even when webpush is unconfigured or has no subscriptions. The same generated text is reused as the webpush body when webpush is configured, so the summary model is not called twice. Generic fallback push text is no longer persisted; it clears any stale summary instead. Error/interrupt/pending-action terminal paths clear `last_turn_summary` for the latest turn. **Frontend.** `AgentsSidebar.tsx` subtitle priority is now `errorReason || lastTurnSummary || modelName`, normalized via the existing `asNonEmptyString` helper from `blockUtils.ts`. ## Tests - `TestUpdateChatLastTurnSummary` (database): success, whitespace-to-NULL, stale guard rejects, `updated_at` preserved. - `TestUpdateLastTurnSummaryRejectsStaleWrites` (chatd internal): direct stale-`expected_updated_at` test. - `TestSuccessfulChatPersistsTurnSummaryWithoutWebPush`: persistence works without webpush subscriptions. - `TestSuccessfulChatSendsWebPushWithSummary`: same generated text drives both DB and push body. - `TestSuccessfulChatSendsWebPushFallbackWithoutSummaryForEmptyAssistantText`: fallback text is not persisted. - `TestErroredChatClearsLastTurnSummaryAndSendsWebPush`: error path clears the field. - `TestInterruptChatDoesNotSendWebPushNotification`: interrupt path clears the field, no push fires. - `AgentsSidebar.test.tsx`: subtitle priority for summary-present, error-wins, no-summary fallback, whitespace fallback. - `AgentsSidebar.stories.tsx`: `ChatWithTurnSummary` and `ChatWithTurnSummaryAndError`. ## Notes - No backfill. Existing chats keep showing the model name until their next turn completes. - Parent chats only in this iteration; the field is rendered on any `Chat` if a future change extends generation to children. - Decoupling generation from webpush adds quickgen model calls for completed parent turns that previously skipped generation when no subscriptions existed. Existing parent-only, assistant-text-present, `PushSummaryModel` configured, and bounded-timeout gates keep this behavior bounded. |
||
|
|
3d03c393d2 |
chore: bump Go toolchain version to 1.26.2 (#24975)
## Summary Bumps the repository Go toolchain from 1.25.9 to 1.26.2 across local development, CI, dogfood Docker images, and Nix builds. ## Changes - Update `go.mod` and the shared setup-go action to Go 1.26.2. - Update dogfood Ubuntu image Go versions and the official linux-amd64 tarball checksum. - Move Nix Go module builds from `buildGo125Module` to `buildGo126Module`. - Regenerate API docs affected by Go 1.26 stdlib URL documentation changes. ## Validation - `./scripts/check_go_versions.sh` - `make fmt` - `make lint` - `make build-slim` - `make test TEST_SHORT=1` - `make pre-commit` > 🤖 This PR was created with the help of Coder Agents, and needs a human review. 🧑💻 |
||
|
|
a74015fc85 |
refactor: make store and chatID explicit parameter arguments in chattools (#24850)
Fixes CODAGT-175 Addresses a review finding in https://github.com/coder/coder/pull/23827 that the nil-guards for both `database.Store` and `chatID` are both dead code in practice in the `chattool` package. - Modifies the return signatures require passing both `database.Store` and `chatID` explicitly as positional arguments instead of just parameter struct keys. - Drops the nil-guards for `database.Store` and `chatID`. |
||
|
|
2949028dcb | fix(coderd): enforce chat owner check on processing handlers (#24921) | ||
|
|
e5c7fdff86 |
fix(coderd/x/chatd): refresh chat status and bound subscriber reads on Subscribe (#24095)
Tightens the chat stream subscription path on a few related axes. None of these changes touch the steady-state event flow; they all concern the subscribe handshake. ## Motivation `Server.Subscribe` carries three responsibilities that were entangled: 1. Authorize the caller against the chat row. 2. Arm local + pubsub subscriptions before any DB reads (subscribe-first-then-query). 3. Build the initial snapshot from a fresh chat row, message history, and queue. When all three live in one function and share the request context, a few unfortunate behaviors fall out: - The HTTP handler's middleware already loaded and authorized the chat row, but `Subscribe(chatID)` discarded it and re-fetched on every WebSocket connection. - The chat row used to populate the initial `status` event was loaded *before* the pubsub subscription was armed, so a status transition that happened in that window was silently lost. - Control-path DB reads inherited whatever context the caller passed in. A caller without a deadline could wedge a subscriber goroutine indefinitely on a stalled DB. - A transient failure of the chat re-read collapsed the entire subscription instead of degrading gracefully. ## What changes **Split the auth boundary out into the type signature.** A new `SubscribeAuthorized(ctx, chat, ...)` takes the already-authorized row directly. The HTTP handler in `coderd/exp_chats.go` calls it with the chat row from `httpmw.ChatParam`, eliminating the redundant `GetChatByID`. `Subscribe(chatID)` is preserved as a thin wrapper for callers that don't have a chat row in hand (tests, internal callers); it does the auth lookup and delegates. **Re-read the chat after arming subscriptions.** Inside `SubscribeAuthorized`, after the local stream and pubsub subscriptions are active, we reload the chat row to populate the initial `status` event and any enterprise relay setup. Combined with the existing subscribe-first-then-query pattern, this closes the gap where a status transition between the middleware's load and the subscription arming would not appear in either the initial snapshot or a live notification. **Fall back to the middleware row on refresh failure.** If the post-subscription refresh fails (transient DB blip, brief pool exhaustion), we log a warning and reuse the row that proved authorization in the first place. Messages, queue, and pubsub are all independent of this row, so the stream still works; the initial `status` is just slightly stale and self-corrects via the next pubsub event. **Bound subscriber control-path DB reads.** A new `streamSubscriberControlFetchContext` helper applies a 5-second fallback timeout only when the caller has no deadline of their own. Used at the chat refresh, the initial queue load, and the queue-update goroutine following pubsub notifications. HTTP-driven callers pass through unchanged; background callers can no longer hang forever on a stalled DB and leak subscriber goroutines, pubsub subscriptions, and `chatStreams` entries. |
||
|
|
0dc4c34efc |
fix: regenerate API docs for ChatErrorKind (#24989)
Follow-up to #24955 (`refactor: move chat error kinds into codersdk`), which moved `ChatErrorKind` into `codersdk` but did not refresh the generated apidoc artifacts. As a result, `make gen` was producing a dirty tree on `main`. This PR is the output of running `make gen -B` on a clean checkout of `main`. Only generated files are touched: - `coderd/apidoc/docs.go` - `coderd/apidoc/swagger.json` - `docs/reference/api/chats.md` - `docs/reference/api/schemas.md` The diff adds the `codersdk.ChatErrorKind` schema and replaces the previously-untyped `kind: string` fields on `codersdk.ChatError` and `codersdk.ChatRetryEvent` with references to the new enum. |
||
|
|
46a60e6d5d |
refactor: move chat error kinds into codersdk (#24955)
Moves the chat error kind taxonomy from `coderd/x/chatd/chaterror` into `codersdk.ChatErrorKind` and types `ChatError.Kind` / `ChatStreamRetry.Kind` so generated TypeScript exposes an SDK-owned union, including `usage_limit`. Backend chat classification now references the SDK constants directly while preserving the existing JSON string values. Keeps chat usage-limit admission failures on their existing 409 response shape. The frontend maps structured usage-limit responses to the SDK-owned `usage_limit` kind, uses generated `TypesGen.ChatErrorKind` directly, and removes the local string union and alias. |
||
|
|
2874d4b4cd |
feat: add chat debug retention purge (#24943)
> Mux is acting on Mike's behalf. Adds configurable retention for chat debug data, including the purge query, updated_at index, site config, experimental API, SDK types, frontend lifecycle setting, and docs. The purge deletes debug runs older than the configured retention window and relies on existing cascades to delete steps. The default retention is 30 days, and setting the value to 0 disables the purge. |
||
|
|
e48d12160f |
fix(coderd): cut DB fan-out on agent instance-identity auth (#24973)
## Summary Restores `v2.33.0-rc.2`-equivalent query cost for agent instance-identity auth on `v2.33.0-rc.3`, which currently saturates the pgx pool when multiple agents share an instance ID. Customer report against rc.3 traced 233× `Internal error fetching provisioner job resource. fetch related workspace build: context canceled` 500s during a 50-minute incident window to this path. Backport to `release/2.33` will follow as a separate PR after this merges. ## Root cause [#24325](https://github.com/coder/coder/pull/24325) ("support multiple agents with shared instance-identity auth") rewrote `coderd/workspaceresourceauth.go::handleAuthInstanceID` to use the new `:many` agent lookup followed by a per-candidate filter loop. Each iteration synchronously calls `GetWorkspaceResourceByID` and `GetProvisionerJobByID`. Both go through `dbauthz`, and both fan out into the same `provisioner_job → workspace_build → workspace` cascade because `authorizeProvisionerJob` always re-authorizes the workspace via `GetWorkspaceBuildByJobID → GetWorkspaceByID`. The handler then re-fetches resource and job again for the surviving agent. Net effect on the agent-auth happy path: | | SQL | RBAC | |---|---|---| | rc.2 baseline | 13 | 5 | | rc.3 today, 1 agent | 19 | 7 | | rc.3 today, 2 agents | 26 | 9 | | **After this PR, 1 agent** | **6** | **3** | | **After this PR, 2 agents** | **7** | **3** | Under load, the rc.3 chain blocks on pool acquire and the request blows past the 30s HTTP write timeout. ## Changes ### 1. System fast-path on `authorizeProvisionerJob` (`coderd/database/dbauthz/dbauthz.go`) Add an `AsSystemRestricted` early-return at the top of `authorizeProvisionerJob`. Instance-identity auth has already proven cloud identity before reaching the DB layer, so re-authorizing the workspace on every provisioner-job lookup is pure overhead. Existing `GetWorkspaceAgentsByInstanceID` already uses the same fast-path pattern. ```go if err := q.authorizeContext(ctx, policy.ActionRead, rbac.ResourceSystem); err == nil { return nil } ``` ### 2. Drop survivor re-fetch in `handleAuthInstanceID` (`coderd/workspaceresourceauth.go`) Capture the provisioner job alongside each candidate during the filter loop so the survivor lookup does not re-fetch resource and job after selection. The previous code fired the resource→job→build→workspace cascade twice for the surviving agent. ## Tests Adds `TestAuthorizeProvisionerJob_SystemFastPath` in `coderd/database/dbauthz/dbauthz_test.go` with two sub-tests: - `AsSystemRestricted/SkipsCascade` — strict mock fails the test if `GetWorkspaceBuildByJobID` or `GetWorkspaceByID` is called. - `NonSystemActor/StillCascades` — auditor (no `ResourceSystem`) still pays the cascade and produces a `NotAuthorized` error, proving the fast-path is gated correctly. Updates 12 existing dbauthz suite cases to expect the new `ResourceSystem.Read` check ahead of the workspace/template-version check, with `FailSystemObjectChecks()` to force the slow path. Existing integration coverage in `TestPostWorkspaceAuthAWSInstanceIdentity/Ambiguous/{SingleAgent, MultipleAgentsWithSelector, MultipleAgentsNoSelector, SubAgentExcluded, ...}` exercises Part 2 end-to-end and continues to pass. ## Footprint - 3 files changed, +166/-48 - No SQL changes - No `make gen` - No migrations - No audit-table updates ## Validation - [x] `go test ./coderd/database/dbauthz/` — full suite, ~6s - [x] `go test -run TestPostWorkspaceAuth ./coderd/` — instance-identity handler tests - [x] `go test -run TestProvisionerJob ./coderd/` - [x] `go test -run TestWorkspaceAgent ./coderd/` - [x] `go test ./coderd/provisionerdserver/` - [x] `gofmt -l` clean ## Alternatives considered - **SQL-side filter:** rewrite `GetWorkspaceAgentsByInstanceID` to join `workspace_resources`/`provisioner_jobs` and filter `job.type = 'workspace_build'` server-side, eliminating the filter loop entirely. Cleaner long-term, but changes generated SQL and is too much surface for a release-branch hotfix. Worth doing as a follow-up. - **Full revert of #24325:** removes the multi-agent feature outright; conflicts with downstream commits ([#24441](https://github.com/coder/coder/pull/24441), [#24438](https://github.com/coder/coder/pull/24438), [#24313](https://github.com/coder/coder/pull/24313)). Reserved as fallback if the surgical fix doesn't hold under load testing. |
||
|
|
e7360da974 | docs: generate Chats API docs from swagger annotations (#24830) | ||
|
|
1b2a1af097 |
feat: report user secrets adoption summary in telemetry (#24854)
Add a deployment-wide user secrets summary to the telemetry snapshot so we can track adoption of user secrets The summary reports: - A breakdown of secrets by which injection fields are populated: EnvNameOnly, FilePathOnly, Both, Neither - The distribution of secrets per user (max, p25, p50, p75, p90) All metrics are scoped to active non-system users. Soft-deleted users are excluded. The percentile distribution is computed across the entire active non-system user base, including users with zero secrets, so the percentiles reflect deployment-wide adoption. Assisted by Coder Agents. |
||
|
|
1ba7139f21 |
feat: add session correlation fields to BoundaryLog proto (#24809)
1 of 9 [next >>](https://github.com/coder/coder/pull/24811) RFC: [Bridge ↔ Boundaries Correlation RFC](https://www.notion.so/Bridge-Boundaries-Correlation-313d579be59281f3b4efdbfd6896775a) Adds three new proto fields for boundary session correlation. **`ReportBoundaryLogsRequest`** - `session_id` (string, field 2) — UUID generated by boundary at startup, shared across all batches from a single run. - `confined_process` (string, field 3) — name of the confined process (e.g. `claude-code`, `codex`, `copilot`). **`BoundaryLog`** - `sequence_number` (uint64, field 4) — monotonically increasing counter per session, primary ordering key when boundary is in use. `BoundaryLog.time` already existed at field 2; no change needed there. API version bumped to v2.9. No behaviour change in coderd or the agent. This is a pure schema bump that the boundary repo will consume in its own stack. > Generated by Coder Agents |
||
|
|
4751416b29 |
fix!: persist structured chat errors (#24919)
**Breaking change for changelog:**
> `codersdk.Chat.last_error` now returns a structured `ChatError` object
(`{message, kind, provider, retryable, status_code, detail}`) instead of
a plain string. The chats API is experimental
(`/api/experimental/chats`), so this ships without a deprecation cycle;
consumers reading `chat.last_error` as a string must update to read
`chat.last_error.message`. SDK/generated TypeScript terminal error
payloads now use the single `ChatError` type; the live stream error
payload type is renamed from `ChatStreamError` to `ChatError`.
Persisted chat errors now carry the same provider-specific detail (kind,
provider, retryable, HTTP status, optional detail) as the live stream,
so refreshing a failed chat rehydrates with the full structured error
instead of a one-line headline.
Existing rows are migrated in place: legacy text errors are wrapped into
`{message, kind: "generic"}` so already-errored chats still render, and
rows with `last_error IS NULL` stay NULL. Internally, persisted fallback
decoding now reuses the existing `chaterror.KindGeneric` constant, with
no JSON value change.
Closes CODAGT-239
|
||
|
|
7e01edeb8e |
fix: align chat attachment picker with allowed file types (#24917)
The agent chat composer only advertised image uploads to the OS file picker and filtered drag-and-drop and paste events to `image/*`, even though the backend accepts text, CSV, JSON, PDF, and a narrower set of image types. Move the allowed chat attachment media types into `codersdk` so the frontend picker and backend enforcement share one source of truth. Use the generated TypeScript list to drive the file input `accept` attribute and the drag-and-drop and paste filters, while adding common text extensions so platforms without MIME registrations still surface those files in the picker. |
||
|
|
632dcdb63a | feat: add personal chat model overrides (#24715) | ||
|
|
fad69df710 | fix: correct SCIM Swagger try it out URLs (#24779) | ||
|
|
0bb09935bc |
feat: add computer-use provider selection for AI agents (#24772)
Adds a deployment-wide setting to select the computer-use provider (Anthropic or OpenAI) for AI agents, plus the OpenAI computer-use runner needed to honor that selection. The setting is stored in `site_configs` under `agents_computer_use_provider`, defaults to Anthropic when unset, and is exposed via experimental GET/PUT endpoints under `/api/experimental/chats/config/computer-use-provider`. The chatd computer-use tool now dispatches to either `runAnthropicComputerUse` or `runOpenAIComputerUse` based on the resolved provider, with provider-specific result metadata for OpenAI screenshots. Frontend adds a provider dropdown to the Agents Experiments settings page nested under the virtual desktop toggle, with disabled state handling while virtual desktop is off and skeleton loaders while config queries are in flight. Hugo and Codex review follow-up: - Uses shared provider validation and clearer computer-use constant names. - Removes stale OpenAI pending-safety-checks commentary. - Documents why provider result metadata is needed for OpenAI screenshots. - Keeps the computer-use subagent visible when provider credentials are missing, then returns a clear spawn-time configuration error. - Uses OpenAI's recommended 1600x900 screenshot geometry to preserve the native 16:9 aspect ratio. - Moves OpenAI-specific computer-use helpers into `coderd/x/chatd/chatopenai/computeruse` after rebasing onto the provider package refactor in `main`. - Converts OpenAI pixel scroll deltas to Coder desktop wheel-click amounts. - Preserves OpenAI pointer modifiers with key down/up desktop actions and rejects unsupported non-left double-click buttons explicitly. - Maps OpenAI back/forward side-button clicks to browser navigation key actions. - Defaults omitted OpenAI click buttons to left-click. - Retries mouse release cleanup if the final OpenAI drag release fails. - Keeps computer-use subagent availability messages stable when provider config cannot be loaded, while logging the backend error. - Releases remaining OpenAI modifier keys if a synthetic key-up cleanup action fails. - Updates Storybook interaction stories so provider snapshots show the selected final provider. > Mux updated this PR description on behalf of Mike. |
||
|
|
da6e708bd2 |
fix(coderd/externalauth): detect concurrent refresh race to prevent cache poisoning (#24228)
<!-- If you have used AI to produce some or all of this PR, please ensure you have read our [AI Contribution guidelines](https://coder.com/docs/about/contributing/AI_CONTRIBUTING) before submitting. --> Fixes https://github.com/coder/coder/issues/17069 Builds on #24332 and #24334 which addressed token persistence and rate limit handling. ## Problem When multiple concurrent requests race to refresh an expiring external auth token, providers with single-use refresh tokens (e.g., GitHub Apps) reject all but the first refresh attempt with `bad_refresh_token`. The losing request caches this transient error in the `oauth_refresh_failure_reason` database column and clears the refresh token, blocking all subsequent refresh attempts until the user manually re-authenticates. This is common for users with multiple terminals, IDE connections, or workspaces open, all of which poll the external auth endpoint and trigger concurrent refreshes when the token nears expiry. Database analysis showed 5 of 7 affected users failed within 5-10 seconds of token expiry, matching the Go oauth2 library's `expiryDelta` window. ## Fix Before caching a `bad_refresh_token` failure, re-read the external auth link from the database. If the refresh token has changed (indicating a concurrent caller already refreshed successfully), return the winner's updated link instead of writing a failure. An empty-string guard ensures a token cleared by another loser isn't mistaken for a winner's successful refresh. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Garrett Delfosse <garrett@coder.com> |
||
|
|
033ed0bb82 |
feat: add admin-configurable chat title generation model (#24838)
Adds an admin-configurable deployment-wide setting that controls which
model is used for chat title generation. Admins can pick any enabled
chat model config from the Agents settings page, or leave the setting
unset to keep the existing fast-models-then-chat-model fallback
algorithm.
When a model is selected, both automatic and manual title generation use
only that model, with no silent fallback. When the configured model is
disabled, missing credentials, or otherwise unusable, automatic title
generation skips entirely (best-effort) and manual title regeneration
returns a clear error, so admins notice the misconfiguration instead of
silently routing title traffic through another provider.
## Surface
- New deployment-wide setting stored as a `site_configs` row
(`agents_chat_title_generation_model_override`).
- New experimental endpoint `GET/PUT
/api/experimental/chats/config/model-override/{context}`.
- Frontend: title generation now appears as a third dropdown on the
Agents admin settings page alongside the existing general and explore
context overrides.
## DRY refactors folded in
Title generation is integrated as a third value of the existing
`ChatModelOverrideContext` type alongside `general` and `explore`,
sharing the parameterized HTTP route, SDK methods, generated types, and
frontend API plumbing rather than introducing a parallel surface. The
`Agent` prefix was dropped from the type and route since title
generation is not a delegated agent.
The chatd model-override resolver is also shared.
`resolveConfiguredModelOverride` now takes a `failureMode` parameter:
- Subagent overrides use soft failure: misconfigured overrides are
logged and the parent model is used.
- Title generation uses hard failure: misconfigured overrides return an
explicit error so manual title regeneration surfaces the
misconfiguration and automatic title generation skips instead of
silently falling back.
> Mux is acting on Mike's behalf.
|
||
|
|
203b0a9df8 |
refactor(coderd/x/chatd): extract OpenAI logic into chatopenai package (#24788)
Extracts OpenAI-specific logic from `coderd/x/chatd` into `coderd/x/chatd/chatopenai` so the main chat path no longer references `fantasyopenai` directly for chain mode info, response IDs, web search tooling, or option mapping. Structural refactor. The only deliberate behavioral narrowing is consolidating Responses store checks and related keyed option or metadata access on `opts[fantasyopenai.Name]`. That is documented by `TestIsResponsesStoreEnabledIgnoresMalformedNonOpenAIKey` and is unreachable in production where Responses options always live under `fantasyopenai.Name`. Summary: - Moves OpenAI Responses chain mode info, response ID helpers, web search tool construction, and provider option conversion into `chatopenai`. - Keeps Anthropic, Google, OpenRouter, and Vercel provider branches as thin, existing code paths. - `chatopenai` only imports `chatprompt` from chatd subpackages. It does not import `chatd`, `chatloop`, `chatprovider`, or `chaterror`. - Follow-up review fixes align helper names, keyed provider option access, map cloning behavior, and PR documentation with the extracted package boundary. - Final sweep trims unused chain-mode state, removes a duplicate store-check test case, drops an unused provider-tool parameter, and shares the chat-message test helper through `chattest`. > Mux is updating this PR on Mike's behalf. |
||
|
|
761adfa62a |
fix(coderd/rbac): grant template admin read access to dormant workspaces (#23554)
## Summary Template admins could **list** dormant workspaces but could not **read** them individually, resulting in a 403 when clicking into a dormant workspace that was visible in the list. ### Root cause - `GetWorkspaces` prepares its SQL authorization filter against the `workspace` type, so dormant workspaces pass the filter and appear in list results for template admins. - `GetWorkspaceByID` calls `RBACObject()` on the fetched workspace, which returns `workspace_dormant` when `DormantAt` is set. Template admin had zero permissions on that type, so the read was denied. ### Fix Add `ActionRead` on `ResourceWorkspaceDormant` to both the site-level `template-admin` and org-level `organization-template-admin` roles. This is the minimal grant needed to make list and read consistent without granting any lifecycle permissions (create, update, delete, stop, etc.) on dormant workspaces. Split the `WorkspaceDormant` RBAC test case into `WorkspaceDormantRead` (read only) and `WorkspaceDormant` (remaining write/lifecycle actions) so the new permission can be asserted independently. Template admins can read non-dormant workspaces, so this is the only missing permission. --- > This PR was generated with Coder agents and reviewed by a human. |
||
|
|
3a153ebb15 |
fix(coderd/x/chatd): replay retry phase on subscribe (#24569)
Retry events were previously fire-and-forget, so subscribers that connected after a retry started only saw durable history plus `status=running` and could not tell the stream was backing off. Keep the current retry phase in `chatStreamState`, capture it atomically with subscriber registration, replay it in the initial snapshot for same-chat late joiners, and clear it when streaming resumes or ends so reconnects get consistent retry state without duplicate delivery at the subscription boundary. Relates to CODAGT-139 |
||
|
|
d889ba1842 |
feat: add user_oidc auth type for MCP servers (#24793)
Adds a 5th MCP server authentication mode, `user_oidc` ("User OIDC
Identity"), that forwards the calling user's OIDC access token from
`user_links.oauth_access_token` to the upstream MCP server as
`Authorization: Bearer <token>`.
The token is read from `user_links` and refreshed transparently via
`oauth2.TokenSource` before each MCP request. No new per-MCP-server
secret storage and no per-user connect/disconnect step.
**Limitation**: only users who logged in via OIDC have a forwardable
token. Users authenticated via password or GitHub will see requests sent
without an `Authorization` header, and the upstream MCP server is
expected to respond with 401. A pluggable token source (e.g. CLI-minted
E2E tokens) is left as future work.
<details>
<summary>Implementation notes</summary>
- Schema: new
`coderd/database/migrations/000481_mcp_user_oidc_auth.{up,down}.sql`
relaxes the `mcp_server_configs.auth_type` CHECK constraint to include
`user_oidc`. Down migration deletes affected rows before restoring the
old constraint.
- SDK validation: `codersdk/mcp.go` extends `oneof` for
`CreateMCPServerConfigRequest` and `UpdateMCPServerConfigRequest`.
- Handler: `coderd/mcp.go` adds `case "user_oidc":` to the
field-clearing switch on update. The existing list and detail handlers
already report `auth_connected = true` for any non-`oauth2` auth type.
- Header construction: `coderd/x/chatd/mcpclient/mcpclient.go`
introduces a `UserOIDCTokenSource` interface and adds the `user_oidc`
case to `buildAuthHeaders`. `ConnectAll` / `connectOne` /
`buildAuthHeaders` gain `userID uuid.UUID, oidcSrc UserOIDCTokenSource`
parameters.
- Wiring: `coderd/x/chatd/chatd.go` adds `OIDCTokenSource` to `Config` /
`Server` and passes `chat.OwnerID` plus the source through `ConnectAll`.
`coderd/coderd.go` constructs the source next to the `chatd.New` call
when `options.OIDCConfig` is non-nil.
- Token source: `oidcMCPTokenSource` lives in `coderd/mcp.go`. It reads
the user's OIDC link, refreshes via `oauth2.TokenSource`, and writes the
refreshed token back to `user_links`. Logic is duplicated from
`provisionerdserver.ObtainOIDCAccessToken` to avoid an MCP ->
provisionerdserver dependency. The two copies must be kept in sync; a
comment on `oidcMCPTokenSource` records this.
- Frontend: `MCPServerAdminPanel.tsx` adds the new dropdown option, an
explanatory helper block (no admin-configurable fields), and a Storybook
story (`CreateServerUserOIDC`).
- Tests:
- `mcpclient_test.go`: `TestConnectAll_UserOIDCAuth`,
`TestConnectAll_UserOIDCAuth_NoLink`,
`TestConnectAll_UserOIDCAuth_NilSource`. All existing tests updated for
the new signature.
- `mcp_test.go`: extends `TestMCPServerConfigsAuthConnected` to assert
`auth_connected=true` for `user_oidc`; adds
`TestMCPServerConfigsUserOIDCClearsFields` and
`TestMCPServerConfigsUserOIDCDirect`.
- Docs: `docs/ai-coder/agents/platform-controls/mcp-servers.md`
describes the new mode and its OIDC-only limitation.
</details>
This PR was created by Coder Agents.
---------
Co-authored-by: Coder Agents <agents@coder.com>
|
||
|
|
6b9637d85a | feat: replace pgcoordinator pg_notify triggers with app-level Publish() (#24717) | ||
|
|
efda5c2c12 |
feat: disable Git controls when Git is not active (#24673)
closes CODAGT-148 In chats with no Git context (no repositories known to the watcher, no PR tab, no remote diff), the refresh button fires an "Unable to refresh git status" toast because the watcher WebSocket never opens. Derive `isGitActive = repositories.size > 0 || showRemoteTab` in `GitPanel` and use it to: - Disable the refresh button, unified-diff toggle, and split-diff toggle with a "Git is not set up for this chat" tooltip. - Show a dedicated empty state explaining how to enable Git, replacing the generic "No pushed changes yet" copy. Chats with at least one repository or a PR tab are unaffected; all controls remain enabled and behave as before. Adds a `GitNotActive` Storybook story with play-function assertions covering the disabled controls and empty-state copy. |
||
|
|
2f855904be |
refactor: add dbgen chat generators and migrate test boilerplate (#24497)
- Adds chat-related dbgen generators covering defaults, overrides, and message field mapping. - Replaces raw single-row chat, message, provider, and model-config setup in tests with dbgen helpers. - Simplifies chat seed helpers after moving fixture setup into dbgen. > Generated with [Coder Agents](https://coder.com/agents). |
||
|
|
04cc983833 |
fix: add preset support to MCP tools (#24694)
The chat tools (`read_template`, `create_workspace`) did not surface or respect template version presets. Presets were invisible to the LLM and preset parameter defaults were never applied at workspace creation. The `toolsdk` MCP surface had the same gap (ref #24695, now subsumed here). ## What this changes - **`read_template`** returns presets with `id`, `name`, `default`, `description`, `icon`, `parameters`, and `desired_prebuild_instances` (when set), so the LLM can pick the right preset and prefer prebuilt-backed ones. - **`create_workspace`** accepts a `preset_id`. The wsbuilder applies preset parameter defaults and may claim a prebuilt workspace. - **`start_workspace`** does *not* accept a preset. Presets are a creation-time choice; subsequent starts use the workspace's existing version and parameters. Users who need a specific preset or version on an existing chat can create the workspace out-of-band (CLI / UI / API) with the desired configuration and attach the chat to it. - **`toolsdk`** gains `GetTemplate` (with presets including `desired_prebuild_instances`), preset support on `CreateWorkspace`, and preset + `rich_parameters` support on `CreateWorkspaceBuild`. The `template_version_preset_id` description warns about preset/version affinity. > 🤖 Generated with [Coder Agents](https://coder.com/agents) and reviewed by a human. Co-authored-by: Max schwenk <maschwenk@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
fb6e00de18 |
fix: preserve rollback errors in runTx (#24598)
Previously, `runTx` could lose a deferred rollback failure when returning an existing transaction error, because the rollback path could not update the final return value. https://go.dev/play/p/AhBK31lO0Gd |
||
|
|
e57525002c |
chore: remove agents experiment flag and mark feature as beta (#24432)
Remove the `ExperimentAgents` feature flag so the Agents feature is always available without requiring `--experiments=agents`. The feature is now in beta. Existing deployments that still pass `--experiments=agents` will get a harmless "ignoring unknown experiment" warning on startup. ### Changes **Backend:** - Remove `RequireExperimentWithDevBypass` middleware from chat and MCP server routes - Always include `AgentsAccessRole` in assignable site roles (later refactored to org-scoped on main; rebase keeps that) - Always set `AgentsTabVisible = true`, then drop the entire dead `AgentsTabVisible` metadata pipeline (Go htmlState field, populateHTMLState goroutine, HTML meta tag, useEmbeddedMetadata registration, mock); no production consumer reads it. `AgentsNavItem` already gates on `permissions.createChat`. - Make `blob:` CSP `img-src` addition unconditional - Remove `ExperimentAgents` constant, `DisplayName` case, and `ExperimentsKnown` entry **CLI:** - Graduate the agents TUI from `coder exp agents` to `coder agents` (moved from `AGPLExperimental()` to `CoreSubcommands()`) - Drop the `agent` alias so it does not collide with the hidden workspace-agent command - Rename implementation files `cli/exp_agents_*.go` -> `cli/agents_*.go` and internal identifiers (`expChatsTUIModel` -> `chatsTUIModel`, `newExpChatsTUIModel` -> `newChatsTUIModel`, `setupExpAgentsBackend` -> `setupAgentsBackend`, `startExpAgentsSession` -> `startAgentsSession`, `expAgentsPtr` -> `agentsPtr`, `expAgentsSession` -> `agentsSession`, `TestExpAgents*` -> `TestAgents*`). `expClient` (the `*codersdk.ExperimentalClient` local) is kept; `coderd/exp_chats*.go` and other still-experimental `cli/exp_*.go` commands are intentionally untouched. **Frontend:** - Remove experiment check from `AgentsNavItem` - render when `canCreateChat` is true - Remove `agentsEnabled` experiment check from `WorkspacesPage`, then gate `chatsByWorkspace` on `permissions.createChat` so users without chat access don't trigger the per-page DB query (Copilot review feedback) - Add `FeatureStageBadge` (beta) next to the Coder logo in the Agents sidebar (desktop + mobile) **Docs:** - Remove experiment flag setup instructions from `early-access.md` and `getting-started.md` (and rename `early-access.md`'s "Enable Coder Agents" heading to "Set up Coder Agents", since there is no enablement step left) - Update `chats-api.md` and `getting-started.md`'s Chats API note to say "beta" instead of "experimental" - `docs/manifest.json`: drop "experimental" from the Chats API sidebar description - `make gen` regenerated `docs/reference/cli/agents.md` and the CLI index - `scripts/check_emdash.sh`: exclude `cli/testdata/*.golden` and `enterprise/cli/testdata/*.golden` from the new repo-wide emdash lint, since serpent emits emdash borders in every generated `--help` golden file **Tests:** - Remove `ExperimentAgents` setup from all test files (14 occurrences across 7 files) - Update stale "with the agents experiment" comments in `coderd/x/chatd/integration_test.go` and `coderd/mcp_test.go` <img width="1185" height="900" alt="image" src="https://github.com/user-attachments/assets/b420bc8f-41d6-42c6-abd8-ad572533d651" /> > 🤖 Generated by Coder Agents |
||
|
|
17409a515c |
feat(coderd): wire advisor runtime to admin config (#24622)
## Summary Wire the advisor runtime into `chatd`: read the admin config on every `runChat`, gate tool registration and system-prompt guidance on a **single eligibility boolean**, register the `advisor` built-in tool, and apply the exclusive-tool policy from PR 1. ## Motivation This is the integration seam where PRs 1–3 come together into an actual user-visible feature. Gating is deliberately root-chat-only for the initial rollout; child/sub-agent chats still do not see the tool or the guidance block. ## Changes ### `coderd/x/chatd/chatd.go` - `loadAdvisorConfig(ctx, logger)` reads the admin config (from PR 3) on each run. If `ModelConfigID` is set, it resolves the override model via `configCache.ModelConfigByID`; otherwise it falls back to the outer chat's model and provider options. Reasoning effort is plumbed into provider options via `applyAdvisorReasoningEffort`. - One computed `advisorEligible` boolean drives **both** tool registration (after skill tools, before MCP tools) and guidance injection via `chatprompt.InsertSystem(prompt, chatadvisor.ParentGuidanceBlock)`. - `setAdvisorPromptSnapshot` closures capture the outer prompt state at the right points in the lifecycle (`renderPlanPathPrompt`, `ReloadMessages`, `PrepareMessages`) so the advisor handoff uses the same context the outer model saw. - `ExclusiveToolNames["advisor"] = true` is passed to `chatloop.Run()` so mixed batches are rejected cleanly (PR 1 machinery). - `builtinToolNames["advisor"] = true` so metrics keep advisor distinct from the generic `mcp` label. ### Child-chat guard - Child/sub-agent chats deliberately do not see the advisor tool or guidance block, to avoid recursion/cost blowups until the pattern is proven. This is covered by `TestAdvisorGating_ChildChat` (currently skipped pending a rewrite against the new `plan`/`explore` subagent infrastructure; core gating logic is still exercised by `TestAdvisorGating_Disabled` and `TestAdvisorGating_RootChat`). ## Stack context This is **PR 4 of 6** in the advisor feature stack. It depends on PRs 1–3. ## Scope / non-goals - No frontend changes. The feature is invocable via the backend but renders generically until PR 5. - No separate provider runner; the nested advisor call reuses the existing model/provider path. - No DB migration. ## Validation - `go test ./coderd/x/chatd/... -run TestAdvisor` - `go build ./...` - `make lint` --- <details> <summary>📋 Implementation Plan (shared across the advisor stack)</summary> # Plan: Add a Mux-style advisor tool to coder agents/chatd ## Outcome Add a first-class `advisor` tool to agent chats in `coderd/x/chatd` that feels native to Coder: - it is a built-in server-side tool, not an MCP/dynamic-tool workaround; - it performs a nested **tool-less** model call for strategic advice; - it is exposed only when eligible, and the prompt mentions it only when it is actually available; - it is treated as a **planning-only** tool so it does not run alongside action tools in the same batch; - it tracks usage/cost separately enough for operators to reason about it; - it has a minimally polished UI in the Agents page; - and it ships with explicit dogfooding evidence, including screenshots and repro videos. ## Design decisions to lock before coding 1. **Primary architecture:** native built-in tool in `chattool/`, backed by a small `chatadvisor` package. 2. **Nested model execution:** reuse chatd's existing model/provider stack for a one-step, tool-less advisor call rather than inventing a new provider pathway. 3. **Execution policy:** treat `advisor` as an exclusive/planning-only tool; mixed batches must return structured policy errors and force the model to retry cleanly. 4. **Availability:** initial rollout is for root agent chats only; disable for child/sub-agent chats until recursion/cost policy is proven. 5. **Prompt sync:** use one eligibility boolean to drive both tool registration and advisor guidance injection. 6. **Persistence/cost split:** MVP should keep advisor usage visible in result metadata and server metrics; only add DB schema if product/billing explicitly needs queryable advisor-specific cost. 7. **UI scope:** generic tool rendering is an acceptable temporary milestone during backend bring-up, but the release candidate should include a dedicated lightweight advisor renderer. ## Delivery model The work should be executed as coordinated workstreams with one integration owner and parallel contributors for low-conflict areas. The integration owner should own `coderd/x/chatd/chatd.go` because prompt assembly, tool registration, and model resolution all converge there. ## Detailed workstreams ### Repo evidence used for this plan <details> <summary>Mux reference and current chatd seams</summary> **Mux reference implementation** - `src/node/services/tools/advisor.ts` — native advisor tool implementation. - `src/common/constants/advisor.ts` — advisor prompt/constants and truncation policy. - `src/common/utils/tools/tools.ts` — conditional tool registration. - `src/node/services/streamContextBuilder.ts` — injects advisor guidance only when the tool is available. **Current chatd seams** - `coderd/x/chatd/chatd.go` - `processChat()` — tool assembly, prompt assembly, and chatloop invocation. - `resolveChatModel()` — current model/provider/key resolution seam. - `type Config struct` — server-level chatd configuration surface. - `coderd/x/chatd/chatloop/chatloop.go` - `Run()` — main streaming/model loop. - `executeTools()` — built-in tool execution/batching seam. - `coderd/x/chatd/chattool/` — built-in tool implementations. - `site/src/pages/AgentsPage/components/ChatElements/tools/Tool.tsx` — tool renderer dispatch. - `site/src/pages/AgentsPage/components/ChatConversation/messageParsing.ts` and `ConversationTimeline.tsx` — tool/result merge and rendering flow. </details> ### Workstream map and ownership | Workstream | Primary owner | Main files | Can run in parallel? | Done when | |---|---|---|---|---| | 0. Integration + gating | Integration lead | `coderd/x/chatd/chatd.go` | No; central merge lane | Tool registration, prompt sync, and model selection are wired together | | 1. Advisor runtime + tool | Backend agent | new `coderd/x/chatd/chatadvisor/`, new `coderd/x/chatd/chattool/advisor.go` | Yes | Tool can perform a tool-less advisor call in memory and return structured results | | 2. Planning-only execution policy | Chatloop agent | `coderd/x/chatd/chatloop/chatloop.go`, related tests | Yes | Mixed `advisor` + action-tool batches are rejected cleanly and deterministically | | 3. Metrics/usage/config | Backend/telemetry agent | `chatd.go`, `chatloop/metrics.go`, optional config plumbing | Partially; coordinate with integration lead | Advisor usage is separately visible in metadata/metrics and limits are enforced | | 4. Frontend rendering | Frontend agent | `site/.../tools/Tool.tsx`, new `AdvisorTool.tsx`, stories | Yes after result schema stabilizes | Advisor renders as a readable card and story tests pass | | 5. Dogfood + QA evidence | QA agent | dev server, Storybook, dogfood output | After backend + UI are usable | Repro videos, screenshots, and a concise QA report exist | ### Parallelization rules - **Do not split `coderd/x/chatd/chatd.go` across multiple execution agents without an integration lead.** That file owns prompt building, tool registration, model resolution, and cost persistence. - Workstreams 1 and 2 can be developed in parallel and then stacked onto the integration branch. - Workstream 4 should begin once the backend result schema is agreed on, even if the backend is still behind a feature flag. - Any agent that needs to re-check Mux behavior should clone `coder/mux` into a temporary directory (for example, `$(mktemp -d)/mux`) and inspect it read-only; do not vendor or copy code from Mux directly. ## Phase 0 — Preflight and guardrails ### Goals - Align the team on the smallest shippable architecture. - Prevent scope creep into MCP/dynamic-tool/sub-agent variants. - Decide upfront what is MVP vs. follow-up. ### Tasks 1. **Confirm the MVP boundary.** - Ship a built-in advisor tool first. - Do **not** make MCP, dynamic tools, or sub-agents the primary implementation. - Do **not** add transient streaming phases in the first backend PR unless they fall out almost for free. 2. **Confirm local workflow hygiene before coding.** - Ensure the repo is using the project git hooks from `scripts/githooks`. - Do not bypass hooks with `--no-verify`. - Use `./scripts/develop.sh` for the full dev server rather than manual build/run commands. 3. **Lock the model-selection policy.** - **Recommended MVP:** advisor uses the same resolved provider/model/cost config as the current chat, with advisor-specific max-output and usage caps. - **Follow-up only if required:** add a separate `AdvisorModelConfigID`-style override that resolves through the existing `configCache`/model-config path. Do not invent a new free-form `provider:model` parser if chatd already stores provider/model separately. 4. **Lock the persistence policy.** - **Recommended MVP:** no DB migration. Persist advisor-visible metadata in the tool result and record separate metrics in memory/Prometheus. - **Only if product/billing explicitly asks for queryable advisor cost:** add a later DB migration or usage table, following the normal `queries/*.sql` + `make gen` workflow. 5. **Create an execution ADR note in the work item or tracking doc.** - Capture: built-in tool, tool-less nested call, root-chat-only rollout, exclusive execution policy, MVP no-DB-migration default. ### Quality gate - Everyone on the team can state the same answers to these questions: - Is advisor a built-in tool? **Yes.** - Can advisor run with action tools in the same batch? **No.** - Does advisor get tools of its own? **No.** - Is a DB migration required for MVP? **No, unless billing insists.** ## Phase 1 — Build the advisor runtime and tool wrapper ### Goals Create the core advisor implementation in a way that is easy to test and keeps `chattool/` thin. ### Files to add - `coderd/x/chatd/chatadvisor/types.go` - `coderd/x/chatd/chatadvisor/guidance.go` - `coderd/x/chatd/chatadvisor/handoff.go` - `coderd/x/chatd/chatadvisor/runtime.go` - `coderd/x/chatd/chatadvisor/runner.go` - `coderd/x/chatd/chattool/advisor.go` ### Responsibilities by file 1. **`types.go`** - Define the input/result schema used by the tool and UI. - Keep the result shape close to Mux so the UI and model both have predictable cases. - Recommended result variants: - `advice` - `limit_reached` - `error` Recommended shape: ```go type AdvisorArgs struct { Question string `json:"question"` } type AdvisorResult struct { Type string `json:"type"` Advice string `json:"advice,omitempty"` Error string `json:"error,omitempty"` AdvisorModel string `json:"advisor_model,omitempty"` RemainingUses int `json:"remaining_uses,omitempty"` Usage *AdvisorUsageResult `json:"usage,omitempty"` } ``` 2. **`guidance.go`** - Hold two strings: - the nested advisor system prompt; - the parent-agent guidance block to inject into the outer system prompt. - The nested advisor prompt must say, in plain language: - you are advising the parent agent; - you do not address the end user directly; - you do not claim actions happened; - you return concise strategic guidance and tradeoffs. 3. **`runtime.go`** - Define the per-run runtime state. - Recommended fields: - resolved model + model config; - provider keys/options reused from the outer chat; - `MaxUsesPerRun`; - `MaxOutputTokens`; - atomic/current call counter; - callback(s) to obtain the current prompt snapshot and current-step snapshot; - optional metrics/usage hook. - Add fail-fast validation for impossible config: nil model, non-positive limits, empty prompt builders, etc. 4. **`handoff.go`** - Build the advisor handoff message from: - the explicit question; - the exact prompt/messages the parent model just used; - the current step's text/reasoning snapshot, if available; - the most recent relevant tool outputs, if they are already in the prompt snapshot. - **Important:** use the already-prepared outer prompt tail, not a fresh DB reload. That keeps the advisor aligned with compaction and the exact context the outer model saw. - Apply hard truncation budgets with recent-context bias. 5. **`runner.go`** - Execute the nested advisor call. - **Recommended implementation:** call `chatloop.Run()` in an in-memory, one-step mode: - `Tools: nil` - `ProviderTools: nil` - `MaxSteps: 1` - `PersistStep`: capture the assistant output in memory instead of writing DB rows - Reuse the existing provider/model/cost path instead of building a second provider runner. - Assert that no tool definitions are passed to the nested call. 6. **`chattool/advisor.go`** - Keep this file thin and consistent with other built-ins. - Responsibilities: - decode `AdvisorArgs`; - validate `Question` is non-empty and bounded; - call the `chatadvisor` runner; - return a structured tool response. ### Defensive programming requirements - Assert `Question` is non-empty after trimming. - Assert runtime limits are positive. - Assert the nested advisor call runs with zero tools/provider tools. - Assert `AdvisorResult.Type` is one of the known variants before returning. - Assert remaining uses never goes negative. ### Acceptance criteria - A unit test can call the advisor tool with a fake model and receive a stable `advice` result. - The nested advisor call is impossible to run with tools accidentally attached. - The core logic lives in `chatadvisor/`, not embedded inside `chatd.go`. ## Phase 2 — Wire advisor into chatd and keep prompt/tool availability in sync ### Goals Register the tool in the right place, expose it only when eligible, and inject system guidance only when the tool is present. ### Files to modify - `coderd/x/chatd/chatd.go` - optionally a small helper file if `chatd.go` becomes too crowded ### Tasks 1. **Compute one eligibility boolean in `processChat()`.** Recommended inputs: - server-level advisor enabled flag; - root chat only (`chat.ParentChatID == uuid.Nil` or equivalent existing root/child check); - a usable resolved model/provider exists; - optional experiment/workspace/org gate if product wants staged rollout. 2. **Create the runtime once per outer chat run.** - Use the model/config/keys resolved by `resolveChatModel()`. - Reuse provider options from the current chat's `ChatModelCallConfig`. - Set `MaxUsesPerRun` and `MaxOutputTokens` from advisor config defaults. 3. **Register the tool in the built-in tool block.** - Insert after the skill tools and before MCP tools in `processChat()`. - Record `builtinToolNames["advisor"] = true` so metrics stay bounded. 4. **Inject advisor guidance into the outer system prompt using the same boolean.** - Use `chatprompt.InsertSystem()` in the same prompt assembly path that already injects user/system instructions. - Place the block near the existing instruction insertion, before plan-path/skill context blocks. - Wrap the guidance in an explicit tag like `<advisor-guidance>` so it is easy to spot in tests and future refactors. 5. **Keep advisor out of child chats for the first release.** - That avoids recursion/cost blowups with `spawn_agent` / `wait_agent` flows. - Document this explicitly in the rollout notes and tests. ### Acceptance criteria - If advisor is disabled, neither the tool nor the prompt guidance appears. - If advisor is enabled, both the tool and the prompt guidance appear. - Root chats can use advisor; child chats cannot. - Built-in tool names include `advisor` so metrics do not collapse it into the generic `mcp` label. ## Phase 3 — Enforce planning-only execution policy in `chatloop` ### Goals Prevent the model from calling `advisor` and action tools in the same execution batch. ### Files to modify - `coderd/x/chatd/chatloop/chatloop.go` - related chatloop tests ### Recommended implementation Keep the MVP small; do **not** build a general policy engine yet. 1. Add a minimal field to `chatloop.RunOptions`, for example: ```go ExclusiveToolName *string ``` 2. In `Run()` / `executeTools()`, detect the case where the exclusive tool appears in the same local-tool batch as any other locally executed tool. 3. When that happens, synthesize structured tool-result errors for the affected calls instead of executing anything in the batch. - `advisor` should receive a clear error like: _advisor must be called by itself before action tools_. - The sibling action tools should receive a paired policy error like: _this tool was skipped because advisor must run alone_. 4. Let the outer model see those tool errors and retry cleanly. - This is simpler and safer than partial execution or hidden deferral. - It preserves deterministic transcript history for debugging. 5. Pass the just-finished step snapshot into the tool execution context. - The advisor runtime should be able to see the current step's text/reasoning content, because that is often the best hint about what the outer model is trying to decide. ### Why this is the right fit - It matches the intended semantics: advisor is consulted **before** taking action. - It avoids subtle race conditions caused by concurrent built-in tool execution. - It keeps the behavior easy to test with fake models. ### Acceptance criteria - A model-emitted batch containing only `advisor` succeeds. - A model-emitted batch containing `advisor` plus any other locally executed tool returns deterministic policy errors and executes nothing. - Non-advisor tool execution stays unchanged for normal chats. ## Phase 4 — Usage limits, metrics, and configuration ### Goals Make advisor safe to operate without over-designing billing/storage in the first release. ### Files to modify - `coderd/x/chatd/chatd.go` - `coderd/x/chatd/chatloop/metrics.go` as needed - `coderd/x/chatd/chatd.go` `Config` struct and constructor path - optional follow-up config/db files only if a separate advisor model or persistent billing is required ### Tasks 1. **Add explicit server config knobs for MVP.** Recommended fields on `chatd.Config` or a nested advisor config struct: - `AdvisorEnabled bool` - `AdvisorMaxUsesPerRun int` - `AdvisorMaxOutputTokens int64` 2. **Track usage per outer run.** - Reset the counter for each `processChat()` invocation. - Return `remaining_uses` in the tool result. - Return `limit_reached` when the cap is exhausted. 3. **Expose advisor usage metadata in the tool result.** - Include model name and token/cost summary if available. - Use the same `callConfig.Cost` calculation path as the outer chat for MVP if advisor reuses the same model. 4. **Record server-side metrics.** - Count advisor invocations, failures, and latency. - Ensure they show up under the built-in tool label `advisor`. 5. **Optional decision gate: separate advisor model.** - If product insists on a stronger/different advisor model, add a follow-up config hook that resolves another existing chat model config through the same `configCache` path. - Keep that out of the first landing PR unless it is required for acceptance. 6. **Optional decision gate: queryable advisor cost.** - If this becomes required, spin a follow-up DB task: - update `coderd/database/queries/*.sql`; - add migration files; - run `make gen`; - update audit mappings if a new auditable type/field is introduced. ### Acceptance criteria - Advisor calls are capped per outer run. - Limit exhaustion is user-visible in the tool result. - Metrics distinguish advisor calls from other built-in tools. - MVP does not require a schema migration unless explicitly approved. ## Phase 5 — Frontend rendering and Storybook coverage ### Goals Make advisor feel intentional in the Agents UI without blocking the backend on fancy streaming UI. ### Files to modify - `site/src/pages/AgentsPage/components/ChatElements/tools/Tool.tsx` - new `site/src/pages/AgentsPage/components/ChatElements/tools/AdvisorTool.tsx` - Storybook story file(s) in the same tools directory ### Delivery strategy 1. **Intermediate milestone during backend bring-up:** rely on the existing generic tool renderer if needed. - This is acceptable only as a short-lived integration checkpoint. 2. **Release milestone:** add a dedicated lightweight `AdvisorTool` renderer. - Reuse existing primitives: - `ToolCollapsible` - `ToolIcon` - `Response` for markdown/prose rendering - `ScrollArea` if the advice can be long - Keep styling light and consistent with the Agents page. - Do not add unnecessary React memoization in `site/src/pages/AgentsPage/`; that area is already React-Compiler aware. 3. **Render the structured result states cleanly.** - `advice` — readable prose/markdown with optional metadata footer. - `limit_reached` — warning-style message. - `error` — error state with visible fallback text. - `running` — existing tool loading state/spinner is enough for MVP. 4. **Add Storybook coverage instead of ad-hoc component tests.** Recommended stories: - successful advice; - running/loading; - limit reached; - error. 5. **Keep the UI contract narrow.** - Prefer one text field like `advice` plus small metadata rather than a deeply nested schema. - That keeps the UI resilient to prompt iteration. ### Acceptance criteria - The advisor tool card renders readable content rather than raw quoted JSON in the final release branch. - Running, limit, and error states are visibly distinct. - Storybook stories and play assertions cover the new states. - Existing tool rendering flows remain unchanged. ## Phase 6 — Automated tests and validation gates ### Backend tests to add 1. **Advisor runtime/tool tests** - question validation; - tool-less nested execution assertion; - success result shaping; - limit-reached result shaping; - error result shaping. 2. **Prompt/gating tests in chatd** - advisor disabled ⇒ no tool, no guidance; - advisor enabled/root chat ⇒ tool + guidance; - child chat ⇒ advisor absent. 3. **Chatloop policy tests** - advisor alone runs; - advisor + action tool mixed batch returns deterministic policy errors; - non-advisor tools still execute normally. 4. **Usage/metrics tests** - per-run cap resets correctly; - builtin tool labeling includes `advisor`; - returned metadata includes model/usage summary when available. ### Frontend tests to add - Storybook `play()` assertions for the advisor renderer states. - Verify expand/collapse behavior and visible fallback text. - Verify the message timeline still renders adjacent tools correctly. ### Recommended command sequence Run these as the implementation matures, not only at the end: 1. Backend-focused gate after phases 1–4: - `make test RUN=TestAdvisor` - `make test RUN=TestChatloopAdvisor` - `make lint` 2. Frontend-focused gate after phase 5: - `pnpm test:storybook src/pages/AgentsPage/components/ChatElements/tools/AdvisorTool.stories.tsx` - `pnpm lint` - `pnpm format` 3. Final repo gate before handoff: - `make pre-commit` - run any additional targeted `make test RUN=...` selections covering touched chatd paths > Use the exact new test names the implementing agents create; the names above are recommended anchors, not existing tests. ## Dogfooding plan ### Principle Dogfood the change as a real agent feature, not just a unit-tested backend. Per the dogfood and `agent-browser` skills, the reviewer should get **watchable repro videos** plus screenshots that make the behavior obvious without reading logs. ### Required setup 1. Start the full dev environment with: - `./scripts/develop.sh` 2. If the frontend renderer changes, also start Storybook from `site/` with: - `pnpm storybook --no-open` 3. Use `agent-browser` directly — **never `npx agent-browser`**. 4. Use named browser sessions and an output folder such as: - `./dogfood-output/advisor/` - with subfolders `screenshots/` and `videos/` ### Evidence protocol For every interactive scenario below: 1. Start video recording **before** the action. 2. Capture step-by-step screenshots at human pace. 3. Capture one annotated screenshot of the final state. 4. Stop the recording. 5. Note the exact pass/fail observation in the QA report. For static UI states (for example Storybook error/limit cards), an annotated screenshot is sufficient; video is optional but still encouraged by this project’s review preference. ### Dogfood scenarios #### Scenario A — Happy path in the real Agents UI **Goal:** prove that a root agent chat can invoke advisor and produce a readable recommendation before taking further action. Steps: 1. Open the Agents page with an advisor-enabled root chat. 2. Start a repro video. 3. Send a prompt that should reasonably trigger strategic planning, such as an architecture or multi-tradeoff question. 4. Capture screenshots of: - the prompt before send; - the running advisor state; - the completed advisor card and the assistant’s follow-up response. 5. Stop recording. Pass criteria: - advisor appears in the timeline; - the rendered result is readable; - the assistant can continue after consuming the advisor output. #### Scenario B — Advisor unavailable path **Goal:** prove the feature is truly gated. Suggested variants (at least one is required, both are better): - feature flag/config off; - child/sub-agent chat. Evidence: - annotated screenshot of the chat/tool state showing advisor is absent; - short video if toggling the gate live is part of the repro. Pass criteria: - no advisor tool is available; - no advisor-specific prompt behavior leaks through. #### Scenario C — UI states in Storybook **Goal:** prove the renderer handles non-happy states cleanly. Required story states: - success/advice; - running; - limit reached; - error. Evidence: - one screenshot per state; - at least one short video showing collapse/expand behavior. Pass criteria: - success renders readable advice; - limit/error have visible fallback text; - the component behaves like the other tool cards. #### Scenario D — Regression sweep of nearby tools **Goal:** ensure advisor does not break the surrounding chat timeline. Check at minimum: - another existing built-in tool still renders correctly near advisor; - sub-agent/tool cards still expand/collapse normally; - no obvious console errors appear in the Agents page during the advisor flow. Evidence: - screenshots of adjacent tool cards; - console/error capture if anything suspicious appears. ### `agent-browser` usage notes for the QA agent - Prefer `agent-browser batch` for 2+ sequential commands when no intermediate parsing is needed. - Use `snapshot -i` to discover interactive refs. - Re-snapshot after navigation or major DOM changes. - Avoid `wait --load networkidle` unless the page is known to go idle; prefer explicit element/text waits or short fixed waits. - Record videos at human pace and include pauses that a reviewer can follow. ## Rollout plan ### Initial rollout - Gate behind a server-side advisor-enabled flag. - Enable only for selected internal/root agent chats first. - Watch metrics for: - invocation count; - failure rate; - latency; - obvious retry loops. ### Expansion conditions Expand beyond the initial rollout only after the following are true: - mixed-batch policy behavior is stable; - cost impact is understood; - frontend UX is readable in production-like dogfood; - no recursion surprises have appeared with sub-agent flows. ### Explicit non-goals for the first release - advisor inside child/sub-agent chats; - provider-agnostic streaming phase UI; - MCP-based external advisor implementation; - mandatory DB-backed advisor cost reporting. ## Final acceptance checklist - [ ] `advisor` is a built-in chatd tool, not an MCP/dynamic-tool substitute. - [ ] The nested advisor call is tool-less and bounded to one in-memory step. - [ ] One eligibility boolean controls both tool registration and prompt guidance injection. - [ ] Root chats can use advisor; child chats cannot in the initial rollout. - [ ] Mixed advisor/action batches produce deterministic policy errors instead of partial execution. - [ ] Per-run usage caps and limit-reached behavior work. - [ ] Advisor usage is visible in metadata/metrics without forcing a DB migration for MVP. - [ ] The Agents UI has a readable advisor card and Storybook coverage. - [ ] Dogfooding produced screenshots and repro videos for the required scenarios. - [ ] Validation commands (`make lint`, targeted `make test`, Storybook tests, `make pre-commit`) passed before handoff. ## Suggested PR split 1. **PR 1 — Backend foundation** - `chatadvisor/` package - `chattool/advisor.go` - `chatloop` exclusive policy - chatd gating/prompt sync - backend tests 2. **PR 2 — Frontend + QA** - advisor renderer - stories/play assertions - dogfood artifacts and QA notes 3. **PR 3 — Optional follow-ups only if demanded by stakeholders** - separate advisor model override - persistent advisor billing/queryability - transient phase-stream UX </details> --- _Generated with [`mux`](https://github.com/coder/mux) • Model: `anthropic:claude-opus-4-7` • Thinking: `max`_ |
||
|
|
06bad73df4 |
feat: add admin-configurable advisor API, SDK, and queries (#24621)
## Summary
Add the **admin-configurable advisor configuration**: database-backed storage, SDK types, and the experimental HTTP handlers that back the admin settings UI (later PRs). Follows the same "site-configs" pattern as Virtual Desktop.
## Motivation
The advisor needs runtime-tunable knobs (enable/disable, per-run cap, max output tokens, reasoning effort, optional model override) without a service restart or redeploy. Using the existing `site_configs` K/V table keeps this pattern consistent with other admin features and avoids a bespoke schema.
## Changes
### Database (`coderd/database/queries/siteconfig.sql`)
- `GetChatAdvisorConfig` returns the stored JSON blob (default `'{}'`) under key `agents_advisor_config`.
- `UpsertChatAdvisorConfig` uses the standard `INSERT ... ON CONFLICT` pattern.
- Regenerated via `make gen` (queries.sql.go + mocks).
### SDK (`codersdk/chats.go`)
- `AdvisorConfig` type with `Enabled`, `MaxUsesPerRun`, `MaxOutputTokens`, `ReasoningEffort` (`""` / `low` / `medium` / `high`), `ModelConfigID uuid.UUID`.
- Client methods: `ChatAdvisorConfig(ctx)` / `UpdateChatAdvisorConfig(ctx, cfg)`.
### API (`coderd/exp_chats.go`)
- `GET /api/experimental/chats/config/advisor`: reads current config; relies on `ActorFromContext` validation.
- `PUT /api/experimental/chats/config/advisor`: requires `policy.ActionUpdate` on `rbac.ResourceDeploymentConfig`.
- Handlers unmarshal `{}` to a typed zero value and re-marshal on upsert for schema stability.
- Tests in `exp_chats_test.go` cover empty defaults, round-trip update, unauthorized update, and invalid body.
## Stack context
This is **PR 3 of 6** in the advisor feature stack. Consumed by:
- PR 4 (`feat/advisor-04-chatd-runtime`), which reads this config on every `runChat`.
- PR 6 (`feat/advisor-06-admin-settings-ui`), which renders the admin form.
## Scope / non-goals
- No `chatd` read path (lands in PR 4).
- No UI (lands in PR 6).
- `agents_advisor_config` remains a single-row JSON blob; we intentionally do not shard per-org/per-template yet.
## Validation
- `make gen`
- `go test ./coderd/database/... -run TestChatAdvisor`
- `go test ./coderd/... -run TestChatAdvisorConfig`
- `make lint`
---
<details>
<summary>📋 Implementation Plan (shared across the advisor stack)</summary>
# Plan: Add a Mux-style advisor tool to coder agents/chatd
## Outcome
Add a first-class `advisor` tool to agent chats in `coderd/x/chatd` that feels native to Coder:
- it is a built-in server-side tool, not an MCP/dynamic-tool workaround;
- it performs a nested **tool-less** model call for strategic advice;
- it is exposed only when eligible, and the prompt mentions it only when it is actually available;
- it is treated as a **planning-only** tool so it does not run alongside action tools in the same batch;
- it tracks usage/cost separately enough for operators to reason about it;
- it has a minimally polished UI in the Agents page;
- and it ships with explicit dogfooding evidence, including screenshots and repro videos.
## Design decisions to lock before coding
1. **Primary architecture:** native built-in tool in `chattool/`, backed by a small `chatadvisor` package.
2. **Nested model execution:** reuse chatd's existing model/provider stack for a one-step, tool-less advisor call rather than inventing a new provider pathway.
3. **Execution policy:** treat `advisor` as an exclusive/planning-only tool; mixed batches must return structured policy errors and force the model to retry cleanly.
4. **Availability:** initial rollout is for root agent chats only; disable for child/sub-agent chats until recursion/cost policy is proven.
5. **Prompt sync:** use one eligibility boolean to drive both tool registration and advisor guidance injection.
6. **Persistence/cost split:** MVP should keep advisor usage visible in result metadata and server metrics; only add DB schema if product/billing explicitly needs queryable advisor-specific cost.
7. **UI scope:** generic tool rendering is an acceptable temporary milestone during backend bring-up, but the release candidate should include a dedicated lightweight advisor renderer.
## Delivery model
The work should be executed as coordinated workstreams with one integration owner and parallel contributors for low-conflict areas. The integration owner should own `coderd/x/chatd/chatd.go` because prompt assembly, tool registration, and model resolution all converge there.
## Detailed workstreams
### Repo evidence used for this plan
<details>
<summary>Mux reference and current chatd seams</summary>
**Mux reference implementation**
- `src/node/services/tools/advisor.ts` — native advisor tool implementation.
- `src/common/constants/advisor.ts` — advisor prompt/constants and truncation policy.
- `src/common/utils/tools/tools.ts` — conditional tool registration.
- `src/node/services/streamContextBuilder.ts` — injects advisor guidance only when the tool is available.
**Current chatd seams**
- `coderd/x/chatd/chatd.go`
- `processChat()` — tool assembly, prompt assembly, and chatloop invocation.
- `resolveChatModel()` — current model/provider/key resolution seam.
- `type Config struct` — server-level chatd configuration surface.
- `coderd/x/chatd/chatloop/chatloop.go`
- `Run()` — main streaming/model loop.
- `executeTools()` — built-in tool execution/batching seam.
- `coderd/x/chatd/chattool/` — built-in tool implementations.
- `site/src/pages/AgentsPage/components/ChatElements/tools/Tool.tsx` — tool renderer dispatch.
- `site/src/pages/AgentsPage/components/ChatConversation/messageParsing.ts` and `ConversationTimeline.tsx` — tool/result merge and rendering flow.
</details>
### Workstream map and ownership
| Workstream | Primary owner | Main files | Can run in parallel? | Done when |
|---|---|---|---|---|
| 0. Integration + gating | Integration lead | `coderd/x/chatd/chatd.go` | No; central merge lane | Tool registration, prompt sync, and model selection are wired together |
| 1. Advisor runtime + tool | Backend agent | new `coderd/x/chatd/chatadvisor/`, new `coderd/x/chatd/chattool/advisor.go` | Yes | Tool can perform a tool-less advisor call in memory and return structured results |
| 2. Planning-only execution policy | Chatloop agent | `coderd/x/chatd/chatloop/chatloop.go`, related tests | Yes | Mixed `advisor` + action-tool batches are rejected cleanly and deterministically |
| 3. Metrics/usage/config | Backend/telemetry agent | `chatd.go`, `chatloop/metrics.go`, optional config plumbing | Partially; coordinate with integration lead | Advisor usage is separately visible in metadata/metrics and limits are enforced |
| 4. Frontend rendering | Frontend agent | `site/.../tools/Tool.tsx`, new `AdvisorTool.tsx`, stories | Yes after result schema stabilizes | Advisor renders as a readable card and story tests pass |
| 5. Dogfood + QA evidence | QA agent | dev server, Storybook, dogfood output | After backend + UI are usable | Repro videos, screenshots, and a concise QA report exist |
### Parallelization rules
- **Do not split `coderd/x/chatd/chatd.go` across multiple execution agents without an integration lead.** That file owns prompt building, tool registration, model resolution, and cost persistence.
- Workstreams 1 and 2 can be developed in parallel and then stacked onto the integration branch.
- Workstream 4 should begin once the backend result schema is agreed on, even if the backend is still behind a feature flag.
- Any agent that needs to re-check Mux behavior should clone `coder/mux` into a temporary directory (for example, `$(mktemp -d)/mux`) and inspect it read-only; do not vendor or copy code from Mux directly.
## Phase 0 — Preflight and guardrails
### Goals
- Align the team on the smallest shippable architecture.
- Prevent scope creep into MCP/dynamic-tool/sub-agent variants.
- Decide upfront what is MVP vs. follow-up.
### Tasks
1. **Confirm the MVP boundary.**
- Ship a built-in advisor tool first.
- Do **not** make MCP, dynamic tools, or sub-agents the primary implementation.
- Do **not** add transient streaming phases in the first backend PR unless they fall out almost for free.
2. **Confirm local workflow hygiene before coding.**
- Ensure the repo is using the project git hooks from `scripts/githooks`.
- Do not bypass hooks with `--no-verify`.
- Use `./scripts/develop.sh` for the full dev server rather than manual build/run commands.
3. **Lock the model-selection policy.**
- **Recommended MVP:** advisor uses the same resolved provider/model/cost config as the current chat, with advisor-specific max-output and usage caps.
- **Follow-up only if required:** add a separate `AdvisorModelConfigID`-style override that resolves through the existing `configCache`/model-config path. Do not invent a new free-form `provider:model` parser if chatd already stores provider/model separately.
4. **Lock the persistence policy.**
- **Recommended MVP:** no DB migration. Persist advisor-visible metadata in the tool result and record separate metrics in memory/Prometheus.
- **Only if product/billing explicitly asks for queryable advisor cost:** add a later DB migration or usage table, following the normal `queries/*.sql` + `make gen` workflow.
5. **Create an execution ADR note in the work item or tracking doc.**
- Capture: built-in tool, tool-less nested call, root-chat-only rollout, exclusive execution policy, MVP no-DB-migration default.
### Quality gate
- Everyone on the team can state the same answers to these questions:
- Is advisor a built-in tool? **Yes.**
- Can advisor run with action tools in the same batch? **No.**
- Does advisor get tools of its own? **No.**
- Is a DB migration required for MVP? **No, unless billing insists.**
## Phase 1 — Build the advisor runtime and tool wrapper
### Goals
Create the core advisor implementation in a way that is easy to test and keeps `chattool/` thin.
### Files to add
- `coderd/x/chatd/chatadvisor/types.go`
- `coderd/x/chatd/chatadvisor/guidance.go`
- `coderd/x/chatd/chatadvisor/handoff.go`
- `coderd/x/chatd/chatadvisor/runtime.go`
- `coderd/x/chatd/chatadvisor/runner.go`
- `coderd/x/chatd/chattool/advisor.go`
### Responsibilities by file
1. **`types.go`**
- Define the input/result schema used by the tool and UI.
- Keep the result shape close to Mux so the UI and model both have predictable cases.
- Recommended result variants:
- `advice`
- `limit_reached`
- `error`
Recommended shape:
```go
type AdvisorArgs struct {
Question string `json:"question"`
}
type AdvisorResult struct {
Type string `json:"type"`
Advice string `json:"advice,omitempty"`
Error string `json:"error,omitempty"`
AdvisorModel string `json:"advisor_model,omitempty"`
RemainingUses int `json:"remaining_uses,omitempty"`
Usage *AdvisorUsageResult `json:"usage,omitempty"`
}
```
2. **`guidance.go`**
- Hold two strings:
- the nested advisor system prompt;
- the parent-agent guidance block to inject into the outer system prompt.
- The nested advisor prompt must say, in plain language:
- you are advising the parent agent;
- you do not address the end user directly;
- you do not claim actions happened;
- you return concise strategic guidance and tradeoffs.
3. **`runtime.go`**
- Define the per-run runtime state.
- Recommended fields:
- resolved model + model config;
- provider keys/options reused from the outer chat;
- `MaxUsesPerRun`;
- `MaxOutputTokens`;
- atomic/current call counter;
- callback(s) to obtain the current prompt snapshot and current-step snapshot;
- optional metrics/usage hook.
- Add fail-fast validation for impossible config: nil model, non-positive limits, empty prompt builders, etc.
4. **`handoff.go`**
- Build the advisor handoff message from:
- the explicit question;
- the exact prompt/messages the parent model just used;
- the current step's text/reasoning snapshot, if available;
- the most recent relevant tool outputs, if they are already in the prompt snapshot.
- **Important:** use the already-prepared outer prompt tail, not a fresh DB reload. That keeps the advisor aligned with compaction and the exact context the outer model saw.
- Apply hard truncation budgets with recent-context bias.
5. **`runner.go`**
- Execute the nested advisor call.
- **Recommended implementation:** call `chatloop.Run()` in an in-memory, one-step mode:
- `Tools: nil`
- `ProviderTools: nil`
- `MaxSteps: 1`
- `PersistStep`: capture the assistant output in memory instead of writing DB rows
- Reuse the existing provider/model/cost path instead of building a second provider runner.
- Assert that no tool definitions are passed to the nested call.
6. **`chattool/advisor.go`**
- Keep this file thin and consistent with other built-ins.
- Responsibilities:
- decode `AdvisorArgs`;
- validate `Question` is non-empty and bounded;
- call the `chatadvisor` runner;
- return a structured tool response.
### Defensive programming requirements
- Assert `Question` is non-empty after trimming.
- Assert runtime limits are positive.
- Assert the nested advisor call runs with zero tools/provider tools.
- Assert `AdvisorResult.Type` is one of the known variants before returning.
- Assert remaining uses never goes negative.
### Acceptance criteria
- A unit test can call the advisor tool with a fake model and receive a stable `advice` result.
- The nested advisor call is impossible to run with tools accidentally attached.
- The core logic lives in `chatadvisor/`, not embedded inside `chatd.go`.
## Phase 2 — Wire advisor into chatd and keep prompt/tool availability in sync
### Goals
Register the tool in the right place, expose it only when eligible, and inject system guidance only when the tool is present.
### Files to modify
- `coderd/x/chatd/chatd.go`
- optionally a small helper file if `chatd.go` becomes too crowded
### Tasks
1. **Compute one eligibility boolean in `processChat()`.**
Recommended inputs:
- server-level advisor enabled flag;
- root chat only (`chat.ParentChatID == uuid.Nil` or equivalent existing root/child check);
- a usable resolved model/provider exists;
- optional experiment/workspace/org gate if product wants staged rollout.
2. **Create the runtime once per outer chat run.**
- Use the model/config/keys resolved by `resolveChatModel()`.
- Reuse provider options from the current chat's `ChatModelCallConfig`.
- Set `MaxUsesPerRun` and `MaxOutputTokens` from advisor config defaults.
3. **Register the tool in the built-in tool block.**
- Insert after the skill tools and before MCP tools in `processChat()`.
- Record `builtinToolNames["advisor"] = true` so metrics stay bounded.
4. **Inject advisor guidance into the outer system prompt using the same boolean.**
- Use `chatprompt.InsertSystem()` in the same prompt assembly path that already injects user/system instructions.
- Place the block near the existing instruction insertion, before plan-path/skill context blocks.
- Wrap the guidance in an explicit tag like `<advisor-guidance>` so it is easy to spot in tests and future refactors.
5. **Keep advisor out of child chats for the first release.**
- That avoids recursion/cost blowups with `spawn_agent` / `wait_agent` flows.
- Document this explicitly in the rollout notes and tests.
### Acceptance criteria
- If advisor is disabled, neither the tool nor the prompt guidance appears.
- If advisor is enabled, both the tool and the prompt guidance appear.
- Root chats can use advisor; child chats cannot.
- Built-in tool names include `advisor` so metrics do not collapse it into the generic `mcp` label.
## Phase 3 — Enforce planning-only execution policy in `chatloop`
### Goals
Prevent the model from calling `advisor` and action tools in the same execution batch.
### Files to modify
- `coderd/x/chatd/chatloop/chatloop.go`
- related chatloop tests
### Recommended implementation
Keep the MVP small; do **not** build a general policy engine yet.
1. Add a minimal field to `chatloop.RunOptions`, for example:
```go
ExclusiveToolName *string
```
2. In `Run()` / `executeTools()`, detect the case where the exclusive tool appears in the same local-tool batch as any other locally executed tool.
3. When that happens, synthesize structured tool-result errors for the affected calls instead of executing anything in the batch.
- `advisor` should receive a clear error like: _advisor must be called by itself before action tools_.
- The sibling action tools should receive a paired policy error like: _this tool was skipped because advisor must run alone_.
4. Let the outer model see those tool errors and retry cleanly.
- This is simpler and safer than partial execution or hidden deferral.
- It preserves deterministic transcript history for debugging.
5. Pass the just-finished step snapshot into the tool execution context.
- The advisor runtime should be able to see the current step's text/reasoning content, because that is often the best hint about what the outer model is trying to decide.
### Why this is the right fit
- It matches the intended semantics: advisor is consulted **before** taking action.
- It avoids subtle race conditions caused by concurrent built-in tool execution.
- It keeps the behavior easy to test with fake models.
### Acceptance criteria
- A model-emitted batch containing only `advisor` succeeds.
- A model-emitted batch containing `advisor` plus any other locally executed tool returns deterministic policy errors and executes nothing.
- Non-advisor tool execution stays unchanged for normal chats.
## Phase 4 — Usage limits, metrics, and configuration
### Goals
Make advisor safe to operate without over-designing billing/storage in the first release.
### Files to modify
- `coderd/x/chatd/chatd.go`
- `coderd/x/chatd/chatloop/metrics.go` as needed
- `coderd/x/chatd/chatd.go` `Config` struct and constructor path
- optional follow-up config/db files only if a separate advisor model or persistent billing is required
### Tasks
1. **Add explicit server config knobs for MVP.**
Recommended fields on `chatd.Config` or a nested advisor config struct:
- `AdvisorEnabled bool`
- `AdvisorMaxUsesPerRun int`
- `AdvisorMaxOutputTokens int64`
2. **Track usage per outer run.**
- Reset the counter for each `processChat()` invocation.
- Return `remaining_uses` in the tool result.
- Return `limit_reached` when the cap is exhausted.
3. **Expose advisor usage metadata in the tool result.**
- Include model name and token/cost summary if available.
- Use the same `callConfig.Cost` calculation path as the outer chat for MVP if advisor reuses the same model.
4. **Record server-side metrics.**
- Count advisor invocations, failures, and latency.
- Ensure they show up under the built-in tool label `advisor`.
5. **Optional decision gate: separate advisor model.**
- If product insists on a stronger/different advisor model, add a follow-up config hook that resolves another existing chat model config through the same `configCache` path.
- Keep that out of the first landing PR unless it is required for acceptance.
6. **Optional decision gate: queryable advisor cost.**
- If this becomes required, spin a follow-up DB task:
- update `coderd/database/queries/*.sql`;
- add migration files;
- run `make gen`;
- update audit mappings if a new auditable type/field is introduced.
### Acceptance criteria
- Advisor calls are capped per outer run.
- Limit exhaustion is user-visible in the tool result.
- Metrics distinguish advisor calls from other built-in tools.
- MVP does not require a schema migration unless explicitly approved.
## Phase 5 — Frontend rendering and Storybook coverage
### Goals
Make advisor feel intentional in the Agents UI without blocking the backend on fancy streaming UI.
### Files to modify
- `site/src/pages/AgentsPage/components/ChatElements/tools/Tool.tsx`
- new `site/src/pages/AgentsPage/components/ChatElements/tools/AdvisorTool.tsx`
- Storybook story file(s) in the same tools directory
### Delivery strategy
1. **Intermediate milestone during backend bring-up:** rely on the existing generic tool renderer if needed.
- This is acceptable only as a short-lived integration checkpoint.
2. **Release milestone:** add a dedicated lightweight `AdvisorTool` renderer.
- Reuse existing primitives:
- `ToolCollapsible`
- `ToolIcon`
- `Response` for markdown/prose rendering
- `ScrollArea` if the advice can be long
- Keep styling light and consistent with the Agents page.
- Do not add unnecessary React memoization in `site/src/pages/AgentsPage/`; that area is already React-Compiler aware.
3. **Render the structured result states cleanly.**
- `advice` — readable prose/markdown with optional metadata footer.
- `limit_reached` — warning-style message.
- `error` — error state with visible fallback text.
- `running` — existing tool loading state/spinner is enough for MVP.
4. **Add Storybook coverage instead of ad-hoc component tests.**
Recommended stories:
- successful advice;
- running/loading;
- limit reached;
- error.
5. **Keep the UI contract narrow.**
- Prefer one text field like `advice` plus small metadata rather than a deeply nested schema.
- That keeps the UI resilient to prompt iteration.
### Acceptance criteria
- The advisor tool card renders readable content rather than raw quoted JSON in the final release branch.
- Running, limit, and error states are visibly distinct.
- Storybook stories and play assertions cover the new states.
- Existing tool rendering flows remain unchanged.
## Phase 6 — Automated tests and validation gates
### Backend tests to add
1. **Advisor runtime/tool tests**
- question validation;
- tool-less nested execution assertion;
- success result shaping;
- limit-reached result shaping;
- error result shaping.
2. **Prompt/gating tests in chatd**
- advisor disabled ⇒ no tool, no guidance;
- advisor enabled/root chat ⇒ tool + guidance;
- child chat ⇒ advisor absent.
3. **Chatloop policy tests**
- advisor alone runs;
- advisor + action tool mixed batch returns deterministic policy errors;
- non-advisor tools still execute normally.
4. **Usage/metrics tests**
- per-run cap resets correctly;
- builtin tool labeling includes `advisor`;
- returned metadata includes model/usage summary when available.
### Frontend tests to add
- Storybook `play()` assertions for the advisor renderer states.
- Verify expand/collapse behavior and visible fallback text.
- Verify the message timeline still renders adjacent tools correctly.
### Recommended command sequence
Run these as the implementation matures, not only at the end:
1. Backend-focused gate after phases 1–4:
- `make test RUN=TestAdvisor`
- `make test RUN=TestChatloopAdvisor`
- `make lint`
2. Frontend-focused gate after phase 5:
- `pnpm test:storybook src/pages/AgentsPage/components/ChatElements/tools/AdvisorTool.stories.tsx`
- `pnpm lint`
- `pnpm format`
3. Final repo gate before handoff:
- `make pre-commit`
- run any additional targeted `make test RUN=...` selections covering touched chatd paths
> Use the exact new test names the implementing agents create; the names above are recommended anchors, not existing tests.
## Dogfooding plan
### Principle
Dogfood the change as a real agent feature, not just a unit-tested backend. Per the dogfood and `agent-browser` skills, the reviewer should get **watchable repro videos** plus screenshots that make the behavior obvious without reading logs.
### Required setup
1. Start the full dev environment with:
- `./scripts/develop.sh`
2. If the frontend renderer changes, also start Storybook from `site/` with:
- `pnpm storybook --no-open`
3. Use `agent-browser` directly — **never `npx agent-browser`**.
4. Use named browser sessions and an output folder such as:
- `./dogfood-output/advisor/`
- with subfolders `screenshots/` and `videos/`
### Evidence protocol
For every interactive scenario below:
1. Start video recording **before** the action.
2. Capture step-by-step screenshots at human pace.
3. Capture one annotated screenshot of the final state.
4. Stop the recording.
5. Note the exact pass/fail observation in the QA report.
For static UI states (for example Storybook error/limit cards), an annotated screenshot is sufficient; video is optional but still encouraged by this project’s review preference.
### Dogfood scenarios
#### Scenario A — Happy path in the real Agents UI
**Goal:** prove that a root agent chat can invoke advisor and produce a readable recommendation before taking further action.
Steps:
1. Open the Agents page with an advisor-enabled root chat.
2. Start a repro video.
3. Send a prompt that should reasonably trigger strategic planning, such as an architecture or multi-tradeoff question.
4. Capture screenshots of:
- the prompt before send;
- the running advisor state;
- the completed advisor card and the assistant’s follow-up response.
5. Stop recording.
Pass criteria:
- advisor appears in the timeline;
- the rendered result is readable;
- the assistant can continue after consuming the advisor output.
#### Scenario B — Advisor unavailable path
**Goal:** prove the feature is truly gated.
Suggested variants (at least one is required, both are better):
- feature flag/config off;
- child/sub-agent chat.
Evidence:
- annotated screenshot of the chat/tool state showing advisor is absent;
- short video if toggling the gate live is part of the repro.
Pass criteria:
- no advisor tool is available;
- no advisor-specific prompt behavior leaks through.
#### Scenario C — UI states in Storybook
**Goal:** prove the renderer handles non-happy states cleanly.
Required story states:
- success/advice;
- running;
- limit reached;
- error.
Evidence:
- one screenshot per state;
- at least one short video showing collapse/expand behavior.
Pass criteria:
- success renders readable advice;
- limit/error have visible fallback text;
- the component behaves like the other tool cards.
#### Scenario D — Regression sweep of nearby tools
**Goal:** ensure advisor does not break the surrounding chat timeline.
Check at minimum:
- another existing built-in tool still renders correctly near advisor;
- sub-agent/tool cards still expand/collapse normally;
- no obvious console errors appear in the Agents page during the advisor flow.
Evidence:
- screenshots of adjacent tool cards;
- console/error capture if anything suspicious appears.
### `agent-browser` usage notes for the QA agent
- Prefer `agent-browser batch` for 2+ sequential commands when no intermediate parsing is needed.
- Use `snapshot -i` to discover interactive refs.
- Re-snapshot after navigation or major DOM changes.
- Avoid `wait --load networkidle` unless the page is known to go idle; prefer explicit element/text waits or short fixed waits.
- Record videos at human pace and include pauses that a reviewer can follow.
## Rollout plan
### Initial rollout
- Gate behind a server-side advisor-enabled flag.
- Enable only for selected internal/root agent chats first.
- Watch metrics for:
- invocation count;
- failure rate;
- latency;
- obvious retry loops.
### Expansion conditions
Expand beyond the initial rollout only after the following are true:
- mixed-batch policy behavior is stable;
- cost impact is understood;
- frontend UX is readable in production-like dogfood;
- no recursion surprises have appeared with sub-agent flows.
### Explicit non-goals for the first release
- advisor inside child/sub-agent chats;
- provider-agnostic streaming phase UI;
- MCP-based external advisor implementation;
- mandatory DB-backed advisor cost reporting.
## Final acceptance checklist
- [ ] `advisor` is a built-in chatd tool, not an MCP/dynamic-tool substitute.
- [ ] The nested advisor call is tool-less and bounded to one in-memory step.
- [ ] One eligibility boolean controls both tool registration and prompt guidance injection.
- [ ] Root chats can use advisor; child chats cannot in the initial rollout.
- [ ] Mixed advisor/action batches produce deterministic policy errors instead of partial execution.
- [ ] Per-run usage caps and limit-reached behavior work.
- [ ] Advisor usage is visible in metadata/metrics without forcing a DB migration for MVP.
- [ ] The Agents UI has a readable advisor card and Storybook coverage.
- [ ] Dogfooding produced screenshots and repro videos for the required scenarios.
- [ ] Validation commands (`make lint`, targeted `make test`, Storybook tests, `make pre-commit`) passed before handoff.
## Suggested PR split
1. **PR 1 — Backend foundation**
- `chatadvisor/` package
- `chattool/advisor.go`
- `chatloop` exclusive policy
- chatd gating/prompt sync
- backend tests
2. **PR 2 — Frontend + QA**
- advisor renderer
- stories/play assertions
- dogfood artifacts and QA notes
3. **PR 3 — Optional follow-ups only if demanded by stakeholders**
- separate advisor model override
- persistent advisor billing/queryability
- transient phase-stream UX
</details>
---
_Generated with [`mux`](https://github.com/coder/mux) • Model: `anthropic:claude-opus-4-7` • Thinking: `max`_
|
||
|
|
eaf2609bb8 |
feat(coderd/x/chatd/chatadvisor): add advisor runtime and tool wrapper (#24620)
## Summary Introduce the `coderd/x/chatd/chatadvisor` package: the self-contained runtime that performs a nested, **tool-less, single-step** model call to return strategic guidance to the parent agent. Also ships the thin `AgentTool` wrapper that the agent exposes as the `advisor` built-in. ## Motivation The advisor is a "consult before you act" planning step. Keeping its runtime in its own package (rather than inside `chatd.go` or `chattool/`) makes the behavior easy to unit-test, keeps `chattool/` thin, and avoids import cycles with `chatprompt`. ## Changes - `chatadvisor/types.go`: `AdvisorArgs` and `AdvisorResult` (`advice` / `limit_reached` / `error` variants) plus usage metadata. Stable JSON shape for the UI. - `chatadvisor/guidance.go`: the nested advisor system prompt and `ParentGuidanceBlock` injected into the outer agent (tagged `<advisor-guidance>`). The nested prompt explicitly tells the advisor it is advising the parent agent, must not address the user, and must not claim actions happened. - `chatadvisor/handoff.go`: builds the advisor input from the already-prepared outer prompt tail (not a fresh DB reload) with hard truncation budgets and a recent-context bias so the advisor sees the exact context the outer model saw. - `chatadvisor/runner.go`: wraps `chatloop.Run()` in strict one-step mode (`Tools: nil`, `ProviderTools: nil`, `MaxSteps: 1`) so the nested call is structurally incapable of calling tools. - `chatadvisor/tool.go`: the `AgentTool` wrapper. Validates the question (non-empty, bounded to 2000 runes), invokes the runtime, and maps the result to a JSON tool response. - Defensive assertions throughout (question non-empty after trim, positive limits, zero tools on nested call, known result variants, non-negative remaining uses). - Unit tests cover question validation, tool-less execution, and each result variant. ## Stack context This is **PR 2 of 6** in the advisor feature stack. It depends on PR 1's `ExclusiveToolNames` hook; no consumer of this package exists yet (that lands in PR 4). ## Scope / non-goals - No wiring into `chatd`. - No HTTP/API surface. - No UI. - No separate provider; the runtime reuses the outer chat's resolved model/provider/keys. ## Validation - `go test ./coderd/x/chatd/chatadvisor/...` - `make lint` --- <details> <summary>📋 Implementation Plan (shared across the advisor stack)</summary> # Plan: Add a Mux-style advisor tool to coder agents/chatd ## Outcome Add a first-class `advisor` tool to agent chats in `coderd/x/chatd` that feels native to Coder: - it is a built-in server-side tool, not an MCP/dynamic-tool workaround; - it performs a nested **tool-less** model call for strategic advice; - it is exposed only when eligible, and the prompt mentions it only when it is actually available; - it is treated as a **planning-only** tool so it does not run alongside action tools in the same batch; - it tracks usage/cost separately enough for operators to reason about it; - it has a minimally polished UI in the Agents page; - and it ships with explicit dogfooding evidence, including screenshots and repro videos. ## Design decisions to lock before coding 1. **Primary architecture:** native built-in tool in `chattool/`, backed by a small `chatadvisor` package. 2. **Nested model execution:** reuse chatd's existing model/provider stack for a one-step, tool-less advisor call rather than inventing a new provider pathway. 3. **Execution policy:** treat `advisor` as an exclusive/planning-only tool; mixed batches must return structured policy errors and force the model to retry cleanly. 4. **Availability:** initial rollout is for root agent chats only; disable for child/sub-agent chats until recursion/cost policy is proven. 5. **Prompt sync:** use one eligibility boolean to drive both tool registration and advisor guidance injection. 6. **Persistence/cost split:** MVP should keep advisor usage visible in result metadata and server metrics; only add DB schema if product/billing explicitly needs queryable advisor-specific cost. 7. **UI scope:** generic tool rendering is an acceptable temporary milestone during backend bring-up, but the release candidate should include a dedicated lightweight advisor renderer. ## Delivery model The work should be executed as coordinated workstreams with one integration owner and parallel contributors for low-conflict areas. The integration owner should own `coderd/x/chatd/chatd.go` because prompt assembly, tool registration, and model resolution all converge there. ## Detailed workstreams ### Repo evidence used for this plan <details> <summary>Mux reference and current chatd seams</summary> **Mux reference implementation** - `src/node/services/tools/advisor.ts` — native advisor tool implementation. - `src/common/constants/advisor.ts` — advisor prompt/constants and truncation policy. - `src/common/utils/tools/tools.ts` — conditional tool registration. - `src/node/services/streamContextBuilder.ts` — injects advisor guidance only when the tool is available. **Current chatd seams** - `coderd/x/chatd/chatd.go` - `processChat()` — tool assembly, prompt assembly, and chatloop invocation. - `resolveChatModel()` — current model/provider/key resolution seam. - `type Config struct` — server-level chatd configuration surface. - `coderd/x/chatd/chatloop/chatloop.go` - `Run()` — main streaming/model loop. - `executeTools()` — built-in tool execution/batching seam. - `coderd/x/chatd/chattool/` — built-in tool implementations. - `site/src/pages/AgentsPage/components/ChatElements/tools/Tool.tsx` — tool renderer dispatch. - `site/src/pages/AgentsPage/components/ChatConversation/messageParsing.ts` and `ConversationTimeline.tsx` — tool/result merge and rendering flow. </details> ### Workstream map and ownership | Workstream | Primary owner | Main files | Can run in parallel? | Done when | |---|---|---|---|---| | 0. Integration + gating | Integration lead | `coderd/x/chatd/chatd.go` | No; central merge lane | Tool registration, prompt sync, and model selection are wired together | | 1. Advisor runtime + tool | Backend agent | new `coderd/x/chatd/chatadvisor/`, new `coderd/x/chatd/chattool/advisor.go` | Yes | Tool can perform a tool-less advisor call in memory and return structured results | | 2. Planning-only execution policy | Chatloop agent | `coderd/x/chatd/chatloop/chatloop.go`, related tests | Yes | Mixed `advisor` + action-tool batches are rejected cleanly and deterministically | | 3. Metrics/usage/config | Backend/telemetry agent | `chatd.go`, `chatloop/metrics.go`, optional config plumbing | Partially; coordinate with integration lead | Advisor usage is separately visible in metadata/metrics and limits are enforced | | 4. Frontend rendering | Frontend agent | `site/.../tools/Tool.tsx`, new `AdvisorTool.tsx`, stories | Yes after result schema stabilizes | Advisor renders as a readable card and story tests pass | | 5. Dogfood + QA evidence | QA agent | dev server, Storybook, dogfood output | After backend + UI are usable | Repro videos, screenshots, and a concise QA report exist | ### Parallelization rules - **Do not split `coderd/x/chatd/chatd.go` across multiple execution agents without an integration lead.** That file owns prompt building, tool registration, model resolution, and cost persistence. - Workstreams 1 and 2 can be developed in parallel and then stacked onto the integration branch. - Workstream 4 should begin once the backend result schema is agreed on, even if the backend is still behind a feature flag. - Any agent that needs to re-check Mux behavior should clone `coder/mux` into a temporary directory (for example, `$(mktemp -d)/mux`) and inspect it read-only; do not vendor or copy code from Mux directly. ## Phase 0 — Preflight and guardrails ### Goals - Align the team on the smallest shippable architecture. - Prevent scope creep into MCP/dynamic-tool/sub-agent variants. - Decide upfront what is MVP vs. follow-up. ### Tasks 1. **Confirm the MVP boundary.** - Ship a built-in advisor tool first. - Do **not** make MCP, dynamic tools, or sub-agents the primary implementation. - Do **not** add transient streaming phases in the first backend PR unless they fall out almost for free. 2. **Confirm local workflow hygiene before coding.** - Ensure the repo is using the project git hooks from `scripts/githooks`. - Do not bypass hooks with `--no-verify`. - Use `./scripts/develop.sh` for the full dev server rather than manual build/run commands. 3. **Lock the model-selection policy.** - **Recommended MVP:** advisor uses the same resolved provider/model/cost config as the current chat, with advisor-specific max-output and usage caps. - **Follow-up only if required:** add a separate `AdvisorModelConfigID`-style override that resolves through the existing `configCache`/model-config path. Do not invent a new free-form `provider:model` parser if chatd already stores provider/model separately. 4. **Lock the persistence policy.** - **Recommended MVP:** no DB migration. Persist advisor-visible metadata in the tool result and record separate metrics in memory/Prometheus. - **Only if product/billing explicitly asks for queryable advisor cost:** add a later DB migration or usage table, following the normal `queries/*.sql` + `make gen` workflow. 5. **Create an execution ADR note in the work item or tracking doc.** - Capture: built-in tool, tool-less nested call, root-chat-only rollout, exclusive execution policy, MVP no-DB-migration default. ### Quality gate - Everyone on the team can state the same answers to these questions: - Is advisor a built-in tool? **Yes.** - Can advisor run with action tools in the same batch? **No.** - Does advisor get tools of its own? **No.** - Is a DB migration required for MVP? **No, unless billing insists.** ## Phase 1 — Build the advisor runtime and tool wrapper ### Goals Create the core advisor implementation in a way that is easy to test and keeps `chattool/` thin. ### Files to add - `coderd/x/chatd/chatadvisor/types.go` - `coderd/x/chatd/chatadvisor/guidance.go` - `coderd/x/chatd/chatadvisor/handoff.go` - `coderd/x/chatd/chatadvisor/runtime.go` - `coderd/x/chatd/chatadvisor/runner.go` - `coderd/x/chatd/chattool/advisor.go` ### Responsibilities by file 1. **`types.go`** - Define the input/result schema used by the tool and UI. - Keep the result shape close to Mux so the UI and model both have predictable cases. - Recommended result variants: - `advice` - `limit_reached` - `error` Recommended shape: ```go type AdvisorArgs struct { Question string `json:"question"` } type AdvisorResult struct { Type string `json:"type"` Advice string `json:"advice,omitempty"` Error string `json:"error,omitempty"` AdvisorModel string `json:"advisor_model,omitempty"` RemainingUses int `json:"remaining_uses,omitempty"` Usage *AdvisorUsageResult `json:"usage,omitempty"` } ``` 2. **`guidance.go`** - Hold two strings: - the nested advisor system prompt; - the parent-agent guidance block to inject into the outer system prompt. - The nested advisor prompt must say, in plain language: - you are advising the parent agent; - you do not address the end user directly; - you do not claim actions happened; - you return concise strategic guidance and tradeoffs. 3. **`runtime.go`** - Define the per-run runtime state. - Recommended fields: - resolved model + model config; - provider keys/options reused from the outer chat; - `MaxUsesPerRun`; - `MaxOutputTokens`; - atomic/current call counter; - callback(s) to obtain the current prompt snapshot and current-step snapshot; - optional metrics/usage hook. - Add fail-fast validation for impossible config: nil model, non-positive limits, empty prompt builders, etc. 4. **`handoff.go`** - Build the advisor handoff message from: - the explicit question; - the exact prompt/messages the parent model just used; - the current step's text/reasoning snapshot, if available; - the most recent relevant tool outputs, if they are already in the prompt snapshot. - **Important:** use the already-prepared outer prompt tail, not a fresh DB reload. That keeps the advisor aligned with compaction and the exact context the outer model saw. - Apply hard truncation budgets with recent-context bias. 5. **`runner.go`** - Execute the nested advisor call. - **Recommended implementation:** call `chatloop.Run()` in an in-memory, one-step mode: - `Tools: nil` - `ProviderTools: nil` - `MaxSteps: 1` - `PersistStep`: capture the assistant output in memory instead of writing DB rows - Reuse the existing provider/model/cost path instead of building a second provider runner. - Assert that no tool definitions are passed to the nested call. 6. **`chattool/advisor.go`** - Keep this file thin and consistent with other built-ins. - Responsibilities: - decode `AdvisorArgs`; - validate `Question` is non-empty and bounded; - call the `chatadvisor` runner; - return a structured tool response. ### Defensive programming requirements - Assert `Question` is non-empty after trimming. - Assert runtime limits are positive. - Assert the nested advisor call runs with zero tools/provider tools. - Assert `AdvisorResult.Type` is one of the known variants before returning. - Assert remaining uses never goes negative. ### Acceptance criteria - A unit test can call the advisor tool with a fake model and receive a stable `advice` result. - The nested advisor call is impossible to run with tools accidentally attached. - The core logic lives in `chatadvisor/`, not embedded inside `chatd.go`. ## Phase 2 — Wire advisor into chatd and keep prompt/tool availability in sync ### Goals Register the tool in the right place, expose it only when eligible, and inject system guidance only when the tool is present. ### Files to modify - `coderd/x/chatd/chatd.go` - optionally a small helper file if `chatd.go` becomes too crowded ### Tasks 1. **Compute one eligibility boolean in `processChat()`.** Recommended inputs: - server-level advisor enabled flag; - root chat only (`chat.ParentChatID == uuid.Nil` or equivalent existing root/child check); - a usable resolved model/provider exists; - optional experiment/workspace/org gate if product wants staged rollout. 2. **Create the runtime once per outer chat run.** - Use the model/config/keys resolved by `resolveChatModel()`. - Reuse provider options from the current chat's `ChatModelCallConfig`. - Set `MaxUsesPerRun` and `MaxOutputTokens` from advisor config defaults. 3. **Register the tool in the built-in tool block.** - Insert after the skill tools and before MCP tools in `processChat()`. - Record `builtinToolNames["advisor"] = true` so metrics stay bounded. 4. **Inject advisor guidance into the outer system prompt using the same boolean.** - Use `chatprompt.InsertSystem()` in the same prompt assembly path that already injects user/system instructions. - Place the block near the existing instruction insertion, before plan-path/skill context blocks. - Wrap the guidance in an explicit tag like `<advisor-guidance>` so it is easy to spot in tests and future refactors. 5. **Keep advisor out of child chats for the first release.** - That avoids recursion/cost blowups with `spawn_agent` / `wait_agent` flows. - Document this explicitly in the rollout notes and tests. ### Acceptance criteria - If advisor is disabled, neither the tool nor the prompt guidance appears. - If advisor is enabled, both the tool and the prompt guidance appear. - Root chats can use advisor; child chats cannot. - Built-in tool names include `advisor` so metrics do not collapse it into the generic `mcp` label. ## Phase 3 — Enforce planning-only execution policy in `chatloop` ### Goals Prevent the model from calling `advisor` and action tools in the same execution batch. ### Files to modify - `coderd/x/chatd/chatloop/chatloop.go` - related chatloop tests ### Recommended implementation Keep the MVP small; do **not** build a general policy engine yet. 1. Add a minimal field to `chatloop.RunOptions`, for example: ```go ExclusiveToolName *string ``` 2. In `Run()` / `executeTools()`, detect the case where the exclusive tool appears in the same local-tool batch as any other locally executed tool. 3. When that happens, synthesize structured tool-result errors for the affected calls instead of executing anything in the batch. - `advisor` should receive a clear error like: _advisor must be called by itself before action tools_. - The sibling action tools should receive a paired policy error like: _this tool was skipped because advisor must run alone_. 4. Let the outer model see those tool errors and retry cleanly. - This is simpler and safer than partial execution or hidden deferral. - It preserves deterministic transcript history for debugging. 5. Pass the just-finished step snapshot into the tool execution context. - The advisor runtime should be able to see the current step's text/reasoning content, because that is often the best hint about what the outer model is trying to decide. ### Why this is the right fit - It matches the intended semantics: advisor is consulted **before** taking action. - It avoids subtle race conditions caused by concurrent built-in tool execution. - It keeps the behavior easy to test with fake models. ### Acceptance criteria - A model-emitted batch containing only `advisor` succeeds. - A model-emitted batch containing `advisor` plus any other locally executed tool returns deterministic policy errors and executes nothing. - Non-advisor tool execution stays unchanged for normal chats. ## Phase 4 — Usage limits, metrics, and configuration ### Goals Make advisor safe to operate without over-designing billing/storage in the first release. ### Files to modify - `coderd/x/chatd/chatd.go` - `coderd/x/chatd/chatloop/metrics.go` as needed - `coderd/x/chatd/chatd.go` `Config` struct and constructor path - optional follow-up config/db files only if a separate advisor model or persistent billing is required ### Tasks 1. **Add explicit server config knobs for MVP.** Recommended fields on `chatd.Config` or a nested advisor config struct: - `AdvisorEnabled bool` - `AdvisorMaxUsesPerRun int` - `AdvisorMaxOutputTokens int64` 2. **Track usage per outer run.** - Reset the counter for each `processChat()` invocation. - Return `remaining_uses` in the tool result. - Return `limit_reached` when the cap is exhausted. 3. **Expose advisor usage metadata in the tool result.** - Include model name and token/cost summary if available. - Use the same `callConfig.Cost` calculation path as the outer chat for MVP if advisor reuses the same model. 4. **Record server-side metrics.** - Count advisor invocations, failures, and latency. - Ensure they show up under the built-in tool label `advisor`. 5. **Optional decision gate: separate advisor model.** - If product insists on a stronger/different advisor model, add a follow-up config hook that resolves another existing chat model config through the same `configCache` path. - Keep that out of the first landing PR unless it is required for acceptance. 6. **Optional decision gate: queryable advisor cost.** - If this becomes required, spin a follow-up DB task: - update `coderd/database/queries/*.sql`; - add migration files; - run `make gen`; - update audit mappings if a new auditable type/field is introduced. ### Acceptance criteria - Advisor calls are capped per outer run. - Limit exhaustion is user-visible in the tool result. - Metrics distinguish advisor calls from other built-in tools. - MVP does not require a schema migration unless explicitly approved. ## Phase 5 — Frontend rendering and Storybook coverage ### Goals Make advisor feel intentional in the Agents UI without blocking the backend on fancy streaming UI. ### Files to modify - `site/src/pages/AgentsPage/components/ChatElements/tools/Tool.tsx` - new `site/src/pages/AgentsPage/components/ChatElements/tools/AdvisorTool.tsx` - Storybook story file(s) in the same tools directory ### Delivery strategy 1. **Intermediate milestone during backend bring-up:** rely on the existing generic tool renderer if needed. - This is acceptable only as a short-lived integration checkpoint. 2. **Release milestone:** add a dedicated lightweight `AdvisorTool` renderer. - Reuse existing primitives: - `ToolCollapsible` - `ToolIcon` - `Response` for markdown/prose rendering - `ScrollArea` if the advice can be long - Keep styling light and consistent with the Agents page. - Do not add unnecessary React memoization in `site/src/pages/AgentsPage/`; that area is already React-Compiler aware. 3. **Render the structured result states cleanly.** - `advice` — readable prose/markdown with optional metadata footer. - `limit_reached` — warning-style message. - `error` — error state with visible fallback text. - `running` — existing tool loading state/spinner is enough for MVP. 4. **Add Storybook coverage instead of ad-hoc component tests.** Recommended stories: - successful advice; - running/loading; - limit reached; - error. 5. **Keep the UI contract narrow.** - Prefer one text field like `advice` plus small metadata rather than a deeply nested schema. - That keeps the UI resilient to prompt iteration. ### Acceptance criteria - The advisor tool card renders readable content rather than raw quoted JSON in the final release branch. - Running, limit, and error states are visibly distinct. - Storybook stories and play assertions cover the new states. - Existing tool rendering flows remain unchanged. ## Phase 6 — Automated tests and validation gates ### Backend tests to add 1. **Advisor runtime/tool tests** - question validation; - tool-less nested execution assertion; - success result shaping; - limit-reached result shaping; - error result shaping. 2. **Prompt/gating tests in chatd** - advisor disabled ⇒ no tool, no guidance; - advisor enabled/root chat ⇒ tool + guidance; - child chat ⇒ advisor absent. 3. **Chatloop policy tests** - advisor alone runs; - advisor + action tool mixed batch returns deterministic policy errors; - non-advisor tools still execute normally. 4. **Usage/metrics tests** - per-run cap resets correctly; - builtin tool labeling includes `advisor`; - returned metadata includes model/usage summary when available. ### Frontend tests to add - Storybook `play()` assertions for the advisor renderer states. - Verify expand/collapse behavior and visible fallback text. - Verify the message timeline still renders adjacent tools correctly. ### Recommended command sequence Run these as the implementation matures, not only at the end: 1. Backend-focused gate after phases 1–4: - `make test RUN=TestAdvisor` - `make test RUN=TestChatloopAdvisor` - `make lint` 2. Frontend-focused gate after phase 5: - `pnpm test:storybook src/pages/AgentsPage/components/ChatElements/tools/AdvisorTool.stories.tsx` - `pnpm lint` - `pnpm format` 3. Final repo gate before handoff: - `make pre-commit` - run any additional targeted `make test RUN=...` selections covering touched chatd paths > Use the exact new test names the implementing agents create; the names above are recommended anchors, not existing tests. ## Dogfooding plan ### Principle Dogfood the change as a real agent feature, not just a unit-tested backend. Per the dogfood and `agent-browser` skills, the reviewer should get **watchable repro videos** plus screenshots that make the behavior obvious without reading logs. ### Required setup 1. Start the full dev environment with: - `./scripts/develop.sh` 2. If the frontend renderer changes, also start Storybook from `site/` with: - `pnpm storybook --no-open` 3. Use `agent-browser` directly — **never `npx agent-browser`**. 4. Use named browser sessions and an output folder such as: - `./dogfood-output/advisor/` - with subfolders `screenshots/` and `videos/` ### Evidence protocol For every interactive scenario below: 1. Start video recording **before** the action. 2. Capture step-by-step screenshots at human pace. 3. Capture one annotated screenshot of the final state. 4. Stop the recording. 5. Note the exact pass/fail observation in the QA report. For static UI states (for example Storybook error/limit cards), an annotated screenshot is sufficient; video is optional but still encouraged by this project’s review preference. ### Dogfood scenarios #### Scenario A — Happy path in the real Agents UI **Goal:** prove that a root agent chat can invoke advisor and produce a readable recommendation before taking further action. Steps: 1. Open the Agents page with an advisor-enabled root chat. 2. Start a repro video. 3. Send a prompt that should reasonably trigger strategic planning, such as an architecture or multi-tradeoff question. 4. Capture screenshots of: - the prompt before send; - the running advisor state; - the completed advisor card and the assistant’s follow-up response. 5. Stop recording. Pass criteria: - advisor appears in the timeline; - the rendered result is readable; - the assistant can continue after consuming the advisor output. #### Scenario B — Advisor unavailable path **Goal:** prove the feature is truly gated. Suggested variants (at least one is required, both are better): - feature flag/config off; - child/sub-agent chat. Evidence: - annotated screenshot of the chat/tool state showing advisor is absent; - short video if toggling the gate live is part of the repro. Pass criteria: - no advisor tool is available; - no advisor-specific prompt behavior leaks through. #### Scenario C — UI states in Storybook **Goal:** prove the renderer handles non-happy states cleanly. Required story states: - success/advice; - running; - limit reached; - error. Evidence: - one screenshot per state; - at least one short video showing collapse/expand behavior. Pass criteria: - success renders readable advice; - limit/error have visible fallback text; - the component behaves like the other tool cards. #### Scenario D — Regression sweep of nearby tools **Goal:** ensure advisor does not break the surrounding chat timeline. Check at minimum: - another existing built-in tool still renders correctly near advisor; - sub-agent/tool cards still expand/collapse normally; - no obvious console errors appear in the Agents page during the advisor flow. Evidence: - screenshots of adjacent tool cards; - console/error capture if anything suspicious appears. ### `agent-browser` usage notes for the QA agent - Prefer `agent-browser batch` for 2+ sequential commands when no intermediate parsing is needed. - Use `snapshot -i` to discover interactive refs. - Re-snapshot after navigation or major DOM changes. - Avoid `wait --load networkidle` unless the page is known to go idle; prefer explicit element/text waits or short fixed waits. - Record videos at human pace and include pauses that a reviewer can follow. ## Rollout plan ### Initial rollout - Gate behind a server-side advisor-enabled flag. - Enable only for selected internal/root agent chats first. - Watch metrics for: - invocation count; - failure rate; - latency; - obvious retry loops. ### Expansion conditions Expand beyond the initial rollout only after the following are true: - mixed-batch policy behavior is stable; - cost impact is understood; - frontend UX is readable in production-like dogfood; - no recursion surprises have appeared with sub-agent flows. ### Explicit non-goals for the first release - advisor inside child/sub-agent chats; - provider-agnostic streaming phase UI; - MCP-based external advisor implementation; - mandatory DB-backed advisor cost reporting. ## Final acceptance checklist - [ ] `advisor` is a built-in chatd tool, not an MCP/dynamic-tool substitute. - [ ] The nested advisor call is tool-less and bounded to one in-memory step. - [ ] One eligibility boolean controls both tool registration and prompt guidance injection. - [ ] Root chats can use advisor; child chats cannot in the initial rollout. - [ ] Mixed advisor/action batches produce deterministic policy errors instead of partial execution. - [ ] Per-run usage caps and limit-reached behavior work. - [ ] Advisor usage is visible in metadata/metrics without forcing a DB migration for MVP. - [ ] The Agents UI has a readable advisor card and Storybook coverage. - [ ] Dogfooding produced screenshots and repro videos for the required scenarios. - [ ] Validation commands (`make lint`, targeted `make test`, Storybook tests, `make pre-commit`) passed before handoff. ## Suggested PR split 1. **PR 1 — Backend foundation** - `chatadvisor/` package - `chattool/advisor.go` - `chatloop` exclusive policy - chatd gating/prompt sync - backend tests 2. **PR 2 — Frontend + QA** - advisor renderer - stories/play assertions - dogfood artifacts and QA notes 3. **PR 3 — Optional follow-ups only if demanded by stakeholders** - separate advisor model override - persistent advisor billing/queryability - transient phase-stream UX </details> --- _Generated with [`mux`](https://github.com/coder/mux) • Model: `anthropic:claude-opus-4-7` • Thinking: `max`_ |
||
|
|
a84534315c |
feat(coderd/x/chatd/chatloop): add exclusive tool execution policy (#24619)
## Summary
Introduce an **exclusive-tool execution policy** in `chatloop.Run()` so that a designated "planning-only" tool can never execute in the same batch as any other locally executed tool. This is the generic foundation for the advisor tool landed in the stacked PRs above.
## Motivation
The advisor tool (stacked on top) is intentionally a pre-action planning step. It must be consulted *alone*, not interleaved with side-effecting tools, so the model has to reason strategically before committing to actions. Rather than building advisor-specific plumbing into `chatloop`, this PR adds a small, general-purpose policy hook that any future tool can opt into.
## Changes
- `RunOptions.ExclusiveToolNames map[string]bool` declares which tool names must run exclusively within a single tool-execution batch.
- `executeTools()` detects mixed batches: if any exclusive tool name appears next to any other locally executed tool, no tool runs. Instead, structured `ToolResultOutputContentError` entries are synthesized for every tool in the batch so the model can cleanly retry.
- Deterministic, model-facing error copy:
- Exclusive tool gets: "must be called by itself before action tools".
- Sibling tools get: "skipped because `<exclusive>` must run alone".
- New tests in `chatloop_test.go` cover: single exclusive batch runs normally, mixed batch returns policy errors and executes nothing, non-exclusive batches are untouched.
## Stack context
This is **PR 1 of 6** in the advisor feature stack.
```
main
└─ feat/advisor-01-chatloop-exclusive-policy ← this PR
└─ feat/advisor-02-chatadvisor-pkg
└─ feat/advisor-03-config-api
├─ feat/advisor-04-chatd-runtime
│ └─ feat/advisor-05-chat-tool-renderer
└─ feat/advisor-06-admin-settings-ui
```
## Scope / non-goals
- No advisor-specific code lives in this PR.
- Zero behavior change when `ExclusiveToolNames` is empty (current default for every existing caller).
## Validation
- `go test ./coderd/x/chatd/chatloop/... -run TestExclusive`
- `make lint`
---
<details>
<summary>📋 Implementation Plan (shared across the advisor stack)</summary>
# Plan: Add a Mux-style advisor tool to coder agents/chatd
## Outcome
Add a first-class `advisor` tool to agent chats in `coderd/x/chatd` that feels native to Coder:
- it is a built-in server-side tool, not an MCP/dynamic-tool workaround;
- it performs a nested **tool-less** model call for strategic advice;
- it is exposed only when eligible, and the prompt mentions it only when it is actually available;
- it is treated as a **planning-only** tool so it does not run alongside action tools in the same batch;
- it tracks usage/cost separately enough for operators to reason about it;
- it has a minimally polished UI in the Agents page;
- and it ships with explicit dogfooding evidence, including screenshots and repro videos.
## Design decisions to lock before coding
1. **Primary architecture:** native built-in tool in `chattool/`, backed by a small `chatadvisor` package.
2. **Nested model execution:** reuse chatd's existing model/provider stack for a one-step, tool-less advisor call rather than inventing a new provider pathway.
3. **Execution policy:** treat `advisor` as an exclusive/planning-only tool; mixed batches must return structured policy errors and force the model to retry cleanly.
4. **Availability:** initial rollout is for root agent chats only; disable for child/sub-agent chats until recursion/cost policy is proven.
5. **Prompt sync:** use one eligibility boolean to drive both tool registration and advisor guidance injection.
6. **Persistence/cost split:** MVP should keep advisor usage visible in result metadata and server metrics; only add DB schema if product/billing explicitly needs queryable advisor-specific cost.
7. **UI scope:** generic tool rendering is an acceptable temporary milestone during backend bring-up, but the release candidate should include a dedicated lightweight advisor renderer.
## Delivery model
The work should be executed as coordinated workstreams with one integration owner and parallel contributors for low-conflict areas. The integration owner should own `coderd/x/chatd/chatd.go` because prompt assembly, tool registration, and model resolution all converge there.
## Detailed workstreams
### Repo evidence used for this plan
<details>
<summary>Mux reference and current chatd seams</summary>
**Mux reference implementation**
- `src/node/services/tools/advisor.ts` — native advisor tool implementation.
- `src/common/constants/advisor.ts` — advisor prompt/constants and truncation policy.
- `src/common/utils/tools/tools.ts` — conditional tool registration.
- `src/node/services/streamContextBuilder.ts` — injects advisor guidance only when the tool is available.
**Current chatd seams**
- `coderd/x/chatd/chatd.go`
- `processChat()` — tool assembly, prompt assembly, and chatloop invocation.
- `resolveChatModel()` — current model/provider/key resolution seam.
- `type Config struct` — server-level chatd configuration surface.
- `coderd/x/chatd/chatloop/chatloop.go`
- `Run()` — main streaming/model loop.
- `executeTools()` — built-in tool execution/batching seam.
- `coderd/x/chatd/chattool/` — built-in tool implementations.
- `site/src/pages/AgentsPage/components/ChatElements/tools/Tool.tsx` — tool renderer dispatch.
- `site/src/pages/AgentsPage/components/ChatConversation/messageParsing.ts` and `ConversationTimeline.tsx` — tool/result merge and rendering flow.
</details>
### Workstream map and ownership
| Workstream | Primary owner | Main files | Can run in parallel? | Done when |
|---|---|---|---|---|
| 0. Integration + gating | Integration lead | `coderd/x/chatd/chatd.go` | No; central merge lane | Tool registration, prompt sync, and model selection are wired together |
| 1. Advisor runtime + tool | Backend agent | new `coderd/x/chatd/chatadvisor/`, new `coderd/x/chatd/chattool/advisor.go` | Yes | Tool can perform a tool-less advisor call in memory and return structured results |
| 2. Planning-only execution policy | Chatloop agent | `coderd/x/chatd/chatloop/chatloop.go`, related tests | Yes | Mixed `advisor` + action-tool batches are rejected cleanly and deterministically |
| 3. Metrics/usage/config | Backend/telemetry agent | `chatd.go`, `chatloop/metrics.go`, optional config plumbing | Partially; coordinate with integration lead | Advisor usage is separately visible in metadata/metrics and limits are enforced |
| 4. Frontend rendering | Frontend agent | `site/.../tools/Tool.tsx`, new `AdvisorTool.tsx`, stories | Yes after result schema stabilizes | Advisor renders as a readable card and story tests pass |
| 5. Dogfood + QA evidence | QA agent | dev server, Storybook, dogfood output | After backend + UI are usable | Repro videos, screenshots, and a concise QA report exist |
### Parallelization rules
- **Do not split `coderd/x/chatd/chatd.go` across multiple execution agents without an integration lead.** That file owns prompt building, tool registration, model resolution, and cost persistence.
- Workstreams 1 and 2 can be developed in parallel and then stacked onto the integration branch.
- Workstream 4 should begin once the backend result schema is agreed on, even if the backend is still behind a feature flag.
- Any agent that needs to re-check Mux behavior should clone `coder/mux` into a temporary directory (for example, `$(mktemp -d)/mux`) and inspect it read-only; do not vendor or copy code from Mux directly.
## Phase 0 — Preflight and guardrails
### Goals
- Align the team on the smallest shippable architecture.
- Prevent scope creep into MCP/dynamic-tool/sub-agent variants.
- Decide upfront what is MVP vs. follow-up.
### Tasks
1. **Confirm the MVP boundary.**
- Ship a built-in advisor tool first.
- Do **not** make MCP, dynamic tools, or sub-agents the primary implementation.
- Do **not** add transient streaming phases in the first backend PR unless they fall out almost for free.
2. **Confirm local workflow hygiene before coding.**
- Ensure the repo is using the project git hooks from `scripts/githooks`.
- Do not bypass hooks with `--no-verify`.
- Use `./scripts/develop.sh` for the full dev server rather than manual build/run commands.
3. **Lock the model-selection policy.**
- **Recommended MVP:** advisor uses the same resolved provider/model/cost config as the current chat, with advisor-specific max-output and usage caps.
- **Follow-up only if required:** add a separate `AdvisorModelConfigID`-style override that resolves through the existing `configCache`/model-config path. Do not invent a new free-form `provider:model` parser if chatd already stores provider/model separately.
4. **Lock the persistence policy.**
- **Recommended MVP:** no DB migration. Persist advisor-visible metadata in the tool result and record separate metrics in memory/Prometheus.
- **Only if product/billing explicitly asks for queryable advisor cost:** add a later DB migration or usage table, following the normal `queries/*.sql` + `make gen` workflow.
5. **Create an execution ADR note in the work item or tracking doc.**
- Capture: built-in tool, tool-less nested call, root-chat-only rollout, exclusive execution policy, MVP no-DB-migration default.
### Quality gate
- Everyone on the team can state the same answers to these questions:
- Is advisor a built-in tool? **Yes.**
- Can advisor run with action tools in the same batch? **No.**
- Does advisor get tools of its own? **No.**
- Is a DB migration required for MVP? **No, unless billing insists.**
## Phase 1 — Build the advisor runtime and tool wrapper
### Goals
Create the core advisor implementation in a way that is easy to test and keeps `chattool/` thin.
### Files to add
- `coderd/x/chatd/chatadvisor/types.go`
- `coderd/x/chatd/chatadvisor/guidance.go`
- `coderd/x/chatd/chatadvisor/handoff.go`
- `coderd/x/chatd/chatadvisor/runtime.go`
- `coderd/x/chatd/chatadvisor/runner.go`
- `coderd/x/chatd/chattool/advisor.go`
### Responsibilities by file
1. **`types.go`**
- Define the input/result schema used by the tool and UI.
- Keep the result shape close to Mux so the UI and model both have predictable cases.
- Recommended result variants:
- `advice`
- `limit_reached`
- `error`
Recommended shape:
```go
type AdvisorArgs struct {
Question string `json:"question"`
}
type AdvisorResult struct {
Type string `json:"type"`
Advice string `json:"advice,omitempty"`
Error string `json:"error,omitempty"`
AdvisorModel string `json:"advisor_model,omitempty"`
RemainingUses int `json:"remaining_uses,omitempty"`
Usage *AdvisorUsageResult `json:"usage,omitempty"`
}
```
2. **`guidance.go`**
- Hold two strings:
- the nested advisor system prompt;
- the parent-agent guidance block to inject into the outer system prompt.
- The nested advisor prompt must say, in plain language:
- you are advising the parent agent;
- you do not address the end user directly;
- you do not claim actions happened;
- you return concise strategic guidance and tradeoffs.
3. **`runtime.go`**
- Define the per-run runtime state.
- Recommended fields:
- resolved model + model config;
- provider keys/options reused from the outer chat;
- `MaxUsesPerRun`;
- `MaxOutputTokens`;
- atomic/current call counter;
- callback(s) to obtain the current prompt snapshot and current-step snapshot;
- optional metrics/usage hook.
- Add fail-fast validation for impossible config: nil model, non-positive limits, empty prompt builders, etc.
4. **`handoff.go`**
- Build the advisor handoff message from:
- the explicit question;
- the exact prompt/messages the parent model just used;
- the current step's text/reasoning snapshot, if available;
- the most recent relevant tool outputs, if they are already in the prompt snapshot.
- **Important:** use the already-prepared outer prompt tail, not a fresh DB reload. That keeps the advisor aligned with compaction and the exact context the outer model saw.
- Apply hard truncation budgets with recent-context bias.
5. **`runner.go`**
- Execute the nested advisor call.
- **Recommended implementation:** call `chatloop.Run()` in an in-memory, one-step mode:
- `Tools: nil`
- `ProviderTools: nil`
- `MaxSteps: 1`
- `PersistStep`: capture the assistant output in memory instead of writing DB rows
- Reuse the existing provider/model/cost path instead of building a second provider runner.
- Assert that no tool definitions are passed to the nested call.
6. **`chattool/advisor.go`**
- Keep this file thin and consistent with other built-ins.
- Responsibilities:
- decode `AdvisorArgs`;
- validate `Question` is non-empty and bounded;
- call the `chatadvisor` runner;
- return a structured tool response.
### Defensive programming requirements
- Assert `Question` is non-empty after trimming.
- Assert runtime limits are positive.
- Assert the nested advisor call runs with zero tools/provider tools.
- Assert `AdvisorResult.Type` is one of the known variants before returning.
- Assert remaining uses never goes negative.
### Acceptance criteria
- A unit test can call the advisor tool with a fake model and receive a stable `advice` result.
- The nested advisor call is impossible to run with tools accidentally attached.
- The core logic lives in `chatadvisor/`, not embedded inside `chatd.go`.
## Phase 2 — Wire advisor into chatd and keep prompt/tool availability in sync
### Goals
Register the tool in the right place, expose it only when eligible, and inject system guidance only when the tool is present.
### Files to modify
- `coderd/x/chatd/chatd.go`
- optionally a small helper file if `chatd.go` becomes too crowded
### Tasks
1. **Compute one eligibility boolean in `processChat()`.**
Recommended inputs:
- server-level advisor enabled flag;
- root chat only (`chat.ParentChatID == uuid.Nil` or equivalent existing root/child check);
- a usable resolved model/provider exists;
- optional experiment/workspace/org gate if product wants staged rollout.
2. **Create the runtime once per outer chat run.**
- Use the model/config/keys resolved by `resolveChatModel()`.
- Reuse provider options from the current chat's `ChatModelCallConfig`.
- Set `MaxUsesPerRun` and `MaxOutputTokens` from advisor config defaults.
3. **Register the tool in the built-in tool block.**
- Insert after the skill tools and before MCP tools in `processChat()`.
- Record `builtinToolNames["advisor"] = true` so metrics stay bounded.
4. **Inject advisor guidance into the outer system prompt using the same boolean.**
- Use `chatprompt.InsertSystem()` in the same prompt assembly path that already injects user/system instructions.
- Place the block near the existing instruction insertion, before plan-path/skill context blocks.
- Wrap the guidance in an explicit tag like `<advisor-guidance>` so it is easy to spot in tests and future refactors.
5. **Keep advisor out of child chats for the first release.**
- That avoids recursion/cost blowups with `spawn_agent` / `wait_agent` flows.
- Document this explicitly in the rollout notes and tests.
### Acceptance criteria
- If advisor is disabled, neither the tool nor the prompt guidance appears.
- If advisor is enabled, both the tool and the prompt guidance appear.
- Root chats can use advisor; child chats cannot.
- Built-in tool names include `advisor` so metrics do not collapse it into the generic `mcp` label.
## Phase 3 — Enforce planning-only execution policy in `chatloop`
### Goals
Prevent the model from calling `advisor` and action tools in the same execution batch.
### Files to modify
- `coderd/x/chatd/chatloop/chatloop.go`
- related chatloop tests
### Recommended implementation
Keep the MVP small; do **not** build a general policy engine yet.
1. Add a minimal field to `chatloop.RunOptions`, for example:
```go
ExclusiveToolName *string
```
2. In `Run()` / `executeTools()`, detect the case where the exclusive tool appears in the same local-tool batch as any other locally executed tool.
3. When that happens, synthesize structured tool-result errors for the affected calls instead of executing anything in the batch.
- `advisor` should receive a clear error like: _advisor must be called by itself before action tools_.
- The sibling action tools should receive a paired policy error like: _this tool was skipped because advisor must run alone_.
4. Let the outer model see those tool errors and retry cleanly.
- This is simpler and safer than partial execution or hidden deferral.
- It preserves deterministic transcript history for debugging.
5. Pass the just-finished step snapshot into the tool execution context.
- The advisor runtime should be able to see the current step's text/reasoning content, because that is often the best hint about what the outer model is trying to decide.
### Why this is the right fit
- It matches the intended semantics: advisor is consulted **before** taking action.
- It avoids subtle race conditions caused by concurrent built-in tool execution.
- It keeps the behavior easy to test with fake models.
### Acceptance criteria
- A model-emitted batch containing only `advisor` succeeds.
- A model-emitted batch containing `advisor` plus any other locally executed tool returns deterministic policy errors and executes nothing.
- Non-advisor tool execution stays unchanged for normal chats.
## Phase 4 — Usage limits, metrics, and configuration
### Goals
Make advisor safe to operate without over-designing billing/storage in the first release.
### Files to modify
- `coderd/x/chatd/chatd.go`
- `coderd/x/chatd/chatloop/metrics.go` as needed
- `coderd/x/chatd/chatd.go` `Config` struct and constructor path
- optional follow-up config/db files only if a separate advisor model or persistent billing is required
### Tasks
1. **Add explicit server config knobs for MVP.**
Recommended fields on `chatd.Config` or a nested advisor config struct:
- `AdvisorEnabled bool`
- `AdvisorMaxUsesPerRun int`
- `AdvisorMaxOutputTokens int64`
2. **Track usage per outer run.**
- Reset the counter for each `processChat()` invocation.
- Return `remaining_uses` in the tool result.
- Return `limit_reached` when the cap is exhausted.
3. **Expose advisor usage metadata in the tool result.**
- Include model name and token/cost summary if available.
- Use the same `callConfig.Cost` calculation path as the outer chat for MVP if advisor reuses the same model.
4. **Record server-side metrics.**
- Count advisor invocations, failures, and latency.
- Ensure they show up under the built-in tool label `advisor`.
5. **Optional decision gate: separate advisor model.**
- If product insists on a stronger/different advisor model, add a follow-up config hook that resolves another existing chat model config through the same `configCache` path.
- Keep that out of the first landing PR unless it is required for acceptance.
6. **Optional decision gate: queryable advisor cost.**
- If this becomes required, spin a follow-up DB task:
- update `coderd/database/queries/*.sql`;
- add migration files;
- run `make gen`;
- update audit mappings if a new auditable type/field is introduced.
### Acceptance criteria
- Advisor calls are capped per outer run.
- Limit exhaustion is user-visible in the tool result.
- Metrics distinguish advisor calls from other built-in tools.
- MVP does not require a schema migration unless explicitly approved.
## Phase 5 — Frontend rendering and Storybook coverage
### Goals
Make advisor feel intentional in the Agents UI without blocking the backend on fancy streaming UI.
### Files to modify
- `site/src/pages/AgentsPage/components/ChatElements/tools/Tool.tsx`
- new `site/src/pages/AgentsPage/components/ChatElements/tools/AdvisorTool.tsx`
- Storybook story file(s) in the same tools directory
### Delivery strategy
1. **Intermediate milestone during backend bring-up:** rely on the existing generic tool renderer if needed.
- This is acceptable only as a short-lived integration checkpoint.
2. **Release milestone:** add a dedicated lightweight `AdvisorTool` renderer.
- Reuse existing primitives:
- `ToolCollapsible`
- `ToolIcon`
- `Response` for markdown/prose rendering
- `ScrollArea` if the advice can be long
- Keep styling light and consistent with the Agents page.
- Do not add unnecessary React memoization in `site/src/pages/AgentsPage/`; that area is already React-Compiler aware.
3. **Render the structured result states cleanly.**
- `advice` — readable prose/markdown with optional metadata footer.
- `limit_reached` — warning-style message.
- `error` — error state with visible fallback text.
- `running` — existing tool loading state/spinner is enough for MVP.
4. **Add Storybook coverage instead of ad-hoc component tests.**
Recommended stories:
- successful advice;
- running/loading;
- limit reached;
- error.
5. **Keep the UI contract narrow.**
- Prefer one text field like `advice` plus small metadata rather than a deeply nested schema.
- That keeps the UI resilient to prompt iteration.
### Acceptance criteria
- The advisor tool card renders readable content rather than raw quoted JSON in the final release branch.
- Running, limit, and error states are visibly distinct.
- Storybook stories and play assertions cover the new states.
- Existing tool rendering flows remain unchanged.
## Phase 6 — Automated tests and validation gates
### Backend tests to add
1. **Advisor runtime/tool tests**
- question validation;
- tool-less nested execution assertion;
- success result shaping;
- limit-reached result shaping;
- error result shaping.
2. **Prompt/gating tests in chatd**
- advisor disabled ⇒ no tool, no guidance;
- advisor enabled/root chat ⇒ tool + guidance;
- child chat ⇒ advisor absent.
3. **Chatloop policy tests**
- advisor alone runs;
- advisor + action tool mixed batch returns deterministic policy errors;
- non-advisor tools still execute normally.
4. **Usage/metrics tests**
- per-run cap resets correctly;
- builtin tool labeling includes `advisor`;
- returned metadata includes model/usage summary when available.
### Frontend tests to add
- Storybook `play()` assertions for the advisor renderer states.
- Verify expand/collapse behavior and visible fallback text.
- Verify the message timeline still renders adjacent tools correctly.
### Recommended command sequence
Run these as the implementation matures, not only at the end:
1. Backend-focused gate after phases 1–4:
- `make test RUN=TestAdvisor`
- `make test RUN=TestChatloopAdvisor`
- `make lint`
2. Frontend-focused gate after phase 5:
- `pnpm test:storybook src/pages/AgentsPage/components/ChatElements/tools/AdvisorTool.stories.tsx`
- `pnpm lint`
- `pnpm format`
3. Final repo gate before handoff:
- `make pre-commit`
- run any additional targeted `make test RUN=...` selections covering touched chatd paths
> Use the exact new test names the implementing agents create; the names above are recommended anchors, not existing tests.
## Dogfooding plan
### Principle
Dogfood the change as a real agent feature, not just a unit-tested backend. Per the dogfood and `agent-browser` skills, the reviewer should get **watchable repro videos** plus screenshots that make the behavior obvious without reading logs.
### Required setup
1. Start the full dev environment with:
- `./scripts/develop.sh`
2. If the frontend renderer changes, also start Storybook from `site/` with:
- `pnpm storybook --no-open`
3. Use `agent-browser` directly — **never `npx agent-browser`**.
4. Use named browser sessions and an output folder such as:
- `./dogfood-output/advisor/`
- with subfolders `screenshots/` and `videos/`
### Evidence protocol
For every interactive scenario below:
1. Start video recording **before** the action.
2. Capture step-by-step screenshots at human pace.
3. Capture one annotated screenshot of the final state.
4. Stop the recording.
5. Note the exact pass/fail observation in the QA report.
For static UI states (for example Storybook error/limit cards), an annotated screenshot is sufficient; video is optional but still encouraged by this project’s review preference.
### Dogfood scenarios
#### Scenario A — Happy path in the real Agents UI
**Goal:** prove that a root agent chat can invoke advisor and produce a readable recommendation before taking further action.
Steps:
1. Open the Agents page with an advisor-enabled root chat.
2. Start a repro video.
3. Send a prompt that should reasonably trigger strategic planning, such as an architecture or multi-tradeoff question.
4. Capture screenshots of:
- the prompt before send;
- the running advisor state;
- the completed advisor card and the assistant’s follow-up response.
5. Stop recording.
Pass criteria:
- advisor appears in the timeline;
- the rendered result is readable;
- the assistant can continue after consuming the advisor output.
#### Scenario B — Advisor unavailable path
**Goal:** prove the feature is truly gated.
Suggested variants (at least one is required, both are better):
- feature flag/config off;
- child/sub-agent chat.
Evidence:
- annotated screenshot of the chat/tool state showing advisor is absent;
- short video if toggling the gate live is part of the repro.
Pass criteria:
- no advisor tool is available;
- no advisor-specific prompt behavior leaks through.
#### Scenario C — UI states in Storybook
**Goal:** prove the renderer handles non-happy states cleanly.
Required story states:
- success/advice;
- running;
- limit reached;
- error.
Evidence:
- one screenshot per state;
- at least one short video showing collapse/expand behavior.
Pass criteria:
- success renders readable advice;
- limit/error have visible fallback text;
- the component behaves like the other tool cards.
#### Scenario D — Regression sweep of nearby tools
**Goal:** ensure advisor does not break the surrounding chat timeline.
Check at minimum:
- another existing built-in tool still renders correctly near advisor;
- sub-agent/tool cards still expand/collapse normally;
- no obvious console errors appear in the Agents page during the advisor flow.
Evidence:
- screenshots of adjacent tool cards;
- console/error capture if anything suspicious appears.
### `agent-browser` usage notes for the QA agent
- Prefer `agent-browser batch` for 2+ sequential commands when no intermediate parsing is needed.
- Use `snapshot -i` to discover interactive refs.
- Re-snapshot after navigation or major DOM changes.
- Avoid `wait --load networkidle` unless the page is known to go idle; prefer explicit element/text waits or short fixed waits.
- Record videos at human pace and include pauses that a reviewer can follow.
## Rollout plan
### Initial rollout
- Gate behind a server-side advisor-enabled flag.
- Enable only for selected internal/root agent chats first.
- Watch metrics for:
- invocation count;
- failure rate;
- latency;
- obvious retry loops.
### Expansion conditions
Expand beyond the initial rollout only after the following are true:
- mixed-batch policy behavior is stable;
- cost impact is understood;
- frontend UX is readable in production-like dogfood;
- no recursion surprises have appeared with sub-agent flows.
### Explicit non-goals for the first release
- advisor inside child/sub-agent chats;
- provider-agnostic streaming phase UI;
- MCP-based external advisor implementation;
- mandatory DB-backed advisor cost reporting.
## Final acceptance checklist
- [ ] `advisor` is a built-in chatd tool, not an MCP/dynamic-tool substitute.
- [ ] The nested advisor call is tool-less and bounded to one in-memory step.
- [ ] One eligibility boolean controls both tool registration and prompt guidance injection.
- [ ] Root chats can use advisor; child chats cannot in the initial rollout.
- [ ] Mixed advisor/action batches produce deterministic policy errors instead of partial execution.
- [ ] Per-run usage caps and limit-reached behavior work.
- [ ] Advisor usage is visible in metadata/metrics without forcing a DB migration for MVP.
- [ ] The Agents UI has a readable advisor card and Storybook coverage.
- [ ] Dogfooding produced screenshots and repro videos for the required scenarios.
- [ ] Validation commands (`make lint`, targeted `make test`, Storybook tests, `make pre-commit`) passed before handoff.
## Suggested PR split
1. **PR 1 — Backend foundation**
- `chatadvisor/` package
- `chattool/advisor.go`
- `chatloop` exclusive policy
- chatd gating/prompt sync
- backend tests
2. **PR 2 — Frontend + QA**
- advisor renderer
- stories/play assertions
- dogfood artifacts and QA notes
3. **PR 3 — Optional follow-ups only if demanded by stakeholders**
- separate advisor model override
- persistent advisor billing/queryability
- transient phase-stream UX
</details>
---
_Generated with [`mux`](https://github.com/coder/mux) • Model: `anthropic:claude-opus-4-7` • Thinking: `max`_
|
||
|
|
f993b72628 |
fix: introduce ResourceAiSeat for fine-grained AI seat RBAC (#24613)
Fixes: https://github.com/coder/internal/issues/1444 |
||
|
|
dbb50ebaaf |
feat: remove 429 from aibridge circuit breaker failure conditions (#24701)
## Description Removes 429 (Too Many Requests) from the circuit breaker failure conditions. Rate limiting is now handled by automatic key failover instead of tripping the circuit breaker. ## Changes `DefaultIsFailure` no longer treats 429 as a circuit breaker failure. The circuit breaker now only trips on server overload responses (503, 529). Tests and integration tests updated to use 503 instead of 429 for tripping circuits. Description strings in deployment config updated to reflect the change. Closes https://github.com/coder/internal/issues/1445 > [!NOTE] > Initially generated by Coder Agents, modified and reviewed by @ssncferreira |
||
|
|
fb84e72319 |
feat: add secret requirement contract to dynamic parameters (#24785)
Adds structured `secret_requirements` to dynamic parameter responses and enforces missing required secrets during workspace start. Stop, delete, and tag rendering paths skip secret requirement enforcement so unmet secrets do not prevent cleanup. The SDK, generated API docs/types, and backend render/resolver/wsbuilder tests are updated for the new contract. |
||
|
|
70d46943db |
fix: match on ID instead of username (#24797)
The username suffix could put the name past the 32 character limit, causing the test to flake. Instead of using a suffix, match on the expected ID instead. |
||
|
|
be57af5ff0 |
feat: add exit code and status to workspace agent scripts (#24505)
For scripts that have not finished or in dry run cases these will be omitted. |