coder/coderd/x/chatd/configcache.go
Thomas Kosiewski 17409a515c feat(coderd): wire advisor runtime to admin config (#24622)
## Summary

Wire the advisor runtime into `chatd`: read the admin config on every `runChat`, gate tool registration and system-prompt guidance on a **single eligibility boolean**, register the `advisor` built-in tool, and apply the exclusive-tool policy from PR 1.

## Motivation

This is the integration seam where PRs 1–3 come together into an actual user-visible feature. Gating is deliberately root-chat-only for the initial rollout; child/sub-agent chats still do not see the tool or the guidance block.

## Changes

### `coderd/x/chatd/chatd.go`
- `loadAdvisorConfig(ctx, logger)` reads the admin config (from PR 3) on each run. If `ModelConfigID` is set, it resolves the override model via `configCache.ModelConfigByID`; otherwise it falls back to the outer chat's model and provider options. Reasoning effort is plumbed into provider options via `applyAdvisorReasoningEffort`.
- One computed `advisorEligible` boolean drives **both** tool registration (after skill tools, before MCP tools) and guidance injection via `chatprompt.InsertSystem(prompt, chatadvisor.ParentGuidanceBlock)`.
- `setAdvisorPromptSnapshot` closures capture the outer prompt state at the right points in the lifecycle (`renderPlanPathPrompt`, `ReloadMessages`, `PrepareMessages`) so the advisor handoff uses the same context the outer model saw.
- `ExclusiveToolNames["advisor"] = true` is passed to `chatloop.Run()` so mixed batches are rejected cleanly (PR 1 machinery).
- `builtinToolNames["advisor"] = true` is set so metrics keep advisor distinct from the generic `mcp` label.

### Child-chat guard
- Child/sub-agent chats deliberately do not see the advisor tool or guidance block, to avoid recursion/cost blowups until the pattern is proven. This is covered by `TestAdvisorGating_ChildChat` (currently skipped pending a rewrite against the new `plan`/`explore` subagent infrastructure; core gating logic is still exercised by `TestAdvisorGating_Disabled` and `TestAdvisorGating_RootChat`).

## Stack context

This is **PR 4 of 6** in the advisor feature stack. It depends on PRs 1–3.

## Scope / non-goals

- No frontend changes. The feature is invocable via the backend but renders generically until PR 5.
- No separate provider runner; the nested advisor call reuses the existing model/provider path.
- No DB migration.

## Validation

- `go test ./coderd/x/chatd/... -run TestAdvisor`
- `go build ./...`
- `make lint`

---

<details>
<summary>📋 Implementation Plan (shared across the advisor stack)</summary>

# Plan: Add a Mux-style advisor tool to coder agents/chatd

## Outcome

Add a first-class `advisor` tool to agent chats in `coderd/x/chatd` that feels native to Coder:

- it is a built-in server-side tool, not an MCP/dynamic-tool workaround;
- it performs a nested **tool-less** model call for strategic advice;
- it is exposed only when eligible, and the prompt mentions it only when it is actually available;
- it is treated as a **planning-only** tool so it does not run alongside action tools in the same batch;
- it tracks usage and cost separately enough that operators can reason about them;
- it has a minimally polished UI in the Agents page;
- and it ships with explicit dogfooding evidence, including screenshots and repro videos.

## Design decisions to lock before coding

1. **Primary architecture:** native built-in tool in `chattool/`, backed by a small `chatadvisor` package.
2. **Nested model execution:** reuse chatd's existing model/provider stack for a one-step, tool-less advisor call rather than inventing a new provider pathway.
3. **Execution policy:** treat `advisor` as an exclusive/planning-only tool; mixed batches must return structured policy errors and force the model to retry cleanly.
4. **Availability:** initial rollout is for root agent chats only; disable for child/sub-agent chats until recursion/cost policy is proven.
5. **Prompt sync:** use one eligibility boolean to drive both tool registration and advisor guidance injection.
6. **Persistence/cost split:** MVP should keep advisor usage visible in result metadata and server metrics; only add DB schema if product/billing explicitly needs queryable advisor-specific cost.
7. **UI scope:** generic tool rendering is an acceptable temporary milestone during backend bring-up, but the release candidate should include a dedicated lightweight advisor renderer.

## Delivery model

The work should be executed as coordinated workstreams with one integration owner and parallel contributors for low-conflict areas. The integration owner should own `coderd/x/chatd/chatd.go` because prompt assembly, tool registration, and model resolution all converge there.

## Detailed workstreams

### Repo evidence used for this plan

<details>
<summary>Mux reference and current chatd seams</summary>

**Mux reference implementation**

- `src/node/services/tools/advisor.ts` — native advisor tool implementation.
- `src/common/constants/advisor.ts` — advisor prompt/constants and truncation policy.
- `src/common/utils/tools/tools.ts` — conditional tool registration.
- `src/node/services/streamContextBuilder.ts` — injects advisor guidance only when the tool is available.

**Current chatd seams**

- `coderd/x/chatd/chatd.go`
  - `processChat()` — tool assembly, prompt assembly, and chatloop invocation.
  - `resolveChatModel()` — current model/provider/key resolution seam.
  - `type Config struct` — server-level chatd configuration surface.
- `coderd/x/chatd/chatloop/chatloop.go`
  - `Run()` — main streaming/model loop.
  - `executeTools()` — built-in tool execution/batching seam.
- `coderd/x/chatd/chattool/` — built-in tool implementations.
- `site/src/pages/AgentsPage/components/ChatElements/tools/Tool.tsx` — tool renderer dispatch.
- `site/src/pages/AgentsPage/components/ChatConversation/messageParsing.ts` and `ConversationTimeline.tsx` — tool/result merge and rendering flow.

</details>

### Workstream map and ownership

| Workstream | Primary owner | Main files | Can run in parallel? | Done when |
|---|---|---|---|---|
| 0. Integration + gating | Integration lead | `coderd/x/chatd/chatd.go` | No; central merge lane | Tool registration, prompt sync, and model selection are wired together |
| 1. Advisor runtime + tool | Backend agent | new `coderd/x/chatd/chatadvisor/`, new `coderd/x/chatd/chattool/advisor.go` | Yes | Tool can perform a tool-less advisor call in memory and return structured results |
| 2. Planning-only execution policy | Chatloop agent | `coderd/x/chatd/chatloop/chatloop.go`, related tests | Yes | Mixed `advisor` + action-tool batches are rejected cleanly and deterministically |
| 3. Metrics/usage/config | Backend/telemetry agent | `chatd.go`, `chatloop/metrics.go`, optional config plumbing | Partially; coordinate with integration lead | Advisor usage is separately visible in metadata/metrics and limits are enforced |
| 4. Frontend rendering | Frontend agent | `site/.../tools/Tool.tsx`, new `AdvisorTool.tsx`, stories | Yes after result schema stabilizes | Advisor renders as a readable card and story tests pass |
| 5. Dogfood + QA evidence | QA agent | dev server, Storybook, dogfood output | After backend + UI are usable | Repro videos, screenshots, and a concise QA report exist |

### Parallelization rules

- **Do not split `coderd/x/chatd/chatd.go` across multiple execution agents without an integration lead.** That file owns prompt building, tool registration, model resolution, and cost persistence.
- Workstreams 1 and 2 can be developed in parallel and then stacked onto the integration branch.
- Workstream 4 should begin once the backend result schema is agreed on, even if the backend is still behind a feature flag.
- Any agent that needs to re-check Mux behavior should clone `coder/mux` into a temporary directory (for example, `$(mktemp -d)/mux`) and inspect it read-only; do not vendor or copy code from Mux directly.

## Phase 0 — Preflight and guardrails

### Goals

- Align the team on the smallest shippable architecture.
- Prevent scope creep into MCP/dynamic-tool/sub-agent variants.
- Decide upfront what is MVP vs. follow-up.

### Tasks

1. **Confirm the MVP boundary.**
   - Ship a built-in advisor tool first.
   - Do **not** make MCP, dynamic tools, or sub-agents the primary implementation.
   - Do **not** add transient streaming phases in the first backend PR unless they fall out almost for free.

2. **Confirm local workflow hygiene before coding.**
   - Ensure the repo is using the project git hooks from `scripts/githooks`.
   - Do not bypass hooks with `--no-verify`.
   - Use `./scripts/develop.sh` for the full dev server rather than manual build/run commands.

3. **Lock the model-selection policy.**
   - **Recommended MVP:** advisor uses the same resolved provider/model/cost config as the current chat, with advisor-specific max-output and usage caps.
   - **Follow-up only if required:** add a separate `AdvisorModelConfigID`-style override that resolves through the existing `configCache`/model-config path. Do not invent a new free-form `provider:model` parser if chatd already stores provider/model separately.

4. **Lock the persistence policy.**
   - **Recommended MVP:** no DB migration. Persist advisor-visible metadata in the tool result and record separate metrics in memory/Prometheus.
   - **Only if product/billing explicitly asks for queryable advisor cost:** add a later DB migration or usage table, following the normal `queries/*.sql` + `make gen` workflow.

5. **Create an execution ADR note in the work item or tracking doc.**
   - Capture: built-in tool, tool-less nested call, root-chat-only rollout, exclusive execution policy, MVP no-DB-migration default.

### Quality gate

- Everyone on the team can state the same answers to these questions:
  - Is advisor a built-in tool? **Yes.**
  - Can advisor run with action tools in the same batch? **No.**
  - Does advisor get tools of its own? **No.**
  - Is a DB migration required for MVP? **No, unless billing insists.**

## Phase 1 — Build the advisor runtime and tool wrapper

### Goals

Create the core advisor implementation in a way that is easy to test and keeps `chattool/` thin.

### Files to add

- `coderd/x/chatd/chatadvisor/types.go`
- `coderd/x/chatd/chatadvisor/guidance.go`
- `coderd/x/chatd/chatadvisor/handoff.go`
- `coderd/x/chatd/chatadvisor/runtime.go`
- `coderd/x/chatd/chatadvisor/runner.go`
- `coderd/x/chatd/chattool/advisor.go`

### Responsibilities by file

1. **`types.go`**
   - Define the input/result schema used by the tool and UI.
   - Keep the result shape close to Mux so the UI and model both have predictable cases.
   - Recommended result variants:
     - `advice`
     - `limit_reached`
     - `error`

   Recommended shape:

   ```go
   type AdvisorArgs struct {
       Question string `json:"question"`
   }

   type AdvisorResult struct {
       Type          string              `json:"type"`
       Advice        string              `json:"advice,omitempty"`
       Error         string              `json:"error,omitempty"`
       AdvisorModel  string              `json:"advisor_model,omitempty"`
       RemainingUses int                 `json:"remaining_uses,omitempty"`
       Usage         *AdvisorUsageResult `json:"usage,omitempty"`
   }
   ```

2. **`guidance.go`**
   - Hold two strings:
     - the nested advisor system prompt;
     - the parent-agent guidance block to inject into the outer system prompt.
   - The nested advisor prompt must say, in plain language:
     - you are advising the parent agent;
     - you do not address the end user directly;
     - you do not claim actions happened;
     - you return concise strategic guidance and tradeoffs.

3. **`runtime.go`**
   - Define the per-run runtime state.
   - Recommended fields:
     - resolved model + model config;
     - provider keys/options reused from the outer chat;
     - `MaxUsesPerRun`;
     - `MaxOutputTokens`;
     - atomic/current call counter;
     - callback(s) to obtain the current prompt snapshot and current-step snapshot;
     - optional metrics/usage hook.
   - Add fail-fast validation for impossible config: nil model, non-positive limits, empty prompt builders, etc.

4. **`handoff.go`**
   - Build the advisor handoff message from:
     - the explicit question;
     - the exact prompt/messages the parent model just used;
     - the current step's text/reasoning snapshot, if available;
     - the most recent relevant tool outputs, if they are already in the prompt snapshot.
   - **Important:** use the already-prepared outer prompt tail, not a fresh DB reload. That keeps the advisor aligned with compaction and the exact context the outer model saw.
   - Apply hard truncation budgets with recent-context bias.

5. **`runner.go`**
   - Execute the nested advisor call.
   - **Recommended implementation:** call `chatloop.Run()` in an in-memory, one-step mode:
     - `Tools: nil`
     - `ProviderTools: nil`
     - `MaxSteps: 1`
     - `PersistStep`: capture the assistant output in memory instead of writing DB rows
   - Reuse the existing provider/model/cost path instead of building a second provider runner.
   - Assert that no tool definitions are passed to the nested call.

6. **`chattool/advisor.go`**
   - Keep this file thin and consistent with other built-ins.
   - Responsibilities:
     - decode `AdvisorArgs`;
     - validate `Question` is non-empty and bounded;
     - call the `chatadvisor` runner;
     - return a structured tool response.
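
The "hard truncation budgets with recent-context bias" requirement for `handoff.go` can be illustrated with a small sketch. It assumes the prompt snapshot is a slice of message strings ordered oldest first and that the budget is measured in bytes; `truncateRecent` is a hypothetical helper, and a real implementation would more likely budget by tokens and by message role.

```go
package main

import "fmt"

// truncateRecent keeps the newest messages that fit within budget bytes.
// Messages are ordered oldest first, and the returned slice preserves that
// order. Walking from the tail is what gives the recent-context bias: older
// messages are the first to be dropped.
func truncateRecent(messages []string, budget int) []string {
	used := 0
	start := len(messages)
	for i := len(messages) - 1; i >= 0; i-- {
		if used+len(messages[i]) > budget {
			break
		}
		used += len(messages[i])
		start = i
	}
	return messages[start:]
}

func main() {
	msgs := []string{"oldest context", "middle", "newest"}
	fmt.Println(truncateRecent(msgs, 12))
}
```

The sketch stops at the first message that does not fit instead of skipping it and continuing, so the kept context is always a contiguous tail of the conversation.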

### Defensive programming requirements

- Assert `Question` is non-empty after trimming.
- Assert runtime limits are positive.
- Assert the nested advisor call runs with zero tools/provider tools.
- Assert `AdvisorResult.Type` is one of the known variants before returning.
- Assert remaining uses never goes negative.

### Acceptance criteria

- A unit test can call the advisor tool with a fake model and receive a stable `advice` result.
- The nested advisor call is impossible to run with tools accidentally attached.
- The core logic lives in `chatadvisor/`, not embedded inside `chatd.go`.

## Phase 2 — Wire advisor into chatd and keep prompt/tool availability in sync

### Goals

Register the tool in the right place, expose it only when eligible, and inject system guidance only when the tool is present.

### Files to modify

- `coderd/x/chatd/chatd.go`
- optionally a small helper file if `chatd.go` becomes too crowded

### Tasks

1. **Compute one eligibility boolean in `processChat()`.**
   Recommended inputs:
   - server-level advisor enabled flag;
   - root chat only (`chat.ParentChatID == uuid.Nil` or equivalent existing root/child check);
   - a usable resolved model/provider exists;
   - optional experiment/workspace/org gate if product wants staged rollout.

2. **Create the runtime once per outer chat run.**
   - Use the model/config/keys resolved by `resolveChatModel()`.
   - Reuse provider options from the current chat's `ChatModelCallConfig`.
   - Set `MaxUsesPerRun` and `MaxOutputTokens` from advisor config defaults.

3. **Register the tool in the built-in tool block.**
   - Insert after the skill tools and before MCP tools in `processChat()`.
   - Record `builtinToolNames["advisor"] = true` so metrics stay bounded.

4. **Inject advisor guidance into the outer system prompt using the same boolean.**
   - Use `chatprompt.InsertSystem()` in the same prompt assembly path that already injects user/system instructions.
   - Place the block near the existing instruction insertion, before plan-path/skill context blocks.
   - Wrap the guidance in an explicit tag like `<advisor-guidance>` so it is easy to spot in tests and future refactors.

5. **Keep advisor out of child chats for the first release.**
   - That avoids recursion/cost blowups with `spawn_agent` / `wait_agent` flows.
   - Document this explicitly in the rollout notes and tests.
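
The single-boolean rule from tasks 1, 3, and 4 can be sketched as below. This is an illustration under assumptions, not the `processChat()` code: `gateInputs`, `runSetup`, `assemble`, and the guidance text are hypothetical, and the sketch appends the guidance block rather than modeling its exact placement relative to plan-path or skill context blocks.

```go
package main

import "fmt"

// parentGuidance is placeholder text; only the <advisor-guidance> tag comes
// from the plan.
const parentGuidance = "<advisor-guidance>Consult the advisor tool before risky or ambiguous actions.</advisor-guidance>"

// gateInputs collects the hypothetical inputs to the eligibility decision.
type gateInputs struct {
	Enabled       bool // server-level advisor flag
	IsRootChat    bool // false for child/sub-agent chats
	ModelResolved bool // a usable model/provider exists
}

// advisorEligible is the one boolean that drives everything else.
func advisorEligible(in gateInputs) bool {
	return in.Enabled && in.IsRootChat && in.ModelResolved
}

// runSetup is a simplified view of what one chat run is assembled from.
type runSetup struct {
	Tools        []string
	SystemBlocks []string
}

// assemble registers the advisor tool (after skill tools, before MCP tools)
// and injects the guidance block under the same condition, so the prompt
// never mentions a tool that is not registered.
func assemble(in gateInputs, skillTools, mcpTools []string) runSetup {
	eligible := advisorEligible(in)
	setup := runSetup{SystemBlocks: []string{"base instructions"}}
	setup.Tools = append(setup.Tools, skillTools...)
	if eligible {
		setup.Tools = append(setup.Tools, "advisor")
		setup.SystemBlocks = append(setup.SystemBlocks, parentGuidance)
	}
	setup.Tools = append(setup.Tools, mcpTools...)
	return setup
}

func main() {
	root := assemble(gateInputs{Enabled: true, IsRootChat: true, ModelResolved: true},
		[]string{"skill_a"}, []string{"mcp_b"})
	child := assemble(gateInputs{Enabled: true, IsRootChat: false, ModelResolved: true},
		[]string{"skill_a"}, []string{"mcp_b"})
	fmt.Println(root.Tools, len(root.SystemBlocks))
	fmt.Println(child.Tools, len(child.SystemBlocks))
}
```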

### Acceptance criteria

- If advisor is disabled, neither the tool nor the prompt guidance appears.
- If advisor is enabled, both the tool and the prompt guidance appear.
- Root chats can use advisor; child chats cannot.
- Built-in tool names include `advisor` so metrics do not collapse it into the generic `mcp` label.

## Phase 3 — Enforce planning-only execution policy in `chatloop`

### Goals

Prevent the model from calling `advisor` and action tools in the same execution batch.

### Files to modify

- `coderd/x/chatd/chatloop/chatloop.go`
- related chatloop tests

### Recommended implementation

Keep the MVP small; do **not** build a general policy engine yet.

1. Add a minimal field to `chatloop.RunOptions`, for example:

   ```go
   ExclusiveToolName *string
   ```

2. In `Run()` / `executeTools()`, detect the case where the exclusive tool appears in the same local-tool batch as any other locally executed tool.

3. When that happens, synthesize structured tool-result errors for the affected calls instead of executing anything in the batch.
   - `advisor` should receive a clear error like: _advisor must be called by itself before action tools_.
   - The sibling action tools should receive a paired policy error like: _this tool was skipped because advisor must run alone_.

4. Let the outer model see those tool errors and retry cleanly.
   - This is simpler and safer than partial execution or hidden deferral.
   - It preserves deterministic transcript history for debugging.

5. Pass the just-finished step snapshot into the tool execution context.
   - The advisor runtime should be able to see the current step's text/reasoning content, because that is often the best hint about what the outer model is trying to decide.
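
Steps 2 and 3 amount to a check that runs before any tool in the batch executes. The sketch below shows one way to express it; `toolCall`, `toolResult`, and `rejectMixedBatch` are hypothetical types and names, not the `chatloop` API, and the `execute` tool name is only an example. Treating a batch that contains several `advisor` calls and nothing else as allowed is an assumption of the sketch.

```go
package main

import "fmt"

// toolCall and toolResult are simplified stand-ins for the chatloop types.
type toolCall struct {
	ID   string
	Name string
}

type toolResult struct {
	CallID  string
	IsError bool
	Content string
}

// rejectMixedBatch returns one policy error per call when the exclusive tool
// shares a batch with any other tool, and nil when the batch may run.
// Nothing in a rejected batch is executed.
func rejectMixedBatch(calls []toolCall, exclusive string) []toolResult {
	hasExclusive, hasOther := false, false
	for _, c := range calls {
		if c.Name == exclusive {
			hasExclusive = true
		} else {
			hasOther = true
		}
	}
	if !hasExclusive || !hasOther {
		return nil
	}
	results := make([]toolResult, 0, len(calls))
	for _, c := range calls {
		msg := "this tool was skipped because " + exclusive + " must run alone"
		if c.Name == exclusive {
			msg = exclusive + " must be called by itself before action tools"
		}
		results = append(results, toolResult{CallID: c.ID, IsError: true, Content: msg})
	}
	return results
}

func main() {
	batch := []toolCall{{ID: "1", Name: "advisor"}, {ID: "2", Name: "execute"}}
	for _, r := range rejectMixedBatch(batch, "advisor") {
		fmt.Printf("%s: %s\n", r.CallID, r.Content)
	}
}
```

Because the results are derived only from the batch contents and preserve call order, the same mixed batch always yields the same transcript.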

### Why this is the right fit

- It matches the intended semantics: advisor is consulted **before** taking action.
- It avoids subtle race conditions caused by concurrent built-in tool execution.
- It keeps the behavior easy to test with fake models.

### Acceptance criteria

- A model-emitted batch containing only `advisor` succeeds.
- A model-emitted batch containing `advisor` plus any other locally executed tool returns deterministic policy errors and executes nothing.
- Non-advisor tool execution stays unchanged for normal chats.

## Phase 4 — Usage limits, metrics, and configuration

### Goals

Make advisor safe to operate without over-designing billing/storage in the first release.

### Files to modify

- `coderd/x/chatd/chatd.go`, including the `Config` struct and constructor path
- `coderd/x/chatd/chatloop/metrics.go` as needed
- optional follow-up config/db files only if a separate advisor model or persistent billing is required

### Tasks

1. **Add explicit server config knobs for MVP.**
   Recommended fields on `chatd.Config` or a nested advisor config struct:
   - `AdvisorEnabled bool`
   - `AdvisorMaxUsesPerRun int`
   - `AdvisorMaxOutputTokens int64`

2. **Track usage per outer run.**
   - Reset the counter for each `processChat()` invocation.
   - Return `remaining_uses` in the tool result.
   - Return `limit_reached` when the cap is exhausted.

3. **Expose advisor usage metadata in the tool result.**
   - Include model name and token/cost summary if available.
   - Use the same `callConfig.Cost` calculation path as the outer chat for MVP if advisor reuses the same model.

4. **Record server-side metrics.**
   - Count advisor invocations, failures, and latency.
   - Ensure they show up under the built-in tool label `advisor`.

5. **Optional decision gate: separate advisor model.**
   - If product insists on a stronger/different advisor model, add a follow-up config hook that resolves another existing chat model config through the same `configCache` path.
   - Keep that out of the first landing PR unless it is required for acceptance.

6. **Optional decision gate: queryable advisor cost.**
   - If this becomes required, spin a follow-up DB task:
     - update `coderd/database/queries/*.sql`;
     - add migration files;
     - run `make gen`;
     - update audit mappings if a new auditable type/field is introduced.

### Acceptance criteria

- Advisor calls are capped per outer run.
- Limit exhaustion is user-visible in the tool result.
- Metrics distinguish advisor calls from other built-in tools.
- MVP does not require a schema migration unless explicitly approved.
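
The per-run cap can be sketched with an atomic counter that is created once per outer run. `usageLimiter`, `newUsageLimiter`, and `acquire` are hypothetical names; the sketch assumes a fresh limiter per `processChat()` invocation, which is what resets the count.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// usageLimiter caps advisor calls within a single outer chat run.
type usageLimiter struct {
	limit int64
	used  atomic.Int64
}

// newUsageLimiter fails fast on a non-positive limit.
func newUsageLimiter(limit int64) (*usageLimiter, error) {
	if limit <= 0 {
		return nil, errors.New("advisor usage limit must be positive")
	}
	return &usageLimiter{limit: limit}, nil
}

// acquire reserves one use. It reports the uses remaining after this call,
// or ok=false when the cap is already exhausted. The compare-and-swap loop
// keeps the counter from exceeding the limit under concurrent callers, so
// the remaining count never goes negative.
func (l *usageLimiter) acquire() (remaining int64, ok bool) {
	for {
		cur := l.used.Load()
		if cur >= l.limit {
			return 0, false
		}
		if l.used.CompareAndSwap(cur, cur+1) {
			return l.limit - (cur + 1), true
		}
	}
}

func main() {
	limiter, err := newUsageLimiter(2)
	if err != nil {
		panic(err)
	}
	for i := 0; i < 3; i++ {
		remaining, ok := limiter.acquire()
		if !ok {
			fmt.Println("limit_reached")
			continue
		}
		fmt.Println("advice, remaining uses:", remaining)
	}
}
```

A caller would map `ok == false` to the `limit_reached` result variant and otherwise copy `remaining` into `remaining_uses`.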

## Phase 5 — Frontend rendering and Storybook coverage

### Goals

Make advisor feel intentional in the Agents UI without blocking the backend on fancy streaming UI.

### Files to modify

- `site/src/pages/AgentsPage/components/ChatElements/tools/Tool.tsx`
- new `site/src/pages/AgentsPage/components/ChatElements/tools/AdvisorTool.tsx`
- Storybook story file(s) in the same tools directory

### Delivery strategy

1. **Intermediate milestone during backend bring-up:** rely on the existing generic tool renderer if needed.
   - This is acceptable only as a short-lived integration checkpoint.

2. **Release milestone:** add a dedicated lightweight `AdvisorTool` renderer.
   - Reuse existing primitives:
     - `ToolCollapsible`
     - `ToolIcon`
     - `Response` for markdown/prose rendering
     - `ScrollArea` if the advice can be long
   - Keep styling light and consistent with the Agents page.
   - Do not add unnecessary React memoization in `site/src/pages/AgentsPage/`; that area is already React-Compiler aware.

3. **Render the structured result states cleanly.**
   - `advice` — readable prose/markdown with optional metadata footer.
   - `limit_reached` — warning-style message.
   - `error` — error state with visible fallback text.
   - `running` — existing tool loading state/spinner is enough for MVP.

4. **Add Storybook coverage instead of ad-hoc component tests.**
   Recommended stories:
   - successful advice;
   - running/loading;
   - limit reached;
   - error.

5. **Keep the UI contract narrow.**
   - Prefer one text field like `advice` plus small metadata rather than a deeply nested schema.
   - That keeps the UI resilient to prompt iteration.

### Acceptance criteria

- The advisor tool card renders readable content rather than raw quoted JSON in the final release branch.
- Running, limit, and error states are visibly distinct.
- Storybook stories and play assertions cover the new states.
- Existing tool rendering flows remain unchanged.

## Phase 6 — Automated tests and validation gates

### Backend tests to add

1. **Advisor runtime/tool tests**
   - question validation;
   - tool-less nested execution assertion;
   - success result shaping;
   - limit-reached result shaping;
   - error result shaping.

2. **Prompt/gating tests in chatd**
   - advisor disabled ⇒ no tool, no guidance;
   - advisor enabled/root chat ⇒ tool + guidance;
   - child chat ⇒ advisor absent.

3. **Chatloop policy tests**
   - advisor alone runs;
   - advisor + action tool mixed batch returns deterministic policy errors;
   - non-advisor tools still execute normally.

4. **Usage/metrics tests**
   - per-run cap resets correctly;
   - builtin tool labeling includes `advisor`;
   - returned metadata includes model/usage summary when available.

### Frontend tests to add

- Storybook `play()` assertions for the advisor renderer states.
- Verify expand/collapse behavior and visible fallback text.
- Verify the message timeline still renders adjacent tools correctly.

### Recommended command sequence

Run these as the implementation matures, not only at the end:

1. Backend-focused gate after phases 1–4:
   - `make test RUN=TestAdvisor`
   - `make test RUN=TestChatloopAdvisor`
   - `make lint`

2. Frontend-focused gate after phase 5:
   - `pnpm test:storybook src/pages/AgentsPage/components/ChatElements/tools/AdvisorTool.stories.tsx`
   - `pnpm lint`
   - `pnpm format`

3. Final repo gate before handoff:
   - `make pre-commit`
   - run any additional targeted `make test RUN=...` selections covering touched chatd paths

> Use the exact new test names the implementing agents create; the names above are recommended anchors, not existing tests.

## Dogfooding plan

### Principle

Dogfood the change as a real agent feature, not just a unit-tested backend. Per the dogfood and `agent-browser` skills, the reviewer should get **watchable repro videos** plus screenshots that make the behavior obvious without reading logs.

### Required setup

1. Start the full dev environment with:
   - `./scripts/develop.sh`
2. If the frontend renderer changes, also start Storybook from `site/` with:
   - `pnpm storybook --no-open`
3. Use `agent-browser` directly — **never `npx agent-browser`**.
4. Use named browser sessions and an output folder such as:
   - `./dogfood-output/advisor/`
   - with subfolders `screenshots/` and `videos/`

### Evidence protocol

For every interactive scenario below:

1. Start video recording **before** the action.
2. Capture step-by-step screenshots at human pace.
3. Capture one annotated screenshot of the final state.
4. Stop the recording.
5. Note the exact pass/fail observation in the QA report.

For static UI states (for example Storybook error/limit cards), an annotated screenshot is sufficient; video is optional but still encouraged by this project’s review preference.

### Dogfood scenarios

#### Scenario A — Happy path in the real Agents UI

**Goal:** prove that a root agent chat can invoke advisor and produce a readable recommendation before taking further action.

Steps:

1. Open the Agents page with an advisor-enabled root chat.
2. Start a repro video.
3. Send a prompt that should reasonably trigger strategic planning, such as an architecture or multi-tradeoff question.
4. Capture screenshots of:
   - the prompt before send;
   - the running advisor state;
   - the completed advisor card and the assistant’s follow-up response.
5. Stop recording.

Pass criteria:

- advisor appears in the timeline;
- the rendered result is readable;
- the assistant can continue after consuming the advisor output.

#### Scenario B — Advisor unavailable path

**Goal:** prove the feature is truly gated.

Suggested variants (at least one is required, both are better):

- feature flag/config off;
- child/sub-agent chat.

Evidence:

- annotated screenshot of the chat/tool state showing advisor is absent;
- short video if toggling the gate live is part of the repro.

Pass criteria:

- no advisor tool is available;
- no advisor-specific prompt behavior leaks through.

#### Scenario C — UI states in Storybook

**Goal:** prove the renderer handles non-happy states cleanly.

Required story states:

- success/advice;
- running;
- limit reached;
- error.

Evidence:

- one screenshot per state;
- at least one short video showing collapse/expand behavior.

Pass criteria:

- success renders readable advice;
- limit/error have visible fallback text;
- the component behaves like the other tool cards.

#### Scenario D — Regression sweep of nearby tools

**Goal:** ensure advisor does not break the surrounding chat timeline.

Check at minimum:

- another existing built-in tool still renders correctly near advisor;
- sub-agent/tool cards still expand/collapse normally;
- no obvious console errors appear in the Agents page during the advisor flow.

Evidence:

- screenshots of adjacent tool cards;
- console/error capture if anything suspicious appears.

### `agent-browser` usage notes for the QA agent

- Prefer `agent-browser batch` for 2+ sequential commands when no intermediate parsing is needed.
- Use `snapshot -i` to discover interactive refs.
- Re-snapshot after navigation or major DOM changes.
- Avoid `wait --load networkidle` unless the page is known to go idle; prefer explicit element/text waits or short fixed waits.
- Record videos at human pace and include pauses that a reviewer can follow.

## Rollout plan

### Initial rollout

- Gate behind a server-side advisor-enabled flag.
- Enable only for selected internal/root agent chats first.
- Watch metrics for:
  - invocation count;
  - failure rate;
  - latency;
  - obvious retry loops.

### Expansion conditions

Expand beyond the initial rollout only after the following are true:

- mixed-batch policy behavior is stable;
- cost impact is understood;
- frontend UX is readable in production-like dogfood;
- no recursion surprises have appeared with sub-agent flows.

### Explicit non-goals for the first release

- advisor inside child/sub-agent chats;
- provider-agnostic streaming phase UI;
- MCP-based external advisor implementation;
- mandatory DB-backed advisor cost reporting.

## Final acceptance checklist

- [ ] `advisor` is a built-in chatd tool, not an MCP/dynamic-tool substitute.
- [ ] The nested advisor call is tool-less and bounded to one in-memory step.
- [ ] One eligibility boolean controls both tool registration and prompt guidance injection.
- [ ] Root chats can use advisor; child chats cannot in the initial rollout.
- [ ] Mixed advisor/action batches produce deterministic policy errors instead of partial execution.
- [ ] Per-run usage caps and limit-reached behavior work.
- [ ] Advisor usage is visible in metadata/metrics without forcing a DB migration for MVP.
- [ ] The Agents UI has a readable advisor card and Storybook coverage.
- [ ] Dogfooding produced screenshots and repro videos for the required scenarios.
- [ ] Validation commands (`make lint`, targeted `make test`, Storybook tests, `make pre-commit`) passed before handoff.

## Suggested PR split

1. **PR 1 — Backend foundation**
   - `chatadvisor/` package
   - `chattool/advisor.go`
   - `chatloop` exclusive policy
   - chatd gating/prompt sync
   - backend tests

2. **PR 2 — Frontend + QA**
   - advisor renderer
   - stories/play assertions
   - dogfood artifacts and QA notes

3. **PR 3 — Optional follow-ups only if demanded by stakeholders**
   - separate advisor model override
   - persistent advisor billing/queryability
   - transient phase-stream UX


</details>

---
_Generated with [`mux`](https://github.com/coder/mux) • Model: `anthropic:claude-opus-4-7` • Thinking: `max`_
2026-04-30 15:07:33 +02:00


package chatd

import (
	"context"
	"database/sql"
	"encoding/json"
	"errors"
	"fmt"
	"slices"
	"sync"
	"time"

	"github.com/ammario/tlru"
	"github.com/google/uuid"
	"tailscale.com/util/singleflight"

	"github.com/coder/coder/v2/coderd/database"
	"github.com/coder/coder/v2/codersdk"
	"github.com/coder/quartz"
)

const (
	chatConfigProvidersTTL     = 10 * time.Second
	chatConfigModelConfigTTL   = 10 * time.Second
	chatConfigUserPromptTTL    = 5 * time.Second
	chatConfigAdvisorConfigTTL = 10 * time.Second

	// Bound user-prompt cache cardinality so one-shot users do not
	// accumulate forever in long-lived chatd processes.
	chatConfigUserPromptEntryLimit = 64 * 1024
)

type cachedProviders struct {
	providers []database.ChatProvider
	expiresAt time.Time
}

type cachedAdvisorConfig struct {
	config    codersdk.AdvisorConfig
	expiresAt time.Time
}

type cachedModelConfig struct {
	config    database.ChatModelConfig
	expiresAt time.Time
}

type modelConfigSnapshot struct {
	epoch      uint64
	generation uint64
}

// cloneModelConfig returns a shallow copy of cfg with Options
// deep-cloned so the cache owns its own backing array.
func cloneModelConfig(cfg database.ChatModelConfig) database.ChatModelConfig {
	cfg.Options = slices.Clone(cfg.Options)
	return cfg
}


// chatConfigCache caches chat configuration reads (providers, model
// configs, user prompts, advisor config) behind short TTLs. Fills are
// coalesced through singleflight, and generation/epoch counters
// discard any fill that raced with an invalidation.
type chatConfigCache struct {
	db    database.Store
	clock quartz.Clock
	// ctx is the server-scoped context used for all DB fills.
	// Cache fills run inside singleflight.Do where one caller
	// becomes the leader for all coalesced waiters. Using a
	// per-request context would mean the leader's cancellation
	// (timeout, user disconnect) fans the error to every waiter.
	// Storing the server context here makes that impossible by
	// construction: callers cannot pass a request context into
	// the shared fill path.
	ctx context.Context

	// mu guards every field below.
	mu sync.RWMutex

	// Providers (singleton).
	providers          *cachedProviders
	providerGeneration uint64
	providerFetches    singleflight.Group[string, []database.ChatProvider]

	// Model configs (keyed by ID).
	modelTopologyEpoch uint64
	modelConfigs       map[uuid.UUID]cachedModelConfig
	modelConfigFetches singleflight.Group[string, database.ChatModelConfig]

	// Default model config (singleton).
	defaultModelConfig           *cachedModelConfig
	defaultModelConfigGeneration uint64
	defaultModelConfigFetches    singleflight.Group[string, database.ChatModelConfig]

	// User custom prompts (keyed by user ID).
	userPromptEpoch   uint64
	userPrompts       *tlru.Cache[uuid.UUID, string]
	userPromptFetches singleflight.Group[string, string]

	// Advisor configuration (singleton).
	advisorConfig           *cachedAdvisorConfig
	advisorConfigGeneration uint64
	advisorConfigFetches    singleflight.Group[string, codersdk.AdvisorConfig]
}
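
// newChatConfigCache builds an empty cache. ctx should be the
// server-scoped context, because every DB fill runs under it (see the
// ctx field on chatConfigCache).
func newChatConfigCache(ctx context.Context, db database.Store, clock quartz.Clock) *chatConfigCache {
	return &chatConfigCache{
		db:           db,
		clock:        clock,
		ctx:          ctx,
		modelConfigs: make(map[uuid.UUID]cachedModelConfig),
		userPrompts: tlru.New[uuid.UUID](
			tlru.ConstantCost[string],
			chatConfigUserPromptEntryLimit,
		),
	}
}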

// singleflightDoChan wraps a singleflight group's DoChan method,
// allowing the caller to abandon the wait if their context is
// canceled while the shared fill continues running to completion.
// This separates two lifetimes: the fill runs under the server-scoped
// context, while each caller waits under its own request-scoped context.
func singleflightDoChan[K comparable, V any](
	ctx context.Context,
	group *singleflight.Group[K, V],
	key K,
	fn func() (V, error),
) (V, error) {
	ch := group.DoChan(key, fn)
	select {
	case <-ctx.Done():
		var zero V
		return zero, ctx.Err()
	case res := <-ch:
		return res.Val, res.Err
	}
}
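
// EnabledProviders returns the enabled chat providers, served from the
// cache while the entry is fresh. On a miss, concurrent callers share
// one DB fill; the singleflight key includes the provider generation
// so callers arriving after an invalidation do not join a fill that
// started before it. The returned slice is a copy of the cached one.
func (c *chatConfigCache) EnabledProviders(ctx context.Context) ([]database.ChatProvider, error) {
	if providers, ok := c.cachedProviders(); ok {
		return providers, nil
	}
	generation := c.providersGeneration()
	providers, err := singleflightDoChan(
		ctx,
		&c.providerFetches,
		fmt.Sprintf("%d:providers", generation),
		func() ([]database.ChatProvider, error) {
			// Re-check inside the flight: another fill may have
			// populated the cache since the first lookup.
			if cached, ok := c.cachedProviders(); ok {
				return cached, nil
			}
			fetched, err := c.db.GetEnabledChatProviders(c.ctx)
			if err != nil {
				return nil, err
			}
			c.storeProviders(generation, fetched)
			return slices.Clone(fetched), nil
		},
	)
	if err != nil {
		return nil, err
	}
	// Coalesced waiters all receive the same slice, so hand each
	// caller its own copy.
	return slices.Clone(providers), nil
}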
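
// cachedProviders returns a copy of the cached provider list if the
// entry is still fresh. An expired entry is cleared under the write
// lock, re-checking expiry because another goroutine may have replaced
// the entry after the read lock was released.
func (c *chatConfigCache) cachedProviders() ([]database.ChatProvider, bool) {
	c.mu.RLock()
	entry := c.providers
	c.mu.RUnlock()
	if entry == nil {
		return nil, false
	}
	if c.clock.Now().Before(entry.expiresAt) {
		return slices.Clone(entry.providers), true
	}
	c.mu.Lock()
	if current := c.providers; current != nil && !c.clock.Now().Before(current.expiresAt) {
		c.providers = nil
	}
	c.mu.Unlock()
	return nil, false
}

// providersGeneration returns the current provider generation.
func (c *chatConfigCache) providersGeneration() uint64 {
	c.mu.RLock()
	generation := c.providerGeneration
	c.mu.RUnlock()
	return generation
}

// storeProviders caches providers unless the generation has advanced
// since the fill began, in which case the result is dropped as
// potentially stale.
func (c *chatConfigCache) storeProviders(generation uint64, providers []database.ChatProvider) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.providerGeneration != generation {
		return
	}
	c.providers = &cachedProviders{
		providers: slices.Clone(providers),
		expiresAt: c.clock.Now().Add(chatConfigProvidersTTL),
	}
}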
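
// InvalidateProviders drops the cached provider list together with all
// cached model-config state, and bumps the matching counters so
// in-flight fills started before the call are not stored.
func (c *chatConfigCache) InvalidateProviders() {
	c.mu.Lock()
	c.providers = nil
	c.providerGeneration++
	// Provider topology changed. Model selections depend on
	// provider existence, so flush all model-config state.
	clear(c.modelConfigs)
	c.modelTopologyEpoch++
	c.defaultModelConfig = nil
	c.defaultModelConfigGeneration++
	c.mu.Unlock()
}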

// ModelConfigByID returns the model config with the given ID, served
// from the cache while the entry is fresh. Misses are filled from the
// database under the server context. The singleflight key includes the
// topology epoch so callers arriving after an invalidation do not join
// a fill that started before it.
func (c *chatConfigCache) ModelConfigByID(ctx context.Context, id uuid.UUID) (database.ChatModelConfig, error) {
	if config, ok := c.cachedModelConfig(id); ok {
		return config, nil
	}
	snap := c.modelConfigSnapshot()
	config, err := singleflightDoChan(ctx, &c.modelConfigFetches, fmt.Sprintf("%d:%s", snap.epoch, id), func() (database.ChatModelConfig, error) {
		if cached, ok := c.cachedModelConfig(id); ok {
			return cached, nil
		}
		fetched, err := c.db.GetChatModelConfigByID(c.ctx, id)
		if err != nil {
			return database.ChatModelConfig{}, err
		}
		c.storeModelConfig(snap, fetched)
		return cloneModelConfig(fetched), nil
	})
	if err != nil {
		return database.ChatModelConfig{}, err
	}
	return config, nil
}

// cachedModelConfig returns a clone of the cached config for id if the
// entry is still fresh. An expired entry is deleted under the write
// lock after re-checking its expiry.
func (c *chatConfigCache) cachedModelConfig(id uuid.UUID) (database.ChatModelConfig, bool) {
	c.mu.RLock()
	entry, ok := c.modelConfigs[id]
	c.mu.RUnlock()
	if !ok {
		return database.ChatModelConfig{}, false
	}
	if c.clock.Now().Before(entry.expiresAt) {
		return cloneModelConfig(entry.config), true
	}
	c.mu.Lock()
	if current, ok := c.modelConfigs[id]; ok && !c.clock.Now().Before(current.expiresAt) {
		delete(c.modelConfigs, id)
	}
	c.mu.Unlock()
	return database.ChatModelConfig{}, false
}

// modelConfigSnapshot records the current topology epoch. The
// generation field is left at zero because per-ID fills only check
// the epoch.
func (c *chatConfigCache) modelConfigSnapshot() modelConfigSnapshot {
	c.mu.RLock()
	snap := modelConfigSnapshot{epoch: c.modelTopologyEpoch}
	c.mu.RUnlock()
	return snap
}

// storeModelConfig caches config under its ID unless the topology
// epoch has advanced since snap was taken.
func (c *chatConfigCache) storeModelConfig(snap modelConfigSnapshot, config database.ChatModelConfig) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.modelTopologyEpoch != snap.epoch {
		return
	}
	c.modelConfigs[config.ID] = cachedModelConfig{
		config:    cloneModelConfig(config),
		expiresAt: c.clock.Now().Add(chatConfigModelConfigTTL),
	}
}

// DefaultModelConfig returns the default model config, served from the
// cache while the entry is fresh and otherwise filled from the
// database through a shared singleflight call.
func (c *chatConfigCache) DefaultModelConfig(ctx context.Context) (database.ChatModelConfig, error) {
	if config, ok := c.cachedDefaultModelConfig(); ok {
		return config, nil
	}
	snap := c.defaultModelConfigSnapshot()
	config, err := singleflightDoChan(ctx, &c.defaultModelConfigFetches, fmt.Sprintf("%d:default", snap.epoch), func() (database.ChatModelConfig, error) {
		if cached, ok := c.cachedDefaultModelConfig(); ok {
			return cached, nil
		}
		fetched, err := c.db.GetDefaultChatModelConfig(c.ctx)
		if err != nil {
			return database.ChatModelConfig{}, err
		}
		c.storeDefaultModelConfig(snap, fetched)
		return cloneModelConfig(fetched), nil
	})
	if err != nil {
		return database.ChatModelConfig{}, err
	}
	return config, nil
}

// cachedDefaultModelConfig returns a clone of the cached default model
// config if the entry is still fresh. An expired entry is cleared
// under the write lock after re-checking its expiry.
func (c *chatConfigCache) cachedDefaultModelConfig() (database.ChatModelConfig, bool) {
	c.mu.RLock()
	entry := c.defaultModelConfig
	c.mu.RUnlock()
	if entry == nil {
		return database.ChatModelConfig{}, false
	}
	if c.clock.Now().Before(entry.expiresAt) {
		return cloneModelConfig(entry.config), true
	}
	c.mu.Lock()
	if current := c.defaultModelConfig; current != nil && !c.clock.Now().Before(current.expiresAt) {
		c.defaultModelConfig = nil
	}
	c.mu.Unlock()
	return database.ChatModelConfig{}, false
}

// defaultModelConfigSnapshot records both the topology epoch and the
// default-model generation under a single read lock.
func (c *chatConfigCache) defaultModelConfigSnapshot() modelConfigSnapshot {
	c.mu.RLock()
	snap := modelConfigSnapshot{
		epoch:      c.modelTopologyEpoch,
		generation: c.defaultModelConfigGeneration,
	}
	c.mu.RUnlock()
	return snap
}

// storeDefaultModelConfig caches config as the default unless either
// counter has advanced since snap was taken.
func (c *chatConfigCache) storeDefaultModelConfig(snap modelConfigSnapshot, config database.ChatModelConfig) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.modelTopologyEpoch != snap.epoch {
		return
	}
	if c.defaultModelConfigGeneration != snap.generation {
		return
	}
	c.defaultModelConfig = &cachedModelConfig{
		config:    cloneModelConfig(config),
		expiresAt: c.clock.Now().Add(chatConfigModelConfigTTL),
	}
}

// UserPrompt returns the custom chat prompt for userID. A missing row
// (sql.ErrNoRows) is cached and returned as the empty string rather
// than as an error, so users without a prompt do not hit the database
// on every call.
func (c *chatConfigCache) UserPrompt(ctx context.Context, userID uuid.UUID) (string, error) {
	if prompt, ok := c.cachedUserPrompt(userID); ok {
		return prompt, nil
	}
	epoch := c.currentUserPromptEpoch()
	prompt, err := singleflightDoChan(ctx, &c.userPromptFetches, fmt.Sprintf("%d:%s", epoch, userID), func() (string, error) {
		if cached, ok := c.cachedUserPrompt(userID); ok {
			return cached, nil
		}
		fetched, err := c.db.GetUserChatCustomPrompt(c.ctx, userID)
		if err != nil {
			if errors.Is(err, sql.ErrNoRows) {
				c.storeUserPrompt(epoch, userID, "")
				return "", nil
			}
			return "", err
		}
		c.storeUserPrompt(epoch, userID, fetched)
		return fetched, nil
	})
	if err != nil {
		return "", err
	}
	return prompt, nil
}
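
// cachedUserPrompt looks up userID in the tlru cache. Entries are
// stored with chatConfigUserPromptTTL, so expiry is left to the cache.
func (c *chatConfigCache) cachedUserPrompt(userID uuid.UUID) (string, bool) {
	prompt, _, ok := c.userPrompts.Get(userID)
	if !ok {
		return "", false
	}
	return prompt, true
}

// currentUserPromptEpoch returns the current user-prompt epoch.
func (c *chatConfigCache) currentUserPromptEpoch() uint64 {
	c.mu.RLock()
	epoch := c.userPromptEpoch
	c.mu.RUnlock()
	return epoch
}

// storeUserPrompt caches prompt for userID unless the epoch has
// advanced since the fill began.
func (c *chatConfigCache) storeUserPrompt(epoch uint64, userID uuid.UUID, prompt string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.userPromptEpoch != epoch {
		return
	}
	c.userPrompts.Set(userID, prompt, chatConfigUserPromptTTL)
}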
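
// InvalidateModelConfig drops the cached entry for id and the cached
// default model config. Bumping the epoch and generation also discards
// in-flight fills that began before the call, including fills for
// other IDs.
func (c *chatConfigCache) InvalidateModelConfig(id uuid.UUID) {
	c.mu.Lock()
	delete(c.modelConfigs, id)
	c.modelTopologyEpoch++
	c.defaultModelConfig = nil
	c.defaultModelConfigGeneration++
	c.mu.Unlock()
}

// InvalidateUserPrompt drops the cached prompt for userID and bumps
// the epoch so in-flight fills started earlier are not stored.
func (c *chatConfigCache) InvalidateUserPrompt(userID uuid.UUID) {
	c.mu.Lock()
	c.userPrompts.Delete(userID)
	c.userPromptEpoch++
	c.mu.Unlock()
}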

// InvalidateAdvisorConfig drops the cached advisor configuration so the
// next AdvisorConfig call re-fetches from the database. Called from the
// ChatConfigEvent subscriber after an admin writes
// PUT /api/experimental/chats/config/advisor; without this the cache
// could serve stale enabled/model/limits for up to
// chatConfigAdvisorConfigTTL. Bumping the generation counter also
// discards any in-flight fill started before the invalidation, so a
// stale DB read cannot re-cache the pre-update value.
func (c *chatConfigCache) InvalidateAdvisorConfig() {
	c.mu.Lock()
	c.advisorConfig = nil
	c.advisorConfigGeneration++
	c.mu.Unlock()
}

// AdvisorConfig returns the deployment-wide advisor configuration. The
// underlying site-config row changes on the order of hours or days, so
// this cache saves a per-turn DB round trip on chats that reference the
// advisor. Parse errors and lookup errors are surfaced to the caller;
// callers that prefer silent fallback handle that at the call site.
func (c *chatConfigCache) AdvisorConfig(ctx context.Context) (codersdk.AdvisorConfig, error) {
	if config, ok := c.cachedAdvisorConfig(); ok {
		return config, nil
	}
	generation := c.advisorConfigGenerationSnapshot()
	config, err := singleflightDoChan(
		ctx,
		&c.advisorConfigFetches,
		fmt.Sprintf("%d:advisor", generation),
		func() (codersdk.AdvisorConfig, error) {
			if cached, ok := c.cachedAdvisorConfig(); ok {
				return cached, nil
			}
			raw, err := c.db.GetChatAdvisorConfig(c.ctx)
			if err != nil {
				return codersdk.AdvisorConfig{}, err
			}
			var cfg codersdk.AdvisorConfig
			if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
				return codersdk.AdvisorConfig{}, err
			}
			c.storeAdvisorConfig(generation, cfg)
			return cfg, nil
		},
	)
	if err != nil {
		return codersdk.AdvisorConfig{}, err
	}
	return config, nil
}

// cachedAdvisorConfig returns the cached advisor configuration if the
// entry is still fresh. An expired entry is cleared under the write
// lock after re-checking its expiry.
func (c *chatConfigCache) cachedAdvisorConfig() (codersdk.AdvisorConfig, bool) {
	c.mu.RLock()
	entry := c.advisorConfig
	c.mu.RUnlock()
	if entry == nil {
		return codersdk.AdvisorConfig{}, false
	}
	if c.clock.Now().Before(entry.expiresAt) {
		return entry.config, true
	}
	c.mu.Lock()
	if current := c.advisorConfig; current != nil && !c.clock.Now().Before(current.expiresAt) {
		c.advisorConfig = nil
	}
	c.mu.Unlock()
	return codersdk.AdvisorConfig{}, false
}

// advisorConfigGenerationSnapshot returns the current advisor-config
// generation.
func (c *chatConfigCache) advisorConfigGenerationSnapshot() uint64 {
	c.mu.RLock()
	generation := c.advisorConfigGeneration
	c.mu.RUnlock()
	return generation
}

// storeAdvisorConfig caches config unless the generation has advanced
// since the fill began.
func (c *chatConfigCache) storeAdvisorConfig(generation uint64, config codersdk.AdvisorConfig) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.advisorConfigGeneration != generation {
		return
	}
	c.advisorConfig = &cachedAdvisorConfig{
		config:    config,
		expiresAt: c.clock.Now().Add(chatConfigAdvisorConfigTTL),
	}
}
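
The generation guard that the `store*` helpers in this file share can be seen in isolation in the minimal sketch below. It is not part of `configcache.go`: `genCache` and its methods are hypothetical names made up for illustration, and TTLs and singleflight are left out so that only the snapshot, invalidate, store ordering remains.

```go
package main

import (
	"fmt"
	"sync"
)

// genCache is a hypothetical single-value cache. The generation counter
// lets store detect an invalidation that happened while a fill was
// still reading from its source.
type genCache struct {
	mu         sync.Mutex
	value      *string
	generation uint64
}

// snapshot returns the generation a fill should remember before it
// starts its (slow) read.
func (c *genCache) snapshot() uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.generation
}

// invalidate drops the cached value and bumps the generation so fills
// that began earlier can no longer store their result.
func (c *genCache) invalidate() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.value = nil
	c.generation++
}

// store caches v only if no invalidation happened since gen was taken.
// It reports whether the value was stored.
func (c *genCache) store(gen uint64, v string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.generation != gen {
		return false
	}
	c.value = &v
	return true
}

// get returns the cached value, if any.
func (c *genCache) get() (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.value == nil {
		return "", false
	}
	return *c.value, true
}

func main() {
	c := &genCache{}

	gen := c.snapshot()                // a fill begins and remembers generation 0
	c.invalidate()                     // a config write lands while the fill is reading
	fmt.Println(c.store(gen, "stale")) // false: the pre-update read is dropped

	gen = c.snapshot()                 // the next fill sees generation 1
	fmt.Println(c.store(gen, "fresh")) // true

	v, ok := c.get()
	fmt.Println(v, ok) // fresh true
}
```

The sketch checks the generation inside the same critical section that writes the value, which is also what `storeProviders` and `storeAdvisorConfig` do; checking before taking the lock would reopen the race.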