mirror of
https://github.com/coder/coder.git
synced 2026-06-02 20:48:20 +00:00
033ed0bb827b8f2e40a8cfb49f868944037f073c
60 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
033ed0bb82 |
feat: add admin-configurable chat title generation model (#24838)
Adds an admin-configurable deployment-wide setting that controls which
model is used for chat title generation. Admins can pick any enabled
chat model config from the Agents settings page, or leave the setting
unset to keep the existing fast-models-then-chat-model fallback
algorithm.
When a model is selected, both automatic and manual title generation use
only that model, with no silent fallback. When the configured model is
disabled, missing credentials, or otherwise unusable, automatic title
generation skips entirely (best-effort) and manual title regeneration
returns a clear error, so admins notice the misconfiguration instead of
silently routing title traffic through another provider.
## Surface
- New deployment-wide setting stored as a `site_configs` row
(`agents_chat_title_generation_model_override`).
- New experimental endpoint `GET/PUT
/api/experimental/chats/config/model-override/{context}`.
- Frontend: title generation now appears as a third dropdown on the
Agents admin settings page alongside the existing general and explore
context overrides.
## DRY refactors folded in
Title generation is integrated as a third value of the existing
`ChatModelOverrideContext` type alongside `general` and `explore`,
sharing the parameterized HTTP route, SDK methods, generated types, and
frontend API plumbing rather than introducing a parallel surface. The
`Agent` prefix was dropped from the type and route since title
generation is not a delegated agent.
The chatd model-override resolver is also shared.
`resolveConfiguredModelOverride` now takes a `failureMode` parameter:
- Subagent overrides use soft failure: misconfigured overrides are
logged and the parent model is used.
- Title generation uses hard failure: misconfigured overrides return an
explicit error so manual title regeneration surfaces the
misconfiguration and automatic title generation skips instead of
silently falling back.
> Mux is acting on Mike's behalf.
|
||
|
|
04cc983833 |
fix: add preset support to MCP tools (#24694)
The chat tools (`read_template`, `create_workspace`) did not surface or respect template version presets. Presets were invisible to the LLM and preset parameter defaults were never applied at workspace creation. The `toolsdk` MCP surface had the same gap (ref #24695, now subsumed here). ## What this changes - **`read_template`** returns presets with `id`, `name`, `default`, `description`, `icon`, `parameters`, and `desired_prebuild_instances` (when set), so the LLM can pick the right preset and prefer prebuilt-backed ones. - **`create_workspace`** accepts a `preset_id`. The wsbuilder applies preset parameter defaults and may claim a prebuilt workspace. - **`start_workspace`** does *not* accept a preset. Presets are a creation-time choice; subsequent starts use the workspace's existing version and parameters. Users who need a specific preset or version on an existing chat can create the workspace out-of-band (CLI / UI / API) with the desired configuration and attach the chat to it. - **`toolsdk`** gains `GetTemplate` (with presets including `desired_prebuild_instances`), preset support on `CreateWorkspace`, and preset + `rich_parameters` support on `CreateWorkspaceBuild`. The `template_version_preset_id` description warns about preset/version affinity. > 🤖 Generated with [Coder Agents](https://coder.com/agents) and reviewed by a human. Co-authored-by: Max schwenk <maschwenk@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
17409a515c |
feat(coderd): wire advisor runtime to admin config (#24622)
## Summary Wire the advisor runtime into `chatd`: read the admin config on every `runChat`, gate tool registration and system-prompt guidance on a **single eligibility boolean**, register the `advisor` built-in tool, and apply the exclusive-tool policy from PR 1. ## Motivation This is the integration seam where PRs 1–3 come together into an actual user-visible feature. Gating is deliberately root-chat-only for the initial rollout; child/sub-agent chats still do not see the tool or the guidance block. ## Changes ### `coderd/x/chatd/chatd.go` - `loadAdvisorConfig(ctx, logger)` reads the admin config (from PR 3) on each run. If `ModelConfigID` is set, it resolves the override model via `configCache.ModelConfigByID`; otherwise it falls back to the outer chat's model and provider options. Reasoning effort is plumbed into provider options via `applyAdvisorReasoningEffort`. - One computed `advisorEligible` boolean drives **both** tool registration (after skill tools, before MCP tools) and guidance injection via `chatprompt.InsertSystem(prompt, chatadvisor.ParentGuidanceBlock)`. - `setAdvisorPromptSnapshot` closures capture the outer prompt state at the right points in the lifecycle (`renderPlanPathPrompt`, `ReloadMessages`, `PrepareMessages`) so the advisor handoff uses the same context the outer model saw. - `ExclusiveToolNames["advisor"] = true` is passed to `chatloop.Run()` so mixed batches are rejected cleanly (PR 1 machinery). - `builtinToolNames["advisor"] = true` so metrics keep advisor distinct from the generic `mcp` label. ### Child-chat guard - Child/sub-agent chats deliberately do not see the advisor tool or guidance block, to avoid recursion/cost blowups until the pattern is proven. This is covered by `TestAdvisorGating_ChildChat` (currently skipped pending a rewrite against the new `plan`/`explore` subagent infrastructure; core gating logic is still exercised by `TestAdvisorGating_Disabled` and `TestAdvisorGating_RootChat`). ## Stack context This is **PR 4 of 6** in the advisor feature stack. It depends on PRs 1–3. ## Scope / non-goals - No frontend changes. The feature is invocable via the backend but renders generically until PR 5. - No separate provider runner; the nested advisor call reuses the existing model/provider path. - No DB migration. ## Validation - `go test ./coderd/x/chatd/... -run TestAdvisor` - `go build ./...` - `make lint` --- <details> <summary>📋 Implementation Plan (shared across the advisor stack)</summary> # Plan: Add a Mux-style advisor tool to coder agents/chatd ## Outcome Add a first-class `advisor` tool to agent chats in `coderd/x/chatd` that feels native to Coder: - it is a built-in server-side tool, not an MCP/dynamic-tool workaround; - it performs a nested **tool-less** model call for strategic advice; - it is exposed only when eligible, and the prompt mentions it only when it is actually available; - it is treated as a **planning-only** tool so it does not run alongside action tools in the same batch; - it tracks usage/cost separately enough for operators to reason about it; - it has a minimally polished UI in the Agents page; - and it ships with explicit dogfooding evidence, including screenshots and repro videos. ## Design decisions to lock before coding 1. **Primary architecture:** native built-in tool in `chattool/`, backed by a small `chatadvisor` package. 2. **Nested model execution:** reuse chatd's existing model/provider stack for a one-step, tool-less advisor call rather than inventing a new provider pathway. 3. **Execution policy:** treat `advisor` as an exclusive/planning-only tool; mixed batches must return structured policy errors and force the model to retry cleanly. 4. **Availability:** initial rollout is for root agent chats only; disable for child/sub-agent chats until recursion/cost policy is proven. 5. **Prompt sync:** use one eligibility boolean to drive both tool registration and advisor guidance injection. 6. **Persistence/cost split:** MVP should keep advisor usage visible in result metadata and server metrics; only add DB schema if product/billing explicitly needs queryable advisor-specific cost. 7. **UI scope:** generic tool rendering is an acceptable temporary milestone during backend bring-up, but the release candidate should include a dedicated lightweight advisor renderer. ## Delivery model The work should be executed as coordinated workstreams with one integration owner and parallel contributors for low-conflict areas. The integration owner should own `coderd/x/chatd/chatd.go` because prompt assembly, tool registration, and model resolution all converge there. ## Detailed workstreams ### Repo evidence used for this plan <details> <summary>Mux reference and current chatd seams</summary> **Mux reference implementation** - `src/node/services/tools/advisor.ts` — native advisor tool implementation. - `src/common/constants/advisor.ts` — advisor prompt/constants and truncation policy. - `src/common/utils/tools/tools.ts` — conditional tool registration. - `src/node/services/streamContextBuilder.ts` — injects advisor guidance only when the tool is available. **Current chatd seams** - `coderd/x/chatd/chatd.go` - `processChat()` — tool assembly, prompt assembly, and chatloop invocation. - `resolveChatModel()` — current model/provider/key resolution seam. - `type Config struct` — server-level chatd configuration surface. - `coderd/x/chatd/chatloop/chatloop.go` - `Run()` — main streaming/model loop. - `executeTools()` — built-in tool execution/batching seam. - `coderd/x/chatd/chattool/` — built-in tool implementations. - `site/src/pages/AgentsPage/components/ChatElements/tools/Tool.tsx` — tool renderer dispatch. - `site/src/pages/AgentsPage/components/ChatConversation/messageParsing.ts` and `ConversationTimeline.tsx` — tool/result merge and rendering flow. </details> ### Workstream map and ownership | Workstream | Primary owner | Main files | Can run in parallel? | Done when | |---|---|---|---|---| | 0. Integration + gating | Integration lead | `coderd/x/chatd/chatd.go` | No; central merge lane | Tool registration, prompt sync, and model selection are wired together | | 1. Advisor runtime + tool | Backend agent | new `coderd/x/chatd/chatadvisor/`, new `coderd/x/chatd/chattool/advisor.go` | Yes | Tool can perform a tool-less advisor call in memory and return structured results | | 2. Planning-only execution policy | Chatloop agent | `coderd/x/chatd/chatloop/chatloop.go`, related tests | Yes | Mixed `advisor` + action-tool batches are rejected cleanly and deterministically | | 3. Metrics/usage/config | Backend/telemetry agent | `chatd.go`, `chatloop/metrics.go`, optional config plumbing | Partially; coordinate with integration lead | Advisor usage is separately visible in metadata/metrics and limits are enforced | | 4. Frontend rendering | Frontend agent | `site/.../tools/Tool.tsx`, new `AdvisorTool.tsx`, stories | Yes after result schema stabilizes | Advisor renders as a readable card and story tests pass | | 5. Dogfood + QA evidence | QA agent | dev server, Storybook, dogfood output | After backend + UI are usable | Repro videos, screenshots, and a concise QA report exist | ### Parallelization rules - **Do not split `coderd/x/chatd/chatd.go` across multiple execution agents without an integration lead.** That file owns prompt building, tool registration, model resolution, and cost persistence. - Workstreams 1 and 2 can be developed in parallel and then stacked onto the integration branch. - Workstream 4 should begin once the backend result schema is agreed on, even if the backend is still behind a feature flag. - Any agent that needs to re-check Mux behavior should clone `coder/mux` into a temporary directory (for example, `$(mktemp -d)/mux`) and inspect it read-only; do not vendor or copy code from Mux directly. ## Phase 0 — Preflight and guardrails ### Goals - Align the team on the smallest shippable architecture. - Prevent scope creep into MCP/dynamic-tool/sub-agent variants. - Decide upfront what is MVP vs. follow-up. ### Tasks 1. **Confirm the MVP boundary.** - Ship a built-in advisor tool first. - Do **not** make MCP, dynamic tools, or sub-agents the primary implementation. - Do **not** add transient streaming phases in the first backend PR unless they fall out almost for free. 2. **Confirm local workflow hygiene before coding.** - Ensure the repo is using the project git hooks from `scripts/githooks`. - Do not bypass hooks with `--no-verify`. - Use `./scripts/develop.sh` for the full dev server rather than manual build/run commands. 3. **Lock the model-selection policy.** - **Recommended MVP:** advisor uses the same resolved provider/model/cost config as the current chat, with advisor-specific max-output and usage caps. - **Follow-up only if required:** add a separate `AdvisorModelConfigID`-style override that resolves through the existing `configCache`/model-config path. Do not invent a new free-form `provider:model` parser if chatd already stores provider/model separately. 4. **Lock the persistence policy.** - **Recommended MVP:** no DB migration. Persist advisor-visible metadata in the tool result and record separate metrics in memory/Prometheus. - **Only if product/billing explicitly asks for queryable advisor cost:** add a later DB migration or usage table, following the normal `queries/*.sql` + `make gen` workflow. 5. **Create an execution ADR note in the work item or tracking doc.** - Capture: built-in tool, tool-less nested call, root-chat-only rollout, exclusive execution policy, MVP no-DB-migration default. ### Quality gate - Everyone on the team can state the same answers to these questions: - Is advisor a built-in tool? **Yes.** - Can advisor run with action tools in the same batch? **No.** - Does advisor get tools of its own? **No.** - Is a DB migration required for MVP? **No, unless billing insists.** ## Phase 1 — Build the advisor runtime and tool wrapper ### Goals Create the core advisor implementation in a way that is easy to test and keeps `chattool/` thin. ### Files to add - `coderd/x/chatd/chatadvisor/types.go` - `coderd/x/chatd/chatadvisor/guidance.go` - `coderd/x/chatd/chatadvisor/handoff.go` - `coderd/x/chatd/chatadvisor/runtime.go` - `coderd/x/chatd/chatadvisor/runner.go` - `coderd/x/chatd/chattool/advisor.go` ### Responsibilities by file 1. **`types.go`** - Define the input/result schema used by the tool and UI. - Keep the result shape close to Mux so the UI and model both have predictable cases. - Recommended result variants: - `advice` - `limit_reached` - `error` Recommended shape: ```go type AdvisorArgs struct { Question string `json:"question"` } type AdvisorResult struct { Type string `json:"type"` Advice string `json:"advice,omitempty"` Error string `json:"error,omitempty"` AdvisorModel string `json:"advisor_model,omitempty"` RemainingUses int `json:"remaining_uses,omitempty"` Usage *AdvisorUsageResult `json:"usage,omitempty"` } ``` 2. **`guidance.go`** - Hold two strings: - the nested advisor system prompt; - the parent-agent guidance block to inject into the outer system prompt. - The nested advisor prompt must say, in plain language: - you are advising the parent agent; - you do not address the end user directly; - you do not claim actions happened; - you return concise strategic guidance and tradeoffs. 3. **`runtime.go`** - Define the per-run runtime state. - Recommended fields: - resolved model + model config; - provider keys/options reused from the outer chat; - `MaxUsesPerRun`; - `MaxOutputTokens`; - atomic/current call counter; - callback(s) to obtain the current prompt snapshot and current-step snapshot; - optional metrics/usage hook. - Add fail-fast validation for impossible config: nil model, non-positive limits, empty prompt builders, etc. 4. **`handoff.go`** - Build the advisor handoff message from: - the explicit question; - the exact prompt/messages the parent model just used; - the current step's text/reasoning snapshot, if available; - the most recent relevant tool outputs, if they are already in the prompt snapshot. - **Important:** use the already-prepared outer prompt tail, not a fresh DB reload. That keeps the advisor aligned with compaction and the exact context the outer model saw. - Apply hard truncation budgets with recent-context bias. 5. **`runner.go`** - Execute the nested advisor call. - **Recommended implementation:** call `chatloop.Run()` in an in-memory, one-step mode: - `Tools: nil` - `ProviderTools: nil` - `MaxSteps: 1` - `PersistStep`: capture the assistant output in memory instead of writing DB rows - Reuse the existing provider/model/cost path instead of building a second provider runner. - Assert that no tool definitions are passed to the nested call. 6. **`chattool/advisor.go`** - Keep this file thin and consistent with other built-ins. - Responsibilities: - decode `AdvisorArgs`; - validate `Question` is non-empty and bounded; - call the `chatadvisor` runner; - return a structured tool response. ### Defensive programming requirements - Assert `Question` is non-empty after trimming. - Assert runtime limits are positive. - Assert the nested advisor call runs with zero tools/provider tools. - Assert `AdvisorResult.Type` is one of the known variants before returning. - Assert remaining uses never goes negative. ### Acceptance criteria - A unit test can call the advisor tool with a fake model and receive a stable `advice` result. - The nested advisor call is impossible to run with tools accidentally attached. - The core logic lives in `chatadvisor/`, not embedded inside `chatd.go`. ## Phase 2 — Wire advisor into chatd and keep prompt/tool availability in sync ### Goals Register the tool in the right place, expose it only when eligible, and inject system guidance only when the tool is present. ### Files to modify - `coderd/x/chatd/chatd.go` - optionally a small helper file if `chatd.go` becomes too crowded ### Tasks 1. **Compute one eligibility boolean in `processChat()`.** Recommended inputs: - server-level advisor enabled flag; - root chat only (`chat.ParentChatID == uuid.Nil` or equivalent existing root/child check); - a usable resolved model/provider exists; - optional experiment/workspace/org gate if product wants staged rollout. 2. **Create the runtime once per outer chat run.** - Use the model/config/keys resolved by `resolveChatModel()`. - Reuse provider options from the current chat's `ChatModelCallConfig`. - Set `MaxUsesPerRun` and `MaxOutputTokens` from advisor config defaults. 3. **Register the tool in the built-in tool block.** - Insert after the skill tools and before MCP tools in `processChat()`. - Record `builtinToolNames["advisor"] = true` so metrics stay bounded. 4. **Inject advisor guidance into the outer system prompt using the same boolean.** - Use `chatprompt.InsertSystem()` in the same prompt assembly path that already injects user/system instructions. - Place the block near the existing instruction insertion, before plan-path/skill context blocks. - Wrap the guidance in an explicit tag like `<advisor-guidance>` so it is easy to spot in tests and future refactors. 5. **Keep advisor out of child chats for the first release.** - That avoids recursion/cost blowups with `spawn_agent` / `wait_agent` flows. - Document this explicitly in the rollout notes and tests. ### Acceptance criteria - If advisor is disabled, neither the tool nor the prompt guidance appears. - If advisor is enabled, both the tool and the prompt guidance appear. - Root chats can use advisor; child chats cannot. - Built-in tool names include `advisor` so metrics do not collapse it into the generic `mcp` label. ## Phase 3 — Enforce planning-only execution policy in `chatloop` ### Goals Prevent the model from calling `advisor` and action tools in the same execution batch. ### Files to modify - `coderd/x/chatd/chatloop/chatloop.go` - related chatloop tests ### Recommended implementation Keep the MVP small; do **not** build a general policy engine yet. 1. Add a minimal field to `chatloop.RunOptions`, for example: ```go ExclusiveToolName *string ``` 2. In `Run()` / `executeTools()`, detect the case where the exclusive tool appears in the same local-tool batch as any other locally executed tool. 3. When that happens, synthesize structured tool-result errors for the affected calls instead of executing anything in the batch. - `advisor` should receive a clear error like: _advisor must be called by itself before action tools_. - The sibling action tools should receive a paired policy error like: _this tool was skipped because advisor must run alone_. 4. Let the outer model see those tool errors and retry cleanly. - This is simpler and safer than partial execution or hidden deferral. - It preserves deterministic transcript history for debugging. 5. Pass the just-finished step snapshot into the tool execution context. - The advisor runtime should be able to see the current step's text/reasoning content, because that is often the best hint about what the outer model is trying to decide. ### Why this is the right fit - It matches the intended semantics: advisor is consulted **before** taking action. - It avoids subtle race conditions caused by concurrent built-in tool execution. - It keeps the behavior easy to test with fake models. ### Acceptance criteria - A model-emitted batch containing only `advisor` succeeds. - A model-emitted batch containing `advisor` plus any other locally executed tool returns deterministic policy errors and executes nothing. - Non-advisor tool execution stays unchanged for normal chats. ## Phase 4 — Usage limits, metrics, and configuration ### Goals Make advisor safe to operate without over-designing billing/storage in the first release. ### Files to modify - `coderd/x/chatd/chatd.go` - `coderd/x/chatd/chatloop/metrics.go` as needed - `coderd/x/chatd/chatd.go` `Config` struct and constructor path - optional follow-up config/db files only if a separate advisor model or persistent billing is required ### Tasks 1. **Add explicit server config knobs for MVP.** Recommended fields on `chatd.Config` or a nested advisor config struct: - `AdvisorEnabled bool` - `AdvisorMaxUsesPerRun int` - `AdvisorMaxOutputTokens int64` 2. **Track usage per outer run.** - Reset the counter for each `processChat()` invocation. - Return `remaining_uses` in the tool result. - Return `limit_reached` when the cap is exhausted. 3. **Expose advisor usage metadata in the tool result.** - Include model name and token/cost summary if available. - Use the same `callConfig.Cost` calculation path as the outer chat for MVP if advisor reuses the same model. 4. **Record server-side metrics.** - Count advisor invocations, failures, and latency. - Ensure they show up under the built-in tool label `advisor`. 5. **Optional decision gate: separate advisor model.** - If product insists on a stronger/different advisor model, add a follow-up config hook that resolves another existing chat model config through the same `configCache` path. - Keep that out of the first landing PR unless it is required for acceptance. 6. **Optional decision gate: queryable advisor cost.** - If this becomes required, spin a follow-up DB task: - update `coderd/database/queries/*.sql`; - add migration files; - run `make gen`; - update audit mappings if a new auditable type/field is introduced. ### Acceptance criteria - Advisor calls are capped per outer run. - Limit exhaustion is user-visible in the tool result. - Metrics distinguish advisor calls from other built-in tools. - MVP does not require a schema migration unless explicitly approved. ## Phase 5 — Frontend rendering and Storybook coverage ### Goals Make advisor feel intentional in the Agents UI without blocking the backend on fancy streaming UI. ### Files to modify - `site/src/pages/AgentsPage/components/ChatElements/tools/Tool.tsx` - new `site/src/pages/AgentsPage/components/ChatElements/tools/AdvisorTool.tsx` - Storybook story file(s) in the same tools directory ### Delivery strategy 1. **Intermediate milestone during backend bring-up:** rely on the existing generic tool renderer if needed. - This is acceptable only as a short-lived integration checkpoint. 2. **Release milestone:** add a dedicated lightweight `AdvisorTool` renderer. - Reuse existing primitives: - `ToolCollapsible` - `ToolIcon` - `Response` for markdown/prose rendering - `ScrollArea` if the advice can be long - Keep styling light and consistent with the Agents page. - Do not add unnecessary React memoization in `site/src/pages/AgentsPage/`; that area is already React-Compiler aware. 3. **Render the structured result states cleanly.** - `advice` — readable prose/markdown with optional metadata footer. - `limit_reached` — warning-style message. - `error` — error state with visible fallback text. - `running` — existing tool loading state/spinner is enough for MVP. 4. **Add Storybook coverage instead of ad-hoc component tests.** Recommended stories: - successful advice; - running/loading; - limit reached; - error. 5. **Keep the UI contract narrow.** - Prefer one text field like `advice` plus small metadata rather than a deeply nested schema. - That keeps the UI resilient to prompt iteration. ### Acceptance criteria - The advisor tool card renders readable content rather than raw quoted JSON in the final release branch. - Running, limit, and error states are visibly distinct. - Storybook stories and play assertions cover the new states. - Existing tool rendering flows remain unchanged. ## Phase 6 — Automated tests and validation gates ### Backend tests to add 1. **Advisor runtime/tool tests** - question validation; - tool-less nested execution assertion; - success result shaping; - limit-reached result shaping; - error result shaping. 2. **Prompt/gating tests in chatd** - advisor disabled ⇒ no tool, no guidance; - advisor enabled/root chat ⇒ tool + guidance; - child chat ⇒ advisor absent. 3. **Chatloop policy tests** - advisor alone runs; - advisor + action tool mixed batch returns deterministic policy errors; - non-advisor tools still execute normally. 4. **Usage/metrics tests** - per-run cap resets correctly; - builtin tool labeling includes `advisor`; - returned metadata includes model/usage summary when available. ### Frontend tests to add - Storybook `play()` assertions for the advisor renderer states. - Verify expand/collapse behavior and visible fallback text. - Verify the message timeline still renders adjacent tools correctly. ### Recommended command sequence Run these as the implementation matures, not only at the end: 1. Backend-focused gate after phases 1–4: - `make test RUN=TestAdvisor` - `make test RUN=TestChatloopAdvisor` - `make lint` 2. Frontend-focused gate after phase 5: - `pnpm test:storybook src/pages/AgentsPage/components/ChatElements/tools/AdvisorTool.stories.tsx` - `pnpm lint` - `pnpm format` 3. Final repo gate before handoff: - `make pre-commit` - run any additional targeted `make test RUN=...` selections covering touched chatd paths > Use the exact new test names the implementing agents create; the names above are recommended anchors, not existing tests. ## Dogfooding plan ### Principle Dogfood the change as a real agent feature, not just a unit-tested backend. Per the dogfood and `agent-browser` skills, the reviewer should get **watchable repro videos** plus screenshots that make the behavior obvious without reading logs. ### Required setup 1. Start the full dev environment with: - `./scripts/develop.sh` 2. If the frontend renderer changes, also start Storybook from `site/` with: - `pnpm storybook --no-open` 3. Use `agent-browser` directly — **never `npx agent-browser`**. 4. Use named browser sessions and an output folder such as: - `./dogfood-output/advisor/` - with subfolders `screenshots/` and `videos/` ### Evidence protocol For every interactive scenario below: 1. Start video recording **before** the action. 2. Capture step-by-step screenshots at human pace. 3. Capture one annotated screenshot of the final state. 4. Stop the recording. 5. Note the exact pass/fail observation in the QA report. For static UI states (for example Storybook error/limit cards), an annotated screenshot is sufficient; video is optional but still encouraged by this project’s review preference. ### Dogfood scenarios #### Scenario A — Happy path in the real Agents UI **Goal:** prove that a root agent chat can invoke advisor and produce a readable recommendation before taking further action. Steps: 1. Open the Agents page with an advisor-enabled root chat. 2. Start a repro video. 3. Send a prompt that should reasonably trigger strategic planning, such as an architecture or multi-tradeoff question. 4. Capture screenshots of: - the prompt before send; - the running advisor state; - the completed advisor card and the assistant’s follow-up response. 5. Stop recording. Pass criteria: - advisor appears in the timeline; - the rendered result is readable; - the assistant can continue after consuming the advisor output. #### Scenario B — Advisor unavailable path **Goal:** prove the feature is truly gated. Suggested variants (at least one is required, both are better): - feature flag/config off; - child/sub-agent chat. Evidence: - annotated screenshot of the chat/tool state showing advisor is absent; - short video if toggling the gate live is part of the repro. Pass criteria: - no advisor tool is available; - no advisor-specific prompt behavior leaks through. #### Scenario C — UI states in Storybook **Goal:** prove the renderer handles non-happy states cleanly. Required story states: - success/advice; - running; - limit reached; - error. Evidence: - one screenshot per state; - at least one short video showing collapse/expand behavior. Pass criteria: - success renders readable advice; - limit/error have visible fallback text; - the component behaves like the other tool cards. #### Scenario D — Regression sweep of nearby tools **Goal:** ensure advisor does not break the surrounding chat timeline. Check at minimum: - another existing built-in tool still renders correctly near advisor; - sub-agent/tool cards still expand/collapse normally; - no obvious console errors appear in the Agents page during the advisor flow. Evidence: - screenshots of adjacent tool cards; - console/error capture if anything suspicious appears. ### `agent-browser` usage notes for the QA agent - Prefer `agent-browser batch` for 2+ sequential commands when no intermediate parsing is needed. - Use `snapshot -i` to discover interactive refs. - Re-snapshot after navigation or major DOM changes. - Avoid `wait --load networkidle` unless the page is known to go idle; prefer explicit element/text waits or short fixed waits. - Record videos at human pace and include pauses that a reviewer can follow. ## Rollout plan ### Initial rollout - Gate behind a server-side advisor-enabled flag. - Enable only for selected internal/root agent chats first. - Watch metrics for: - invocation count; - failure rate; - latency; - obvious retry loops. ### Expansion conditions Expand beyond the initial rollout only after the following are true: - mixed-batch policy behavior is stable; - cost impact is understood; - frontend UX is readable in production-like dogfood; - no recursion surprises have appeared with sub-agent flows. ### Explicit non-goals for the first release - advisor inside child/sub-agent chats; - provider-agnostic streaming phase UI; - MCP-based external advisor implementation; - mandatory DB-backed advisor cost reporting. ## Final acceptance checklist - [ ] `advisor` is a built-in chatd tool, not an MCP/dynamic-tool substitute. - [ ] The nested advisor call is tool-less and bounded to one in-memory step. - [ ] One eligibility boolean controls both tool registration and prompt guidance injection. - [ ] Root chats can use advisor; child chats cannot in the initial rollout. - [ ] Mixed advisor/action batches produce deterministic policy errors instead of partial execution. - [ ] Per-run usage caps and limit-reached behavior work. - [ ] Advisor usage is visible in metadata/metrics without forcing a DB migration for MVP. - [ ] The Agents UI has a readable advisor card and Storybook coverage. - [ ] Dogfooding produced screenshots and repro videos for the required scenarios. - [ ] Validation commands (`make lint`, targeted `make test`, Storybook tests, `make pre-commit`) passed before handoff. ## Suggested PR split 1. **PR 1 — Backend foundation** - `chatadvisor/` package - `chattool/advisor.go` - `chatloop` exclusive policy - chatd gating/prompt sync - backend tests 2. **PR 2 — Frontend + QA** - advisor renderer - stories/play assertions - dogfood artifacts and QA notes 3. **PR 3 — Optional follow-ups only if demanded by stakeholders** - separate advisor model override - persistent advisor billing/queryability - transient phase-stream UX </details> --- _Generated with [`mux`](https://github.com/coder/mux) • Model: `anthropic:claude-opus-4-7` • Thinking: `max`_ |
||
|
|
06bad73df4 |
feat: add admin-configurable advisor API, SDK, and queries (#24621)
## Summary
Add the **admin-configurable advisor configuration**: database-backed storage, SDK types, and the experimental HTTP handlers that back the admin settings UI (later PRs). Follows the same "site-configs" pattern as Virtual Desktop.
## Motivation
The advisor needs runtime-tunable knobs (enable/disable, per-run cap, max output tokens, reasoning effort, optional model override) without a service restart or redeploy. Using the existing `site_configs` K/V table keeps this pattern consistent with other admin features and avoids a bespoke schema.
## Changes
### Database (`coderd/database/queries/siteconfig.sql`)
- `GetChatAdvisorConfig` returns the stored JSON blob (default `'{}'`) under key `agents_advisor_config`.
- `UpsertChatAdvisorConfig` uses the standard `INSERT ... ON CONFLICT` pattern.
- Regenerated via `make gen` (queries.sql.go + mocks).
### SDK (`codersdk/chats.go`)
- `AdvisorConfig` type with `Enabled`, `MaxUsesPerRun`, `MaxOutputTokens`, `ReasoningEffort` (`""` / `low` / `medium` / `high`), `ModelConfigID uuid.UUID`.
- Client methods: `ChatAdvisorConfig(ctx)` / `UpdateChatAdvisorConfig(ctx, cfg)`.
### API (`coderd/exp_chats.go`)
- `GET /api/experimental/chats/config/advisor`: reads current config; relies on `ActorFromContext` validation.
- `PUT /api/experimental/chats/config/advisor`: requires `policy.ActionUpdate` on `rbac.ResourceDeploymentConfig`.
- Handlers unmarshal `{}` to a typed zero value and re-marshal on upsert for schema stability.
- Tests in `exp_chats_test.go` cover empty defaults, round-trip update, unauthorized update, and invalid body.
## Stack context
This is **PR 3 of 6** in the advisor feature stack. Consumed by:
- PR 4 (`feat/advisor-04-chatd-runtime`), which reads this config on every `runChat`.
- PR 6 (`feat/advisor-06-admin-settings-ui`), which renders the admin form.
## Scope / non-goals
- No `chatd` read path (lands in PR 4).
- No UI (lands in PR 6).
- `agents_advisor_config` remains a single-row JSON blob; we intentionally do not shard per-org/per-template yet.
## Validation
- `make gen`
- `go test ./coderd/database/... -run TestChatAdvisor`
- `go test ./coderd/... -run TestChatAdvisorConfig`
- `make lint`
---
<details>
<summary>📋 Implementation Plan (shared across the advisor stack)</summary>
# Plan: Add a Mux-style advisor tool to coder agents/chatd
## Outcome
Add a first-class `advisor` tool to agent chats in `coderd/x/chatd` that feels native to Coder:
- it is a built-in server-side tool, not an MCP/dynamic-tool workaround;
- it performs a nested **tool-less** model call for strategic advice;
- it is exposed only when eligible, and the prompt mentions it only when it is actually available;
- it is treated as a **planning-only** tool so it does not run alongside action tools in the same batch;
- it tracks usage/cost separately enough for operators to reason about it;
- it has a minimally polished UI in the Agents page;
- and it ships with explicit dogfooding evidence, including screenshots and repro videos.
## Design decisions to lock before coding
1. **Primary architecture:** native built-in tool in `chattool/`, backed by a small `chatadvisor` package.
2. **Nested model execution:** reuse chatd's existing model/provider stack for a one-step, tool-less advisor call rather than inventing a new provider pathway.
3. **Execution policy:** treat `advisor` as an exclusive/planning-only tool; mixed batches must return structured policy errors and force the model to retry cleanly.
4. **Availability:** initial rollout is for root agent chats only; disable for child/sub-agent chats until recursion/cost policy is proven.
5. **Prompt sync:** use one eligibility boolean to drive both tool registration and advisor guidance injection.
6. **Persistence/cost split:** MVP should keep advisor usage visible in result metadata and server metrics; only add DB schema if product/billing explicitly needs queryable advisor-specific cost.
7. **UI scope:** generic tool rendering is an acceptable temporary milestone during backend bring-up, but the release candidate should include a dedicated lightweight advisor renderer.
## Delivery model
The work should be executed as coordinated workstreams with one integration owner and parallel contributors for low-conflict areas. The integration owner should own `coderd/x/chatd/chatd.go` because prompt assembly, tool registration, and model resolution all converge there.
## Detailed workstreams
### Repo evidence used for this plan
<details>
<summary>Mux reference and current chatd seams</summary>
**Mux reference implementation**
- `src/node/services/tools/advisor.ts` — native advisor tool implementation.
- `src/common/constants/advisor.ts` — advisor prompt/constants and truncation policy.
- `src/common/utils/tools/tools.ts` — conditional tool registration.
- `src/node/services/streamContextBuilder.ts` — injects advisor guidance only when the tool is available.
**Current chatd seams**
- `coderd/x/chatd/chatd.go`
- `processChat()` — tool assembly, prompt assembly, and chatloop invocation.
- `resolveChatModel()` — current model/provider/key resolution seam.
- `type Config struct` — server-level chatd configuration surface.
- `coderd/x/chatd/chatloop/chatloop.go`
- `Run()` — main streaming/model loop.
- `executeTools()` — built-in tool execution/batching seam.
- `coderd/x/chatd/chattool/` — built-in tool implementations.
- `site/src/pages/AgentsPage/components/ChatElements/tools/Tool.tsx` — tool renderer dispatch.
- `site/src/pages/AgentsPage/components/ChatConversation/messageParsing.ts` and `ConversationTimeline.tsx` — tool/result merge and rendering flow.
</details>
### Workstream map and ownership
| Workstream | Primary owner | Main files | Can run in parallel? | Done when |
|---|---|---|---|---|
| 0. Integration + gating | Integration lead | `coderd/x/chatd/chatd.go` | No; central merge lane | Tool registration, prompt sync, and model selection are wired together |
| 1. Advisor runtime + tool | Backend agent | new `coderd/x/chatd/chatadvisor/`, new `coderd/x/chatd/chattool/advisor.go` | Yes | Tool can perform a tool-less advisor call in memory and return structured results |
| 2. Planning-only execution policy | Chatloop agent | `coderd/x/chatd/chatloop/chatloop.go`, related tests | Yes | Mixed `advisor` + action-tool batches are rejected cleanly and deterministically |
| 3. Metrics/usage/config | Backend/telemetry agent | `chatd.go`, `chatloop/metrics.go`, optional config plumbing | Partially; coordinate with integration lead | Advisor usage is separately visible in metadata/metrics and limits are enforced |
| 4. Frontend rendering | Frontend agent | `site/.../tools/Tool.tsx`, new `AdvisorTool.tsx`, stories | Yes after result schema stabilizes | Advisor renders as a readable card and story tests pass |
| 5. Dogfood + QA evidence | QA agent | dev server, Storybook, dogfood output | After backend + UI are usable | Repro videos, screenshots, and a concise QA report exist |
### Parallelization rules
- **Do not split `coderd/x/chatd/chatd.go` across multiple execution agents without an integration lead.** That file owns prompt building, tool registration, model resolution, and cost persistence.
- Workstreams 1 and 2 can be developed in parallel and then stacked onto the integration branch.
- Workstream 4 should begin once the backend result schema is agreed on, even if the backend is still behind a feature flag.
- Any agent that needs to re-check Mux behavior should clone `coder/mux` into a temporary directory (for example, `$(mktemp -d)/mux`) and inspect it read-only; do not vendor or copy code from Mux directly.
## Phase 0 — Preflight and guardrails
### Goals
- Align the team on the smallest shippable architecture.
- Prevent scope creep into MCP/dynamic-tool/sub-agent variants.
- Decide upfront what is MVP vs. follow-up.
### Tasks
1. **Confirm the MVP boundary.**
- Ship a built-in advisor tool first.
- Do **not** make MCP, dynamic tools, or sub-agents the primary implementation.
- Do **not** add transient streaming phases in the first backend PR unless they fall out almost for free.
2. **Confirm local workflow hygiene before coding.**
- Ensure the repo is using the project git hooks from `scripts/githooks`.
- Do not bypass hooks with `--no-verify`.
- Use `./scripts/develop.sh` for the full dev server rather than manual build/run commands.
3. **Lock the model-selection policy.**
- **Recommended MVP:** advisor uses the same resolved provider/model/cost config as the current chat, with advisor-specific max-output and usage caps.
- **Follow-up only if required:** add a separate `AdvisorModelConfigID`-style override that resolves through the existing `configCache`/model-config path. Do not invent a new free-form `provider:model` parser if chatd already stores provider/model separately.
4. **Lock the persistence policy.**
- **Recommended MVP:** no DB migration. Persist advisor-visible metadata in the tool result and record separate metrics in memory/Prometheus.
- **Only if product/billing explicitly asks for queryable advisor cost:** add a later DB migration or usage table, following the normal `queries/*.sql` + `make gen` workflow.
5. **Create an execution ADR note in the work item or tracking doc.**
- Capture: built-in tool, tool-less nested call, root-chat-only rollout, exclusive execution policy, MVP no-DB-migration default.
### Quality gate
- Everyone on the team can state the same answers to these questions:
- Is advisor a built-in tool? **Yes.**
- Can advisor run with action tools in the same batch? **No.**
- Does advisor get tools of its own? **No.**
- Is a DB migration required for MVP? **No, unless billing insists.**
## Phase 1 — Build the advisor runtime and tool wrapper
### Goals
Create the core advisor implementation in a way that is easy to test and keeps `chattool/` thin.
### Files to add
- `coderd/x/chatd/chatadvisor/types.go`
- `coderd/x/chatd/chatadvisor/guidance.go`
- `coderd/x/chatd/chatadvisor/handoff.go`
- `coderd/x/chatd/chatadvisor/runtime.go`
- `coderd/x/chatd/chatadvisor/runner.go`
- `coderd/x/chatd/chattool/advisor.go`
### Responsibilities by file
1. **`types.go`**
- Define the input/result schema used by the tool and UI.
- Keep the result shape close to Mux so the UI and model both have predictable cases.
- Recommended result variants:
- `advice`
- `limit_reached`
- `error`
Recommended shape:
```go
type AdvisorArgs struct {
Question string `json:"question"`
}
type AdvisorResult struct {
Type string `json:"type"`
Advice string `json:"advice,omitempty"`
Error string `json:"error,omitempty"`
AdvisorModel string `json:"advisor_model,omitempty"`
RemainingUses int `json:"remaining_uses,omitempty"`
Usage *AdvisorUsageResult `json:"usage,omitempty"`
}
```
2. **`guidance.go`**
- Hold two strings:
- the nested advisor system prompt;
- the parent-agent guidance block to inject into the outer system prompt.
- The nested advisor prompt must say, in plain language:
- you are advising the parent agent;
- you do not address the end user directly;
- you do not claim actions happened;
- you return concise strategic guidance and tradeoffs.
3. **`runtime.go`**
- Define the per-run runtime state.
- Recommended fields:
- resolved model + model config;
- provider keys/options reused from the outer chat;
- `MaxUsesPerRun`;
- `MaxOutputTokens`;
- atomic/current call counter;
- callback(s) to obtain the current prompt snapshot and current-step snapshot;
- optional metrics/usage hook.
- Add fail-fast validation for impossible config: nil model, non-positive limits, empty prompt builders, etc.
4. **`handoff.go`**
- Build the advisor handoff message from:
- the explicit question;
- the exact prompt/messages the parent model just used;
- the current step's text/reasoning snapshot, if available;
- the most recent relevant tool outputs, if they are already in the prompt snapshot.
- **Important:** use the already-prepared outer prompt tail, not a fresh DB reload. That keeps the advisor aligned with compaction and the exact context the outer model saw.
- Apply hard truncation budgets with recent-context bias.
5. **`runner.go`**
- Execute the nested advisor call.
- **Recommended implementation:** call `chatloop.Run()` in an in-memory, one-step mode:
- `Tools: nil`
- `ProviderTools: nil`
- `MaxSteps: 1`
- `PersistStep`: capture the assistant output in memory instead of writing DB rows
- Reuse the existing provider/model/cost path instead of building a second provider runner.
- Assert that no tool definitions are passed to the nested call.
6. **`chattool/advisor.go`**
- Keep this file thin and consistent with other built-ins.
- Responsibilities:
- decode `AdvisorArgs`;
- validate `Question` is non-empty and bounded;
- call the `chatadvisor` runner;
- return a structured tool response.
### Defensive programming requirements
- Assert `Question` is non-empty after trimming.
- Assert runtime limits are positive.
- Assert the nested advisor call runs with zero tools/provider tools.
- Assert `AdvisorResult.Type` is one of the known variants before returning.
- Assert remaining uses never goes negative.
### Acceptance criteria
- A unit test can call the advisor tool with a fake model and receive a stable `advice` result.
- The nested advisor call is impossible to run with tools accidentally attached.
- The core logic lives in `chatadvisor/`, not embedded inside `chatd.go`.
## Phase 2 — Wire advisor into chatd and keep prompt/tool availability in sync
### Goals
Register the tool in the right place, expose it only when eligible, and inject system guidance only when the tool is present.
### Files to modify
- `coderd/x/chatd/chatd.go`
- optionally a small helper file if `chatd.go` becomes too crowded
### Tasks
1. **Compute one eligibility boolean in `processChat()`.**
Recommended inputs:
- server-level advisor enabled flag;
- root chat only (`chat.ParentChatID == uuid.Nil` or equivalent existing root/child check);
- a usable resolved model/provider exists;
- optional experiment/workspace/org gate if product wants staged rollout.
2. **Create the runtime once per outer chat run.**
- Use the model/config/keys resolved by `resolveChatModel()`.
- Reuse provider options from the current chat's `ChatModelCallConfig`.
- Set `MaxUsesPerRun` and `MaxOutputTokens` from advisor config defaults.
3. **Register the tool in the built-in tool block.**
- Insert after the skill tools and before MCP tools in `processChat()`.
- Record `builtinToolNames["advisor"] = true` so metrics stay bounded.
4. **Inject advisor guidance into the outer system prompt using the same boolean.**
- Use `chatprompt.InsertSystem()` in the same prompt assembly path that already injects user/system instructions.
- Place the block near the existing instruction insertion, before plan-path/skill context blocks.
- Wrap the guidance in an explicit tag like `<advisor-guidance>` so it is easy to spot in tests and future refactors.
5. **Keep advisor out of child chats for the first release.**
- That avoids recursion/cost blowups with `spawn_agent` / `wait_agent` flows.
- Document this explicitly in the rollout notes and tests.
### Acceptance criteria
- If advisor is disabled, neither the tool nor the prompt guidance appears.
- If advisor is enabled, both the tool and the prompt guidance appear.
- Root chats can use advisor; child chats cannot.
- Built-in tool names include `advisor` so metrics do not collapse it into the generic `mcp` label.
## Phase 3 — Enforce planning-only execution policy in `chatloop`
### Goals
Prevent the model from calling `advisor` and action tools in the same execution batch.
### Files to modify
- `coderd/x/chatd/chatloop/chatloop.go`
- related chatloop tests
### Recommended implementation
Keep the MVP small; do **not** build a general policy engine yet.
1. Add a minimal field to `chatloop.RunOptions`, for example:
```go
ExclusiveToolName *string
```
2. In `Run()` / `executeTools()`, detect the case where the exclusive tool appears in the same local-tool batch as any other locally executed tool.
3. When that happens, synthesize structured tool-result errors for the affected calls instead of executing anything in the batch.
- `advisor` should receive a clear error like: _advisor must be called by itself before action tools_.
- The sibling action tools should receive a paired policy error like: _this tool was skipped because advisor must run alone_.
4. Let the outer model see those tool errors and retry cleanly.
- This is simpler and safer than partial execution or hidden deferral.
- It preserves deterministic transcript history for debugging.
5. Pass the just-finished step snapshot into the tool execution context.
- The advisor runtime should be able to see the current step's text/reasoning content, because that is often the best hint about what the outer model is trying to decide.
### Why this is the right fit
- It matches the intended semantics: advisor is consulted **before** taking action.
- It avoids subtle race conditions caused by concurrent built-in tool execution.
- It keeps the behavior easy to test with fake models.
### Acceptance criteria
- A model-emitted batch containing only `advisor` succeeds.
- A model-emitted batch containing `advisor` plus any other locally executed tool returns deterministic policy errors and executes nothing.
- Non-advisor tool execution stays unchanged for normal chats.
## Phase 4 — Usage limits, metrics, and configuration
### Goals
Make advisor safe to operate without over-designing billing/storage in the first release.
### Files to modify
- `coderd/x/chatd/chatd.go`
- `coderd/x/chatd/chatloop/metrics.go` as needed
- `coderd/x/chatd/chatd.go` `Config` struct and constructor path
- optional follow-up config/db files only if a separate advisor model or persistent billing is required
### Tasks
1. **Add explicit server config knobs for MVP.**
Recommended fields on `chatd.Config` or a nested advisor config struct:
- `AdvisorEnabled bool`
- `AdvisorMaxUsesPerRun int`
- `AdvisorMaxOutputTokens int64`
2. **Track usage per outer run.**
- Reset the counter for each `processChat()` invocation.
- Return `remaining_uses` in the tool result.
- Return `limit_reached` when the cap is exhausted.
3. **Expose advisor usage metadata in the tool result.**
- Include model name and token/cost summary if available.
- Use the same `callConfig.Cost` calculation path as the outer chat for MVP if advisor reuses the same model.
4. **Record server-side metrics.**
- Count advisor invocations, failures, and latency.
- Ensure they show up under the built-in tool label `advisor`.
5. **Optional decision gate: separate advisor model.**
- If product insists on a stronger/different advisor model, add a follow-up config hook that resolves another existing chat model config through the same `configCache` path.
- Keep that out of the first landing PR unless it is required for acceptance.
6. **Optional decision gate: queryable advisor cost.**
- If this becomes required, spin a follow-up DB task:
- update `coderd/database/queries/*.sql`;
- add migration files;
- run `make gen`;
- update audit mappings if a new auditable type/field is introduced.
### Acceptance criteria
- Advisor calls are capped per outer run.
- Limit exhaustion is user-visible in the tool result.
- Metrics distinguish advisor calls from other built-in tools.
- MVP does not require a schema migration unless explicitly approved.
## Phase 5 — Frontend rendering and Storybook coverage
### Goals
Make advisor feel intentional in the Agents UI without blocking the backend on fancy streaming UI.
### Files to modify
- `site/src/pages/AgentsPage/components/ChatElements/tools/Tool.tsx`
- new `site/src/pages/AgentsPage/components/ChatElements/tools/AdvisorTool.tsx`
- Storybook story file(s) in the same tools directory
### Delivery strategy
1. **Intermediate milestone during backend bring-up:** rely on the existing generic tool renderer if needed.
- This is acceptable only as a short-lived integration checkpoint.
2. **Release milestone:** add a dedicated lightweight `AdvisorTool` renderer.
- Reuse existing primitives:
- `ToolCollapsible`
- `ToolIcon`
- `Response` for markdown/prose rendering
- `ScrollArea` if the advice can be long
- Keep styling light and consistent with the Agents page.
- Do not add unnecessary React memoization in `site/src/pages/AgentsPage/`; that area is already React-Compiler aware.
3. **Render the structured result states cleanly.**
- `advice` — readable prose/markdown with optional metadata footer.
- `limit_reached` — warning-style message.
- `error` — error state with visible fallback text.
- `running` — existing tool loading state/spinner is enough for MVP.
4. **Add Storybook coverage instead of ad-hoc component tests.**
Recommended stories:
- successful advice;
- running/loading;
- limit reached;
- error.
5. **Keep the UI contract narrow.**
- Prefer one text field like `advice` plus small metadata rather than a deeply nested schema.
- That keeps the UI resilient to prompt iteration.
### Acceptance criteria
- The advisor tool card renders readable content rather than raw quoted JSON in the final release branch.
- Running, limit, and error states are visibly distinct.
- Storybook stories and play assertions cover the new states.
- Existing tool rendering flows remain unchanged.
## Phase 6 — Automated tests and validation gates
### Backend tests to add
1. **Advisor runtime/tool tests**
- question validation;
- tool-less nested execution assertion;
- success result shaping;
- limit-reached result shaping;
- error result shaping.
2. **Prompt/gating tests in chatd**
- advisor disabled ⇒ no tool, no guidance;
- advisor enabled/root chat ⇒ tool + guidance;
- child chat ⇒ advisor absent.
3. **Chatloop policy tests**
- advisor alone runs;
- advisor + action tool mixed batch returns deterministic policy errors;
- non-advisor tools still execute normally.
4. **Usage/metrics tests**
- per-run cap resets correctly;
- builtin tool labeling includes `advisor`;
- returned metadata includes model/usage summary when available.
### Frontend tests to add
- Storybook `play()` assertions for the advisor renderer states.
- Verify expand/collapse behavior and visible fallback text.
- Verify the message timeline still renders adjacent tools correctly.
### Recommended command sequence
Run these as the implementation matures, not only at the end:
1. Backend-focused gate after phases 1–4:
- `make test RUN=TestAdvisor`
- `make test RUN=TestChatloopAdvisor`
- `make lint`
2. Frontend-focused gate after phase 5:
- `pnpm test:storybook src/pages/AgentsPage/components/ChatElements/tools/AdvisorTool.stories.tsx`
- `pnpm lint`
- `pnpm format`
3. Final repo gate before handoff:
- `make pre-commit`
- run any additional targeted `make test RUN=...` selections covering touched chatd paths
> Use the exact new test names the implementing agents create; the names above are recommended anchors, not existing tests.
## Dogfooding plan
### Principle
Dogfood the change as a real agent feature, not just a unit-tested backend. Per the dogfood and `agent-browser` skills, the reviewer should get **watchable repro videos** plus screenshots that make the behavior obvious without reading logs.
### Required setup
1. Start the full dev environment with:
- `./scripts/develop.sh`
2. If the frontend renderer changes, also start Storybook from `site/` with:
- `pnpm storybook --no-open`
3. Use `agent-browser` directly — **never `npx agent-browser`**.
4. Use named browser sessions and an output folder such as:
- `./dogfood-output/advisor/`
- with subfolders `screenshots/` and `videos/`
### Evidence protocol
For every interactive scenario below:
1. Start video recording **before** the action.
2. Capture step-by-step screenshots at human pace.
3. Capture one annotated screenshot of the final state.
4. Stop the recording.
5. Note the exact pass/fail observation in the QA report.
For static UI states (for example Storybook error/limit cards), an annotated screenshot is sufficient; video is optional but still encouraged by this project’s review preference.
### Dogfood scenarios
#### Scenario A — Happy path in the real Agents UI
**Goal:** prove that a root agent chat can invoke advisor and produce a readable recommendation before taking further action.
Steps:
1. Open the Agents page with an advisor-enabled root chat.
2. Start a repro video.
3. Send a prompt that should reasonably trigger strategic planning, such as an architecture or multi-tradeoff question.
4. Capture screenshots of:
- the prompt before send;
- the running advisor state;
- the completed advisor card and the assistant’s follow-up response.
5. Stop recording.
Pass criteria:
- advisor appears in the timeline;
- the rendered result is readable;
- the assistant can continue after consuming the advisor output.
#### Scenario B — Advisor unavailable path
**Goal:** prove the feature is truly gated.
Suggested variants (at least one is required, both are better):
- feature flag/config off;
- child/sub-agent chat.
Evidence:
- annotated screenshot of the chat/tool state showing advisor is absent;
- short video if toggling the gate live is part of the repro.
Pass criteria:
- no advisor tool is available;
- no advisor-specific prompt behavior leaks through.
#### Scenario C — UI states in Storybook
**Goal:** prove the renderer handles non-happy states cleanly.
Required story states:
- success/advice;
- running;
- limit reached;
- error.
Evidence:
- one screenshot per state;
- at least one short video showing collapse/expand behavior.
Pass criteria:
- success renders readable advice;
- limit/error have visible fallback text;
- the component behaves like the other tool cards.
#### Scenario D — Regression sweep of nearby tools
**Goal:** ensure advisor does not break the surrounding chat timeline.
Check at minimum:
- another existing built-in tool still renders correctly near advisor;
- sub-agent/tool cards still expand/collapse normally;
- no obvious console errors appear in the Agents page during the advisor flow.
Evidence:
- screenshots of adjacent tool cards;
- console/error capture if anything suspicious appears.
### `agent-browser` usage notes for the QA agent
- Prefer `agent-browser batch` for 2+ sequential commands when no intermediate parsing is needed.
- Use `snapshot -i` to discover interactive refs.
- Re-snapshot after navigation or major DOM changes.
- Avoid `wait --load networkidle` unless the page is known to go idle; prefer explicit element/text waits or short fixed waits.
- Record videos at human pace and include pauses that a reviewer can follow.
## Rollout plan
### Initial rollout
- Gate behind a server-side advisor-enabled flag.
- Enable only for selected internal/root agent chats first.
- Watch metrics for:
- invocation count;
- failure rate;
- latency;
- obvious retry loops.
### Expansion conditions
Expand beyond the initial rollout only after the following are true:
- mixed-batch policy behavior is stable;
- cost impact is understood;
- frontend UX is readable in production-like dogfood;
- no recursion surprises have appeared with sub-agent flows.
### Explicit non-goals for the first release
- advisor inside child/sub-agent chats;
- provider-agnostic streaming phase UI;
- MCP-based external advisor implementation;
- mandatory DB-backed advisor cost reporting.
## Final acceptance checklist
- [ ] `advisor` is a built-in chatd tool, not an MCP/dynamic-tool substitute.
- [ ] The nested advisor call is tool-less and bounded to one in-memory step.
- [ ] One eligibility boolean controls both tool registration and prompt guidance injection.
- [ ] Root chats can use advisor; child chats cannot in the initial rollout.
- [ ] Mixed advisor/action batches produce deterministic policy errors instead of partial execution.
- [ ] Per-run usage caps and limit-reached behavior work.
- [ ] Advisor usage is visible in metadata/metrics without forcing a DB migration for MVP.
- [ ] The Agents UI has a readable advisor card and Storybook coverage.
- [ ] Dogfooding produced screenshots and repro videos for the required scenarios.
- [ ] Validation commands (`make lint`, targeted `make test`, Storybook tests, `make pre-commit`) passed before handoff.
## Suggested PR split
1. **PR 1 — Backend foundation**
- `chatadvisor/` package
- `chattool/advisor.go`
- `chatloop` exclusive policy
- chatd gating/prompt sync
- backend tests
2. **PR 2 — Frontend + QA**
- advisor renderer
- stories/play assertions
- dogfood artifacts and QA notes
3. **PR 3 — Optional follow-ups only if demanded by stakeholders**
- separate advisor model override
- persistent advisor billing/queryability
- transient phase-stream UX
</details>
---
_Generated with [`mux`](https://github.com/coder/mux) • Model: `anthropic:claude-opus-4-7` • Thinking: `max`_
|
||
|
|
5222db86c7 | feat: add after_id pagination for chat messages (#24531) | ||
|
|
2f26903af9 |
feat: add admin UI control for chat auto-archive days (#24704)
Relates to #24642 Adds admin UI controls for managing chat auto-archive (days) under "Lifecycle". Also adds a "Days" label to the right of the pre-existing unitless numeric input for consistency. Exemplary screenshot below. More screens available in Storybook. <img width="847" height="585" alt="Screenshot 2026-04-24 at 16 48 59" src="https://github.com/user-attachments/assets/d38de5f8-d379-4b06-b175-ac399f31e578" /> |
||
|
|
c7cac9debe |
fix: persist per-turn model on chats and queued messages (#24688)
Previously, `chats.last_model_config_id` was not updated when a user sent a mid-chat message with a different model, and queued messages did not store their own per-turn model, so promotion ran against whatever the chat row said at promote time. Chat watch events also did not merge `last_model_config_id` into the site's root, child, and per-chat caches, so sidebar labels stayed stale after direct sends and queued promotions. - Add nullable `chat_queued_messages.model_config_id`, backfilled from `chats.last_model_config_id`. Queued inserts round-trip the effective model id at enqueue time. - In `coderd/x/chatd`, direct sends update `chats.last_model_config_id` inside the same transaction that inserts the admitted user message. Manual promotion and auto-promotion use the queued row's stored `model_config_id`, with a fallback to `chats.last_model_config_id` for legacy NULL rows during rollout. `PromoteQueuedOptions.ModelConfigID` is now ignored. - On the site, extract `mergeWatchedChatSummary` and `mergeWatchedChatIntoCaches` in `site/src/api/queries/chats.ts` so status-change watch events merge `last_model_config_id` into the root infinite chat list, the parent-embedded child entry, and the per-chat `chatKey(chatId)` cache. `updated_at` guards against stale watch payloads clobbering newer cached state, while diff status events still merge their PR metadata because they are timestamped outside the chat row. Watch timestamps are compared as instants so variable fractional precision does not make fresh events look stale. - Queued promotion validates stored model config IDs before admission. Invalid legacy queued IDs fall back to the chat's current model config instead of dropping the queued message during auto-promotion. - Backend and frontend regression coverage added for admission, queue promotion (including FIFO across mixed models, legacy NULL fallback, and invalid queued model IDs), and chat watch cache merging. > Mux is acting on Mike's behalf. |
||
|
|
a876287d36 |
feat: auto-archive inactive chats with audit trail (#24642)
Adds a background job in `dbpurge` that periodically archives chats inactive beyond a configurable threshold. Each archived root chat gets a background audit entry tagged `chat_auto_archive`. Disabled by default. * New `AutoArchiveInactiveChats` SQL query with LATERAL last-activity subquery and partial index on archive candidates * `site_configs`-backed `auto_archive_days` setting with admin-only PUT, any-authenticated-user GET * Cascade archive via `root_chat_id`; pinned chats and active threads exempt * Root-only audit dispatch on detached context, matching manual archive (`patchChat`) behavior * 11 subtests covering disabled no-op, boundary, deleted messages, child activity, pinned exemption, multi-owner, idempotency, and batch pagination PR #24643 adds per-owner digest notifications. PR #24704 adds the requisite UI controls. > 🤖 |
||
|
|
3d90546aae |
feat: add general subagent model override (#24610)
Adds a deployment-wide admin override for general delegated subagents.
## What changed
- store the general override in `site_configs` and expose it through the
shared `agent-model-override/{context}` API
- apply the general override when spawning delegated general subagents,
while preserving the existing Explore override behavior
- reuse a shared Agents settings form for the general and Explore
override sections
## Validation
- `make gen`
- `go test ./coderd -run 'TestChatModelOverrides'`
- `go test ./coderd/x/chatd -run
'TestSpawnAgent_(GeneralUsesConfiguredModelOverride|GeneralOverrideLogsAndFallsBackWhenCredentialsUnavailable|GeneralOverrideLogsAndFallsBackWhenProviderDisabled)'`
- `pnpm -C site lint:types`
- `pnpm -C site test:storybook --
AgentSettingsAgentsPageView.stories.tsx`
- `make lint`
- `make pre-commit`
> Mux is acting on Mike's behalf.
|
||
|
|
c602a31856 |
fix(coderd): reject pinning child chats in patchChat handler (#24669)
The UI already prevents child (delegated/subagent) chats from being
pinned, but the `PATCH /api/experimental/chats/{chat}` endpoint did not
enforce this. A direct API call could pin a child chat.
- Add a `400 Bad Request` guard in `patchChat` when `pinOrder > 0` and
the chat has a `ParentChatID`
- Add `TestChatPinOrder/RejectsChildChat` test
> 🤖
|
||
|
|
b5a625549e |
feat: migrate agents-access to org-scoped system role for proper chat RBAC (#24438)
The agents-access role previously granted chat permissions at user
scope, but chats are org-scoped objects. Rego skips user-level perms
when org_owner is set, making the grants invisible. Handler-level
band-aids used synthetic non-org-scoped objects as a workaround.
- Migrates agents-access from users.rbac_roles (site-level) to
organization_members.roles (org-scoped) via DB migration
- Redefines agents-access as a predefined org-scoped builtin role
alongside organization-admin, organization-auditor, etc., with
Member permissions granting chat create/read/update
- Excludes ResourceChat from OrgMemberPermissions so org membership
alone no longer grants chat access
- Fixes handler Authorize checks to use org-scoped objects with
semantically correct actions (ActionUpdate for message/tool operations)
- Grants org admins the ability to assign agents-access
Closes #24250
Fixes CODAGT-174
Note: this does not update the "Usage" endpoints. Tracked by CODAGT-161.
> 🤖
|
||
|
|
f8fe5d680b |
fix(coderd): reject API operations on archived chats (#24633)
Archived chats accept mutations (messages, edits, queued-message promotions, tool-result submissions) via the API, causing them to re-enter the processing pipeline. This violates the hard-stop design intent from PR #23758. Add archived checks at three layers: - HTTP handlers (postChatMessages, patchChatMessage, promoteChatQueuedMessage, postChatToolResults): return 400 after auth so callers get a clear error. - Daemon functions (SendMessage, EditMessage, PromoteQueued, SubmitToolResults): return ErrChatArchived after row lock, guarding against future callers that bypass the handler. - AcquireChats SQL: filter out archived chats so they are never acquired for processing. Fixes CODAGT-245 |
||
|
|
7904bed947 |
fix: fall back to local git watcher for chat diff drawer (#24512)
The Ctrl+D diff drawer in `coder exp agents` only rendered PR-backed
diffs returned by `/api/experimental/chats/{id}/diff`. Local working
tree changes in a chat's workspace returned an empty diff, so the
drawer showed "No diff contents" with no file summary.
Centralise diff loading behind a single `fetchChatDiffContents` helper
that first hits `/diff`, then falls back to the chat git watcher
WebSocket (`/stream/git`) when the remote diff is empty. Aggregate the
agent's `WorkspaceAgentRepoChanges` into a `ChatDiffContents` value so
the drawer can derive the file summary and styled body from the local
unified diff. Missing workspaces, missing agents, and watcher timeouts
are treated as graceful fallbacks that render the empty-diff
placeholder instead of a hard error.
> Mux is opening this PR on Mike's behalf.
|
||
|
|
9634739aed |
fix: support Bedrock ambient AWS credentials for Agents providers (#24397)
> This PR was authored by Mux on behalf of Mike. Adds AWS Bedrock ambient credential support to the Agents provider path. Bedrock providers can now be saved without a stored API key and authenticated via the standard AWS SDK credential chain on the Coder server (IAM roles, `AWS_ACCESS_KEY_ID`, etc.). Also fixes missing `Base URL` forwarding for Bedrock. ## Changes **Backend runtime** (`coderd/x/chatd/chatprovider/chatprovider.go`): - New `ProviderAllowsAmbientCredentials(provider)` helper. Currently returns true only for Bedrock. - `ModelFromConfig` no longer errors on an empty API key when the provider is in the ambient-allowed set AND was explicitly resolved via `ByProvider`. This preserves the policy gate: unresolvable providers (disabled central key, user-key-required without a user key) still error. - `setResolvedProviderAPIKey` internalizes the ambient-credentials contract via `ProviderAllowsAmbientCredentials`, so a resolved-but-keyless Bedrock provider is represented as an empty `ByProvider` entry rather than a post-hoc sentinel patch in the caller. - `WithAPIKey` is only appended when a token is present. - `WithBaseURL(baseURL)` is now forwarded for Bedrock (was previously missing). **Backend admin API** (`coderd/exp_chats.go`): - `validateChatProviderCentralAPIKey` exempts Bedrock from requiring a stored API key when central credentials are enabled. - AI Gateway separation (`ChatProviderAPIKeysFromDeploymentValues`) is unchanged. No silent reuse of `CODER_AIBRIDGE_BEDROCK_*` flags. **Frontend** (`site/src/pages/AgentsPage/components/ChatModelAdminPanel/*`): - API Key field is optional for Bedrock when central credentials are enabled. - Bedrock-specific descriptions on API Key and Base URL fields (bearer-token vs ambient modes, `AWS_REGION` guidance). - Right-aligned "Clear stored token" action switches an existing Bedrock provider back to ambient mode. - `hasEffectiveAPIKey` treats Bedrock with central credentials enabled as configured, so the provider list shows the correct status icon. - Three new stories: `ProviderFormBedrockAmbientCredentials`, `ProviderFormBedrockBearerToken`, `ProviderFormBedrockClearBearerToken`. **Docs** (`docs/ai-coder/agents/models.md`, `docs/ai-coder/ai-gateway/setup.md`): - New "Configuring AWS Bedrock" section covering both credential modes, region resolution, and the Base URL override. - Explicit note that the `us-east-1` region fallback only applies to bearer-token mode; ambient credentials require a region from the standard AWS SDK chain. - Cross-reference in AI Gateway docs clarifying that `CODER_AIBRIDGE_BEDROCK_*` flags are a separate configuration path from Agents. ## Not in scope - Reusing AI Gateway Bedrock flags as an implicit Agents fallback. - Per-provider AWS access key, secret, or region fields (would need a migration and audit-table review). - IMDS or network-backed credential probes in admin/listing request paths. ## Related Dogfood deployment integration: https://github.com/coder/dogfood/pull/324 |
||
|
|
cc4e04afde |
feat(site): display file attachments in chat UI (#24281)
Renders the durable file attachments introduced in #24280 in the chat interface. Without this, attachments were stored and served correctly but the UI showed raw file parts with no previews or download UX. Every attachment gets a download affordance, split into three rendering tiers: - **Images** — thumbnail with a hover/focus overlay containing a download link. `onFocusCapture`/`onBlurCapture` with `contains(relatedTarget)` keeps the overlay open while tabbing between the image and its download link. - **Text-like files** (`text/*`, `application/json`) — expandable preview button with loading + error-with-retry states and the same download overlay. Preview fetches throw a typed `FetchTextAttachmentError` with a `.status` field instead of a stringly-typed error. - **Everything else** — compact `FileCard` with extension badge, filename, and download link. User-side and assistant-side rendering now share `AttachmentBlocks.tsx` (`AttachmentPreviewFrame`, `TextAttachmentButton`, `ImageAttachmentButton`, `FileCard`, plus `getAttachmentHref`/`getAttachmentName`) instead of two near-duplicate implementations. The text-attachment overlay anchors to the preview surface so the download button stays pinned even when a loading/error status line widens the row below. `ComputerRenderer` detects when a screenshot was stored as a durable attachment (`attachment_file_id`) and suppresses the stale base64 rendering — the screenshot appears as a proper file part instead. `ToolLabel` shows the attached filename for `attach_file` tool calls. Storybook coverage in `ConversationTimeline.stories.tsx` was expanded to cover every tier (single/multiple images, inline + file-id text, JSON, download-only files, fetch-failure retry, mixed attachments + file references) with play-function assertions. <img width="811" height="150" alt="image" src="https://github.com/user-attachments/assets/27c71081-3502-4e80-92a7-d8adf1ff9323" /> ## Cleanup Per Mathias' post-merge suggestion on #24280, this PR also relocates `coderd/chatfiles` → `coderd/x/chatfiles` so the durable-attachment helpers live beside the rest of the `chatd` experimental surface. Closes CODAGT-91 |
||
|
|
ad1906589d |
fix(coderd): allow deleting chat providers used in historical chats (#24568)
Drop the `chat_model_configs.provider -> chat_providers.provider` foreign key and soft-delete model configs when their provider is removed. The provider row is now hard-deleted inside a transaction that also tombstones its model configs and promotes a replacement default when needed. Historical chats and messages keep pointing at the soft-deleted model config rows, which are hidden from live/admin queries but still resolve for read. The runtime chat path already falls back to the default model config when a soft-deleted config is looked up. Replaces the lost FK validation in the create/update model-config handlers with an explicit provider lookup that returns the existing `Chat provider is not configured.` 400. ## UX **Admin deleting a chat provider that has historical usage** - Before: blocked with 400 `Provider models are still referenced by existing chats.` Admins had no in-product way to remove a provider that had ever been used. - After: delete succeeds (204). Any model configs under that provider are soft-deleted. If the removed provider owned the default model config, one of the remaining live configs is auto-promoted to the new default. The promotion is deterministic (`ensureDefaultChatModelConfig` picks the first live config by `provider ASC, model ASC, updated_at DESC, id DESC`); there is no picker, and no toast or response detail names which config became the new default. **End users with chats that used a deleted provider's model** - Old chats still open and their history still renders unchanged. - Sending a new turn in such a chat silently falls back to the current default model. No banner or warning tells the user the original model is gone. - The model picker no longer lists the deleted model. - If no default model config exists at all after the delete, sending a new turn fails with `no default chat model config is available`. **Admin creating or updating a model config against a provider that is not configured** - Same as before: 400 `Chat provider is not configured.` Only the detection mechanism changed (explicit `FOR UPDATE` lookup inside the transaction, which also serializes against a concurrent provider delete). **Admin updating a model config whose row disappears mid-transaction** - Now returns the standard 404 `Resource not found or you do not have access to this resource` instead of the previous 500 that leaked `sql: no rows in result set` in the detail. Unrelated internal races (for example a race on the promoted default candidate) are still reported as 500 so they are not misclassified as "your target is gone". Closes CODAGT-23 |
||
|
|
c968a1f3a3 |
feat: make database.Chat auditable (#24485)
Wire database.Chat into the audit system so chat lifecycle events
(creation, patches, etc.) produce audit log entries.
Part of CODAGT-200.
> 🤖
|
||
|
|
1203f625b7 |
feat(coderd): accept parameters in start_workspace tool (#24434)
When the chat `start_workspace` tool triggers an active-version upgrade that introduces new required parameters, the build fails with a parameter validation error. Previously this returned a message telling the user to update from the UI — a dead end for the model. This PR lets the model recover inside the chat by: 1. Accepting an optional `parameters` map on `start_workspace` (same schema as `create_workspace`), forwarded as `RichParameterValues`. 2. Returning structured JSON error responses that preserve validation details and the workspace's `template_id`, so the model can call `read_template` to discover what changed. 3. Replacing the UI-only guidance in `exp_chats.go` with model-actionable retry instructions. The expected model flow on an active-version parameter failure is now: ``` start_workspace → fails (structured error with template_id + validations) read_template → discovers new required parameters start_workspace → retries with parameters map → workspace starts ``` <img width="846" height="511" alt="image" src="https://github.com/user-attachments/assets/d18b6864-5970-4225-8da0-0f2ab134ccb4" /> |
||
|
|
410f9a5e19 |
feat: allow renaming of agent chat title (#24489)
Co-authored-by: Coder Agents <noreply@coder.com> |
||
|
|
18a30a7a10 | feat: add chat debug HTTP handlers and API docs (#23918) | ||
|
|
ea00d2d396 |
fix(coderd): enforce workspace authz on watchChatGit (#24477)
`watchChatGit` proxies a live websocket to the workspace agent's git watcher (`/api/v0/git/watch`), streaming repository diffs back through the chat stream. Before this change it only enforced `chat:read` (via `ExtractChatParam`) plus an implicit `workspace:read` from the dbauthz wrapper on `GetWorkspaceAgentsInLatestBuildByWorkspaceID`. The sibling `watchChatDesktop` handler already fetches the workspace and requires `policy.ActionApplicationConnect` or `policy.ActionSSH` before dialing. Built-in roles like **Template Admin** and **Org Admin** grant `workspace:read` without SSH/ApplicationConnect, and **Owner** also loses both under `DisableOwnerWorkspaceExec`. A chat owner whose exec-level workspace access was revoked *after* the chat was bound could therefore keep streaming repository content from the workspace agent through the chat's git-watch endpoint. Mirror `watchChatDesktop`: fetch the workspace and require `ApplicationConnect || SSH` before any agent-tunnel activity. Adds one real-coderdtest regression test (`TestWatchChatGitAuthz`) that demotes the chat's owner to template-admin after binding and asserts the git-watch endpoint returns 403; the mock-based `TestWatchChatGit` in `coderd/workspaceagents_internal_test.go` continues to cover the no-workspace / disconnected-agent / websocket-proxy paths. Fixes CODAGT-184. |
||
|
|
fc2493780f |
fix: exclude subagent chats from sidebar pagination (#24404)
GetChats now returns only root chats (parent_chat_id IS NULL). A new GetChildChatsByParentIDs query fetches children for visible roots and embeds them in each parent's Children field. The singular getChat endpoint does the same. Archive invariant is one-way: parent archived implies child archived. Parent archive/unarchive cascades via root_chat_id. Individual child archive is permitted; child unarchive while the parent is archived is rejected atomically (row lock on child, re-read parent inside the transaction). Embedded children are filtered by the caller's archive state so individually-archived children stay hidden from active-parent views. Gitsync MarkStale uses GetChatsByWorkspaceIDs directly; MarkStaleParams.OwnerID removed (dead after the switch). Frontend: buildChatTree reads from the embedded children field, WebSocket handlers route child events into the parent's children array, and archiving a child strips it from the parent cache. |
||
|
|
ef6969dd70 |
feat(coderd/x/chatd): agent-created file attachments in chat (#24280)
Agents can already see workspace files and take screenshots, but users could not download those artifacts from chat. This PR adds durable chat attachments to chatd. `attach_file`, explicit `computer` screenshot actions (not the automatic post-action screenshots), and `propose_plan` now fetch bytes over the agent connection, store them in `chat_files`, link them to the chat, and carry attachment metadata in tool responses so `buildAssistantPartsForPersist` can materialize ordinary `type:"file"` assistant parts that the chat file APIs serve. The same storage helpers are reused for other artifact-producing paths. `wait_agent` recordings and thumbnails are stored as chat files and linked back to the parent chat, with best-effort relinking so parent chats retain those artifacts without leaving orphaned rows when chat-file caps reject links. `storeChatAttachment` wraps insert + link in one transaction, files are capped at 10 MB each and 20 per chat, and serving defaults to `Content-Disposition: attachment` with an explicit inline-safe allowlist. This PR also consolidates chat-file media policy in `coderd/chatfiles`. Uploads and tool-generated attachments share byte-based MIME detection, SVG blocking, inline-safety rules, and compatible `text/plain` refinement for JSON, CSV, and Markdown. Prompt construction still only inlines synthetic pasted text for model consumption; assistant-created attachments are persisted for the user and intentionally not replayed into later LLM turns. UI follow-up lives in #24281. Relates to CODAGT-91 |
||
|
|
73b5058923 |
feat: add Explore mode as subagent-only modality (#24448)
> This PR was authored by Mux on behalf of Mike. Introduce Explore mode, a read-only subagent modality for delegated discovery and code investigation. ## What Adds a `spawn_explore_agent` tool that creates child chats restricted to read-only operations. An admin can optionally configure a deployment-wide model override so Explore subagents use a model optimized for large context or reasoning without changing the root chat's model. ### Backend - New `ChatModeExplore` enum value (migration 000471). - `spawn_explore_agent` tool definition with read-only allowlist: `read_file`, `execute`, `process_output`, `read_skill`, `read_skill_file`. Write tools, file editors, and nested subagent spawning are blocked. - Deployment config storage for the Explore model override (`agents_chat_explore_model_override` in `site_configs`). - Model resolution hierarchy: configured override, then current turn model, then global default. Silent fallback with warning log when the override becomes unavailable. - RBAC: `AsChatd` for daemon reads, `ActionRead` and `ActionUpdate` on `ResourceDeploymentConfig` for admin API calls. - Plan mode root chats can use `spawn_explore_agent` for read-only research, matching the planning prompt guidance. - The Explore override config API now reports malformed saved overrides as "treated as unset" so admins can clear them explicitly. ### Frontend - `ExploreModelOverrideSettings` component in admin agent behavior settings. Uses `ModelSelector`, handles unavailable model warnings, and supports explicit Save and Clear actions. - Malformed saved overrides show a warning and require an explicit Save to clear, instead of Clear auto-submitting behind the scenes. ### Tests - Integration: `TestExploreSubagentIsReadOnly` (full spawn flow, tool verification, prompt overlay, DB state). - Unit: tool allowlist tests for explore, plan, and default modes. - Internal: model override resolution with valid, invalid UUID, disabled, and unconfigured override scenarios. - RBAC: `dbauthz_test.go` for `GetChatExploreModelOverride` and `UpsertChatExploreModelOverride`. - API: admin set and clear, malformed stored override reporting, disabled model rejection, non-admin denial. |
||
|
|
91b35a25ee |
fix(coderd): auto-update workspace to active template version on chat start (#24424)
## Problem When a template has `require_active_version` enabled and the chat agent tries to start a workspace that is stopped on an older template version, the agent gets stuck in an infinite loop: `start_workspace` fails with a 403 (the old version is not the active version and the user lacks `ActionUpdate` on the template), then `create_workspace` sees the existing stopped workspace and tells the agent to use `start_workspace`, repeat forever. The root cause is that `chatStartWorkspace()` passes the start build request through without setting `TemplateVersionID`, so `wsbuilder` defaults to the previous build's template version — which RBAC rejects when `RequireActiveVersion` is true. ## Fix In `chatStartWorkspace()` (`coderd/exp_chats.go`), when the template's access control has `RequireActiveVersion` enabled, explicitly set `req.TemplateVersionID` to `template.ActiveVersionID` before calling `postWorkspaceBuildsInternal()`. This mirrors how the autobuild executor handles the same scenario (`coderd/autobuild/lifecycle_executor.go`). If the new active version introduces required parameters that cannot be resolved automatically (no defaults, no previous values), the build fails at parameter validation before a provisioner job is created. In that case, a clear error message tells the user to update and start the workspace from the UI instead of surfacing a raw internal error. On successful auto-update, the tool response includes `updated_to_active_version`, `update_reason`, and a human-readable `message` so the model can explain to the user what happened. <img width="782" height="122" alt="image" src="https://github.com/user-attachments/assets/289430d6-066e-41cf-bc97-cd013dcf717d" /> ### Changes - **`coderd/exp_chats.go`**: `chatStartWorkspace()` loads the template, checks `RequireActiveVersion` via `AccessControlStore`, and pins the build to the active version when required. New `isChatStartWorkspaceManualUpdateRequiredError()` classifies parameter validation failures from both the dynamic parameters path (`DiagnosticError`) and the classic path (`ErrParameterValidation` sentinel). - **`coderd/wsbuilder/wsbuilder.go`**: New `ErrParameterValidation` sentinel error, wrapped into the classic parameter validation `BuildError` so callers can use `errors.Is` instead of string matching. - **`coderd/x/chatd/chattool/startworkspace.go`**: `waitForAgentAndRespond` now returns `map[string]any` instead of `fantasy.ToolResponse`, letting the caller annotate the result (e.g. auto-update metadata) before converting. Error handling for `StartFn` checks for `httperror.Responder` errors to surface clean messages for the manual-update case. - **`coderd/x/chatd/chattool/startworkspace_test.go`**: Two new tests — `StoppedWorkspaceReportsAutoUpdate` (verifies auto-update fields in response) and `ManualUpdateRequired` (verifies clean error message without internal wrapping). ### Follow-up The manual-update error message could include a direct link to the workspace settings page, but the chattool layer does not currently have access to the deployment's access URL. Plumbing it through is straightforward but out of scope for this fix. Closes CODAGT-192 |
||
|
|
3452ab3166 |
chore: add client_type field to chats and telemetry (#24342)
Add a `chat_client_type` enum (`ui` | `api`) and `client_type` column to the `chats` table. The column defaults to `api` for new rows so API callers don't need to set it explicitly. Existing rows are backfilled to `ui`. The field flows through `CreateChatRequest`, `chatd.CreateOptions`, `InsertChat`, and is returned in the `Chat` response via `db2sdk`. <details> <summary>Implementation notes (Coder Agents generated)</summary> ### Changes **Database migration (000469)** - New enum `chat_client_type` with values `ui`, `api`. - New `client_type` column, `NOT NULL DEFAULT 'api'`. - Backfill: `UPDATE chats SET client_type = 'ui'`. **SQL query** — `InsertChat` now includes `client_type`. **SDK** — `ChatClientType` type added; `ClientType` field added to both `CreateChatRequest` (optional, defaults server-side to `api`) and `Chat` response. **Handler** — `postChats` maps the request field (defaulting to `api`) and passes it through `chatd.CreateOptions`. **Sub-agent** — Child chats inherit their parent's `client_type`. **db2sdk** — Maps the database value to the SDK type. ### Decision log - Default is `api` (not `ui`) so existing API integrations get the correct value without code changes. - Backfill sets existing rows to `ui` per requirement. - Child chats inherit `client_type` from parent rather than defaulting. </details> |
||
|
|
1cf0354f72 |
feat: add plan mode with restricted tool boundary (#24236)
> This PR was authored by Mux on behalf of Mike. ## Summary - add persistent plan mode for chats and the chat-specific plan file flow - add structured planning tools such as `ask_user_question` and `propose_plan` - keep `write_file` and `edit_files` constrained to the chat-specific plan file during plan turns - allow shell exploration in plan mode, including subagents, via `execute` and `process_output` - block implementation-oriented, provider-native, MCP, dynamic, and computer-use tools during plan turns - update the chat UI, tests, and docs for the new planning flow |
||
|
|
227f20df6a |
perf(coderd): cheaper chatd org membership checks (#24361)
This change reuses the authenticated subject's existing organization membership information during chat creation instead of issuing an `OrganizationMembers` query. The current query is still correct, so this is not required for correctness. However, `workspaceapps` already answers the same question more cheaply from the request's RBAC subject. This extracts that logic into `rbac.Subject.HasOrganizationMembership` and reuses it in both places, removing an extra database lookup from chat creation without changing the authorization behavior. I'm currently debugging a Coder agents scaletest regression where a run on April 2, 2026 with 4800 concurrent chat creations passed, while the same run on April 15, 2026 does not. We could stagger chat creation to reduce the burst, but I'd rather understand why this bottleneck appeared in the first place so we can keep making small hot-path improvements like this one instead of only smoothing over the symptom. |
||
|
|
c552f9f281 |
fix: stop group spend limits from leaking across org boundaries (#24294)
Three SQL queries (`GetUserGroupSpendLimit`, `ResolveUserChatSpendLimit`, `GetUserChatSpendInPeriod`) aggregated chat spend limits and usage globally across all organizations. A restrictive group limit in org A would bleed into org B. ## Changes - Add `organization_id` parameter to all three SQL queries in `coderd/database/queries/chats.sql` - When nil UUID is passed, queries fall back to global behavior (backward compat for HTTP dashboard endpoints) - When real org ID is passed, limits and spend are scoped to that organization - Thread `organizationID` through `ResolveUsageLimitStatus` → `checkUsageLimit` → all chatd call sites - Update dbauthz wrappers for new param structs - HTTP endpoints (`chatCostSummary`, `getMyChatUsageLimitStatus`) pass `uuid.Nil` with TODO for future org-scoped UI - Add `TestResolveUsageLimitStatus_OrgScoped` with 5 test cases covering org isolation, nil-UUID fallback, spend scoping, and user override priority Closes coder/internal#1466 > 🤖 |
||
|
|
22062ec52e |
feat: add organization scoping to chats (#23827)
Fixes https://github.com/coder/internal/issues/1436 * Adds organization_id to chats with backfill (workspace org → user org membership → default org) * No support yet for ACLs (follow-up issue) - Cross-org workspace binding rejected (both in `CreateChatRequest` and in `create_workspace` tool - Adds `OrganizationAutocomplete` to `AgentCreateForm` - Docs updated with `organization_id` in chats-api.md > 🤖 Written by a Coder Agent. Reviewed by many humans and many agents. --------- Co-authored-by: Mathias Fredriksson <mafredri@gmail.com> |
||
|
|
a62ead8588 |
fix(coderd): sort pinned chats first in GetChats pagination (#24222)
The GetChats SQL query ordered by (updated_at, id) DESC with no pin_order awareness. A pinned chat with an old updated_at could land on page 2+ and be invisible in the sidebar's Pinned section. Add a 4-column ORDER BY: pinned-first flag DESC, negated pin_order DESC, updated_at DESC, id DESC. The negation trick keeps all sort columns DESC so the cursor tuple < comparison still works. Update the after_id cursor clause to match the expanded sort key. Fix the false handler comment claiming PinChatByID bumps updated_at. |
||
|
|
36141fafad |
feat: stack insights tables vertically and paginate Pull requests table (#24198)
The "By model" and "Pull requests" tables on the PR Insights page (`/agents/settings/insights`) were side-by-side at `lg` breakpoints, and the Pull requests table was hard-capped at 20 rows by the backend. - Replaced `lg:grid-cols-2` with a single-column stacked layout so both tables span the full content width. - Removed the `LIMIT 20` from the `GetPRInsightsRecentPRs` SQL query so all PRs in the selected time range are returned. - Can add this back if we need it. If we do, we should add a little subheader above this table to indicate that we're not showing all PRs within the selected timeframe. - Added client-side pagination to the Pull requests table using `PaginationWidgetBase` (page size 10), matching the existing pattern in `ChatCostSummaryView`. - Renamed the section heading from "Recent" to "Pull requests" since it now shows the full set for the time range. <img width="1481" height="1817" alt="image" src="https://github.com/user-attachments/assets/0066c42f-4d7b-4cee-b64b-6680848edc68" /> > 🤖 PR generated with Coder Agents |
||
|
|
38d4da82b9 | refactor: send raw typed payloads over chat WebSockets (#24148) | ||
|
|
b969d66978 |
feat: add dynamic tools support for chat API (#24036)
Adds client-executed dynamic tools to the chat API. Dynamic tools are
declared by the client at chat creation time, presented to the LLM
alongside built-in tools, but executed by the client rather than chatd.
This enables external systems (Slack bots, IDE extensions, Discord bots,
CI/CD integrations) to plug custom tools into the LLM chat loop without
modifying chatd's built-in tool set.
Modeled after OpenAI's Assistants API: the chat pauses with
`requires_action` status when the LLM calls a dynamic tool, the client
POSTs results back via `POST /chats/{id}/tool-results`, and the chat
resumes.
See [this example](https://github.com/coder/coder-slackbot-poc) as a
reference for how this is used. It's highly-configurable, which would
enable creating chats from webhooks, periodically polling, or running as
a Slackbot.
<details>
<summary>Design context</summary>
### Architecture
The chatloop **exits** when it encounters dynamic tools and
**re-enters** when results arrive. No blocking channels, no pubsub for
tool results, no in-memory registry. The DB is the only coordination
mechanism.
```
Phase 1 (chatloop):
LLM response → execute built-in tools only →
Persist(assistant + built-in results) →
status = requires_action → chatloop exits
Phase 2 (POST /tool-results):
Persist(dynamic tool results) →
status = pending → wakeCh → chatloop re-enters
```
### Validation (POST /tool-results)
1. Chat status must be `requires_action` (409 if not)
2. Read chat's `dynamic_tools` → set of dynamic tool names
3. Read last assistant message → extract tool-call parts matching
dynamic tool names
4. Submitted tool_call_ids must match exactly (400 for missing/extra)
5. Persist tool-result message parts, set status to `pending`, signal
wake
### Idempotency
Tool call IDs scoped per LLM step. State machine (`requires_action` →
`pending`) is the guard. First POST wins, subsequent get 409.
### Mixed tool calls
When the LLM calls both built-in and dynamic tools in one step, built-in
tools execute immediately. Their results are persisted in phase 1.
Dynamic tool results arrive via POST in phase 2. The LLM sees all
results when the chatloop resumes.
</details>
> 🤖 Generated by Coder Agents
|
||
|
|
233343c010 |
feat: add chat and chat_files cleanup to dbpurge (#23833)
Fixes https://github.com/coder/coder/issues/23910 Adds periodic cleanup of chats and chat files to the dbpurge background goroutine, with a configurable retention period exposed in the Agent settings UI. > 🤖 Written by a Coder Agent. Reviewed by a human. |
||
|
|
d5a1792f07 |
feat: track chat file associations with chat_file_links on chats (#23537)
Needed by #23833 Adds a `chat_file_links` association table to track which files are associated with each chat. - `AppendChatFileIDs` query links a file to a chat with deduplication - `GetChatFileMetadataByIDs` query returns lightweight file metadata by IDs - Tool-created files (e.g. `propose_plan`) are linked to the chat after insert - User-uploaded files are linked to the chat when the referencing message is sent - Single-chat GET endpoint hydrates `files: ChatFileMetadata[]` on the response > 🤖 Created by Coder Agents and massaged into shape by a human. |
||
|
|
648787e739 |
feat: expose busy_behavior on chat message API (#24054)
The backend (`chatd.go`) already fully implements both `"queue"` and `"interrupt"` busy behaviors for `SendMessage`, and the `message_agent` subagent tool already leverages both internally. However the HTTP API hardcoded `"queue"` and the SDK had no way for callers to request interrupt-on-send. This adds a `ChatBusyBehavior` enum type to the SDK and an optional `busy_behavior` field on `CreateChatMessageRequest`. The HTTP handler validates the field and passes it through to `chatd.SendMessage`. Default remains `"queue"` for full backward compatibility. <details><summary>Implementation notes</summary> - `codersdk/chats.go`: New `ChatBusyBehavior` type with `ChatBusyBehaviorQueue` and `ChatBusyBehaviorInterrupt` constants. Added `BusyBehavior` field to `CreateChatMessageRequest` with `enums` tag for codegen. - `coderd/exp_chats.go`: `postChatMessages` now reads `req.BusyBehavior`, maps SDK constants to `chatd.SendMessageBusyBehavior*`, returns 400 on invalid values. - `site/src/api/typesGenerated.ts`: Auto-generated via `make gen`. - No frontend behavior changes — the field is available but unused by the UI. </details> > [!NOTE] > Generated by Coder Agents |
||
|
|
4cfbf544a0 |
feat: add per-chat system prompt option (#24053)
Adds a `system_prompt` field to `CreateChatRequest` that allows API consumers to provide custom instructions when creating a chat. The per-chat prompt is stored as a separate system message (`role=system`, `visibility=model`) in the `chat_messages` table, inserted between the deployment system prompt and the workspace awareness message. Also moves deployment system prompt resolution from the HTTP handler (`resolvedChatSystemPrompt`) into `chatd.CreateChat` where it belongs. The handler no longer assembles system prompts — `CreateOptions.SystemPrompt` is now purely the per-chat user prompt, and the deployment prompt is resolved internally by chatd. No database schema changes required. **Message insertion order:** 1. Deployment system prompt (resolved by chatd, existing) 2. Per-chat user system prompt (new, from `CreateOptions.SystemPrompt`) 3. Workspace awareness (existing) 4. Initial user message (existing) 🤖 Generated with [Coder Agents](https://coder.com/agents) |
||
|
|
a2ce74f398 |
feat: add total_runtime_ms to chat cost analytics endpoints (#24050)
Surface the aggregated `runtime_ms` from `chat_messages` through all
four cost analytics queries (summary, per-model, per-chat, per-user).
This is the key billing metric for agent compute time.
The per-chat breakdown already groups by `root_chat_id`, so subagent
runtime is automatically rolled up under the parent chat — no additional
query changes needed.
<details>
<summary>Implementation details</summary>
**SQL** (`coderd/database/queries/chats.sql`): Added
`COALESCE(SUM(cm.runtime_ms), 0)::bigint AS total_runtime_ms` to
`GetChatCostSummary`, `GetChatCostPerModel`, `GetChatCostPerChat`, and
`GetChatCostPerUser`.
**Go SDK** (`codersdk/chats.go`): Added `TotalRuntimeMs int64` to
`ChatCostSummary`, `ChatCostModelBreakdown`, `ChatCostChatBreakdown`,
and `ChatCostUserRollup`.
**Handler** (`coderd/exp_chats.go`): Wired the new field through all
converter functions and the response assembly.
**Tests** (`coderd/exp_chats_test.go`): Updated fixture to seed non-zero
`runtime_ms` values and added assertions for the new field at summary,
per-model, and per-chat levels.
</details>
> 🤖 Generated by Coder Agents
|
||
|
|
7d0a0c6495 | feat: provider key policies and user provider settings (#23751) | ||
|
|
19e44f4136 |
fix: target specific chat in MarkStale instead of broadcasting to all workspace chats (#23883)
## Problem Subagent chats were receiving git context (branch, remote origin, PR status) from their parent or sibling chats' git operations. When a git operation triggers external auth, the workspace agent sends `chat_id` identifying which chat initiated it — but this was broken at two levels: 1. **Agent side:** `CODER_CHAT_ID` was never injected into process environments. `chatd` sets `Coder-Chat-Id` HTTP headers and the agent extracts them for process isolation, but never propagated `CODER_CHAT_ID` to `cmd.Env`. So `gitaskpass` always sent an empty `chat_id`. 2. **Server side:** `workspaceAgentsExternalAuth` ignored the `chat_id` query param. `MarkStale` broadcast git context to **all** chats on the workspace via `filterChatsByWorkspaceID`. ## Fix - Inject `CODER_CHAT_ID` into `cmd.Env` in `agentproc` when the chat ID is known, so `gitaskpass` can read and forward it. - Read `chat_id` from query params in `workspaceAgentsExternalAuth` and thread it through `chatGitRef`. - Refactor `MarkStale` to accept a `MarkStaleParams` struct. When `ChatID` is provided, target only that specific chat. When empty (legacy agents, non-chat git operations), fall back to the existing workspace-wide broadcast. - Extract `markStaleSingle` helper to deduplicate the upsert+publish logic. <details><summary>Investigation notes</summary> ### Data flow before fix ``` chatd → sets Coder-Chat-Id header on agent conn agent → extracts chatID, stores on process struct agent → does NOT set CODER_CHAT_ID in cmd.Env ← gap 1 gitaskpass → reads CODER_CHAT_ID (always empty), sends chat_id="" server handler → ignores chat_id query param ← gap 2 MarkStale → broadcasts to ALL workspace chats ``` ### Data flow after fix ``` chatd → sets Coder-Chat-Id header on agent conn agent → extracts chatID, stores on process struct agent → sets CODER_CHAT_ID in cmd.Env gitaskpass → reads CODER_CHAT_ID, sends chat_id=<uuid> server handler → reads chat_id, passes to MarkStale MarkStale → targets only that specific chat ``` </details> |
||
|
|
7861fcf1f6 |
perf(coderd): stop inline-resolving diff status on every GetChat call (#23901)
## Problem
Every `GET /api/experimental/chats/{chatID}` call was blocking for
200-800ms because the `getChat` handler called `resolveChatDiffStatus`,
which unconditionally hit the git provider API (e.g. GitHub's `GET
/repos/{owner}/{repo}/pulls?head=...`) via `ResolveBranchPullRequest` —
even when the cached diff status was fresh.
This made every chat page load at `/agents/{id}` noticeably slow.
## Root cause
The call chain was:
1. `getChat` → `resolveChatDiffStatus`
2. `resolveChatDiffStatus` → `resolveChatDiffReference` →
`gp.ResolveBranchPullRequest(...)` **(external HTTP call)**
3. Only **after** the external call: `chatDiffStatusIsStale(status,
now)` check
The staleness check happened after the expensive work, so every request
paid the cost regardless of cache freshness.
## Fix
`getChat` now returns the cached `chat_diff_statuses` row directly from
the database. The background `gitsync` worker already keeps these rows
fresh (every `DiffStatusTTL = 120s`), so inline resolution was
redundant.
The `resolveChatDiffContents` endpoint (which fetches actual diff
content) still uses the full resolution path since it needs to make
provider API calls by design.
## Changes
- `getChat` reads cached diff status from DB instead of calling
`resolveChatDiffStatus`
- Remove `resolveChatDiffStatus` (dead code — no production callers)
- Remove `chatDiffStatusIsStale` and `chatDiffStatusTTL` (dead code)
- Remove `RefreshesStaleStatusWithExternalAuth` test (tested the removed
inline refresh path)
<details><summary>Decision log</summary>
- **Why not just add a staleness gate?** The background worker already
handles refreshes on the same schedule. Adding an early-return-if-fresh
would work but leaves dead code for the stale path that's never
exercised in production (the worker gets there first). Removing the
inline path entirely is simpler and eliminates the external API
dependency from the read path.
- **Why keep `resolveChatDiffContents` unchanged?** That endpoint's job
is to fetch the actual diff content from the provider, so external API
calls are inherent to its purpose.
</details>
|
||
|
|
5cba59af79 |
fix(coderd): unarchive child chats with parents (#23761)
Unarchiving a root chat now restores descendant chats in the database and emits lifecycle events for every affected chat so passive sessions converge without a full refetch. This keeps archive and unarchive symmetric at both the data and watch-stream layers by returning the affected chat family from the database, using those post-update rows for chatd pubsub fanout, and covering descendant lifecycle delivery with a watch-level regression test. Closes #23666 |
||
|
|
bbf3fbc830 |
fix(coderd/x/chatd): archive chat hard-interrupts active stream (#23758)
Archiving a chat now transitions pending or running chats to waiting before setting the archived flag. This publishes a status notification on `ChatStreamNotifyChannel` so `subscribeChatControl` cancels the active `processChat` context via `ErrInterrupted` — the same codepath used by the stop button. The `processChat` cleanup also skips queued-message auto-promotion when the chat is archived, so archiving behaves like a hard stop rather than interrupt-and-continue. Relates to https://github.com/coder/coder/issues/23666 |
||
|
|
3ce82bb885 |
feat: add chat-access site-wide role to gate chat creation (#23724)
- Add `chat-access` built-in role granting chat CRUD at User scope
- Exclude `ResourceChat` from member, org member, and org service
account `allPermsExcept` calls
- Allow system, owner, and user-admin to assign the new role
- Migration auto-assigns role to users who have ever created a chat
- Update RBAC test matrix: `memberMe` denied, `chatAccessUser` allowed
**Breaking change**: Members without `chat-access` lose chat creation
ability. Migration covers existing chat creators. Members who have never
created a chat do not get this role automatically applied.
> 🤖 This PR was created by a Coder Agent and reviewed by me.
|
||
|
|
bcdc35ee3e |
feat: add chat read/unread indicator to sidebar (#23129)
## Summary Adds read/unread tracking for chats so users can see which agent conversations have new assistant messages they haven't viewed. ## Backend Changes - Adds `last_read_message_id` column to the `chats` table (migration 000439). - Computes `has_unread` as a virtual column in `GetChatsByOwnerID` using an `EXISTS` subquery checking for assistant messages beyond the read cursor. - Exposes `has_unread` on the `codersdk.Chat` struct and auto-generated TypeScript types. - Updates `last_read_message_id` on stream connect/disconnect in `streamChat`, avoiding per-message API calls during active streaming. - Uses `context.WithoutCancel` for the deferred disconnect write so the DB update succeeds even after the client disconnects. ## Frontend Changes - Bold title (`font-semibold`) for unread chats in the sidebar. - Small blue dot indicator next to the relative timestamp. - Suppresses unread indicator for the currently active chat via `isActive` from NavLink. ## Design Decisions - Only `assistant` messages count as unread — the user's own messages don't trigger the indicator. - No foreign key on `last_read_message_id` since messages can be deleted (via rollback/truncation) and the column is just a high-water mark. - Zero API calls during streaming: exactly 2 DB writes per stream session (connect + disconnect). - Unread state refreshes on chat list load and window focus. The `watchChats` WebSocket optimistically marks non-active chats as unread on `status_change` events, but does not carry a server-computed `has_unread` field. Navigating to a chat optimistically clears its unread indicator in the cache. |
||
|
|
2312e5c428 |
feat: add manual chat title regeneration (#23633)
## Summary
Adds a "Generate new title" action that lets users manually regenerate a
chat's title using richer conversation context than the automatic
first-message title path.
## Changes
### Backend
- **New endpoint:** `POST
/api/experimental/chats/{chatID}/title/regenerate` returns the updated
Chat with a regenerated title
- **Manual title algorithm:** Extracts useful user/assistant text turns
→ selects first user turn + last 3 turns → builds context with gap
markers → renders prompt with anti-recency guidance → calls lightweight
model → normalizes output
- **Helpers:** `extractManualTitleTurns`,
`selectManualTitleTurnIndexes`, `buildManualTitleContext`,
`renderManualTitlePrompt`, `generateManualTitle` — all private, with the
public `Server.RegenerateChatTitle` method
- **SDK:** `ExperimentalClient.RegenerateChatTitle(ctx, chatID) (Chat,
error)`
- Persists title via existing `UpdateChatByID` and broadcasts
`ChatEventKindTitleChange`
### Frontend
- API client method + React Query mutation with cache invalidation
- "Generate new title" menu item (with wand icon) in both TopBar and
Sidebar dropdown menus
- Loading/disabled state while regeneration is in-flight
- Error toast on failure
- Stories updated for both menus
### Tests
- `quickgen_test.go`: Table-driven tests for all 4 helper functions
(turn extraction, index selection, context building, prompt rendering)
- `exp_chats_test.go`: Handler tests (ChatNotFound,
NotFoundForDifferentUser, NoDaemon)
## Design notes
- The existing auto-title path (`maybeGenerateChatTitle`, `titleInput`)
is completely unchanged
- Manual regeneration uses richer context (first user turn + last 3
turns + gap markers) vs the auto path's single first message
- Endpoint is experimental and marked with `@x-apidocgen {"skip": true}`
|
||
|
|
113aaa79a0 |
feat: add pinned chats with drag-to-reorder (#23615)
https://github.com/user-attachments/assets/bd5d12a1-61b3-4b7d-83b6-317bdfb60b3c ## Summary Adds pinned chats to the agents page sidebar with server-side persistence and drag-to-reorder. Users can pin/unpin chats via the context menu, and pinned chats appear in a dedicated "Pinned" section above the time-grouped list. ## Database Migration `000453_chat_pin_order`: adds `pin_order integer DEFAULT 0 NOT NULL` column on `chats` (0 = unpinned, 1+ = pinned in display order). Three SQL queries handle pin operations server-side using CTEs with `ROW_NUMBER()`: - `PinChatByID`: normalizes existing orders and appends to end - `UnpinChatByID`: sets target to 0 and compacts remaining pins - `UpdateChatPinOrder`: shifts neighbors, clamps to `[1, pinned_count]` All queries exclude archived chats. `ArchiveChatByID` clears `pin_order` on archive. The handler rejects pinning archived chats with 400. ## Backend Pin/unpin/reorder go through the existing `PATCH /api/experimental/chats/{chat}` via the `pin_order` field on `UpdateChatRequest`. The handler routes based on current pin state: `pin_order == 0` unpins, `> 0` on an already-pinned chat reorders, `> 0` on an unpinned chat appends to end. ## Frontend - `pinChat` / `unpinChat` / `reorderPinnedChat` optimistic mutations using shared `isChatListQuery` predicate - Sidebar renders Pinned section above time groups, excludes pinned chats from time groups - Pin/Unpin context menu items (hidden for child/delegated chats) - `@dnd-kit/core` + `@dnd-kit/sortable` for drag-to-reorder with `MouseSensor`, `TouchSensor`, and `KeyboardSensor` - Local pin-order override prevents flash on drop; click blocker prevents NavLink navigation after drag --- *PR generated with Coder Agents* |
||
|
|
bfee7e6245 |
fix: populate all chat fields in pubsub events (#23664)
*Problem:* `publishChatPubsubEvent` was constructing a partial
`codersdk.Chat` that omitted `LastModelConfigID` and other fields. Go's
zero-value UUID caused the sidebar to show "Default model" for chats
received via SSE.
*Solution:*
- Extracted `convertChat`/`convertChats` from `exp_chats.go` into
`db2sdk.Chat`/`db2sdk.Chats`, alongside existing `ChatMessage`,
`ChatQueuedMessage`, and `ChatDiffStatus` converters.
`publishChatPubsubEvent` now calls `db2sdk.Chat(chat, nil)` instead of
maintaining its own copy of the conversion logic
- Added backend integration test
`TestWatchChats/CreatedEventIncludesAllChatFields`
- Added frontend regression tests for nil-UUID and valid model config ID
cases
> 🤖 Created by Coder Agents, reviewed by this human.
|
||
|
|
4f063cdc47 |
feat: separate default and additional Coder Agents system prompts (#23616)
Admins can now control whether the built-in Coder Agents default system prompt is prepended to their custom instructions, rather than having the custom prompt silently replace the default. **Changes:** - New `include_default_system_prompt` boolean toggle (defaults to `true` for existing deployments) stored as a site config key — no migration needed. - GET `/api/experimental/chats/config/system-prompt` returns the toggle state, the custom prompt, and a preview of the built-in default. - PUT persists both the toggle and custom prompt atomically in a single transaction. - `resolvedChatSystemPrompt()` composes `[default?, custom?]` joined by `\n\n`, falling back to the built-in default on DB errors. - Settings UI adds a Switch toggle with conditional helper text and a "Preview" button that shows the built-in default prompt via the existing `TextPreviewDialog`. - Comprehensive test coverage: 15 subtests covering toggle behavior, prompt composition matrix, auth boundaries, and integration with chat creation. |