mirror of
https://github.com/coder/coder.git
synced 2026-06-03 13:08:25 +00:00
8a2f28fa6a2ea8bf755dd7836cf331eead9628d2
17 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
06bad73df4 |
feat: add admin-configurable advisor API, SDK, and queries (#24621)
## Summary
Add the **admin-configurable advisor configuration**: database-backed storage, SDK types, and the experimental HTTP handlers that will back the admin settings UI (which lands in later PRs). Follows the same "site-configs" pattern as Virtual Desktop.
## Motivation
The advisor needs runtime-tunable knobs (enable/disable, per-run cap, max output tokens, reasoning effort, optional model override) without a service restart or redeploy. Using the existing `site_configs` K/V table keeps this pattern consistent with other admin features and avoids a bespoke schema.
## Changes
### Database (`coderd/database/queries/siteconfig.sql`)
- `GetChatAdvisorConfig` returns the stored JSON blob (default `'{}'`) under key `agents_advisor_config`.
- `UpsertChatAdvisorConfig` uses the standard `INSERT ... ON CONFLICT` pattern.
- Regenerated via `make gen` (queries.sql.go + mocks).
### SDK (`codersdk/chats.go`)
- `AdvisorConfig` type with `Enabled`, `MaxUsesPerRun`, `MaxOutputTokens`, `ReasoningEffort` (`""` / `low` / `medium` / `high`), `ModelConfigID uuid.UUID`.
- Client methods: `ChatAdvisorConfig(ctx)` / `UpdateChatAdvisorConfig(ctx, cfg)`.
### API (`coderd/exp_chats.go`)
- `GET /api/experimental/chats/config/advisor`: reads current config; relies on `ActorFromContext` validation.
- `PUT /api/experimental/chats/config/advisor`: requires `policy.ActionUpdate` on `rbac.ResourceDeploymentConfig`.
- Handlers unmarshal `{}` to a typed zero value and re-marshal on upsert for schema stability.
- Tests in `exp_chats_test.go` cover empty defaults, round-trip update, unauthorized update, and invalid body.
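The round-trip described above (unmarshal `{}` into a typed zero value, then re-marshal before the upsert) can be sketched as follows. This is a minimal illustration, not the handler code: the JSON tag names, the string-typed `ModelConfigID` (the source uses `uuid.UUID`), and the helper `normalizeAdvisorConfig` are assumptions chosen to keep the example stdlib-only.

```go
package main

import "encoding/json"

// AdvisorConfig mirrors the field names listed for the SDK type. The JSON
// tags and the string-typed ModelConfigID are assumptions for this sketch.
type AdvisorConfig struct {
	Enabled         bool   `json:"enabled"`
	MaxUsesPerRun   int    `json:"max_uses_per_run"`
	MaxOutputTokens int64  `json:"max_output_tokens"`
	ReasoningEffort string `json:"reasoning_effort"`
	ModelConfigID   string `json:"model_config_id"`
}

// normalizeAdvisorConfig (hypothetical helper) decodes a stored blob into the
// typed struct and re-encodes it, so every known field is present in the
// blob that gets written back.
func normalizeAdvisorConfig(stored []byte) (AdvisorConfig, []byte, error) {
	var cfg AdvisorConfig
	if err := json.Unmarshal(stored, &cfg); err != nil {
		return AdvisorConfig{}, nil, err
	}
	out, err := json.Marshal(cfg)
	if err != nil {
		return AdvisorConfig{}, nil, err
	}
	return cfg, out, nil
}
```

Under these assumptions, an empty stored blob yields the zero-value config and a fully populated JSON object on write, which is one way to get the schema stability the handlers aim for.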
## Stack context
This is **PR 3 of 6** in the advisor feature stack. Consumed by:
- PR 4 (`feat/advisor-04-chatd-runtime`), which reads this config on every `runChat`.
- PR 6 (`feat/advisor-06-admin-settings-ui`), which renders the admin form.
## Scope / non-goals
- No `chatd` read path (lands in PR 4).
- No UI (lands in PR 6).
- `agents_advisor_config` remains a single-row JSON blob; we intentionally do not shard per-org/per-template yet.
## Validation
- `make gen`
- `go test ./coderd/database/... -run TestChatAdvisor`
- `go test ./coderd/... -run TestChatAdvisorConfig`
- `make lint`
---
<details>
<summary>📋 Implementation Plan (shared across the advisor stack)</summary>
# Plan: Add a Mux-style advisor tool to coder agents/chatd
## Outcome
Add a first-class `advisor` tool to agent chats in `coderd/x/chatd` that feels native to Coder:
- it is a built-in server-side tool, not an MCP/dynamic-tool workaround;
- it performs a nested **tool-less** model call for strategic advice;
- it is exposed only when eligible, and the prompt mentions it only when it is actually available;
- it is treated as a **planning-only** tool so it does not run alongside action tools in the same batch;
- it tracks usage and cost separately enough that operators can reason about them;
- it has a minimally polished UI in the Agents page;
- and it ships with explicit dogfooding evidence, including screenshots and repro videos.
## Design decisions to lock before coding
1. **Primary architecture:** native built-in tool in `chattool/`, backed by a small `chatadvisor` package.
2. **Nested model execution:** reuse chatd's existing model/provider stack for a one-step, tool-less advisor call rather than inventing a new provider pathway.
3. **Execution policy:** treat `advisor` as an exclusive/planning-only tool; mixed batches must return structured policy errors and force the model to retry cleanly.
4. **Availability:** initial rollout is for root agent chats only; disable for child/sub-agent chats until recursion/cost policy is proven.
5. **Prompt sync:** use one eligibility boolean to drive both tool registration and advisor guidance injection.
6. **Persistence/cost split:** MVP should keep advisor usage visible in result metadata and server metrics; only add DB schema if product/billing explicitly needs queryable advisor-specific cost.
7. **UI scope:** generic tool rendering is an acceptable temporary milestone during backend bring-up, but the release candidate should include a dedicated lightweight advisor renderer.
## Delivery model
The work should be executed as coordinated workstreams with one integration owner and parallel contributors for low-conflict areas. The integration owner should own `coderd/x/chatd/chatd.go` because prompt assembly, tool registration, and model resolution all converge there.
## Detailed workstreams
### Repo evidence used for this plan
<details>
<summary>Mux reference and current chatd seams</summary>
**Mux reference implementation**
- `src/node/services/tools/advisor.ts` — native advisor tool implementation.
- `src/common/constants/advisor.ts` — advisor prompt/constants and truncation policy.
- `src/common/utils/tools/tools.ts` — conditional tool registration.
- `src/node/services/streamContextBuilder.ts` — injects advisor guidance only when the tool is available.
**Current chatd seams**
- `coderd/x/chatd/chatd.go`
  - `processChat()` — tool assembly, prompt assembly, and chatloop invocation.
  - `resolveChatModel()` — current model/provider/key resolution seam.
  - `type Config struct` — server-level chatd configuration surface.
- `coderd/x/chatd/chatloop/chatloop.go`
  - `Run()` — main streaming/model loop.
  - `executeTools()` — built-in tool execution/batching seam.
- `coderd/x/chatd/chattool/` — built-in tool implementations.
- `site/src/pages/AgentsPage/components/ChatElements/tools/Tool.tsx` — tool renderer dispatch.
- `site/src/pages/AgentsPage/components/ChatConversation/messageParsing.ts` and `ConversationTimeline.tsx` — tool/result merge and rendering flow.
</details>
### Workstream map and ownership
| Workstream | Primary owner | Main files | Can run in parallel? | Done when |
|---|---|---|---|---|
| 0. Integration + gating | Integration lead | `coderd/x/chatd/chatd.go` | No; central merge lane | Tool registration, prompt sync, and model selection are wired together |
| 1. Advisor runtime + tool | Backend agent | new `coderd/x/chatd/chatadvisor/`, new `coderd/x/chatd/chattool/advisor.go` | Yes | Tool can perform a tool-less advisor call in memory and return structured results |
| 2. Planning-only execution policy | Chatloop agent | `coderd/x/chatd/chatloop/chatloop.go`, related tests | Yes | Mixed `advisor` + action-tool batches are rejected cleanly and deterministically |
| 3. Metrics/usage/config | Backend/telemetry agent | `chatd.go`, `chatloop/metrics.go`, optional config plumbing | Partially; coordinate with integration lead | Advisor usage is separately visible in metadata/metrics and limits are enforced |
| 4. Frontend rendering | Frontend agent | `site/.../tools/Tool.tsx`, new `AdvisorTool.tsx`, stories | Yes after result schema stabilizes | Advisor renders as a readable card and story tests pass |
| 5. Dogfood + QA evidence | QA agent | dev server, Storybook, dogfood output | After backend + UI are usable | Repro videos, screenshots, and a concise QA report exist |
### Parallelization rules
- **Do not split `coderd/x/chatd/chatd.go` across multiple execution agents without an integration lead.** That file owns prompt building, tool registration, model resolution, and cost persistence.
- Workstreams 1 and 2 can be developed in parallel and then stacked onto the integration branch.
- Workstream 4 should begin once the backend result schema is agreed on, even if the backend is still behind a feature flag.
- Any agent that needs to re-check Mux behavior should clone `coder/mux` into a temporary directory (for example, `$(mktemp -d)/mux`) and inspect it read-only; do not vendor or copy code from Mux directly.
## Phase 0 — Preflight and guardrails
### Goals
- Align the team on the smallest shippable architecture.
- Prevent scope creep into MCP/dynamic-tool/sub-agent variants.
- Decide upfront what is MVP vs. follow-up.
### Tasks
1. **Confirm the MVP boundary.**
- Ship a built-in advisor tool first.
- Do **not** make MCP, dynamic tools, or sub-agents the primary implementation.
- Do **not** add transient streaming phases in the first backend PR unless they fall out almost for free.
2. **Confirm local workflow hygiene before coding.**
- Ensure the repo is using the project git hooks from `scripts/githooks`.
- Do not bypass hooks with `--no-verify`.
- Use `./scripts/develop.sh` for the full dev server rather than manual build/run commands.
3. **Lock the model-selection policy.**
- **Recommended MVP:** advisor uses the same resolved provider/model/cost config as the current chat, with advisor-specific max-output and usage caps.
- **Follow-up only if required:** add a separate `AdvisorModelConfigID`-style override that resolves through the existing `configCache`/model-config path. Do not invent a new free-form `provider:model` parser if chatd already stores provider/model separately.
4. **Lock the persistence policy.**
- **Recommended MVP:** no DB migration. Persist advisor-visible metadata in the tool result and record separate metrics in memory/Prometheus.
- **Only if product/billing explicitly asks for queryable advisor cost:** add a later DB migration or usage table, following the normal `queries/*.sql` + `make gen` workflow.
5. **Create an execution ADR note in the work item or tracking doc.**
- Capture: built-in tool, tool-less nested call, root-chat-only rollout, exclusive execution policy, MVP no-DB-migration default.
### Quality gate
- Everyone on the team can state the same answers to these questions:
- Is advisor a built-in tool? **Yes.**
- Can advisor run with action tools in the same batch? **No.**
- Does advisor get tools of its own? **No.**
- Is a DB migration required for MVP? **No, unless billing insists.**
## Phase 1 — Build the advisor runtime and tool wrapper
### Goals
Create the core advisor implementation in a way that is easy to test and keeps `chattool/` thin.
### Files to add
- `coderd/x/chatd/chatadvisor/types.go`
- `coderd/x/chatd/chatadvisor/guidance.go`
- `coderd/x/chatd/chatadvisor/handoff.go`
- `coderd/x/chatd/chatadvisor/runtime.go`
- `coderd/x/chatd/chatadvisor/runner.go`
- `coderd/x/chatd/chattool/advisor.go`
### Responsibilities by file
1. **`types.go`**
- Define the input/result schema used by the tool and UI.
- Keep the result shape close to Mux so the UI and model both have predictable cases.
- Recommended result variants:
- `advice`
- `limit_reached`
- `error`
Recommended shape:
```go
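// AdvisorArgs is the input schema for the advisor tool.
type AdvisorArgs struct {
	Question string `json:"question"`
}

// AdvisorResult is the structured tool result shared by the model and the UI.
type AdvisorResult struct {
	// Type is one of "advice", "limit_reached", or "error".
	Type         string `json:"type"`
	Advice       string `json:"advice,omitempty"`
	Error        string `json:"error,omitempty"`
	AdvisorModel string `json:"advisor_model,omitempty"`
	// Note: omitempty drops a zero value, so 0 remaining uses is omitted.
	RemainingUses int `json:"remaining_uses,omitempty"`
	// AdvisorUsageResult is not defined in this plan; it is assumed to live alongside these types.
	Usage *AdvisorUsageResult `json:"usage,omitempty"`
}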
```
2. **`guidance.go`**
- Hold two strings:
- the nested advisor system prompt;
- the parent-agent guidance block to inject into the outer system prompt.
- The nested advisor prompt must say, in plain language:
- you are advising the parent agent;
- you do not address the end user directly;
- you do not claim actions happened;
- you return concise strategic guidance and tradeoffs.
3. **`runtime.go`**
- Define the per-run runtime state.
- Recommended fields:
- resolved model + model config;
- provider keys/options reused from the outer chat;
- `MaxUsesPerRun`;
- `MaxOutputTokens`;
- atomic/current call counter;
- callback(s) to obtain the current prompt snapshot and current-step snapshot;
- optional metrics/usage hook.
- Add fail-fast validation for impossible config: nil model, non-positive limits, empty prompt builders, etc.
4. **`handoff.go`**
- Build the advisor handoff message from:
- the explicit question;
- the exact prompt/messages the parent model just used;
- the current step's text/reasoning snapshot, if available;
- the most recent relevant tool outputs, if they are already in the prompt snapshot.
- **Important:** use the already-prepared outer prompt tail, not a fresh DB reload. That keeps the advisor aligned with compaction and the exact context the outer model saw.
- Apply hard truncation budgets with recent-context bias.
5. **`runner.go`**
- Execute the nested advisor call.
- **Recommended implementation:** call `chatloop.Run()` in an in-memory, one-step mode:
- `Tools: nil`
- `ProviderTools: nil`
- `MaxSteps: 1`
- `PersistStep`: capture the assistant output in memory instead of writing DB rows
- Reuse the existing provider/model/cost path instead of building a second provider runner.
- Assert that no tool definitions are passed to the nested call.
6. **`chattool/advisor.go`**
- Keep this file thin and consistent with other built-ins.
- Responsibilities:
- decode `AdvisorArgs`;
- validate `Question` is non-empty and bounded;
- call the `chatadvisor` runner;
- return a structured tool response.
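The "hard truncation budgets with recent-context bias" step for `handoff.go` could look roughly like the sketch below. The `Message` type, the byte-based budget, and the function name `truncateRecent` are all hypothetical; the plan does not specify how budgets are measured.

```go
package main

// Message is a hypothetical stand-in for one entry of the prompt snapshot.
type Message struct {
	Role string
	Text string
}

// truncateRecent keeps the newest messages whose combined text fits within
// budget bytes and returns them in their original chronological order.
// Older messages are dropped first, which is the recent-context bias.
// A message that does not fit is dropped whole; a real implementation might
// instead cut inside the message.
func truncateRecent(msgs []Message, budget int) []Message {
	if budget <= 0 {
		return nil
	}
	start := len(msgs)
	used := 0
	// Walk backwards from the newest message until the budget is exhausted.
	for i := len(msgs) - 1; i >= 0; i-- {
		n := len(msgs[i].Text)
		if used+n > budget {
			break
		}
		used += n
		start = i
	}
	return msgs[start:]
}
```

Because the function only slices the already-prepared prompt tail, it stays consistent with the requirement to avoid a fresh DB reload.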
### Defensive programming requirements
- Assert `Question` is non-empty after trimming.
- Assert runtime limits are positive.
- Assert the nested advisor call runs with zero tools/provider tools.
- Assert `AdvisorResult.Type` is one of the known variants before returning.
- Assert remaining uses never goes negative.
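A few of the assertions above can be expressed as small, independently testable helpers. This is a sketch under assumptions: the bound `maxQuestionLen` and both function names are hypothetical, and only the three result variants come from the plan.

```go
package main

import (
	"errors"
	"strings"
)

// maxQuestionLen is a hypothetical upper bound; the plan only says the
// question must be "bounded".
const maxQuestionLen = 4000

// validateQuestion trims the question and rejects empty or oversized input.
func validateQuestion(q string) (string, error) {
	q = strings.TrimSpace(q)
	if q == "" {
		return "", errors.New("advisor question must not be empty")
	}
	if len(q) > maxQuestionLen {
		return "", errors.New("advisor question exceeds the length bound")
	}
	return q, nil
}

// validResultType reports whether t is one of the known result variants.
func validResultType(t string) bool {
	switch t {
	case "advice", "limit_reached", "error":
		return true
	}
	return false
}
```

Keeping these checks outside the tool wrapper makes it easy to cover them with plain unit tests before any model call is involved.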
### Acceptance criteria
- A unit test can call the advisor tool with a fake model and receive a stable `advice` result.
- The nested advisor call is impossible to run with tools accidentally attached.
- The core logic lives in `chatadvisor/`, not embedded inside `chatd.go`.
## Phase 2 — Wire advisor into chatd and keep prompt/tool availability in sync
### Goals
Register the tool in the right place, expose it only when eligible, and inject system guidance only when the tool is present.
### Files to modify
- `coderd/x/chatd/chatd.go`
- optionally a small helper file if `chatd.go` becomes too crowded
### Tasks
1. **Compute one eligibility boolean in `processChat()`.**
Recommended inputs:
- server-level advisor enabled flag;
- root chat only (`chat.ParentChatID == uuid.Nil` or equivalent existing root/child check);
- a usable resolved model/provider exists;
- optional experiment/workspace/org gate if product wants staged rollout.
2. **Create the runtime once per outer chat run.**
- Use the model/config/keys resolved by `resolveChatModel()`.
- Reuse provider options from the current chat's `ChatModelCallConfig`.
- Set `MaxUsesPerRun` and `MaxOutputTokens` from advisor config defaults.
3. **Register the tool in the built-in tool block.**
- Insert after the skill tools and before MCP tools in `processChat()`.
- Record `builtinToolNames["advisor"] = true` so metrics stay bounded.
4. **Inject advisor guidance into the outer system prompt using the same boolean.**
- Use `chatprompt.InsertSystem()` in the same prompt assembly path that already injects user/system instructions.
- Place the block near the existing instruction insertion, before plan-path/skill context blocks.
- Wrap the guidance in an explicit tag like `<advisor-guidance>` so it is easy to spot in tests and future refactors.
5. **Keep advisor out of child chats for the first release.**
- That avoids recursion/cost blowups with `spawn_agent` / `wait_agent` flows.
- Document this explicitly in the rollout notes and tests.
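The single-boolean gating described in these tasks can be sketched as below. The `gateInputs` struct, the `assemble` helper, and the string-based tool and prompt lists are hypothetical simplifications; the real wiring would use chatd's own tool and prompt types.

```go
package main

// gateInputs collects the recommended eligibility inputs. Field names are
// assumptions made for this sketch.
type gateInputs struct {
	AdvisorEnabled   bool // server-level advisor flag
	IsRootChat       bool // no parent chat
	HasResolvedModel bool // a usable model/provider was resolved
}

// advisorEligible computes the one boolean that drives both registration
// and guidance injection.
func advisorEligible(in gateInputs) bool {
	return in.AdvisorEnabled && in.IsRootChat && in.HasResolvedModel
}

// assemble returns the tool names and system prompt blocks for a run.
// The advisor tool and its guidance block are added together or not at all.
func assemble(in gateInputs, baseTools, basePrompt []string) (tools, prompt []string) {
	tools = append([]string{}, baseTools...)
	prompt = append([]string{}, basePrompt...)
	if advisorEligible(in) {
		tools = append(tools, "advisor")
		prompt = append(prompt, "<advisor-guidance>(guidance text)</advisor-guidance>")
	}
	return tools, prompt
}
```

Deriving both outputs from one call site is what makes the "disabled means neither appears" acceptance criterion straightforward to test.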
### Acceptance criteria
- If advisor is disabled, neither the tool nor the prompt guidance appears.
- If advisor is enabled, both the tool and the prompt guidance appear.
- Root chats can use advisor; child chats cannot.
- Built-in tool names include `advisor` so metrics do not collapse it into the generic `mcp` label.
## Phase 3 — Enforce planning-only execution policy in `chatloop`
### Goals
Prevent the model from calling `advisor` and action tools in the same execution batch.
### Files to modify
- `coderd/x/chatd/chatloop/chatloop.go`
- related chatloop tests
### Recommended implementation
Keep the MVP small; do **not** build a general policy engine yet.
1. Add a minimal field to `chatloop.RunOptions`, for example:
```go
ExclusiveToolName *string
```
2. In `Run()` / `executeTools()`, detect the case where the exclusive tool appears in the same local-tool batch as any other locally executed tool.
3. When that happens, synthesize structured tool-result errors for the affected calls instead of executing anything in the batch.
- `advisor` should receive a clear error like: _advisor must be called by itself before action tools_.
- The sibling action tools should receive a paired policy error like: _this tool was skipped because advisor must run alone_.
4. Let the outer model see those tool errors and retry cleanly.
- This is simpler and safer than partial execution or hidden deferral.
- It preserves deterministic transcript history for debugging.
5. Pass the just-finished step snapshot into the tool execution context.
- The advisor runtime should be able to see the current step's text/reasoning content, because that is often the best hint about what the outer model is trying to decide.
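The mixed-batch check in steps 2 and 3 might be sketched as a pure function that either allows the batch or returns one policy error per call. The `toolCall` type and `policyErrors` name are hypothetical; the error strings reuse the wording suggested above.

```go
package main

// toolCall is a hypothetical stand-in for one model-emitted tool call.
type toolCall struct {
	ID   string
	Name string
}

// policyErrors returns nil when the batch may execute. When the exclusive
// tool shares a batch with any other tool, it returns a map from call ID to
// the policy error for that call, and nothing in the batch should run.
func policyErrors(exclusive string, batch []toolCall) map[string]string {
	if exclusive == "" {
		return nil
	}
	hasExclusive, hasOther := false, false
	for _, c := range batch {
		if c.Name == exclusive {
			hasExclusive = true
		} else {
			hasOther = true
		}
	}
	if !hasExclusive || !hasOther {
		return nil
	}
	errs := make(map[string]string, len(batch))
	for _, c := range batch {
		if c.Name == exclusive {
			errs[c.ID] = "advisor must be called by itself before action tools"
		} else {
			errs[c.ID] = "this tool was skipped because advisor must run alone"
		}
	}
	return errs
}
```

Since the function has no side effects, the same input always yields the same errors, which lines up with the goal of deterministic transcript history.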
### Why this is the right fit
- It matches the intended semantics: advisor is consulted **before** taking action.
- It avoids subtle race conditions caused by concurrent built-in tool execution.
- It keeps the behavior easy to test with fake models.
### Acceptance criteria
- A model-emitted batch containing only `advisor` succeeds.
- A model-emitted batch containing `advisor` plus any other locally executed tool returns deterministic policy errors and executes nothing.
- Non-advisor tool execution stays unchanged for normal chats.
## Phase 4 — Usage limits, metrics, and configuration
### Goals
Make advisor safe to operate without over-designing billing/storage in the first release.
### Files to modify
- `coderd/x/chatd/chatd.go`
- `coderd/x/chatd/chatloop/metrics.go` as needed
- `coderd/x/chatd/chatd.go` `Config` struct and constructor path
- optional follow-up config/db files only if a separate advisor model or persistent billing is required
### Tasks
1. **Add explicit server config knobs for MVP.**
Recommended fields on `chatd.Config` or a nested advisor config struct:
- `AdvisorEnabled bool`
- `AdvisorMaxUsesPerRun int`
- `AdvisorMaxOutputTokens int64`
2. **Track usage per outer run.**
- Reset the counter for each `processChat()` invocation.
- Return `remaining_uses` in the tool result.
- Return `limit_reached` when the cap is exhausted.
3. **Expose advisor usage metadata in the tool result.**
- Include model name and token/cost summary if available.
- Use the same `callConfig.Cost` calculation path as the outer chat for MVP if advisor reuses the same model.
4. **Record server-side metrics.**
- Count advisor invocations, failures, and latency.
- Ensure they show up under the built-in tool label `advisor`.
5. **Optional decision gate: separate advisor model.**
- If product insists on a stronger/different advisor model, add a follow-up config hook that resolves another existing chat model config through the same `configCache` path.
- Keep that out of the first landing PR unless it is required for acceptance.
6. **Optional decision gate: queryable advisor cost.**
- If this becomes required, spin a follow-up DB task:
- update `coderd/database/queries/*.sql`;
- add migration files;
- run `make gen`;
- update audit mappings if a new auditable type/field is introduced.
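The per-run usage cap in task 2 could be backed by a small atomic counter, as in this sketch. The `usageLimiter` type and its methods are hypothetical; constructing a fresh limiter for each `processChat()` invocation is one way to get the reset behavior.

```go
package main

import "sync/atomic"

// usageLimiter caps advisor calls within one outer run. It must be used via
// a pointer because it contains an atomic value.
type usageLimiter struct {
	max  int64
	used atomic.Int64
}

// newUsageLimiter returns a limiter with a fresh counter.
func newUsageLimiter(max int64) *usageLimiter {
	return &usageLimiter{max: max}
}

// acquire reserves one use. It returns the remaining uses and true on
// success, or 0 and false when the cap is exhausted (the caller would then
// return a limit_reached result). The remaining count never goes negative.
func (l *usageLimiter) acquire() (remaining int64, ok bool) {
	for {
		cur := l.used.Load()
		if cur >= l.max {
			return 0, false
		}
		if l.used.CompareAndSwap(cur, cur+1) {
			return l.max - (cur + 1), true
		}
	}
}
```

The compare-and-swap loop keeps the cap exact even if tool calls were ever executed concurrently, although the planning-only policy should make that rare.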
### Acceptance criteria
- Advisor calls are capped per outer run.
- Limit exhaustion is user-visible in the tool result.
- Metrics distinguish advisor calls from other built-in tools.
- MVP does not require a schema migration unless explicitly approved.
## Phase 5 — Frontend rendering and Storybook coverage
### Goals
Make advisor feel intentional in the Agents UI without blocking the backend on fancy streaming UI.
### Files to modify
- `site/src/pages/AgentsPage/components/ChatElements/tools/Tool.tsx`
- new `site/src/pages/AgentsPage/components/ChatElements/tools/AdvisorTool.tsx`
- Storybook story file(s) in the same tools directory
### Delivery strategy
1. **Intermediate milestone during backend bring-up:** rely on the existing generic tool renderer if needed.
- This is acceptable only as a short-lived integration checkpoint.
2. **Release milestone:** add a dedicated lightweight `AdvisorTool` renderer.
- Reuse existing primitives:
- `ToolCollapsible`
- `ToolIcon`
- `Response` for markdown/prose rendering
- `ScrollArea` if the advice can be long
- Keep styling light and consistent with the Agents page.
- Do not add unnecessary React memoization in `site/src/pages/AgentsPage/`; that area is already React-Compiler aware.
3. **Render the structured result states cleanly.**
- `advice` — readable prose/markdown with optional metadata footer.
- `limit_reached` — warning-style message.
- `error` — error state with visible fallback text.
- `running` — existing tool loading state/spinner is enough for MVP.
4. **Add Storybook coverage instead of ad-hoc component tests.**
Recommended stories:
- successful advice;
- running/loading;
- limit reached;
- error.
5. **Keep the UI contract narrow.**
- Prefer one text field like `advice` plus small metadata rather than a deeply nested schema.
- That keeps the UI resilient to prompt iteration.
### Acceptance criteria
- The advisor tool card renders readable content rather than raw quoted JSON in the final release branch.
- Running, limit, and error states are visibly distinct.
- Storybook stories and play assertions cover the new states.
- Existing tool rendering flows remain unchanged.
## Phase 6 — Automated tests and validation gates
### Backend tests to add
1. **Advisor runtime/tool tests**
- question validation;
- tool-less nested execution assertion;
- success result shaping;
- limit-reached result shaping;
- error result shaping.
2. **Prompt/gating tests in chatd**
- advisor disabled ⇒ no tool, no guidance;
- advisor enabled/root chat ⇒ tool + guidance;
- child chat ⇒ advisor absent.
3. **Chatloop policy tests**
- advisor alone runs;
- advisor + action tool mixed batch returns deterministic policy errors;
- non-advisor tools still execute normally.
4. **Usage/metrics tests**
- per-run cap resets correctly;
- builtin tool labeling includes `advisor`;
- returned metadata includes model/usage summary when available.
### Frontend tests to add
- Storybook `play()` assertions for the advisor renderer states.
- Verify expand/collapse behavior and visible fallback text.
- Verify the message timeline still renders adjacent tools correctly.
### Recommended command sequence
Run these as the implementation matures, not only at the end:
1. Backend-focused gate after phases 1–4:
- `make test RUN=TestAdvisor`
- `make test RUN=TestChatloopAdvisor`
- `make lint`
2. Frontend-focused gate after phase 5:
- `pnpm test:storybook src/pages/AgentsPage/components/ChatElements/tools/AdvisorTool.stories.tsx`
- `pnpm lint`
- `pnpm format`
3. Final repo gate before handoff:
- `make pre-commit`
- run any additional targeted `make test RUN=...` selections covering touched chatd paths
> Use the exact new test names the implementing agents create; the names above are recommended anchors, not existing tests.
## Dogfooding plan
### Principle
Dogfood the change as a real agent feature, not just a unit-tested backend. Per the dogfood and `agent-browser` skills, the reviewer should get **watchable repro videos** plus screenshots that make the behavior obvious without reading logs.
### Required setup
1. Start the full dev environment with:
- `./scripts/develop.sh`
2. If the frontend renderer changes, also start Storybook from `site/` with:
- `pnpm storybook --no-open`
3. Use `agent-browser` directly — **never `npx agent-browser`**.
4. Use named browser sessions and an output folder such as:
- `./dogfood-output/advisor/`
- with subfolders `screenshots/` and `videos/`
### Evidence protocol
For every interactive scenario below:
1. Start video recording **before** the action.
2. Capture step-by-step screenshots at human pace.
3. Capture one annotated screenshot of the final state.
4. Stop the recording.
5. Note the exact pass/fail observation in the QA report.
For static UI states (for example Storybook error/limit cards), an annotated screenshot is sufficient; video is optional but still encouraged by this project’s review preference.
### Dogfood scenarios
#### Scenario A — Happy path in the real Agents UI
**Goal:** prove that a root agent chat can invoke advisor and produce a readable recommendation before taking further action.
Steps:
1. Open the Agents page with an advisor-enabled root chat.
2. Start a repro video.
3. Send a prompt that should reasonably trigger strategic planning, such as an architecture or multi-tradeoff question.
4. Capture screenshots of:
- the prompt before send;
- the running advisor state;
- the completed advisor card and the assistant’s follow-up response.
5. Stop recording.
Pass criteria:
- advisor appears in the timeline;
- the rendered result is readable;
- the assistant can continue after consuming the advisor output.
#### Scenario B — Advisor unavailable path
**Goal:** prove the feature is truly gated.
Suggested variants (at least one is required, both are better):
- feature flag/config off;
- child/sub-agent chat.
Evidence:
- annotated screenshot of the chat/tool state showing advisor is absent;
- short video if toggling the gate live is part of the repro.
Pass criteria:
- no advisor tool is available;
- no advisor-specific prompt behavior leaks through.
#### Scenario C — UI states in Storybook
**Goal:** prove the renderer handles non-happy states cleanly.
Required story states:
- success/advice;
- running;
- limit reached;
- error.
Evidence:
- one screenshot per state;
- at least one short video showing collapse/expand behavior.
Pass criteria:
- success renders readable advice;
- limit/error have visible fallback text;
- the component behaves like the other tool cards.
#### Scenario D — Regression sweep of nearby tools
**Goal:** ensure advisor does not break the surrounding chat timeline.
Check at minimum:
- another existing built-in tool still renders correctly near advisor;
- sub-agent/tool cards still expand/collapse normally;
- no obvious console errors appear in the Agents page during the advisor flow.
Evidence:
- screenshots of adjacent tool cards;
- console/error capture if anything suspicious appears.
### `agent-browser` usage notes for the QA agent
- Prefer `agent-browser batch` for 2+ sequential commands when no intermediate parsing is needed.
- Use `snapshot -i` to discover interactive refs.
- Re-snapshot after navigation or major DOM changes.
- Avoid `wait --load networkidle` unless the page is known to go idle; prefer explicit element/text waits or short fixed waits.
- Record videos at human pace and include pauses that a reviewer can follow.
## Rollout plan
### Initial rollout
- Gate behind a server-side advisor-enabled flag.
- Enable only for selected internal/root agent chats first.
- Watch metrics for:
- invocation count;
- failure rate;
- latency;
- obvious retry loops.
### Expansion conditions
Expand beyond the initial rollout only after the following are true:
- mixed-batch policy behavior is stable;
- cost impact is understood;
- frontend UX is readable in production-like dogfood;
- no recursion surprises have appeared with sub-agent flows.
### Explicit non-goals for the first release
- advisor inside child/sub-agent chats;
- provider-agnostic streaming phase UI;
- MCP-based external advisor implementation;
- mandatory DB-backed advisor cost reporting.
## Final acceptance checklist
- [ ] `advisor` is a built-in chatd tool, not an MCP/dynamic-tool substitute.
- [ ] The nested advisor call is tool-less and bounded to one in-memory step.
- [ ] One eligibility boolean controls both tool registration and prompt guidance injection.
- [ ] Root chats can use advisor; child chats cannot in the initial rollout.
- [ ] Mixed advisor/action batches produce deterministic policy errors instead of partial execution.
- [ ] Per-run usage caps and limit-reached behavior work.
- [ ] Advisor usage is visible in metadata/metrics without forcing a DB migration for MVP.
- [ ] The Agents UI has a readable advisor card and Storybook coverage.
- [ ] Dogfooding produced screenshots and repro videos for the required scenarios.
- [ ] Validation commands (`make lint`, targeted `make test`, Storybook tests, `make pre-commit`) passed before handoff.
## Suggested PR split
1. **PR 1 — Backend foundation**
- `chatadvisor/` package
- `chattool/advisor.go`
- `chatloop` exclusive policy
- chatd gating/prompt sync
- backend tests
2. **PR 2 — Frontend + QA**
- advisor renderer
- stories/play assertions
- dogfood artifacts and QA notes
3. **PR 3 — Optional follow-ups only if demanded by stakeholders**
- separate advisor model override
- persistent advisor billing/queryability
- transient phase-stream UX
</details>
---
_Generated with [`mux`](https://github.com/coder/mux) • Model: `anthropic:claude-opus-4-7` • Thinking: `max`_
|
||
|
|
a6a8fd94d7 |
build(Makefile): enable parallel make -j gen with correct dependency graph (#22612)
`make gen` could not run with `-j` because inter-target dependency edges were missing. Multiple recipes compile `coderd/rbac` (which includes generated files like `object_gen.go`), and without explicit ordering, parallel runs produced syntax errors from mid-write reads.
Three main changes:
**Dependency graph fixes** declare the compile-time chain through `coderd/rbac` so that `object_gen.go` is written before anything that imports it is compiled. The DB generation targets use a GNU Make 4.3+ grouped target (`&:`) so Make knows `generate.sh` co-produces `querier.go`, `unique_constraint.go`, `dbmetrics`, and `dbauthz` in a single invocation. `SKIP_DUMP_SQL=1` avoids re-entrant `make` inside `generate.sh` when the Makefile already guarantees `dump.sql` is fresh.
**`scripts/atomicwrite` package** replaces `os.WriteFile` in all gen scripts with a temp-file-in-same-dir + rename pattern, preventing interrupted runs from leaving partial files.
**`.PRECIOUS` and shell atomic writes** protect git-tracked generated files from Make's default delete-on-error behavior. Since these files are committed, deletion is worse than staleness -- `git restore` is the recovery path.
CI now runs `make -j --output-sync -B gen` (~32s, down from ~85s serial).
| Scenario | Before | After |
|---|---|---|
| `make gen` (serial) | 95s | 95s |
| `make -j gen` (parallel) | race error | **22s** |
| CI `make -j --output-sync -B gen` | forced serial ~85s | **~32s** |
|
||
|
|
5073493850 |
feat(coderd/database/dbmetrics): add query_counts_total metric (#21506)
Adds a new Prometheus metric `coderd_db_query_counts_total` that tracks the total number of queries by route, method, and query name. This is aimed at helping us track down potential optimization candidates for HTTP handlers that may trigger a number of queries. It is expected to be used alongside `coderd_api_requests_processed_total` for correlation. Depends upon new middleware introduced in https://github.com/coder/coder/pull/21498 Relates to https://github.com/coder/internal/issues/1214 |
||
|
|
dc598856e3 |
chore: improve build deadline code (#19203)
- Adds/improves a lot of comments to make the autostop calculation code clearer
- Changes the behavior of the enterprise template schedule store to match the behavior of the workspace TTL endpoint when the new TTL is zero
- Fixes a bug in the workspace TTL endpoint where it could unset the build deadline, even though a max_deadline was specified
- Adds a new constraint to the workspace_builds table that enforces the deadline is non-zero and below the max_deadline if it is set
- Adds CHECK constraint enum generation to scripts/dbgen, used for testing the above constraint
- Adds Dean and Danielle as CODEOWNERS for the autostop calculation code
|
||
|
|
3c2f3d640b |
chore: remove dbmem (#18803)
Remove the in-memory database. Addresses #15109. |
||
|
|
74e1d5c4b6 |
feat: implement OAuth2 dynamic client registration (RFC 7591/7592) (#18645)
# Implement OAuth2 Dynamic Client Registration (RFC 7591/7592)
This PR implements OAuth2 Dynamic Client Registration according to RFC 7591 and Client Configuration Management according to RFC 7592. These standards allow OAuth2 clients to register themselves programmatically with Coder as an authorization server.
Key changes include:
1. Added database schema extensions to support RFC 7591/7592 fields in the `oauth2_provider_apps` table
2. Implemented `/oauth2/register` endpoint for dynamic client registration (RFC 7591)
3. Added client configuration management endpoints (RFC 7592):
   - GET/PUT/DELETE `/oauth2/clients/{client_id}`
   - Registration access token validation middleware
4. Added comprehensive validation for OAuth2 client metadata:
   - URI validation with support for custom schemes for native apps
   - Grant type and response type validation
   - Token endpoint authentication method validation
5. Enhanced developer documentation with:
   - RFC compliance guidelines
   - Testing best practices to avoid race conditions
   - Systematic debugging approaches for OAuth2 implementations
The implementation follows security best practices from the RFCs, including proper token handling, secure defaults, and appropriate error responses. This enables third-party applications to integrate with Coder's OAuth2 provider capabilities programmatically.
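As an illustration of the redirect URI validation listed under item 4, the sketch below accepts `https`, loopback `http`, and reverse-domain custom schemes of the kind native apps use. The specific rules and the name `validateRedirectURI` are assumptions made for this example; they are not taken from the PR and may differ from what Coder enforces.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// validateRedirectURI (hypothetical) applies a small, illustrative rule set
// to one entry of a client's redirect_uris metadata.
func validateRedirectURI(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("parse redirect URI: %w", err)
	}
	if u.Scheme == "" {
		return errors.New("redirect URI must be absolute")
	}
	if u.Fragment != "" {
		return errors.New("redirect URI must not contain a fragment")
	}
	switch u.Scheme {
	case "https":
		if u.Host == "" {
			return errors.New("https redirect URI needs a host")
		}
		return nil
	case "http":
		// Assumption: plain http is tolerated only for loopback hosts.
		switch u.Hostname() {
		case "localhost", "127.0.0.1", "::1":
			return nil
		}
		return errors.New("http is only allowed for loopback hosts")
	case "javascript", "data", "file":
		return errors.New("scheme is not allowed")
	default:
		// Assumption: custom schemes for native apps use a reverse-domain
		// form such as com.example.app:/callback.
		if !strings.Contains(u.Scheme, ".") {
			return errors.New("custom scheme should be reverse-domain style")
		}
		return nil
	}
}

func main() {
	for _, raw := range []string{
		"https://app.example.com/callback",
		"http://localhost:8080/callback",
		"com.example.app:/oauth/callback",
		"http://app.example.com/callback",
		"https://app.example.com/callback#token",
		"javascript:alert(1)",
	} {
		fmt.Printf("%-45s %v\n", raw, validateRedirectURI(raw))
	}
}
```

Grant type, response type, and token endpoint authentication method checks would follow the same shape: compare the submitted value against an allow-list and return a descriptive error on mismatch.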
|
||
|
|
17ddee05e5 |
chore: update golang to 1.24.1 (#17035)
- Update go.mod to use Go 1.24.1
- Update GitHub Actions setup-go action to use Go 1.24.1
- Fix linting issues with golangci-lint by:
  - Updating to golangci-lint v1.57.1 (more compatible with Go 1.24.1)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <claude@anthropic.com>
|
||
|
|
de41bd6b95 |
feat: add support for workspace app audit (#16801)
This change adds support for workspace app auditing. To avoid audit log spam, we introduce the concept of app audit sessions. An audit session is unique per workspace app, user, IP, user agent and HTTP status code. The sessions are stored in a separate table from audit logs to allow use-case specific optimizations. Sessions are ephemeral and the table does not function as a log. The logic for auditing is placed in the DBTokenProvider for workspace apps so that wsproxies are included. This is the final change affecting the API for #15139. Updates #15139 |
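The session uniqueness rule described above (one session per workspace app, user, IP, user agent and HTTP status code) can be pictured with an in-memory sketch. This is only an illustration of the deduplication idea: the change itself stores sessions in a database table, and the names `sessionKey`, `sessionTracker`, and `observe` are hypothetical. Expiry of ephemeral sessions is left out for brevity.

```go
package main

import (
	"fmt"
	"sync"
)

// sessionKey (hypothetical) holds the fields that make a session unique.
// All fields are comparable, so the struct can be used as a map key.
type sessionKey struct {
	AppID      string
	UserID     string
	IP         string
	UserAgent  string
	StatusCode int
}

// sessionTracker remembers which sessions have already been seen.
type sessionTracker struct {
	mu   sync.Mutex
	seen map[sessionKey]struct{}
}

func newSessionTracker() *sessionTracker {
	return &sessionTracker{seen: make(map[sessionKey]struct{})}
}

// observe reports whether k starts a new session, which in this sketch is
// the only time an audit log entry would be written.
func (t *sessionTracker) observe(k sessionKey) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if _, ok := t.seen[k]; ok {
		return false
	}
	t.seen[k] = struct{}{}
	return true
}

func main() {
	t := newSessionTracker()
	k := sessionKey{AppID: "app-1", UserID: "user-1", IP: "203.0.113.7", UserAgent: "curl/8.0", StatusCode: 200}
	fmt.Println(t.observe(k)) // first request: new session
	fmt.Println(t.observe(k)) // repeat request: same session, no new entry
	k.StatusCode = 404
	fmt.Println(t.observe(k)) // different status code: new session
}
```

Including the status code in the key means a run of identical successful requests collapses into one entry, while a change from success to failure still surfaces in the audit log.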
||
|
|
ccfffc6911 |
chore: add tx metrics and logs for serialization errors (#15215)
Previously, db_metrics were all or nothing. Now `InTx` metrics are always recorded, and query metrics are opt-in. Adds instrumentation and logging around serialization failures in the database. |
||
|
|
5abfe5afd0 | chore: rename dbfake to dbmem (#10432) | ||
|
|
1262eef2c0 |
feat: add support for coder_script (#9584)
* Add basic migrations
* Improve schema
* Refactor agent scripts into its own package
* Support legacy start and stop script format
* Pipe the scripts!
* Finish the piping
* Fix context usage
* It works!
* Fix sql query
* Fix SQL query
* Rename `LogSourceID` -> `SourceID`
* Fix the FE
* fmt
* Rename migrations
* Fix log tests
* Fix lint err
* Fix gen
* Fix story type
* Rename source to script
* Fix schema jank
* Uncomment test
* Rename proto to TimeoutSeconds
* Fix comments
* Fix comments
* Fix legacy endpoint without specified log_source
* Fix non-blocking by default in agent
* Fix resources tests
* Fix dbfake
* Fix resources
* Fix linting I think
* Add fixtures
* fmt
* Fix startup script behavior
* Fix comments
* Fix context
* Fix cancel
* Fix SQL tests
* Fix e2e tests
* Interrupt on Windows
* Fix agent leaking script process
* Fix migrations
* Fix stories
* Fix duplicate logs appearing
* Gen
* Fix log location
* Fix tests
* Fix tests
* Fix log output
* Show display name in output
* Fix print
* Return timeout on start context
* Gen
* Fix fixture
* Fix the agent status
* Fix startup timeout msg
* Fix command using shared context
* Fix timeout draining
* Change signal type
* Add deterministic colors to startup script logs

---------

Co-authored-by: Muhammad Atif Ali <atif@coder.com>
|
||
|
|
2ac532982d |
feat(coderd/database): generate foreign key constraints and add database.IsForeignKeyViolation (#9657)
* feat(coderd/database): generate foreign key constraints, add database.IsForeignKeyViolation
* address PR comments
|
||
|
|
22e781eced |
chore: add /v2 to import module path (#9072)
* chore: add /v2 to import module path

go mod requires semantic versioning with versions greater than 1.x

This was a mechanical update by running:
```
go install github.com/marwan-at-work/mod/cmd/mod@latest
mod upgrade
```
Migrate generated files to import /v2
* Fix gen
|
||
|
|
2fae9b0a69 | chore(scripts/dbgen): only add arg validation for dbfake (#8578) | ||
|
|
3b433181be |
chore: add custom querier functions to dbgen (#8496)
* chore: add custom querier functions to dbgen
* chore: parse package was missing some imports, so force them
|
||
|
|
b4057bd74a |
feat: make pgCoordinator generally available (#8419)
* pgCoord to GA, fix tests

Signed-off-by: Spike Curtis <spike@coder.com>
* Fix generation and coordinator delete RBAC

Signed-off-by: Spike Curtis <spike@coder.com>
* Fix fakeQuerier -> FakeQuerier

Signed-off-by: Spike Curtis <spike@coder.com>

---------

Signed-off-by: Spike Curtis <spike@coder.com>
|
||
|
|
bbb0fab1de |
chore: merge database gen scripts (#8073)
* chore: merge database gen scripts
* Fix type params gen
* Merge enum into dbgen