## Problem
Subagent chats were receiving git context (branch, remote origin, PR
status) from their parent or sibling chats' git operations. When a git
operation triggers external auth, the workspace agent sends `chat_id`
identifying which chat initiated it — but this was broken at two levels:
1. **Agent side:** `CODER_CHAT_ID` was never injected into process
environments. `chatd` sets `Coder-Chat-Id` HTTP headers and the
agent extracts them for process isolation, but never propagated
`CODER_CHAT_ID` to `cmd.Env`. So `gitaskpass` always sent an empty
`chat_id`.
2. **Server side:** `workspaceAgentsExternalAuth` ignored the `chat_id`
query param. `MarkStale` broadcast git context to **all** chats on
the workspace via `filterChatsByWorkspaceID`.
## Fix
- Inject `CODER_CHAT_ID` into `cmd.Env` in `agentproc` when the chat
ID is known, so `gitaskpass` can read and forward it.
- Read `chat_id` from query params in `workspaceAgentsExternalAuth`
and thread it through `chatGitRef`.
- Refactor `MarkStale` to accept a `MarkStaleParams` struct. When
`ChatID` is provided, target only that specific chat. When empty
(legacy agents, non-chat git operations), fall back to the existing
workspace-wide broadcast.
- Extract `markStaleSingle` helper to deduplicate the upsert+publish
logic.
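
The agent-side injection can be pictured with a minimal sketch. `withChatID` is a hypothetical helper, not the actual `agentproc` code; only the `CODER_CHAT_ID` variable name comes from the description above.

```go
package main

import "strings"

// chatIDEnvVar is the variable the description says gitaskpass reads.
const chatIDEnvVar = "CODER_CHAT_ID"

// withChatID returns env with CODER_CHAT_ID set to chatID. When chatID is
// empty, env is returned unchanged, so non-chat processes keep sending an
// empty chat_id. This is an illustrative helper, not the agentproc code.
func withChatID(env []string, chatID string) []string {
	if chatID == "" {
		return env
	}
	prefix := chatIDEnvVar + "="
	out := make([]string, 0, len(env)+1)
	for _, kv := range env {
		// Drop any inherited value so the child sees exactly one entry.
		if strings.HasPrefix(kv, prefix) {
			continue
		}
		out = append(out, kv)
	}
	return append(out, prefix+chatID)
}
```

A caller would assign the result to `cmd.Env` before starting the process.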
<details><summary>Investigation notes</summary>
### Data flow before fix
```
chatd → sets Coder-Chat-Id header on agent conn
agent → extracts chatID, stores on process struct
agent → does NOT set CODER_CHAT_ID in cmd.Env ← gap 1
gitaskpass → reads CODER_CHAT_ID (always empty), sends chat_id=""
server handler → ignores chat_id query param ← gap 2
MarkStale → broadcasts to ALL workspace chats
```
### Data flow after fix
```
chatd → sets Coder-Chat-Id header on agent conn
agent → extracts chatID, stores on process struct
agent → sets CODER_CHAT_ID in cmd.Env
gitaskpass → reads CODER_CHAT_ID, sends chat_id=<uuid>
server handler → reads chat_id, passes to MarkStale
MarkStale → targets only that specific chat
```
</details>
## Problem
Every `GET /api/experimental/chats/{chatID}` call was blocking for
200-800ms because the `getChat` handler called `resolveChatDiffStatus`,
which unconditionally hit the git provider API (e.g. GitHub's `GET
/repos/{owner}/{repo}/pulls?head=...`) via `ResolveBranchPullRequest` —
even when the cached diff status was fresh.
This made every chat page load at `/agents/{id}` noticeably slow.
## Root cause
The call chain was:
1. `getChat` → `resolveChatDiffStatus`
2. `resolveChatDiffStatus` → `resolveChatDiffReference` →
`gp.ResolveBranchPullRequest(...)` **(external HTTP call)**
3. Only **after** the external call: `chatDiffStatusIsStale(status,
now)` check
The staleness check happened after the expensive work, so every request
paid the cost regardless of cache freshness.
## Fix
`getChat` now returns the cached `chat_diff_statuses` row directly from
the database. The background `gitsync` worker already keeps these rows
fresh (every `DiffStatusTTL = 120s`), so inline resolution was
redundant.
The `resolveChatDiffContents` endpoint (which fetches actual diff
content) still uses the full resolution path since it needs to make
provider API calls by design.
## Changes
- `getChat` reads cached diff status from DB instead of calling
`resolveChatDiffStatus`
- Remove `resolveChatDiffStatus` (dead code — no production callers)
- Remove `chatDiffStatusIsStale` and `chatDiffStatusTTL` (dead code)
- Remove `RefreshesStaleStatusWithExternalAuth` test (tested the removed
inline refresh path)
<details><summary>Decision log</summary>
- **Why not just add a staleness gate?** The background worker already
handles refreshes on the same schedule. Adding an early-return-if-fresh
would work but leaves dead code for the stale path that's never
exercised in production (the worker gets there first). Removing the
inline path entirely is simpler and eliminates the external API
dependency from the read path.
- **Why keep `resolveChatDiffContents` unchanged?** That endpoint's job
is to fetch the actual diff content from the provider, so external API
calls are inherent to its purpose.
</details>
Removes 6 fragile `require.Equal(t, codersdk.ChatStatusPending,
chat.Status)` assertions from chat relay and creation tests.
**Root cause**: In HA tests with two replicas sharing the same DB, the
worker can acquire a just-created chat (flipping `pending → running` via
`AcquireChats`) before the HTTP response reaches the test. All affected
tests already synchronize via `require.Eventually` waiting for `running`
status, making the initial assertion both redundant and racy.
- Remove 5 assertions in `enterprise/coderd/exp_chats_test.go` (all
`TestChatStreamRelay` subtests)
- Remove 1 assertion in `coderd/exp_chats_test.go` (`TestPostChats`)
- An existing comment in `TestPostChats/Success` already documents this
exact race
Fixes flake:
https://github.com/coder/coder/actions/runs/23807597632/job/69385425724
> 🤖 Written by a Coder Agent. Will be reviewed by a human.
Unarchiving a root chat now restores descendant chats in the database
and emits lifecycle events for every affected chat so passive sessions
converge without a full refetch.
This keeps archive and unarchive symmetric at both the data and
watch-stream layers by returning the affected chat family from the
database, using those post-update rows for chatd pubsub fanout, and
covering descendant lifecycle delivery with a watch-level regression
test.
Closes #23666
Fixes flaky `TestOpenAIReasoningWithWebSearchRoundTripStoreFalse` and
`TestOpenAIReasoningWithWebSearchRoundTrip`.
## Changes
- Gate the `processChat` control subscriber's cancel callback behind a
`chan struct{}` that is closed after publishing `"running"` status
- Add `TestGatedControlCancel` with 4 subtests exercising the gate logic
<details>
<summary>Root cause analysis</summary>
`SendMessage` publishes a `"pending"` notification on
`chat:stream:<chatID>` via PostgreSQL `NOTIFY`. `processChat` subscribes
to the same channel for control signals. Due to async NOTIFY delivery,
the `"pending"` notification can arrive at the control subscriber
**after** it registers its queue — even though it was published
**before**. `shouldCancelChatFromControlNotification("pending")` returns
`true`, immediately self-interrupting the processor before it does any
work.
The fix gates the cancel callback behind a channel that starts open and
is closed only after `processChat` publishes `"running"` status. Until
then the callback is a no-op, so stale notifications from before
initialization are harmlessly ignored. `close()` provides a
happens-before guarantee in the Go memory model.
</details>
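
A minimal sketch of the gating idea, assuming a wrapper function named `gatedCancel` (a hypothetical name; the real callback wiring in `processChat` is not shown in this description):

```go
package main

// gatedCancel wraps cancel so that it does nothing until gate is closed.
// Receiving from a closed channel always succeeds immediately, so after
// close(gate) every call reaches cancel.
func gatedCancel(gate <-chan struct{}, cancel func()) func() {
	return func() {
		select {
		case <-gate:
			cancel()
		default:
			// Gate not closed yet: the notification predates
			// initialization, so it is ignored.
		}
	}
}
```

The processor would create `gate := make(chan struct{})`, register the wrapped callback, publish `"running"`, and then call `close(gate)`.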
> 🤖 Written by a Coder Agent. Reviewed by a human.
Replaces the generic red `ErrorAlert` ("Forbidden.") with a proactive
permission check and friendly info alert when a user lacks the
`agents-access` role.
- Add `createChat` permission check to `permissions.json` using
`owner_id: "me"`
- Handle `"me"` owner substitution in `renderPermissions` (SSR path)
- Pass `canCreateChat` from `useAuthenticated().permissions` into
`AgentCreateForm`
- Show `ChatAccessDeniedAlert` and disable input immediately (no need to
trigger a 403 first)
- Also catch 403 errors as a fallback in case permissions aren't yet
loaded
- Add `ForbiddenNoAgentsRole` Storybook story with `play` assertions
- Add `TestRenderPermissionsResolvesMe` Go test to pin the `"me"`
sentinel substitution
<details><summary>Implementation plan & decision log</summary>
- Uses the existing `permissions.json` + `checkAuthorization` system
rather than a separate API call
- `owner_id: "me"` is resolved to the actor's ID by both the auth-check
API endpoint and the SSR `renderPermissions` function
- Go test uses a real `rbac.StrictCachingAuthorizer` (not a mock) so it
verifies both the sentinel substitution and the RBAC role evaluation
end-to-end
- Alert follows the exact same `Alert` pattern as the 409 usage-limit
block
- Uses `severity="info"` and links to the getting-started docs Step 3
- Textarea is disabled proactively so the user never sees the scary
generic error
</details>
> 🤖 Created by a Coder Agent and will be reviewed by a human.
Registers a new aibridge provider for ChatGPT by reusing the existing
OpenAI provider with a different `Name` and `BaseURL`
(https://chatgpt.com/backend-api/codex). The ChatGPT backend API is
OpenAI-compatible, so no new provider type is needed.
ChatGPT authenticates exclusively via per-user OAuth JWTs (BYOK mode) —
no centralized API key is configured. The OpenAI provider already
handles this: when no key is set, it falls through to the bearer token
from the request's Authorization header.
Depends on #23811
## Description
Adds support for multiple Copilot provider instances to route requests to different Copilot upstreams (individual, business, enterprise). Each instance has its own name and base URL, enabling per-upstream metrics, logs, circuit breakers, API dump, and routing.
## Changes
* Add Copilot business and enterprise provider names and host constants
* Register three Copilot provider instances in aibridged (default, business, enterprise)
* Update `defaultAIBridgeProvider` in `aibridgeproxy` to route new Copilot hosts to their corresponding providers
## Related
* Depends on: https://github.com/coder/aibridge/pull/240
* Closes: https://github.com/coder/aibridge/issues/152
Note: documentation changes will be added in a follow-up PR.
_Disclaimer: initially produced by Claude Opus 4.6, heavily modified and reviewed by @ssncferreira._
Archiving a chat now transitions pending or running chats to waiting
before setting the archived flag. This publishes a status notification
on `ChatStreamNotifyChannel` so `subscribeChatControl` cancels the
active `processChat` context via `ErrInterrupted` — the same codepath
used by the stop button.
The `processChat` cleanup also skips queued-message auto-promotion when
the chat is archived, so archiving behaves like a hard stop rather than
interrupt-and-continue.
Relates to https://github.com/coder/coder/issues/23666
_Disclaimer: produced using Claude Opus 4.6, reviewed by me, and
validated against Dogfood dataset._
The `ListAIBridgeSessions` query materialized and aggregated all
matching interceptions before paginating, then ran expensive
token/prompt lookups across the full dataset. For a page of 25 sessions
against ~200k interceptions (our dogfood dataset), this meant:
- Three CTEs scanning all rows (filtered_interceptions, session_tokens,
session_root)
- ARRAY_AGG(fi.id) collecting every interception ID per session
- Lateral prompt lookup via ANY(array_of_all_ids) running for every
session, not just the page
- ~90MB of disk sorts and JIT compilation kicking in
The fix restructures the query to paginate first and enrich afterwards:
a single CTE groups interceptions into sessions with only cheap
aggregates (MIN, MAX, COUNT), applies cursor pagination and LIMIT, and
then lateral joins fetch metadata, tokens, and prompts for just the
~25-row page.
Measured against 220k interceptions / 160k sessions:
| Metric | Before | After |
|--------------------|--------|-------|
| Execution time | 1800ms | 185ms |
| Shared buffer hits | 737k | 2.6k |
| Disk sort spill | 86MB | 16MB |
| Lateral loops | 160k | 25 |
https://grafana.dev.coder.com/goto/fbODPGtvR?orgId=1

The query results are identical, just _much_ faster.
---
Also includes some additional tests which I added prior to refactoring
the query to ensure no regressions on edge-cases.
---------
Signed-off-by: Danny Kopping <danny@coder.com>
The flaky test assumed the second streamed OpenAI request had already
been captured when the chat status event arrived. In practice, the
capture server can record that second request slightly later, which
intermittently left `streamRequestCount` at `1`.
This change waits for the second captured request before asserting on
the follow-up payload and relaxes the count check to a sanity check. The
test still verifies the `store=false` round-trip behavior without
depending on that timing race.
Fixes coder/internal#1433
- Add `chat-access` built-in role granting chat CRUD at User scope
- Exclude `ResourceChat` from member, org member, and org service
account `allPermsExcept` calls
- Allow system, owner, and user-admin to assign the new role
- Migration auto-assigns role to users who have ever created a chat
- Update RBAC test matrix: `memberMe` denied, `chatAccessUser` allowed
**Breaking change**: Members without `chat-access` lose chat creation
ability. Migration covers existing chat creators. Members who have never
created a chat do not get this role automatically applied.
> 🤖 This PR was created by a Coder Agent and reviewed by me.
## Summary
Fixes three flaky chatd tests that intermittently fail due to timing
races with the background run loop.
Closes coder/internal#1428
## Root Cause
`CreateChat` and `PromoteQueued` call `signalWake()` which writes to
`wakeCh`, triggering `processOnce` immediately. Even though
`newTestServer` sets `PendingChatAcquireInterval: testutil.WaitLong` to
prevent ticker-based polling, the wake channel bypasses this. This
causes `processOnce` to acquire and process the chat concurrently with
the test's manual DB updates and assertions.
### Failing tests
| Test | Failure | Cause |
|------|---------|-------|
| `TestPromoteQueuedAllowsAlreadyQueuedMessageWhenUsageLimitReached` | `expected: "pending", actual: "running"` | Wake from `CreateChat` races with manual `UpdateChatStatus`; wake from `PromoteQueued` acquires the chat before the status assertion |
| `TestSendMessageInterruptBehaviorQueuesAndInterruptsWhenBusy` | `should have 1 item(s), but has 2` | Wake from `CreateChat` triggers `processChat` which auto-promotes a queued message, adding an extra row to `chat_messages` |
| `TestSubscribeNoPubsubNoDuplicateMessageParts` | `Condition satisfied` (duplicate events) | Pre-existing `WaitGroup.Add/Wait` race in the `Eventually` + `WaitUntilIdleForTest` pattern |
## Fix
Introduces a `waitForChatProcessed` helper that:
1. Polls until the chat reaches a **terminal state** (not pending AND
not running)
2. Then calls `WaitUntilIdleForTest` to wait for the inflight
`WaitGroup`
Waiting for a terminal state (not just "not pending") avoids a
`sync.WaitGroup` `Add/Wait` race: `AcquireChats` updates the DB status
to `running` **before** `processOnce` calls `inflight.Add(1)`. Checking
only `status != pending` could return while `Add(1)` hasn't happened
yet, causing `Wait()` to return prematurely.
### Per-test changes
- **`TestSendMessageInterruptBehaviorQueuesAndInterruptsWhenBusy`**:
Call `waitForChatProcessed` after `CreateChat` before manually setting
running status
- **`TestPromoteQueuedAllowsAlreadyQueuedMessageWhenUsageLimitReached`**:
Call `waitForChatProcessed` after `CreateChat`; remove the inherently
racy `status == pending` assertion after `PromoteQueued` (the wake
immediately acquires the chat). Key assertions on promoted message,
queue state, and message count remain.
- **`TestSubscribeNoPubsubNoDuplicateMessageParts`**: Replace inline
`Eventually` with the safer `waitForChatProcessed` helper
## Verification
All three tests pass 150 consecutive executions with `-race -count=10`
across 15 runs (0 failures).
Adds a nullable JSONB column `last_injected_context` to the `chats`
table that stores the most recently persisted injected context parts
(AGENTS.md context-file and skill message parts). The column is updated
only when `persistInstructionFiles()` runs — on first workspace attach
or when the agent changes — so there are no redundant writes on
subsequent turns.
Internal fields (`ContextFileContent`, `ContextFileOS`,
`ContextFileDirectory`, `SkillDir`) are stripped at write time so the
column only holds small metadata. No stripping needed on the read path.
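
As a rough illustration of stripping at write time, the sketch below uses a simplified stand-in for the message part type. The struct, the `ContextFilePath` field, and the lowercase method names are assumptions for this example; only the four stripped field names and the `SkillName` field come from the PR descriptions.

```go
package main

// part is a simplified stand-in for codersdk.ChatMessagePart.
type part struct {
	Type            string
	ContextFilePath string // hypothetical metadata field that is kept
	SkillName       string

	// Internal fields that should never reach the column.
	ContextFileContent   string
	ContextFileOS        string
	ContextFileDirectory string
	SkillDir             string
}

// stripInternal returns a copy of p with the internal fields cleared.
func (p part) stripInternal() part {
	p.ContextFileContent = ""
	p.ContextFileOS = ""
	p.ContextFileDirectory = ""
	p.SkillDir = ""
	return p
}

// stripAll applies stripInternal to every part without mutating the input.
func stripAll(parts []part) []part {
	out := make([]part, len(parts))
	for i, p := range parts {
		out[i] = p.stripInternal()
	}
	return out
}
```

Because the copy is made before persisting, the in-memory parts used for the current turn keep their full content.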
<details>
<summary>Implementation notes</summary>
- New migration `000456` adds nullable `last_injected_context JSONB`
column.
- New SQL query `UpdateChatLastInjectedContext` writes the column
without touching `updated_at`.
- `persistInstructionFiles()` strips internal fields from parts via
`StripInternal()` before persisting.
- Sentinel path (no AGENTS.md) persists skill-only parts when skills
exist.
- `codersdk.Chat` exposes `LastInjectedContext []ChatMessagePart`
(omitempty).
- `db2sdk.Chat()` passes through the already-clean data.
</details>
## Problem
`aibridgeproxyd` sends `X-AI-Bridge-Request-Id` on every MITM request to
`aibridged` for cross-service log correlation, but aibridged never reads
it. The header is silently forwarded to upstream LLM providers.
## Changes
* Renamed the header to `X-Coder-AI-Governance-Request-Id` to match the
existing `X-Coder-AI-Governance-*` convention.
* `aibridged` now extracts the header, logs it and strips it before
forwarding upstream.
* Added `TestServeHTTP_StripInternalHeaders` to verify no `X-Coder-*`
headers leak to upstream
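
The extract-then-strip step might look roughly like the following. `takeRequestID` is a hypothetical helper name; the header name is the renamed one from the list above, and the real handler may strip additional `X-Coder-*` headers.

```go
package main

import "net/http"

// requestIDHeader is the renamed correlation header from this change.
const requestIDHeader = "X-Coder-AI-Governance-Request-Id"

// takeRequestID returns the correlation ID so the caller can log it, and
// removes the header so it is not forwarded to the upstream provider.
func takeRequestID(h http.Header) string {
	id := h.Get(requestIDHeader)
	h.Del(requestIDHeader)
	return id
}
```

Both `Get` and `Del` canonicalize the key, so the lookup does not depend on how the sender capitalized the header.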
Adds suffix-based agent selection for chatd. Template authors can direct
chat traffic to a specific root workspace agent by naming it with the
`-coderd-chat` suffix (for example, `coder_agent "dev-coderd-chat"`).
When no suffix match exists, chatd falls back to the first root agent by
`DisplayOrder`, then `Name`. Multiple suffix matches return an error.
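
The selection rule can be sketched as below. The `agent` struct, error values, and function name are hypothetical simplifications; the actual `agentselect` package may differ.

```go
package main

import (
	"errors"
	"sort"
	"strings"
)

// chatAgentSuffix is the naming suffix described above.
const chatAgentSuffix = "-coderd-chat"

// agent models only the fields the selection rule needs.
type agent struct {
	Name         string
	DisplayOrder int
}

var (
	errNoAgents  = errors.New("no root agents")
	errAmbiguous = errors.New("multiple agents match the chat suffix")
)

// selectChatAgent prefers the single suffix match, errors on several, and
// otherwise falls back to the first agent by DisplayOrder, then Name.
func selectChatAgent(agents []agent) (agent, error) {
	if len(agents) == 0 {
		return agent{}, errNoAgents
	}
	var matches []agent
	for _, a := range agents {
		if strings.HasSuffix(a.Name, chatAgentSuffix) {
			matches = append(matches, a)
		}
	}
	switch len(matches) {
	case 1:
		return matches[0], nil
	case 0:
		// No suffix match: use the ordering fallback below.
	default:
		return agent{}, errAmbiguous
	}
	// Sort a copy so the caller's slice order is left untouched.
	sorted := append([]agent(nil), agents...)
	sort.SliceStable(sorted, func(i, j int) bool {
		if sorted[i].DisplayOrder != sorted[j].DisplayOrder {
			return sorted[i].DisplayOrder < sorted[j].DisplayOrder
		}
		return sorted[i].Name < sorted[j].Name
	})
	return sorted[0], nil
}
```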
The selection logic lives in `coderd/x/chatd/internal/agentselect` and
is shared by chatd core plus the workspace chat tools so all chat entry
points pick the same agent deterministically.
No database migrations, API contract changes, or provider changes. The
experimental sandbox template was split out to #23777.
This fixes a flaky `TestConfigCache_UserPrompt_ExpiredEntryRefetches` by
making the seeded user prompt entry unambiguously expired before the
cache lookup runs.
The test previously inserted a `tlru` entry with a zero TTL, which
depends on `Set` and `Get` landing in different clock ticks. Switching
that seed entry to a negative TTL keeps the bounded `tlru` cache
behavior while removing the same-tick race.
Closes https://github.com/coder/internal/issues/1432
## Summary
Skills are now discovered once on the first turn (or when the workspace
agent changes) and persisted as `skill` message parts alongside
`context-file` parts. On subsequent turns, the skill index is
reconstructed from persisted parts instead of re-dialing the workspace
agent.
This makes skills consistent with the AGENTS.md pattern and is
groundwork for a future `/context` endpoint that surfaces loaded
workspace context to the frontend.
## Changes
- Add `skill` `ChatMessagePartType` with `SkillName` and
`SkillDescription` fields
- Extend `persistInstructionFiles` to also discover and persist skills
as parts
- Add `skillsFromParts()` to reconstruct skill index from persisted
parts on subsequent turns
- Update `runChat()` to use `skillsFromParts` instead of re-dialing
workspace for skills
- Frontend: handle new `skill` part type (skip rendering, hide
metadata-only messages)
## Before / After
| | AGENTS.md | Skills |
|---|---|---|
| **Before** | Persist as `context-file` parts, reconstruct from parts | In-memory `skillsCache` only, re-dial workspace on cache miss |
| **After** | Persist as `context-file` parts, reconstruct from parts | Persist as `skill` parts, reconstruct from parts |
The in-memory `skillsCache` remains for `read_skill`/`read_skill_file`
tool calls that need full skill bodies on demand.
<details><summary>Design context</summary>
This is the first step toward a unified workspace context
representation. Currently:
- Context files are persisted as message parts (works)
- Skills were only in-memory (inconsistent)
- Workspace MCP servers are cached in-memory (future work)
Persisting skills as parts means a future `/context` endpoint can query
both context files and skills from the same message parts in the DB,
without depending on ephemeral server-side caches.
</details>
## Problem
Commit 386b449 (PR #23745) changed the `OneWayWebSocketEventSender`
event channel from unbuffered to buffered(64) to reduce chat streaming
latency. This introduced a nondeterministic race in `sendEvent`:
```go
sendEvent := func(event codersdk.ServerSentEvent) error {
select {
case eventC <- event: // buffered channel — almost always ready
case <-ctx.Done(): // also ready after cancellation
}
return nil
}
```
After context cancellation, Go's `select` randomly picks between two
ready cases, so `send()` sometimes returns `nil` instead of `ctx.Err()`.
With the old unbuffered channel the send case was rarely ready (no
reader), masking the bug.
## Fix
Add a priority `select` that checks `ctx.Done()` before attempting the
channel send:
```go
select {
case <-ctx.Done():
return ctx.Err()
default:
}
select {
case eventC <- event:
case <-ctx.Done():
return ctx.Err()
}
```
This is the standard Go pattern for prioritizing one channel over
another. When the context is already cancelled, the first select returns
immediately. The second select still handles the case where cancellation
happens concurrently with the send.
## Verification
- Ran the flaky test 20× in a loop (`-count=20`): all passed
- Ran the full `TestOneWayWebSocketEventSender` suite 5× (`-count=5`):
all passed
- Ran the complete `coderd/httpapi` test package: all passed
Fixes coder/internal#1429
Previously, generating a new agent title used a page-global pending
state, so one in-flight regeneration disabled the action for every chat
in the Agents UI.
This change tracks regenerations by chat ID, updates the Agents page
contracts to use `regeneratingTitleChatIds`, and adds sidebar story
coverage that proves only the active chat is disabled.
Previously, when a user sent a message, there was a 0–1000ms (avg
~500ms) polling delay before processing began.
`SendMessage`/`CreateChat`/`EditMessage` set `status='pending'` in the
DB and returned, but nothing woke the processing loop — it was a blind
1-second ticker.
## Changes
**Event-driven acquisition (main change):** Adds a `wakeCh` channel to
the chatd `Server`. `CreateChat`, `SendMessage`, `EditMessage`, and
`PromoteQueued` call `signalWake()` after committing their transactions,
which wakes the run loop to call `processOnce` immediately. The 1-second
ticker remains as a fallback safety net for edge cases (stale recovery,
missed signals).
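
A minimal sketch of a wake channel with a ticker fallback. The `server` type here is a stand-in, and the buffer size of 1 with a non-blocking send is an assumption about how signals are coalesced, not a confirmed detail of chatd.

```go
package main

import (
	"context"
	"time"
)

// server is a stand-in for the chatd Server; only the wake channel is modeled.
type server struct {
	wakeCh chan struct{}
}

func newServer() *server {
	// Capacity 1 (assumed) lets several wake signals coalesce into one.
	return &server{wakeCh: make(chan struct{}, 1)}
}

// signalWake never blocks: if a wake is already pending, the new one is
// dropped because the pending one will trigger the same work.
func (s *server) signalWake() {
	select {
	case s.wakeCh <- struct{}{}:
	default:
	}
}

// run calls processOnce on every wake signal and on every tick, so a missed
// signal is recovered at the next tick at the latest.
func (s *server) run(ctx context.Context, interval time.Duration, processOnce func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-s.wakeCh:
			processOnce()
		case <-ticker.C:
			processOnce()
		}
	}
}
```

Callers would invoke `signalWake()` only after their transaction commits, so the loop never wakes up before the `pending` row is visible.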
**Buffer WebSocket write channel:** Changes the
`OneWayWebSocketEventSender` event channel from unbuffered to buffered
(64), decoupling the event producer from WebSocket write speed. The
existing 10s write timeout guards against stuck connections.
<details><summary>Implementation plan & analysis</summary>
The full latency analysis identified these sources of delay in the
streaming pipeline:
1. **Chat acquisition polling** — 0–1000ms (avg 500ms) dead time per
message. Fixed by wake channel.
2. **Unbuffered WebSocket write channel** — each token blocked on the
previous WS write completing. Fixed by buffering.
3. **PersistStep DB transaction per step** — `FOR UPDATE` lock + batch
insert. Not addressed in this PR (medium risk, would overlap DB write
with next provider TTFB).
4. **Multi-hop channel pipeline** — 4 channel hops per token. Not
addressed (medium complexity).
</details>
<details><summary>Test stabilization notes</summary>
`signalWake()` causes the chatd daemon to process chats immediately
after creation/send/edit, which exposed timing assumptions in several
tests that expected chats to remain in `pending` status long enough to
assert on. These tests were updated with `require.Eventually` +
`WaitUntilIdleForTest` patterns to wait for processing to settle before
asserting.
The race detector (`test-go-race-pg`) shows failures in
`TestCreateWorkspaceTool_EndToEnd` and `TestAwaitSubagentCompletion` —
these appear to be pre-existing races in the end-to-end chat flow that
are now exercised more aggressively because processing starts
immediately instead of after a 1s delay. Main branch CI (race detector)
passes without these changes.
</details>
Short prompts were producing title-generation meta responses such as "I
am a title generator" and prompt-echo titles. This rewrites the
automatic and manual title prompts to be shorter, less self-referential,
and more focused on returning only the title text.
The change also removes the broader post-generation guard layer, updates
manual regeneration to send real conversation text instead of a meta
instruction, and keeps regression coverage focused on the slimmer prompt
contract.
Closes #22136
This pull request adds a `<ClientFilter />` to the `Request Logs` page
for AI Bridge, letting the user pick a client to filter by. The backend
can already filter against multiple clients at once, but the frontend
doesn't yet have a nice way of supporting this (future improvement).
<img width="1447" height="831" alt="image"
src="https://github.com/user-attachments/assets/0be234e2-25f2-4a89-b971-d74817395da1"
/>
---------
Co-authored-by: Jeremy Ruppel <jeremy.ruppel@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds skill discovery and tools to chatd so the agent can discover and
load `.agents/skills/` from workspaces, following the same pattern as
AGENTS.md instruction loading and MCP tool discovery.
## What changed
### `chattool/skill.go` — discovery, loading, and tools
- **DiscoverSkills** — walks `.agents/skills/` via `conn.LS()` +
`conn.ReadFile()`, parses SKILL.md frontmatter (name + description),
validates kebab-case names match directory names, silently skips
broken/missing entries.
- **FormatSkillIndex** — renders a compact `<available-skills>` XML
block for system prompt injection (~60 tokens for 3 skills). Progressive
disclosure: only names + descriptions in context, full body loaded on
demand.
- **LoadSkillBody** / **LoadSkillFile** — on-demand loading with path
traversal protection and size caps (64KB for SKILL.md, 512KB for
supporting files).
- **read_skill** / **read_skill_file** tools — `fantasy.AgentTool`
implementations following the same pattern as ReadFile and
WorkspaceMCPTool. Receive pre-discovered `[]SkillMeta` via closure to
avoid re-scanning on every call.
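
The name validation and path traversal protection could be approximated as follows. The regular expression, the function names, and the use of forward-slash paths are assumptions for illustration; the exact rules in `chattool/skill.go` are not shown in this description.

```go
package main

import (
	"path"
	"regexp"
	"strings"
)

// kebabCase is one plausible definition of a kebab-case name (assumed).
var kebabCase = regexp.MustCompile(`^[a-z0-9]+(-[a-z0-9]+)*$`)

// validSkillName reports whether the frontmatter name is kebab-case and
// matches the directory the SKILL.md was found in.
func validSkillName(dirName, declaredName string) bool {
	return kebabCase.MatchString(declaredName) && declaredName == dirName
}

// resolveSkillFile joins rel onto skillDir and rejects any result that
// escapes skillDir, such as "../other/secret" or an absolute path.
func resolveSkillFile(skillDir, rel string) (string, bool) {
	if rel == "" || path.IsAbs(rel) {
		return "", false
	}
	root := path.Clean(skillDir)
	full := path.Join(root, rel) // Join also cleans ".." segments.
	if !strings.HasPrefix(full, root+"/") {
		return "", false
	}
	return full, true
}
```

A size cap (64KB or 512KB per the description) would be enforced separately when reading the resolved file.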
### `chatd.go` — integration into runChat
- Skills discovered in the `g2` errgroup parallel with instructions and
MCP tools.
- `skillsCache` (sync.Map) per chat+agent, same invalidation pattern as
MCP tools cache.
- Skill index injected via `InsertSystem` after workspace instructions.
- Re-injected in `ReloadMessages` callback so it survives compaction.
- `read_skill` + `read_skill_file` tools registered when skills are
present (for both root and subagent chats).
- Cache cleaned up in `cleanupStreamIfIdle` alongside MCP tools cache.
## Format compatibility
Uses the same `.agents/skills/<name>/SKILL.md` format as
[coder/mux](https://github.com/coder/mux) and
[openai/codex](https://github.com/openai/codex).
## Summary
Adds read/unread tracking for chats so users can see which agent
conversations have new assistant messages they haven't viewed.
## Backend Changes
- Adds `last_read_message_id` column to the `chats` table (migration
000439).
- Computes `has_unread` as a virtual column in `GetChatsByOwnerID` using
an `EXISTS` subquery checking for assistant messages beyond the read
cursor.
- Exposes `has_unread` on the `codersdk.Chat` struct and auto-generated
TypeScript types.
- Updates `last_read_message_id` on stream connect/disconnect in
`streamChat`, avoiding per-message API calls during active streaming.
- Uses `context.WithoutCancel` for the deferred disconnect write so the
DB update succeeds even after the client disconnects.
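
The deferred write can be sketched with `context.WithoutCancel` (Go 1.21+). The helper names and the added timeout are assumptions for this example; the description only states that `context.WithoutCancel` is used.

```go
package main

import (
	"context"
	"time"
)

// writeAfterDisconnect runs write with a context that keeps ctx's values
// but ignores its cancellation. The timeout is an assumed safeguard so a
// detached write cannot hang forever.
func writeAfterDisconnect(ctx context.Context, timeout time.Duration, write func(context.Context) error) error {
	detached, cancel := context.WithTimeout(context.WithoutCancel(ctx), timeout)
	defer cancel()
	return write(detached)
}

// simulateDisconnect models a request context that is already cancelled
// (the client went away) and shows that the detached write still sees a
// live context.
func simulateDisconnect() error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return writeAfterDisconnect(ctx, time.Second, func(c context.Context) error {
		return c.Err() // nil while the detached context is still live
	})
}
```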
## Frontend Changes
- Bold title (`font-semibold`) for unread chats in the sidebar.
- Small blue dot indicator next to the relative timestamp.
- Suppresses unread indicator for the currently active chat via
`isActive` from NavLink.
## Design Decisions
- Only `assistant` messages count as unread — the user's own messages
don't trigger the indicator.
- No foreign key on `last_read_message_id` since messages can be deleted
(via rollback/truncation) and the column is just a high-water mark.
- Zero API calls during streaming: exactly 2 DB writes per stream
session (connect + disconnect).
- Unread state refreshes on chat list load and window focus. The
`watchChats` WebSocket optimistically marks non-active chats as unread
on `status_change` events, but does not carry a server-computed
`has_unread` field. Navigating to a chat optimistically clears its
unread indicator in the cache.
- Trim leading/trailing whitespace from metadata `value` and `error`
fields before storage
- Trimming happens before length validation so whitespace-padded values
are handled correctly
- Add `TrimWhitespace` test covering spaces, tabs, newlines, and
preserved inner whitespace
- No backfill needed (unlogged table, stores only latest value)
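
The ordering matters because a padded value can exceed the limit only through its padding. A minimal sketch, assuming a hypothetical `normalizeField` helper and assuming over-limit values are rejected (the real code may truncate instead):

```go
package main

import "strings"

// normalizeField trims leading and trailing whitespace first, then checks
// the length of what remains. maxLen is a caller-supplied limit; the real
// limit is not stated in this description.
func normalizeField(s string, maxLen int) (string, bool) {
	trimmed := strings.TrimSpace(s)
	if len(trimmed) > maxLen {
		return "", false
	}
	return trimmed, true
}
```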
> 🤖 Created by a Coder Agent, reviewed by me.
Replace overly-broad `AsSystemRestricted` with purpose-built actors:
- **OAuth2 provider paths** → `AsSystemOAuth2` (13 call sites across
`tokens.go`, `registration.go`, `apikey.go`)
- **Provisioner daemon health read** → `AsSystemReadProvisionerDaemons`
(1 site in `healthcheck/provisioner.go`)
- **Provisionerd file cache paths** → `AsProvisionerd` (2 sites in
`provisionerdserver.go`, matching existing usage nearby)
<details>
<summary>Implementation notes</summary>
Each replacement actor is a strict subset of `AsSystemRestricted`. Every
DB method
at each call site is already covered by the narrower actor's
permissions:
- `subjectSystemOAuth2`: OAuth2App/Secret/CodeToken (all), ApiKey (Read,
Delete), User (Read), Organization (Read)
- `subjectSystemReadProvisionerDaemons`: ProvisionerDaemon (Read)
- `subjectProvisionerd`: File (Create, Read) plus provisionerd-scoped
resources
No new permissions added. `nolint:gocritic` comments updated to reflect
the new actors.
</details>
> 🤖 Created by a Coder Agent, reviewed by me.
Add a per-MCP-server `model_intent` toggle that wraps tool schemas with
a
`model_intent` field, requiring the LLM to provide a human-readable
description of each tool call's purpose. The intent string is shown as a
status label in the UI instead of opaque tool names, and is
transparently
stripped before the call reaches the remote MCP server.
Built-in tools have rich specialized renderers (terminal blocks, file
diffs,
etc.) and don't need this. MCP tools hit `GenericToolRenderer` which
only
shows raw tool names and JSON — that's where model_intent adds value.
The model learns what to provide via the JSON Schema `description` on
the
`model_intent` property itself — no system prompt changes needed.
<details>
<summary>Implementation details</summary>
### Architecture
Inspired by the `withModelIntent()` pattern from `coder/blink`, adapted
for
Go + React. The wrapping is entirely in the `mcpclient` layer — tool
implementations never see `model_intent`.
**Schema wrapping** (`mcpToolWrapper.Info()`): When enabled, wraps the
original tool parameters under a `properties` key and adds a
`model_intent`
string field with a rich description that teaches the model inline.
**Input unwrapping** (`mcpToolWrapper.Run()`): Strips `model_intent` and
unwraps `properties` before forwarding to the remote MCP server. Handles
three input shapes models may produce:
1. `{ model_intent, properties: {...} }` — correct format
2. `{ model_intent, key: val, ... }` — flat, no wrapper
3. Malformed — falls through gracefully
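
The three shapes can be handled in one pass, as in this sketch. `unwrapModelIntent` is a hypothetical function, not the `mcpToolWrapper.Run()` code. It also assumes that a top-level `properties` object always means the wrapped format, which would be ambiguous for a tool with a genuine argument named `properties`.

```go
package main

import "encoding/json"

// unwrapModelIntent extracts model_intent and returns the arguments that
// should be forwarded to the remote MCP server.
func unwrapModelIntent(raw []byte) (intent string, args []byte) {
	var fields map[string]json.RawMessage
	if err := json.Unmarshal(raw, &fields); err != nil || fields == nil {
		// Shape 3: not a JSON object; forward the input unchanged.
		return "", raw
	}
	if v, ok := fields["model_intent"]; ok {
		// A non-string intent is ignored and leaves intent empty.
		_ = json.Unmarshal(v, &intent)
		delete(fields, "model_intent")
	}
	if props, ok := fields["properties"]; ok {
		var obj map[string]json.RawMessage
		if json.Unmarshal(props, &obj) == nil && obj != nil {
			// Shape 1: arguments are nested under "properties".
			return intent, props
		}
	}
	// Shape 2: flat arguments; forward everything except model_intent.
	flat, err := json.Marshal(fields)
	if err != nil {
		return intent, raw
	}
	return intent, flat
}
```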
**Frontend extraction**: `streamState.ts` extracts `model_intent` from
incrementally parsed streaming JSON. `messageParsing.ts` extracts it
from
persisted tool call args.
**UI rendering**: `GenericToolRenderer` shows the capitalized intent
string
as the primary label when available, falling back to the raw tool name.
### Changes
- Database: `model_intent` boolean column on `mcp_server_configs`
- SDK: `ModelIntent` field on config/create/update types
- API: pass-through in create/update handlers + converter
- mcpclient: schema wrapping in `Info()`, input unwrapping in `Run()`
- Frontend: extraction from streaming + persisted args
- UI: intent label in `GenericToolRenderer`, toggle in admin panel
- Tests: 6 new tests (schema wrapping, unwrapping, passthrough, fallback)
### Decision log
- **Option lives on MCPServerConfig, not model config**: Built-in tools
already have rich renderers; only MCP tools benefit from model_intent.
- **No system prompt changes**: The JSON Schema `description` on the
`model_intent` property teaches the model inline.
- **Pointer bool on update request**: Follows existing pattern (`*bool`)
so PATCH requests don't reset the value when omitted.
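To illustrate the pointer-bool decision, here is a small sketch of how a `*bool` field distinguishes "omitted" from "explicitly false" in a PATCH body. The `updateRequest` type and `applyUpdate` function are hypothetical stand-ins, not the real SDK types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// updateRequest is a hypothetical stand-in for the SDK update type.
type updateRequest struct {
	ModelIntent *bool `json:"model_intent,omitempty"`
}

// applyUpdate returns the value to store after a PATCH. A nil pointer means
// the field was omitted, so the current value is kept.
func applyUpdate(current bool, body []byte) (bool, error) {
	var req updateRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return current, err
	}
	if req.ModelIntent == nil {
		return current, nil
	}
	return *req.ModelIntent, nil
}

func main() {
	kept, _ := applyUpdate(true, []byte(`{}`))
	cleared, _ := applyUpdate(true, []byte(`{"model_intent": false}`))
	fmt.Println(kept, cleared)
}
```

With a plain `bool`, both request bodies would decode to `false` and the omitted case would silently reset the setting.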
</details>
Fixes expired MCP OAuth2 tokens causing 401 errors and stale
`auth_connected` status in the UI.
When users authenticate MCP servers (e.g. GitHub) via OAuth2, the access
token and refresh token are stored in the database. However, once the
access token expired, nothing refreshed it:
- **chatd**: sent the expired token as-is, getting a 401 and skipping
the MCP server
- **list/get endpoints**: reported `auth_connected: true` just because a
token record existed, regardless of expiry
## Changes
### Shared utility: `mcpclient.RefreshOAuth2Token`
Pure function that uses `golang.org/x/oauth2` `TokenSource` to check if
a token is expired (or within 10s of expiry) and refresh it. No DB
dependency — callers handle persistence.
### chatd (`coderd/x/chatd/chatd.go`)
Before calling `mcpclient.ConnectAll`, refreshes expired tokens.
Persists new credentials to the database. Falls back to the old token if
refresh fails.
### List/get MCP server endpoints (`coderd/mcp.go`)
Both `listMCPServerConfigs` and `getMCPServerConfig` now attempt refresh
when checking `auth_connected`. If the token is expired:
- **Has refresh token**: attempt refresh, persist result, report
`auth_connected` based on success
- **No refresh token**: report `auth_connected: false` if expired
This means the UI accurately reflects whether the user's token is
actually usable, rather than just whether a record exists.
<details>
<summary>Design notes</summary>
- `RefreshOAuth2Token` lives in `mcpclient` to avoid circular imports
(`coderd` → `chatd` → `mcpclient` is fine; `chatd` → `coderd` would be
circular).
- DB persistence is handled by each caller with their own authz context
(`AsSystemRestricted` in both cases).
- The `buildAuthHeaders` warning in mcpclient about expired tokens is
kept as defense-in-depth logging.
</details>
## Problem
Chats with a persisted `agent_id` binding hang indefinitely when the
workspace is stopped. The stale agent row still exists in the DB, so
`ensureWorkspaceAgent` succeeds, but the dial blocks forever in
`AwaitReachable`. The MCP discovery goroutine used an unbounded context,
so `g2.Wait()` never returned and the LLM never started.
## Fix
Three targeted changes restore the pre-binding behavior where stopped
workspaces degrade gracefully instead of blocking:
1. **`dialWithLazyValidation`**: "no agents in latest build" is now a
terminal fast-fail — the hanging dial is canceled and
`errChatHasNoWorkspaceAgent` returned immediately, instead of falling
through to `waitForOriginalDial`.
2. **Pre-LLM workspace setup**: MCP discovery and instruction
persistence gate on `workspaceAgentIDForConn` before attempting any
dial. MCP discovery is bounded by a 5s timeout and checks the in-memory
tool cache first (using the cheap cached agent from
`ensureWorkspaceAgent`), so the common subsequent-turn path has zero DB
queries.
3. **`persistInstructionFiles`**: tracks whether the workspace
connection succeeded and skips sentinel persistence on failure, so the
next turn retries if the workspace is restarted.
## Scenarios
**Running workspace, subsequent turn (hot path):** MCP cache hit via
in-memory cached agent. Zero DB queries, zero dials. Unchanged from
#23274.
**Stopped workspace, persisted binding (the bug):** MCP cache hit (stale
descriptors, fine — they fail at invocation). Pre-LLM setup completes
instantly. Tool invocation enters `dialWithLazyValidation`, dial fails
or hangs, validation discovers no agents, returns
`errChatHasNoWorkspaceAgent`. Model sees the error and can call
`start_workspace`.
**New chat, running workspace:** `ensureWorkspaceAgent` resolves via
latest-build, persists binding. MCP discovery dials and caches tools.
**New chat, stopped workspace:** `ensureWorkspaceAgent` finds no agents,
returns `errChatHasNoWorkspaceAgent`. Pre-LLM setup skips. LLM starts
with built-in tools only.
**Rebuilt workspace (agent switched):** MCP cache hit with stale agent
(harmless for one turn). Tool invocation dials stale agent, fails fast,
`dialWithLazyValidation` switches to new agent, persists updated
binding.
**Workspace restarted after stop:** No sentinel was persisted during the
stopped turn, so instruction persistence retries. Agent binding switches
to the new agent via `workspaceAgentIDForConn`.
**Transient DB error during validation:** Not
`errChatHasNoWorkspaceAgent`, so `dialWithLazyValidation` falls through
to `waitForOriginalDial` (cannot prove stale). No false positive.
**Tool invocation on stopped workspace:** `getWorkspaceConn` calls
`ensureWorkspaceAgent` (returns stale row), then
`dialWithLazyValidation` validation discovers no agents, returns
`errChatHasNoWorkspaceAgent`, cached state cleared, error returned to
model.
## Problem
The agent chat delayed-startup banner ("Response startup is taking
longer than expected") could appear even though the model was already
streaming.
The root cause is in `Subscribe()`: `message_part` events were delivered
via the fast local in-process stream, while `status` events were
delivered via PostgreSQL pubsub. Both feed into the same `select`
statement, and Go's `select` picks whichever channel is ready first —
there is no ordering guarantee between channels. So a `message_part`
could outrun the `status=running` that logically precedes it.
The frontend saw content arrive while it still thought the chat was
pending, triggering the banner.
## Fix
Also forward `status` events from the local channel, alongside
`message_part`.
Both event types already travel through the same FIFO subscriber
channel: `publishStatus()` is called before the first `message_part`, so
channel ordering guarantees the frontend sees `status=running` before
any content.
Pubsub still delivers a duplicate `status` event later; the frontend
deduplicates it (`setChatStatus` is idempotent — it early-returns when
the status hasn't changed).
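The ordering argument can be sketched in a few lines. This is a simplified model, not the `Subscribe()` implementation: the `event` and `chatView` types are hypothetical, and the late pubsub duplicate is modeled as a third event on the same channel.

```go
package main

import "fmt"

type event struct {
	Kind  string // "status" or "message_part"
	Value string
}

// chatView is a hypothetical stand-in for the frontend chat state.
type chatView struct {
	Status string
	Log    []string
}

// apply mirrors the idempotent status update: an unchanged status is a no-op.
func (v *chatView) apply(e event) {
	switch e.Kind {
	case "status":
		if v.Status == e.Value {
			return // duplicate delivery, nothing to do
		}
		v.Status = e.Value
		v.Log = append(v.Log, "status="+e.Value)
	case "message_part":
		v.Log = append(v.Log, "part="+e.Value+" while "+v.Status)
	}
}

func main() {
	// A single channel is FIFO, so send order is receive order.
	local := make(chan event, 3)
	local <- event{Kind: "status", Value: "running"}
	local <- event{Kind: "message_part", Value: "Hello"}
	local <- event{Kind: "status", Value: "running"} // late duplicate
	close(local)

	view := &chatView{Status: "pending"}
	for e := range local {
		view.apply(e)
	}
	fmt.Println(view.Log)
}
```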
## Summary
Adds a "Generate new title" action that lets users manually regenerate a
chat's title using richer conversation context than the automatic
first-message title path.
## Changes
### Backend
- **New endpoint:** `POST
/api/experimental/chats/{chatID}/title/regenerate` returns the updated
Chat with a regenerated title
- **Manual title algorithm:** Extracts useful user/assistant text turns
→ selects first user turn + last 3 turns → builds context with gap
markers → renders prompt with anti-recency guidance → calls lightweight
model → normalizes output
- **Helpers:** `extractManualTitleTurns`,
`selectManualTitleTurnIndexes`, `buildManualTitleContext`,
`renderManualTitlePrompt`, `generateManualTitle` — all private, with the
public `Server.RegenerateChatTitle` method
- **SDK:** `ExperimentalClient.RegenerateChatTitle(ctx, chatID) (Chat,
error)`
- Persists title via existing `UpdateChatByID` and broadcasts
`ChatEventKindTitleChange`
### Frontend
- API client method + React Query mutation with cache invalidation
- "Generate new title" menu item (with wand icon) in both TopBar and
Sidebar dropdown menus
- Loading/disabled state while regeneration is in-flight
- Error toast on failure
- Stories updated for both menus
### Tests
- `quickgen_test.go`: Table-driven tests for 4 of the helper functions
(turn extraction, index selection, context building, prompt rendering)
- `exp_chats_test.go`: Handler tests (ChatNotFound,
NotFoundForDifferentUser, NoDaemon)
## Design notes
- The existing auto-title path (`maybeGenerateChatTitle`, `titleInput`)
is completely unchanged
- Manual regeneration uses richer context (first user turn + last 3
turns + gap markers) vs the auto path's single first message
- Endpoint is experimental and marked with `@x-apidocgen {"skip": true}`
https://github.com/user-attachments/assets/bd5d12a1-61b3-4b7d-83b6-317bdfb60b3c
## Summary
Adds pinned chats to the agents page sidebar with server-side
persistence and drag-to-reorder. Users can pin/unpin chats via the
context menu, and pinned chats appear in a dedicated "Pinned" section
above the time-grouped list.
## Database
Migration `000453_chat_pin_order`: adds `pin_order integer DEFAULT 0 NOT
NULL` column on `chats` (0 = unpinned, 1+ = pinned in display order).
Three SQL queries handle pin operations server-side using CTEs with
`ROW_NUMBER()`:
- `PinChatByID`: normalizes existing orders and appends to end
- `UnpinChatByID`: sets target to 0 and compacts remaining pins
- `UpdateChatPinOrder`: shifts neighbors, clamps to `[1, pinned_count]`
All queries exclude archived chats. `ArchiveChatByID` clears `pin_order`
on archive. The handler rejects pinning archived chats with 400.
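The intended `pin_order` semantics can be modeled in memory as an ordered slice, where position `i` corresponds to `pin_order = i + 1`. This is only a sketch of the behavior the SQL queries implement with CTEs; the `pins` type and its methods are hypothetical.

```go
package main

import "fmt"

// pins holds chat IDs in display order; index i means pin_order i+1.
type pins []string

// pinOrder returns 0 for unpinned chats, 1+ otherwise.
func (p pins) pinOrder(id string) int {
	for i, c := range p {
		if c == id {
			return i + 1
		}
	}
	return 0
}

// pin appends to the end; pinning an already pinned chat is a no-op.
func (p pins) pin(id string) pins {
	if p.pinOrder(id) > 0 {
		return p
	}
	return append(p[:len(p):len(p)], id) // full slice expression forces a copy
}

// unpin removes the chat and compacts the remaining orders.
func (p pins) unpin(id string) pins {
	out := make(pins, 0, len(p))
	for _, c := range p {
		if c != id {
			out = append(out, c)
		}
	}
	return out
}

// reorder moves a pinned chat, clamping the target to [1, pinned count].
func (p pins) reorder(id string, order int) pins {
	if p.pinOrder(id) == 0 {
		return p
	}
	if order < 1 {
		order = 1
	}
	if order > len(p) {
		order = len(p)
	}
	rest := p.unpin(id)
	out := make(pins, 0, len(p))
	out = append(out, rest[:order-1]...)
	out = append(out, id)
	return append(out, rest[order-1:]...)
}

func main() {
	p := pins{}.pin("a").pin("b").pin("c")
	p = p.reorder("c", 1)
	p = p.unpin("a")
	fmt.Println(p, p.pinOrder("b"))
}
```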
## Backend
Pin/unpin/reorder go through the existing `PATCH
/api/experimental/chats/{chat}` via the `pin_order` field on
`UpdateChatRequest`. The handler routes based on current pin state:
`pin_order == 0` unpins, `> 0` on an already-pinned chat reorders, `> 0`
on an unpinned chat appends to end.
## Frontend
- `pinChat` / `unpinChat` / `reorderPinnedChat` optimistic mutations
using shared `isChatListQuery` predicate
- Sidebar renders Pinned section above time groups, excludes pinned
chats from time groups
- Pin/Unpin context menu items (hidden for child/delegated chats)
- `@dnd-kit/core` + `@dnd-kit/sortable` for drag-to-reorder with
`MouseSensor`, `TouchSensor`, and `KeyboardSensor`
- Local pin-order override prevents flash on drop; click blocker
prevents NavLink navigation after drag
---
*PR generated with Coder Agents*
Coder's chat (chatd) can now discover and use MCP servers configured in
a workspace's `.mcp.json` file. This brings project-specific tooling
(GitHub, databases, docs servers, etc.) into the chat without any manual
configuration.
## How it works
The workspace agent reads `.mcp.json` from the workspace directory (same
format Claude Code uses), connects to the declared MCP servers —
spawning child processes for stdio servers and connecting over the
network for HTTP/SSE — and caches their tool lists. Two new agent HTTP
endpoints expose this:
- `GET /api/v0/mcp/tools` returns the cached tool list (supports
`?refresh=true`)
- `POST /api/v0/mcp/call-tool` proxies calls to the correct server
On each chat turn, chatd calls `ListMCPTools` through the existing
`AgentConn` tailnet connection, wraps each tool as a
`fantasy.AgentTool`, and adds them to the LLM's tool set alongside
built-in and admin-configured MCP tools. Tool names are prefixed with
the server name (`github__create_issue`) to avoid collisions.
Failed server connections are logged and skipped — they never block the
agent or break the chat. Child stdio processes are terminated on agent
shutdown.
*Problem:* `publishChatPubsubEvent` was constructing a partial
`codersdk.Chat` that omitted `LastModelConfigID` and other fields. Go's
zero-value UUID caused the sidebar to show "Default model" for chats
received via SSE.
*Solution:*
- Extracted `convertChat`/`convertChats` from `exp_chats.go` into
`db2sdk.Chat`/`db2sdk.Chats`, alongside existing `ChatMessage`,
`ChatQueuedMessage`, and `ChatDiffStatus` converters.
`publishChatPubsubEvent` now calls `db2sdk.Chat(chat, nil)` instead of
maintaining its own copy of the conversion logic
- Added backend integration test
`TestWatchChats/CreatedEventIncludesAllChatFields`
- Added frontend regression tests for nil-UUID and valid model config ID
cases
> 🤖 Created by Coder Agents, reviewed by this human.
Adds an `enabled` toggle to the chat model admin create/edit form so admins
can disable a model without soft-deleting it. Disabled models stay visible
in admin settings but stop appearing in user-facing model selectors.
The backend already supported this (`chat_model_configs.enabled` column,
filtered queries, and SDK fields). This change wires it into the admin UI
and adds coverage on both sides.
**Backend:** three new subtests in `coderd/exp_chats_test.go` verifying
the visibility contract (admin sees disabled models, non-admin doesn't,
update-to-disabled preserves the record).
**Frontend:** `enabled` field added to form logic and seeded from the
existing model (defaults to `true` for new models). A Switch+Tooltip
control renders in the form header, matching the MCP Server panel pattern.
Two interaction stories cover the create-disabled and toggle-existing flows.
> **PR Stack**
> 1. **#23351** ← `#23282` *(you are here)*
> 2. #23282 ← `#23275`
> 3. #23275 ← `#23349`
> 4. #23349 ← `main`
---
## Summary
`chatretry.Retry()` used pure exponential backoff (1 s, 2 s, 4 s, …) and
never consulted provider `Retry-After` headers. Fantasy's
`ProviderError` carries `ResponseHeaders` including `Retry-After`, but
`chaterror.Classify()` only parsed error text and silently dropped the
structured transport metadata.
This makes `Retry-After` a first-class signal in the classification →
retry pipeline.
<img width="853" height="346" alt="image"
src="https://github.com/user-attachments/assets/65f012b6-8173-43d2-957e-ab9faddea525"
/>
## Changes
### `coderd/chatd/chaterror/classify.go`
- Added `RetryAfter time.Duration` field to `ClassifiedError` — a
normalized minimum retry delay derived from provider response metadata.
- `Classify()` now calls `extractProviderErrorDetails()` before falling
back to text heuristics. Structured `ProviderError.StatusCode` takes
priority over regex extraction.
- `normalizeClassification()` preserves and clamps `RetryAfter`.
### `coderd/chatd/chaterror/provider_error.go` (new)
Provider-specific extraction, isolated from the text-based
classification logic:
- `extractProviderErrorDetails()` unwraps `*fantasy.ProviderError` from
the error chain via `errors.As`.
- `retryAfterFromHeaders()` parses headers in priority order:
1. `retry-after-ms` (OpenAI-specific, millisecond precision)
2. `retry-after` (standard HTTP — integer seconds or HTTP-date)
- Case-insensitive header key lookup.
### `coderd/chatd/chatretry/chatretry.go`
- `effectiveDelay(attempt, classified)` computes `max(Delay(attempt),
classified.RetryAfter)` — the provider hint acts as a floor without
weakening the local exponential backoff.
- `Retry()` now uses `effectiveDelay` and passes the effective delay to
both `onRetry(...)` and the sleep timer, so downstream payloads, logs,
and the frontend countdown stay aligned automatically.
### Tests
- `classify_test.go`: Structured provider status + `Retry-After`
extraction, `retry-after-ms` priority, HTTP-date parsing, invalid header
fallback, `WithProvider` preservation.
- `chatretry_test.go`: Retry-after-as-floor semantics — longer hint
wins, shorter hint keeps base delay.
## Design notes
- **No SDK/API/frontend changes needed.** `codersdk.ChatStreamRetry`
already carries `DelayMs` and `RetryingAt`, and the frontend already
consumes them. The fix is purely in the server-side delay computation.
- **Existing retryability rules unchanged.** This fixes *when* we sleep,
not *whether* an error is retryable.
- **Provider hint is a floor:** `max(baseDelay, RetryAfter)` ensures we
never retry earlier than the provider asks, and never weaken our own
backoff curve.
## Changes
- **Commit 1**: Remove 17 unnecessary `//nolint` directives:
- `//nolint:varnamelen` — linter not active
- `//nolint:unused` on exported `SlimUnsupported`
- `//nolint:govet` in `coderd/httpmw/csrf` — no longer fires
- `//nolint:revive` on functions refactored since the nolint was added
- `//nolint:paralleltest` citing Go 1.22 loop variable capture
(obsolete)
- Bare `//nolint` narrowed to specific `//nolint:gocritic` with
justification
- **Commit 2**: Fix root causes behind 5 dangerous nolint suppressions:
- Add `MinVersion: tls.VersionTLS12` to TLS client config (removes
`gosec` G402)
- Delete trivial unexported wrappers `apiKey()`/`normalizeProvider()` in
chatprovider (removes `revive` confusing-naming)
- Add doc comments to `StartWithAssert` and `Router` (removes `revive`
exported)
- Rename unused parameters to `_` in integration test helpers
> 🤖 This PR was created using Coder Agents and reviewed by me.
Admins can now control whether the built-in Coder Agents default system
prompt is prepended to their custom instructions, rather than having the
custom prompt silently replace the default.
**Changes:**
- New `include_default_system_prompt` boolean toggle (defaults to `true`
for existing deployments) stored as a site config key — no migration
needed.
- GET `/api/experimental/chats/config/system-prompt` returns the toggle
state, the custom prompt, and a preview of the built-in default.
- PUT persists both the toggle and custom prompt atomically in a single
transaction.
- `resolvedChatSystemPrompt()` composes `[default?, custom?]` joined by
`\n\n`, falling back to the built-in default on DB errors.
- Settings UI adds a Switch toggle with conditional helper text and a
"Preview" button that shows the built-in default prompt via the existing
`TextPreviewDialog`.
- Comprehensive test coverage: 15 subtests covering toggle behavior,
prompt composition matrix, auth boundaries, and integration with chat
creation.
- Adds `GET /api/experimental/chats/by-workspace` endpoint that returns
workspace_id → latest chat_id mapping
- Modifies the frontend to fetch this alongside the workspace list, gated on
the `agents` experiment, and render an "Agent" badge similar to the existing
"Task" badge in `WorkspacesTable`
- Badge links to the "latest chat" linked to the given workspace.
Notes:
- Intentionally uses `fetchWithPostFilter` for RBAC to decouple from
workspaces API — will migrate to `workspaces_expanded` view later.
- If users have multiple chats linked to the same workspace, the badge
will link to the most recently updated one.
> 🤖 This PR was created with the help of Coder Agents, and has been
reviewed by my human. 🧑‍💻
## Summary
Adds an entitlement-gated **AI add-on** column to both the **Users**
table and the **Organization Members** table. When
`ai_governance_user_limit` is entitled, each row shows whether the user
is consuming an AI seat.
## Background
The AI governance add-on tracks which users are consuming AI seats.
Admins need visibility into per-user seat consumption directly from the
user management tables. This change surfaces that information through
both the site-wide Users table and the per-organization Members table,
gated behind the `ai_governance_user_limit` entitlement so the column
only appears when the feature is licensed.
## Implementation
### Backend
- **New SQL query** `GetUserAISeatStates`
(`coderd/database/queries/aiseatstate.sql`) — returns user IDs consuming
an AI seat, derived from:
- Users with entries in `aibridge_interceptions` (AI Bridge usage)
- Users who own workspaces with `has_ai_task = true` builds (AI Tasks
usage)
- **SDK types** — added `has_ai_seat: boolean` to `codersdk.User` and
`codersdk.OrganizationMemberWithUserData`
- **Handler wiring** — both the Users list endpoint (`coderd/users.go`)
and all Members endpoints (`coderd/members.go`) query AI seat state per
page of user IDs and populate the response field
- **dbauthz** — per-user `ActionRead` checks on `ResourceUserObject`
### Frontend
- **Shared `AISeatCell` component**
(`site/src/modules/users/AISeatCell.tsx`) — green `CircleCheck` for
consuming, gray `X` for non-consuming
- **`TableColumnHelpTooltip`** — extended with `ai_addon` variant with
tooltip: *"Users with access to AI features like AI Bridge, Boundary, or
Tasks who are actively consuming a seat."*
- **Column visibility** gated behind
`useFeatureVisibility().ai_governance_user_limit`
## Validation
- Backend: dbauthz full method suite (`TestMethodTestSuite`) passes
including new `GetUserAISeatStates` test
- Backend: `TestGetUsers`, `TestUsersFilter`, CLI golden file tests pass
- Frontend: 7/7 tests pass across `UsersPage.test.tsx` and
`OrganizationMembersPage.test.tsx` (column visibility gating both
directions)
- `go build ./coderd/...` compiles clean
- `pnpm --dir site run lint:types` passes
- `make gen` clean
## Risks
- **Pagination performance**: The AI seat query is scoped to the current
page's user IDs (not a full table scan), keeping it efficient for
paginated views.
- **Semantic scope**: The workspace-side AI seat derivation uses "any
build with `has_ai_task = true`" rather than "latest build only". If the
product intent is latest-build-only, this can be tightened in a
follow-up.
---
_Generated with `mux` • Model: `anthropic:claude-opus-4-6` • Thinking:
`xhigh` • Cost: `$27.25`_
## Summary
Adds a process-wide cache for three hot database queries in `chatd` that
were hitting Postgres on **every chat turn** despite returning
rarely-changing configuration data:
| Query | Before (50k turns) | After | Reduction |
|---|---|---|---|
| `GetEnabledChatProviders` | ~98.6k calls | ~500-1000 | ~99% |
| `GetChatModelConfigByID` | ~49.2k calls | ~500-1000 | ~98% |
| `GetUserChatCustomPrompt` | ~46.7k calls | ~1000-2000 | ~97% |
These were identified via `coder exp scaletest chat` (5000 concurrent
chats × 10 turns) as the dominant source of Postgres load during chat
processing.
## Design
Follows the established **webpush subscription cache pattern**
(`coderd/webpush/webpush.go`):
- `sync.RWMutex` + `tailscale.com/util/singleflight` (generic) +
generation-based stale prevention + TTL
- 10s TTL for provider/model config, 5s TTL for user prompts
- Negative caching for `sql.ErrNoRows` on user prompts (the common case
— most users don't set custom prompts)
- Deep-clones `ChatModelConfig.Options` (`json.RawMessage` = `[]byte`)
on both store and read paths
### Invalidation
Single pubsub channel (`chat:config_change`) with kind discriminator for
cross-replica cache invalidation. Seven publish points in
`coderd/chats.go` cover all admin mutation endpoints
(create/update/delete for providers and model configs, put for user
prompts).
_This PR was generated with mux and was reviewed by a human_
Follow-up to #23282. The retry and terminal error callouts had a few UX
oddities:
- Auto-retrying states reused backend error text that said "Please try
again" even while the UI was already retrying on behalf of the user.
- Terminal error states also said "Please try again" with no action the
user could take.
- `startup_timeout` had no specific title or retry copy — it fell
through to the generic "Retrying request" heading.
- The kind pill showed raw enum values like `startup_timeout` and
`rate_limit`.
- Terminal error metadata showed a "Retryable" / "Not retryable" label
that does not help users.
- A separate "Provider anthropic" metadata row duplicated information
already present in the message body.
- The `usage-limit` error kind used a hyphen while every backend kind
uses underscores.
Changes:
**Backend (`chaterror/message.go`)**
- Split message generation into `terminalMessage()` and
`retryMessage()`, replacing the old `userFacingMessage()`.
- Terminal messages include HTTP status codes and actionable guidance
(e.g. "Check the API key, permissions, and billing settings.").
- Retry messages are clean factual statements without status codes or
remediation, suitable for the retry countdown UI (e.g. "Anthropic is
temporarily overloaded.").
- Removed "Please try again" / "Please try again later" from all paths.
- `StreamRetryPayload` calls `retryMessage()` instead of forwarding
`classified.Message`.
**Frontend**
- Removed the parallel frontend message-generation system:
`getRetryMessage()`, `getProviderDisplayName()`,
`getRetryProviderSubject()`, and the `PROVIDER_DISPLAY_NAMES` map are
all deleted from `chatStatusHelpers.ts`.
- `liveStatusModel.ts` passes `retryState.error` through directly — the
backend owns the copy.
- Added specific title and retry copy for `startup_timeout`, and
extended the title mapping to cover `auth` and `config`.
- Kind pills now show humanized labels ("Startup timeout", "Rate limit",
etc.) instead of raw enum strings.
- Removed the redundant "Provider anthropic" metadata row.
- Removed the terminal "Retryable" / "Not retryable" badge.
- Normalized `"usage-limit"` → `"usage_limit"` and added it to
`ChatProviderFailureKind` so all error kinds follow the same underscore
convention and live in one enum.
Refs #23282.
## Summary
This change removes the steady-state "resolve the latest workspace
agent" query from chat execution.
Instead of asking the database for the latest build's agent on every
turn, a chat now persists the workspace/build/agent binding it actually
uses and reuses that binding across subsequent turns. The common path
becomes "load the bound agent by ID and dial it", with fallback paths to
repair the binding when it is missing, stale, or intentionally changed.
## What changes
- add `workspace_id`, `build_id`, and `agent_id` binding fields to
`chats`
- expose those fields through the chat API / SDK so the execution
context is explicit
- load the persisted binding first in chatd, instead of always resolving
the latest build's agent
- persist a refreshed binding when chatd has to re-resolve the workspace
agent
- keep child / subagent chats on the same bound workspace context by
inheriting the parent binding
- leave `build_id` / `agent_id` unset for flows like `create_workspace`,
then bind them lazily on the next agent-backed turn
## Runtime behavior
The binding is treated as an optimistic cache of the agent a chat should
use:
- if the bound agent still exists and dials successfully, we use it
without a latest-build lookup
- if the bound agent is missing or no longer reachable, chatd
re-resolves against the latest build and persists the new binding
- if a workspace mutation changes the chat's target workspace, the
binding is updated as part of that mutation
To avoid reintroducing a hot-path query, dialing uses lazy validation:
- start dialing the cached agent immediately
- only validate against the latest build if the dial is still pending
after a short delay
- if validation finds a different agent, cancel the stale dial, switch
to the current agent, and persist the repaired binding
## Result
The hot path stops issuing
`GetWorkspaceAgentsInLatestBuildByWorkspaceID` for every user message,
which is the source of the DB pressure this PR is addressing. At the
same time, chats still converge to the correct workspace agent when the
binding becomes stale due to rebuilds or explicit workspace changes.
Problem: the deployment-wide chat template allowlist was never actually wired in from `chatd.go`.
- Extracts `parseChatTemplateAllowlist` into shared `coderd/util/xjson.ParseUUIDList`
- Adds `Server.chatTemplateAllowlist()` method that reads the allowlist from DB
- Passes `AllowedTemplateIDs` callback to `ListTemplates`, `ReadTemplate`, and `CreateWorkspace` tool constructors
> 🤖 Created by Coder Agents and reviewed by a human.