## Summary
Adds a process-wide cache for three hot database queries in `chatd` that
were hitting Postgres on **every chat turn** despite returning
rarely-changing configuration data:
| Query | Before (50k turns) | After | Reduction |
|---|---|---|---|
| `GetEnabledChatProviders` | ~98.6k calls | ~500-1000 | ~99% |
| `GetChatModelConfigByID` | ~49.2k calls | ~500-1000 | ~98% |
| `GetUserChatCustomPrompt` | ~46.7k calls | ~1000-2000 | ~97% |
These were identified via `coder exp scaletest chat` (5000 concurrent
chats × 10 turns) as the dominant source of Postgres load during chat
processing.
## Design
Follows the established **webpush subscription cache pattern**
(`coderd/webpush/webpush.go`):
- `sync.RWMutex` + `tailscale.com/util/singleflight` (generic) +
generation-based stale prevention + TTL
- 10s TTL for provider/model config, 5s TTL for user prompts
- Negative caching for `sql.ErrNoRows` on user prompts (the common case
— most users don't set custom prompts)
- Deep-clones `ChatModelConfig.Options` (`json.RawMessage` = `[]byte`)
on both store and read paths
### Invalidation
Single pubsub channel (`chat:config_change`) with kind discriminator for
cross-replica cache invalidation. Seven publish points in
`coderd/chats.go` cover all admin mutation endpoints
(create/update/delete for providers and model configs, put for user
prompts).
_This PR was generated with mux and was reviewed by a human_
Follow-up to #23282. The retry and terminal error callouts had a few UX
oddities:
- Auto-retrying states reused backend error text that said "Please try
again" even while the UI was already retrying on behalf of the user.
- Terminal error states also said "Please try again" with no action the
user could take.
- `startup_timeout` had no specific title or retry copy — it fell
through to the generic "Retrying request" heading.
- The kind pill showed raw enum values like `startup_timeout` and
`rate_limit`.
- Terminal error metadata showed a "Retryable" / "Not retryable" label
that does not help users.
- A separate "Provider anthropic" metadata row duplicated information
already present in the message body.
- The `usage-limit` error kind used a hyphen while every backend kind
uses underscores.
Changes:
**Backend (`chaterror/message.go`)**
- Split message generation into `terminalMessage()` and
`retryMessage()`, replacing the old `userFacingMessage()`.
- Terminal messages include HTTP status codes and actionable guidance
(e.g. "Check the API key, permissions, and billing settings.").
- Retry messages are clean factual statements without status codes or
remediation, suitable for the retry countdown UI (e.g. "Anthropic is
temporarily overloaded.").
- Removed "Please try again" / "Please try again later" from all paths.
- `StreamRetryPayload` calls `retryMessage()` instead of forwarding
`classified.Message`.
**Frontend**
- Removed the parallel frontend message-generation system:
`getRetryMessage()`, `getProviderDisplayName()`,
`getRetryProviderSubject()`, and the `PROVIDER_DISPLAY_NAMES` map are
all deleted from `chatStatusHelpers.ts`.
- `liveStatusModel.ts` passes `retryState.error` through directly — the
backend owns the copy.
- Added specific title and retry copy for `startup_timeout`, and
extended the title mapping to cover `auth` and `config`.
- Kind pills now show humanized labels ("Startup timeout", "Rate limit",
etc.) instead of raw enum strings.
- Removed the redundant "Provider anthropic" metadata row.
- Removed the terminal "Retryable" / "Not retryable" badge.
- Normalized `"usage-limit"` → `"usage_limit"` and added it to
`ChatProviderFailureKind` so all error kinds follow the same underscore
convention and live in one enum.
Refs #23282.
## Summary
This change removes the steady-state "resolve the latest workspace
agent" query from chat execution.
Instead of asking the database for the latest build's agent on every
turn, a chat now persists the workspace/build/agent binding it actually
uses and reuses that binding across subsequent turns. The common path
becomes "load the bound agent by ID and dial it", with fallback paths to
repair the binding when it is missing, stale, or intentionally changed.
## What changes
- add `workspace_id`, `build_id`, and `agent_id` binding fields to
`chats`
- expose those fields through the chat API / SDK so the execution
context is explicit
- load the persisted binding first in chatd, instead of always resolving
the latest build's agent
- persist a refreshed binding when chatd has to re-resolve the workspace
agent
- keep child / subagent chats on the same bound workspace context by
inheriting the parent binding
- leave `build_id` / `agent_id` unset for flows like `create_workspace`,
then bind them lazily on the next agent-backed turn
## Runtime behavior
The binding is treated as an optimistic cache of the agent a chat should
use:
- if the bound agent still exists and dials successfully, we use it
without a latest-build lookup
- if the bound agent is missing or no longer reachable, chatd
re-resolves against the latest build and persists the new binding
- if a workspace mutation changes the chat's target workspace, the
binding is updated as part of that mutation
To avoid reintroducing a hot-path query, dialing uses lazy validation:
- start dialing the cached agent immediately
- only validate against the latest build if the dial is still pending
after a short delay
- if validation finds a different agent, cancel the stale dial, switch
to the current agent, and persist the repaired binding
## Result
The hot path stops issuing
`GetWorkspaceAgentsInLatestBuildByWorkspaceID` for every user message,
which is the source of the DB pressure this PR is addressing. At the
same time, chats still converge to the correct workspace agent when the
binding becomes stale due to rebuilds or explicit workspace changes.
Problem: previously, the deployment-wide chat template allowlist was never actually wired in from `chatd.go`
- Extracts `parseChatTemplateAllowlist` into shared `coderd/util/xjson.ParseUUIDList`
- Adds `Server.chatTemplateAllowlist()` method that reads the allowlist from DB
- Passes `AllowedTemplateIDs` callback to `ListTemplates`, `ReadTemplate`, and `CreateWorkspace` tool constructors
> 🤖 Created by Coder Agents and reviewed by a human.
## Problem
MCP OAuth2 auto-discovery stripped the path component from the MCP
server URL
before looking up Protected Resource Metadata. Per RFC 9728 §3.1, the
well-known
URL should be path-aware:
```
{origin}/.well-known/oauth-protected-resource{path}
```
For `https://api.githubcopilot.com/mcp/`, the correct metadata URL is
`https://api.githubcopilot.com/.well-known/oauth-protected-resource/mcp/`,
not
`https://api.githubcopilot.com/.well-known/oauth-protected-resource`
(which
returns 404).
The same issue applied to RFC 8414 Authorization Server Metadata for
issuers
with path components (e.g. `https://github.com/login/oauth` →
`/.well-known/oauth-authorization-server/login/oauth`).
## Fix
Replace the `mcp-go` `OAuthHandler`-based discovery with a
self-contained
implementation that correctly follows path-aware well-known URI
construction for
both RFC 9728 and RFC 8414, falling back to root-level URLs when the
path-aware
form returns an error. Also implements RFC 7591 registration directly,
removing
the `mcp-go/client/transport` dependency from the discovery path.
Note: this fix resolves the discovery half of the problem for servers
like
GitHub Copilot. Full OAuth2 support for GitHub's MCP server also
requires
dynamic client registration (RFC 7591), which GitHub's authorization
server
does not currently support — that will be addressed separately.
- Clarify the system prompt to prefer tools before asking the user for
clarification.
- Limit clarification to cases where ambiguity or user preferences
materially affect the outcome.
- Remove the contradictory instruction to always start by asking
clarifying questions.
> 🤖 This PR has been reviewed by the author.
### Changes
**coder/coder:**
- `coderd/aibridge/aibridge.go` — Added `HeaderCoderBYOKToken` constant,
`IsBYOK()` helper, and updated `ExtractAuthToken` to check the BYOK
header first.
- `enterprise/aibridged/http.go` — BYOK-aware header stripping: in BYOK
mode only the BYOK header is stripped (user's LLM credentials
preserved); in centralized mode all auth headers are stripped.
<hr/>
**NOTE**: `X-Coder-Token` was removed! As of now `ExtractAuthToken`
retrieves token either from `X-Coder-AI-Governance-BYOK-Token` or from
`Authorization`/`X-Api-Key`.
---------
Co-authored-by: Susana Ferreira <susana@coder.com>
Co-authored-by: Danny Kopping <danny@coder.com>
## Summary
Adds a general-purpose `map[string]string` label system to chats, stored
as jsonb with a GIN index for efficient containment queries.
This is a standalone foundational feature that will be used by the
upcoming Automations feature for session identity (matching webhook
events to existing chats), replacing the need for bespoke session-key
tables.
## Changes
### Database
- **Migration 000451**: Adds `labels jsonb NOT NULL DEFAULT '{}'` column
to `chats` table with a GIN index (`idx_chats_labels`)
- **`InsertChat`**: Accepts labels on creation via `COALESCE(@labels,
'{}')`
- **`UpdateChatByID`**: Supports partial update —
`COALESCE(sqlc.narg('labels'), labels)` preserves existing labels when
NULL is passed
- **`GetChats`**: New `has_labels` filter using PostgreSQL `@>`
containment operator
- **`GetAuthorizedChats`**: Synced with generated `GetChats` (new column
scan + query param)
### API
- **Create chat** (`POST /chats`): Accepts optional `labels` field,
validated before creation
- **Update chat** (`PATCH /chats/{chat}`): Supports `labels` field for
atomic label replacement
- **List chats** (`GET /chats`): Supports `?label=key:value` query
parameters (multiple are AND-ed)
### SDK
- `Chat`, `CreateChatRequest`, `UpdateChatRequest`, `ListChatsOptions`
all gain `Labels` fields
- `UpdateChatRequest.Labels` is a pointer (`*map[string]string`) so
`nil` means "don't change" vs empty map means "clear all"
### Validation (`coderd/httpapi/labels.go`)
- Max 50 labels per chat
- Key: 1–64 chars, must match `[a-zA-Z0-9][a-zA-Z0-9._/-]*` (supports
namespaced keys like `github.repo`, `automation/pr-number`)
- Value: 1–256 chars
- 13 test cases covering all edge cases
### Chat runtime
- `chatd.CreateOptions` gains `Labels` field, threaded through to
`InsertChat`
- Existing `UpdateChatByID` callers (e.g., quickgen title updates) are
unaffected — NULL labels preserve existing values via COALESCE
We had a bug where computer use base64-encoded screenshots would not be
interpreted as screenshots anymore once saved to the db, loaded back
into memory, and sent to Anthropic. Instead, they would be interpreted
as regular text. Once a computer use agent made enough screenshots and
stopped, and you tried sending it another message, you'd get an out of
context error:
<img width="808" height="367" alt="Screenshot 2026-03-23 at 12 02 54"
src="https://github.com/user-attachments/assets/f0bf6be2-4863-47ca-a7a9-9e6d9dfceeed"
/>
This PR fixes that.
## Summary
Introduces a new `context-file` ChatMessagePart type for persisting
workspace instruction files (AGENTS.md) as durable, frontend-visible
message parts. This is the foundation for showing loaded context files
in the chat input's context indicator tooltip.
### Problem
Previously, instruction files were resolved transiently on every turn
via `resolveInstructions()` → `InsertSystem()` and injected into the
in-memory prompt without persistence. The frontend had no knowledge that
instruction files were loaded into context, and there was no way to
surface this information to users.
### Solution
Instruction files are now read **once** when a workspace is first
attached to a chat (matching how [openai/codex handles
it](https://developers.openai.com/codex/guides/agents-md)) and persisted
as `user`-role, `both`-visibility message parts with a new
`context-file` type. This ensures:
- **Durability**: survives page refresh (data is in the DB, returned by
`getChatMessages`)
- **Cache-friendly**: `user`-role avoids the system-message hoisting
that providers do, keeping the instruction content in a stable position
for prompt caching
- **Frontend-visible**: the frontend receives paths and truncation
status for future context indicator rendering
- **Extensible**: the same pattern works for Skills (future)
### Key changes
| Layer | Change |
|---|---|
| **SDK** (`codersdk/chats.go`) | Add `ChatMessagePartTypeContextFile`
with `context_file_path`, `context_file_content` (internal, stripped
from API), `context_file_truncated` fields |
| **Prompt expansion** (`chatprompt`) | Expand `context-file` parts to
`<workspace-context>` text blocks in `partsToMessageParts()` |
| **Chat engine** (`chatd.go`) | Add `persistInstructionFiles()`, called
on first turn with a workspace. Remove per-turn `resolveInstructions()`
+ `InsertSystem()` from `processChat()` and `ReloadMessages` |
| **Frontend** | Ignore `context-file` parts in `messageParsing.ts` and
`streamState.ts` (no rendering yet — follow-up will add tooltip display)
|
### How it works
1. On each turn, `processChat` checks if any loaded message contains
`context-file` parts
2. If not (first turn with a workspace), reads AGENTS.md files via the
workspace agent connection and persists them
3. For this first turn, also injects the instruction text into the
prompt (since messages were loaded before persistence)
4. On all subsequent turns, `ConvertMessagesWithFiles()` encounters the
persisted `context-file` parts and expands them into text automatically
— no extra resolution needed
The `ComputerUseProviderTool` function needed a little bit of an
adjustment because I changed `NewComputerUseTool`'s signature in
upstream fantasy a little bit.
- Stores a deployment-wide agents template allowlist in `site_configs`
(`agents_template_allowlist`)
- Adds `GET/PUT /api/experimental/chats/config/template-allowlist`
endpoints
- Filters `list_templates`, `read_template`, and `create_workspace` chat
tools by allowlist, if defined (empty=all allowed)
- Add "Templates" admin settings tab in Agents UI ([what it looks
like](https://624de63c6aacee003aa84340-sitjilsyrr.chromatic.com/?path=/story/pages-agentspage-agentsettingspageview--template-allowlist))
> 🤖 This PR was created with the help of Coder Agents, and has been
reviewed by my human. 🧑💻
## Problem
When chatd pushes a branch and then creates a PR (e.g. `git push`
followed by `gh pr create`), the gitsync background worker often picks
up the stale `chat_diff_statuses` row between the two operations. At
that point no PR exists yet, so the worker skips the row. However, the
acquisition SQL locks the row for **5 minutes** (crash-recovery
interval), creating a dead zone where the PR diff is invisible in the UI
until the user manually navigates to the chat.
### Root cause
1. `git push` triggers `GIT_ASKPASS` → coderd external-auth handler →
`MarkStale()` sets `stale_at = now - 1s`
2. Background worker acquires the row within ~10s, atomically bumps
`stale_at = NOW() + 5 min` (crash-recovery lock)
3. Worker calls `ResolveBranchPullRequest` → no PR exists yet → returns
`nil` → worker skips with `continue`
4. `gh pr create` completes moments later, but uses its own auth (not
`GIT_ASKPASS`), so no second `MarkStale` fires
5. Row is locked for 5 minutes before the worker can retry
Loading the chat works immediately because `GET /chats/{chat}` calls
`resolveChatDiffStatus` synchronously, which discovers the PR inline.
## Fix
When `ResolveBranchPullRequest` returns nil (no PR yet) **and** the row
was recently marked stale (within 2 minutes), apply a short 15-second
backoff via `BackoffChatDiffStatus` instead of letting the 5-minute
acquisition lock stand. Outside the retry window, the worker skips the
row as before — no indefinite fast-polling for branches that never
receive a PR.
To make the "recently marked stale" check work, `updated_at` is no
longer overwritten by the acquisition and backoff SQL queries. This
preserves it as a reliable "last externally changed" timestamp (set by
`MarkStale` or a successful refresh).
### Behavior summary
| Scenario | `updated_at` age | Backoff | Effective retry |
|---|---|---|---|
| Fresh push, no PR yet | < 2 min | 15s (`NoPRBackoff`) | ~15s |
| Old row, no PR | ≥ 2 min | None (skip) | ~5 min (acquisition lock) |
| Error (any age) | Any | 120s (`DiffStatusTTL`) | ~120s |
| Success (any age) | Any | 120s (`DiffStatusTTL`) | ~120s |
## Changes
- **`coderd/database/queries/chats.sql`** — Remove `updated_at = NOW()`
from `AcquireStaleChatDiffStatuses` and `BackoffChatDiffStatus`
- **`coderd/database/queries.sql.go`** — Regenerated
- **`coderd/x/gitsync/worker.go`** — Add `NoPRBackoff` (15s) and
`NoPRRetryWindow` (2 min) constants; apply short backoff only within the
retry window
- **`coderd/x/gitsync/worker_test.go`** — Add
`TestWorker_NoPR_RecentMarkStale_BacksOffShort` and
`TestWorker_NoPR_OldRow_Skips`
- Add `SanitizePromptText` stripping ~24 invisible Unicode codepoints
and collapsing excessive newlines
- Apply at write and read paths for defense-in-depth
- Frontend: warn in both prompt textareas when invisible characters
detected
- Explicit codepoint list (not blanket `unicode.Cf`) to avoid breaking
flag emoji
- 34 Go tests + idempotency meta-test, 11 TS unit tests, 4 Storybook
stories
> This PR was created with the help of Coder Agents, and was reviewed by my human.
- Add `X-Coder-Owner-Id`, `X-Coder-Chat-Id`, `X-Coder-Subchat-Id`,
`X-Coder-Workspace-Id` headers to all outgoing LLM API requests from
chatd
- Extend `ModelFromConfig` with `extraHeaders` param, forwarded via
Fantasy `WithHeaders` on all 8 providers
- Add `CoderHeaders(database.Chat)` helper to build the header map from
chat state
- Update all 4 `ModelFromConfig` call sites (resolveChatModel,
computer-use override, title gen, push summary)
- Thread `database.Chat` into `generatePushSummary` (was `chatTitle
string`)
- Tests: `TestCoderHeaders` (4 subtests),
`TestModelFromConfig_ExtraHeaders` (OpenAI + Anthropic),
`TestModelFromConfig_NilExtraHeaders`
- Refactor existing `TestModelFromConfig_UserAgent` to use channel-based
signaling
> 🤖 This PR was generated by Coder Agents and self-reviewed by a human.
create_workspace could create a replacement workspace after a single 5s
agent dial failed, even when the existing workspace agent had recently
checked in. That made temporary reachability blips look like dead
workspaces and let chatd replace a running workspace too aggressively.
Use the workspace agent's DB-backed status with the deployment's
AgentInactiveDisconnectTimeout before allowing replacement. Recently
connected and still-connecting agents now reuse the existing workspace,
while disconnected or timed-out agents still allow a new workspace. This
also threads the inactivity timeout through chatd and adds focused
coverage for the reuse and replacement branches.
## Problem
When an MCP tool returns an `EmbeddedResource` content item (e.g. GitHub
MCP server returning file contents via `get_file_contents`), the
`convertCallResult` function falls through to the `default` case,
producing:
```
[unsupported content type: mcp.EmbeddedResource]
```
This loses the actual resource content and shows an unhelpful message in
the chat UI.
## Root Cause
The type switch in `convertCallResult` handles `TextContent`,
`ImageContent`, and `AudioContent`, but not the other two `mcp.Content`
implementations from `mcp-go`:
- `mcp.EmbeddedResource` — wraps a `ResourceContents` (either
`TextResourceContents` or `BlobResourceContents`)
- `mcp.ResourceLink` — contains a URI, name, and description
## Fix
Add two new cases to the type switch:
1. **`mcp.EmbeddedResource`**: nested type switch on `.Resource`:
- `TextResourceContents` → append `.Text` to `textParts`
- `BlobResourceContents` → base64-decode `.Blob` as binary (type
`"image"` or `"media"` based on MIME)
- Unknown → fallback `[unsupported embedded resource type: ...]`
2. **`mcp.ResourceLink`**: render as `[resource: Name (URI)]` text
## Testing
Added 3 new test cases (all passing, full suite 23/23 PASS):
- `TestConnectAll_EmbeddedResourceText` — text resource extraction
- `TestConnectAll_EmbeddedResourceBlob` — binary blob decoding
- `TestConnectAll_ResourceLink` — resource link rendering
Child chats created via `spawn_agent` and `spawn_computer_use_agent`
were not inheriting the parent's `MCPServerIDs`, meaning subagents lost
access to the parent's MCP server tools.
## Changes
- Pass `parent.MCPServerIDs` in the `CreateOptions` for both
`createChildSubagentChat()` and the `spawn_computer_use_agent` tool
handler in `coderd/x/chatd/subagent.go`.
## Tests
Added 3 tests in `subagent_internal_test.go`:
- `TestCreateChildSubagentChat_InheritsMCPServerIDs` — verifies child
chat gets parent's MCP server IDs (multiple servers)
- `TestSpawnComputerUseAgent_InheritsMCPServerIDs` — verifies computer
use subagent gets parent's MCP server IDs via the tool
- `TestCreateChildSubagentChat_NoMCPServersStaysEmpty` — verifies no
regression when parent has no MCP servers
*Disclaimer: implemented by a Coder Agent and reviewed by me.*
Renames the env vars used by chatd integration tests from the canonical
`SOMEPROVIDER_API_KEY` (e.g. `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`) to
`SOMEPROVIDER_TEST_API_KEY` (e.g. `ANTHROPIC_TEST_API_KEY`,
`OPENAI_TEST_API_KEY`) so that test-specific keys don't collide with
production/canonical provider credentials.
Relates to https://github.com/coder/internal/issues/1425
See also:
https://codercom.slack.com/archives/C0AGTPWLA3U/p1774433646799499
## Summary
Adds `xhigh` to the OpenAI reasoning effort normalizer so GPT-5.4 class
models can use `reasoning_effort: xhigh` without it being silently
dropped.
## Problem
The SDK schema (`codersdk/chats.go`) already advertises `xhigh` as a
valid `reasoning_effort` value, but the runtime normalizer in
`chatprovider.go` only accepts `minimal|low|medium|high` for the OpenAI
provider. When a user sets `xhigh`, `ReasoningEffortFromChat()` returns
`nil` and the value never reaches the OpenAI API.
## Changes
- **Fantasy dependency**: Updated `kylecarbs/fantasy` (cj/go1.25) which
now includes the `ReasoningEffortXHigh` constant
([kylecarbs/fantasy#9](https://github.com/kylecarbs/fantasy/pull/9)).
- **`chatprovider.go`**: Adds `fantasyopenai.ReasoningEffortXHigh` to
the OpenAI case in `ReasoningEffortFromChat()`.
- **`chatprovider_test.go`**: Adds `OpenAIXHighEffort` test case.
## Upstream
-
[charmbracelet/fantasy#186](https://github.com/charmbracelet/fantasy/pull/186)
Consolidates coderdtest invocations in 7 tests to reduce 23 instances to 7 across:
- `TestGetUser` (3 → 1) — read-only user lookups
- `TestUserTerminalFont` (3 → 1) — each creates own user via
CreateAnotherUser
- `TestUserTaskNotificationAlertDismissed` (3 → 1) — each creates own
user
- `TestUserLogin` (3 → 1) — each creates/deletes own user
- `TestExpMcpConfigureClaudeCode` (5 → 1) — writes to isolated temp dirs
- `TestOAuth2RegistrationTokenSecurity` (3 → 1) — independent
registrations
- `TestOAuth2SpecificErrorScenarios` (3 → 1) — independent error
scenarios
> 🤖 This PR was created with the help of Coder Agents, and has been
reviewed by my human. 🧑💻
## Problem
Template administrators cannot delete templates that have running
prebuilds.
The `deleteTemplate` handler fetches all non-deleted workspaces and
blocks
deletion if any exist, making no distinction between human-owned
workspaces
and prebuild workspaces (owned by the system `PrebuildsSystemUserID`).
This forces admins into a manual multi-step workflow: set
`desired_instances`
to 0 on every preset, wait for the reconciler to drain prebuilds, then
retry
deletion. Prebuilds are an internal system concern that admins should
not need
to manage manually.
## Fix
Replace the blanket `len(workspaces) > 0` guard in `deleteTemplate` with
a
loop that only blocks deletion when a non-prebuild (human-owned)
workspace
exists. Prebuild workspaces — owned by `database.PrebuildsSystemUserID`
— are
now ignored during the check.
Once the template is soft-deleted (`deleted=true`), the existing
prebuilds
reconciler detects `isActive()=false` and cleans up remaining prebuilds
asynchronously. No changes to the reconciler are needed.
The error message and HTTP status for human workspaces remain unchanged.
## Testing
Added two new subtests to `TestDeleteTemplate`:
- **`OnlyPrebuilds`**: deletion succeeds when only prebuild workspaces
exist.
- **`PrebuildsAndHumanWorkspaces`**: deletion is blocked when both
prebuild
and human workspaces exist.
Existing reconciler test ("soft-deleted templates MAY have prebuilds")
already
covers post-deletion prebuild cleanup.
> **PR Stack**
> 1. #23351 ← `#23282`
> 2. #23282 ← `#23275`
> 3. **#23275** ← `#23349` *(you are here)*
> 4. #23349 ← `main`
---
## Summary
Extracts a structured error classification subsystem for agent chat
(`chatd`) so that retry and error payloads carry machine-readable
metadata — error kind, provider name, HTTP status code, and retryability
— instead of raw error strings.
This is the **backend half** of the error-handling work. The frontend
counterpart is in #23282.
## Changes
### New package: `coderd/chatd/chaterror/`
Canonical error classification — extracts error kind, provider, status
code, and user-facing message from raw provider errors. One source of
truth that drives both retry policy and stream payloads.
- **`kind.go`**: Error kind enum (`rate_limit`, `timeout`, `auth`,
`config`, `overloaded`, `unknown`).
- **`signals.go`**: Signal extraction — parses provider name, HTTP
status code, and retryability from error strings and wrapped types.
- **`classify.go`**: Classification logic — maps extracted signals to an
error kind.
- **`message.go`**: User-facing message templates keyed by kind +
signals.
- **`payload.go`**: Projectors that build `ChatStreamError` and
`ChatStreamRetry` payloads from a classified error.
### Modified
- **`codersdk/chats.go`**: Added `Kind`, `Provider`, `Retryable`,
`StatusCode` fields to `ChatStreamError` and `ChatStreamRetry`.
- **`coderd/chatd/chatretry/`**: Thinned to retry-policy only;
classification logic moved to `chaterror`.
- **`coderd/chatd/chatloop/`**: Added per-attempt first-chunk timeout
(60 s) via `guardedStream` wrapper — produces retryable
`startup_timeout` errors instead of hanging forever.
- **`coderd/chatd/chatd.go`**: Publishes normalized retry/error payloads
via `chaterror` projectors.
Nine subtests covering the poll loop, pubsub notification path,
timeout, context cancellation, descendant auth check, and both
error-status branches in handleSubagentDone.
Wire p.clock through awaitSubagentCompletion's timer and ticker
so future tests can use quartz mock clock. Tests use channel-based
coordination and context.WithTimeout instead of time.Sleep.
Coverage: awaitSubagentCompletion 0%->70.3%, handleSubagentDone
0%->100%, checkSubagentCompletion 0%->77.8%,
latestSubagentAssistantMessage 0%->78.9%.
## Summary
Large pasted text that the UI collapses into an attachment chip was
completely invisible to the LLM. Providers only accept specific MIME
types (images, PDFs) in file content blocks — a `text/plain` `FilePart`
is silently dropped, so the model received nothing for pasted content.
## Fix
Detect paste-originated text files by their
`pasted-text-{timestamp}.txt` filename pattern and convert them to
`fantasy.TextPart` with a bounded 128 KiB inline body and truncation
notice. Binary uploads and real uploaded text files keep their existing
`FilePart` semantics.
The detection uses the existing frontend naming convention
(`pasted-text-YYYY-MM-DD-HH-MM-SS.txt`) combined with a text-like MIME
check for defense-in-depth. A TODO marks this for migration to explicit
origin metadata.
<details>
<summary>Review notes: intentionally skipped findings</summary>
A 10-reviewer deep review was run on this change. The following findings
were raised and intentionally dropped after cross-check. Documenting
them here so future reviewers do not re-flag the same concerns:
**"Unresolved file IDs cause silent data loss" (Edge Case Analyst P1)**
— When a file ID is not in the resolver map, `name` stays empty and
paste detection fails. This is pre-existing behavior for ALL file types
(not introduced by this change). The resolver calls `GetChatFilesByIDs`
which returns whatever rows exist; missing IDs simply fall through to an
empty `FilePart`. The Contract Auditor independently traced this path
and confirmed the fallback is safe. If the file was deleted between
message construction and conversion, the model already saw nothing
before this patch — this change does not make it worse.
**"String builder pre-allocation overhead" (Performance Analyst P1)** —
Misidentified scope. `formatSyntheticPasteText` is only called when
`isSyntheticPaste` returns true (actual synthetic pastes), not for every
file part. The `Grow()` call is correct and efficient.
**"Constant naming violates Uber style" (Style Reviewer P1)** —
Over-severity. `syntheticPasteInlineBudget` is standard Go camelCase for
unexported constants, consistent with the Uber guide and surrounding
code.
**"`IsSyntheticPasteForTest` naming is misleading" (Style Reviewer P2)**
— This is the standard Go `export_test.go` pattern. The `ForTest` suffix
is conventional.
</details>
Linear's MCP server (`mcp.linear.app`) returns `token_type="bearer"`
(lowercase) in its OAuth2 token response but rejects requests that use
the lowercase form in the `Authorization` header. RFC 6750 says the
scheme is case-insensitive, but Linear enforces capital-B `Bearer`.
Confirmed by running the actual Linear MCP OAuth flow end-to-end:
- `Authorization: Bearer <token>` → **42 tools, works**
- `Authorization: bearer <token>` → **401 invalid_token**
This is a one-line fix: normalize any case variant of `bearer` to
`Bearer` before building the `Authorization` header, matching the
behavior of the mcp-go library's own OAuth handler.
## Problem
MCP servers like Linear (`mcp.linear.app`) require PKCE (RFC 7636) for
their OAuth2 flow. Without it, the token exchange may succeed but the
resulting access token is immediately rejected with a 401
`invalid_token` error when the chat daemon tries to connect to the MCP
server.
This means users can authenticate successfully in the UI (the OAuth
popup completes, `auth_connected` shows `true`), but the model never
receives the MCP tools — they silently fail to load.
### Root cause
The `mcpServerOAuth2Connect` handler was calling
`oauth2Config.AuthCodeURL(state)` without any PKCE parameters
(`code_challenge`, `code_challenge_method`). The callback was calling
`oauth2Config.Exchange(ctx, code)` without a `code_verifier`. Linear's
MCP OAuth endpoint decoded state confirms it expected PKCE with
`codeChallengeMethod: "plain"`.
### Investigation
- The chat (`c2c04fc5-5622-4b71-a5a9-80508e86f78e`) had the Linear MCP
server ID in `mcp_server_ids`
- `auth_connected: true` (token row exists in DB)
- No "expired" or "empty token" warnings in logs
- Server log showed: `skipping MCP server due to connection failure ...
error="initialize: transport error: request failed with status 401:
{"error":"invalid_token","error_description":"Missing or invalid access
token"}"`
- Decoding Linear's OAuth state revealed PKCE was expected
## Changes
- Generate a PKCE `code_verifier` during the OAuth2 connect step using
`oauth2.GenerateVerifier()` and store it in a cookie scoped to the
callback path
- Include `code_challenge` (S256) in the authorization redirect URL via
`oauth2.S256ChallengeOption()`
- Pass the `code_verifier` during the token exchange in the callback via
`oauth2.VerifierOption()`
- Fix a nil-pointer guard on `api.HTTPClient` in the callback
- Add tests verifying PKCE parameters are sent correctly and backwards
compatibility when no verifier cookie is present
- Detect `-testify.m` sub-test filtering in `SetupSuite` and skip the `Accounting` check.
> 🤖 This PR was created with the help of Coder Agents, and was reviewed by my human. 🧑💻
Adds a `propose_plan` tool that presents a workspace markdown file as a
dedicated plan card in the agent UI.
The workflow is: the agent uses `write_file`/`edit_files` to build a
plan file (e.g. `/home/coder/PLAN.md`), then calls `propose_plan(path)`
to present it. The backend reads the file via `ReadFile` and the
frontend renders it as an expanded markdown preview card.
**Backend** (`coderd/x/chatd/chattool/proposeplan.go`): new tool
registered as root-chat-only. Validates `.md` suffix, requires an
absolute path, reads raw file content from the workspace agent. Includes
1 MiB size cap.
**Frontend** (`site/src/components/ai-elements/tool/`): dedicated
`ProposePlanTool` component with `ToolCollapsible` + `ScrollArea` +
`Response` markdown renderer, expanded by default. Custom icon
(`ClipboardListIcon`) and filename-based label.
**System prompt** (`coderd/x/chatd/prompt.go`): added `<planning>`
section guiding the agent to research → write plan file → iterate → call
`propose_plan`.
OpenAI Responses follow-up turns were replaying full assistant/tool
history even when `store=true`, which breaks after reasoning +
provider-executed `web_search` output.
This change persists the OpenAI response ID on assistant messages, then
in `coderd/x/chatd` switches `store=true` follow-ups to
`previous_response_id` chaining with a system + new-user-only prompt.
`store=false` and missing-ID cases still fall back to manual replay.
It also updates the fake OpenAI server and integration coverage for the
chaining contract, and carries the rebased path move to `coderd/x/chatd`
plus the migration renumber needed after rebasing onto `main`.
## Problem
`TestConnectAll_MultipleServers` flakes with:
```
net/http: HTTP/1.x transport connection broken: http: CloseIdleConnections called
```
Each MCP client connection implicitly uses `http.DefaultTransport`. When
`httptest.Server.Close()` runs during parallel test cleanup, it calls
`CloseIdleConnections` on `http.DefaultTransport`, breaking in-flight
connections from other goroutines or parallel tests sharing that
transport.
## Fix
Clone the default transport for each MCP connection via
`http.DefaultTransport.(*http.Transport).Clone()`, passed through
`WithHTTPBasicClient` (StreamableHTTP) and `WithHTTPClient` (SSE). This
scopes idle connection cleanup to a single MCP server so it cannot
disrupt unrelated connections.
Fixescoder/internal#1420
## Problem
During OAuth2 auto-discovery for MCP servers, the callback URL
registered with the remote authorization server via Dynamic Client
Registration (RFC 7591) contained the literal string `{id}` instead of
the actual config UUID:
```
https://coder.example.com/api/experimental/mcp/servers/{id}/oauth2/callback
```
This happened because the discovery and registration occurred **before**
the database insert that generates the ID. When the user later initiated
the OAuth2 connect flow, the redirect URL used the real UUID, causing
the authorization server to reject it with:
> The provided redirect URIs are not approved for use by this
authorization server
## Fix
Restructure the auto-discovery flow in `createMCPServerConfig` to:
1. **Insert** the MCP server config first (with empty OAuth2 fields) to
get the database-generated UUID
2. **Build** the callback URL with the actual UUID
3. **Perform** OAuth2 discovery and dynamic client registration with the
correct URL
4. **Update** the record with the discovered OAuth2 credentials
5. **Clean up** the record if discovery fails
## Testing
Added regression test
`TestMCPServerConfigsOAuth2AutoDiscovery/RedirectURIContainsRealConfigID`
that:
- Stands up mock auth + MCP servers
- Captures the `redirect_uris` sent during dynamic client registration
- Asserts the URI contains the real config UUID, not `{id}`
- Verifies the full callback path structure
All existing MCP server config tests continue to pass.
Fallback to the configured model name in PR Insights when a model config
has a blank display name.
This updates both the by-model breakdown and recent PR rows, and adds a
regression test for blank display names.
_Disclaimer:_ _produced_ _by_ _Claude_ _Opus_ _4\.6,_ _reviewed_ _by_ _me._
**This is a breaking change.** Users who are not have `owner` or sitewide `auditor` roles will no longer be able to view interceptions.
Regular users should not need to view this information; in fact, it could be used by a malicious insider to see what information we track and don't track to exfiltrate data or perform actions unobserved.
---
Changed authorization for AI Bridge interception-related operations from system-level permissions to resource-specific permissions. The following functions now authorize against `rbac.ResourceAibridgeInterception` instead of `rbac.ResourceSystem`:
- `ListAIBridgeTokenUsagesByInterceptionIDs`
- `ListAIBridgeToolUsagesByInterceptionIDs`
- `ListAIBridgeUserPromptsByInterceptionIDs`
Updated RBAC roles to grant AI Bridge interception permissions:
- **User/Member roles**: Can create and update AI Bridge interceptions but cannot read them back
- **Service accounts**: Same create/update permissions without read access
- **Owners/Auditors**: Retain full read access to all interceptions
Removed system-level authorization bypass in `populatedAndConvertAIBridgeInterceptions` function, allowing proper resource-level authorization checks.
Updated tests to reflect the new permission model where members cannot view AI Bridge interceptions, even their own, while owners and auditors maintain full visibility.
<!--
If you have used AI to produce some or all of this PR, please ensure you have read our [AI Contribution guidelines](https://coder.com/docs/about/contributing/AI_CONTRIBUTING) before submitting.
-->
_Disclaimer:_ _initially_ _produced_ _by_ _Claude_ _Opus_ _4\.6,_ _heavily_ _modified_ _and_ _reviewed_ _by_ _me._
Closes https://github.com/coder/internal/issues/1360
Adds a new `/api/v2/aibridge/sessions` API which returns "sessions".
Sessions, as defined in the [RFC](https://www.notion.so/coderhq/AI-Bridge-Sessions-Threads-2ccd579be59280f28021d3baf7472fbe?source=copy_link), are a set of interceptions logically grouped by a session key issued by the client.
The API design for this endpoint was done in [this doc](https://github.com/coder/internal/issues/1360).
If the client has not provided a session ID, we will revert to the thread root ID, and if that's not present we use the interception's own ID (i.e. a session of a single interception - which is effectively what we show currently in our `/api/v2/aibridge/interceptions` API).
The SQL query looks gnarly but it's relatively simple, and seems to perform well (~200ms) even when I import dogfood's `aibridge_*` tables into my workspace. If we need to improve performance on this later we can investigate materialized views, perhaps, but for now I don't think it's warranted.
---
_The PR looks large but it's got a lot of generated code; the actual changes aren't huge._
## Issue context
On `dev.coder.com`, users could successfully log in, briefly see the web
UI, and then get redirected back to `/login`.
We traced the most reliable repro to viewing Tracy's workspaces on the
`/workspaces` page. That page eagerly issues authenticated per-row
requests such as:
- `POST /api/v2/authcheck`
- `GET /api/v2/workspacebuilds/:workspacebuild/parameters`
One confirmed failing request was for Tracy's workspace
`nav-scroll-fix-1f6b`:
- route: `GET
/api/v2/workspacebuilds/f2104ae6-7d53-457c-a8df-de831bee76db/parameters`
- build owner/workspace: `tracy/nav-scroll-fix-1f6b`
The failing response body was:
- message: `An internal error occurred. Please try again or contact the
system administrator.`
- detail: `Internal error fetching API key by id. fetch object: pq:
password authentication failed for user "coder"`
That showed the request was not actually unauthorized. The server hit an
internal database/authentication problem while resolving the session API
key. The underlying issue was that DB password rotation had been
enabled, it has since been disabled.
However, the logout cascade happened because:
1. `APIKeyFromRequest()` returned `ok=false` for both genuine auth
failures and internal backend failures.
2. `ValidateAPIKey()` wrapped every `!ok` result as `401 Unauthorized`.
3. `RequireAuth.tsx` signs the user out on any `401` response.
So a transient backend/database failure was being misreported as an auth
failure, which made the client forcibly log the user out.
A useful extra clue was that the installed PWA did not repro. The PWA
starts on `/agents`, which avoids the `/workspaces` request fan-out.
That helped narrow the problem to the eager authenticated requests on
the workspace list rather than to cookies or the login flow itself.
## What changed
This PR now fixes the bug without changing the exported
`APIKeyFromRequest()` surface:
- `ValidateAPIKey()` now uses a new internal helper that returns a typed
`ValidateAPIKeyError`
- the exported `APIKeyFromRequest()` helper remains compatible for
existing callers like `userauth.go`
- internal API-key lookup failures are classified as `500 Internal
Server Error` plus `Hard: true`
- internal `UserRBACSubject()` failures now return `500 Internal Server
Error` instead of `401 Unauthorized`
- a focused regression test verifies that an internal `GetAPIKeyByID`
failure surfaces as `500`
This removes the brittle message-based classification and makes the
internal-auth-failure path robust for all API-key lookup failures
handled by auth middleware.
## What
Adds per-user per-model auto-compaction threshold overrides. Users can
now customize the percentage of context window usage that triggers chat
compaction, independently for each enabled model.
## Why
The compaction threshold was previously only configurable at the
deployment level (`chat_model_configs.compression_threshold`). Different
users have different preferences — some want aggressive compaction to
keep costs low, others prefer higher thresholds to retain more context.
This gives users control without requiring admin intervention.
## Architecture
**Storage:** Reuses the existing `user_configs` table (no migration
needed). Overrides are stored as key/value pairs with keys shaped
`chat_compaction_threshold:<modelConfigID>` and integer percent values.
**API:** Three new experimental endpoints under
`/api/experimental/chats/config/`:
- `GET /user-compaction-thresholds` — list all overrides for the current
user
- `PUT /user-compaction-thresholds/{modelConfig}` — upsert an override
(validates model exists and is enabled, validates 0–100 range)
- `DELETE /user-compaction-thresholds/{modelConfig}` — clear an override
(idempotent)
**Runtime resolution:** In `coderd/chatd/chatd.go`, a new
`resolveUserCompactionThreshold()` helper runs at the start of each chat
turn (inside `runChat()`), after the model config is resolved but before
`CompactionOptions` is built. If a valid override exists, it replaces
`modelConfig.CompressionThreshold`. The threshold source
(`user_override` vs `model_default`) is logged with each compaction
event.
**Precedence:** `effectiveThreshold = userOverride ??
modelConfig.CompressionThreshold`
**UI:** New "Context Compaction" subsection in the Agents → Settings →
Behavior tab, placed after Personal Instructions. Shows one row per
enabled model with the system default, a number input for the override,
and Save/Reset controls.
## Testing
- 9 API subtests covering CRUD, validation (boundary values 0/100,
out-of-range rejection), upsert behavior, idempotent delete, user
isolation, and non-existent model config
- 4 dbauthz tests (16 scenarios) verifying `ActionReadPersonal` /
`ActionUpdatePersonal` on all query methods
- 4 Storybook stories with play functions (Default, WithOverrides,
Loading, Error)
<details>
<summary>Implementation plan</summary>
### Phase 1 — Tests
- Backend API tests in `coderd/chats_test.go` (9 subtests)
- Database auth wrapper tests in
`coderd/database/dbauthz/dbauthz_test.go` (4 methods)
- Frontend stories in `UserCompactionThresholdSettings.stories.tsx` (4
stories)
### Phase 2 — Backend preference surface
- 4 SQL queries in `coderd/database/queries/users.sql` (list, get,
upsert, delete)
- `make gen` to propagate into generated artifacts
- Auth/metrics wrappers in dbauthz and dbmetrics
- SDK types and client methods in `codersdk/chats.go`
- HTTP handlers and routes in `coderd/chats.go` and `coderd/coderd.go`
- Key prefix constant shared between handlers and runtime
### Phase 3 — Runtime override
- `resolveUserCompactionThreshold()` helper in `coderd/chatd/chatd.go`
- Override injection in `runChat()` before building `CompactionOptions`
- `threshold_source` field added to compaction log
### Phase 4 — Settings UI
- API client methods and React Query hooks in `site/src/api/`
- `UserCompactionThresholdSettings` component extracted from
`SettingsPageContent`
- Per-model mutation tracking (only the active row disables during save)
- 100% warning, "System default" label, helpful empty state copy
### Phase 5 — Refactor and review fixes
- Consolidated key prefix constant in `codersdk`
- Explicit PUT range validation (not just struct tags)
- GET handler gracefully skips malformed rows instead of 500
- Boundary value, upsert, and non-existent model config tests
- UX improvements: per-model mutation state, aria-live on errors
</details>
## Problem
When adding an external MCP server with `auth_type=oauth2`, admins
currently must manually provide:
- `oauth2_client_id`
- `oauth2_client_secret`
- `oauth2_auth_url`
- `oauth2_token_url`
This requires the admin to manually register an OAuth2 client with the
external MCP server's authorization server first — a friction-heavy
process that contradicts the MCP spec's vision of plug-and-play
discovery.
## Solution
When an admin creates an MCP server config with `auth_type=oauth2` and
omits the OAuth2 fields, Coder now automatically discovers and registers
credentials following the MCP authorization spec:
1. **Protected Resource Metadata (RFC 9728)** — Fetches
`/.well-known/oauth-protected-resource` from the MCP server to discover
its authorization server. Falls back to probing the server URL for a
`WWW-Authenticate` header with a `resource_metadata` parameter.
2. **Authorization Server Metadata (RFC 8414)** — Fetches
`/.well-known/oauth-authorization-server` from the discovered auth
server to find all endpoints.
3. **Dynamic Client Registration (RFC 7591)** — Registers Coder as an
OAuth2 client at the auth server's registration endpoint, obtaining a
`client_id` and `client_secret` automatically.
The discovered/generated credentials are stored in the MCP server
config, and the existing per-user OAuth2 connect flow works unchanged.
### Backward compatibility
- **Manual config still works**: If all three fields
(`oauth2_client_id`, `oauth2_auth_url`, `oauth2_token_url`) are
provided, the existing behavior is unchanged.
- **Partial config is rejected**: Providing some but not all fields
returns a clear error explaining the two options.
- **Discovery failure is clear**: If auto-discovery fails, the error
message explains what went wrong and suggests manual configuration.
## Changes
- **New package `coderd/mcpauth`** — Self-contained discovery and DCR
logic with no `codersdk` dependency
- **Modified `coderd/mcp.go`** — `createMCPServerConfig` handler now
attempts auto-discovery when OAuth2 fields are omitted
- **Tests** — Unit tests for discovery (happy path, WWW-Authenticate
fallback, no registration endpoint, registration failure) and
`parseResourceMetadataParam` helper
The slices package provides type-safe generic replacements for the
old typed sort convenience functions. The codebase already uses
slices.Sort in 43 call sites; this finishes the migration for the
remaining 29.
- sort.Strings(x) -> slices.Sort(x)
- sort.Float64s(x) -> slices.Sort(x)
- sort.StringsAreSorted(x) -> slices.IsSorted(x)
Consolidates invocations of `coderdtest.New` to a single shared instance per
parent for the following tests:
- `TestOAuth2ClientMetadataValidation`
- `TestOAuth2ClientNameValidation`
- `TestOAuth2ClientScopeValidation`
- `TestOAuth2ClientMetadataEdgeCases`
> 🤖 This PR was created with the help of Coder Agents, and was
reviewed by my human. 🧑💻
Consolidates 6 tests that spun up separate coderdtest instances per sub-test into a single shared instance per parent.
> 🤖 This PR was created with the help of Coder Agents, and has been
reviewed by my human. 🧑💻
TestInterruptAutoPromotionIgnoresLaterUsageLimitIncrease still relied on
wall-clock polling after the acquire loop moved to a mock clock, so it
could assert before chatd finished its asynchronous cleanup and
auto-promotion work.
Wait on explicit request-start signals and on the server's in-flight
chat work before asserting the intermediate and final database state.
This keeps the test synchronized with the actual processor lifecycle
instead of scheduler timing.
Closes https://github.com/coder/internal/issues/1406