Commit Graph

13335 Commits

Author SHA1 Message Date
Kyle Carberry 0f86c4237e feat: add workspace MCP tool discovery and proxying for chat (#23680)
Coder's chat (chatd) can now discover and use MCP servers configured in
a workspace's `.mcp.json` file. This brings project-specific tooling
(GitHub, databases, docs servers, etc.) into the chat without any manual
configuration.

## How it works

The workspace agent reads `.mcp.json` from the workspace directory (same
format Claude Code uses), connects to the declared MCP servers —
spawning child processes for stdio servers and connecting over the
network for HTTP/SSE — and caches their tool lists. Two new agent HTTP
endpoints expose this:

- `GET /api/v0/mcp/tools` returns the cached tool list (supports
`?refresh=true`)
- `POST /api/v0/mcp/call-tool` proxies calls to the correct server

On each chat turn, chatd calls `ListMCPTools` through the existing
`AgentConn` tailnet connection, wraps each tool as a
`fantasy.AgentTool`, and adds them to the LLM's tool set alongside
built-in and admin-configured MCP tools. Tool names are prefixed with
the server name (`github__create_issue`) to avoid collisions.

Failed server connections are logged and skipped — they never block the
agent or break the chat. Child stdio processes are terminated on agent
shutdown.
2026-03-26 19:57:02 +00:00
Jeremy Ruppel 02b58534a0 fix: use TokenBadges for session list row (#23619)
The sessions list row uses a bespoke token badge instead of the
`<TokenBadge />` component, so this fixes that.
2026-03-26 15:50:32 -04:00
Atif Ali e35fa8b9ee fix(site): use restartWorkspace instead of startWorkspace in schedule dialog (#23658) 2026-03-27 00:45:16 +05:00
Jeremy Ruppel 1358233c83 fix(site): decrease chevron size in AI Bridge session row (#23688)
The Sessions table row chevron icon is too big. Make it smaller 🤏
2026-03-26 15:10:41 -04:00
Asher ea4070c0ce feat: add multi-user dialog select for adding group members (#23396)
Instead of the single-user dropdown we had before.
2026-03-26 10:42:04 -08:00
Hugo Dutka 1b2fab8306 feat(site): enable copy and paste in agents desktop (#23686) 2026-03-26 19:32:48 +01:00
Mathias Fredriksson 94e5de22f7 perf(site): fix compiler memoization gap in AgentDetailInput (#23683)
The React Compiler failed to memoize the messages derivation
chain because a useDashboard() hook call sat between the
messages computation and its consumer (getLatestContextUsage).
An IIFE around the context usage logic also fragmented the
dependency chain.

Replacing the IIFE with a ternary and reordering the non-hook
computation before the hook call lets the compiler group
messages + getLatestContextUsage into a single cache guard
keyed on messagesByID and orderedMessageIDs.
2026-03-26 18:30:06 +00:00
Mathias Fredriksson 6cbb7c6da7 fix(provisioner/terraform): regenerate fixtures with current provider (#23685)
Two test fixtures (devcontainer-resources, multiple-agents-multiple-envs)
were generated before terraform-provider-coder v2.15.0 added the
merge_strategy attribute to coder_env. Running generate.sh with the
current provider adds merge_strategy: "replace" (the default) to all
coder_env resources, causing unstable diffs on every regeneration.
2026-03-26 18:22:45 +00:00
Jeremy Ruppel fc60a6bf9b feat(site): add AIBridge Sessions to deployment menu (#23679)
Adds AI Bridge Sessions link to the deployment menu.
2026-03-26 13:56:51 -04:00
Atif Ali a52153968d fix(site): use Anthropic icon instead of Claude icon for provider (#23661) 2026-03-26 22:25:55 +05:00
Danielle Maywood d18e700699 fix(site): collapse chat toolbar badges fluidly on overflow (#23663) 2026-03-26 17:21:01 +00:00
Mathias Fredriksson 0234e8fffd perf(site): narrow buildStreamTools compiler cache guard dependencies (#23677)
The React Compiler guarded buildStreamTools on the whole
streamState ref, which changes on every text chunk. Refactoring
the function to accept toolCalls and toolResults directly lets
the compiler guard on those sub-fields, which are stable during
text-only streaming.

Before: $[0] !== streamState (misses every text chunk)
After:  $[0] !== toolCalls || $[1] !== toolResults (passes when
        only blocks change)

Verified: 181 functions compile, 0 diagnostics. Reference
stability tests confirm toolCalls/toolResults retain identity
across text-part updates and change when tool data updates.
2026-03-26 17:18:35 +00:00
david-fraley fea4560a64 fix(site): use docs() helper for hardcoded documentation URLs (#23606) 2026-03-26 17:10:47 +00:00
Danielle Maywood 6dee7cf11d perf(site/src/pages/AgentsPage): convert renderBlockList to BlockList component (#23673) 2026-03-26 17:07:36 +00:00
Danielle Maywood d4fc4e0837 fix(site): fix StreamingCodeFence storybook flake (#23681) 2026-03-26 17:00:47 +00:00
david-fraley 8da45c14bc fix(site): fix grammar in batch update description (#23605) 2026-03-26 11:59:46 -05:00
Cian Johnston bfee7e6245 fix: populate all chat fields in pubsub events (#23664)
*Problem:* `publishChatPubsubEvent` was constructing a partial
`codersdk.Chat` that omitted `LastModelConfigID` and other fields. Go's
zero-value UUID caused the sidebar to show "Default model" for chats
received via SSE.

*Solution:*
- Extracted `convertChat`/`convertChats` from `exp_chats.go` into
`db2sdk.Chat`/`db2sdk.Chats`, alongside existing `ChatMessage`,
`ChatQueuedMessage`, and `ChatDiffStatus` converters.
`publishChatPubsubEvent` now calls `db2sdk.Chat(chat, nil)` instead of
maintaining its own copy of the conversion logic
- Added backend integration test
`TestWatchChats/CreatedEventIncludesAllChatFields`
- Added frontend regression tests for nil-UUID and valid model config ID
cases

> 🤖 Created by Coder Agents, reviewed by this human.
2026-03-26 16:49:26 +00:00
Danielle Maywood 52b5d5fdc6 fix(site): match date range picker button height on template insights page (#23667) 2026-03-26 16:36:51 +00:00
Mathias Fredriksson cc4cca90fd perf(site): memo-wrap SmoothedResponse and ReasoningDisclosure to skip completed blocks during streaming (#23674)
During streaming, StreamingOutput's compiler cache guard misses
every chunk because streamState.blocks and streamTools are new
references. This causes renderBlockList to recreate all child JSX
elements, and React calls every child function even for blocks
that finished streaming.

Wrapping SmoothedResponse and ReasoningDisclosure in React.memo
lets React skip the function call entirely when props are stable.
For N completed response blocks and M completed thinking blocks,
this reduces per-chunk function calls from N+M+1 to 1. The
compiler still compiles both inner functions cleanly (6 and 12
cache slots respectively, zero diagnostics).
2026-03-26 16:35:12 +00:00
Hugo Dutka 081d91982a fix(site): fix desktop visibility glitch (#23678)
If the desktop viewer component was hidden, for example after collapsing
the sidebar, the next time it was shown the viewer would be blank. This
PR fixes that.
2026-03-26 17:20:16 +01:00
Danielle Maywood 00cd7b7346 fix: match workspace picker font size to plus menu dropdown (#23670) 2026-03-26 16:12:24 +00:00
Danny Kopping 801e57d430 feat: session detail API (#23203) 2026-03-26 18:09:53 +02:00
Michael Suchacz e937f89081 feat: add enabled toggle to chat model admin panel (#23665)
Adds an `enabled` toggle to the chat model admin create/edit form so
admins
can disable a model without soft-deleting it. Disabled models stay
visible
in admin settings but stop appearing in user-facing model selectors.

The backend already supported this (`chat_model_configs.enabled` column,
filtered queries, and SDK fields). This change wires it into the admin
UI
and adds coverage on both sides.

**Backend:** three new subtests in `coderd/exp_chats_test.go` verifying
the visibility contract (admin sees disabled models, non-admin doesn't,
update-to-disabled preserves the record).

**Frontend:** `enabled` field added to form logic and seeded from the
existing model (defaults to `true` for new models). A Switch+Tooltip
control renders in the form header, matching the MCP Server panel
pattern.
Two interaction stories cover the create-disabled and toggle-existing
flows.
2026-03-26 17:07:20 +01:00
Danielle Maywood 5c7057a67f fix(site): enable streaming-mode Streamdown for live chat output (#23676) 2026-03-26 15:22:54 +00:00
Ehab Younes 249ef7c567 feat(site): harden Agents embed frame communication and add theme sync (#23574)
Add theme synchronization, navigation blocking, scroll-to-bottom
handling, and chat-ready signaling to the agent embed page. The parent
frame can now set light/dark theme via postMessage or query param, and
ThemeProvider skips its own class manipulation when the embed marker is
present. Navigation attempts that leave the embed route are intercepted
and forwarded to the parent frame. The scroll container ref is lifted
to the layout so the parent can request scroll-to-bottom.
2026-03-26 18:03:30 +03:00
Cian Johnston 81fe7543b4 chore: set tls.VersionTLS12 MinVersion in cli/server.go to address gosec warning (#23646)
I was investigating `//nolint` comments and this one popped up.
It raised my eyebrows enough to warrant its own PR.
2026-03-26 14:53:47 +00:00
Kyle Carberry 61d2a4a9b8 fix(site): preserve streaming output when queued message is sent (#23595)
## Problem

When the user sends a message while the agent is actively streaming a
response, `handleSend` called `store.clearStreamState()`
**unconditionally before** the POST request. If the server queues the
message (`response.queued = true` because the agent is busy), the
in-progress stream output is immediately wiped from the UI. The full
text only reappears once the agent finishes and the durable message
arrives via WebSocket — causing a visible cutoff mid-stream.

## Fix

Move `clearStreamState()` from before the POST to **after** the
response, gated behind `!response.queued`:

- **Queued sends** (`response.queued === true`): `clearStreamState()` is
never called. The stream continues uninterrupted. The WebSocket `status`
handler already clears stream state when the chat transitions to
`"pending"` / `"waiting"` after the queued message is dequeued.
- **Non-queued sends** (`response.queued === false`):
`clearStreamState()` + `upsertDurableMessage()` fire immediately after
the POST, same net behavior as before.
- **Edit and promote paths**: Unchanged — those are intentional
interruptions where eager clearing is correct.

### Additional behavior changes (both improvements)

1. **Failed sends no longer wipe stream state.** Previously
`clearStreamState()` ran before the `try` block, so a network error
still wiped the agent's in-progress output. Now the `catch` re-throws
before reaching `clearStreamState()`, preserving the stream on failure.

2. **`clearStreamState()` fires for all non-queued responses**, not just
those with a `message` body. The original guard was `!response.queued &&
response.message`; now `clearStreamState()` is under `!response.queued`
while `upsertDurableMessage` retains the `response.message` check. The
server always sets `message` for non-queued responses, so this is a
no-op in practice but is semantically correct.

## Testing

**AgentDetail.stories.tsx**: New `StreamingSurvivesQueuedSend` story
exercises the full flow — mocks `createChatMessage` to return `{ queued:
true }`, delivers streaming text via WebSocket, sends a message through
the UI, and asserts the streaming text remains visible.
2026-03-26 10:35:31 -04:00
Mathias Fredriksson b23c07cf23 perf(site): use lazy iteration in sliceAtGraphemeBoundary (#23671)
Array.from(graphemeSegmenter.segment(text)) materializes the
entire text into an array before iterating, even though the loop
breaks early at the visible prefix length. During streaming at
60fps, this makes each frame O(full text) instead of O(prefix).

Benchmark on 5000-char text with 200-char prefix: 22.6x faster
(1.44ms to 0.06ms per call, saving 8.3% of the frame budget).
The fallback codepoint path had the same issue with Array.from.
2026-03-26 16:33:48 +02:00
Ethan 87aafd4ae2 fix(site): stabilize date-dependent storybook snapshots (#23657)
_Generated by mux but reviewed by a human_

Several stories computed dates relative to `dayjs()` / `new Date()` at
render time, causing snapshot text to shift daily. I ran into this on my
PRs.

This adds an optional `now` prop to `DateRangePicker`,
`TemplateInsightsControls`, and `CreateTokenForm` so stories can inject
a deterministic clock without global mocking. License stories replace
the misleadingly-named `FIXED_NOW = dayjs().startOf("day")` with
absolute timestamps. All fixed timestamps use noon UTC to avoid timezone
boundary issues.

Affected stories:
- `AgentSettingsPageView`: Usage Date Filter, Usage Date Filter Refetch
Overlay
- `LicenseCard`: Expired/future AI Governance variants, Not Yet Valid
- `LicensesSettingsPage`: Shows Addon Ui For Future License Before Nbf
- `TemplateInsightsControls`: Day
- `CreateTokenPage`: Default
2026-03-27 01:21:52 +11:00
Ethan 4d74603045 fix(coderd/x/chatd): respect provider Retry-After headers in chat retry loop (#23351)
> **PR Stack**
> 1. **#23351** ← `#23282` *(you are here)*
> 2. #23282 ← `#23275`
> 3. #23275 ← `#23349`
> 4. #23349 ← `main`

---

## Summary

`chatretry.Retry()` used pure exponential backoff (1 s, 2 s, 4 s, …) and
never consulted provider `Retry-After` headers. Fantasy's
`ProviderError` carries `ResponseHeaders` including `Retry-After`, but
`chaterror.Classify()` only parsed error text and silently dropped the
structured transport metadata.

This makes `Retry-After` a first-class signal in the classification →
retry pipeline.

<img width="853" height="346" alt="image"
src="https://github.com/user-attachments/assets/65f012b6-8173-43d2-957e-ab9faddea525"
/>


## Changes

### `coderd/chatd/chaterror/classify.go`

- Added `RetryAfter time.Duration` field to `ClassifiedError` — a
normalized minimum retry delay derived from provider response metadata.
- `Classify()` now calls `extractProviderErrorDetails()` before falling
back to text heuristics. Structured `ProviderError.StatusCode` takes
priority over regex extraction.
- `normalizeClassification()` preserves and clamps `RetryAfter`.

### `coderd/chatd/chaterror/provider_error.go` (new)

Provider-specific extraction, isolated from the text-based
classification logic:

- `extractProviderErrorDetails()` unwraps `*fantasy.ProviderError` from
the error chain via `errors.As`.
- `retryAfterFromHeaders()` parses headers in priority order:
  1. `retry-after-ms` (OpenAI-specific, millisecond precision)
  2. `retry-after` (standard HTTP — integer seconds or HTTP-date)
- Case-insensitive header key lookup.

### `coderd/chatd/chatretry/chatretry.go`

- `effectiveDelay(attempt, classified)` computes `max(Delay(attempt),
classified.RetryAfter)` — the provider hint acts as a floor without
weakening the local exponential backoff.
- `Retry()` now uses `effectiveDelay` and passes the effective delay to
both `onRetry(...)` and the sleep timer, so downstream payloads, logs,
and the frontend countdown stay aligned automatically.

### Tests

- `classify_test.go`: Structured provider status + `Retry-After`
extraction, `retry-after-ms` priority, HTTP-date parsing, invalid header
fallback, `WithProvider` preservation.
- `chatretry_test.go`: Retry-after-as-floor semantics — longer hint
wins, shorter hint keeps base delay.

## Design notes

- **No SDK/API/frontend changes needed.** `codersdk.ChatStreamRetry`
already carries `DelayMs` and `RetryingAt`, and the frontend already
consumes them. The fix is purely in the server-side delay computation.
- **Existing retryability rules unchanged.** This fixes *when* we sleep,
not *whether* an error is retryable.
- **Provider hint is a floor:** `max(baseDelay, RetryAfter)` ensures we
never retry earlier than the provider asks, and never weaken our own
backoff curve.
2026-03-27 01:20:46 +11:00
Cian Johnston 847a88c6ca chore: clean up stale and dangerous //nolint comments (#23643)
## Changes

- **Commit 1**: Remove 17 unnecessary `//nolint` directives:
  - `//nolint:varnamelen` — linter not active
  - `//nolint:unused` on exported `SlimUnsupported`
  - `//nolint:govet` in `coderd/httpmw/csrf` — no longer fires
  - `//nolint:revive` on functions refactored since the nolint was added
- `//nolint:paralleltest` citing Go 1.22 loop variable capture
(obsolete)
- Bare `//nolint` narrowed to specific `//nolint:gocritic` with
justification

- **Commit 2**: Fix root causes behind 5 dangerous nolint suppressions:
- Add `MinVersion: tls.VersionTLS12` to TLS client config (removes
`gosec` G402)
- Delete trivial unexported wrappers `apiKey()`/`normalizeProvider()` in
chatprovider (removes `revive` confusing-naming)
- Add doc comments to `StartWithAssert` and `Router` (removes `revive`
exported)
  - Rename unused parameters to `_` in integration test helpers

> 🤖 This PR was created using Coder Agents and reviewed by me.
2026-03-26 14:13:53 +00:00
Jeremy Ruppel a0283ff775 fix(site): use toLocaleString for pagination offsets (#23669)
The Pagination widget localizes the number format of the total results
but not the page offsets.

Before

<img width="620" height="78" alt="Screenshot 2026-03-26 at 09 18 01"
src="https://github.com/user-attachments/assets/7ac0ad9a-7baa-4b30-b3d0-0e0325f8433b"
/>

After

<img width="297" height="42" alt="Screenshot 2026-03-26 at 9 41 22 AM"
src="https://github.com/user-attachments/assets/79c68366-95fa-4012-8419-5cd6f6e10ae3"
/>
2026-03-26 09:50:49 -04:00
Cian Johnston f164463c6a fix(scripts/metricsdocgen): shush the prometheus scanner in CI (#23642)
- Suppress informational `log.Printf` messages from the metrics scanner
when stdout is not a TTY (i.e. piped via `atomic_write` in `make gen` or
CI)
- Genuine warnings (`warnf`) still print unconditionally so real
problems remain visible
- `log.Fatalf` for fatal errors is unchanged

> 🤖 Created by Coder Agents and reviewed by a human
2026-03-26 12:58:02 +00:00
Michael Suchacz 4f063cdc47 feat: separate default and additional Coder Agents system prompts (#23616)
Admins can now control whether the built-in Coder Agents default system
prompt is prepended to their custom instructions, rather than having the
custom prompt silently replace the default.

**Changes:**
- New `include_default_system_prompt` boolean toggle (defaults to `true`
for existing deployments) stored as a site config key — no migration
needed.
- GET `/api/experimental/chats/config/system-prompt` returns the toggle
state, the custom prompt, and a preview of the built-in default.
- PUT persists both the toggle and custom prompt atomically in a single
transaction.
- `resolvedChatSystemPrompt()` composes `[default?, custom?]` joined by
`\n\n`, falling back to the built-in default on DB errors.
- Settings UI adds a Switch toggle with conditional helper text and a
"Preview" button that shows the built-in default prompt via the existing
`TextPreviewDialog`.
- Comprehensive test coverage: 15 subtests covering toggle behavior,
prompt composition matrix, auth boundaries, and integration with chat
creation.
2026-03-26 13:32:41 +01:00
Cian Johnston d175e799da feat: show agent badge on workspace list (#23453)
- Adds `GET /api/experimental/chats/by-workspace` endpoint that returns
workspace_id → latest chat_id mapping
- Modifies FE to fetch this alongside the workspace list, gated on
`agents` experiment and render an "Agent" badge similar to the existing
"Task" badge in `WorkspacesTable`
- Badge links to the "latest chat" linked to the given workspace.

Notes:
- Intentionally uses `fetchWithPostFilter` for RBAC to decouple from
workspaces API — will migrate to `workspaces_expanded` view later.
- If users have multiple chats linked to the same workspace, the badge
will link to the most recently updated one.

> 🤖 This PR was created with the help of Coder Agents, and has been
reviewed by my human. 🧑‍💻
2026-03-26 11:30:12 +00:00
Jaayden Halko 3fb7c6264f feat: display the AI add-on column in the UI on the Users and Organization Members tables (#23291)
## Summary

Adds an entitlement-gated **AI add-on** column to both the **Users**
table and the **Organization Members** table. When
`ai_governance_user_limit` is entitled, each row shows whether the user
is consuming an AI seat.

## Background

The AI governance add-on tracks which users are consuming AI seats.
Admins need visibility into per-user seat consumption directly from the
user management tables. This change surfaces that information through
both the site-wide Users table and the per-organization Members table,
gated behind the `ai_governance_user_limit` entitlement so the column
only appears when the feature is licensed.

## Implementation

### Backend
- **New SQL query** `GetUserAISeatStates`
(`coderd/database/queries/aiseatstate.sql`) — returns user IDs consuming
an AI seat, derived from:
  - Users with entries in `aibridge_interceptions` (AI Bridge usage)
- Users who own workspaces with `has_ai_task = true` builds (AI Tasks
usage)
- **SDK types** — added `has_ai_seat: boolean` to `codersdk.User` and
`codersdk.OrganizationMemberWithUserData`
- **Handler wiring** — both the Users list endpoint (`coderd/users.go`)
and all Members endpoints (`coderd/members.go`) query AI seat state per
page of user IDs and populate the response field
- **dbauthz** — per-user `ActionRead` checks on `ResourceUserObject`

### Frontend
- **Shared `AISeatCell` component**
(`site/src/modules/users/AISeatCell.tsx`) — green `CircleCheck` for
consuming, gray `X` for non-consuming
- **`TableColumnHelpTooltip`** — extended with `ai_addon` variant with
tooltip: *"Users with access to AI features like AI Bridge, Boundary, or
Tasks who are actively consuming a seat."*
- **Column visibility** gated behind
`useFeatureVisibility().ai_governance_user_limit`

## Validation

- Backend: dbauthz full method suite (`TestMethodTestSuite`) passes
including new `GetUserAISeatStates` test
- Backend: `TestGetUsers`, `TestUsersFilter`, CLI golden file tests pass
- Frontend: 7/7 tests pass across `UsersPage.test.tsx` and
`OrganizationMembersPage.test.tsx` (column visibility gating both
directions)
- `go build ./coderd/...` compiles clean
- `pnpm --dir site run lint:types` passes
- `make gen` clean

## Risks

- **Pagination performance**: The AI seat query is scoped to the current
page's user IDs (not a full table scan), keeping it efficient for
paginated views.
- **Semantic scope**: The workspace-side AI seat derivation uses "any
build with `has_ai_task = true`" rather than "latest build only". If the
product intent is latest-build-only, this can be tightened in a
follow-up.

---

_Generated with `mux` • Model: `anthropic:claude-opus-4-6` • Thinking:
`xhigh` • Cost: `$27.25`_

<!-- mux-attribution: model=anthropic:claude-opus-4-6 thinking=xhigh
costs=27.25 -->
2026-03-26 10:36:40 +00:00
Danny Kopping 09d2588e2a docs: AI session auditing (#23660)
_Disclaimer: produced with the help of Claude Opus 4.6, heavily modified
by me._

Closes https://github.com/coder/internal/issues/1341

---------

Signed-off-by: Danny Kopping <danny@coder.com>
2026-03-26 09:49:53 +00:00
Danny Kopping 8eade29e68 chore: update AI Bridge warning to require AI Governance Add-On (#23662)
*Disclaimer: implemented by a Coder Agent using Claude Opus 4.6,
reviewed by me.*

Replace the transitional soft warning message:

> AI Bridge is now Generally Available in v2.30. In a future Coder
version, your deployment will require the AI Governance Add-On to
continue using this feature. Please reach out to your account team or
sales@coder.com to learn more.

with the definitive requirement message:

> The AI Governance Add-On is required to use AI Bridge. Please reach
out to your account team or sales@coder.com to learn more.

Updated in:
- `enterprise/coderd/license/license.go`
- `enterprise/coderd/license/license_test.go` (2 occurrences)
2026-03-26 11:10:53 +02:00
Ethan 15f2fa55c6 perf(coderd/x/chatd): add process-wide config cache for hot DB queries (#23272)
## Summary

Adds a process-wide cache for three hot database queries in `chatd` that
were hitting Postgres on **every chat turn** despite returning
rarely-changing configuration data:

| Query | Before (50k turns) | After | Reduction |
|---|---|---|---|
| `GetEnabledChatProviders` | ~98.6k calls | ~500-1000 | ~99% |
| `GetChatModelConfigByID` | ~49.2k calls | ~500-1000 | ~98% |
| `GetUserChatCustomPrompt` | ~46.7k calls | ~1000-2000 | ~97% |

These were identified via `coder exp scaletest chat` (5000 concurrent
chats × 10 turns) as the dominant source of Postgres load during chat
processing.

## Design

Follows the established **webpush subscription cache pattern**
(`coderd/webpush/webpush.go`):
- `sync.RWMutex` + `tailscale.com/util/singleflight` (generic) +
generation-based stale prevention + TTL
- 10s TTL for provider/model config, 5s TTL for user prompts
- Negative caching for `sql.ErrNoRows` on user prompts (the common case
— most users don't set custom prompts)
- Deep-clones `ChatModelConfig.Options` (`json.RawMessage` = `[]byte`)
on both store and read paths

### Invalidation

Single pubsub channel (`chat:config_change`) with kind discriminator for
cross-replica cache invalidation. Seven publish points in
`coderd/chats.go` cover all admin mutation endpoints
(create/update/delete for providers and model configs, put for user
prompts).

_This PR was generated with mux and was reviewed by a human_
2026-03-26 18:04:53 +11:00
Danny Kopping 2ff329b68a feat(site): add banner on request-logs page directing users to sessions (#23629)
*Disclaimer: implemented by a Coder Agent using Claude Opus 4.6*

Adds an info banner on the `/aibridge/request-logs` page encouraging
users to visit `/aibridge/sessions` for an improved audit experience.

This allows us to validate whether customers still find the raw request
logs view useful before removing it in a future release.

Fixes #23563
2026-03-26 11:57:50 +05:00
Ethan ad3d934290 fix(site/src/pages/AgentsPage): clear retry banner on stream forward progress (#23653)
When a provider request fails and retries, the "Retrying request" banner
lingered in the UI after the retry succeeded. This happened because
`retryState` was only cleared on explicit `status` events (`running`,
`pending`, `waiting`), not when the stream resumed with `message_part`
or `message` events. Since the backend does not publish a
dedicated"retry resolved" event, the banner stayed visible for the
entire duration of the successful response.

Add `store.clearRetryState()` calls to the `message_part`, `message`,
and `status` event handlers so the banner disappears as soon as content
flows again.

Closes https://github.com/coder/coder/issues/23624
2026-03-26 17:41:13 +11:00
Ethan 21c2acbad5 fix: refine chat retry status UX (#23651)
Follow-up to #23282. The retry and terminal error callouts had a few UX
oddities:

- Auto-retrying states reused backend error text that said "Please try
again" even while the UI was already retrying on behalf of the user.
- Terminal error states also said "Please try again" with no action the
user could take.
- `startup_timeout` had no specific title or retry copy — it fell
through to the generic "Retrying request" heading.
- The kind pill showed raw enum values like `startup_timeout` and
`rate_limit`.
- Terminal error metadata showed a "Retryable" / "Not retryable" label
that does not help users.
- A separate "Provider anthropic" metadata row duplicated information
already present in the message body.
- The `usage-limit` error kind used a hyphen while every backend kind
uses underscores.

Changes:

**Backend (`chaterror/message.go`)**

- Split message generation into `terminalMessage()` and
`retryMessage()`, replacing the old `userFacingMessage()`.
- Terminal messages include HTTP status codes and actionable guidance
(e.g. "Check the API key, permissions, and billing settings.").
- Retry messages are clean factual statements without status codes or
remediation, suitable for the retry countdown UI (e.g. "Anthropic is
temporarily overloaded.").
- Removed "Please try again" / "Please try again later" from all paths.
- `StreamRetryPayload` calls `retryMessage()` instead of forwarding
`classified.Message`.

**Frontend**

- Removed the parallel frontend message-generation system:
`getRetryMessage()`, `getProviderDisplayName()`,
`getRetryProviderSubject()`, and the `PROVIDER_DISPLAY_NAMES` map are
all deleted from `chatStatusHelpers.ts`.
- `liveStatusModel.ts` passes `retryState.error` through directly — the
backend owns the copy.
- Added specific title and retry copy for `startup_timeout`, and
extended the title mapping to cover `auth` and `config`.
- Kind pills now show humanized labels ("Startup timeout", "Rate limit",
etc.) instead of raw enum strings.
- Removed the redundant "Provider anthropic" metadata row.
- Removed the terminal "Retryable" / "Not retryable" badge.
- Normalized `"usage-limit"` → `"usage_limit"` and added it to
`ChatProviderFailureKind` so all error kinds follow the same underscore
convention and live in one enum.

Refs #23282.
2026-03-26 17:37:27 +11:00
Ethan 411714cd73 fix(dogfood/coder): tolerate stale gh auth state (#23588)
## Problem

The dogfood startup script uses `gh auth status` to decide whether to
re-authenticate the GitHub CLI. That command exits non-zero when **any**
stored credential is invalid—even if Coder external auth already injects
a working `GITHUB_TOKEN` into the environment and `gh` commands work
fine.

On workspaces with a persistent home volume, `~/.config/gh/hosts.yml`
retains OAuth tokens written by previous `gh auth login --with-token`
calls. These tokens are issued by Coder's external auth integration and
can be rotated or revoked between workspace starts, but the copy in
`hosts.yml` persists on the volume. When the stored token goes stale,
`gh auth status` reports two accounts:

```
✓ Logged in to github.com account user (GITHUB_TOKEN)           ← works fine
✗ Failed to log in to github.com account user (hosts.yml)       ← stale token
```

It exits 1 because of the stale entry, even though `gh` API calls
succeed via `GITHUB_TOKEN`. This makes the auth state **indeterminate**
from `gh auth status` alone—you can't tell whether `gh` actually works
or not.

When the script enters the login branch:

1. `gh auth login --with-token` **refuses** to accept piped input when
`GITHUB_TOKEN` is already set in the environment, and exits 1.
2. `set -e` kills the script before it reaches `sudo service docker
start`.

The result: Docker never starts, devcontainer health checks fail, and
the workspace reports a startup error—all because of a stale GitHub CLI
credential that has no bearing on workspace functionality.

## Fix

- Switch the auth guard from `gh auth status` to `gh api user --jq
.login`, which tests whether GitHub API access actually works regardless
of which credential provides it.
- Wrap the fallback `gh auth login` so a failure logs the indeterminate
state but does not abort the script.
2026-03-26 17:25:42 +11:00
Ethan 61e31ec5cc perf(coderd/x/chatd): persist workspace agent binding across chat turns (#23274)
## Summary

This change removes the steady-state "resolve the latest workspace
agent" query from chat execution.

Instead of asking the database for the latest build's agent on every
turn, a chat now persists the workspace/build/agent binding it actually
uses and reuses that binding across subsequent turns. The common path
becomes "load the bound agent by ID and dial it", with fallback paths to
repair the binding when it is missing, stale, or intentionally changed.

## What changes

- add `workspace_id`, `build_id`, and `agent_id` binding fields to
`chats`
- expose those fields through the chat API / SDK so the execution
context is explicit
- load the persisted binding first in chatd, instead of always resolving
the latest build's agent
- persist a refreshed binding when chatd has to re-resolve the workspace
agent
- keep child / subagent chats on the same bound workspace context by
inheriting the parent binding
- leave `build_id` / `agent_id` unset for flows like `create_workspace`,
then bind them lazily on the next agent-backed turn

## Runtime behavior

The binding is treated as an optimistic cache of the agent a chat should
use:

- if the bound agent still exists and dials successfully, we use it
without a latest-build lookup
- if the bound agent is missing or no longer reachable, chatd
re-resolves against the latest build and persists the new binding
- if a workspace mutation changes the chat's target workspace, the
binding is updated as part of that mutation

To avoid reintroducing a hot-path query, dialing uses lazy validation:

- start dialing the cached agent immediately
- only validate against the latest build if the dial is still pending
after a short delay
- if validation finds a different agent, cancel the stale dial, switch
to the current agent, and persist the repaired binding

## Result

The hot path stops issuing
`GetWorkspaceAgentsInLatestBuildByWorkspaceID` for every user message,
which is the source of the DB pressure this PR is addressing. At the
same time, chats still converge to the correct workspace agent when the
binding becomes stale due to rebuilds or explicit workspace changes.
2026-03-26 17:22:38 +11:00
Ethan 17aea0b19c feat(site): make long execute tool commands expandable (#23562)
Previously, long bash commands in the execute tool were truncated with
an ellipsis and could not be viewed in full. The only way to see the
full command was to copy it via the copy button.

Adds overflow detection and an inline expand/collapse chevron next to
the copy button. Clicking the command text or the chevron toggles
between truncated and wrapped views. Short commands that fit on one line
are visually unchanged.



https://github.com/user-attachments/assets/88ec6cd4-5212-4608-9a90-9ce217d5dce7

EDIT: couldn't be bothered re-recording the video but the chevron is
hidden until hovered now, like the copy button.
2026-03-26 15:49:23 +11:00
Ethan 5112ab7da9 fix(site/e2e): fix flaky updateTemplate test expecting transient URL (#23655)
_PR generated by Mux but reviewed by a human_

## Problem

The e2e test `template update with new name redirects on successful
submit` is flaky.

After saving template settings, the app navigates to
`/templates/<name>`, which immediately redirects to
`/templates/<name>/docs` via the router's index route (`<Navigate
to="docs" replace />`). The assertion used `expect.poll()` with
`toHavePathNameEndingWith(`/${name}`)`, which matches only the
**transient intermediate URL** — it only exists while `TemplateLayout`'s
async data fetch is pending. Once the fetch resolves and the `<Outlet
/>` renders, the index route fires the `/docs` redirect and the URL no
longer matches.

## Why it's flaky (not deterministic)

The flakiness depends on whether the template query cache is warm:

- **Cache miss → PASSES**: The mutation's `onSuccess` handler
invalidates the query cache. If `TemplateLayout` needs to re-fetch, it
shows a `<Loader />`, which delays rendering the `<Outlet />` that
contains the `<Navigate to="docs">`. This gives `expect.poll()` time to
see the transient `/new-name` URL → **pass**.
- **Cache hit → FAILS**: If the template data is still in the query
client, `TemplateLayout` renders immediately and the `<Navigate
to="docs" replace />` fires nearly instantly. By the time the first poll
runs, the URL is already `/new-name/docs` → **fail**.

## Fix

Assert the **final stable URL** (`/${name}/docs`) instead of the
transient one.

This is safe because `expect.poll()` is retry-based: it keeps sampling
until a match is found (or timeout). Seeing the transient `/new-name`
URL just causes harmless retries — once the redirect completes and the
URL settles on `/new-name/docs`, the poll matches and the test passes.

| Poll | URL | Ends with `/new-name/docs`? | Action |
|---|---|---|---|
| 1st | `/templates/new-name` | No | Retry |
| 2nd | `/templates/new-name` | No | Retry |
| 3rd | `/templates/new-name/docs` | Yes | **Pass**  |

Closes https://github.com/coder/internal/issues/1403
2026-03-26 04:32:44 +00:00
Cian Johnston 7a9d57cd87 fix(coderd): actually wire the chat template allowlist into tools (#23626)
Problem: previously, the deployment-wide chat template allowlist was never actually wired in from `chatd.go`

- Extracts `parseChatTemplateAllowlist` into shared `coderd/util/xjson.ParseUUIDList`
- Adds `Server.chatTemplateAllowlist()` method that reads the allowlist from DB
- Passes `AllowedTemplateIDs` callback to `ListTemplates`, `ReadTemplate`, and `CreateWorkspace` tool constructors

> 🤖 Created by Coder Agents and reviewed by a human.
2026-03-25 22:15:27 +00:00
david-fraley dab4e6f0a4 fix(site): use standard dismiss label for cancel confirmation dialogs (#23599) 2026-03-25 21:24:53 +00:00
Kayla はな 0e69e0eaca chore: modernize typescript api client/types imports (#23637) 2026-03-25 15:21:19 -06:00
Kyle Carberry 09bcd0b260 fix: revert "refactor(site/src/pages/AgentsPage): normalize transcript scrolling" (#23638)
Reverts coder/coder#23576
2026-03-25 20:24:42 +00:00