The `/chats/{chat}/diff-status` endpoint was redundant because:
- The `Chat` type already has a `DiffStatus` field
- Listing chats already resolves and returns `diff_status`
- The `getChat` endpoint was the only one not resolving it (passing
`nil`)
## Changes
**Backend:**
- `getChat` now calls `resolveChatDiffStatus` and includes the result in
the response
- Removed `getChatDiffStatus` handler, route (`GET /diff-status`), and
SDK method
- Tests updated to use `GetChat` instead of `GetChatDiffStatus`
**Frontend:**
- `AgentDetail.tsx`: uses `chatQuery.data?.diff_status` instead of
a separate query
- `RemoteDiffPanel.tsx`: accepts `diffStatus` as a prop instead of
fetching internally
- `AgentsPage.tsx`: `diff_status_change` events now invalidate the chat
query
- Removed `chatDiffStatus` query, `chatDiffStatusKey`, and
`getChatDiffStatus` API method
## Summary
- add an `IS DISTINCT FROM` guard to `InsertChatMessage`'s
`updated_chat` CTE so `chats.last_model_config_id` is only rewritten
when the incoming `model_config_id` actually changes
- regenerate the query layer
- add focused regression coverage for the two meaningful behaviors:
same-model inserts and real model switches
- trim redundant message-field assertions so the new test stays focused
on the guard behavior
## Proof this is an improvement
This PR reduces work in the hottest chat write query without changing
the insert behavior.
### Why the old query did unnecessary work
Before this change, `InsertChatMessage` always ran this update whenever
`model_config_id` was non-null:
```sql
UPDATE chats
SET last_model_config_id = sqlc.narg('model_config_id')::uuid
WHERE id = @chat_id::uuid
AND sqlc.narg('model_config_id')::uuid IS NOT NULL
```
That means the query rewrote the `chats` row even when
`chats.last_model_config_id` was already equal to the incoming value.
### What changes in this PR
This PR adds:
```sql
AND chats.last_model_config_id IS DISTINCT FROM sqlc.narg('model_config_id')::uuid
```
So same-model inserts still insert the message, but they no longer
perform a redundant `UPDATE chats`.
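The guard's null handling can be modeled in Go. This is a minimal sketch and not the generated query layer: `isDistinctFrom` and `shouldUpdateChat` are hypothetical names, plain strings stand in for UUIDs, and `nil` stands in for SQL `NULL`.

```go
package main

import "fmt"

// isDistinctFrom models SQL's null-safe inequality, with nil standing in
// for NULL. Unlike <>, it never yields "unknown".
func isDistinctFrom(a, b *string) bool {
	if a == nil || b == nil {
		return a != b // distinct unless both are NULL
	}
	return *a != *b
}

// shouldUpdateChat mirrors the two WHERE conditions on the updated_chat CTE:
// the incoming value is non-null and differs from the stored one.
func shouldUpdateChat(last, incoming *string) bool {
	return incoming != nil && isDistinctFrom(last, incoming)
}

func main() {
	a, b := "model-a", "model-b"
	fmt.Println(shouldUpdateChat(&a, &a))  // false: same model, chat row untouched
	fmt.Println(shouldUpdateChat(&a, &b))  // true: real model switch
	fmt.Println(shouldUpdateChat(nil, &a)) // true: NULL is distinct from a value
	fmt.Println(shouldUpdateChat(&a, nil)) // false: no incoming model config
}
```

A plain `<>` comparison would evaluate to `NULL` when the stored value is `NULL`, so the first model assignment would be skipped; `IS DISTINCT FROM` avoids that.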
### Why this matters on the hot path
From the chat scaletest investigation that motivated this change:
- `InsertChatMessage` (+ `updated_chat` CTE) was the hottest write query
- about **104k calls**
- about **0.69 ms average latency**
- about **71.8 s total DB execution time**
We also verified common callsites where the update is provably
redundant:
- `CreateChat` inserts the chat with `LastModelConfigID =
opts.ModelConfigID`, then immediately inserts initial system/user
messages with that same model config
- follow-up user messages commonly pass `lockedChat.LastModelConfigID`
straight into `InsertChatMessage`
- assistant/tool/summary persistence keeps the current model in the
common case; only real switches or fallback cases need the chat row
update
That means a meaningful fraction of executions of the hottest DB write
query move from:
- **before:** insert message **+** rewrite chat row
- **after:** insert message only
This should reduce row churn and write contention on `chats`, especially
against other statements that write or lock the chat row, such as
`UpdateChatStatus` and `GetChatByIDForUpdate`.
Fixes flaky `TestChatCostSummary_UnpricedMessages` (and siblings) by
replacing implicit handler-default date windows with explicit time
windows derived from database-assigned message timestamps.
**Root cause:** Tests called `GetChatCostSummary` with empty options, so
the handler fell back to `[time.Now()-30d, time.Now())` as the query
window. The SQL filter's upper bound is exclusive (`created_at <
@end_date`), so a freshly inserted message is excluded whenever the
handler's `time.Now()` is not strictly later than the message's
database-assigned `created_at`, which even slight clock skew can cause.
**Fix (test-only, `coderd/chats_test.go`):**
- `seedChatCostFixture` now captures `InsertChatMessage` return values
and exposes `EarliestCreatedAt`/`LatestCreatedAt`.
- Added `safeOptions()` helper that builds a padded ±1 min window around
DB timestamps.
- Updated 4 tests to use explicit date windows;
`TestChatCostSummary_DateRange` unchanged.
Validated with `go test -count=20` (100/100 passes).
### Motivation
- The chat creation flow associated a workspace agent for a chat if the requester could read the workspace, enabling privilege escalation where users without SSH/app-connect permissions could cause the daemon to open privileged agent connections and execute commands.
- The intent is to ensure that attaching a workspace agent to a chat only happens when the requester has the workspace SSH permission so the chat daemon cannot be abused to bypass RBAC.
### Description
- Require request-scoped authorization for workspace agent usage by changing `validateCreateChatWorkspaceSelection` to accept the `*http.Request` and calling `api.Authorize(r, policy.ActionSSH, workspace)` before selecting the workspace for a chat.
- Pass the HTTP request into the validator from `postChats` so authorization is evaluated in the request context (`postChats` now calls `validateCreateChatWorkspaceSelection(ctx, r, req)`).
- Add a regression test `WorkspaceAccessibleButNoSSH` in `coderd/chats_test.go` which creates an org-admin-scoped user (read access but no `ActionSSH`) and asserts that creating a chat with `WorkspaceID` is denied.
### Testing
- Ran `gofmt -w coderd/chats.go coderd/chats_test.go` which succeeded.
- Attempted to run repository pre-commit checks (`make pre-commit`) and targeted `go test` invocations; these checks could not be completed in this environment due to missing local tooling and environment constraints (protobuf include resolution, containerized DB access via Docker socket, and long-running golden generation tasks), so full CI/pre-commit verification and end-to-end test runs did not complete here.
- Added a focused regression unit test (`WorkspaceAccessibleButNoSSH`) to prevent reintroduction of the authorization bypass; this test is included in the change and should be executed in CI where the full toolchain and test environment are available.
------
[Codex Task](https://chatgpt.com/codex/tasks/task_b_69b432502670832e91d14e937745de46)
The script source claimed Dev Containers are early access and told
users to set CODER_AGENT_DEVCONTAINERS_ENABLE=true, which already
defaults to true. Clear the script source and set RunOnStart to
false since there is nothing to run.
Adds the `head_branch` field (the source/feature branch name of a PR) to
the diff status pipeline. Previously only `base_branch` (target branch)
and the head commit SHA were captured from the GitHub API, but not the
head branch name itself.
## Changes
- **Migration 438**: Add `head_branch` nullable TEXT column to
`chat_diff_statuses`
- **gitprovider**: Parse `head.ref` from the GitHub API response
(alongside `head.sha`) and add `HeadBranch` to `PRStatus`
- **gitsync**: Wire `HeadBranch` through `refreshOne()` into the DB
upsert params
- **worker**: Map `HeadBranch` in `chatDiffStatusFromRow()`
- **coderd**: Convert `HeadBranch` in `convertChatDiffStatus()`
- **codersdk**: Expose as `head_branch` (`*string`, omitempty) in
`ChatDiffStatus` API response
- **Tests**: Updated `github_test.go` pull JSON fixtures and assertions
### Motivation
- The desktop watch handler opened a VNC stream using the chat's
workspace ID while only relying on workspace read permissions, allowing
read-only users to escalate to interactive desktop access.
- Enforce connect-level authorization so only actors with
`ActionApplicationConnect` or `ActionSSH` can open the desktop stream.
### Description
- Added an explicit workspace lookup in `watchChatDesktop` using
`GetWorkspaceByID` to obtain a workspace object for authorization.
- Require the requester to be authorized for either
`policy.ActionApplicationConnect` or `policy.ActionSSH` on the workspace
before proceeding to locate agents or connect to the VNC stream, and
return `403 Forbidden` when neither permission is present.
- The change is minimal and localized to `coderd/chats.go` and does not
alter other code paths or behavior when the requester has the necessary
connect permissions.
### Testing
- Ran `gofmt -w coderd/chats.go` to format the modified file, which
succeeded.
- Attempted to run the unit test `TestWatchChatDesktop/NoWorkspace` via
`go test` in this environment but the test run did not complete within
the environment constraints and did not produce a full pass result.
- Attempted to run the repository pre-commit/gen steps but they could
not complete due to missing developer tooling and services in this
environment (e.g. `sqlc`, `mockgen`, `protoc` plugins and test services
like Docker/Postgres), so full pre-commit validation did not finish
here.
- Code review and static validation confirm the added authorization
check properly prevents read-only access from opening the desktop VNC
stream.
------
[Codex Task](https://chatgpt.com/codex/tasks/task_b_69b46a4ac5c4832ea9d330aeba43c32d)
Surfaces cache token data in the analytics views and fixes table
spacing.
### Changes
- **Cache token columns**: Added cache read and cache write token counts
to all analytics views (user and admin), from SQL queries through Go SDK
types to the frontend tables and summary cards.
- **Table spacing fix**: Replaced the bare React fragment in
`ChatCostSummaryView` with a `space-y-6` container so the model and chat
breakdown tables no longer overlap.
### Data flow
`chat_messages` table already stores `cache_read_tokens` and
`cache_creation_tokens` (and uses them for cost calculation). This PR
aggregates and displays them alongside input/output tokens in:
- Summary cards (6 cards: Total Cost, Input, Output, Cache Read, Cache
Write, Messages)
- Per-model breakdown table
- Per-chat breakdown table
- Admin per-user table
This PR adds a `WatchAllWorkspaces` function and a `watch-all-workspaces`
endpoint, which can be used to listen on a single global pubsub channel
for _all_ workspace build updates, and makes use of it in the autostart
scaletest.
This negates the need to use a workspace watch pubsub channel _per_
workspace, which has auth overhead associated with each call. This is
especially relevant in situations such as the autostart scaletest, where
we need to start/stop a set of workspaces before we can configure their
autostart config. The overhead associated with all the watch requests
skews the scaletest results and makes it harder to reason about the
performance of the autostart feature itself.
The autostart scaletest also no longer generates its own metrics nor
does it wait for all the workspaces to actually start via autostart. We
should update the scaletest dashboard after both PRs are merged to
measure autostart performance via the new metrics.
The new function/endpoint and its usage in the autostart scaletest are
gated behind an experiment feature flag. We should discuss whether to
enable the endpoint in prod by default; if so, we can remove the
experiment.
---------
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Callum Styan <callum@coder.com>
- Use t.Errorf in chattest non-streaming helpers so encoding
failures fail the test
- Thread testing.TB into writeResponsesAPIStreaming and log
SSE write errors instead of silently dropping them
- Bump createworkspace DB error log from Warn to Error
- Use errors.Join for timeout + output error in execute.go
Implement the backend for the desktop feature for agents.
- Adds a new `/api/experimental/chats/$id/desktop` endpoint to coderd
which exposes a VNC stream from a
[portabledesktop](https://github.com/coder/portabledesktop) process
running inside the workspace
- Adds a new `spawn_computer_use_agent` tool to chatd, which spawns a
subagent that has access to the `computer` tool which lets it interact
with the `portabledesktop` process running inside the workspace
- Adds the plumbing to make the above possible
There's a follow-up frontend PR here:
https://github.com/coder/coder/pull/23006
Handle previously ignored error return values in coderd:
- coderd/chats.go: check sendEvent errors, log on failure
- coderd/chatd/chattest: thread testing.TB through server structs,
replace log.Printf with t.Logf, check writeSSEEvent errors
- coderd/chatd/chattool/createworkspace.go: log UpdateChatWorkspace
failure instead of discarding both return values
- coderd/chatd/chattool/execute.go: surface ProcessOutput error in
the timeout message returned to the caller
- coderd/provisionerdserver: log stream.Send failure in the
DownloadFile error helper
- Adds `extractJSON()` to strip markdown code fences before JSON parsing and wires it into the `json.Unmarshal` call in `generateFromAnthropic`.
- Accepts variadic `RequestOption` in `generateFromAnthropic` so tests can inject a mock Anthropic server via `WithBaseURL`.
- Adds table-driven cases covering bare JSON, fenced with/without language tag, surrounding whitespace, and multiline JSON.
- Adds end-to-end cases using `httptest.NewServer` to serve fake Anthropic SSE streams with bare and fenced responses.
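A fence-stripping helper along these lines might look like the sketch below. It is a hypothetical re-implementation for illustration, not the function added in this change, and it assumes the opening fence (with its optional language tag) sits on its own line.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// fence is the Markdown code-fence marker (three backticks).
var fence = strings.Repeat("`", 3)

// extractJSON trims whitespace and, if the text is wrapped in a Markdown
// code fence (with or without a language tag), returns only the fenced body.
func extractJSON(s string) string {
	s = strings.TrimSpace(s)
	if !strings.HasPrefix(s, fence) {
		return s
	}
	s = strings.TrimPrefix(s, fence)
	// Drop the rest of the opening fence line, e.g. the "json" tag.
	if i := strings.Index(s, "\n"); i >= 0 {
		s = s[i+1:]
	}
	s = strings.TrimSuffix(strings.TrimSpace(s), fence)
	return strings.TrimSpace(s)
}

func main() {
	reply := fence + "json\n{\"name\": \"demo\"}\n" + fence
	var out struct {
		Name string `json:"name"`
	}
	if err := json.Unmarshal([]byte(extractJSON(reply)), &out); err != nil {
		panic(err)
	}
	fmt.Println(out.Name) // demo
}
```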
Add cost tracking for LLM chat interactions with microdollar precision.
## Changes
- Add `chatcost` package for per-message cost calculation using
`shopspring/decimal` for intermediate arithmetic
- **Ceil rounding policy**: fractional micros round UP to next whole
micro (applied once after summing all components)
- Database migration: `total_cost_micros` BIGINT column with historical
backfill and `created_at` index
- API endpoints: per-user cost summary and admin rollup under
`/api/experimental/chats/cost/`
- SDK types: `ChatCostSummary`, `ChatCostModelBreakdown`,
`ChatCostUserRollup`
- Fix `modeloptionsgen` to handle `decimal.Decimal` as opaque numeric
type
- Update frontend pricing test fixtures for string decimal types
## Design decisions
- `NULL` = unpriced (no matching model config), `0` = free
- Reasoning tokens included in output tokens (no double-counting)
- Integer microdollars (BIGINT) for storage and API responses
- Price config uses `decimal.Decimal` for exact parsing; totals use
`int64`
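The "round up once after summing" policy can be illustrated with exact rational arithmetic. This sketch uses the standard library's `math/big` in place of `shopspring/decimal` to stay self-contained. `component` and `costMicros` are hypothetical names, and it assumes prices are quoted in USD per million tokens, so that `tokens * price` is already a microdollar amount.

```go
package main

import (
	"fmt"
	"math/big"
)

// component is a hypothetical (token count, USD price per million tokens) pair.
type component struct {
	tokens          int64
	pricePerMillion string
}

// costMicros sums exact rational costs and rounds up once at the end.
// It assumes non-negative token counts and prices.
func costMicros(parts []component) (int64, error) {
	total := new(big.Rat)
	for _, p := range parts {
		price, ok := new(big.Rat).SetString(p.pricePerMillion)
		if !ok {
			return 0, fmt.Errorf("invalid price %q", p.pricePerMillion)
		}
		total.Add(total, new(big.Rat).Mul(price, big.NewRat(p.tokens, 1)))
	}
	// Ceiling for a non-negative rational: truncate, then add one if a
	// remainder is left over.
	q, r := new(big.Int).QuoRem(total.Num(), total.Denom(), new(big.Int))
	if r.Sign() > 0 {
		q.Add(q, big.NewInt(1))
	}
	return q.Int64(), nil
}

func main() {
	micros, err := costMicros([]component{
		{tokens: 1, pricePerMillion: "0.4"}, // 0.4 micros
		{tokens: 1, pricePerMillion: "0.4"}, // 0.4 micros
	})
	fmt.Println(micros, err) // 1 <nil>
}
```

Rounding each component separately would give 2 micros here; rounding the 0.8 micro sum once gives 1.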
Frontend: #23037
Migration 000434 converts chat_messages.role from text to a Postgres
enum, rebuilds the partial index, and adds content_version smallint.
The column is backfilled with DEFAULT 0, then the default is dropped
so future inserts must set it explicitly.
Version 0 uses the role-aware heuristic from #22958. Version 1 (all
new inserts) stores []ChatMessagePart JSON for all roles, including
system messages. ParseContent takes database.ChatMessage directly
and dispatches on version internally. Unknown versions error.
All string(codersdk.ChatMessageRole*) casts at DB write sites are
replaced with database.ChatMessageRole* constants from sqlc.
Refs #22958
File-reference parts in user messages were flattened to `TextContent` at
write time because fantasy has no file-reference content type. The
frontend never saw them as structured parts.
This moves all write paths (user, assistant, tool) from fantasy envelope
format to `codersdk.ChatMessagePart`. The streaming layer (`chatloop`)
is untouched; the conversion happens at the serialization boundary in
`persistStep`.
Old rows are still readable. `ParseContent` uses a structural heuristic
(`isFantasyEnvelopeFormat`) to distinguish legacy envelopes from SDK
parts. We chose this over try/fallback because fantasy envelopes
partially unmarshal into `ChatMessagePart` (the `type` field matches)
while silently losing content. A guard test enforces that no SDK part
can produce the envelope shape.
This is forward-only: new rows are unreadable by old code. Chat is
behind a feature flag so rollback risk is contained.
Also adds a typed `ChatMessageRole` to replace raw strings and
`fantasy.MessageRole*` casts at the persistence boundary. The type
covers `ChatMessage.Role`, `ChatStreamMessagePart.Role`, the
`PublishMessagePart` callback chain, and all DB write sites.
`fantasy.MessageRole*` remains only where we build `fantasy.Message`
structs for LLM dispatch.
Separately, `ProviderMetadata` was leaking to SSE clients via
`publishMessagePart`. `StripInternal` now runs on both the SSE and REST
paths, covering this.
Other cleanup:
- Old `db2sdk.contentBlockToPart` silently dropped metadata on
text/reasoning/tool-call content. New code preserves it.
- `providerMetadataToOptions` now logs warnings instead of silently
returning nil.
- `db2sdk` shrinks from ~250 lines of parallel conversion to ~15 lines
delegating to `chatprompt.ParseContent()`, removing the `fantasy` import
entirely.
Refs #22821
_Disclaimer: implemented by a Coder Agent using Claude Opus 4.6._
Marks the injected MCP approach in AI Bridge as deprecated across the
codebase.
## Changes
- **`codersdk/deployment.go`**: Deprecated `ExternalAuthConfig.MCPURL`,
`.MCPToolAllowRegex`, `.MCPToolDenyRegex` fields; deprecated and hid the
`--aibridge-inject-coder-mcp-tools` server flag; deprecated
`AIBridgeConfig.InjectCoderMCPTools`.
- **`coderd/externalauth/externalauth.go`**: Deprecated `Config.MCPURL`,
`.MCPToolAllowRegex`, `.MCPToolDenyRegex`.
- **`enterprise/aibridgedserver/aibridgedserver.go`**: Added runtime
deprecation warning when `CODER_AIBRIDGE_INJECT_CODER_MCP_TOOLS` is
enabled; deprecated `getCoderMCPServerConfig`.
- **`enterprise/aibridged/mcp.go`**: Deprecated `MCPProxyBuilder`
interface and `MCPProxyFactory` struct.
- **`docs/ai-coder/ai-bridge/mcp.md`**: Added deprecation warning
banner.
## Summary
Adds a new `GET /api/v2/debug/profile` endpoint that collects multiple
pprof profiles in a single request and returns them as a tar.gz archive.
This allows collecting profiles (including block and mutex) without
requiring `CODER_PPROF_ENABLE` to be set, and without restarting
`coderd`.
Closes #21679
## What it does
The endpoint:
- Temporarily enables block and mutex profiling (normally disabled at
runtime)
- Runs CPU profile and/or trace for a configurable duration (default
10s, max 60s)
- Collects snapshot profiles (heap, allocs, block, mutex, goroutine,
threadcreate)
- Returns a tar.gz archive containing all requested `.prof` files
- Uses an atomic bool to prevent concurrent collections (returns 409
Conflict)
- Is protected by the existing debug endpoint RBAC (owner-only)
**Supported profile types:** cpu, heap, allocs, block, mutex, goroutine,
threadcreate, trace
**Query parameters:**
- `duration`: How long to run timed profiles (default: `10s`, max:
`60s`)
- `profiles`: Comma-separated list of profile types (default:
`cpu,heap,allocs,block,mutex,goroutine`)
## Additional changes
- **SDK client method** (`codersdk.Client.DebugCollectProfile`) for easy
programmatic access
- **`coder support bundle --pprof` integration**: tries the consolidated
endpoint first, falls back to individual `/debug/pprof/*` endpoints for
older servers
- **8 new tests** covering defaults, custom profiles, trace+CPU,
validation errors, authorization, and conflict detection
## Summary
Moves the messages response out of `GET /chats/{id}` and into a
dedicated `GET /chats/{id}/messages` endpoint.
### Backend
- `GET /chats/{id}` now returns just the `Chat` object (no messages)
- `GET /chats/{id}/messages` is a new endpoint returning
`ChatMessagesResponse` with `messages` and `queued_messages`
- Added `ChatMessagesResponse` SDK type and `GetChatMessages` client
method
### Frontend
- `getChat()` API method returns `Chat` instead of `ChatWithMessages`
- Added `getChatMessages()` API method for the new endpoint
- Split `chatQuery` into two: `chatQuery` (metadata) and
`chatMessagesQuery` (messages)
- Updated all cache mutations, optimistic updates, and websocket
handlers
- Updated tests and stories
### Files changed
| File | Change |
|---|---|
| `coderd/coderd.go` | Register `GET /messages` route |
| `coderd/chats.go` | Simplify `getChat`, add `getChatMessages` handler |
| `codersdk/chats.go` | New type + method, update `GetChat` return |
| `site/src/api/api.ts` | New method, update `getChat` |
| `site/src/api/queries/chats.ts` | New query, update cache mutations |
| `site/src/pages/AgentsPage/AgentDetail.tsx` | Use separate queries |
| `site/src/pages/AgentsPage/AgentDetail/ChatContext.ts` | Update types and cache writes |
| `site/src/pages/AgentsPage/AgentsPage.tsx` | Update websocket cache handler |
The timeout was started before the unbounded Stepper loop, so
under CI load the deadline could expire before reaching the
operations that actually use it.
Also bumps TestMigration000387 from WaitLong to WaitSuperLong.
Fixes coder/internal#1398
## Summary
Extract a `healthyChecker()` test helper that returns an all-healthy
baseline `testChecker` in `coderd/healthcheck`. Each `TestHealthcheck`
table-driven test case now only overrides the single report field being
tested, instead of repeating all 6 healthy report structs.
- Reduces `healthcheck_test.go` from 603 to 341 lines (~260 lines, 43%
reduction)
- Test coverage unchanged at 77.2%
- All test cases and assertions preserved exactly
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- `coderd/httpapi/websocket.go`: add `net.ErrClosed` +
`websocket.CloseStatus` checks; extract `heartbeatCloseWith` with
`quartz.Clock` parameter for testability
- `coderd/httpapi/websocket_internal_test.go`: new test file
## Problem
When a chat is interrupted while tools are executing, the step content
(text, reasoning, tool calls, and partial tool results) was being lost.
Two gaps existed:
1. **During tool execution**: `executeTools` returns with error results
for interrupted tools, but the subsequent `PersistStep(ctx, ...)` fails
on the canceled context and returns `ErrInterrupted` without persisting
anything.
2. **PersistStep race**: If the context is canceled between the
post-tool interrupt check and the `PersistStep` call, the same loss
occurs.
This is inconsistent with how we handle stream interruptions (which
properly flush and persist partial content via `persistInterruptedStep`)
and how [coder/blink](https://github.com/coder/blink) handles
interruptions (always inserting the response message regardless of
execution phase).
## Fix
Two changes in `chatloop.go`:
- **Post-tool-execution interrupt check**: After `executeTools` returns,
check if the context was interrupted and route through
`persistInterruptedStep` (which uses `context.WithoutCancel` internally)
to save the accumulated content.
- **PersistStep fallback**: If `PersistStep` returns `ErrInterrupted`,
retry via `persistInterruptedStep` so partial content is not lost.
## Tests
- `TestRun_InterruptedDuringToolExecutionPersistsStep`: Verifies that
when a tool is blocked and the chat is interrupted, the step (text +
reasoning + tool call + tool error result) is persisted via the
interrupt-safe path.
- `TestRun_PersistStepInterruptedFallback`: Verifies that when
`PersistStep` itself returns `ErrInterrupted`, the step is retried via
the fallback path and content is saved.
## Problem
When a step contains both provider-executed tool calls (e.g. Anthropic
web search) and local tool calls in parallel, the next loop iteration
fails with the Anthropic API claiming the regular tool call has no
result. However, sending a new user message (which reloads messages from
the DB) works fine.
## Root cause
`toResponseMessages` was placing **all** tool results into the tool-role
message, regardless of `ProviderExecuted`. When Fantasy's Anthropic
provider later converted these messages for the API, it moved the
provider tool result from the tool message to the **end** of the
previous assistant message (`prevMsg.Content = append(...)`). This
placed `web_search_tool_result` **after** the regular `tool_use` block:
```
assistant: [server_tool_use(A), tool_use(B), web_search_tool_result(A)] ← wrong order
user: [tool_result(B)]
```
The persistence layer in `chatd.go` already handles this correctly —
provider-executed tool results stay in the assistant message, producing
the expected ordering:
```
assistant: [server_tool_use(A), web_search_tool_result(A), tool_use(B)] ← correct order
user: [tool_result(B)]
```
This is why reloading from the DB fixed it.
## Fix
In the `ContentTypeToolResult` case of `toResponseMessages`, route
provider-executed results to `assistantParts` instead of `toolParts`,
matching the persistence layer's behavior.
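The routing rule can be sketched with a trimmed-down type. `toolResult` and `splitToolResults` are hypothetical and omit everything except the `ProviderExecuted` flag that drives the decision.

```go
package main

import "fmt"

// toolResult is a hypothetical, trimmed-down stand-in for a tool-result
// content part.
type toolResult struct {
	ToolCallID       string
	ProviderExecuted bool
}

// splitToolResults keeps provider-executed results with the assistant message
// and sends locally executed results to the tool-role message.
func splitToolResults(results []toolResult) (assistantParts, toolParts []toolResult) {
	for _, r := range results {
		if r.ProviderExecuted {
			assistantParts = append(assistantParts, r)
			continue
		}
		toolParts = append(toolParts, r)
	}
	return assistantParts, toolParts
}

func main() {
	assistant, tool := splitToolResults([]toolResult{
		{ToolCallID: "A", ProviderExecuted: true}, // e.g. provider-side web search
		{ToolCallID: "B"},                         // locally executed tool
	})
	fmt.Println(len(assistant), len(tool)) // 1 1
}
```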
## Testing
Added
`TestToResponseMessages_ProviderExecutedToolResultInAssistantMessage`
which verifies that mixed provider+local tool results are split
correctly between the assistant and tool messages.
## Problem
The gitsync worker polls every 10s and refreshes up to 50 stale
`chat_diff_status` rows **sequentially**, sharing a single 10-second
context timeout. With 50 rows × 1–3 HTTP calls each, the timeout is
exhausted quickly, causing cascading `context deadline exceeded` errors.
Rows with no linked OAuth token (`ErrNoTokenAvailable`) fail fast but
recur every 120s, wasting batch capacity.
## Solution
Three targeted fixes:
### 1. Concurrent refresh processing
`Refresher.Refresh()` now launches goroutines bounded by a semaphore
(`defaultConcurrency = 10`). Provider/token resolution remains
sequential (fast DB lookups); only the HTTP calls run in parallel.
Per-group rate-limit detection uses `atomic.Pointer[RateLimitError]`
with best-effort skip of remaining rows — a rate-limit hit on one
provider doesn't stall requests to other providers.
### 2. Decoupled tick timeout
New `defaultTickTimeout = 30s`, separate from `defaultInterval = 10s`.
The `tick()` method uses `tickTimeout` for its context deadline, giving
concurrent HTTP calls enough headroom to complete without stalling the
next polling cycle.
### 3. Longer backoff for no-token errors
New `NoTokenBackoff = 10 * time.Minute` (exported). When `errors.Is(err,
ErrNoTokenAvailable)`, the worker applies a 10-minute backoff instead of
`DiffStatusTTL` (2 minutes). Retrying every 2 minutes is pointless until
the user manually links their external auth account.
## Design decisions
- Both `NewRefresher` and `NewWorker` accept variadic option functions
(`RefresherOption`, `WorkerOption`) for backward compatibility —
existing callers in `coderd/coderd.go` need no changes.
- `WithConcurrency(n)` and `WithTickTimeout(d)` are available for tests
and future tuning.
- Added `resolvedGroup` struct to cleanly separate the pre-resolution
phase from the concurrent execution phase.
## Testing
- **`TestRefresher_RateLimitSkipsRemainingInGroup`** — rewritten to be
goroutine-order-independent (verifies aggregate counts instead of
per-index results).
- **`TestRefresher_ConcurrentProcessing`** — new test using a gate
channel to prove N goroutines enter `FetchPullRequestStatus`
simultaneously.
- **`TestWorker_RefresherError_BacksOffRow`** — rewritten to use
branch-name-based failure determination instead of non-deterministic
`callCount`.
- **`TestWorker_NoTokenBackoff`** — new test verifying
`ErrNoTokenAvailable` triggers 10-minute backoff.
- All tests pass under `-race -count=3`.
## Problem
Both `start_workspace` and `create_workspace` chattool tools failed to
handle soft-deleted workspaces correctly.
Coder uses soft-delete for workspaces (`deleted = true` on the row).
Both tools called `GetWorkspaceByID`, which queries
`workspaces_expanded` with **no** `deleted = false` filter — so it
returns the workspace row even when soft-deleted. The only deletion
check was for `sql.ErrNoRows`, which never fires because the row still
exists.
### `start_workspace` behavior (before fix)
1. Loads the soft-deleted workspace successfully
2. Finds the latest build (a delete transition)
3. Falls through to attempt to **start** the deleted workspace
4. Produces a confusing downstream error
### `create_workspace` behavior (before fix)
1. `checkExistingWorkspace` loads the soft-deleted workspace
2. If a delete build is **in-progress**: waits for it, then falsely
reports `already_exists` — blocks new workspace creation
3. If the delete build **succeeded**: accidentally allows creation
(because no agents are found), but via fragile logic rather than an
explicit check
## Fix
Add `ws.Deleted` checks immediately after `GetWorkspaceByID` succeeds in
both tools:
- **`startworkspace.go`**: Returns `"workspace was deleted; use
create_workspace to make a new one"`
- **`createworkspace.go`** (`checkExistingWorkspace`): Returns `(nil,
false, nil)` to allow new workspace creation
## Tests
- `TestStartWorkspace/DeletedWorkspace` — verifies `start_workspace`
returns deleted error and never calls `StartFn`
- `TestCheckExistingWorkspace_DeletedWorkspace` — verifies
`checkExistingWorkspace` allows creation for soft-deleted workspaces
WaitBuffer is a thread-safe io.Writer that supports blocking until
accumulated output matches a substring or custom predicate. It
replaces ad-hoc safeBuffer/syncWriter types and time.Sleep-based
poll loops in tests with signal-driven waits.
- WaitFor/WaitForNth/WaitForCond for blocking on output
- Replace custom buffer types in cli/sync_test.go and
provisionersdk/agent_test.go
- Convert time.Sleep poll loops to require.Eventually/require.Never
in cli/ssh_test.go, coderd/activitybump_test.go,
coderd/workspaceagentsrpc_test.go, workspaceproxy_test.go, and
scaletest tests
Fixes Anthropic 400 error on multi-turn conversations with web search:
> web_search tool use with id srvtoolu_... was found without a
corresponding web_search_tool_result block
Provider-executed tool results (e.g. `web_search`) had a nil `Result`
field, which serialized as `"result":null`. Fantasy's
`UnmarshalToolResultOutputContent` couldn't deserialize `null` back, so
the entire assistant message became unreadable after persistence. On the
next LLM call, Anthropic rejected the conversation because
`server_tool_use` had no matching `web_search_tool_result`.
**Fix:** Bump the fantasy fork to e4bbc7bb3054 which returns `nil, nil`
for null `Result` JSON instead of erroring.
**Testing:** Added `integration_test.go` with
`TestAnthropicWebSearchRoundTrip` (requires `ANTHROPIC_API_KEY`) that:
- Sends a query triggering web search
- Verifies the persisted assistant message contains all parts the UI
needs: `tool-call(PE)`, `source`, `tool-result(PE)`, and `text`
- Sends a follow-up to confirm the round-trip works with Anthropic
## Problem
Anthropic's API returns a 400 error when `web_search` tool results are
missing:
```
web_search tool use with id srvtoolu_... was found without a corresponding web_search_tool_result block
```
**Root cause:** `persistStep` in `chatd.go` splits ALL
`ToolResultContent` blocks into separate tool-role DB rows.
Provider-executed (PE) tool results like `web_search` must stay in the
assistant message — Anthropic expects `server_tool_use` and
`web_search_tool_result` in the same turn.
The previous fix (#22976) added repair passes to drop PE results during
reconstruction, which fixed cross-step orphans but broke the normal case
(PE result correctly in the same step).
## Fix
Three changes that address the root cause:
1. **`persistStep` (chatd.go):** Check `ProviderExecuted` before
splitting `ToolResultContent` into tool rows. PE results stay in
`assistantBlocks` and are stored in the assistant content column.
2. **`ToMessageParts` (chatprompt.go):** Propagate the
`ProviderExecuted` field to `ToolResultPart` so the fantasy Anthropic
provider can identify PE results and reconstruct the
`web_search_tool_result` block.
3. **Keep existing repair passes** for backward compatibility with
legacy DB data where PE results were incorrectly persisted as separate
tool messages.
## Tests
- `TestProviderExecutedResultInAssistantContent` — PE result stored
inline in assistant content round-trips correctly with
`ProviderExecuted` preserved.
- `TestProviderExecutedResult_LegacyToolRow` — legacy PE results in
tool-role rows are still dropped correctly.
- All existing tests pass (including the 3 PE tests from #22976).
## Summary
- avoid duplicating preset headers when cachecompress serves compressed
`/bin/*` responses
- add a cachecompress regression test for preset
`X-Original-Content-Length` and `ETag` headers
- strengthen site binary tests to assert those headers stay
single-valued
## Problem
`site/bin.go` sets `X-Original-Content-Length` and `ETag` on the real
response writer before delegating.
`cachecompress` then snapshotted those headers and replayed them with
`Header().Add(...)`, which duplicated them on compressed responses.
For `coder-desktop-macos`, duplicate `X-Original-Content-Length` values
can collapse into a comma-separated string and fail `Int64` parsing,
causing the file size to show as `Unknown`.
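The duplication is easy to reproduce with plain `net/http` headers. The sketch below is not the `cachecompress` code: `replay` and `sizeAfterReplay` are hypothetical helpers, the value `1024` is made up, and joining repeated values with `", "` is an assumption about how a client might collapse them.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
)

const sizeHeader = "X-Original-Content-Length"

// replay copies snapshot onto dst. With add=true it mimics a replay that
// always uses Add and so duplicates preset headers. With add=false the first
// value replaces whatever dst already holds.
func replay(dst, snapshot http.Header, add bool) {
	for key, values := range snapshot {
		for i, v := range values {
			if add || i > 0 {
				dst.Add(key, v)
			} else {
				dst.Set(key, v)
			}
		}
	}
}

// sizeAfterReplay presets the header, replays a snapshot of it, and parses
// the result the way a client that joins repeated values might.
func sizeAfterReplay(add bool) (int, int64, error) {
	h := http.Header{}
	h.Set(sizeHeader, "1024") // preset by the handler before delegating
	replay(h, h.Clone(), add)
	got := h.Values(sizeHeader)
	size, err := strconv.ParseInt(strings.Join(got, ", "), 10, 64)
	return len(got), size, err
}

func main() {
	n, _, err := sizeAfterReplay(true)
	fmt.Println("Add replay:", n, "values, parse error:", err != nil)
	n, size, err := sizeAfterReplay(false)
	fmt.Println("Set replay:", n, "value, size:", size, "parse error:", err != nil)
}
```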
## Testing
- `/usr/local/go/bin/go test ./coderd/cachecompress -run
'TestCompressorPresetHeaders|TestCompressorHeadings' -count=1`
- `/usr/local/go/bin/go test ./site -run TestServingBin -count=1`
- `PATH=/usr/local/go/bin:$PATH make lint/go`
## Notes
- Skipped full `make pre-commit` with explicit approval because local
environment/tooling blocked it (Node version/path interaction in
generated site targets, plus missing local tools before setup).
## Problem
The summarization prompt explicitly tells the model to **"Omit
pleasantries and next-step suggestions"** and the summary prefix frames
the compacted context as passive history: `Summary of earlier chat
context:`. After compaction mid-task, the model reads a factual recap
with no forward momentum, loses its direction, and either stops or asks
the user what to do.
## Research
I compared our compaction prompt against several other agents:
| Agent | Key Pattern |
|---|---|
| **Codex** | Prompt says *"Include what remains to be done (clear next
steps)"*. Prefix: *"Another language model started to solve this
problem..."* |
| **Mux** | Includes *"Current state of the work (what's done, what's in
progress)"* + appends the user's follow-up intent |
| **Continue** | *"Make sure it is clear what the current stream of work
was at the very end prior to compaction so that you can continue exactly
where you left off"* |
| **Copilot Chat** | Dedicated sections for *Active Work State*, *Recent
Operations*, *Pre-Summary State*, and a *Continuation Plan* with
explicit next actions |
**Every agent in this comparison explicitly preserves forward intent and
in-progress state.** Coder was the only one telling the model to omit
next steps.
## Changes
**Summary prompt:**
- Removes `Omit next-step suggestions`
- Adds structured `Include:` list with explicit items for in-progress
work, remaining work, and the specific action being performed when
compaction fired
- Frames the operation as `context compaction` (matching Codex's
framing)
**Summary prefix:**
- Old: `Summary of earlier chat context:`
- New: `The following is a summary of the earlier conversation. The
assistant was actively working when the context was compacted. Continue
the work described below:`
The prefix is the first thing the model reads post-compaction — framing
it as an active handoff with an explicit "Continue" directive primes the
model to resume work rather than wait.
## Summary
- add chat model pricing metadata to the agents admin form and SDK
metadata
- split pricing into its own section and show default pricing as
placeholders
- apply default pricing when admins leave pricing fields blank
## Problem
1. **Personal behavior prompt not applied**: The chatd background worker
was missing `ActionReadPersonal` on `ResourceUser` in its RBAC subject.
When `resolveUserPrompt` calls `GetUserChatCustomPrompt`, the dbauthz
layer checks `ActionReadPersonal` on the user — which the chatd role
didn't have. The error was silently swallowed (returns `""`), so the
user's custom prompt was never injected into the system messages.
2. **Sequential DB calls on chat startup**: Several independent database
queries in `runChat` and `resolveChatModel` were running sequentially,
adding unnecessary latency before the LLM stream begins.
## Changes
### RBAC fix (`dbauthz.go`)
- Add `rbac.ResourceUser.Type: {policy.ActionReadPersonal}` to
`subjectChatd` site permissions
- This is the minimal permission needed — `ActionRead` on User remains
denied
### Parallelization (`chatd.go`)
Three parallelization points using `errgroup.Group`:
1. **`resolveChatModel`**: `resolveModelConfig` and
`GetEnabledChatProviders` run concurrently (both needed for
`ModelFromConfig`, which stays sequential after the wait)
2. **`runChat` startup**: `resolveChatModel` and
`GetChatMessagesForPromptByChatID` run concurrently (completely
independent)
3. **`runChat` prompt assembly**: `resolveInstructions` and
`resolveUserPrompt` run concurrently (both produce strings;
`InsertSystem` calls maintain correct order after the wait)
Same pattern applied to the `ReloadMessages` callback.
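The third parallelization point can be sketched as below. To stay dependency-free, this uses the standard library's `sync.WaitGroup` and `errors.Join` rather than the `errgroup.Group` the PR uses, and `runBoth`, `assemblePrompt`, and `errLoad` are hypothetical names. It shows the shape only: run two independent lookups concurrently, wait, then append results in a fixed order.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errLoad is a placeholder error for the example.
var errLoad = errors.New("load failed")

// runBoth runs two independent loaders concurrently and waits for both.
// Each goroutine writes only its own variables, which are read after Wait.
func runBoth(a, b func() (string, error)) (string, string, error) {
	var (
		wg         sync.WaitGroup
		ra, rb     string
		erra, errb error
	)
	wg.Add(2)
	go func() { defer wg.Done(); ra, erra = a() }()
	go func() { defer wg.Done(); rb, errb = b() }()
	wg.Wait()
	if err := errors.Join(erra, errb); err != nil {
		return "", "", err
	}
	return ra, rb, nil
}

// assemblePrompt resolves both strings concurrently, then appends them in a
// fixed order so message ordering does not depend on which finished first.
func assemblePrompt(instructions, userPrompt func() (string, error)) ([]string, error) {
	ins, up, err := runBoth(instructions, userPrompt)
	if err != nil {
		return nil, err
	}
	var msgs []string
	for _, s := range []string{ins, up} {
		if s != "" {
			msgs = append(msgs, s)
		}
	}
	return msgs, nil
}

func main() {
	msgs, err := assemblePrompt(
		func() (string, error) { return "workspace instructions", nil },
		func() (string, error) { return "personal prompt", nil },
	)
	fmt.Println(msgs, err)
}
```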
### Test (`dbauthz_test.go`)
- Add assertion in `TestAsChatd/AllowedActions` that
`ActionReadPersonal` on `ResourceUser` is permitted
## What
Adds provider-native web search tools to the chat system. Anthropic,
OpenAI, and Google all offer server-side web search — this wires them up
as opt-in per-model config options using the existing
`ChatModelProviderOptions` JSONB column (no migration).
Web search is **off by default**.
## Config
Set `web_search_enabled: true` in the model config provider options:
```json
{
  "provider_options": {
    "anthropic": {
      "web_search_enabled": true,
      "allowed_domains": ["docs.coder.com", "github.com"]
    }
  }
}
Available options per provider:
- **Anthropic**: `web_search_enabled`, `allowed_domains`,
`blocked_domains`
- **OpenAI**: `web_search_enabled`, `search_context_size`
(`low`/`medium`/`high`), `allowed_domains`
- **Google**: `web_search_enabled`
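As a rough illustration of the off-by-default behavior, the sketch below decodes the Anthropic options from a config document. The struct and function names are hypothetical and are not the `codersdk` types. Only the JSON keys come from the list above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// anthropicOptions is a hypothetical struct mirroring the JSON keys above.
type anthropicOptions struct {
	WebSearchEnabled bool     `json:"web_search_enabled"`
	AllowedDomains   []string `json:"allowed_domains"`
	BlockedDomains   []string `json:"blocked_domains"`
}

// modelConfig is a hypothetical wrapper for the provider_options object.
type modelConfig struct {
	ProviderOptions struct {
		Anthropic *anthropicOptions `json:"anthropic"`
	} `json:"provider_options"`
}

// anthropicWebSearch reports whether web search is enabled and for which
// domains. A missing key or section leaves web search off.
func anthropicWebSearch(raw string) (bool, []string, error) {
	var cfg modelConfig
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		return false, nil, err
	}
	opts := cfg.ProviderOptions.Anthropic
	if opts == nil {
		return false, nil, nil
	}
	return opts.WebSearchEnabled, opts.AllowedDomains, nil
}

func main() {
	raw := `{"provider_options":{"anthropic":{"web_search_enabled":true,"allowed_domains":["docs.coder.com","github.com"]}}}`
	enabled, domains, err := anthropicWebSearch(raw)
	fmt.Println(enabled, domains, err)
}
```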
## Backend
- `codersdk/chats.go` — new fields on the per-provider option structs
- `coderd/chatd/chatd.go` — `buildProviderTools()` reads config, creates
`ProviderDefinedTool` entries (uses `anthropic.WebSearchTool()` helper
from fantasy)
- `coderd/chatd/chatloop/chatloop.go` — `ProviderTools` on `RunOptions`,
merged into `Call.Tools`. Provider-executed tool calls skip local
execution. `StreamPartTypeToolResult` with `ProviderExecuted: true` is
accumulated inline (matching fantasy's own agent.go pattern) instead of
post-stream synthesis.
- `coderd/chatd/chatprompt/` — `MarshalToolResult` carries
`ProviderMetadata` through DB persistence so multi-turn round-trips work
(Anthropic needs `encrypted_content` back)
## Frontend
- Source citations render **inline** at the tool-call position (not
bottom-of-message), using `ToolCollapsible` so they look like other tool
cards — collapsed "Searched N results" with globe icon, expand to see
source pills
- Provider-executed tool calls/results are hidden from the normal tool
card UI
- Tool-role messages with only provider-executed results return `null`
(no empty bubble)
- Both persisted (messageParsing.ts) and streaming (streamState.ts)
paths group consecutive `source` parts into a single `{ type: "sources"
}` render block
## Fantasy changes
The fantasy fork (`kylecarbs/fantasy` branch `cj/go1.25`) has the
Anthropic tool code merged in. The intent is to upstream it via:
https://github.com/charmbracelet/fantasy/pull/163
## Summary
Scale-tested the `chatd` package with mock-based benchmarks to identify
performance bottlenecks. This PR fixes 6 of the 8 issues identified; the
changes below are ordered by severity.
## Changes
### 1. Parallel tool execution (HIGH) — `chatloop.go`
`executeTools` ran tool calls sequentially. Now dispatches all calls
concurrently via goroutines with `sync.WaitGroup`. Results are
pre-allocated by index (no mutex needed). `onResult` callbacks fire as
each tool completes.
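A minimal sketch of this pattern, assuming simplified `ToolCall` and `ToolResult` types and a hypothetical `executeToolsSketch` function (the real signature in `chatloop.go` is not shown here). Each goroutine writes only its own slot in the pre-allocated slice, which is why no mutex is needed for `results`.

```go
package main

import (
	"fmt"
	"sync"
)

// ToolCall and ToolResult are simplified, hypothetical types.
type ToolCall struct{ Name string }

type ToolResult struct{ Name, Output string }

// executeToolsSketch runs every call in its own goroutine. Results are
// written by index, so the returned order matches the input order.
// onResult may be invoked from several goroutines at once and must be
// safe for concurrent use.
func executeToolsSketch(calls []ToolCall, run func(ToolCall) string, onResult func(ToolResult)) []ToolResult {
	results := make([]ToolResult, len(calls))
	var wg sync.WaitGroup
	for i, call := range calls {
		wg.Add(1)
		go func(i int, call ToolCall) {
			defer wg.Done()
			res := ToolResult{Name: call.Name, Output: run(call)}
			results[i] = res // distinct index per goroutine
			if onResult != nil {
				onResult(res)
			}
		}(i, call)
	}
	wg.Wait()
	return results
}

func main() {
	calls := []ToolCall{{Name: "read_file"}, {Name: "list_dir"}}
	results := executeToolsSketch(calls, func(c ToolCall) string {
		return "ran " + c.Name
	}, nil)
	fmt.Println(results)
}
```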
### 2. Pubsub-backed subagent await (HIGH) — `subagent.go`
`awaitSubagentCompletion` polled the DB every 200ms. It now subscribes to
the child chat's `ChatStreamNotifyChannel` via pubsub for near-instant
notifications, with a fallback poll every 5s. The 200ms poll remains only
when `pubsub == nil` (single-instance / in-memory).
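The wait loop can be sketched with a notification channel plus a fallback ticker. This is an illustration under assumptions: `awaitDone`, `fallbackInterval`, and `demo` are hypothetical names, and a plain channel stands in for the pubsub subscription. A nil `notify` channel never fires, so in that case only the ticker drives re-checks.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// fallbackInterval picks the poll interval described above.
func fallbackInterval(hasPubsub bool) time.Duration {
	if hasPubsub {
		return 5 * time.Second
	}
	return 200 * time.Millisecond
}

// awaitDone blocks until isDone reports true. It re-checks whenever notify
// fires and on every fallback tick.
func awaitDone(ctx context.Context, notify <-chan struct{}, fallback time.Duration, isDone func() bool) error {
	ticker := time.NewTicker(fallback)
	defer ticker.Stop()
	for {
		if isDone() {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-notify: // near-instant wake-up
		case <-ticker.C: // safety net if a notification is missed
		}
	}
}

// demo simulates a child that finishes right after the first check and
// publishes a notification. It returns how many checks were needed.
func demo() (int, error) {
	notify := make(chan struct{}, 1)
	checks := 0
	isDone := func() bool {
		checks++
		if checks == 1 {
			notify <- struct{}{}
			return false
		}
		return true
	}
	err := awaitDone(context.Background(), notify, time.Hour, isDone)
	return checks, err
}

func main() {
	checks, err := demo()
	fmt.Println("checks:", checks, "err:", err)
}
```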
### 3. Per-chat stream locking (MEDIUM) — `chatd.go`
Replaced single global `streamMu` + `map[uuid.UUID]*chatStreamState`
with `sync.Map` where each `chatStreamState` has its own `sync.Mutex`.
Zero cross-chat contention.
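A rough sketch of the per-chat locking shape follows. It assumes string chat IDs instead of `uuid.UUID` to avoid an external dependency, and the `streams` type, its methods, and the `buffer` field are hypothetical. Only the `sync.Map` of states with one mutex each reflects the description above.

```go
package main

import (
	"fmt"
	"sync"
)

// chatStreamState holds one chat's stream data behind its own mutex.
type chatStreamState struct {
	mu     sync.Mutex
	buffer []string
}

// streams maps chat ID to *chatStreamState without a global lock.
type streams struct {
	byChat sync.Map
}

// state returns the state for chatID, creating it on first use.
func (s *streams) state(chatID string) *chatStreamState {
	v, _ := s.byChat.LoadOrStore(chatID, &chatStreamState{})
	return v.(*chatStreamState)
}

// publish appends a part under that chat's lock only, so writers to
// different chats never wait on each other. It returns the buffer length.
func (s *streams) publish(chatID, part string) int {
	st := s.state(chatID)
	st.mu.Lock()
	defer st.mu.Unlock()
	st.buffer = append(st.buffer, part)
	return len(st.buffer)
}

func main() {
	s := &streams{}
	s.publish("chat-a", "hello")
	fmt.Println(s.publish("chat-a", "world"), s.publish("chat-b", "hi"))
}
```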
### 4. Batch chat acquisition (MEDIUM) — `chatd.go`
`processOnce` acquired 1 chat per tick. Now loops up to
`maxChatsPerAcquire = 10` per tick, avoiding idle time when many chats
are pending.
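The batching loop amounts to something like the sketch below. `acquireBatch` and `queue` are hypothetical helpers, and the `acquire` callback stands in for whatever query claims the next pending chat.

```go
package main

import "fmt"

const maxChatsPerAcquire = 10

// acquireBatch keeps claiming chats until none are pending or the per-tick
// cap is reached.
func acquireBatch(acquire func() (string, bool)) []string {
	var acquired []string
	for len(acquired) < maxChatsPerAcquire {
		id, ok := acquire()
		if !ok {
			break // nothing pending
		}
		acquired = append(acquired, id)
	}
	return acquired
}

// queue returns an acquire function backed by n fake pending chats.
func queue(n int) func() (string, bool) {
	next := 0
	return func() (string, bool) {
		if next >= n {
			return "", false
		}
		next++
		return fmt.Sprintf("chat-%d", next), true
	}
}

func main() {
	fmt.Println(len(acquireBatch(queue(25))), len(acquireBatch(queue(3))))
}
```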
### 5. Reduced heartbeat frequency (LOW-MEDIUM) — `chatd.go`
`chatHeartbeatInterval` changed from 30s to 60s. Safe given the 5-minute
`DefaultInFlightChatStaleAfter`.
### 6. O(depth) descendant check (LOW) — `subagent.go`
Replaced top-down BFS (`O(total_descendants)` queries) with bottom-up
parent-chain walk (`O(depth)` queries). Includes cycle protection.
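The bottom-up walk can be sketched as follows. `isDescendant` and `lookup` are hypothetical names, chat IDs are plain strings here, and `parentOf` stands in for a per-chat parent lookup query, so the number of lookups is bounded by the depth of the target.

```go
package main

import "fmt"

// isDescendant reports whether target sits below ancestor by following
// parent links upward from target.
func isDescendant(parentOf func(id string) (string, bool), ancestor, target string) bool {
	visited := map[string]bool{target: true}
	cur := target
	for {
		parent, ok := parentOf(cur)
		if !ok {
			return false // reached a root without meeting ancestor
		}
		if parent == ancestor {
			return true
		}
		if visited[parent] {
			return false // cycle protection
		}
		visited[parent] = true
		cur = parent
	}
}

// lookup adapts a child-to-parent map to the parentOf signature.
func lookup(parents map[string]string) func(string) (string, bool) {
	return func(id string) (string, bool) {
		p, ok := parents[id]
		return p, ok
	}
}

func main() {
	parents := map[string]string{"grandchild": "child", "child": "root"}
	fmt.Println(isDescendant(lookup(parents), "root", "grandchild"))
	fmt.Println(isDescendant(lookup(parents), "grandchild", "root"))
}
```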
## Not addressed (intentionally)
- Message serialization overhead
- Buffer eviction (`buffer[1:]` pattern)
## Problem
Two separate code paths refreshed chat diff statuses:
1. **HTTP handler** (`refreshChatDiffStatus`): resolved
provider/token/status inline, ran under the user's context. Worked fine
because the user owns their external auth links.
2. **Background worker** (`Refresher.Refresh`): ran under `AsChatd`
context, which lacked `ActionReadPersonal` on `ResourceUser`.
`GetExternalAuthLink` failed silently (`if err != nil { continue }`),
returning `ErrNoTokenAvailable` every time. Chat diff statuses got
`git_branch`/`git_remote_origin` from `MarkStale` but `refreshed_at`,
`url`, `pull_request_state` stayed nil.
Having two paths also meant bug fixes had to be applied twice.
## Fix
- **`Worker.RefreshChat`**: New method for synchronous, on-demand
refresh of a single chat. Uses the same `Refresher.Refresh` pipeline as
the background `tick()`. Called by the HTTP handler for instant
response.
- **`resolveChatGitAccessToken`**: Uses
`dbauthz.AsSystemRestricted(ctx)` specifically for `GetExternalAuthLink`
and `RefreshToken` calls. This is scoped to just those DB operations
rather than broadening the chatd RBAC role.
- **Removed**: `refreshChatDiffStatus`, `shouldRefreshChatDiffStatus`,
`resolveChatDiffStatusWithOptions` (all replaced by the single
`RefreshChat` path).
## Tests
Added 4 tests for `Worker.RefreshChat`:
- `TestRefreshChat_Success`: full refresh + upsert + publish
- `TestRefreshChat_NoPR`: no PR exists yet, nil result
- `TestRefreshChat_RefreshError`: provider resolution fails
- `TestRefreshChat_UpsertError`: refresh succeeds but DB write fails
## Why tests didn't catch the original bug
- Worker tests used mock stores (no dbauthz) and fake token resolvers
(hardcoded lambdas)
- No integration test exercised `AsChatd` -> `resolveChatGitAccessToken`
-> `GetExternalAuthLink` through dbauthz
## Problem
`insertAgentApp` mutated its input by writing to `app.Healthcheck` when
it was nil (line 3525):
```go
if app.Healthcheck == nil {
	app.Healthcheck = &sdkproto.Healthcheck{} // mutation!
}
```
The Devcontainers subtests share the same `tt.resource` pointer across
two parallel goroutines (`WithProtoIDs` and `WithoutProtoIDs`), causing
a data race on the `Healthcheck` field (and its sub-fields `Url`,
`Interval`, `Threshold`).
## Fix
Replace the in-place mutation with a local variable:
```go
healthcheck := app.GetHealthcheck()
if healthcheck == nil {
	healthcheck = &sdkproto.Healthcheck{}
}
This avoids writing back to the shared proto message. All downstream
reads now use the local `healthcheck` variable.
Adds `pull_request_title` and `pull_request_draft` to the chat diff
status pipeline (DB → provider → SDK → frontend). The GitHub provider
now fetches the PR title alongside existing status fields.
The agents sidebar now displays PR-state-aware icons for chats that have
a linked pull request (when the chat is in waiting/completed state):
- **Open PR**: `GitPullRequestArrow` (green)
- **Draft PR**: `GitPullRequestDraft` (gray)
- **Merged PR**: `GitMerge` (purple)
- **Closed PR**: `GitPullRequestClosed` (red)
Running/pending/paused/error chats keep their existing activity icons
(spinner, pause, error triangle).
### Changes
**Database migration** (`000432`): Adds `pull_request_title TEXT` and
`pull_request_draft BOOLEAN` columns to `chat_diff_statuses`.
**Backend pipeline**:
- `gitprovider.PRStatus` gains a `Title` field
- GitHub provider decodes the `title` from the API response
- `gitsync` and `coderd/chats.go` pass title + draft through to the DB
upsert
- `codersdk.ChatDiffStatus` exposes both new fields in the API response
**Frontend** (`AgentsSidebar.tsx`): New `getPRIconConfig()` function
resolves the appropriate Lucide git icon based on `pull_request_state`
and `pull_request_draft`. Only applies when the chat is in a terminal
state (waiting/completed).
**Real-time sync**: No changes needed — the existing
`diff_status_change` pubsub event already propagates the full
`ChatDiffStatus` including the new fields.
Replace the standalone `?archived=` query parameter on the chats listing
endpoint with a `?q=` search parameter, consistent with how workspaces,
tasks, templates, and other list endpoints work.
The `q` parameter uses the standard `key:value` search syntax parsed by
the `searchquery` package. Currently supports:
- `archived:true/false` (default: `false`, hides archived chats)
When `q` is empty or omits the archived filter, archived chats are
excluded by default. This is a behavioral change — the previous API
returned all chats (including archived) when no filter was specified.
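The default described above can be illustrated with a small parser. This is a simplified sketch and not the `searchquery` package: `chatFilter` and `parseChatQuery` are hypothetical, and the error handling for unknown keys is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// chatFilter is a hypothetical result type for the parsed query.
type chatFilter struct {
	Archived bool
}

// parseChatQuery parses space-separated key:value terms. With no archived
// term, Archived stays false, so archived chats are hidden by default.
func parseChatQuery(q string) (chatFilter, error) {
	filter := chatFilter{Archived: false}
	for _, term := range strings.Fields(q) {
		key, value, ok := strings.Cut(term, ":")
		if !ok {
			return chatFilter{}, fmt.Errorf("term %q is not in key:value form", term)
		}
		switch strings.ToLower(key) {
		case "archived":
			archived, err := strconv.ParseBool(value)
			if err != nil {
				return chatFilter{}, fmt.Errorf("invalid archived value %q: %w", value, err)
			}
			filter.Archived = archived
		default:
			return chatFilter{}, fmt.Errorf("unsupported search key %q", key)
		}
	}
	return filter, nil
}

func main() {
	for _, q := range []string{"", "archived:true", "archived:maybe"} {
		f, err := parseChatQuery(q)
		fmt.Printf("%q -> %+v, err=%v\n", q, f, err)
	}
}
```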
### Changes
**Backend:**
- Add `searchquery.Chats()` parser following the same pattern as
`Tasks()`, `Workspaces()`, etc.
- Update `listChats` handler to read `q` instead of `archived`
- Update `codersdk.ListChatsOptions` to use `Q string` instead of
`Archived *bool`
**Frontend:**
- Update `getChats` API method to accept `q` parameter
- Update `infiniteChats` query to pass `q` instead of `archived`
**Tests:**
- Add `TestSearchChats` unit tests for the parser
- Update existing archive/unarchive integration tests to use `Q:
"archived:true"` syntax
Adds a `created_by` column (nullable UUID) to the `chat_messages` table
to track which user created each message. Only user-sent messages
populate this field; assistant, tool, system, and summary messages leave
it null.
The column is threaded through the full stack: SQL migration, query
updates, generated Go/TypeScript types, db2sdk conversion, chatd
(including subagent paths), and API handlers. All API handlers that
insert user messages now pass the authenticated user's ID as
`created_by`.
No foreign key constraint was added, matching the existing pattern used
by `chat_model_configs.created_by`.