85 Commits

Author SHA1 Message Date
Cian Johnston 0c27224fc2 fix(coderd): pass title API key context (#25723)
Fixes CODAGT-503

- Add failing-first coverage for manual title generation with missing
message `api_key_id`, with both context fallback and fail-closed cases.
- Set `aibridge.WithDelegatedAPIKeyID(ctx, apiKey.ID)` in
`regenerateChatTitle` and `proposeChatTitle`.
- In `generateManualTitleCandidate`, fall back to
`aibridge.DelegatedAPIKeyIDFromContext(ctx)` only when
`modelBuildOptionsFromMessages` yields an empty `ActiveAPIKeyID`.
- Keep `modelBuildOptionsFromMessages` pure and leave automatic title
generation unchanged.
2026-05-27 13:20:36 +01:00
Michael Suchacz de6d62815e fix(coderd): avoid redundant workspace setup (#25615)
GPT-class chat turns could eagerly create workspaces or repeat setup
such as cloning an existing repo because the system prompt framed setup
work as the default path.

This updates chatd prompt guidance and the `create_workspace` tool
description so agents reuse existing chat and workspace context, treat
injected workspace context as already read, avoid recloning present
repositories, and create or start workspaces only when workspace-backed
work is required. Delegated chats now report workspace needs to the
parent instead of trying to create one.

> Mux opened this PR on behalf of Mike.
2026-05-22 14:08:07 +00:00
Cian Johnston e5293c81f9 fix(coderd): fix flaky TestSendMessageWithModelOverrideUpdatesLastModelConfigID (#25603)
Fixes: ENG-2719

Fixes the flake in
`TestSendMessageWithModelOverrideUpdatesLastModelConfigID` (and the same
pattern in `TestSubsequentSendWithoutOverrideUsesPersistedModel`).


> Generated with [Coder Agents](https://coder.com/agents)
2026-05-22 12:40:45 +01:00
Michael Suchacz ca1f6b19a2 feat: remove legacy chat provider tables (#25416) 2026-05-22 09:50:01 +02:00
Michael Suchacz 06526a5822 feat: use AI provider chat APIs (#25415) 2026-05-22 07:53:23 +02:00
Cian Johnston b7525a9b40 feat: add search and filter support to chats endpoint (#25391)
Fixes https://linear.app/codercom/issue/CODAGT-432

Adds structured search/filter capabilities to the `GET
/api/experimental/chats/` endpoint via the `q` query parameter. All
filters use explicit `key:value` syntax; bare terms are rejected to
reserve them for potential future full-text search.
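
As an illustration of the `key:value` rule, here is a minimal sketch of a parser that rejects bare terms. `parseFilters` is a hypothetical helper, not the actual `coderd/searchquery` code, and it deliberately ignores quoted values that contain spaces.

```go
package main

import (
	"fmt"
	"strings"
)

// parseFilters splits q on whitespace and requires every term to have the
// form key:value. Keys are lowercased; values are kept as typed.
func parseFilters(q string) (map[string]string, error) {
	filters := make(map[string]string)
	for _, term := range strings.Fields(q) {
		key, value, ok := strings.Cut(term, ":")
		if !ok || key == "" || value == "" {
			return nil, fmt.Errorf("bare term %q is not supported; use key:value", term)
		}
		filters[strings.ToLower(key)] = value
	}
	return filters, nil
}

func main() {
	fmt.Println(parseFilters("archived:true"))
	fmt.Println(parseFilters("hello"))
}
```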

> Generated by Coder Agents

Co-authored-by: Danielle Maywood <danielle@themaywoods.com>
Co-authored-by: Jaayden Halko <jaayden.halko@gmail.com>
2026-05-21 10:18:55 +01:00
Cian Johnston ce7f41f56d fix: bump MaxChatFileIDs from 20 to 50 (#25492)
Fixes CODAGT-456
2026-05-19 16:53:30 +01:00
Danielle Maywood 170a6e1fe9 feat: add chat sharing foundation (#25041) 2026-05-18 22:32:05 +01:00
Kyle Carberry 9f99a7bc0b fix(coderd): stabilize TestPatchChatMessage/ChangesModel flaky test (#25306)
Fixes coder/internal#1535

## Problem

`TestPatchChatMessage/ChangesModel` is flaky because it races with the
chat daemon's background processing.

`CreateChat` sets the chat to `pending` and the daemon picks it up
asynchronously. The test immediately calls `EditChatMessage` (which
changes the model to an override) while the first processing round is
still running. The `InsertChatMessages` SQL CTE unconditionally updates
`chats.last_model_config_id` to the model of the last inserted message.
When the daemon's in-flight message insertions commit after the edit
transaction, they overwrite `last_model_config_id` back to the default
model.

Similarly, after the edit sets the chat back to `pending`, the daemon
re-processes it. The test's `GetChat` call could race with this second
round.

## Fix

Poll for the chat to reach `waiting` (or `error`) status:
1. **Before editing**: wait for the initial processing round to complete
2. **After editing**: wait for the second processing round (triggered by
the edit) to complete

Then assert `last_model_config_id`, which is now stable.

> Generated with [Coder Agents](https://coder.com/agents) by @kylecarbs
2026-05-15 09:33:54 -04:00
Ethan a59b951565 test: skip stale notification chatd flakes (#25376)
These chatd tests are flaking for the same stale control-notification
race tracked by CODAGT-353, so this change skips the newly reflaking
advisor-chain and `TestPatchChatMessage/ChangesModel` tests and rewrites
the older `TODO(hugodutka)` skips to point at the same root cause. This
keeps the known flakes documented consistently until the chatd
notification-flow refactor lands.

Closes CODAGT-427
Closes https://github.com/coder/internal/issues/1510
2026-05-15 17:36:48 +10:00
Michael Suchacz cb37047dce feat: dedicated /prompts endpoint for chat history cycle (#25083)
Follow-up to #25004. The merged change cycles only through messages
already loaded in the in-memory chat store (page size 50). Long chats
and chats whose oldest turns have rolled out of the page lose access to
their earlier prompts in the composer's up/down arrow cycle. This PR
adds a dedicated server endpoint that returns the full prompt history,
newest first, and rewires the composer to use it.

## What changed

### Endpoint

`GET /api/experimental/chats/{chat}/prompts?limit=N`

```go
type ChatPrompt struct { ID int64; Text string }
type ChatPromptsResponse struct { Prompts []ChatPrompt }
```

- `limit`: `0..2000`. `0` (the default) is treated as the server-side
default of 500; out-of-range values return `400`. Negative values are
rejected by the SDK's `PositiveInt32` parser before reaching the
handler.
- Auth: parent-chat read in `dbauthz`, mirroring
`GetChatMessagesByChatID`.
- The SQL filters `role='user'`, `deleted=false`, `visibility IN
('user','both')`, guards the lateral with `jsonb_typeof(content) =
'array'` so legacy V0 scalar-string rows are silently skipped, then
unrolls `content` JSONB with `WITH ORDINALITY` and concatenates only
`type='text'` parts in original order via `string_agg(... ORDER BY
ordinality)`. Messages whose joined text is whitespace-only are dropped
via `HAVING ... ~ '\S'` so cycling never lands on a blank entry.
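
The `limit` rules above can be sketched as a single function. This is an illustration only: `resolvePromptLimit` is a hypothetical name, and it folds into one place what the PR splits across the SDK parser (negatives), the handler (upper bound), and SQL (default).

```go
package main

import "fmt"

const (
	defaultPromptLimit = 500  // used when limit is 0
	maxPromptLimit     = 2000 // largest accepted limit
)

// resolvePromptLimit maps 0 to the default and rejects out-of-range values,
// which the endpoint would report as 400.
func resolvePromptLimit(limit int32) (int32, error) {
	if limit < 0 || limit > maxPromptLimit {
		return 0, fmt.Errorf("limit must be between 0 and %d, got %d", maxPromptLimit, limit)
	}
	if limit == 0 {
		return defaultPromptLimit, nil
	}
	return limit, nil
}

func main() {
	fmt.Println(resolvePromptLimit(0))
	fmt.Println(resolvePromptLimit(5000))
}
```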

### Partial index (migration `000494`)

```sql
CREATE INDEX idx_chat_messages_user_prompts
ON chat_messages (chat_id, id DESC)
WHERE deleted = false
  AND role = 'user'
  AND visibility IN ('user', 'both');
```

The partial WHERE matches the query's filter exactly and the key order
matches `ORDER BY id DESC`, so the planner gets both the filter and the
ordering from the index without a sort step.

`EXPLAIN ANALYZE` on a synthetic 51-chat × 5,000-message dataset (≈260k
rows, 10k user prompts in the target chat, `random_page_cost=1.1`):

| | Plan | Buffers hit | Time |
|---|---|---|---|
| Without index | `Index Scan Backward using chat_messages_pkey`, **250,848 rows removed by filter** | 6,683 | 32.4 ms |
| With index | `Index Scan using idx_chat_messages_user_prompts`, no filter | 38 | 1.3 ms |

≈25× faster, 175× fewer buffer hits.

### Frontend

- `chatPromptsKey` / `chatPromptsQuery` factories in
`site/src/api/queries/chats.ts` (`staleTime: 30s`, `enabled: chatId !==
""`, asks the server for 500 prompts).
- `ChatPageContent.tsx` replaces the in-memory derivation with
`useQuery(chatPromptsQuery(chatId ?? ""))`. The composer's existing
`cycleHistorySnapshotRef` anchors the in-flight cycle so a refetch
arriving mid-cycle cannot shift the indexed prompt out from under the
user.
- `getEditableUserMessagePayload` now concatenates user-message text
parts verbatim, mirroring the server's `string_agg(part->>'text', ''
ORDER BY ordinality)`, instead of routing through the streaming-oriented
`parseMessageContent` / `appendText` pipeline (which drops
whitespace-only chunks — correct for assistant streams, wrong for a
user's persisted message). This keeps the cycle and the edit path in
agreement on the same message. File blocks are still pulled separately
via
`parseMessageContent(...).blocks.filter(isEditableUserMessageFileBlock)`.
- Cache invalidation in `createChatMessage.onSuccess`,
`editChatMessage.onSettled`, and `useChatStore.upsertCacheMessages`
(only when an upserted message has `role === "user"`).
- Page-level stories pre-seed `chatPromptsKey(CHAT_ID)` from the same
`messagesData` to keep them offline.

## Tests

- New `TestGetChatUserPrompts` in `coderd/exp_chats_test.go` with five
subtests:
  - `NewestFirstFiltering` — multi-part concatenation, non-text parts
    skipped, whitespace-only filtered, soft-deleted excluded, `model`-only
    visibility excluded, assistant-role excluded by `cm.role = 'user'`,
    legacy V0 scalar row silently excluded by the `jsonb_typeof` guard,
    ordering newest first.
  - `LimitClampsResults` — explicit `limit=2` returns the two newest
    prompts.
  - `InvalidLimitRejected` — `limit=5000` is `400 Bad Request`.
  - `NotFoundForOtherUsers` — a separate user in the same org gets `404`,
    not the prompts.
  - `EmptyResultIsJSONArray` — zero-message chat and assistant-only chat
    both return `Prompts: []` (non-nil, empty).
- New unit test in `messageParsing.test.ts` asserting that
`getEditableUserMessagePayload(["hello", " ", "world"])` returns `"hello
world"`, locking in the agreement with the SQL `string_agg`.
- `dbauthz_test.go` adds the
`MethodTestSuite.TestChats/GetChatUserPromptsByChatID` entry, asserting
parent-chat `policy.ActionRead`.
- `pnpm test src/pages/AgentsPage` — 1159 passed, 2 skipped.
- `make gen` produces no diff.

## Manual verification

Seeded a dev chat with Claude Sonnet 4.6 via the aibridge Anthropic
provider and posted 20 user prompts end-to-end. Verified that the
`/prompts` endpoint returns 20 rows newest-first, that `limit=10` clamps
correctly, that `limit=0` uses the server default of 500, and that the
up/down keyboard cycle in the composer walks the same sequence (and
reverses correctly back to the empty draft).

## Out of scope

- Cross-chat history.
- Per-user opt-out for the cycle.
- File-reference / attachment cycling — the cycle continues to reproduce
plain text only, by design.

<details>
<summary>Implementation plan</summary>

# CODAGT-319 Follow-up — Dedicated `/prompts` endpoint

## Context

The merged feature ([#25004](https://github.com/coder/coder/pull/25004)
/ [d32842f](https://github.com/coder/coder/commit/d32842f)) cycles only
through messages already loaded in the in-memory chat store, which is
capped at the first 50 messages of the current page. Long chats and
chats whose oldest turns have rolled out of the page can no longer
recall their full prompt history. This follow-up exposes a dedicated
server endpoint that returns the user-authored prompts in a chat, newest
first, and rewires the composer to use it.

## Design

### Endpoint

`GET /api/experimental/chats/{chat}/prompts?limit=N`

Returns:

```go
type ChatPrompt struct {
    ID   int64
    Text string
}
type ChatPromptsResponse struct {
    Prompts []ChatPrompt
}
```

- `limit`: `0..2000`. `0` (the default) → server-side default of 500.
The wire-level default is encoded in SQL as `COALESCE(NULLIF($limit, 0),
500)`. Negatives are rejected upstream by `PositiveInt32`; the handler
only caps the upper bound.
- Auth: parent-chat read in `dbauthz`, mirroring
`GetChatMessagesByChatID`.
- Listed under the experimental router so we can iterate without API
guarantees.

### SQL

The query lives in `coderd/database/queries/chats.sql` as
`GetChatUserPromptsByChatID`:

- Filters `role='user'`, `deleted=false`, `visibility IN
('user','both')` to mirror the composer's "what the user actually typed
and can re-send" contract.
- Guards the lateral with `jsonb_typeof(content) = 'array'` so legacy V0
rows whose content is a scalar JSON string (predates migration `000434`)
are silently excluded instead of raising `"cannot extract elements from
a scalar"`.
- Unrolls `content` JSONB with `jsonb_array_elements WITH ORDINALITY`
and concatenates only `type='text'` parts, preserving original order via
`string_agg(... ORDER BY ordinality)`.
- Casts the result to `text` so sqlc emits a `string` field instead of
`[]byte`.
- Drops whitespace-only prompts via `HAVING string_agg(...) ~ '\S'` so
cycling never lands on a blank entry.
- Orders by `cm.id DESC` (`id` is a sequence, so this is "newest first"
without relying on `created_at`).
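
To make the text-extraction semantics concrete, here is a rough Go equivalent of what the query computes per message. The `part` type and `promptText` function are hypothetical, and `strings.TrimSpace` only approximates the `~ '\S'` check.

```go
package main

import (
	"fmt"
	"strings"
)

// part is a simplified message content part.
type part struct {
	Type string
	Text string
}

// promptText concatenates the "text" parts in their original order with no
// separator. It reports false when the joined text is whitespace-only, which
// corresponds to the row being dropped by the HAVING clause.
func promptText(parts []part) (string, bool) {
	var b strings.Builder
	for _, p := range parts {
		if p.Type == "text" {
			b.WriteString(p.Text)
		}
	}
	text := b.String()
	if strings.TrimSpace(text) == "" {
		return "", false
	}
	return text, true
}

func main() {
	parts := []part{{"text", "hello"}, {"file", ""}, {"text", " "}, {"text", "world"}}
	fmt.Println(promptText(parts))
}
```

Note that the whitespace-only middle part is kept, which is why the frontend edit path has to join parts verbatim to agree with the server.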

### Index

New partial index added in migration `000494`:

```sql
CREATE INDEX idx_chat_messages_user_prompts
ON chat_messages (chat_id, id DESC)
WHERE deleted = false
  AND role = 'user'
  AND visibility IN ('user', 'both');
```

The partial WHERE clause matches the query's filter exactly, so the
planner can use the index for both filtering and ordering without a sort
step.

### Frontend

- `chatPromptsKey(chatId)` and `chatPromptsQuery(chatId)` factories in
`site/src/api/queries/chats.ts`. `staleTime: 30s`, `enabled: chatId !==
""`. Asks the server for 500 prompts (well below the 2000 max, plenty
for the cycle).
- `ChatPageContent.tsx` replaces the in-memory derivation with
`useQuery(chatPromptsQuery(chatId ?? ""))`. The composer's
`cycleHistorySnapshotRef` already takes a stable snapshot at cycle
entry, so a refetch arriving mid-cycle cannot shift the indexed prompt
out from under the user.
- `getEditableUserMessagePayload` extracts the edit-path text from raw
user-message parts (filter `type === "text"`, join verbatim) instead of
going through `parseMessageContent` / `appendText`, which is built for
assistant streams and intentionally drops whitespace-only chunks.
Without this, cycling and clicking Edit on the same message could
produce different draft text for messages with whitespace-only
interleaved text parts.
- Cache invalidation: `createChatMessage.onSuccess`,
`editChatMessage.onSettled`, and `useChatStore.upsertCacheMessages`
(when at least one upserted message has `role === "user"`) all
invalidate `chatPromptsKey(chatId)`.

### Tests

- `TestGetChatUserPrompts` (`coderd/exp_chats_test.go`) covers:
  - `NewestFirstFiltering` — multi-part concatenation, non-text parts
    skipped, whitespace-only filtered, soft-deleted excluded, `model`-only
    visibility excluded, assistant-role excluded by `cm.role = 'user'`,
    legacy V0 scalar row silently excluded by the `jsonb_typeof` guard,
    ordering newest first.
  - `LimitClampsResults` — explicit `limit=2` returns the two newest
    prompts.
  - `InvalidLimitRejected` — `limit=5000` is `400 Bad Request`.
  - `NotFoundForOtherUsers` — a separate user in the same org gets `404`,
    not the prompts.
  - `EmptyResultIsJSONArray` — zero-message chat and assistant-only chat
    both return `Prompts: []` (non-nil, empty).
- `messageParsing.test.ts` adds a unit test asserting that
`getEditableUserMessagePayload(["hello", " ", "world"])` returns `"hello
world"`, locking in the agreement with the SQL `string_agg`.
- `dbauthz_test.go` adds the
`MethodTestSuite.TestChats/GetChatUserPromptsByChatID` entry, asserting
the parent-chat `policy.ActionRead`.

## Out of scope

- Cross-chat history.
- Per-user opt-out for the cycle.
- File-reference / attachment cycling — the cycle still reproduces plain
text only, by design.

</details>

<details>
<summary>coder-agents-review history</summary>

Four review rounds, eight unique findings, all addressed in this PR
(approved twice). Rebased onto `main` twice after R4: first to pick up
new migrations `000491` / `000492`, then again for
`000493_idx_chat_diff_statuses_url_lower`. The prompts-index migration
was renumbered `000491 → 000493 → 000494` via
`coderd/database/migrations/fix_migration_numbers.sh`; no other diff
changes.

| Round | Head | Outcome |
|---|---|---|
| R1 | `725422ab` | `COMMENTED` — 7 findings (DEREM-1..7) |
| R2 | `ab2a8936` | `COMMENTED` — 1 new (DEREM-10) + 1 reraised (DEREM-5) |
| R3 | `648c5d1f` | **`APPROVED`** — 7 fixed, DEREM-5 deferred via #25125 |
| R4 | `93b6f450` | **`APPROVED`** — DEREM-5 also fixed in-PR, #25125 closed |

| ID | Where | Resolution |
|---|---|---|
| DEREM-1 | `chats.sql` | Added `jsonb_typeof(content) = 'array'` guard against V0 scalar rows |
| DEREM-2 | `exp_chats.go` | Removed dead `limit < 0` branch (SDK rejects upstream) |
| DEREM-3 | `useChatStore.ts` | Rewrote misleading invalidation comment |
| DEREM-4 | `exp_chats_test.go` | `NewestFirstFiltering` now inserts an assistant-role message so the `role='user'` filter is exercised end-to-end |
| DEREM-5 | `messageParsing.ts` | Rewrote `getEditableUserMessagePayload` to concatenate text parts verbatim, mirroring the SQL `string_agg` |
| DEREM-6 | `exp_chats.go` | Tightened swagger doc + error message to spell out the 0–2000 range |
| DEREM-7 | `exp_chats_test.go` | Added `EmptyResultIsJSONArray` subtest |
| DEREM-10 | `exp_chats_test.go` | `NewestFirstFiltering` now inserts a raw V0 scalar-content row; verified locally that removing the guard makes the test fail |

</details>

---

This PR was created on behalf of @ibetitsmike by Coder Agents.
2026-05-14 12:43:12 +02:00
Kyle Carberry 5040ab6fca feat: filter chats by diff URL via the q search parameter (#24970)
Adds a `diff_url:` term to the `q` search parameter on `GET
/api/experimental/chats` so callers can look up the chat associated with
a particular pull request, merge request, or any other URL persisted on
the chat's diff status.

```
q=diff_url:"https://github.com/coder/coder/pull/123"
```

Match is case-insensitive. When the URL lives on a delegated sub-agent's
diff status, the parent chat is returned so the relationship surfaces
from a single lookup.

<details>
<summary>Design notes</summary>

- **Forge-agnostic.** Reuses the existing `chat_diff_statuses.url`
column rather than introducing a `pr:` vocabulary, since the SDK already
documents the URL as "may point to a pull request or a branch page
depending on whether a PR has been opened." Works for GitHub PRs, GitLab
MRs, branch pages, etc.
- **Composes with `archived:`.** The two terms can be combined:
`q=archived:true diff_url:"..."`.
- **Case handling.** The parser used to lowercase the entire `q` string
up front, which would mangle URL path segments. Switched to lowercasing
only the field key inside `searchTerms` (already happens there) and
keeping the value as the caller typed it. The SQL comparison lowercases
on both sides.
- **Validation.** `diff_url` must be a syntactically valid HTTP(S) URL
with a non-empty host. No forge-specific validation.
- **Index.** Adds `idx_chat_diff_statuses_url_lower` on `LOWER(url)` so
the lookup is cheap even on large datasets.
- **Sub-agent fan-in.** `EXISTS` clause matches when the URL lives on
the chat itself or any chat with `root_chat_id` equal to the chat's id,
so a delegated sub-agent's PR pulls in its parent.
- **Deferred.** Sentinels like `pr:any` / `pr:none` and a forge-agnostic
state filter (`diff_state:open|merged|closed`) were intentionally left
out of this change. They couple cleanly to a second forge or a clearer
product call, and shipping them now would lock in vocabulary we may want
to revisit.
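
A minimal sketch of the validation and case-handling notes above, using only `net/url` and `strings`. `validateDiffURL` and `sameDiffURL` are hypothetical helpers; the actual parser and SQL comparison may differ in detail.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// validateDiffURL accepts only HTTP(S) URLs with a non-empty host and
// returns the value exactly as the caller typed it.
func validateDiffURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", fmt.Errorf("malformed URL: %w", err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return "", fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	if u.Host == "" {
		return "", errors.New("missing host")
	}
	return raw, nil
}

// sameDiffURL lowercases both sides before comparing, in the spirit of
// comparing LOWER(url) on both sides in SQL.
func sameDiffURL(a, b string) bool {
	return strings.ToLower(a) == strings.ToLower(b)
}

func main() {
	fmt.Println(validateDiffURL("https://github.com/coder/coder/pull/123"))
	fmt.Println(validateDiffURL("ftp://example.com/x"))
	fmt.Println(sameDiffURL("https://GitHub.com/Coder/coder/pull/123", "https://github.com/coder/coder/pull/123"))
}
```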

</details>

## Tests

- `coderd/searchquery`: parser tests for valid URLs, case handling (key
insensitive, value preserved), composition with `archived:`, and
validation errors (non-HTTP scheme, missing host, malformed URL).
- `coderd/exp_chats_test.go`: end-to-end coverage hitting `ListChats`.
Verifies a root chat matches its own URL, a parent chat surfaces when
only a sub-agent has the URL, lookups are case-insensitive, non-matching
URLs return empty, and invalid URLs return `400`.

---

_This PR was authored by a Coder Agent on behalf of @kylecarbs._
2026-05-13 11:06:42 -04:00
Ethan fabf7d31fc test: use default provider in TestPatchChatMessage/ChangesModel (#25189)
`TestPatchChatMessage/ChangesModel` hardcoded `"openai"` as the provider
for the override model config. After #25171, the shared chat test
harness registers a single `"openai-compat"` provider by default, so
calling `createAdditionalChatModelConfig(..., "openai", ...)` fails with
HTTP 400 `Chat provider is not configured` before the test can exercise
the model-change path. The subtest was added in #25084 after #25171 was
reviewed, so the harness change and the new hardcoded provider only met
on `main`.

Use `defaultModel.Provider` so the override always matches whatever
provider the harness registered. This mirrors every other call site of
`createAdditionalChatModelConfig` in the file.

Closes https://github.com/coder/internal/issues/1530
2026-05-12 14:05:08 +00:00
Michael Suchacz f1d160c7f4 fix: allow changing model when editing earlier chat message (#25084)
Editing a previous user message and selecting a different model in the
picker silently kept using the original model: the selection was dropped
on the frontend, in the SDK, and in the backend, so both the replacement
user message and the assistant turn that followed ran against the old
model.

Plumb the selected model through all three layers (`AgentChatPage`,
`codersdk.EditChatMessageRequest`, `chatd.EditMessageOptions` /
`Server.EditMessage`), defaulting to the original message's model when
the client does not specify one. The existing `InsertChatMessages` CTE
already advances `chats.last_model_config_id` when the inserted
message's model differs, so the assistant turn picks up the new
selection without further changes. The new model is validated inside the
transaction, so an unknown ID rolls the edit back and returns a 400
`Invalid model config ID.`, mirroring the `SendMessage` path.

Refs: CODAGT-345

This change was generated by a Coder agent.

<details>
<summary>Implementation plan</summary>

# CODAGT-345: Editing an earlier message cannot change model

## Problem

When editing a previous user message in a chat, the user can change the
model in the model picker, but the backend keeps using the original
message's model. The model selection is dropped at three layers:

1. **Frontend:** `AgentChatPage.tsx`'s edit branch builds an
`EditChatMessageRequest` that omits `model_config_id`. The new-message
branch (a few lines below) does include it.
2. **SDK:** `codersdk.EditChatMessageRequest` has no `ModelConfigID`
field at all.
3. **Backend:** `chatd.EditMessageOptions` has no model field, and
`Server.EditMessage` always copies the original message's
`ModelConfigID` into the replacement message.

Once the replacement user message is inserted with the original model,
the `InsertChatMessages` CTE leaves `chats.last_model_config_id`
unchanged, so the assistant turn that follows runs against the old
model.

## Fix

Plumb the selected model through all three layers, defaulting to the
original message's model when the client doesn't override it. This
mirrors the `SendMessage` path, which already accepts a
`model_config_id` and validates it via
`resolveSendMessageModelConfigID`.

### Backend

- `codersdk/chats.go`: add `ModelConfigID *uuid.UUID` to
`EditChatMessageRequest`.
- `coderd/x/chatd/chatd.go`:
  - Add `ModelConfigID uuid.UUID` to `EditMessageOptions`.
  - In `EditMessage`, after fetching the edited message, resolve the
    model: if `opts.ModelConfigID != uuid.Nil`, validate it exists with
    `tx.GetChatModelConfigByID` (using `chatdModelConfigLookupContext`),
    otherwise keep `editedMsg.ModelConfigID.UUID`. Pass the resolved ID into
    `newChatMessage(...)`.
  - Reuse the existing `ErrInvalidModelConfigID` sentinel.
- `coderd/exp_chats.go` (`patchChatMessage`):
  - Read `req.ModelConfigID` (nil-safe), pass into
    `chatd.EditMessageOptions`.
  - Add a `case xerrors.Is(editErr, chatd.ErrInvalidModelConfigID)` arm
    returning 400 `Invalid model config ID.`, matching the
    `postChatMessages` handler.

### Frontend

- `site/src/pages/AgentsPage/AgentChatPage.tsx`:
  - In the edit branch, set `model_config_id: effectiveSelectedModel ||
    undefined` on the `EditChatMessageRequest`.
  - On success, persist the chosen model to `lastModelConfigIDStorageKey`
    so the next chat from this browser keeps the same default. Mirrors the
    new-message branch.

### Generated

- `make site/src/api/typesGenerated.ts` and `make
coderd/apidoc/swagger.json` produce the updated `EditChatMessageRequest`
schema in `typesGenerated.ts`, `coderd/apidoc/{docs.go,swagger.json}`,
and `docs/reference/api/{chats.md,schemas.md}`.

## Tests

- `coderd/x/chatd/chatd_test.go`:
  - `TestEditMessageWithModelConfigOverride`: edit with a different model
    -> replacement message and `chats.LastModelConfigID` use the new model.
  - `TestEditMessagePreservesModelConfigByDefault`: edit without
    `ModelConfigID` -> original model preserved.
  - `TestEditMessageRejectsUnknownModelConfig`: passes a random UUID ->
    `ErrInvalidModelConfigID`, original message still present,
    `LastModelConfigID` unchanged (rollback).
- `coderd/exp_chats_test.go` (under `TestPatchChatMessage`):
  - `ChangesModel`: end-to-end via SDK; `edited.Message.ModelConfigID` and
    `chat.LastModelConfigID` both match the new model.
  - `InvalidModelConfigID`: random UUID -> 400 `Invalid model config ID.`.

</details>
2026-05-12 14:51:55 +02:00
Ethan 4e08543ace test(coderd): centralize chat test harness and stabilize flakes (#25171)
Chat tests previously constructed a real `openai` provider with a fake
API key and no `BaseURL`, so background title generation hit
`api.openai.com` and timed out under `-race`. The same root cause
produced several distinct flakes: title regeneration races with
synchronous `UpdateChat`/`ProposeChatTitle`, and pagination races
against `updated_at` bumps from real-network processing.

This moves the fake OpenAI-compatible provider and the chat-settle wait
into first-class `coderdtest` capabilities.
`coderd.Options.ChatProviderAPIKeys` is the new seam tests use to
redirect chat traffic to a local `httptest.Server`.
`coderdtest.WaitForChatSettled` replaces per-test waiters and drains
tracked chat-daemon work after the chat row leaves `pending`/`running`.
The `newChatClient*` constructors funnel through one options builder
that installs the fake provider before the coderd test server so cleanup
ordering is deterministic.

Closes https://github.com/coder/internal/issues/1528 & Closes ENG-2659
Closes https://github.com/coder/internal/issues/1480 & Closes CODAGT-359
Closes https://github.com/coder/internal/issues/1507 & Closes CODAGT-368
Relates to https://github.com/coder/internal/issues/1397 & Relates to
CODAGT-374
2026-05-12 22:13:55 +10:00
Kyle Carberry 07ff3b3f90 fix(coderd/exp_chats_test.go): stabilize TestListChats/Pagination by inserting chats directly (#25137) 2026-05-12 00:26:22 -04:00
Ethan bd6cc1aaf2 feat(coderd): add stop_workspace chatd tool and recovery classification (#24997)
## Summary

Adds a `stop_workspace` tool to chatd so the model can recover from the
"workspace running but agent dead" failure mode (e.g. an OOM that leaves
the workspace running but the agent unreachable) by stopping and then
starting the workspace.

<img width="924" height="742" alt="image"
src="https://github.com/user-attachments/assets/279dedb6-6e29-4fe1-8754-3a1f01e538bf"
/>



## What changed

**New `stop_workspace` chatd tool**
(`coderd/x/chatd/chattool/stopworkspace.go`). Mirrors `start_workspace`:
shares `WorkspaceMu` to serialize with create/start, waits for any
in-progress build before issuing a stop, and is idempotent only after a
successful Stop transition. Failed stop builds re-attempt rather than
reporting success.

**New `chatStopWorkspace` coderd hook** (`coderd/exp_chats.go`). Mirrors
`chatStartWorkspace` minus the `RequireActiveVersion` gate. Stop should
not be blocked by template version policy.

**Differentiated recovery sentinels** (`coderd/x/chatd/chatd.go`).
`errChatAgentDisconnected` instructs the model to call `stop_workspace`
then `start_workspace`. `errChatDialTimeout` instructs a single retry,
then user escalation if it repeats. The previous single message
conflated transient and persistent failures.

**Two-signal recovery gate.** Recovery is only surfaced when a tool call
times out *and* a fresh DB read of the latest workspace agent says
`Disconnected`. The previous draft escalated on the DB read alone, which
would fire on a 30-second heartbeat blip (e.g. agent respawn) and prompt
a destructive stop/start unnecessarily.

**Cache-hit disconnected handling** now clears the cache and retries a
fresh dial before escalating, rather than returning the recovery
sentinel immediately. Latest-agent classification uses
`GetWorkspaceAgentsInLatestBuildByWorkspaceID` instead of the chat's
bound `AgentID`, so stale bindings after a rebuild don't misclassify.

**Shared chattool helpers** in `coderd/x/chatd/chattool/chattool.go`:
`latestWorkspaceBuildAndJob`, `publishBuildBinding`,
`provisionerJobTerminal`. Applied to both `start_workspace` and
`stop_workspace`.

## Notes

- Reverts an earlier draft that widened `ask_user_question` to root
standard turns. Plan-mode-only behavior is restored.
- The `stop_workspace` tool currently renders via the generic chat
tool-call UI. A follow-up frontend PR will prettify the `stop_workspace`
tool and style it like the `start_workspace` tool.
- Never-connected (`Timeout` status) agents are intentionally excluded
from recovery. They indicate template or startup failure, not the
running-but-dead case this PR targets.

Closes CODAGT-315
2026-05-11 16:23:07 +10:00
Ethan de9cdca77e fix(coderd): handle external-agent workspaces honestly in chat (#24969)
## Summary

Make Coder's chat agent honest about workspaces that use
`coder_external_agent`. Three behaviors change so the chat stops
pretending it can drive an external workspace through to a usable state
on its own.

<img width="859" height="537" alt="image"
src="https://github.com/user-attachments/assets/0561442b-95f1-4a2d-853c-7e3776114680"
/>


## Problem

External agents are not started by Coder. The user has to run `coder
agent` on their own host with a token Coder generates. Before this
change, the chat agent treated those workspaces like any other:

- `create_workspace` would enqueue a build for an external-agent
template and then wait minutes (~22 worst case) for an agent that was
never going to come up.
- When mid-turn tool calls dialed an external agent that was not
connected, the chat burned the full 30-second dial timeout and returned
generic "the workspace may need to be restarted from the Coder
dashboard" guidance, which is not the action the user can take.
- Nothing told the chat (or the user, through the chat) that the next
action lives outside Coder.

## Fix

Three changes scoped to `coderd/x/chatd/`:

1. **`create_workspace` blocks templates with external agents.** The
tool reads `template_versions.has_external_agent` for the template's
active version and refuses external-agent templates with a message
instructing the chat to pick a different template, or to have the user
create and start the workspace themselves and then attach it.

2. **Attaching an existing external workspace stays open.** No
selection-time gate on attachment; users can still bind a working
external workspace to a chat.

3. **External-agent-aware error handling on connection.** Two
complementary changes both predicated on proven connectivity failures
rather than every dial error:

- **`getWorkspaceConn` preflight and timeout handling.** Before opening
a connection, the cache-miss path reads the agent's status from the
already-loaded row. If the selected agent is external and clearly
offline according to the existing `isAgentUnreachable` helper
(`Disconnected` or `Timeout`, never `Connecting`), it returns an
external-agent-specific error immediately instead of waiting out the
30-second dial timeout. `Connecting` external agents fall through to the
dial so a user who just started the agent on their host can still
succeed in the same turn. The preflight only fires when the agent is
still the latest selected agent for the workspace, so stale-binding
recovery via `dialWithLazyValidation` is unaffected. The post-dial
rewrite is limited to the dial timeout sentinel; stale/no-agent bindings
and non-timeout dial failures preserve their original errors.

- **`waitForAgentReady` timeout-branch rewrite.** The 2-minute retry
loop used by `create_workspace` and `start_workspace` runs unchanged for
all agents. When the loop's outer deadline elapses, the timeout branch
substitutes the external-agent message in place of the raw dial error if
the agent belongs to an external resource.

This applies the same pattern that the cache-hit path of
`getWorkspaceConn` already used (`isAgentUnreachable` returning
`errChatAgentDisconnected`), extended to the cache-miss path and to the
readiness helper, with the external-agent-aware error rewrite layered
only on confirmed offline or timeout paths.
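
The preflight decision described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the `agentStatus` type, the `preflight` function, its boolean parameters, and the error text are hypothetical stand-ins. Only the `isAgentUnreachable` name and the `Disconnected` / `Timeout` / `Connecting` distinction come from the description.

```go
package main

import (
	"errors"
	"fmt"
)

// agentStatus is a hypothetical stand-in for the agent status stored on the row.
type agentStatus string

const (
	statusConnecting   agentStatus = "connecting"
	statusConnected    agentStatus = "connected"
	statusDisconnected agentStatus = "disconnected"
	statusTimeout      agentStatus = "timeout"
)

// errExternalAgentOffline is an illustrative error; the real message differs.
var errExternalAgentOffline = errors.New("external agent is offline; start it on its host and retry")

// isAgentUnreachable treats only Disconnected and Timeout as offline.
// Connecting is deliberately excluded so a just-started agent can still dial.
func isAgentUnreachable(s agentStatus) bool {
	return s == statusDisconnected || s == statusTimeout
}

// preflight returns early only for an external agent that is still the latest
// selected agent and is clearly offline. Every other case falls through to the dial.
func preflight(external, latestSelected bool, s agentStatus) error {
	if external && latestSelected && isAgentUnreachable(s) {
		return errExternalAgentOffline
	}
	return nil
}

func main() {
	fmt.Println(preflight(true, true, statusDisconnected)) // fails fast
	fmt.Println(preflight(true, true, statusConnecting))   // falls through to the dial
	fmt.Println(preflight(false, true, statusTimeout))     // non-external agents are unaffected
}
```

Keeping the check to a pure function of already-loaded state is what lets the fast path skip the dial timeout without adding a database round trip.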

Closes CODAGT-314
2026-05-08 13:51:13 +10:00
Thomas Kosiewski 273e828442 fix: remove advisor reasoning configuration (#25030) 2026-05-07 15:19:19 +02:00
Mathias Fredriksson 6b0518d051 fix: state-aware queued message promotion (#24819)
PromoteQueued now branches on chat status. On requires_action, it
synthesizes tool results before the user message. On running, it uses a
deferred reorder plus the Waiting status so that the worker's persist
and auto-promote steps keep partial output. A stale heartbeat falls
through to the synchronous path; GetStaleChats picks up Waiting+queue to
recover after a crash that follows cleanup. The endpoint returns 202.

Closes CODAGT-119
2026-05-06 19:11:56 +03:00
Jakub Domeracki 2949028dcb fix(coderd): enforce chat owner check on processing handlers (#24921) 2026-05-06 10:25:12 +02:00
Michael Suchacz 2874d4b4cd feat: add chat debug retention purge (#24943)
> Mux is acting on Mike's behalf.

Adds configurable retention for chat debug data, including the purge
query, updated_at index, site config, experimental API, SDK types,
frontend lifecycle setting, and docs.

The purge deletes debug runs older than the configured retention window
and relies on existing cascades to delete steps. The default retention
is 30 days, and setting the value to 0 disables the purge.
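
A rough sketch of the retention rule, assuming the setting is expressed in whole days. `purgeCutoff` and `exampleNow` are hypothetical names, and treating negative values as disabled is an assumption beyond what the description states:

```go
package main

import (
	"fmt"
	"time"
)

// exampleNow is a fixed reference time used only for this illustration.
var exampleNow = time.Date(2026, time.May, 5, 0, 0, 0, 0, time.UTC)

// purgeCutoff returns the timestamp before which debug runs are eligible for
// deletion. A retention of 0 disables the purge; negative values are treated
// the same way here, which is an assumption.
func purgeCutoff(now time.Time, retentionDays int) (time.Time, bool) {
	if retentionDays <= 0 {
		return time.Time{}, false
	}
	return now.Add(-time.Duration(retentionDays) * 24 * time.Hour), true
}

func main() {
	if cutoff, ok := purgeCutoff(exampleNow, 30); ok {
		fmt.Println("purge debug runs older than", cutoff.Format(time.RFC3339))
	}
	_, ok := purgeCutoff(exampleNow, 0)
	fmt.Println("purge enabled with retention 0:", ok)
}
```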
2026-05-05 22:37:13 +02:00
Ethan 4751416b29 fix!: persist structured chat errors (#24919)
**Breaking change for changelog:**

> `codersdk.Chat.last_error` now returns a structured `ChatError` object
(`{message, kind, provider, retryable, status_code, detail}`) instead of
a plain string. The chats API is experimental
(`/api/experimental/chats`), so this ships without a deprecation cycle;
consumers reading `chat.last_error` as a string must update to read
`chat.last_error.message`. SDK/generated TypeScript terminal error
payloads now use the single `ChatError` type; the live stream error
payload type is renamed from `ChatStreamError` to `ChatError`.

Persisted chat errors now carry the same provider-specific detail (kind,
provider, retryable, HTTP status, optional detail) as the live stream,
so refreshing a failed chat rehydrates with the full structured error
instead of a one-line headline.

Existing rows are migrated in place: legacy text errors are wrapped into
`{message, kind: "generic"}` so already-errored chats still render, and
rows with `last_error IS NULL` stay NULL. Internally, persisted fallback
decoding now reuses the existing `chaterror.KindGeneric` constant, with
no JSON value change.
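
The fallback decoding can be pictured roughly like this. It is a sketch under assumptions: `chatError` and `decodePersistedError` are hypothetical, the empty string stands in for a NULL column, and the field list is taken from the breaking-change note above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// kindGeneric mirrors the "generic" kind used when wrapping legacy text errors.
const kindGeneric = "generic"

// chatError is a hypothetical stand-in for the structured error payload.
type chatError struct {
	Message    string `json:"message"`
	Kind       string `json:"kind"`
	Provider   string `json:"provider,omitempty"`
	Retryable  bool   `json:"retryable,omitempty"`
	StatusCode int    `json:"status_code,omitempty"`
	Detail     string `json:"detail,omitempty"`
}

// decodePersistedError returns nil for an absent error, the decoded structure
// for a structured row, and a generic wrapper for legacy plain-text rows.
func decodePersistedError(raw string) *chatError {
	if raw == "" {
		return nil // assumption: empty input represents last_error IS NULL
	}
	var e chatError
	if err := json.Unmarshal([]byte(raw), &e); err == nil && e.Message != "" {
		return &e
	}
	// Anything that is not a structured error is kept as the message.
	return &chatError{Message: raw, Kind: kindGeneric}
}

func main() {
	fmt.Printf("%+v\n", decodePersistedError("provider returned 429"))
	fmt.Printf("%+v\n", decodePersistedError(`{"message":"rate limited","kind":"rate_limit","status_code":429}`))
	fmt.Println(decodePersistedError("") == nil)
}
```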

Closes CODAGT-239
2026-05-05 12:56:06 +10:00
Ethan 7e01edeb8e fix: align chat attachment picker with allowed file types (#24917)
The agent chat composer only advertised image uploads to the OS file
picker and filtered drag-and-drop and paste events to `image/*`, even
though the backend accepts text, CSV, JSON, PDF, and a narrower set of
image types.

Move the allowed chat attachment media types into `codersdk` so the
frontend picker and backend enforcement share one source of truth. Use
the generated TypeScript list to drive the file input `accept` attribute
and the drag-and-drop and paste filters, while adding common text
extensions so platforms without MIME registrations still surface those
files in the picker.
2026-05-05 12:25:13 +10:00
Michael Suchacz 632dcdb63a feat: add personal chat model overrides (#24715) 2026-05-05 00:57:51 +02:00
Michael Suchacz 0bb09935bc feat: add computer-use provider selection for AI agents (#24772)
Adds a deployment-wide setting to select the computer-use provider
(Anthropic or OpenAI) for AI agents, plus the OpenAI computer-use runner
needed to honor that selection.

The setting is stored in `site_configs` under
`agents_computer_use_provider`, defaults to Anthropic when unset, and is
exposed via experimental GET/PUT endpoints under
`/api/experimental/chats/config/computer-use-provider`. The chatd
computer-use tool now dispatches to either `runAnthropicComputerUse` or
`runOpenAIComputerUse` based on the resolved provider, with
provider-specific result metadata for OpenAI screenshots.

Frontend adds a provider dropdown to the Agents Experiments settings
page nested under the virtual desktop toggle, with disabled state
handling while virtual desktop is off and skeleton loaders while config
queries are in flight.

Hugo and Codex review follow-up:
- Uses shared provider validation and clearer computer-use constant
names.
- Removes stale OpenAI pending-safety-checks commentary.
- Documents why provider result metadata is needed for OpenAI
screenshots.
- Keeps the computer-use subagent visible when provider credentials are
missing, then returns a clear spawn-time configuration error.
- Uses OpenAI's recommended 1600x900 screenshot geometry to preserve the
native 16:9 aspect ratio.
- Moves OpenAI-specific computer-use helpers into
`coderd/x/chatd/chatopenai/computeruse` after rebasing onto the provider
package refactor in `main`.
- Converts OpenAI pixel scroll deltas to Coder desktop wheel-click
amounts.
- Preserves OpenAI pointer modifiers with key down/up desktop actions
and rejects unsupported non-left double-click buttons explicitly.
- Maps OpenAI back/forward side-button clicks to browser navigation key
actions.
- Defaults omitted OpenAI click buttons to left-click.
- Retries mouse release cleanup if the final OpenAI drag release fails.
- Keeps computer-use subagent availability messages stable when provider
config cannot be loaded, while logging the backend error.
- Releases remaining OpenAI modifier keys if a synthetic key-up cleanup
action fails.
- Updates Storybook interaction stories so provider snapshots show the
selected final provider.

> Mux updated this PR description on behalf of Mike.
2026-05-04 20:30:50 +02:00
Michael Suchacz 033ed0bb82 feat: add admin-configurable chat title generation model (#24838)
Adds an admin-configurable deployment-wide setting that controls which
model is used for chat title generation. Admins can pick any enabled
chat model config from the Agents settings page, or leave the setting
unset to keep the existing fast-models-then-chat-model fallback
algorithm.

When a model is selected, both automatic and manual title generation use
only that model, with no silent fallback. When the configured model is
disabled, lacks credentials, or is otherwise unusable, automatic title
generation is skipped entirely (best-effort) and manual title
regeneration returns a clear error, so admins notice the
misconfiguration instead of title traffic being silently routed through
another provider.

## Surface

- New deployment-wide setting stored as a `site_configs` row
(`agents_chat_title_generation_model_override`).
- New experimental endpoint `GET/PUT
/api/experimental/chats/config/model-override/{context}`.
- Frontend: title generation now appears as a third dropdown on the
Agents admin settings page alongside the existing general and explore
context overrides.

## DRY refactors folded in

Title generation is integrated as a third value of the existing
`ChatModelOverrideContext` type alongside `general` and `explore`,
sharing the parameterized HTTP route, SDK methods, generated types, and
frontend API plumbing rather than introducing a parallel surface. The
`Agent` prefix was dropped from the type and route since title
generation is not a delegated agent.

The chatd model-override resolver is also shared.
`resolveConfiguredModelOverride` now takes a `failureMode` parameter:

- Subagent overrides use soft failure: misconfigured overrides are
logged and the parent model is used.
- Title generation uses hard failure: misconfigured overrides return an
explicit error so manual title regeneration surfaces the
misconfiguration and automatic title generation skips instead of
silently falling back.
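
The two failure modes could look roughly like the sketch below. The real signature of `resolveConfiguredModelOverride` is not shown in this description, so `resolveOverride`, `modelOverride`, `failSoft`, and `failHard` are hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// failureMode selects how a misconfigured override is handled.
type failureMode int

const (
	failSoft failureMode = iota // log and fall back (subagent overrides)
	failHard                    // return an error (title generation)
)

var errOverrideUnusable = errors.New("configured model override is unusable")

// modelOverride is a hypothetical view of a configured override.
type modelOverride struct {
	Name   string
	Usable bool // false when disabled, missing credentials, and so on
}

// resolveOverride returns the model to use. With no override configured, the
// caller's fallback is used regardless of mode.
func resolveOverride(o *modelOverride, fallback string, mode failureMode) (string, error) {
	if o == nil {
		return fallback, nil
	}
	if o.Usable {
		return o.Name, nil
	}
	if mode == failHard {
		return "", fmt.Errorf("override %q: %w", o.Name, errOverrideUnusable)
	}
	// Soft failure: a real implementation would log here before falling back.
	return fallback, nil
}

func main() {
	broken := &modelOverride{Name: "title-model", Usable: false}
	fmt.Println(resolveOverride(broken, "parent-model", failSoft))
	fmt.Println(resolveOverride(broken, "parent-model", failHard))
}
```

Passing the mode as a parameter keeps one resolver for both callers while making the hard-failure choice visible at each call site.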

> Mux is acting on Mike's behalf.
2026-05-04 13:13:00 +02:00
Cian Johnston 2f855904be refactor: add dbgen chat generators and migrate test boilerplate (#24497)
- Adds chat-related dbgen generators covering defaults, overrides, and message field mapping.
- Replaces raw single-row chat, message, provider, and model-config setup in tests with dbgen helpers.
- Simplifies chat seed helpers after moving fixture setup into dbgen.

> Generated with [Coder Agents](https://coder.com/agents).
2026-05-01 13:29:33 +01:00
Cian Johnston 04cc983833 fix: add preset support to MCP tools (#24694)
The chat tools (`read_template`, `create_workspace`) did not surface or
respect template version presets. Presets were invisible to the LLM and
preset parameter defaults were never applied at workspace creation. The
`toolsdk` MCP surface had the same gap (ref #24695, now subsumed here).

## What this changes

- **`read_template`** returns presets with `id`, `name`, `default`,
`description`, `icon`, `parameters`, and `desired_prebuild_instances`
(when set), so the LLM can pick the right preset and prefer
prebuilt-backed ones.
- **`create_workspace`** accepts a `preset_id`. The wsbuilder applies
preset parameter defaults and may claim a prebuilt workspace.
- **`start_workspace`** does *not* accept a preset. Presets are a
creation-time choice; subsequent starts use the workspace's existing
version and parameters. Users who need a specific preset or version on
an existing chat can create the workspace out-of-band (CLI / UI / API)
with the desired configuration and attach the chat to it.
- **`toolsdk`** gains `GetTemplate` (with presets including
`desired_prebuild_instances`), preset support on `CreateWorkspace`, and
preset + `rich_parameters` support on `CreateWorkspaceBuild`. The
`template_version_preset_id` description warns about preset/version
affinity.


> 🤖 Generated with [Coder Agents](https://coder.com/agents) and reviewed by a human.

Co-authored-by: Max schwenk <maschwenk@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 10:57:52 +01:00
Dean Sheather e57525002c chore: remove agents experiment flag and mark feature as beta (#24432)
Remove the `ExperimentAgents` feature flag so the Agents feature is
always available without requiring `--experiments=agents`. The feature
is now in beta.

Existing deployments that still pass `--experiments=agents` will get a
harmless "ignoring unknown experiment" warning on startup.

### Changes

**Backend:**
- Remove `RequireExperimentWithDevBypass` middleware from chat and MCP
server routes
- Always include `AgentsAccessRole` in assignable site roles (later
refactored to org-scoped on main; rebase keeps that)
- Always set `AgentsTabVisible = true`, then drop the entire dead
`AgentsTabVisible` metadata pipeline (Go htmlState field,
populateHTMLState goroutine, HTML meta tag, useEmbeddedMetadata
registration, mock); no production consumer reads it. `AgentsNavItem`
already gates on `permissions.createChat`.
- Make `blob:` CSP `img-src` addition unconditional
- Remove `ExperimentAgents` constant, `DisplayName` case, and
`ExperimentsKnown` entry

**CLI:**
- Graduate the agents TUI from `coder exp agents` to `coder agents`
(moved from `AGPLExperimental()` to `CoreSubcommands()`)
- Drop the `agent` alias so it does not collide with the hidden
workspace-agent command
- Rename implementation files `cli/exp_agents_*.go` -> `cli/agents_*.go`
and internal identifiers (`expChatsTUIModel` -> `chatsTUIModel`,
`newExpChatsTUIModel` -> `newChatsTUIModel`, `setupExpAgentsBackend` ->
`setupAgentsBackend`, `startExpAgentsSession` -> `startAgentsSession`,
`expAgentsPtr` -> `agentsPtr`, `expAgentsSession` -> `agentsSession`,
`TestExpAgents*` -> `TestAgents*`). `expClient` (the
`*codersdk.ExperimentalClient` local) is kept; `coderd/exp_chats*.go`
and other still-experimental `cli/exp_*.go` commands are intentionally
untouched.

**Frontend:**
- Remove experiment check from `AgentsNavItem` - render when
`canCreateChat` is true
- Remove `agentsEnabled` experiment check from `WorkspacesPage`, then
gate `chatsByWorkspace` on `permissions.createChat` so users without
chat access don't trigger the per-page DB query (Copilot review
feedback)
- Add `FeatureStageBadge` (beta) next to the Coder logo in the Agents
sidebar (desktop + mobile)

**Docs:**
- Remove experiment flag setup instructions from `early-access.md` and
`getting-started.md` (and rename `early-access.md`'s "Enable Coder
Agents" heading to "Set up Coder Agents", since there is no enablement
step left)
- Update `chats-api.md` and `getting-started.md`'s Chats API note to say
"beta" instead of "experimental"
- `docs/manifest.json`: drop "experimental" from the Chats API sidebar
description
- `make gen` regenerated `docs/reference/cli/agents.md` and the CLI
index
- `scripts/check_emdash.sh`: exclude `cli/testdata/*.golden` and
`enterprise/cli/testdata/*.golden` from the new repo-wide emdash lint,
since serpent emits emdash borders in every generated `--help` golden
file

**Tests:**
- Remove `ExperimentAgents` setup from all test files (14 occurrences
across 7 files)
- Update stale "with the agents experiment" comments in
`coderd/x/chatd/integration_test.go` and `coderd/mcp_test.go`




> 🤖 Generated by Coder Agents
2026-05-01 01:49:00 +10:00
Thomas Kosiewski 06bad73df4 feat: add admin-configurable advisor API, SDK, and queries (#24621)
## Summary

Add the **admin-configurable advisor configuration**: database-backed storage, SDK types, and the experimental HTTP handlers that back the admin settings UI (later PRs). Follows the same "site-configs" pattern as Virtual Desktop.

## Motivation

The advisor needs runtime-tunable knobs (enable/disable, per-run cap, max output tokens, reasoning effort, optional model override) without a service restart or redeploy. Using the existing `site_configs` K/V table keeps this pattern consistent with other admin features and avoids a bespoke schema.

## Changes

### Database (`coderd/database/queries/siteconfig.sql`)
- `GetChatAdvisorConfig` returns the stored JSON blob (default `'{}'`) under key `agents_advisor_config`.
- `UpsertChatAdvisorConfig` uses the standard `INSERT ... ON CONFLICT` pattern.
- Regenerated via `make gen` (queries.sql.go + mocks).

### SDK (`codersdk/chats.go`)
- `AdvisorConfig` type with `Enabled`, `MaxUsesPerRun`, `MaxOutputTokens`, `ReasoningEffort` (`""` / `low` / `medium` / `high`), `ModelConfigID uuid.UUID`.
- Client methods: `ChatAdvisorConfig(ctx)` / `UpdateChatAdvisorConfig(ctx, cfg)`.

### API (`coderd/exp_chats.go`)
- `GET /api/experimental/chats/config/advisor`: reads current config; relies on `ActorFromContext` validation.
- `PUT /api/experimental/chats/config/advisor`: requires `policy.ActionUpdate` on `rbac.ResourceDeploymentConfig`.
- Handlers unmarshal `{}` to a typed zero value and re-marshal on upsert for schema stability.
- Tests in `exp_chats_test.go` cover empty defaults, round-trip update, unauthorized update, and invalid body.

## Stack context

This is **PR 3 of 6** in the advisor feature stack. Consumed by:
- PR 4 (`feat/advisor-04-chatd-runtime`), which reads this config on every `runChat`.
- PR 6 (`feat/advisor-06-admin-settings-ui`), which renders the admin form.

## Scope / non-goals

- No `chatd` read path (lands in PR 4).
- No UI (lands in PR 6).
- `agents_advisor_config` remains a single-row JSON blob; we intentionally do not shard per-org/per-template yet.

## Validation

- `make gen`
- `go test ./coderd/database/... -run TestChatAdvisor`
- `go test ./coderd/... -run TestChatAdvisorConfig`
- `make lint`

---

<details>
<summary>📋 Implementation Plan (shared across the advisor stack)</summary>

# Plan: Add a Mux-style advisor tool to coder agents/chatd

## Outcome

Add a first-class `advisor` tool to agent chats in `coderd/x/chatd` that feels native to Coder:

- it is a built-in server-side tool, not an MCP/dynamic-tool workaround;
- it performs a nested **tool-less** model call for strategic advice;
- it is exposed only when eligible, and the prompt mentions it only when it is actually available;
- it is treated as a **planning-only** tool so it does not run alongside action tools in the same batch;
- it tracks usage/cost separately enough for operators to reason about it;
- it has a minimally polished UI in the Agents page;
- and it ships with explicit dogfooding evidence, including screenshots and repro videos.

## Design decisions to lock before coding

1. **Primary architecture:** native built-in tool in `chattool/`, backed by a small `chatadvisor` package.
2. **Nested model execution:** reuse chatd's existing model/provider stack for a one-step, tool-less advisor call rather than inventing a new provider pathway.
3. **Execution policy:** treat `advisor` as an exclusive/planning-only tool; mixed batches must return structured policy errors and force the model to retry cleanly.
4. **Availability:** initial rollout is for root agent chats only; disable for child/sub-agent chats until recursion/cost policy is proven.
5. **Prompt sync:** use one eligibility boolean to drive both tool registration and advisor guidance injection.
6. **Persistence/cost split:** MVP should keep advisor usage visible in result metadata and server metrics; only add DB schema if product/billing explicitly needs queryable advisor-specific cost.
7. **UI scope:** generic tool rendering is an acceptable temporary milestone during backend bring-up, but the release candidate should include a dedicated lightweight advisor renderer.

## Delivery model

The work should be executed as coordinated workstreams with one integration owner and parallel contributors for low-conflict areas. The integration owner should own `coderd/x/chatd/chatd.go` because prompt assembly, tool registration, and model resolution all converge there.

## Detailed workstreams

### Repo evidence used for this plan

<details>
<summary>Mux reference and current chatd seams</summary>

**Mux reference implementation**

- `src/node/services/tools/advisor.ts` — native advisor tool implementation.
- `src/common/constants/advisor.ts` — advisor prompt/constants and truncation policy.
- `src/common/utils/tools/tools.ts` — conditional tool registration.
- `src/node/services/streamContextBuilder.ts` — injects advisor guidance only when the tool is available.

**Current chatd seams**

- `coderd/x/chatd/chatd.go`
  - `processChat()` — tool assembly, prompt assembly, and chatloop invocation.
  - `resolveChatModel()` — current model/provider/key resolution seam.
  - `type Config struct` — server-level chatd configuration surface.
- `coderd/x/chatd/chatloop/chatloop.go`
  - `Run()` — main streaming/model loop.
  - `executeTools()` — built-in tool execution/batching seam.
- `coderd/x/chatd/chattool/` — built-in tool implementations.
- `site/src/pages/AgentsPage/components/ChatElements/tools/Tool.tsx` — tool renderer dispatch.
- `site/src/pages/AgentsPage/components/ChatConversation/messageParsing.ts` and `ConversationTimeline.tsx` — tool/result merge and rendering flow.

</details>

### Workstream map and ownership

| Workstream | Primary owner | Main files | Can run in parallel? | Done when |
|---|---|---|---|---|
| 0. Integration + gating | Integration lead | `coderd/x/chatd/chatd.go` | No; central merge lane | Tool registration, prompt sync, and model selection are wired together |
| 1. Advisor runtime + tool | Backend agent | new `coderd/x/chatd/chatadvisor/`, new `coderd/x/chatd/chattool/advisor.go` | Yes | Tool can perform a tool-less advisor call in memory and return structured results |
| 2. Planning-only execution policy | Chatloop agent | `coderd/x/chatd/chatloop/chatloop.go`, related tests | Yes | Mixed `advisor` + action-tool batches are rejected cleanly and deterministically |
| 3. Metrics/usage/config | Backend/telemetry agent | `chatd.go`, `chatloop/metrics.go`, optional config plumbing | Partially; coordinate with integration lead | Advisor usage is separately visible in metadata/metrics and limits are enforced |
| 4. Frontend rendering | Frontend agent | `site/.../tools/Tool.tsx`, new `AdvisorTool.tsx`, stories | Yes after result schema stabilizes | Advisor renders as a readable card and story tests pass |
| 5. Dogfood + QA evidence | QA agent | dev server, Storybook, dogfood output | After backend + UI are usable | Repro videos, screenshots, and a concise QA report exist |

### Parallelization rules

- **Do not split `coderd/x/chatd/chatd.go` across multiple execution agents without an integration lead.** That file owns prompt building, tool registration, model resolution, and cost persistence.
- Workstreams 1 and 2 can be developed in parallel and then stacked onto the integration branch.
- Workstream 4 should begin once the backend result schema is agreed on, even if the backend is still behind a feature flag.
- Any agent that needs to re-check Mux behavior should clone `coder/mux` into a temporary directory (for example, `$(mktemp -d)/mux`) and inspect it read-only; do not vendor or copy code from Mux directly.

## Phase 0 — Preflight and guardrails

### Goals

- Align the team on the smallest shippable architecture.
- Prevent scope creep into MCP/dynamic-tool/sub-agent variants.
- Decide upfront what is MVP vs. follow-up.

### Tasks

1. **Confirm the MVP boundary.**
   - Ship a built-in advisor tool first.
   - Do **not** make MCP, dynamic tools, or sub-agents the primary implementation.
   - Do **not** add transient streaming phases in the first backend PR unless they fall out almost for free.

2. **Confirm local workflow hygiene before coding.**
   - Ensure the repo is using the project git hooks from `scripts/githooks`.
   - Do not bypass hooks with `--no-verify`.
   - Use `./scripts/develop.sh` for the full dev server rather than manual build/run commands.

3. **Lock the model-selection policy.**
   - **Recommended MVP:** advisor uses the same resolved provider/model/cost config as the current chat, with advisor-specific max-output and usage caps.
   - **Follow-up only if required:** add a separate `AdvisorModelConfigID`-style override that resolves through the existing `configCache`/model-config path. Do not invent a new free-form `provider:model` parser if chatd already stores provider/model separately.

4. **Lock the persistence policy.**
   - **Recommended MVP:** no DB migration. Persist advisor-visible metadata in the tool result and record separate metrics in memory/Prometheus.
   - **Only if product/billing explicitly asks for queryable advisor cost:** add a later DB migration or usage table, following the normal `queries/*.sql` + `make gen` workflow.

5. **Create an execution ADR note in the work item or tracking doc.**
   - Capture: built-in tool, tool-less nested call, root-chat-only rollout, exclusive execution policy, MVP no-DB-migration default.

### Quality gate

- Everyone on the team can state the same answers to these questions:
  - Is advisor a built-in tool? **Yes.**
  - Can advisor run with action tools in the same batch? **No.**
  - Does advisor get tools of its own? **No.**
  - Is a DB migration required for MVP? **No, unless billing insists.**

## Phase 1 — Build the advisor runtime and tool wrapper

### Goals

Create the core advisor implementation in a way that is easy to test and keeps `chattool/` thin.

### Files to add

- `coderd/x/chatd/chatadvisor/types.go`
- `coderd/x/chatd/chatadvisor/guidance.go`
- `coderd/x/chatd/chatadvisor/handoff.go`
- `coderd/x/chatd/chatadvisor/runtime.go`
- `coderd/x/chatd/chatadvisor/runner.go`
- `coderd/x/chatd/chattool/advisor.go`

### Responsibilities by file

1. **`types.go`**
   - Define the input/result schema used by the tool and UI.
   - Keep the result shape close to Mux so the UI and model both have predictable cases.
   - Recommended result variants:
     - `advice`
     - `limit_reached`
     - `error`

   Recommended shape:

   ```go
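   // AdvisorArgs is the input decoded from the model's tool call.
   type AdvisorArgs struct {
       Question string `json:"question"`
   }

   // AdvisorResult is the structured tool response shared by the model and the UI.
   type AdvisorResult struct {
       // Type is one of "advice", "limit_reached", or "error".
       Type          string              `json:"type"`
       Advice        string              `json:"advice,omitempty"`
       Error         string              `json:"error,omitempty"`
       AdvisorModel  string              `json:"advisor_model,omitempty"`
       // RemainingUses is omitted from the JSON when it is 0 because of omitempty.
       RemainingUses int                 `json:"remaining_uses,omitempty"`
       // Usage refers to AdvisorUsageResult, which this snippet does not define.
       Usage         *AdvisorUsageResult `json:"usage,omitempty"`
   }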
   ```

2. **`guidance.go`**
   - Hold two strings:
     - the nested advisor system prompt;
     - the parent-agent guidance block to inject into the outer system prompt.
   - The nested advisor prompt must say, in plain language:
     - you are advising the parent agent;
     - you do not address the end user directly;
     - you do not claim actions happened;
     - you return concise strategic guidance and tradeoffs.

3. **`runtime.go`**
   - Define the per-run runtime state.
   - Recommended fields:
     - resolved model + model config;
     - provider keys/options reused from the outer chat;
     - `MaxUsesPerRun`;
     - `MaxOutputTokens`;
     - atomic/current call counter;
     - callback(s) to obtain the current prompt snapshot and current-step snapshot;
     - optional metrics/usage hook.
   - Add fail-fast validation for impossible config: nil model, non-positive limits, empty prompt builders, etc.

4. **`handoff.go`**
   - Build the advisor handoff message from:
     - the explicit question;
     - the exact prompt/messages the parent model just used;
     - the current step's text/reasoning snapshot, if available;
     - the most recent relevant tool outputs, if they are already in the prompt snapshot.
   - **Important:** use the already-prepared outer prompt tail, not a fresh DB reload. That keeps the advisor aligned with compaction and the exact context the outer model saw.
   - Apply hard truncation budgets with recent-context bias.

5. **`runner.go`**
   - Execute the nested advisor call.
   - **Recommended implementation:** call `chatloop.Run()` in an in-memory, one-step mode:
     - `Tools: nil`
     - `ProviderTools: nil`
     - `MaxSteps: 1`
     - `PersistStep`: capture the assistant output in memory instead of writing DB rows
   - Reuse the existing provider/model/cost path instead of building a second provider runner.
   - Assert that no tool definitions are passed to the nested call.

6. **`chattool/advisor.go`**
   - Keep this file thin and consistent with other built-ins.
   - Responsibilities:
     - decode `AdvisorArgs`;
     - validate `Question` is non-empty and bounded;
     - call the `chatadvisor` runner;
     - return a structured tool response.

### Defensive programming requirements

- Assert `Question` is non-empty after trimming.
- Assert runtime limits are positive.
- Assert the nested advisor call runs with zero tools/provider tools.
- Assert `AdvisorResult.Type` is one of the known variants before returning.
- Assert remaining uses never goes negative.
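
A minimal sketch of a few of these assertions, assuming a hypothetical length bound. `maxQuestionLen`, `validateQuestion`, and `validateResult` are illustrative names, not part of the plan; the three result variants are the ones recommended for `types.go`.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// maxQuestionLen is a hypothetical bound; the plan only says "bounded".
const maxQuestionLen = 4000

// validateQuestion trims the question and rejects empty or oversized input.
func validateQuestion(q string) (string, error) {
	q = strings.TrimSpace(q)
	if q == "" {
		return "", errors.New("question must not be empty")
	}
	if len(q) > maxQuestionLen {
		return "", fmt.Errorf("question exceeds %d bytes", maxQuestionLen)
	}
	return q, nil
}

// validateResult checks the result variant and the remaining-use count
// before the tool response is returned.
func validateResult(resultType string, remainingUses int) error {
	switch resultType {
	case "advice", "limit_reached", "error":
	default:
		return fmt.Errorf("unknown advisor result type %q", resultType)
	}
	if remainingUses < 0 {
		return errors.New("remaining uses must not be negative")
	}
	return nil
}

func main() {
	_, err := validateQuestion("   ")
	fmt.Println(err)
	fmt.Println(validateResult("advice", 2))
	fmt.Println(validateResult("maybe", 2))
}
```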

### Acceptance criteria

- A unit test can call the advisor tool with a fake model and receive a stable `advice` result.
- The nested advisor call is impossible to run with tools accidentally attached.
- The core logic lives in `chatadvisor/`, not embedded inside `chatd.go`.

## Phase 2 — Wire advisor into chatd and keep prompt/tool availability in sync

### Goals

Register the tool in the right place, expose it only when eligible, and inject system guidance only when the tool is present.

### Files to modify

- `coderd/x/chatd/chatd.go`
- optionally a small helper file if `chatd.go` becomes too crowded

### Tasks

1. **Compute one eligibility boolean in `processChat()`.**
   Recommended inputs:
   - server-level advisor enabled flag;
   - root chat only (`chat.ParentChatID == uuid.Nil` or equivalent existing root/child check);
   - a usable resolved model/provider exists;
   - optional experiment/workspace/org gate if product wants staged rollout.

2. **Create the runtime once per outer chat run.**
   - Use the model/config/keys resolved by `resolveChatModel()`.
   - Reuse provider options from the current chat's `ChatModelCallConfig`.
   - Set `MaxUsesPerRun` and `MaxOutputTokens` from advisor config defaults.

3. **Register the tool in the built-in tool block.**
   - Insert after the skill tools and before MCP tools in `processChat()`.
   - Record `builtinToolNames["advisor"] = true` so metrics stay bounded.

4. **Inject advisor guidance into the outer system prompt using the same boolean.**
   - Use `chatprompt.InsertSystem()` in the same prompt assembly path that already injects user/system instructions.
   - Place the block near the existing instruction insertion, before plan-path/skill context blocks.
   - Wrap the guidance in an explicit tag like `<advisor-guidance>` so it is easy to spot in tests and future refactors.

5. **Keep advisor out of child chats for the first release.**
   - That avoids recursion/cost blowups with `spawn_agent` / `wait_agent` flows.
   - Document this explicitly in the rollout notes and tests.

### Acceptance criteria

- If advisor is disabled, neither the tool nor the prompt guidance appears.
- If advisor is enabled, both the tool and the prompt guidance appear.
- Root chats can use advisor; child chats cannot.
- Built-in tool names include `advisor` so metrics do not collapse it into the generic `mcp` label.
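
The "one boolean drives both" idea can be sketched as below. The `chatState` struct and the `assemble` helper are hypothetical simplifications of what `processChat()` would do; only the `advisor` tool name and the `<advisor-guidance>` tag come from the plan.

```go
package main

import (
	"fmt"
	"strings"
)

// chatState is a hypothetical summary of the eligibility inputs.
type chatState struct {
	AdvisorEnabled bool // server-level flag
	IsRootChat     bool // child chats are excluded in the first release
	ModelResolved  bool // a usable model/provider exists
}

// advisorEligible is the single source of truth for advisor availability.
func advisorEligible(s chatState) bool {
	return s.AdvisorEnabled && s.IsRootChat && s.ModelResolved
}

// assemble registers the tool and injects guidance from the same boolean,
// so the prompt never mentions a tool that is not registered.
func assemble(s chatState, baseTools []string, basePrompt string) ([]string, string) {
	tools := append([]string(nil), baseTools...)
	prompt := basePrompt
	if advisorEligible(s) {
		tools = append(tools, "advisor")
		prompt += "\n<advisor-guidance>Consult advisor before acting.</advisor-guidance>"
	}
	return tools, prompt
}

func main() {
	root := chatState{AdvisorEnabled: true, IsRootChat: true, ModelResolved: true}
	tools, prompt := assemble(root, []string{"read_file"}, "system")
	fmt.Println(tools, strings.Contains(prompt, "<advisor-guidance>"))

	child := chatState{AdvisorEnabled: true, IsRootChat: false, ModelResolved: true}
	tools, prompt = assemble(child, []string{"read_file"}, "system")
	fmt.Println(tools, strings.Contains(prompt, "<advisor-guidance>"))
}
```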

## Phase 3 — Enforce planning-only execution policy in `chatloop`

### Goals

Prevent the model from calling `advisor` and action tools in the same execution batch.

### Files to modify

- `coderd/x/chatd/chatloop/chatloop.go`
- related chatloop tests

### Recommended implementation

Keep the MVP small; do **not** build a general policy engine yet.

1. Add a minimal field to `chatloop.RunOptions`, for example:

   ```go
   ExclusiveToolName *string
   ```

2. In `Run()` / `executeTools()`, detect the case where the exclusive tool appears in the same local-tool batch as any other locally executed tool.

3. When that happens, synthesize structured tool-result errors for the affected calls instead of executing anything in the batch.
   - `advisor` should receive a clear error like: _advisor must be called by itself before action tools_.
   - The sibling action tools should receive a paired policy error like: _this tool was skipped because advisor must run alone_.

4. Let the outer model see those tool errors and retry cleanly.
   - This is simpler and safer than partial execution or hidden deferral.
   - It preserves deterministic transcript history for debugging.

5. Pass the just-finished step snapshot into the tool execution context.
   - The advisor runtime should be able to see the current step's text/reasoning content, because that is often the best hint about what the outer model is trying to decide.
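
The batch check in steps 2 and 3 might be sketched as follows. `policyErrors` is a hypothetical helper, and it takes a plain string where the plan suggests `*string` (an empty string means no exclusive tool). The two messages are the example texts from step 3.

```go
package main

import "fmt"

const (
	errAdvisorAlone = "advisor must be called by itself before action tools"
	errSkipped      = "this tool was skipped because advisor must run alone"
)

// policyErrors inspects one batch of locally executed tool calls. It returns
// nil when the batch may run, or an error message per call index when the
// exclusive tool is mixed with other tools. Nothing in a rejected batch runs.
func policyErrors(calls []string, exclusive string) map[int]string {
	if exclusive == "" {
		return nil
	}
	hasExclusive, hasOther := false, false
	for _, name := range calls {
		if name == exclusive {
			hasExclusive = true
		} else {
			hasOther = true
		}
	}
	if !hasExclusive || !hasOther {
		return nil
	}
	out := make(map[int]string, len(calls))
	for i, name := range calls {
		if name == exclusive {
			out[i] = errAdvisorAlone
		} else {
			out[i] = errSkipped
		}
	}
	return out
}

func main() {
	fmt.Println(policyErrors([]string{"advisor"}, "advisor"))
	fmt.Println(policyErrors([]string{"advisor", "execute"}, "advisor"))
}
```

Keying the result by call index rather than tool name keeps the outcome deterministic when the same tool appears twice in a batch.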

### Why this is the right fit

- It matches the intended semantics: advisor is consulted **before** taking action.
- It avoids subtle race conditions caused by concurrent built-in tool execution.
- It keeps the behavior easy to test with fake models.

### Acceptance criteria

- A model-emitted batch containing only `advisor` succeeds.
- A model-emitted batch containing `advisor` plus any other locally executed tool returns deterministic policy errors and executes nothing.
- Non-advisor tool execution stays unchanged for normal chats.

## Phase 4 — Usage limits, metrics, and configuration

### Goals

Make advisor safe to operate without over-designing billing/storage in the first release.

### Files to modify

- `coderd/x/chatd/chatd.go` (including the `Config` struct and constructor path)
- `coderd/x/chatd/chatloop/metrics.go` as needed
- optional follow-up config/db files only if a separate advisor model or persistent billing is required

### Tasks

1. **Add explicit server config knobs for MVP.**
   Recommended fields on `chatd.Config` or a nested advisor config struct:
   - `AdvisorEnabled bool`
   - `AdvisorMaxUsesPerRun int`
   - `AdvisorMaxOutputTokens int64`

2. **Track usage per outer run.**
   - Reset the counter for each `processChat()` invocation.
   - Return `remaining_uses` in the tool result.
   - Return `limit_reached` when the cap is exhausted.

3. **Expose advisor usage metadata in the tool result.**
   - Include model name and token/cost summary if available.
   - Use the same `callConfig.Cost` calculation path as the outer chat for MVP if advisor reuses the same model.

4. **Record server-side metrics.**
   - Count advisor invocations, failures, and latency.
   - Ensure they show up under the built-in tool label `advisor`.

5. **Optional decision gate: separate advisor model.**
   - If product insists on a stronger/different advisor model, add a follow-up config hook that resolves another existing chat model config through the same `configCache` path.
   - Keep that out of the first landing PR unless it is required for acceptance.

6. **Optional decision gate: queryable advisor cost.**
   - If this becomes required, spin up a follow-up DB task:
     - update `coderd/database/queries/*.sql`;
     - add migration files;
     - run `make gen`;
     - update audit mappings if a new auditable type/field is introduced.
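
Tasks 1 and 2 could look roughly like the sketch below. The nested `AdvisorConfig` struct is one of the two layouts suggested above. `usageTracker`, `usageResult`, and `acquire` are hypothetical names, and modelling `limit_reached` as a boolean field is an assumption rather than the settled result schema.

```go
package main

import (
	"fmt"
	"sync"
)

// AdvisorConfig groups the recommended MVP knobs in a nested struct.
type AdvisorConfig struct {
	Enabled         bool
	MaxUsesPerRun   int
	MaxOutputTokens int64
}

// usageResult is a hypothetical slice of the tool result payload.
type usageResult struct {
	LimitReached  bool `json:"limit_reached"`
	RemainingUses int  `json:"remaining_uses"`
}

// usageTracker counts advisor calls for a single outer run. Creating a fresh
// tracker per processChat() invocation is what resets the counter.
type usageTracker struct {
	mu   sync.Mutex
	max  int
	used int
}

func newUsageTracker(cfg AdvisorConfig) *usageTracker {
	return &usageTracker{max: cfg.MaxUsesPerRun}
}

// acquire consumes one use if any remain and reports the remaining budget.
func (t *usageTracker) acquire() usageResult {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.used >= t.max {
		return usageResult{LimitReached: true, RemainingUses: 0}
	}
	t.used++
	return usageResult{RemainingUses: t.max - t.used}
}

func main() {
	tracker := newUsageTracker(AdvisorConfig{Enabled: true, MaxUsesPerRun: 2})
	for i := 0; i < 3; i++ {
		fmt.Printf("%+v\n", tracker.acquire())
	}
}
```

Keeping the counter in memory and scoped to the run is what lets the MVP avoid a schema migration.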

### Acceptance criteria

- Advisor calls are capped per outer run.
- Limit exhaustion is user-visible in the tool result.
- Metrics distinguish advisor calls from other built-in tools.
- MVP does not require a schema migration unless explicitly approved.

## Phase 5 — Frontend rendering and Storybook coverage

### Goals

Make advisor feel intentional in the Agents UI without blocking the backend on fancy streaming UI.

### Files to modify

- `site/src/pages/AgentsPage/components/ChatElements/tools/Tool.tsx`
- new `site/src/pages/AgentsPage/components/ChatElements/tools/AdvisorTool.tsx`
- Storybook story file(s) in the same tools directory

### Delivery strategy

1. **Intermediate milestone during backend bring-up:** rely on the existing generic tool renderer if needed.
   - This is acceptable only as a short-lived integration checkpoint.

2. **Release milestone:** add a dedicated lightweight `AdvisorTool` renderer.
   - Reuse existing primitives:
     - `ToolCollapsible`
     - `ToolIcon`
     - `Response` for markdown/prose rendering
     - `ScrollArea` if the advice can be long
   - Keep styling light and consistent with the Agents page.
   - Do not add unnecessary React memoization in `site/src/pages/AgentsPage/`; that area is already React-Compiler aware.

3. **Render the structured result states cleanly.**
   - `advice` — readable prose/markdown with optional metadata footer.
   - `limit_reached` — warning-style message.
   - `error` — error state with visible fallback text.
   - `running` — existing tool loading state/spinner is enough for MVP.

4. **Add Storybook coverage instead of ad-hoc component tests.**
   Recommended stories:
   - successful advice;
   - running/loading;
   - limit reached;
   - error.

5. **Keep the UI contract narrow.**
   - Prefer one text field like `advice` plus small metadata rather than a deeply nested schema.
   - That keeps the UI resilient to prompt iteration.

### Acceptance criteria

- The advisor tool card renders readable content rather than raw quoted JSON in the final release branch.
- Running, limit, and error states are visibly distinct.
- Storybook stories and play assertions cover the new states.
- Existing tool rendering flows remain unchanged.

## Phase 6 — Automated tests and validation gates

### Backend tests to add

1. **Advisor runtime/tool tests**
   - question validation;
   - tool-less nested execution assertion;
   - success result shaping;
   - limit-reached result shaping;
   - error result shaping.

2. **Prompt/gating tests in chatd**
   - advisor disabled ⇒ no tool, no guidance;
   - advisor enabled/root chat ⇒ tool + guidance;
   - child chat ⇒ advisor absent.

3. **Chatloop policy tests**
   - advisor alone runs;
   - advisor + action tool mixed batch returns deterministic policy errors;
   - non-advisor tools still execute normally.

4. **Usage/metrics tests**
   - per-run cap resets correctly;
   - builtin tool labeling includes `advisor`;
   - returned metadata includes model/usage summary when available.

### Frontend tests to add

- Storybook `play()` assertions for the advisor renderer states.
- Verify expand/collapse behavior and visible fallback text.
- Verify the message timeline still renders adjacent tools correctly.

### Recommended command sequence

Run these as the implementation matures, not only at the end:

1. Backend-focused gate after phases 1–4:
   - `make test RUN=TestAdvisor`
   - `make test RUN=TestChatloopAdvisor`
   - `make lint`

2. Frontend-focused gate after phase 5:
   - `pnpm test:storybook src/pages/AgentsPage/components/ChatElements/tools/AdvisorTool.stories.tsx`
   - `pnpm lint`
   - `pnpm format`

3. Final repo gate before handoff:
   - `make pre-commit`
   - run any additional targeted `make test RUN=...` selections covering touched chatd paths

> Use the exact new test names the implementing agents create; the names above are recommended anchors, not existing tests.

## Dogfooding plan

### Principle

Dogfood the change as a real agent feature, not just a unit-tested backend. Per the dogfood and `agent-browser` skills, the reviewer should get **watchable repro videos** plus screenshots that make the behavior obvious without reading logs.

### Required setup

1. Start the full dev environment with:
   - `./scripts/develop.sh`
2. If the frontend renderer changes, also start Storybook from `site/` with:
   - `pnpm storybook --no-open`
3. Use `agent-browser` directly — **never `npx agent-browser`**.
4. Use named browser sessions and an output folder such as:
   - `./dogfood-output/advisor/`
   - with subfolders `screenshots/` and `videos/`

### Evidence protocol

For every interactive scenario below:

1. Start video recording **before** the action.
2. Capture step-by-step screenshots at human pace.
3. Capture one annotated screenshot of the final state.
4. Stop the recording.
5. Note the exact pass/fail observation in the QA report.

For static UI states (for example Storybook error/limit cards), an annotated screenshot is sufficient; video is optional but still encouraged by this project’s review preference.

### Dogfood scenarios

#### Scenario A — Happy path in the real Agents UI

**Goal:** prove that a root agent chat can invoke advisor and produce a readable recommendation before taking further action.

Steps:

1. Open the Agents page with an advisor-enabled root chat.
2. Start a repro video.
3. Send a prompt that should reasonably trigger strategic planning, such as an architecture or multi-tradeoff question.
4. Capture screenshots of:
   - the prompt before send;
   - the running advisor state;
   - the completed advisor card and the assistant’s follow-up response.
5. Stop recording.

Pass criteria:

- advisor appears in the timeline;
- the rendered result is readable;
- the assistant can continue after consuming the advisor output.

#### Scenario B — Advisor unavailable path

**Goal:** prove the feature is truly gated.

Suggested variants (at least one is required, both are better):

- feature flag/config off;
- child/sub-agent chat.

Evidence:

- annotated screenshot of the chat/tool state showing advisor is absent;
- short video if toggling the gate live is part of the repro.

Pass criteria:

- no advisor tool is available;
- no advisor-specific prompt behavior leaks through.

#### Scenario C — UI states in Storybook

**Goal:** prove the renderer handles non-happy states cleanly.

Required story states:

- success/advice;
- running;
- limit reached;
- error.

Evidence:

- one screenshot per state;
- at least one short video showing collapse/expand behavior.

Pass criteria:

- success renders readable advice;
- limit/error have visible fallback text;
- the component behaves like the other tool cards.

#### Scenario D — Regression sweep of nearby tools

**Goal:** ensure advisor does not break the surrounding chat timeline.

Check at minimum:

- another existing built-in tool still renders correctly near advisor;
- sub-agent/tool cards still expand/collapse normally;
- no obvious console errors appear in the Agents page during the advisor flow.

Evidence:

- screenshots of adjacent tool cards;
- console/error capture if anything suspicious appears.

### `agent-browser` usage notes for the QA agent

- Prefer `agent-browser batch` for 2+ sequential commands when no intermediate parsing is needed.
- Use `snapshot -i` to discover interactive refs.
- Re-snapshot after navigation or major DOM changes.
- Avoid `wait --load networkidle` unless the page is known to go idle; prefer explicit element/text waits or short fixed waits.
- Record videos at human pace and include pauses that a reviewer can follow.

## Rollout plan

### Initial rollout

- Gate behind a server-side advisor-enabled flag.
- Enable only for selected internal/root agent chats first.
- Watch metrics for:
  - invocation count;
  - failure rate;
  - latency;
  - obvious retry loops.

### Expansion conditions

Expand beyond the initial rollout only after the following are true:

- mixed-batch policy behavior is stable;
- cost impact is understood;
- frontend UX is readable in production-like dogfood;
- no recursion surprises have appeared with sub-agent flows.

### Explicit non-goals for the first release

- advisor inside child/sub-agent chats;
- provider-agnostic streaming phase UI;
- MCP-based external advisor implementation;
- mandatory DB-backed advisor cost reporting.

## Final acceptance checklist

- [ ] `advisor` is a built-in chatd tool, not an MCP/dynamic-tool substitute.
- [ ] The nested advisor call is tool-less and bounded to one in-memory step.
- [ ] One eligibility boolean controls both tool registration and prompt guidance injection.
- [ ] Root chats can use advisor; child chats cannot in the initial rollout.
- [ ] Mixed advisor/action batches produce deterministic policy errors instead of partial execution.
- [ ] Per-run usage caps and limit-reached behavior work.
- [ ] Advisor usage is visible in metadata/metrics without forcing a DB migration for MVP.
- [ ] The Agents UI has a readable advisor card and Storybook coverage.
- [ ] Dogfooding produced screenshots and repro videos for the required scenarios.
- [ ] Validation commands (`make lint`, targeted `make test`, Storybook tests, `make pre-commit`) passed before handoff.

## Suggested PR split

1. **PR 1 — Backend foundation**
   - `chatadvisor/` package
   - `chattool/advisor.go`
   - `chatloop` exclusive policy
   - chatd gating/prompt sync
   - backend tests

2. **PR 2 — Frontend + QA**
   - advisor renderer
   - stories/play assertions
   - dogfood artifacts and QA notes

3. **PR 3 — Optional follow-ups only if demanded by stakeholders**
   - separate advisor model override
   - persistent advisor billing/queryability
   - transient phase-stream UX


</details>

---
_Generated with [`mux`](https://github.com/coder/mux) • Model: `anthropic:claude-opus-4-7` • Thinking: `max`_
2026-04-30 14:53:08 +02:00
david-fraley 5222db86c7 feat: add after_id pagination for chat messages (#24531) 2026-04-28 08:31:33 -05:00
Michael Suchacz 0211448d09 fix(coderd): sanitize Anthropic provider tool history (#24706)
Anthropic can reject replayed chat histories when a provider-executed
tool call, such as `web_search`, is present without its matching
provider result block.

This sanitizes unpaired Anthropic provider-executed tool calls during
prompt reconstruction, before Anthropic requests, and before persistence
so existing poisoned histories can continue and new malformed turns are
not stored.

Resolves: CODAGT-259

> Mux is acting on Mike's behalf.
2026-04-24 23:57:30 +02:00
Michael Suchacz c7cac9debe fix: persist per-turn model on chats and queued messages (#24688)
Previously, `chats.last_model_config_id` was not updated when a user
sent a mid-chat message with a different model, and queued messages did
not store their own per-turn model, so promotion ran against whatever
the chat row said at promote time. Chat watch events also did not merge
`last_model_config_id` into the site's root, child, and per-chat
caches, so sidebar labels stayed stale after direct sends and queued
promotions.

- Add nullable `chat_queued_messages.model_config_id`, backfilled from
  `chats.last_model_config_id`. Queued inserts round-trip the effective
  model id at enqueue time.
- In `coderd/x/chatd`, direct sends update `chats.last_model_config_id`
  inside the same transaction that inserts the admitted user message.
  Manual promotion and auto-promotion use the queued row's stored
  `model_config_id`, with a fallback to `chats.last_model_config_id`
  for legacy NULL rows during rollout.
  `PromoteQueuedOptions.ModelConfigID` is now ignored.
- On the site, extract `mergeWatchedChatSummary` and
  `mergeWatchedChatIntoCaches` in `site/src/api/queries/chats.ts` so
  status-change watch events merge `last_model_config_id` into the
  root infinite chat list, the parent-embedded child entry, and the
  per-chat `chatKey(chatId)` cache. `updated_at` guards against stale
  watch payloads clobbering newer cached state, while diff status
  events still merge their PR metadata because they are timestamped
  outside the chat row. Watch timestamps are compared as instants so
  variable fractional precision does not make fresh events look stale.
- Queued promotion validates stored model config IDs before admission.
  Invalid legacy queued IDs fall back to the chat's current model config
  instead of dropping the queued message during auto-promotion.
- Backend and frontend regression coverage added for admission, queue
  promotion (including FIFO across mixed models, legacy NULL fallback,
  and invalid queued model IDs), and chat watch cache merging.
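
The promotion fallback described above can be illustrated with a small sketch. It is not the code in `coderd/x/chatd`: `resolveQueuedModelConfig`, `demoValid`, and the string IDs are hypothetical, and a nil pointer stands in for a legacy NULL `model_config_id`.

```go
package main

import "fmt"

// resolveQueuedModelConfig picks the model config for a promoted queued
// message: the queued row's own ID when it is set and still valid, otherwise
// the chat's current last_model_config_id.
func resolveQueuedModelConfig(queuedID *string, chatLastID string, isValid func(string) bool) string {
	if queuedID != nil && isValid(*queuedID) {
		return *queuedID
	}
	return chatLastID
}

// demoValid pretends that only two model configs exist.
func demoValid(id string) bool {
	return id == "model-a" || id == "model-b"
}

func strPtr(s string) *string { return &s }

func main() {
	fmt.Println(resolveQueuedModelConfig(strPtr("model-b"), "model-a", demoValid)) // stored per-turn model
	fmt.Println(resolveQueuedModelConfig(nil, "model-a", demoValid))               // legacy NULL row
	fmt.Println(resolveQueuedModelConfig(strPtr("deleted"), "model-a", demoValid)) // invalid stored ID
}
```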

> Mux is acting on Mike's behalf.
2026-04-24 15:36:08 +02:00
Cian Johnston a876287d36 feat: auto-archive inactive chats with audit trail (#24642)
Adds a background job in `dbpurge` that periodically archives chats
inactive beyond a configurable threshold. Each archived root chat gets a
background audit entry tagged `chat_auto_archive`. Disabled by default.

* New `AutoArchiveInactiveChats` SQL query with LATERAL last-activity
subquery and partial index on archive candidates
* `site_configs`-backed `auto_archive_days` setting with admin-only PUT,
any-authenticated-user GET
* Cascade archive via `root_chat_id`; pinned chats and active threads
exempt
* Root-only audit dispatch on detached context, matching manual archive
(`patchChat`) behavior
* 11 subtests covering disabled no-op, boundary, deleted messages, child
activity, pinned exemption, multi-owner, idempotency, and batch
pagination

PR #24643 adds per-owner digest notifications.
PR #24704 adds the requisite UI controls.

> 🤖
2026-04-24 14:18:28 +01:00
Michael Suchacz 3d90546aae feat: add general subagent model override (#24610)
Adds a deployment-wide admin override for general delegated subagents.

## What changed
- store the general override in `site_configs` and expose it through the
shared `agent-model-override/{context}` API
- apply the general override when spawning delegated general subagents,
while preserving the existing Explore override behavior
- reuse a shared Agents settings form for the general and Explore
override sections

## Validation
- `make gen`
- `go test ./coderd -run 'TestChatModelOverrides'`
- `go test ./coderd/x/chatd -run
'TestSpawnAgent_(GeneralUsesConfiguredModelOverride|GeneralOverrideLogsAndFallsBackWhenCredentialsUnavailable|GeneralOverrideLogsAndFallsBackWhenProviderDisabled)'`
- `pnpm -C site lint:types`
- `pnpm -C site test:storybook --
AgentSettingsAgentsPageView.stories.tsx`
- `make lint`
- `make pre-commit`

> Mux is acting on Mike's behalf.
2026-04-24 12:37:20 +02:00
Cian Johnston c602a31856 fix(coderd): reject pinning child chats in patchChat handler (#24669)
The UI already prevents child (delegated/subagent) chats from being
pinned, but the `PATCH /api/experimental/chats/{chat}` endpoint did not
enforce this. A direct API call could pin a child chat.

- Add a `400 Bad Request` guard in `patchChat` when `pinOrder > 0` and
the chat has a `ParentChatID`
- Add `TestChatPinOrder/RejectsChildChat` test

> 🤖
2026-04-23 18:36:20 +01:00
Cian Johnston b5a625549e feat: migrate agents-access to org-scoped system role for proper chat RBAC (#24438)
The agents-access role previously granted chat permissions at user
scope, but chats are org-scoped objects. Rego skips user-level perms
when org_owner is set, making the grants invisible. Handler-level
band-aids used synthetic non-org-scoped objects as a workaround.

  - Migrates agents-access from users.rbac_roles (site-level) to
    organization_members.roles (org-scoped) via DB migration
  - Redefines agents-access as a predefined org-scoped builtin role
    alongside organization-admin, organization-auditor, etc., with
    Member permissions granting chat create/read/update
  - Excludes ResourceChat from OrgMemberPermissions so org membership
    alone no longer grants chat access
  - Fixes handler Authorize checks to use org-scoped objects with
    semantically correct actions (ActionUpdate for message/tool operations)
  - Grants org admins the ability to assign agents-access

Closes #24250
Fixes CODAGT-174

Note: this does not update the "Usage" endpoints. Tracked by CODAGT-161.
> 🤖
2026-04-23 17:59:42 +01:00
Mathias Fredriksson f8fe5d680b fix(coderd): reject API operations on archived chats (#24633)
Archived chats accept mutations (messages, edits, queued-message
promotions, tool-result submissions) via the API, causing them to
re-enter the processing pipeline. This violates the hard-stop
design intent from PR #23758.

Add archived checks at three layers:

- HTTP handlers (postChatMessages, patchChatMessage,
  promoteChatQueuedMessage, postChatToolResults): return 400
  after auth so callers get a clear error.
- Daemon functions (SendMessage, EditMessage, PromoteQueued,
  SubmitToolResults): return ErrChatArchived after row lock,
  guarding against future callers that bypass the handler.
- AcquireChats SQL: filter out archived chats so they are never
  acquired for processing.

Fixes CODAGT-245
2026-04-23 19:03:33 +03:00
Cian Johnston be1256c418 fix(coderd): fix TestListChats/PinnedOnFirstPage race timeout (#24641)
- Insert filler chats directly into the database with `completed` status
instead of creating them via the API
- Removes the `testutil.Eventually` polling loop that waited for all 52
chats to reach terminal status
- Avoids spawning 52 background chat processors that each time out on
title generation under `-race`, exceeding the 25s `WaitLong` timeout
- Test now completes in ~1s instead of timing out at 30s+

Flake:
https://github.com/coder/coder/actions/runs/24789695935/job/72543519963?pr=24438

> 🤖
2026-04-22 20:37:06 +01:00
Michael Suchacz 9634739aed fix: support Bedrock ambient AWS credentials for Agents providers (#24397)
> This PR was authored by Mux on behalf of Mike.

Adds AWS Bedrock ambient credential support to the Agents provider path.
Bedrock providers can now be saved without a stored API key and
authenticated via the standard AWS SDK credential chain on the Coder
server (IAM roles, `AWS_ACCESS_KEY_ID`, etc.). Also fixes missing `Base
URL` forwarding for Bedrock.

## Changes

**Backend runtime** (`coderd/x/chatd/chatprovider/chatprovider.go`):
- New `ProviderAllowsAmbientCredentials(provider)` helper. Currently
returns true only for Bedrock.
- `ModelFromConfig` no longer errors on an empty API key when the
provider is in the ambient-allowed set AND was explicitly resolved via
`ByProvider`. This preserves the policy gate: unresolvable providers
(disabled central key, user-key-required without a user key) still
error.
- `setResolvedProviderAPIKey` internalizes the ambient-credentials
contract via `ProviderAllowsAmbientCredentials`, so a
resolved-but-keyless Bedrock provider is represented as an empty
`ByProvider` entry rather than a post-hoc sentinel patch in the caller.
- `WithAPIKey` is only appended when a token is present.
- `WithBaseURL(baseURL)` is now forwarded for Bedrock (was previously
missing).
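
The policy gate in `ModelFromConfig` can be illustrated as below. This is a sketch, not the real code: `checkAPIKey` and `errMissingAPIKey` are hypothetical, the provider strings `"bedrock"` and `"openai"` are assumed identifiers, and a plain map stands in for the resolved `ByProvider` set.

```go
package main

import (
	"errors"
	"fmt"
)

var errMissingAPIKey = errors.New("API key is required")

// providerAllowsAmbientCredentials mirrors the helper described above:
// currently only Bedrock qualifies.
func providerAllowsAmbientCredentials(provider string) bool {
	return provider == "bedrock"
}

// checkAPIKey accepts an empty key only when the provider was explicitly
// resolved (present in byProvider) and allows ambient credentials.
func checkAPIKey(provider string, byProvider map[string]string) error {
	key, resolved := byProvider[provider]
	if key != "" {
		return nil
	}
	if resolved && providerAllowsAmbientCredentials(provider) {
		return nil
	}
	return errMissingAPIKey
}

func main() {
	resolved := map[string]string{"bedrock": "", "openai": ""}
	fmt.Println(checkAPIKey("bedrock", resolved))            // keyless Bedrock is allowed
	fmt.Println(checkAPIKey("openai", resolved))             // other providers still need a key
	fmt.Println(checkAPIKey("bedrock", map[string]string{})) // unresolved Bedrock still errors
}
```

Requiring the explicit map entry is what keeps disabled or unresolvable providers failing closed.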

**Backend admin API** (`coderd/exp_chats.go`):
- `validateChatProviderCentralAPIKey` exempts Bedrock from requiring a
stored API key when central credentials are enabled.
- AI Gateway separation (`ChatProviderAPIKeysFromDeploymentValues`) is
unchanged. No silent reuse of `CODER_AIBRIDGE_BEDROCK_*` flags.

**Frontend**
(`site/src/pages/AgentsPage/components/ChatModelAdminPanel/*`):
- API Key field is optional for Bedrock when central credentials are
enabled.
- Bedrock-specific descriptions on API Key and Base URL fields
(bearer-token vs ambient modes, `AWS_REGION` guidance).
- Right-aligned "Clear stored token" action switches an existing Bedrock
provider back to ambient mode.
- `hasEffectiveAPIKey` treats Bedrock with central credentials enabled
as configured, so the provider list shows the correct status icon.
- Three new stories: `ProviderFormBedrockAmbientCredentials`,
`ProviderFormBedrockBearerToken`, `ProviderFormBedrockClearBearerToken`.

**Docs** (`docs/ai-coder/agents/models.md`,
`docs/ai-coder/ai-gateway/setup.md`):
- New "Configuring AWS Bedrock" section covering both credential modes,
region resolution, and the Base URL override.
- Explicit note that the `us-east-1` region fallback only applies to
bearer-token mode; ambient credentials require a region from the
standard AWS SDK chain.
- Cross-reference in AI Gateway docs clarifying that
`CODER_AIBRIDGE_BEDROCK_*` flags are a separate configuration path from
Agents.

## Not in scope

- Reusing AI Gateway Bedrock flags as an implicit Agents fallback.
- Per-provider AWS access key, secret, or region fields (would need a
migration and audit-table review).
- IMDS or network-backed credential probes in admin/listing request
paths.

## Related

Dogfood deployment integration:
https://github.com/coder/dogfood/pull/324
2026-04-22 14:20:23 +02:00
Ethan ad1906589d fix(coderd): allow deleting chat providers used in historical chats (#24568)
Drop the `chat_model_configs.provider -> chat_providers.provider`
foreign key and soft-delete model configs when their provider is
removed. The provider row is now hard-deleted inside a transaction that
also tombstones its model configs and promotes a replacement default
when needed.

Historical chats and messages keep pointing at the soft-deleted model
config rows, which are hidden from live/admin queries but still resolve
for read. The runtime chat path already falls back to the default model
config when a soft-deleted config is looked up.

Replaces the lost FK validation in the create/update model-config
handlers with an explicit provider lookup that returns the existing
`Chat provider is not configured.` 400.

## UX

**Admin deleting a chat provider that has historical usage**

- Before: blocked with 400 `Provider models are still referenced by
existing chats.` Admins had no in-product way to remove a provider that
had ever been used.
- After: delete succeeds (204). Any model configs under that provider
are soft-deleted. If the removed provider owned the default model
config, one of the remaining live configs is auto-promoted to the new
default. The promotion is deterministic (`ensureDefaultChatModelConfig`
picks the first live config by `provider ASC, model ASC, updated_at
DESC, id DESC`); there is no picker, and no toast or response detail
names which config became the new default.
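
The deterministic ordering can be shown in Go, although the real selection presumably happens in SQL. `modelConfig`, `pickDefault`, and `demoConfigs` are hypothetical, the model names are made up, and comparing IDs as plain strings is an assumption about how `id DESC` behaves.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

type modelConfig struct {
	ID        string
	Provider  string
	Model     string
	UpdatedAt time.Time
}

// pickDefault returns the first live config ordered by
// provider ASC, model ASC, updated_at DESC, id DESC.
func pickDefault(configs []modelConfig) (modelConfig, bool) {
	if len(configs) == 0 {
		return modelConfig{}, false
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]modelConfig(nil), configs...)
	sort.Slice(sorted, func(i, j int) bool {
		a, b := sorted[i], sorted[j]
		if a.Provider != b.Provider {
			return a.Provider < b.Provider
		}
		if a.Model != b.Model {
			return a.Model < b.Model
		}
		if !a.UpdatedAt.Equal(b.UpdatedAt) {
			return a.UpdatedAt.After(b.UpdatedAt)
		}
		return a.ID > b.ID
	})
	return sorted[0], true
}

// demoConfigs returns fixed sample data for the demo.
func demoConfigs() []modelConfig {
	return []modelConfig{
		{ID: "c1", Provider: "openai", Model: "gpt", UpdatedAt: time.Unix(300, 0)},
		{ID: "a1", Provider: "anthropic", Model: "sonnet", UpdatedAt: time.Unix(100, 0)},
		{ID: "a2", Provider: "anthropic", Model: "opus", UpdatedAt: time.Unix(100, 0)},
		{ID: "a3", Provider: "anthropic", Model: "opus", UpdatedAt: time.Unix(200, 0)},
	}
}

func main() {
	if cfg, ok := pickDefault(demoConfigs()); ok {
		fmt.Println(cfg.ID) // a3: first provider, first model, most recently updated
	}
}
```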

**End users with chats that used a deleted provider's model**

- Old chats still open and their history still renders unchanged.
- Sending a new turn in such a chat silently falls back to the current
default model. No banner or warning tells the user the original model is
gone.
- The model picker no longer lists the deleted model.
- If no default model config exists at all after the delete, sending a
new turn fails with `no default chat model config is available`.

**Admin creating or updating a model config against a provider that is
not configured**

- Same as before: 400 `Chat provider is not configured.` Only the
detection mechanism changed (explicit `FOR UPDATE` lookup inside the
transaction, which also serializes against a concurrent provider
delete).

**Admin updating a model config whose row disappears mid-transaction**

- Now returns the standard 404 `Resource not found or you do not have
access to this resource` instead of the previous 500 that leaked `sql:
no rows in result set` in the detail. Unrelated internal races (for
example a race on the promoted default candidate) are still reported as
500 so they are not misclassified as "your target is gone".

Closes CODAGT-23
2026-04-22 19:34:34 +10:00
Cian Johnston 360e119b43 fix(coderd): use waitChatSettled in remaining title tests (#24585)
- Replace inline `require.Eventually` blocks in `PreservesUpdatedAt` and
`NoOpWhenTitleUnchanged` with the shared `waitChatSettled` helper
- These were the last two title subtests still using direct DB polling
instead of the API-based helper

> 🤖
2026-04-22 09:14:25 +01:00
Jaayden Halko 148e56b5d9 fix(coderd): fix TestPatchChat/Title flake by waiting for chat to settle (#24572)
## Problem

`TestPatchChat/Title/Rename` and `TestPatchChat/Title/TrimsWhitespace`
fail intermittently on `test-go-pg` with:

```
PATCH .../api/experimental/chats/<id>: unexpected status code 409:
Title regeneration already in progress for this chat.
```

`createChat` persists a chat with `ChatStatusPending` and signals the
daemon wake loop. If the `UpdateChat` PATCH arrives before the daemon
transitions the chat past `Pending`/`Running`, the handler's
`acquireManualTitleLock` returns a 409. Whether the PATCH wins the race
is timing-dependent under PG + `-parallel` load.

Sibling subtests `PreservesUpdatedAt` and `NoOpWhenTitleUnchanged`
already wait for the chat to leave `Pending`/`Running` before renaming,
which is why they do not flake.

## Fix

Add a `waitChatSettled` helper closure in `TestPatchChat` that polls
`client.GetChat` until the chat status leaves `Pending`/`Running`.
Call it in the 4 subtests that issue a valid rename immediately after
`createChat`:

- `Title/Rename` (originally reported flake)
- `Title/TrimsWhitespace` (originally reported flake)
- `Title/LengthBoundaries` (latent flake in valid-rename cases)
- `Title/PublishesWatchEvent` (latent flake, goroutine silently 409s)

No handler, daemon, or SDK changes. The 409 is intentional production
behavior; this is a pure test-side timing fix.
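
A rough, standalone sketch of the polling idea follows. The real helper is a closure inside `TestPatchChat` that calls `client.GetChat`; here the status lookup is injected as a function so the example runs on its own, and `chatStatus`, its constant values, and `demoSettle` are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

type chatStatus string

const (
	statusPending   chatStatus = "pending"
	statusRunning   chatStatus = "running"
	statusCompleted chatStatus = "completed"
)

// waitChatSettled polls getStatus until the chat leaves Pending/Running,
// the lookup fails, or the context expires.
func waitChatSettled(ctx context.Context, interval time.Duration, getStatus func(context.Context) (chatStatus, error)) (chatStatus, error) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		status, err := getStatus(ctx)
		if err != nil {
			return "", err
		}
		if status != statusPending && status != statusRunning {
			return status, nil
		}
		select {
		case <-ctx.Done():
			return "", ctx.Err()
		case <-ticker.C:
		}
	}
}

// demoSettle runs the helper against a fake that settles on the third poll.
func demoSettle() (chatStatus, int, error) {
	sequence := []chatStatus{statusPending, statusRunning, statusCompleted}
	polls := 0
	getStatus := func(context.Context) (chatStatus, error) {
		status := sequence[polls]
		polls++
		return status, nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	status, err := waitChatSettled(ctx, time.Millisecond, getStatus)
	return status, polls, err
}

func main() {
	status, polls, err := demoSettle()
	fmt.Println(status, polls, err)
}
```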

Refs coder/internal#1480
2026-04-21 17:10:00 +01:00
Cian Johnston 4d45b69b03 fix: stop tracking chat title in audit logs (#24564)
Chat titles can contain sensitive information (secrets, internal project
names, etc.) and should not be visible in audit logs.

- Use truncated chat UUID (first 8 chars) as `resource_target` instead
of the title
- Mark the `title` field as `ActionSecret` so diffs render as `••••••••`

<details><summary>Implementation notes</summary>

Two changes:
1. `coderd/audit/request.go`: `ResourceTarget` for Chat returns
`typed.ID.String()[:8]` instead of `typed.Title`
2. `enterprise/audit/table.go`: Chat `title` field tracking changed from
`ActionTrack` to `ActionSecret`

No frontend changes needed. The frontend already handles `secret: true`
fields.

</details>

> 🤖
2026-04-21 14:26:22 +01:00
Cian Johnston c968a1f3a3 feat: make database.Chat auditable (#24485)
Wire database.Chat into the audit system so chat lifecycle events
(creation, patches, etc.) produce audit log entries.

Part of CODAGT-200.

> 🤖
2026-04-21 11:11:56 +01:00
Jaayden Halko 410f9a5e19 feat: allow renaming of agent chat title (#24489)
Co-authored-by: Coder Agents <noreply@coder.com>
2026-04-20 14:00:46 +01:00
Thomas Kosiewski 18a30a7a10 feat: add chat debug HTTP handlers and API docs (#23918) 2026-04-20 13:34:41 +02:00
Dean Sheather ea00d2d396 fix(coderd): enforce workspace authz on watchChatGit (#24477)
`watchChatGit` proxies a live websocket to the workspace agent's git
watcher (`/api/v0/git/watch`), streaming repository diffs back through
the chat stream. Before this change it only enforced `chat:read` (via
`ExtractChatParam`) plus an implicit `workspace:read` from the dbauthz
wrapper on `GetWorkspaceAgentsInLatestBuildByWorkspaceID`. The sibling
`watchChatDesktop` handler already fetches the workspace and requires
`policy.ActionApplicationConnect` or `policy.ActionSSH` before dialing.

Built-in roles like **Template Admin** and **Org Admin** grant
`workspace:read` without SSH/ApplicationConnect, and **Owner** also
loses both under `DisableOwnerWorkspaceExec`. A chat owner whose
exec-level workspace access was revoked *after* the chat was bound could
therefore keep streaming repository content from the workspace agent
through the chat's git-watch endpoint.

Mirror `watchChatDesktop`: fetch the workspace and require
`ApplicationConnect || SSH` before any agent-tunnel activity. Adds one
real-coderdtest regression test (`TestWatchChatGitAuthz`) that demotes
the chat's owner to template-admin after binding and asserts the
git-watch endpoint returns 403; the mock-based `TestWatchChatGit` in
`coderd/workspaceagents_internal_test.go` continues to cover the
no-workspace / disconnected-agent / websocket-proxy paths.
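
The access rule reduces to a small predicate, sketched here with hypothetical names. The `action` type, its string values, and `roleAuthorizer` are illustrative only; the real handler checks `policy.ActionApplicationConnect` and `policy.ActionSSH` against the fetched workspace.

```go
package main

import "fmt"

type action string

const (
	actionRead               action = "read"
	actionSSH                action = "ssh"
	actionApplicationConnect action = "application_connect"
)

// canWatchChatGit requires exec-level access: ApplicationConnect or SSH.
// Read access alone is not sufficient.
func canWatchChatGit(authorize func(action) bool) bool {
	return authorize(actionApplicationConnect) || authorize(actionSSH)
}

// roleAuthorizer builds a fake authorizer from a fixed set of granted actions.
func roleAuthorizer(granted ...action) func(action) bool {
	set := make(map[action]bool, len(granted))
	for _, a := range granted {
		set[a] = true
	}
	return func(a action) bool { return set[a] }
}

func main() {
	templateAdmin := roleAuthorizer(actionRead)
	workspaceOwner := roleAuthorizer(actionRead, actionSSH, actionApplicationConnect)
	fmt.Println(canWatchChatGit(templateAdmin))  // false: would map to 403
	fmt.Println(canWatchChatGit(workspaceOwner)) // true
}
```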

Fixes CODAGT-184.
2026-04-20 21:33:35 +10:00
Mathias Fredriksson 467430d8fa fix: sort child chats newest-first and prepend on creation (#24524)
GetChildChatsByParentIDs sorted created_at ASC, but the cache
helper appended new children to the end. On refetch the API and
cache agreed on oldest-first, putting the just-created child at
the bottom. Users expect newest first, matching the root-chat
sidebar convention.

- SQL: change child sort to created_at DESC, id DESC.
- Cache: prepend instead of append in addChildToParentInCache
  (renamed from appendChildToParentInCache to avoid leaking
  position semantics).
- Test: update ordering assertion to expect newest-first.

Refs #24404
2026-04-20 10:43:31 +00:00