Commit Graph

773 Commits

Author SHA1 Message Date
Kyle Carberry 483adc59fe feat: replace InsertChatMessage with batch InsertChatMessages (#23220)
Replaces the singular `InsertChatMessage` query with
`InsertChatMessages` that uses PostgreSQL's `unnest()` for batch
inserts. This reduces the number of database round-trips when inserting
multiple messages in a single transaction.

## Changes

- **SQL**: New `InsertChatMessages :many` query using `unnest()` arrays
following the existing codebase pattern (e.g.,
`InsertWorkspaceAgentStats`). Preserves the CTE that updates
`chats.last_model_config_id` using the last non-null model config from
the batch. Uses `NULLIF` for UUID columns to handle NULL foreign keys.
- **Go layers**: Updated `querier.go`, `dbauthz.go`,
`dbmetrics/querymetrics.go`, `dbmock/dbmock.go`, and `queries.sql.go` to
use the new batch signature (`[]ChatMessage` return type, array params).
- **chatd.go**: All call sites converted to batch inserts:
  - **CreateChat**: System prompt + user message batched into one call
- **persistStep**: Assistant message + tool messages batched into one
call
- **persistSummary**: Hidden summary + assistant + tool messages batched
into one call
  - Single-message sites use the same API with single-element arrays
- **Helper**: New `appendChatMessage` function simplifies building batch
params at each call site.
- **Tests**: All test files updated to use the new API.

Builds on top of #23213.
2026-03-18 16:27:07 +00:00
Kyle Carberry d6fef96d72 feat: add PR insights analytics dashboard (#23215)
## What

Adds a new admin-only **PR Insights** page for the `/agents` analytics
view — a dashboard for engineering leaders to understand code shipped by
AI agents.

### Backend
- `GET /api/v2/chats/insights/pull-requests` — admin-only endpoint
- 4 SQL queries in `chatinsights.sql` aggregating `chat_diff_statuses`
joined with chat cost data (via root chat tree rollup)
- Runs 5 parallel DB queries: current summary, previous summary (for
trends), time series, per-model breakdown, recent PRs
- SDK types auto-generate to TypeScript

### Frontend (`PRInsightsView`)
- **Stat cards**: PRs created, Merged, Merge rate, Lines shipped,
Cost/merged PR — with trend badges comparing to previous period
- **Activity chart**: Stacked area chart (created/merged/closed) using
git color tokens (`git-added-bright`, `git-merged-bright`,
`git-deleted-bright`)
- **Model performance table**: Per-model PR counts, inline merge rate
bars, diff stats, cost breakdown
- **Recent PRs table**: Status badges, review state icons, author info,
external links
- **Time range filter**: 7d/14d/30d/90d button group
- **4 Storybook stories**: Default, HighPerformance, LowVolume, NoPRs

### Data source
All PR data comes from the existing `chat_diff_statuses` table
(populated by the `gitsync.Worker` background job that polls GitHub
every 120s). No new data collection required.

### Screenshot
View in Storybook: `pages/AgentsPage/PRInsightsView`
2026-03-18 15:29:29 +00:00
Kyle Carberry 4dd8531f37 feat: track step runtime_ms on chat messages (#23219)
## Summary

Adds a `runtime_ms` column to `chat_messages` that records the
wall-clock duration (in milliseconds) of each LLM step. This covers LLM
streaming, tool execution, and retries — the full time the agent is
"alive" for a step.

This is the foundation for billing by agent alive time. The column
follows the same pattern as `total_cost_micros`: stored per assistant
message, aggregatable with `SUM()` over time periods by user.

## Changes

- **Migration**: adds nullable `runtime_ms bigint` to `chat_messages`.
- **chatloop**: adds `Runtime time.Duration` field to `PersistedStep`,
measures `time.Since(stepStart)` at the beginning of each step (covering
stream + tool execution + retries).
- **chatd**: passes `step.Runtime.Milliseconds()` to the assistant
message `InsertChatMessage` call; all other message types (system, user,
tool) get `NULL`.
- **Tests**: adds `runtime > 0` assertion in chatloop tests.

## Billing query pattern

Once ready, aggregation mirrors the existing cost queries:

```sql
SELECT COALESCE(SUM(cm.runtime_ms), 0)::bigint AS total_runtime_ms
FROM chat_messages cm
JOIN chats c ON c.id = cm.chat_id
WHERE c.owner_id = @user_id
  AND cm.created_at >= @start_time
  AND cm.created_at < @end_time
  AND cm.runtime_ms IS NOT NULL;
```
2026-03-18 10:57:35 -04:00
Steven Masley 84de391f26 chore: add tallyman events for ai seat tracking (#22689)
AI seat tracking inserted as heartbeat into usage table.
2026-03-18 09:30:22 -05:00
Hugo Dutka 2cf47ec384 feat: virtual desktop settings toggle backend (#23171)
Adds a new `site_config` entry that controls whether the virtual desktop
feature for Coder Agents is enabled. It can be set via a new
`/api/experimental/chats/config/desktop-enabled` endpoint, which will be
used by the frontend.
2026-03-18 09:35:13 +01:00
George K 91ec0f1484 feat: add service_accounts workspace sharing mode (#23093)
Introduce a three-way workspace sharing setting (none, everyone,
service_accounts) replacing the boolean workspace_sharing_disabled.
In service_accounts mode, only service account-owned workspaces can be
shared while regular members' share permissions are removed. Adds a
new organization-service-account system role with per-org permissions
reconciled alongside the existing organization-member system role.

Related to:
https://linear.app/codercom/issue/PLAT-28/feat-service-accounts-sharing-mode-and-rbac-role

---------

Co-authored-by: Steven Masley <Emyrk@users.noreply.github.com>
Co-authored-by: Kayla はな <mckayla@hey.com>
2026-03-17 12:16:43 -07:00
Kyle Carberry b779c9ee33 fix: use SQL-level auth filtering for chat listing (#23159)
## Problem

The chat listing endpoint (`GetChatsByOwnerID`) was using
`fetchWithPostFilter`, which fetches N rows from the database and then
filters them in Go memory using RBAC checks. This causes a pagination
bug: if the user requests `limit=25` but some rows fail the auth check,
fewer than 25 rows are returned even though more authorized rows exist
in the database. The client may incorrectly assume it has reached the
end of the list.

## Solution

Switch to the same pattern used by `GetWorkspaces`, `GetTemplates`, and
`GetUsers`: `prepareSQLFilter` + `GetAuthorized*` variant. The RBAC
filter is compiled to a SQL WHERE clause and injected into the query
before `ORDER BY`/`LIMIT`, so the database returns exactly the requested
number of authorized rows.

Additionally, `GetChatsByOwnerID` is renamed to `GetChats` with
`OwnerID` as an optional (nullable) filter parameter, matching the
`GetWorkspaces` naming convention.

## Changes

| File | Change |
|------|--------|
| `queries/chats.sql` | Renamed to `GetChats`, `owner_id` now optional
via CASE/NULL, added `-- @authorize_filter` |
| `queries.sql.go` | Renamed constant, params struct (`GetChatsParams`),
and method |
| `querier.go` | Interface method renamed |
| `modelqueries.go` | Added `chatQuerier` interface +
`GetAuthorizedChats` impl |
| `dbauthz/dbauthz.go` | `GetChats` now uses `prepareSQLFilter` instead
of `fetchWithPostFilter` |
| `dbauthz/dbauthz_test.go` | Updated tests for SQL filter pattern |
| `dbmock/dbmock.go` | Renamed + added mock for `GetAuthorizedChats` |
| `dbmetrics/querymetrics.go` | Renamed + added metrics wrapper |
| `rbac/regosql/configs.go` | Added `ChatConverter` (maps `org_owner` to
empty string literal since `chats` has no `organization_id` column) |
| `rbac/authz.go` | Added `ConfigChats()` |
| `chats.go` | Handler uses renamed method with `uuid.NullUUID` |
| `searchquery/search.go` | Updated return type |
| `gitsync/worker.go` | Updated interface and call site |
| Various test files | Updated for renamed types |
2026-03-17 12:46:24 -04:00
Danny Kopping 365de3e367 feat: record model thoughts (#22676)
Depends on https://github.com/coder/aibridge/pull/203
Closes https://github.com/coder/internal/issues/1337

---------

Signed-off-by: Danny Kopping <danny@coder.com>
2026-03-17 11:41:10 +00:00
Michael Suchacz 1031da9738 feat: add agent chat spend limiting (backend) (#23071)
Introduces deployment-scoped spend limiting for Coder Agents, enabling
administrators to control LLM costs at global, group, and individual
user levels.

## Changes

- **Database migration (000437)**: `chat_usage_limit_config`
(singleton), `chat_usage_limit_overrides` (per-user),
`chat_usage_limit_group_overrides` (per-group)
- **Single-query limit resolution**: individual override > min(group) >
global default via `ResolveUserChatSpendLimit`
- **Fail-open enforcement** in chatd with documented TOCTOU trade-off
- **Experimental API** under `/api/experimental/chats/usage-limits` for
CRUD on limits
- **`AsChatd` RBAC subject** for narrowly-scoped daemon access (replaces
`AsSystemRestricted`)
- **Generated TypeScript types** for the frontend SDK

## Hierarchy

1. Individual user override (highest)
2. Minimum of group limits
3. Global default
4. Disabled / unlimited

Currency stored as micro-dollars (`1,000,000` = $1.00).

Frontend PR: #23072
2026-03-17 01:24:03 +01:00
Steven Masley 93b9d70a9b chore: add audit log entry when ai seat is consumed (#22683)
When an ai seat is consumed, an audit log entry is made. This only happens the first time a seat is used.
2026-03-16 15:30:25 -05:00
Steven Masley cabb611fd9 chore: implement database crud for AI seat usage (#22681)
Creates a new table `ai_seat_state` to keep track of when users consume an ai_seat. Once a user consumes an AI seat, they will forever in this table (as it stands today).
2026-03-16 11:53:20 -05:00
Kyle Carberry 741af057dc feat: paginate chat messages endpoint with cursor-based infinite scroll (#23083)
Adds cursor-based pagination to the chat messages endpoint.

## Backend

- New `GetChatMessagesByChatIDPaginated` SQL query: returns messages in
`id DESC` order with a `before_id` keyset cursor and configurable
`limit`
- Handler parses `?before_id=N&limit=N` query params, uses the `LIMIT
N+1` trick to set `has_more` without a separate COUNT query
- Queued messages only returned on the first page (no cursor) since
they're always the most recent
- SDK client updated with `ChatMessagesPaginationOptions`
- Fully backward compatible: omitting params returns the 50 newest
messages

## Frontend

- Switches `getChatMessages` from `useQuery` to `useInfiniteQuery` with
cursor chaining via `getNextPageParam`
- Pages flattened and sorted by `id` ascending for chronological display
- `MessagesPaginationSentinel` component uses `IntersectionObserver`
(200px rootMargin prefetch) inside the existing `flex-col-reverse`
scroll container
- `flex-col-reverse` handles scroll anchoring natively when older
messages are prepended — no manual `scrollTop` adjustment needed (same
pattern as coder/blink)

## Why cursor-based instead of offset/limit

Offset-based pagination breaks when new messages arrive while paginating
backward (offsets shift, causing duplicates or missed messages). The
`before_id` cursor is stable regardless of inserts — each page is
deterministic.
2026-03-16 16:40:59 +00:00
Ethan c4db03f11a perf(coderd/database): skip redundant chat row update in InsertChatMessage (#23111)
## Summary

- add an `IS DISTINCT FROM` guard to `InsertChatMessage`'s
`updated_chat` CTE so `chats.last_model_config_id` is only rewritten
when the incoming `model_config_id` actually changes
- regenerate the query layer
- add focused regression coverage for the two meaningful behaviors:
same-model inserts and real model switches
- trim redundant message-field assertions so the new test stays focused
on the guard behavior

## Proof this is an improvement

This PR reduces work in the hottest chat write query without changing
the insert behavior.

### Why the old query did unnecessary work

Before this change, `InsertChatMessage` always ran this update whenever
`model_config_id` was non-null:

```sql
UPDATE chats
SET last_model_config_id = sqlc.narg('model_config_id')::uuid
WHERE id = @chat_id::uuid
  AND sqlc.narg('model_config_id')::uuid IS NOT NULL
```

That means the query rewrote the `chats` row even when
`chats.last_model_config_id` was already equal to the incoming value.

### What changes in this PR

This PR adds:

```sql
AND chats.last_model_config_id IS DISTINCT FROM sqlc.narg('model_config_id')::uuid
```

So same-model inserts still insert the message, but they no longer
perform a redundant `UPDATE chats`.

### Why this matters on the hot path

From the chat scaletest investigation that motivated this change:

- `InsertChatMessage` (+ `updated_chat` CTE) was the hottest write query
- about **104k calls**
- about **0.69 ms average latency**
- about **71.8 s total DB execution time**

We also verified common callsites where the update is provably
redundant:

- `CreateChat` inserts the chat with `LastModelConfigID =
opts.ModelConfigID`, then immediately inserts initial system/user
messages with that same model config
- follow-up user messages commonly pass `lockedChat.LastModelConfigID`
straight into `InsertChatMessage`
- assistant/tool/summary persistence keeps the current model in the
common case; only real switches or fallback cases need the chat row
update

That means a meaningful fraction of executions of the hottest DB write
query move from:

- **before:** insert message **+** rewrite chat row
- **after:** insert message only

This should reduce row churn and write contention on `chats`, especially
against other chat-row writers like `UpdateChatStatus` and
`GetChatByIDForUpdate`.
2026-03-17 00:44:10 +11:00
Kyle Carberry 0d3e39a24e feat: add head_branch to pull request diff status (#23076)
Adds the `head_branch` field (the source/feature branch name of a PR) to
the diff status pipeline. Previously only `base_branch` (target branch)
and the head commit SHA were captured from the GitHub API, but not the
head branch name itself.

## Changes

- **Migration 438**: Add `head_branch` nullable TEXT column to
`chat_diff_statuses`
- **gitprovider**: Parse `head.ref` from the GitHub API response
(alongside `head.sha`) and add `HeadBranch` to `PRStatus`
- **gitsync**: Wire `HeadBranch` through `refreshOne()` into the DB
upsert params
- **worker**: Map `HeadBranch` in `chatDiffStatusFromRow()`
- **coderd**: Convert `HeadBranch` in `convertChatDiffStatus()`
- **codersdk**: Expose as `head_branch` (`*string`, omitempty) in
`ChatDiffStatus` API response
- **Tests**: Updated `github_test.go` pull JSON fixtures and assertions
2026-03-14 17:24:19 +00:00
Michael Suchacz 969066b55e feat(site): improve cost analytics view (#23069)
Surfaces cache token data in the analytics views and fixes table
spacing.

### Changes

- **Cache token columns**: Added cache read and cache write token counts
to all analytics views (user and admin), from SQL queries through Go SDK
types to the frontend tables and summary cards.
- **Table spacing fix**: Replaced the bare React fragment in
`ChatCostSummaryView` with a `space-y-6` container so the model and chat
breakdown tables no longer overlap.

### Data flow

`chat_messages` table already stores `cache_read_tokens` and
`cache_creation_tokens` (and uses them for cost calculation). This PR
aggregates and displays them alongside input/output tokens in:

- Summary cards (6 cards: Total Cost, Input, Output, Cache Read, Cache
Write, Messages)
- Per-model breakdown table
- Per-chat breakdown table
- Admin per-user table
2026-03-14 01:22:00 -05:00
Kyle Carberry c5b8611c5a feat(gitsync): enrich PR status with author, base branch, review info (#23038)
## Summary

Adds 7 new fields to the PR status stored by gitsync, all sourced from
the existing GitHub API calls (**zero additional HTTP requests**):

| Field | Source | Purpose |
|---|---|---|
| `author_login` | `pull.user.login` | PR author username |
| `author_avatar_url` | `pull.user.avatar_url` | PR author avatar for UI
|
| `base_branch` | `pull.base.ref` | Target branch (e.g. `main`) |
| `pr_number` | `pull.number` | Explicit PR number |
| `commits` | `pull.commits` | Number of commits in PR |
| `approved` | Derived from reviews | True when ≥1 approved, no
outstanding changes requested |
| `reviewer_count` | Derived from reviews | Distinct reviewers with a
decisive state |

## Changes

- **`gitprovider/gitprovider.go`**: Added 7 fields to `PRStatus` struct.
- **`gitprovider/github.go`**: Expanded the anonymous struct in
`FetchPullRequestStatus` to decode new JSON fields. Replaced
`hasOutstandingChangesRequested()` with `summarizeReviews()` returning a
`reviewStats` struct with `changesRequested`, `approved`, and
`reviewerCount`.
- **Migration 000434**: Adds 7 columns to `chat_diff_statuses`.
- **`queries/chats.sql`**: Updated `UpsertChatDiffStatus`
INSERT/VALUES/ON CONFLICT.
- **`gitsync/gitsync.go`**: Maps new `PRStatus` fields into upsert
params.
- **`gitsync/worker.go`**: Maps new columns in row-to-model converter.
- **`codersdk/chats.go`**: Added fields to SDK `ChatDiffStatus` type.
- **`coderd/chats.go`**: Maps new DB fields in
`convertChatDiffStatus()`.
- Auto-generated: `models.go`, `queries.sql.go`, `dump.sql`,
`typesGenerated.ts`.
2026-03-13 18:54:07 -04:00
Hugo Dutka 84527390c6 feat: chat desktop backend (#23005)
Implement the backend for the desktop feature for agents.

- Adds a new `/api/experimental/chats/$id/desktop` endpoint to coderd
which exposes a VNC stream from a
[portabledesktop](https://github.com/coder/portabledesktop) process
running inside the workspace
- Adds a new `spawn_computer_use_agent` tool to chatd, which spawns a
subagent that has access to the `computer` tool which lets it interact
with the `portabledesktop` process running inside the workspace
- Adds the plumbing to make the above possible

There's a follow up frontend PR here:
https://github.com/coder/coder/pull/23006
2026-03-13 19:49:34 +01:00
Michael Suchacz c3b6284955 feat: add chat cost analytics backend (#23036)
Add cost tracking for LLM chat interactions with microdollar precision.

## Changes
- Add `chatcost` package for per-message cost calculation using
`shopspring/decimal` for intermediate arithmetic
- **Ceil rounding policy**: fractional micros round UP to next whole
micro (applied once after summing all components)
- Database migration: `total_cost_micros` BIGINT column with historical
backfill and `created_at` index
- API endpoints: per-user cost summary and admin rollup under
`/api/experimental/chats/cost/`
- SDK types: `ChatCostSummary`, `ChatCostModelBreakdown`,
`ChatCostUserRollup`
- Fix `modeloptionsgen` to handle `decimal.Decimal` as opaque numeric
type
- Update frontend pricing test fixtures for string decimal types

## Design decisions
- `NULL` = unpriced (no matching model config), `0` = free
- Reasoning tokens included in output tokens (no double-counting)
- Integer microdollars (BIGINT) for storage and API responses
- Price config uses `decimal.Decimal` for exact parsing; totals use
`int64`

Frontend: #23037
2026-03-13 18:30:49 +01:00
Mathias Fredriksson 4a79af1a0d refactor: add chat_message_role enum and content_version column (#23042)
Migration 000434 converts chat_messages.role from text to a Postgres
enum, rebuilds the partial index, and adds content_version smallint.
The column is backfilled with DEFAULT 0, then the default is dropped
so future inserts must set it explicitly.

Version 0 uses the role-aware heuristic from #22958. Version 1 (all
new inserts) stores []ChatMessagePart JSON for all roles, including
system messages. ParseContent takes database.ChatMessage directly
and dispatches on version internally. Unknown versions error.

All string(codersdk.ChatMessageRole*) casts at DB write sites are
replaced with database.ChatMessageRole* constants from sqlc.

Refs #22958
2026-03-13 16:47:36 +00:00
Cian Johnston e9025f91e8 chore(db): remove 23 unused database methods (#22999)
Removes 22 database query methods with no callers outside generated code
and the dbauthz wrapper layer (~1,600 lines).

**Security keys (6)** — superseded by `cryptokeys` package:
`GetAppSecurityKey`, `UpsertAppSecurityKey`, `GetOAuthSigningKey`,
`UpsertOAuthSigningKey`, `GetCoordinatorResumeTokenSigningKey`,
`UpsertCoordinatorResumeTokenSigningKey`

**Superseded queries (4):**
- `GetProvisionerJobsByIDs` → `GetProvisionerJobsByIDsWithQueuePosition`
- `GetDeploymentDAUs` / `GetTemplateDAUs` →
`GetTemplateInsightsByInterval`
- `GetWorkspaceBuildParametersByBuildIDs` + its `GetAuthorized...`
variant → unused

**OAuth2 (2):**
`GetOAuth2ProviderAppByRegistrationToken`,
`UpdateOAuth2ProviderAppSecretByID`

**Chat (4)** — pre-wired with no callers:
`GetChatModelConfigByProviderAndModel`, `DeleteChatMessagesByChatID`,
`ListChatsByRootID`, `ListChildChatsByParentID`

**Other (6):**
`DeleteGitSSHKey`, `UpdateUserLinkedID`, `GetFileIDByTemplateVersionID`,
`GetTemplateVersionHasAITask`, `InsertUserGroupsByName`,
`RemoveUserFromAllGroups`
2026-03-12 21:32:57 +00:00
Kyle Carberry 1f37df4db3 perf(chatd): fix six scale bottlenecks identified by benchmarking (#22957)
## Summary

Scale-tested the `chatd` package with mock-based benchmarks to identify
performance bottlenecks. This PR fixes 6 of the 8 identified issues,
ranked by severity.

## Changes

### 1. Parallel tool execution (HIGH) — `chatloop.go`
`executeTools` ran tool calls sequentially. Now dispatches all calls
concurrently via goroutines with `sync.WaitGroup`. Results are
pre-allocated by index (no mutex needed). `onResult` callbacks fire as
each tool completes.

### 2. Pubsub-backed subagent await (HIGH) — `subagent.go`
`awaitSubagentCompletion` polled the DB every 200ms. Now subscribes to
the child chat's `ChatStreamNotifyChannel` via pubsub for near-instant
notifications. Fallback poll reduced to 5s. Falls back to 200ms only
when `pubsub == nil` (single-instance / in-memory).

### 3. Per-chat stream locking (MEDIUM) — `chatd.go`
Replaced single global `streamMu` + `map[uuid.UUID]*chatStreamState`
with `sync.Map` where each `chatStreamState` has its own `sync.Mutex`.
Zero cross-chat contention.

### 4. Batch chat acquisition (MEDIUM) — `chatd.go`
`processOnce` acquired 1 chat per tick. Now loops up to
`maxChatsPerAcquire = 10` per tick, avoiding idle time when many chats
are pending.

### 5. Reduced heartbeat frequency (LOW-MEDIUM) — `chatd.go`
`chatHeartbeatInterval` changed from 30s to 60s. Safe given the 5-minute
`DefaultInFlightChatStaleAfter`.

### 6. O(depth) descendant check (LOW) — `subagent.go`
Replaced top-down BFS (`O(total_descendants)` queries) with bottom-up
parent-chain walk (`O(depth)` queries). Includes cycle protection.

## Not addressed (intentionally)
- Message serialization overhead
- Buffer eviction (`buffer[1:]` pattern)
2026-03-11 14:00:08 -04:00
George K e5c19d0af4 feat: backend support for creating and storing service accounts (#22698)
Add is_service_account column to users table with CHECK constraints
enforcing login_type='none' and empty email for service accounts.
Update user creation API to validate service account constraints.

Related to:
https://linear.app/codercom/issue/PLAT-27/feat-backend-support-for-creating-and-storing-service-accounts
2026-03-11 10:19:08 -07:00
Kyle Carberry 7a83d825cf feat(agents): add PR title, draft, and status icons to sidebar (#22952)
Adds `pull_request_title` and `pull_request_draft` to the chat diff
status pipeline (DB → provider → SDK → frontend). The GitHub provider
now fetches the PR title alongside existing status fields.

The agents sidebar now displays PR-state-aware icons for chats that have
a linked pull request (when the chat is in waiting/completed state):
- **Open PR**: `GitPullRequestArrow` (green)
- **Draft PR**: `GitPullRequestDraft` (gray)
- **Merged PR**: `GitMerge` (purple)
- **Closed PR**: `GitPullRequestClosed` (red)

Running/pending/paused/error chats keep their existing activity icons
(spinner, pause, error triangle).

### Changes

**Database migration** (`000432`): Adds `pull_request_title TEXT` and
`pull_request_draft BOOLEAN` columns to `chat_diff_statuses`.

**Backend pipeline**:
- `gitprovider.PRStatus` gains a `Title` field
- GitHub provider decodes the `title` from the API response
- `gitsync` and `coderd/chats.go` pass title + draft through to the DB
upsert
- `codersdk.ChatDiffStatus` exposes both new fields in the API response

**Frontend** (`AgentsSidebar.tsx`): New `getPRIconConfig()` function
resolves the appropriate Lucide git icon based on `pull_request_state`
and `pull_request_draft`. Only applies when the chat is in a terminal
state (waiting/completed).

**Real-time sync**: No changes needed — the existing
`diff_status_change` pubsub event already propagates the full
`ChatDiffStatus` including the new fields.
2026-03-11 11:50:45 -04:00
Kyle Carberry bb59477648 feat(db): add created_by column to chat_messages table (#22940)
Adds a `created_by` column (nullable UUID) to the `chat_messages` table
to track which user created each message. Only user-sent messages
populate this field; assistant, tool, system, and summary messages leave
it null.

The column is threaded through the full stack: SQL migration, query
updates, generated Go/TypeScript types, db2sdk conversion, chatd
(including subagent paths), and API handlers. All API handlers that
insert user messages now pass the authenticated user's ID as
`created_by`.

No foreign key constraint was added, matching the existing pattern used
by `chat_model_configs.created_by`.
2026-03-11 10:00:38 -04:00
Cian Johnston bc27274aba feat(coderd): refactors github pr sync functionality (#22715)
- Adds `_API_BASE_URL` to `CODER_EXTERNAL_AUTH_CONFIG_`
- Extracts and refactors existing GitHub PR sync logic to new packages
`coderd/gitsync` and `coderd/externalauth/gitprovider`
- Associated wiring and tests

Created using Opus 4.6
2026-03-10 18:46:01 +00:00
Kyle Carberry 53e52aef78 fix(externalauth): prevent race condition in token refresh with optimistic locking (#22904)
## Problem

When multiple concurrent callers (e.g., parallel workspace builds) read
the same single-use OAuth2 refresh token from the database and race to
exchange it with the provider, the first caller succeeds but subsequent
callers get `bad_refresh_token`. The losing caller then **clears the
valid new token** from the database, permanently breaking the auth link
until the user manually re-authenticates.

This is reliably reproducible when launching multiple workspaces
simultaneously with GitHub App external auth and user-to-server token
expiration enabled.

## Solution

Two layers of protection:

### 1. Singleflight deduplication (`Config.RefreshToken` +
`ObtainOIDCAccessToken`)

Concurrent callers for the same user/provider share a single refresh
call via `golang.org/x/sync/singleflight`, keyed by `userID`. The
singleflight callback re-reads the link from the database to pick up any
token already refreshed by a prior in-flight call, avoiding redundant
IDP round-trips entirely.

### 2. Optimistic locking on `UpdateExternalAuthLinkRefreshToken`

The SQL `WHERE` clause now includes `AND oauth_refresh_token =
@old_oauth_refresh_token`, so if two replicas (HA) race past
singleflight, the loser's destructive UPDATE is a harmless no-op rather
than overwriting the winner's valid token.

## Changes

| File | Change |
|------|--------|
| `coderd/externalauth/externalauth.go` | Added `singleflight.Group` to
`Config`; split `RefreshToken` into public wrapper +
`refreshTokenInner`; pass `OldOauthRefreshToken` to DB update |
| `coderd/provisionerdserver/provisionerdserver.go` | Wrapped OIDC
refresh in `ObtainOIDCAccessToken` with package-level singleflight |
| `coderd/database/queries/externalauth.sql` | Added optimistic lock
(`WHERE ... AND oauth_refresh_token = @old_oauth_refresh_token`) |
| `coderd/database/queries.sql.go` | Regenerated |
| `coderd/database/querier.go` | Regenerated |
| `coderd/database/dbauthz/dbauthz_test.go` | Updated test params for
new field |
| `coderd/externalauth/externalauth_test.go` | Added
`ConcurrentRefreshDedup` test; updated existing tests for singleflight
DB re-read |

## Testing

- **New test `ConcurrentRefreshDedup`**: 5 goroutines call
`RefreshToken` concurrently, asserts IDP refresh called exactly once,
all callers get same token.
- All existing `TestRefreshToken/*` subtests updated and passing.
- `TestObtainOIDCAccessToken` passing.
- `dbauthz` tests passing.
2026-03-10 13:52:55 -04:00
Jon Ayers 22a87f6cf6 fix: filter sub-agents from build duration metric (#22732) 2026-03-10 12:17:32 -05:00
Kyle Carberry b6d1a11c58 feat(chatd): add user-level custom prompt for agent chats (#22896)
Adds a user-level custom prompt to the database.

I'll be doing a follow-up for the UI, as we currently do not have
user-level settings (it's just admin). I'll also make it very obvious
for chats where there is a user-level prompt, but I don't know how yet.
2026-03-10 11:17:52 -04:00
Danielle Maywood 6489d6f714 feat(chatd): use last assistant message as push notification summary (#22671)
Instead of the static 'Agent has finished running.' text, extract a
summary from the last assistant message to give users meaningful context
about what the agent accomplished. Falls back to the static text if no
suitable message is found.

Co-authored-by: Kyle Carberry <kyle@carberry.com>
2026-03-10 15:14:15 +00:00
Kyle Carberry e18ce505ec feat(coderd): add pagination to chat list endpoint (#22887)
Adds offset and cursor-based pagination to the `GET
/api/experimental/chats` endpoint, following the exact same patterns
used by `GetUsers` and `GetTemplateVersionsByTemplateID`.

## Changes

### Database
- Add `after_id`, `offset_opt`, `limit_opt` params to
`GetChatsByOwnerID` SQL query
- Use composite `(updated_at, id) DESC` cursor for stable, deterministic
pagination
- Add migration with composite index on `chats (owner_id, updated_at
DESC, id DESC)`

### Backend
- Use `ParsePagination()` in `listChats` handler (matches `users.go`
pattern)
- Add `Pagination` field to `ListChatsOptions` SDK struct

### Frontend
- Add `infiniteChats()` query factory using `useInfiniteQuery` with
offset-based page params (same pattern as `infiniteWorkspaceBuilds`)
- Update `AgentsPage` to use `useInfiniteQuery`
- Add "Show more" button at the bottom of the agents sidebar (matches
`HistorySidebar` pattern)
- Keep existing `chats()` query for non-paginated uses (e.g., parent
chat lookup in `AgentDetail`)

### Tests
- Add `TestListChats/Pagination` covering `limit`, `after_id` cursor,
`offset`, and no-limit behavior
2026-03-10 13:55:33 +00:00
Cian Johnston c933ddcffd fix(agents): persist system prompt server-side instead of localStorage (#22857)
## Problem

The Admin → Agents → System Prompt textarea saved only to the browser's
`localStorage`. The value was never sent to the backend, never stored in
the database, and never injected into chats. Entering text, clicking
Save, and refreshing the page showed no changes — the prompt was
effectively a no-op.

## Root Cause

Three disconnected layers:
1. **Frontend** wrote to `localStorage`, never called an API.
2. **`handleCreateChat`** never read `savedSystemPrompt`.
3. **Backend** hardcoded `chatd.DefaultSystemPrompt` on every chat
creation — no field in `CreateChatRequest` accepted a custom prompt.

## Changes

### Database
- Added `GetChatSystemPrompt` / `UpsertChatSystemPrompt` queries on the
existing `site_configs` table (no migration needed).

### API
- `GET /api/experimental/chats/system-prompt` — returns the configured
prompt (any authenticated user).
- `PUT /api/experimental/chats/system-prompt` — sets the prompt
(admin-only, `rbac: deployment_config update`).
- Input validation: max 32 KiB prompt length.

### Backend
- `resolvedChatSystemPrompt(ctx)` checks for a custom prompt in the DB,
falls back to `chatd.DefaultSystemPrompt` when empty/unset.
- Logs a warning on DB errors instead of silently swallowing them.
- Replaced the hardcoded `defaultChatSystemPrompt()` call in chat
creation.

### Frontend
- Replaced `localStorage` read/write with React Query
`useQuery`/`useMutation` backed by the new endpoints.
- Fixed `useEffect` draft sync to avoid clobbering in-progress user
edits on refetch.
- Added `try/catch` error handling on save (draft stays dirty for
retry).
- Save button disabled during mutation (`isSavingSystemPrompt`).
- Query key follows kebab-case convention (`chat-system-prompt`).

### UX
- Added hint: "When empty, the built-in default prompt is used."

### Tests
- `TestChatSystemPrompt`: GET returns empty when unset, admin can set,
non-admin gets 403.
- dbauthz `TestMethodTestSuite` coverage for both new querier methods.
2026-03-10 11:46:53 +00:00
Jon Ayers e7ea649dc2 fix: optimize GetProvisionerJobsByIDsWithQueuePosition query (#22724) 2026-03-09 16:47:02 -05:00
Kyle Carberry aba3832b15 fix: update the compaction message to be the "user" role (#22819)
## Bug

After compaction in the chat loop, the loop re-enters and calls the LLM
with a prompt that has **no non-system messages**. Anthropic (and most
providers) require at least one user/assistant/tool message, so the API
errors with empty messages.

## Root Cause

The compaction summary was stored as `role=system`. After compaction,
`GetChatMessagesForPromptByChatID` returns only:
- The compressed system summary (matched by the CTE)
- Original non-compressed system messages (system prompts)

All original user/assistant/tool messages are excluded (they predate the
summary). The compaction assistant/tool messages are `compressed=TRUE`
and don't match the main query's `compressed=FALSE` clauses.

So `ReloadMessages` returned only system messages. The Anthropic
provider moves system messages into a separate `system` field, leaving
the `messages` API field as `[]`.

## Fix

1. **Changed compaction summary from `role=system` to `role=user`** —
the summary now appears as a user message in the reloaded prompt, giving
the model valid conversational context to respond to.

2. **Simplified the CTE** — removed the `role = 'system'` check and
narrowed `visibility IN ('model', 'both')` to just `visibility =
'model'`. The summary is the only compressed message with
`visibility=model` (the assistant has `visibility=user`, the tool has
`visibility=both`), so the role check was redundant.

## Test

`PostRunCompactionReEntryIncludesUserSummary`: verifies the re-entry
prompt contains a user message (the compaction summary) after compaction
+ reload.
2026-03-08 22:25:27 -04:00
Mathias Fredriksson a104d608a3 feat: add file/image attachment support to chat input (#22604)
This change adds support for image attachments to chat via add button
and clipboard paste. Files are stored in a new `chat_files` table and
referenced by ID in message content. File data is resolved from storage
at LLM dispatch time, keeping the message content column small.

Upload validates MIME types via content type or content sniffing against
an allowlist (png, jpeg, gif, webp). The retrieval endpoint serves files
with immutable caching headers. On the frontend, uploads start eagerly
on attach with a background fetch to pre-warm the browser HTTP cache so
the timeline renders instantly after send.
2026-03-06 21:05:26 +02:00
Danny Kopping 13e3df67d6 feat: track client sessions (#22470)
This change adds support for tracking client session IDs in AI Bridge interceptions to enable better session-based auditing.

Depends on https://github.com/coder/aibridge/pull/198  
Fixes https://github.com/coder/internal/issues/1337

The session ID field is optional and not universally supported by all clients.
2026-03-06 14:43:53 +02:00
Kayla はな 56bdea73b8 feat: add workspace acls to task rbac objects (#22311)
To allow tasks to be shareable, we need to share both the `task`
resource and the `workspace` resource, and their sharing state needs to
be kept in sync. We've already implemented all of the necessary ACL
functionality for workspaces, so we can just sort of proxy those ACLs
back to the task as well.
2026-03-05 13:40:53 -07:00
Sas Swart cfcb81fb0f fix: user status change chart accommodates DST (#22191)
closes https://github.com/coder/internal/issues/464

# Summary

This PR resolves a flaky test that was sensitive to DST transitions in
various time zones. The root of the flake was:
* a bug; the query and its tests assume 24 hours per day
* the tests used local system time, which resulted in failures for dates
proximal to DST transitions

# Changes

Query:

The original query assumed 24 hour intervals between each day, which is
not a valid assumption. It now increments `1 day` at a time.

Database tests:

Database level tests for the query all assumed 24 hour days. They now
increment in DST-aware days instead. Instead of using time.Now() as a
base for testing, the test uses a series of dates over the course of an
entire year, to ensure that DST transition dates are present in every
test run.

# API Endpoint

The endpoint that delivers the user status chart now accepts an IANA
timezone name as a parameter and passes it, keeping the existing offset
as a fallback, to the database query.

API level tests were added to ensure the correct response form and error
behaviour. Correctness of content is tested at the database level.
2026-03-04 12:54:39 +02:00
Danielle Maywood d2d956edb1 fix: add archived query parameter to chat list endpoint (#22562)
Despite the SDK type having an `Archived` field for chats, this data was
never fetched from the database — the `GetChatsByOwnerID` query
hardcoded `AND archived = false`, and the `convertChat` function never
mapped the field.

This PR adds an optional `archived` query parameter to `GET
/api/experimental/chats`:

| Value | Behavior |
|-------|----------|
| *(not provided)* | Returns all chats (active and archived) |
| `archived=false` | Returns only non-archived chats |
| `archived=true` | Returns only archived chats |

This follows the same pattern used by template versions
(`sqlc.narg('archived')` nullable boolean).

Also fixes `convertChat` to populate the `Archived` field in API
responses, which was never being set despite existing on the SDK type.
2026-03-03 20:39:19 +00:00
Danny Kopping 1b08bc76a6 feat: store tool call IDs to determine interception lineage (#22246)
Adds database columns and server-side logic to track interception lineage via tool call IDs. When an interception ends, the server resolves the correlating tool call ID to find the parent interception and links them via `parent_id`.

New `provider_tool_call_id` column on `aibridge_tool_usages` and `parent_id` column on `aibridge_interceptions`, with indexes for lookup. `findParentInterceptionID` queries by tool call ID and filters out the current interception to find the parent.

Adapted from the [coder/coder `dk/prompt_provenance_poc`](https://github.com/coder/coder/compare/main...dk/prompt_provenance_poc) branch.
Depends on [coder/aibridge#188](https://github.com/coder/aibridge/pull/188).  
  
Closes https://github.com/coder/internal/issues/1334
2026-03-03 21:04:41 +02:00
Kyle Carberry 5eebd3829f fix: use cursor-based query for chat stream notifications (#22510)
## Problem

The pubsub notification handler in `chatd` re-fetched **all** messages
from the DB on every new message notification, then filtered in Go with
`msg.ID > lastMessageID`. This grows linearly with conversation length —
every new message triggers a full table scan of that chat's history.

The `AfterMessageID` field in the pubsub notification payload was
clearly designed for cursor-based fetching, but no matching query
existed.

## Fix

- Add `GetChatMessagesByChatIDAfter` SQL query with `WHERE id >
@after_id`, so the database does the filtering instead of Go.
- Use it in the pubsub notification handler in `chatd.go`, passing
`lastMessageID` as the cursor.
- Implement the dbauthz wrapper (was a `panic("not implemented")` stub
from codegen) with the same read-check-on-parent-chat pattern as
adjacent methods.
- Add dbauthz test coverage for the new method.

**Not changed:** The initial snapshot in `Subscribe()` still loads all
messages — that's correct, since a newly-connecting client needs the
full conversation state. The waste was only in the ongoing notification
path.
2026-03-02 16:31:04 -05:00
Kyle Carberry 0908505348 fix(chats): archive chat tree with single query instead of loop (#22496)
## Problem

When archiving an agent with subagents, the children briefly flash in
the sidebar as root-level items before disappearing. Two issues:

1. **Backend:** Archive used N+1 queries — a recursive DFS
(`archiveChatTree`, no transaction) or BFS loop (`chatd.ArchiveChat`,
N+1 queries in a tx) to walk the tree and archive each chat
individually.
2. **Frontend:** The SSE `deleted` event handler only filtered out the
parent chat from the cache. Children remained briefly, got promoted to
root-level by `buildChatTree`, then disappeared on the next re-fetch.

## Fix

**Backend:** Replace both tree-walk implementations with a single SQL
query:
```sql
UPDATE chats SET archived = true, updated_at = NOW()
WHERE id = @id OR root_chat_id = @id;
```
This leverages the existing `root_chat_id` column (already indexed) to
archive the entire tree atomically.

**Frontend:** When a `deleted` event arrives, also filter out any chats
whose `root_chat_id` matches the deleted chat, so children vanish from
the sidebar immediately with the parent.

## Changes

- `coderd/database/queries/chats.sql` — Added `ArchiveChatTreeByID`
query
- `coderd/chats.go` — Use single query, delete `archiveChatTree`
function
- `coderd/chatd/chatd.go` — Simplify `ArchiveChat` to use single query
- `coderd/database/dbauthz/dbauthz.go` — Auth wrapper for new query
- `coderd/chats_test.go` — Added `TestArchiveChat/ArchivesChildren`
subtest
- `site/src/pages/AgentsPage/AgentsPage.tsx` — Filter children in SSE
handler
- Generated files updated via `make gen`
2026-03-02 12:00:00 -05:00
Kyle Carberry 34d9392e37 chore(db): remove workspace_agent_id from chats table (#22442)
## Summary

Remove the `workspace_agent_id` column from the `chats` table and
dynamically look up the first workspace agent instead.

## Problem

When a workspace is stopped and restarted, the workspace agent gets a
new ID. The `workspace_agent_id` stored on the chat at creation time
becomes stale, making the agent unreachable. This caused chats to break
after workspace restarts.

## Solution

Instead of persisting the agent ID, dynamically look up the first agent
from the workspace's latest build via
`GetWorkspaceAgentsInLatestBuildByWorkspaceID` whenever an agent
connection is needed. The `workspace_id` on the chat remains stable
across restarts.

This behavior may be refined later (e.g., agent selection heuristics),
but picking the first agent resolves the immediate breakage.

## Changes

- **Migration 000425**: Drop `workspace_agent_id` column from `chats`
- **SQL queries**: Remove `workspace_agent_id` from `InsertChat` and
`UpdateChatWorkspace`
- **chatd.go**: `getWorkspaceConn` and `resolveInstructions` now look up
agents dynamically from workspace ID
- **chatd.go**: Remove `refreshChatWorkspaceSnapshot` (no longer needed)
- **createworkspace.go**: Stop persisting agent ID when associating
workspace with chat
- **subagent.go**: Stop passing agent ID to child chats
- **SDK/frontend**: Remove `WorkspaceAgentID` / `workspace_agent_id`
from Chat type

---------

Co-authored-by: Kyle Carberry <kylecarbs@gmail.com>
2026-02-28 16:46:51 -05:00
Kyle Carberry 0ad2f9ecd7 feat(chatd): persist last_error on chats table (#22436)
Adds a nullable `last_error` column to the `chats` table so error
reasons survive page reloads.

**Backend:**
- Migration adds `last_error TEXT` (nullable) to chats
- `UpdateChatStatus` writes the error reason when status transitions to
`error`, clears it (NULL) on recovery
- `convertChat` maps `sql.NullString` to `*string` in the SDK

**Frontend:**
- Sidebar falls back to `chat.last_error` when no stream error reason is
cached
- Chat detail page does the same for `persistedErrorReason`
- Fixtures updated for new required field
2026-02-28 12:27:26 -05:00
Kyle Carberry 12083441e0 feat(chats): archive chats instead of hard-deleting them (#22406)
## Summary

The UI has always labeled the action as "Archive agent" but the backend
was performing a hard `DELETE`, permanently destroying chats and all
their messages.

This change replaces the hard delete with a soft archive, consistent
with the pattern used by template versions.

## Changes

### Database
- **Migration 000423**: Add `archived boolean DEFAULT false NOT NULL`
column to `chats` table
- Replace `DeleteChatByID` query with `ArchiveChatByID` (`UPDATE SET
archived = true`)
- Add `UnarchiveChatByID` query (`UPDATE SET archived = false`)
- Filter archived chats from `GetChatsByOwnerID` (`WHERE archived =
false`)

### API
- Remove `DELETE /api/experimental/chats/{chat}`
- Add `POST /api/experimental/chats/{chat}/archive` — archives a chat
and all its descendants
- Add `POST /api/experimental/chats/{chat}/unarchive` — unarchives a
single chat (API only, no UI yet)

### Backend
- `archiveChatTree()` recursively archives child chats (replaces
`deleteChatTree()` which hard-deleted)
- Chat daemon's `ArchiveChat()` archives the full chat tree in a
transaction
- Authorization uses `ActionUpdate` instead of `ActionDelete`

### SDK
- Replace `DeleteChat()` with `ArchiveChat()` and `UnarchiveChat()`
- Add `Archived` field to `Chat` struct

### Frontend
- `archiveChat` API call uses `POST .../archive` instead of `DELETE`
- No UI changes — the "Archive agent" button now actually archives
instead of deleting

## Design Decision

This follows the **template version archive pattern** (Pattern B in the
codebase):
- `archived boolean` column (not `deleted boolean`)
- Dedicated `POST .../archive` and `POST .../unarchive` routes (not
repurposing `DELETE`)
- Reversible — users can unarchive via the API (UI for this will come
later)
2026-02-27 16:46:19 -05:00
Kyle Carberry edee917d88 feat: add experimental agents support (#22290)
feat: add AI chat system with agent tools and chat UI

Introduce the chatd subsystem and Agents UI for AI-powered chat
within Coder workspaces.

- Add chatd package with chat loop, message compaction, prompt
  management, and LLM provider integration (OpenAI, Anthropic)
- Add agent tools: create workspace, list/read templates, read/write/
  edit files, execute commands
- Add chat API endpoints with streaming, message editing, and
  durable reconnection
- Add database schema and migrations for chats, chat messages, chat
  providers, and chat model configs
- Add RBAC policies and dbauthz enforcement for chat resources
- Add Agents UI pages with conversation timeline, queued messages
  list, diff viewer, and model configuration panel
- Add comprehensive test coverage including coderd integration tests,
  chatd unit tests, and Storybook stories
- Gate feature behind experiments flag

---------

Co-authored-by: Cian Johnston <cian@coder.com>
Co-authored-by: Danielle Maywood <danielle@themaywoods.com>
Co-authored-by: Jeremy Ruppel <jeremy@coder.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 16:50:56 +00:00
Jake Howell d2787df442 feat: add AI Bridge request logs model filter (#22230)
This pull-request implements a simple filtering logic so that we're able
to pick which model the user actually used when logs were sent to AI
Bridge.

- Add `GET /aibridge/models` API endpoint that returns distinct model
names from AI Bridge interceptions, with pagination and search support
- New `ListAIBridgeModels` SQL query using case-sensitive prefix
matching (`LIKE model || '%'`) to allow B-tree index usage
- Hand-written `ListAuthorizedAIBridgeModels` in `modelqueries.go` for
RBAC authorization filter injection
- `AIBridgeModels` search query parser in searchquery/search.go
(defaults bare terms to the `model` field)
- dbauthz wrappers, dbmetrics, and dbmock implementations for the new
query

<img width="292" height="185" alt="image"
src="https://github.com/user-attachments/assets/134771df-2d26-4c54-acc4-27f58128b351"
/>
2026-02-26 02:40:45 +11:00
Cian Johnston 6336fee3a7 feat: add telemetry for task lifecycle events (#21922)
Relates to https://github.com/coder/internal/issues/1259

Adds new database queries and telemetry collection functions to gather
task lifecycle events (pause/resume cycles, idle time) for analytics.
    
Task events track pause/resume activity, idle duration before pausing,
paused duration, and time from resume to first app status, filtered to
recent activity based on the telemetry snapshot interval.

🤖 Created with Mux (Opus 4.6).
2026-02-24 17:04:42 +00:00
Kacper Sawicki 1e274063d4 feat(coderd): filter expired API tokens server-side (#22263)
## Summary

Moves expired token filtering from client-side to server-side by adding
an `include_expired` parameter to the `GetAPIKeysByLoginType` and
`GetAPIKeysByUserID` database queries. This is more efficient for large
deployments with many expired/short-lived tokens.

## Changes

- Add `include_expired` parameter to SQL queries using `OR`
short-circuit
- Add `include_expired` query parameter to `GET
/users/{user}/keys/tokens`
- Add `IncludeExpired` field to `codersdk.TokensFilter`
- Remove client-side filtering from CLI `tokens list` command
- Add `TestTokensFilterExpired` test

Fixes coder/internal#1357
2026-02-24 15:27:03 +00:00
Jon Ayers 0a7a3da178 fix: exclude provisioner_state from workspace_build_with_user view (#22159)
The provisioner state for a workspace build was being loaded for every
long-lived agent rpc connection. Since this state can be anywhere from
kilobytes to megabytes this can gradually cause the `coderd` memory
footprint to grow over time. It's also a lot of unnecessary allocations
for every query that fetches a workspace build since only a few callers
ever actually reference the provisioner state.

This PR removes it from the returned workspace build and adds a query to
fetch the provisioner state explicitly.
2026-02-23 22:46:17 -06:00
Thomas Kosiewski b776a14b46 fix(coderd): harden OAuth2 provider security (#22194)
## Summary

Harden the OAuth2 provider with multiple security fixes addressing
`coder/security#121` (CSRF session takeover) and converge on OAuth 2.1
compliance.

### Security Fixes

| Fix | Description | Commits |
|-----|-------------|---------|
| **CSRF on `/oauth2/authorize`** | Enforce CSRF protection on the
authorize endpoint POST (consent form submission) | `ba7d646`, `b94a64e`
|
| **Clickjacking: `frame-ancestors` CSP** | Prevent consent page from
being iframed (`Content-Security-Policy: frame-ancestors 'none'` +
`X-Frame-Options: DENY`) | `597aeb2` |
| **Exact redirect URI matching** | Changed from prefix matching to full
string exact matching per OAuth 2.1 §4.1.2.1 | `73d64b1`, `93897f1` |
| **Store & verify `redirect_uri`** | Store redirect_uri with auth code
in DB, verify at token exchange matches exactly (RFC 6749 §4.1.3) |
`50569b9`, `d7ca315` |
| **Mandatory PKCE** | Require `code_challenge` at authorization (for
`response_type=code`) + unconditional `code_verifier` verification at
token exchange | `d7ca315`, `1cda1a9` |
| **Reject implicit grant** | `response_type=token` now returns
`unsupported_response_type` error page (OAuth 2.1 removes implicit flow)
| `d7ca315`, `91b8863` |

### Changes by File

**`coderd/httpmw/csrf.go`** — Extended the CSRF `ExemptFunc` to enforce
CSRF on `/oauth2/authorize` in addition to `/api` routes. The consent
form POST is now CSRF-protected to prevent cross-site authorization code
theft.

**`site/site.go`** — Added `Content-Security-Policy: frame-ancestors
'none'` and `X-Frame-Options: DENY` headers to `RenderOAuthAllowPage`
(consent page only — does not affect the SPA/global CSP used by AI
tasks).

**`coderd/httpapi/queryparams.go`** — Changed `RedirectURL` from prefix
matching (`strings.HasPrefix(v.Path, base.Path)`) to full URI exact
matching (`v.String() != base.String()`), comparing scheme, host, path,
and query.

**`coderd/oauth2provider/authorize.go`** — Added PKCE enforcement:
`code_challenge` is required when `response_type=code` (via a
conditional check, not `RequiredNotEmpty`, so `response_type=token` can
reach the explicit rejection path). `ShowAuthorizePage` (GET) validates
`response_type` before rendering and returns a 400 error page for
unsupported types. `ProcessAuthorize` (POST) stores the `redirect_uri`
with the auth code when explicitly provided.

**`coderd/oauth2provider/tokens.go`** — PKCE verification is now
unconditional (not gated on `code_challenge` being present in DB). If
the stored code has a `redirect_uri`, the token endpoint verifies it
matches exactly — mismatch returns `errBadCode` → `invalid_grant`.
Missing `code_verifier` returns `invalid_grant`.

**`codersdk/oauth2.go`** — `OAuth2ProviderResponseTypeToken` constant
and `Valid()` acceptance are **kept** so the authorize handler can parse
`response_type=token` and return the proper `unsupported_response_type`
error rather than failing at parameter validation.

**`coderd/database/migrations/000421_*`** — Added `redirect_uri text`
column to `oauth2_provider_app_codes`.

### Design Decisions

**`state` parameter remains optional** — The plan initially required
`state` via `RequiredNotEmpty`, but this was reverted in `376a753` to
avoid breaking existing clients. The `state` is still hashed and stored
when provided (via `state_hash` column), securing clients that opt in.

**`response_type=token` kept in `Valid()`** — Removing it from `Valid()`
would cause the parameter parser to reject the request before the
authorize handler can return the proper `unsupported_response_type`
error. The constant is kept for correct error handling flow.

**CSP scoped to consent page only** — `frame-ancestors 'none'` is set
only on the OAuth consent page renderer, not globally. The SPA/global
CSP was previously changed to allow framing for AI tasks
([#18102](https://github.com/coder/coder/pull/18102)); this change does
not regress that.

### Out of Scope (follow-up PRs)

- Bearer tokens in query strings (needs internal caller audit)
- Scope enforcement on OAuth2 tokens
- Rate limiting on dynamic client registration


---

<details>
<summary>📋 Implementation Plan</summary>

# Plan: Harden OAuth2 Provider — Security Fixes + OAuth 2.1 Compliance

## Context & Why

Security issue `coder/security#121` reports a critical session takeover
via CSRF on the OAuth2 provider. This plan covers all remaining security
fixes from that issue **plus** convergence on OAuth 2.1 requirements.
The goal is a single PR that closes all actionable gaps.

## Current State (already committed on branch `csrf-sjx1`)

| Fix | Status | Commits |
|-----|--------|---------|
| Fix 1: CSRF on `/oauth2/authorize` |  Done | `ba7d646`, `b94a64e` |
| CSRF token in consent form HTML |  Done | `b94a64e` |
| `state_hash` column + storage |  Done (hash stored, but state still
optional) | `9167d83`, `b94a64e` |
| Tests for CSRF + state hash |  Done | `e4119b5` |

## Remaining Work

### ~~Fix 2 — Require `state` parameter~~ (DROPPED)

> **Decision:** Do not enforce `state` as required. The `state`
parameter is still hashed and stored when provided (via
`hashOAuth2State` / `state_hash` column from prior commits), but clients
are not forced to supply it. This avoids breaking existing integrations
that omit state.

**Rollback:** Remove `"state"` from the `RequiredNotEmpty` call in
`coderd/oauth2provider/authorize.go:42`:

```go
// BEFORE (current on branch)
p.RequiredNotEmpty("response_type", "client_id", "state", "code_challenge")

// AFTER
p.RequiredNotEmpty("response_type", "client_id", "code_challenge")
```

No test changes needed — tests already pass `state` voluntarily.

### Fix 4 — Exact redirect URI matching

Currently `coderd/httpapi/queryparams.go:233` uses prefix matching:

```go
// CURRENT — prefix match
if v.Host != base.Host || !strings.HasPrefix(v.Path, base.Path) {
```

OAuth 2.1 requires **exact string matching**. Change to:

```go
// AFTER — exact match (OAuth 2.1 §4.1.2.1)
if v.Host != base.Host || v.Path != base.Path {
```

**File: `coderd/httpapi/queryparams.go` — `RedirectURL` method**

Also update the error message from "must be a subset of" to "must
exactly match".

**Additionally**, store `redirect_uri` with the auth code and verify at
the token endpoint (RFC 6749 §4.1.3):

1. **New migration** (same migration file or a new `000421`): Add
`redirect_uri text` column to `oauth2_provider_app_codes`
2. **Update INSERT query** in `coderd/database/queries/oauth2.sql` to
include `redirect_uri`
3. **`coderd/oauth2provider/authorize.go`**: Store
`params.redirectURL.String()` when inserting the code
4. **`coderd/oauth2provider/tokens.go`**: After retrieving the code from
DB, verify that `redirect_uri` from the token request matches the stored
value exactly. Currently `tokens.go:103` calls `p.RedirectURL(vals,
callbackURL, "redirect_uri")` for prefix validation only — it must
compare against the stored redirect_uri from the code, not just the
app's callback URL.

<details>
<summary>Why both exact match AND store+verify?</summary>

Exact matching at the authorize endpoint prevents open redirectors
(attacker can't use a sub-path).
Storing and verifying at the token endpoint prevents code injection — an
attacker who steals a code can't exchange it with a different
redirect_uri than was originally authorized. This is required by RFC
6749 §4.1.3 and OAuth 2.1.
</details>

### Fix 7 — `frame-ancestors` CSP on consent page

The consent page can be iframed by a workspace app (same-site), which is
the attack vector. Add a `Content-Security-Policy` header to prevent
framing.

**File: `site/site.go` — `RenderOAuthAllowPage` function (~line 731)**

Before writing the response, add:

```go
func RenderOAuthAllowPage(rw http.ResponseWriter, r *http.Request, data RenderOAuthAllowData) {
    rw.Header().Set("Content-Type", "text/html; charset=utf-8")
    // Prevent the consent page from being framed to mitigate
    // clickjacking attacks (coder/security#121).
    rw.Header().Set("Content-Security-Policy", "frame-ancestors 'none'")
    rw.Header().Set("X-Frame-Options", "DENY")
    ...
```

Both headers for defense-in-depth (CSP for modern browsers,
X-Frame-Options for legacy).

### OAuth 2.1 — Mandatory PKCE

Currently PKCE is checked only when `code_challenge` was provided during
authorization (`tokens.go:258`):

```go
// CURRENT — conditional check
if dbCode.CodeChallenge.Valid && dbCode.CodeChallenge.String != "" {
    // verify PKCE
}
```

OAuth 2.1 requires PKCE for ALL authorization code flows. Change to:

**File: `coderd/oauth2provider/authorize.go`** — Add `"code_challenge"`
to required params:

```go
p.RequiredNotEmpty("response_type", "client_id", "code_challenge")
```

**File: `coderd/oauth2provider/tokens.go:257-265`** — Make PKCE
verification unconditional:

```go
// AFTER — PKCE always required (OAuth 2.1)
if req.CodeVerifier == "" {
    return codersdk.OAuth2TokenResponse{}, errInvalidPKCE
}
if !dbCode.CodeChallenge.Valid || dbCode.CodeChallenge.String == "" {
    // Code was issued without a challenge — should not happen
    // with the authorize endpoint enforcement, but defend in
    // depth.
    return codersdk.OAuth2TokenResponse{}, errInvalidPKCE
}
if !VerifyPKCE(dbCode.CodeChallenge.String, req.CodeVerifier) {
    return codersdk.OAuth2TokenResponse{}, errInvalidPKCE
}
```

**File: `codersdk/oauth2.go`** — Remove
`OAuth2ProviderResponseTypeToken` from the enum or reject it explicitly
in the authorize handler. Currently it's defined at line 216 but the
handler ignores `response_type` and always issues a code. We should
either:
- (a) Remove the `"token"` variant from the enum and reject it with
`unsupported_response_type`, OR
- (b) Add an explicit check in `ProcessAuthorize` that rejects
`response_type=token`

Option (b) is simpler and more backwards-compatible:

```go
// In ProcessAuthorize, after extracting params:
if params.responseType != codersdk.OAuth2ProviderResponseTypeCode {
    httpapi.WriteOAuth2Error(ctx, rw, http.StatusBadRequest,
        codersdk.OAuth2ErrorCodeUnsupportedResponseType,
        "Only response_type=code is supported")
    return
}
```

### OAuth 2.1 — Bearer tokens in query strings

`coderd/httpmw/apikey.go:743` accepts `access_token` from URL query
parameters. OAuth 2.1 prohibits this. However, this may be used
internally (e.g., workspace apps, DERP). Need to audit callers before
removing.

**Approach:** This is a larger change with potential breakage. Mark as a
**separate follow-up issue** rather than including in this PR. Document
the finding.

### OAuth 2.1 — Removed flows

 **Already compliant.** `tokens.go` only supports `authorization_code`
and `refresh_token` grant types. The implicit grant
(`response_type=token`) will be explicitly rejected per the PKCE section
above.

### OAuth 2.1 — Refresh token rotation

 **Already compliant.** `tokens.go:442` deletes the old API key when a
refresh token is used.

## Migration Plan

All DB changes can go in a single new migration (or extend 000420 if the
branch is rebased before merge). Columns to add:
- `redirect_uri text` on `oauth2_provider_app_codes`

The `state_hash` column is already added by migration 000420.

## Implementation Order

1. **Fix 7** — CSP headers on consent page (isolated, no deps)
2. ~~**Fix 2** — Require `state` parameter~~ (DROPPED — state stays
optional)
3. **Fix 4** — Exact redirect URI matching + store/verify redirect_uri
4. **PKCE mandatory** — Require `code_challenge` + reject
`response_type=token`
5. **Rollback** — Remove `"state"` from `RequiredNotEmpty` in
`authorize.go`
6. **Tests** — Update/add tests for all changes
7. **`make gen`** after DB changes

## Out of Scope (separate PRs)

- Bearer tokens in query strings (needs internal caller audit)
- Scope enforcement on OAuth2 tokens
- Rate limiting / quota on dynamic client registration

</details>

---
_Generated with [`mux`](https://github.com/coder/mux) • Model:
`anthropic:claude-opus-4-6` • Thinking: `xhigh`_
2026-02-23 12:18:44 +01:00