- Adds `_API_BASE_URL` to `CODER_EXTERNAL_AUTH_CONFIG_`
- Extracts and refactors existing GitHub PR sync logic to new packages
`coderd/gitsync` and `coderd/externalauth/gitprovider`
- Associated wiring and tests
Created using Opus 4.6
## Problem
When multiple concurrent callers (e.g., parallel workspace builds) read
the same single-use OAuth2 refresh token from the database and race to
exchange it with the provider, the first caller succeeds but subsequent
callers get `bad_refresh_token`. The losing caller then **clears the
valid new token** from the database, permanently breaking the auth link
until the user manually re-authenticates.
This is reliably reproducible when launching multiple workspaces
simultaneously with GitHub App external auth and user-to-server token
expiration enabled.
## Solution
Two layers of protection:
### 1. Singleflight deduplication (`Config.RefreshToken` +
`ObtainOIDCAccessToken`)
Concurrent callers for the same user/provider share a single refresh
call via `golang.org/x/sync/singleflight`, keyed by `userID`. The
singleflight callback re-reads the link from the database to pick up any
token already refreshed by a prior in-flight call, avoiding redundant
IDP round-trips entirely.
### 2. Optimistic locking on `UpdateExternalAuthLinkRefreshToken`
The SQL `WHERE` clause now includes `AND oauth_refresh_token =
@old_oauth_refresh_token`, so if two replicas (HA) race past
singleflight, the loser's destructive UPDATE is a harmless no-op rather
than overwriting the winner's valid token.
## Changes
| File | Change |
|------|--------|
| `coderd/externalauth/externalauth.go` | Added `singleflight.Group` to
`Config`; split `RefreshToken` into public wrapper +
`refreshTokenInner`; pass `OldOauthRefreshToken` to DB update |
| `coderd/provisionerdserver/provisionerdserver.go` | Wrapped OIDC
refresh in `ObtainOIDCAccessToken` with package-level singleflight |
| `coderd/database/queries/externalauth.sql` | Added optimistic lock
(`WHERE ... AND oauth_refresh_token = @old_oauth_refresh_token`) |
| `coderd/database/queries.sql.go` | Regenerated |
| `coderd/database/querier.go` | Regenerated |
| `coderd/database/dbauthz/dbauthz_test.go` | Updated test params for
new field |
| `coderd/externalauth/externalauth_test.go` | Added
`ConcurrentRefreshDedup` test; updated existing tests for singleflight
DB re-read |
## Testing
- **New test `ConcurrentRefreshDedup`**: 5 goroutines call
`RefreshToken` concurrently, asserts IDP refresh called exactly once,
all callers get same token.
- All existing `TestRefreshToken/*` subtests updated and passing.
- `TestObtainOIDCAccessToken` passing.
- `dbauthz` tests passing.
Adds a user-level custom prompt to the database.
I'll be doing a follow-up for the UI, as we currently do not have
user-level settings (it's just admin). I'll also make it very obvious
for chats where there is a user-level prompt, but I don't know how yet.
Instead of the static 'Agent has finished running.' text, extract a
summary from the last assistant message to give users meaningful context
about what the agent accomplished. Falls back to the static text if no
suitable message is found.
Co-authored-by: Kyle Carberry <kyle@carberry.com>
Adds offset and cursor-based pagination to the `GET
/api/experimental/chats` endpoint, following the exact same patterns
used by `GetUsers` and `GetTemplateVersionsByTemplateID`.
## Changes
### Database
- Add `after_id`, `offset_opt`, `limit_opt` params to
`GetChatsByOwnerID` SQL query
- Use composite `(updated_at, id) DESC` cursor for stable, deterministic
pagination
- Add migration with composite index on `chats (owner_id, updated_at
DESC, id DESC)`
### Backend
- Use `ParsePagination()` in `listChats` handler (matches `users.go`
pattern)
- Add `Pagination` field to `ListChatsOptions` SDK struct
### Frontend
- Add `infiniteChats()` query factory using `useInfiniteQuery` with
offset-based page params (same pattern as `infiniteWorkspaceBuilds`)
- Update `AgentsPage` to use `useInfiniteQuery`
- Add "Show more" button at the bottom of the agents sidebar (matches
`HistorySidebar` pattern)
- Keep existing `chats()` query for non-paginated uses (e.g., parent
chat lookup in `AgentDetail`)
### Tests
- Add `TestListChats/Pagination` covering `limit`, `after_id` cursor,
`offset`, and no-limit behavior
## Problem
The Admin → Agents → System Prompt textarea saved only to the browser's
`localStorage`. The value was never sent to the backend, never stored in
the database, and never injected into chats. Entering text, clicking
Save, and refreshing the page showed no changes — the prompt was
effectively a no-op.
## Root Cause
Three disconnected layers:
1. **Frontend** wrote to `localStorage`, never called an API.
2. **`handleCreateChat`** never read `savedSystemPrompt`.
3. **Backend** hardcoded `chatd.DefaultSystemPrompt` on every chat
creation — no field in `CreateChatRequest` accepted a custom prompt.
## Changes
### Database
- Added `GetChatSystemPrompt` / `UpsertChatSystemPrompt` queries on the
existing `site_configs` table (no migration needed).
### API
- `GET /api/experimental/chats/system-prompt` — returns the configured
prompt (any authenticated user).
- `PUT /api/experimental/chats/system-prompt` — sets the prompt
(admin-only, `rbac: deployment_config update`).
- Input validation: max 32 KiB prompt length.
### Backend
- `resolvedChatSystemPrompt(ctx)` checks for a custom prompt in the DB,
falls back to `chatd.DefaultSystemPrompt` when empty/unset.
- Logs a warning on DB errors instead of silently swallowing them.
- Replaced the hardcoded `defaultChatSystemPrompt()` call in chat
creation.
### Frontend
- Replaced `localStorage` read/write with React Query
`useQuery`/`useMutation` backed by the new endpoints.
- Fixed `useEffect` draft sync to avoid clobbering in-progress user
edits on refetch.
- Added `try/catch` error handling on save (draft stays dirty for
retry).
- Save button disabled during mutation (`isSavingSystemPrompt`).
- Query key follows kebab-case convention (`chat-system-prompt`).
### UX
- Added hint: "When empty, the built-in default prompt is used."
### Tests
- `TestChatSystemPrompt`: GET returns empty when unset, admin can set,
non-admin gets 403.
- dbauthz `TestMethodTestSuite` coverage for both new querier methods.
## Bug
After compaction in the chat loop, the loop re-enters and calls the LLM
with a prompt that has **no non-system messages**. Anthropic (and most
providers) require at least one user/assistant/tool message, so the API
errors with empty messages.
## Root Cause
The compaction summary was stored as `role=system`. After compaction,
`GetChatMessagesForPromptByChatID` returns only:
- The compressed system summary (matched by the CTE)
- Original non-compressed system messages (system prompts)
All original user/assistant/tool messages are excluded (they predate the
summary). The compaction assistant/tool messages are `compressed=TRUE`
and don't match the main query's `compressed=FALSE` clauses.
So `ReloadMessages` returned only system messages. The Anthropic
provider moves system messages into a separate `system` field, leaving
the `messages` API field as `[]`.
## Fix
1. **Changed compaction summary from `role=system` to `role=user`** —
the summary now appears as a user message in the reloaded prompt, giving
the model valid conversational context to respond to.
2. **Simplified the CTE** — removed the `role = 'system'` check and
narrowed `visibility IN ('model', 'both')` to just `visibility =
'model'`. The summary is the only compressed message with
`visibility=model` (the assistant has `visibility=user`, the tool has
`visibility=both`), so the role check was redundant.
## Test
`PostRunCompactionReEntryIncludesUserSummary`: verifies the re-entry
prompt contains a user message (the compaction summary) after compaction
+ reload.
This change adds support for image attachments to chat via add button
and clipboard paste. Files are stored in a new `chat_files` table and
referenced by ID in message content. File data is resolved from storage
at LLM dispatch time, keeping the message content column small.
Upload validates MIME types via content type or content sniffing against
an allowlist (png, jpeg, gif, webp). The retrieval endpoint serves files
with immutable caching headers. On the frontend, uploads start eagerly
on attach with a background fetch to pre-warm the browser HTTP cache so
the timeline renders instantly after send.
To allow tasks to be shareable, we need to share both the `task`
resource and the `workspace` resource, and their sharing state needs to
be kept in sync. We've already implemented all of the necessary ACL
functionality for workspaces, so we can just sort of proxy those ACLs
back to the task as well.
closes https://github.com/coder/internal/issues/464
# Summary
This PR resolves a flaky test that was sensitive to DST transitions in
various time zones. The root of the flake was:
* a bug; the query and its tests assume 24 hours per day
* the tests used local system time, which resulted in failures for dates
proximal to DST transitions
# Changes
Query:
The original query assumed 24 hour intervals between each day, which is
not a valid assumption. It now increments `1 day` at a time.
Database tests:
Database level tests for the query all assumed 24 hour days. They now
increment in DST-aware days instead. Instead of using time.Now() as a
base for testing, the test uses a series of dates over the course of an
entire year, to ensure that DST transition dates are present in every
test run.
# API Endpoint
The endpoint that delivers the user status chart now accepts an IANA
timezone name as a parameter and passes it, keeping the existing offset
as a fallback, to the database query.
API level tests were added to ensure the correct response form and error
behaviour. Correctness of content is tested at the database level.
Despite the SDK type having an `Archived` field for chats, this data was
never fetched from the database — the `GetChatsByOwnerID` query
hardcoded `AND archived = false`, and the `convertChat` function never
mapped the field.
This PR adds an optional `archived` query parameter to `GET
/api/experimental/chats`:
| Value | Behavior |
|-------|----------|
| *(not provided)* | Returns all chats (active and archived) |
| `archived=false` | Returns only non-archived chats |
| `archived=true` | Returns only archived chats |
This follows the same pattern used by template versions
(`sqlc.narg('archived')` nullable boolean).
Also fixes `convertChat` to populate the `Archived` field in API
responses, which was never being set despite existing on the SDK type.
Adds database columns and server-side logic to track interception lineage via tool call IDs. When an interception ends, the server resolves the correlating tool call ID to find the parent interception and links them via `parent_id`.
New `provider_tool_call_id` column on `aibridge_tool_usages` and `parent_id` column on `aibridge_interceptions`, with indexes for lookup. `findParentInterceptionID` queries by tool call ID and filters out the current interception to find the parent.
Adapted from the [coder/coder `dk/prompt_provenance_poc`](https://github.com/coder/coder/compare/main...dk/prompt_provenance_poc) branch.
Depends on [coder/aibridge#188](https://github.com/coder/aibridge/pull/188).
Closes https://github.com/coder/internal/issues/1334
## Problem
The pubsub notification handler in `chatd` re-fetched **all** messages
from the DB on every new message notification, then filtered in Go with
`msg.ID > lastMessageID`. This grows linearly with conversation length —
every new message triggers a full table scan of that chat's history.
The `AfterMessageID` field in the pubsub notification payload was
clearly designed for cursor-based fetching, but no matching query
existed.
## Fix
- Add `GetChatMessagesByChatIDAfter` SQL query with `WHERE id >
@after_id`, so the database does the filtering instead of Go.
- Use it in the pubsub notification handler in `chatd.go`, passing
`lastMessageID` as the cursor.
- Implement the dbauthz wrapper (was a `panic("not implemented")` stub
from codegen) with the same read-check-on-parent-chat pattern as
adjacent methods.
- Add dbauthz test coverage for the new method.
**Not changed:** The initial snapshot in `Subscribe()` still loads all
messages — that's correct, since a newly-connecting client needs the
full conversation state. The waste was only in the ongoing notification
path.
## Problem
When archiving an agent with subagents, the children briefly flash in
the sidebar as root-level items before disappearing. Two issues:
1. **Backend:** Archive used N+1 queries — a recursive DFS
(`archiveChatTree`, no transaction) or BFS loop (`chatd.ArchiveChat`,
N+1 queries in a tx) to walk the tree and archive each chat
individually.
2. **Frontend:** The SSE `deleted` event handler only filtered out the
parent chat from the cache. Children remained briefly, got promoted to
root-level by `buildChatTree`, then disappeared on the next re-fetch.
## Fix
**Backend:** Replace both tree-walk implementations with a single SQL
query:
```sql
UPDATE chats SET archived = true, updated_at = NOW()
WHERE id = @id OR root_chat_id = @id;
```
This leverages the existing `root_chat_id` column (already indexed) to
archive the entire tree atomically.
**Frontend:** When a `deleted` event arrives, also filter out any chats
whose `root_chat_id` matches the deleted chat, so children vanish from
the sidebar immediately with the parent.
## Changes
- `coderd/database/queries/chats.sql` — Added `ArchiveChatTreeByID`
query
- `coderd/chats.go` — Use single query, delete `archiveChatTree`
function
- `coderd/chatd/chatd.go` — Simplify `ArchiveChat` to use single query
- `coderd/database/dbauthz/dbauthz.go` — Auth wrapper for new query
- `coderd/chats_test.go` — Added `TestArchiveChat/ArchivesChildren`
subtest
- `site/src/pages/AgentsPage/AgentsPage.tsx` — Filter children in SSE
handler
- Generated files updated via `make gen`
## Summary
Remove the `workspace_agent_id` column from the `chats` table and
dynamically look up the first workspace agent instead.
## Problem
When a workspace is stopped and restarted, the workspace agent gets a
new ID. The `workspace_agent_id` stored on the chat at creation time
becomes stale, making the agent unreachable. This caused chats to break
after workspace restarts.
## Solution
Instead of persisting the agent ID, dynamically look up the first agent
from the workspace's latest build via
`GetWorkspaceAgentsInLatestBuildByWorkspaceID` whenever an agent
connection is needed. The `workspace_id` on the chat remains stable
across restarts.
This behavior may be refined later (e.g., agent selection heuristics),
but picking the first agent resolves the immediate breakage.
## Changes
- **Migration 000425**: Drop `workspace_agent_id` column from `chats`
- **SQL queries**: Remove `workspace_agent_id` from `InsertChat` and
`UpdateChatWorkspace`
- **chatd.go**: `getWorkspaceConn` and `resolveInstructions` now look up
agents dynamically from workspace ID
- **chatd.go**: Remove `refreshChatWorkspaceSnapshot` (no longer needed)
- **createworkspace.go**: Stop persisting agent ID when associating
workspace with chat
- **subagent.go**: Stop passing agent ID to child chats
- **SDK/frontend**: Remove `WorkspaceAgentID` / `workspace_agent_id`
from Chat type
---------
Co-authored-by: Kyle Carberry <kylecarbs@gmail.com>
Adds a nullable `last_error` column to the `chats` table so error
reasons survive page reloads.
**Backend:**
- Migration adds `last_error TEXT` (nullable) to chats
- `UpdateChatStatus` writes the error reason when status transitions to
`error`, clears it (NULL) on recovery
- `convertChat` maps `sql.NullString` to `*string` in the SDK
**Frontend:**
- Sidebar falls back to `chat.last_error` when no stream error reason is
cached
- Chat detail page does the same for `persistedErrorReason`
- Fixtures updated for new required field
## Summary
The UI has always labeled the action as "Archive agent" but the backend
was performing a hard `DELETE`, permanently destroying chats and all
their messages.
This change replaces the hard delete with a soft archive, consistent
with the pattern used by template versions.
## Changes
### Database
- **Migration 000423**: Add `archived boolean DEFAULT false NOT NULL`
column to `chats` table
- Replace `DeleteChatByID` query with `ArchiveChatByID` (`UPDATE SET
archived = true`)
- Add `UnarchiveChatByID` query (`UPDATE SET archived = false`)
- Filter archived chats from `GetChatsByOwnerID` (`WHERE archived =
false`)
### API
- Remove `DELETE /api/experimental/chats/{chat}`
- Add `POST /api/experimental/chats/{chat}/archive` — archives a chat
and all its descendants
- Add `POST /api/experimental/chats/{chat}/unarchive` — unarchives a
single chat (API only, no UI yet)
### Backend
- `archiveChatTree()` recursively archives child chats (replaces
`deleteChatTree()` which hard-deleted)
- Chat daemon's `ArchiveChat()` archives the full chat tree in a
transaction
- Authorization uses `ActionUpdate` instead of `ActionDelete`
### SDK
- Replace `DeleteChat()` with `ArchiveChat()` and `UnarchiveChat()`
- Add `Archived` field to `Chat` struct
### Frontend
- `archiveChat` API call uses `POST .../archive` instead of `DELETE`
- No UI changes — the "Archive agent" button now actually archives
instead of deleting
## Design Decision
This follows the **template version archive pattern** (Pattern B in the
codebase):
- `archived boolean` column (not `deleted boolean`)
- Dedicated `POST .../archive` and `POST .../unarchive` routes (not
repurposing `DELETE`)
- Reversible — users can unarchive via the API (UI for this will come
later)
This pull-request implements a simple filtering logic so that we're able
to pick which model the user actually used when logs were sent to AI
Bridge.
- Add `GET /aibridge/models` API endpoint that returns distinct model
names from AI Bridge interceptions, with pagination and search support
- New `ListAIBridgeModels` SQL query using case-sensitive prefix
matching (`LIKE model || '%'`) to allow B-tree index usage
- Hand-written `ListAuthorizedAIBridgeModels` in `modelqueries.go` for
RBAC authorization filter injection
- `AIBridgeModels` search query parser in searchquery/search.go
(defaults bare terms to the `model` field)
- dbauthz wrappers, dbmetrics, and dbmock implementations for the new
query
<img width="292" height="185" alt="image"
src="https://github.com/user-attachments/assets/134771df-2d26-4c54-acc4-27f58128b351"
/>
Relates to https://github.com/coder/internal/issues/1259
Adds new database queries and telemetry collection functions to gather
task lifecycle events (pause/resume cycles, idle time) for analytics.
Task events track pause/resume activity, idle duration before pausing,
paused duration, and time from resume to first app status, filtered to
recent activity based on the telemetry snapshot interval.
🤖 Created with Mux (Opus 4.6).
## Summary
Moves expired token filtering from client-side to server-side by adding
an `include_expired` parameter to the `GetAPIKeysByLoginType` and
`GetAPIKeysByUserID` database queries. This is more efficient for large
deployments with many expired/short-lived tokens.
## Changes
- Add `include_expired` parameter to SQL queries using `OR`
short-circuit
- Add `include_expired` query parameter to `GET
/users/{user}/keys/tokens`
- Add `IncludeExpired` field to `codersdk.TokensFilter`
- Remove client-side filtering from CLI `tokens list` command
- Add `TestTokensFilterExpired` test
Fixescoder/internal#1357
The provisioner state for a workspace build was being loaded for every
long-lived agent rpc connection. Since this state can be anywhere from
kilobytes to megabytes this can gradually cause the `coderd` memory
footprint to grow over time. It's also a lot of unnecessary allocations
for every query that fetches a workspace build since only a few callers
ever actually reference the provisioner state.
This PR removes it from the returned workspace build and adds a query to
fetch the provisioner state explicitly.
## Summary
Harden the OAuth2 provider with multiple security fixes addressing
`coder/security#121` (CSRF session takeover) and converge on OAuth 2.1
compliance.
### Security Fixes
| Fix | Description | Commits |
|-----|-------------|---------|
| **CSRF on `/oauth2/authorize`** | Enforce CSRF protection on the
authorize endpoint POST (consent form submission) | `ba7d646`, `b94a64e`
|
| **Clickjacking: `frame-ancestors` CSP** | Prevent consent page from
being iframed (`Content-Security-Policy: frame-ancestors 'none'` +
`X-Frame-Options: DENY`) | `597aeb2` |
| **Exact redirect URI matching** | Changed from prefix matching to full
string exact matching per OAuth 2.1 §4.1.2.1 | `73d64b1`, `93897f1` |
| **Store & verify `redirect_uri`** | Store redirect_uri with auth code
in DB, verify at token exchange matches exactly (RFC 6749 §4.1.3) |
`50569b9`, `d7ca315` |
| **Mandatory PKCE** | Require `code_challenge` at authorization (for
`response_type=code`) + unconditional `code_verifier` verification at
token exchange | `d7ca315`, `1cda1a9` |
| **Reject implicit grant** | `response_type=token` now returns
`unsupported_response_type` error page (OAuth 2.1 removes implicit flow)
| `d7ca315`, `91b8863` |
### Changes by File
**`coderd/httpmw/csrf.go`** — Extended the CSRF `ExemptFunc` to enforce
CSRF on `/oauth2/authorize` in addition to `/api` routes. The consent
form POST is now CSRF-protected to prevent cross-site authorization code
theft.
**`site/site.go`** — Added `Content-Security-Policy: frame-ancestors
'none'` and `X-Frame-Options: DENY` headers to `RenderOAuthAllowPage`
(consent page only — does not affect the SPA/global CSP used by AI
tasks).
**`coderd/httpapi/queryparams.go`** — Changed `RedirectURL` from prefix
matching (`strings.HasPrefix(v.Path, base.Path)`) to full URI exact
matching (`v.String() != base.String()`), comparing scheme, host, path,
and query.
**`coderd/oauth2provider/authorize.go`** — Added PKCE enforcement:
`code_challenge` is required when `response_type=code` (via a
conditional check, not `RequiredNotEmpty`, so `response_type=token` can
reach the explicit rejection path). `ShowAuthorizePage` (GET) validates
`response_type` before rendering and returns a 400 error page for
unsupported types. `ProcessAuthorize` (POST) stores the `redirect_uri`
with the auth code when explicitly provided.
**`coderd/oauth2provider/tokens.go`** — PKCE verification is now
unconditional (not gated on `code_challenge` being present in DB). If
the stored code has a `redirect_uri`, the token endpoint verifies it
matches exactly — mismatch returns `errBadCode` → `invalid_grant`.
Missing `code_verifier` returns `invalid_grant`.
**`codersdk/oauth2.go`** — `OAuth2ProviderResponseTypeToken` constant
and `Valid()` acceptance are **kept** so the authorize handler can parse
`response_type=token` and return the proper `unsupported_response_type`
error rather than failing at parameter validation.
**`coderd/database/migrations/000421_*`** — Added `redirect_uri text`
column to `oauth2_provider_app_codes`.
### Design Decisions
**`state` parameter remains optional** — The plan initially required
`state` via `RequiredNotEmpty`, but this was reverted in `376a753` to
avoid breaking existing clients. The `state` is still hashed and stored
when provided (via `state_hash` column), securing clients that opt in.
**`response_type=token` kept in `Valid()`** — Removing it from `Valid()`
would cause the parameter parser to reject the request before the
authorize handler can return the proper `unsupported_response_type`
error. The constant is kept for correct error handling flow.
**CSP scoped to consent page only** — `frame-ancestors 'none'` is set
only on the OAuth consent page renderer, not globally. The SPA/global
CSP was previously changed to allow framing for AI tasks
([#18102](https://github.com/coder/coder/pull/18102)); this change does
not regress that.
### Out of Scope (follow-up PRs)
- Bearer tokens in query strings (needs internal caller audit)
- Scope enforcement on OAuth2 tokens
- Rate limiting on dynamic client registration
---
<details>
<summary>📋 Implementation Plan</summary>
# Plan: Harden OAuth2 Provider — Security Fixes + OAuth 2.1 Compliance
## Context & Why
Security issue `coder/security#121` reports a critical session takeover
via CSRF on the OAuth2 provider. This plan covers all remaining security
fixes from that issue **plus** convergence on OAuth 2.1 requirements.
The goal is a single PR that closes all actionable gaps.
## Current State (already committed on branch `csrf-sjx1`)
| Fix | Status | Commits |
|-----|--------|---------|
| Fix 1: CSRF on `/oauth2/authorize` | ✅ Done | `ba7d646`, `b94a64e` |
| CSRF token in consent form HTML | ✅ Done | `b94a64e` |
| `state_hash` column + storage | ✅ Done (hash stored, but state still
optional) | `9167d83`, `b94a64e` |
| Tests for CSRF + state hash | ✅ Done | `e4119b5` |
## Remaining Work
### ~~Fix 2 — Require `state` parameter~~ (DROPPED)
> **Decision:** Do not enforce `state` as required. The `state`
parameter is still hashed and stored when provided (via
`hashOAuth2State` / `state_hash` column from prior commits), but clients
are not forced to supply it. This avoids breaking existing integrations
that omit state.
**Rollback:** Remove `"state"` from the `RequiredNotEmpty` call in
`coderd/oauth2provider/authorize.go:42`:
```go
// BEFORE (current on branch)
p.RequiredNotEmpty("response_type", "client_id", "state", "code_challenge")
// AFTER
p.RequiredNotEmpty("response_type", "client_id", "code_challenge")
```
No test changes needed — tests already pass `state` voluntarily.
### Fix 4 — Exact redirect URI matching
Currently `coderd/httpapi/queryparams.go:233` uses prefix matching:
```go
// CURRENT — prefix match
if v.Host != base.Host || !strings.HasPrefix(v.Path, base.Path) {
```
OAuth 2.1 requires **exact string matching**. Change to:
```go
// AFTER — exact match (OAuth 2.1 §4.1.2.1)
if v.Host != base.Host || v.Path != base.Path {
```
**File: `coderd/httpapi/queryparams.go` — `RedirectURL` method**
Also update the error message from "must be a subset of" to "must
exactly match".
**Additionally**, store `redirect_uri` with the auth code and verify at
the token endpoint (RFC 6749 §4.1.3):
1. **New migration** (same migration file or a new `000421`): Add
`redirect_uri text` column to `oauth2_provider_app_codes`
2. **Update INSERT query** in `coderd/database/queries/oauth2.sql` to
include `redirect_uri`
3. **`coderd/oauth2provider/authorize.go`**: Store
`params.redirectURL.String()` when inserting the code
4. **`coderd/oauth2provider/tokens.go`**: After retrieving the code from
DB, verify that `redirect_uri` from the token request matches the stored
value exactly. Currently `tokens.go:103` calls `p.RedirectURL(vals,
callbackURL, "redirect_uri")` for prefix validation only — it must
compare against the stored redirect_uri from the code, not just the
app's callback URL.
<details>
<summary>Why both exact match AND store+verify?</summary>
Exact matching at the authorize endpoint prevents open redirectors
(attacker can't use a sub-path).
Storing and verifying at the token endpoint prevents code injection — an
attacker who steals a code can't exchange it with a different
redirect_uri than was originally authorized. This is required by RFC
6749 §4.1.3 and OAuth 2.1.
</details>
### Fix 7 — `frame-ancestors` CSP on consent page
The consent page can be iframed by a workspace app (same-site), which is
the attack vector. Add a `Content-Security-Policy` header to prevent
framing.
**File: `site/site.go` — `RenderOAuthAllowPage` function (~line 731)**
Before writing the response, add:
```go
func RenderOAuthAllowPage(rw http.ResponseWriter, r *http.Request, data RenderOAuthAllowData) {
rw.Header().Set("Content-Type", "text/html; charset=utf-8")
// Prevent the consent page from being framed to mitigate
// clickjacking attacks (coder/security#121).
rw.Header().Set("Content-Security-Policy", "frame-ancestors 'none'")
rw.Header().Set("X-Frame-Options", "DENY")
...
```
Both headers for defense-in-depth (CSP for modern browsers,
X-Frame-Options for legacy).
### OAuth 2.1 — Mandatory PKCE
Currently PKCE is checked only when `code_challenge` was provided during
authorization (`tokens.go:258`):
```go
// CURRENT — conditional check
if dbCode.CodeChallenge.Valid && dbCode.CodeChallenge.String != "" {
// verify PKCE
}
```
OAuth 2.1 requires PKCE for ALL authorization code flows. Change to:
**File: `coderd/oauth2provider/authorize.go`** — Add `"code_challenge"`
to required params:
```go
p.RequiredNotEmpty("response_type", "client_id", "code_challenge")
```
**File: `coderd/oauth2provider/tokens.go:257-265`** — Make PKCE
verification unconditional:
```go
// AFTER — PKCE always required (OAuth 2.1)
if req.CodeVerifier == "" {
return codersdk.OAuth2TokenResponse{}, errInvalidPKCE
}
if !dbCode.CodeChallenge.Valid || dbCode.CodeChallenge.String == "" {
// Code was issued without a challenge — should not happen
// with the authorize endpoint enforcement, but defend in
// depth.
return codersdk.OAuth2TokenResponse{}, errInvalidPKCE
}
if !VerifyPKCE(dbCode.CodeChallenge.String, req.CodeVerifier) {
return codersdk.OAuth2TokenResponse{}, errInvalidPKCE
}
```
**File: `codersdk/oauth2.go`** — Remove
`OAuth2ProviderResponseTypeToken` from the enum or reject it explicitly
in the authorize handler. Currently it's defined at line 216 but the
handler ignores `response_type` and always issues a code. We should
either:
- (a) Remove the `"token"` variant from the enum and reject it with
`unsupported_response_type`, OR
- (b) Add an explicit check in `ProcessAuthorize` that rejects
`response_type=token`
Option (b) is simpler and more backwards-compatible:
```go
// In ProcessAuthorize, after extracting params:
if params.responseType != codersdk.OAuth2ProviderResponseTypeCode {
httpapi.WriteOAuth2Error(ctx, rw, http.StatusBadRequest,
codersdk.OAuth2ErrorCodeUnsupportedResponseType,
"Only response_type=code is supported")
return
}
```
### OAuth 2.1 — Bearer tokens in query strings
`coderd/httpmw/apikey.go:743` accepts `access_token` from URL query
parameters. OAuth 2.1 prohibits this. However, this may be used
internally (e.g., workspace apps, DERP). Need to audit callers before
removing.
**Approach:** This is a larger change with potential breakage. Mark as a
**separate follow-up issue** rather than including in this PR. Document
the finding.
### OAuth 2.1 — Removed flows
✅ **Already compliant.** `tokens.go` only supports `authorization_code`
and `refresh_token` grant types. The implicit grant
(`response_type=token`) will be explicitly rejected per the PKCE section
above.
### OAuth 2.1 — Refresh token rotation
✅ **Already compliant.** `tokens.go:442` deletes the old API key when a
refresh token is used.
## Migration Plan
All DB changes can go in a single new migration (or extend 000420 if the
branch is rebased before merge). Columns to add:
- `redirect_uri text` on `oauth2_provider_app_codes`
The `state_hash` column is already added by migration 000420.
## Implementation Order
1. **Fix 7** — CSP headers on consent page (isolated, no deps)
2. ~~**Fix 2** — Require `state` parameter~~ (DROPPED — state stays
optional)
3. **Fix 4** — Exact redirect URI matching + store/verify redirect_uri
4. **PKCE mandatory** — Require `code_challenge` + reject
`response_type=token`
5. **Rollback** — Remove `"state"` from `RequiredNotEmpty` in
`authorize.go`
6. **Tests** — Update/add tests for all changes
7. **`make gen`** after DB changes
## Out of Scope (separate PRs)
- Bearer tokens in query strings (needs internal caller audit)
- Scope enforcement on OAuth2 tokens
- Rate limiting / quota on dynamic client registration
</details>
---
_Generated with [`mux`](https://github.com/coder/mux) • Model:
`anthropic:claude-opus-4-6` • Thinking: `xhigh`_
Parent agents were re-using AuthInstanceID when spawning child agents.
This caused GetWorkspaceAgentByInstanceID to return the most recently
created sub agent instead of the parent when the parent tried to refetch
its own manifest.
Fix by not reusing AuthInstanceID for sub agents, and updating
GetWorkspaceAgentByInstanceID to filter them out entirely.
Adds coderd_template_workspace_build_duration_seconds histogram that
tracks the full duration from workspace build creation to agent ready.
This captures the complete user-perceived build time including
provisioning and agent startup.
The metric is emitted when the agent reports ready/error/timeout via the
lifecycle API, ensuring each build is counted exactly once per replica.
Previously, UpsertBoundaryUsageStats (INSERT...ON CONFLICT DO UPDATE) and
GetAndResetBoundaryUsageSummary (DELETE...RETURNING) could race during
telemetry period cutover. Without serialization, an upsert concurrent with the
delete could lose data (deleted right after being written) or commit after the
delete (miscounted in the next period). Both operations now acquire
LockIDBoundaryUsageStats within a transaction to ensure a clean cutover.
Task snapshots were orphaned when tasks were soft-deleted. The
`task_snapshots` table has an `ON DELETE CASCADE` foreign key, but
that only fires on hard deletes.
Modified DeleteTask to use a CTE that atomically soft-deletes the
task and removes its snapshot in a single transaction. The query now
returns just the task UUID instead of the full row.
Closescoder/internal#1283
Adds support for filtering workspaces by health status using
healthy:true or healthy:false in the search query.
This is done by changing `has-agent` to accept a list of statuses and
aliasing `health:true` to `has-agent:connected` and `healthy:false` to
`has-agent:timeout,disconnected`.
Fixes#21623
The AcquireProvisionerJob query only checked started_at IS NULL, allowing
it to acquire jobs that were canceled while pending (which have
completed_at set but started_at still NULL).
Added completed_at IS NULL check to the query to prevent this.
Also fixed JobCompleteBuilder.Do() in dbfake to set started_at when
completing jobs to match production behavior.
Fixescoder/internal#1323
Previously there were two issues that could cause incorrect boundary
usage telemetry data.
1. Bad handling across snapshot intervals: After telemetry snapshot deleted
the DB row, the next flush would INSERT the stale cumulative data (which
included already-reported usage). This would then be overwritten by
subsequent UPDATE flushes, causing the delta between the last snapshot
and the reset to be lost (under-reporting usage). Additionally, if there
was no new usage after the reset, the tracker would carry over all usage
from the previous period into the next period (over-reporting usage).
2. Missed usage from a race condition: Track() calls between the first
mutex unlock and second mutex lock in FlushToDB() were lost. The data
wasn't included in the current flush (already snapshotted) and was wiped
by the subsequent reset. This is likely low impact to overall usage
numbers in the real world.
Fix by tracking unique workspace/user deltas separately from cumulative
values and always tracking delta allowed/denied requests. Deltas are used
for INSERT (fresh start after reset), cumulative for UPDATE (accurate unique
counts within a period). All counters reset atomically before the DB operation
so Track() calls during the operation are preserved for the next flush.
Closes#21044
This pull-request addresses an issue we were seeing where we would
attempt to filter the `<UserCombobox />` by the users username or email
not their username (which the rendered options would show).
To highlight this I created three different users. Each with a username
that did not contain their `email` or `name` and attempted to filter.
Attempting to search for `John` wouldn't actually show the user as his
username was `x`, and infact whereas a subset of users might be returned
from the backend for having `john` in the `email` it would've been
filtered by the frontend for not being in the `name` field.
| Name | Username |
| --- | --- |
| `Jake` | `z` |
| `Jeff` | `y` |
| `John` | `x` |
| Previously | Now |
| --- | --- |
| <img width="560" height="547" alt="OLD_USER_COMBOBOX"
src="https://github.com/user-attachments/assets/a0567264-0034-42ac-aba0-95b05c4f92dd"
/> | <img width="580" height="548" alt="NEW_USER_COMBOBOX"
src="https://github.com/user-attachments/assets/1aa0c942-d340-4b1c-8dde-b97879525bfb"
/> |
Update provisionerdserver to handle the changes introduced to
provisionerd in https://github.com/coder/coder/pull/21602
We now create a relationship between `workspace_agent_devcontainers` and
`workspace_agents` with the newly created `subagent_id`.
feat: add boundary usage telemetry database schema and RBAC
Adds the foundation for tracking boundary usage telemetry across Coder
replicas. This includes:
- Database schema: `boundary_usage_stats` table with per-replica stats
(unique workspaces, unique users, allowed/denied request counts)
- Database queries: upsert stats, get aggregated summary, reset stats,
delete by replica ID
- RBAC: `boundary_usage` resource type with read/update/delete actions,
accessible only via system `BoundaryUsageTracker` subject (not regular
user roles)
- Tracker skeleton + docs: stub implementation in `coderd/boundaryusage/`
The tracker accumulates stats in memory and periodically flushes to the
database. Stats are aggregated across replicas for telemetry reporting,
then reset when a new reporting period begins. The tracker implementation
and plumbing will be done in a subsequent commit/PR.
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
This change adds a POST /workspaceagents/me/tasks/{task}/log-snapshot
endpoint for agents to upload task conversation history during
workspace shutdown. This allows users to view task logs even when the
workspace is stopped.
The endpoint accepts agentapi format payloads (typically last 10
messages, max 64KB), wraps them in a format envelope, and upserts to the
task_snapshots table. Uses agent token auth and validates the task
belongs to the agent's workspace.
Closescoder/internal#1253
Removes the legacy tailnet v1 API tables (`tailnet_clients`, `tailnet_agents`, `tailnet_client_subscriptions`) and their associated queries, triggers, and functions. These were superseded by the v2 tables (`tailnet_peers`, `tailnet_tunnels`) in migration 000168, and the v1 API code was removed in commit d6154c4310, but the database artifacts were never cleaned up.
**Changes:**
- New migration `000410_remove_tailnet_v1_tables` to drop the unused tables
- Removed 11 unused queries from `tailnet.sql`
- Removed associated manual wrapper methods in `dbauthz` and `dbmetrics`
- ~930 lines deleted across 11 files
Agents were losing authentication during workspace shutdown, causing
shutdown scripts to fail. The auth query required agents to belong to
the latest build, but during shutdown a `stop` build becomes latest while
the `start` build's agents are still running.
Modified the auth query to allow `start` build agents to authenticate
temporarily during `stop` execution. The query allows auth when:
- Agent's `start` build job succeeded
- Latest build is `stop` with `pending`/`running` job status
- Builds are adjacent (`stop` is `build_number + 1`)
- Template versions match
Auth closes once `stop` completes.
Renamed `GetWorkspaceAgentAndLatestBuildByAuthToken` to
`GetAuthenticatedWorkspaceAgentAndBuildByAuthToken` since it returns the
agent's build (not always latest) during shutdown.
Closes coder/internal#1249
Fixes#19467
Relates to https://github.com/coder/internal/issues/1214
The `ExtractWorkspaceAgentParam` middleware ends up making 4 database
queries to follow the chain of `WorkspaceAgent` -> `WorkspaceResource`
-> `ProvisionerJob` -> `WorkspaceBuild` -- but then dropping all that
hard work on the floor. The `api.workspaceAgent` handler that references
this middleware then has to do all of that work again, plus one more
query to get the related `User` so we can get the username. This pattern
is also mirrored in `getDatabaseTerminal` but without the middleware.
This PR:
* Adds a new query `GetWorkspaceAgentAndWorkspaceByID` to fetch all
this information at once to avoid the multiple round-trips,
* Updates the existing usage of `GetWorkspaceAgentByID` to this new
query instead,
* Updates `ExtractWorkspaceAgentParam` to also store the workspace in
the request context
Dalibo: [0.63ms](https://explain.dalibo.com/plan/40bb597f3539gc6c)
Adds a per-organization setting to disable workspace sharing. When enabled,
all existing workspace ACLs in the organization are cleared and the workspace
ACL mutation API endpoints return `403 Forbidden`.
This complements the existing site-wide `--disable-workspace-sharing` flag by
providing more granular control at the organization level.
Closes https://github.com/coder/internal/issues/1073 (part 2)
---------
Co-authored-by: Steven Masley <Emyrk@users.noreply.github.com>
fixes#21303
Update user last_seen_at when we mark them active on login. This prevents a narrow race where they can be re-marked dormant and fail to log in.
closes: https://github.com/coder/internal/issues/858
Similar to https://github.com/coder/coder/pull/19375, this one uses
system permissions for fetching actual user and group data.
Modifies the `workspaces_expanded` view to fetch the required data; this way it's made available to all code paths that make use of it.
Also fixes a bug in a test helper function that can result in `null` being saved to the DB for `user_acl` or `group_acl` and break tests; a defensive check constraint that prevents this is worth a PR, e.g:
`ALTER TABLE workspaces
ADD CONSTRAINT group_acl_is_object CHECK (jsonb_typeof(group_acl) = 'object');`
Also adds missing `OwnerName` in `ConvertWorkspaceRows`.
Previously the GetTemplateVersionVariables query did not sort output,
relying on PostgreSQL on-disk ordering which is undeterministic.
Variables are now sorted by name because there is no alternative for
ordering.
Tests were adjusted to accommodate the new ordering, previously they
relied on data being written to disk in insert order.