Use typed atomics (atomic.Int64, atomic.Int32, etc.) in test files to prevent
mixing atomic and non-atomic access on the same value, guarantee 64-bit
alignment on 32-bit platforms, and provide a cleaner API.
- Adds chat-related dbgen generators covering defaults, overrides, and message field mapping.
- Replaces raw single-row chat, message, provider, and model-config setup in tests with dbgen helpers.
- Simplifies chat seed helpers after moving fixture setup into dbgen.
> Generated with [Coder Agents](https://coder.com/agents).
Add a `chat_client_type` enum (`ui` | `api`) and `client_type` column to
the `chats` table. The column defaults to `api` for new rows so API
callers don't need to set it explicitly. Existing rows are backfilled to
`ui`.
The field flows through `CreateChatRequest`, `chatd.CreateOptions`,
`InsertChat`, and is returned in the `Chat` response via `db2sdk`.
<details>
<summary>Implementation notes (Coder Agents generated)</summary>
### Changes
**Database migration (000469)**
- New enum `chat_client_type` with values `ui`, `api`.
- New `client_type` column, `NOT NULL DEFAULT 'api'`.
- Backfill: `UPDATE chats SET client_type = 'ui'`.
**SQL query** — `InsertChat` now includes `client_type`.
**SDK** — `ChatClientType` type added; `ClientType` field added to both
`CreateChatRequest` (optional, defaults server-side to `api`) and `Chat`
response.
**Handler** — `postChats` maps the request field (defaulting to `api`)
and passes it through `chatd.CreateOptions`.
**Sub-agent** — Child chats inherit their parent's `client_type`.
**db2sdk** — Maps the database value to the SDK type.
### Decision log
- Default is `api` (not `ui`) so existing API integrations get the
correct value without code changes.
- Backfill sets existing rows to `ui` per requirement.
- Child chats inherit `client_type` from parent rather than defaulting.
</details>
Fixes https://github.com/coder/internal/issues/1436
* Adds organization_id to chats with backfill (workspace org → user org membership → default org)
* No support yet for ACLs (follow-up issue)
- Cross-org workspace binding rejected (both in `CreateChatRequest` and in `create_workspace` tool
- Adds `OrganizationAutocomplete` to `AgentCreateForm`
- Docs updated with `organization_id` in chats-api.md
> 🤖 Written by a Coder Agent. Reviewed by many humans and many agents.
---------
Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
Previously, `CreateChat` inserted the `chats` row with the DB default
status (`waiting`), then updated it to `pending` in the same transaction
via `setChatPendingWithStore`. This wasted two extra queries per chat
creation (`GetChatByID` + `UpdateChatStatus`) and rewrote the same row
immediately after inserting it.
Now `CreateChat` passes the status directly to `InsertChat`, so the row
is written once in its final create-time state. The
`setChatPendingWithStore` helper is removed entirely. `InsertChat` now
requires an explicit `status` parameter at all callsites instead of
relying on a DB column default.
## Motivation
On an experimental branch we're trialing firing all chatd notifications
from plpgsql triggers. The old two-step insert made that awkward: in an
`AFTER INSERT` trigger, `NEW` only contained the insert-time row
(`waiting`), not the final committed state (`pending`). To emit the
correct event payload the trigger had to be deferred and re-read the row
from `chats` at commit time.
With this change, `NEW` already contains the correct row to publish — no
deferred trigger, no extra `SELECT`, simpler and cheaper trigger logic.
That said, this seems like a worthwhile change regardless of the trigger
experiment: writing the final row state once removes unnecessary DB work
on every chat creation and makes the create path easier to reason about.
Replace overly-broad `AsSystemRestricted` with purpose-built actors:
- **OAuth2 provider paths** → `AsSystemOAuth2` (13 call sites across
`tokens.go`, `registration.go`, `apikey.go`)
- **Provisioner daemon health read** → `AsSystemReadProvisionerDaemons`
(1 site in `healthcheck/provisioner.go`)
- **Provisionerd file cache paths** → `AsProvisionerd` (2 sites in
`provisionerdserver.go`, matching existing usage nearby)
<details>
<summary>Implementation notes</summary>
Each replacement actor is a strict subset of `AsSystemRestricted`. Every
DB method
at each call site is already covered by the narrower actor's
permissions:
- `subjectSystemOAuth2`: OAuth2App/Secret/CodeToken (all), ApiKey (Read,
Delete), User (Read), Organization (Read)
- `subjectSystemReadProvisionerDaemons`: ProvisionerDaemon (Read)
- `subjectProvisionerd`: File (Create, Read) plus provisionerd-scoped
resources
No new permissions added. `nolint:gocritic` comments updated to reflect
the new actors.
</details>
> 🤖 Created by a Coder Agent, reviewed by me.
## Changes
- **Commit 1**: Remove 17 unnecessary `//nolint` directives:
- `//nolint:varnamelen` — linter not active
- `//nolint:unused` on exported `SlimUnsupported`
- `//nolint:govet` in `coderd/httpmw/csrf` — no longer fires
- `//nolint:revive` on functions refactored since the nolint was added
- `//nolint:paralleltest` citing Go 1.22 loop variable capture
(obsolete)
- Bare `//nolint` narrowed to specific `//nolint:gocritic` with
justification
- **Commit 2**: Fix root causes behind 5 dangerous nolint suppressions:
- Add `MinVersion: tls.VersionTLS12` to TLS client config (removes
`gosec` G402)
- Delete trivial unexported wrappers `apiKey()`/`normalizeProvider()` in
chatprovider (removes `revive` confusing-naming)
- Add doc comments to `StartWithAssert` and `Router` (removes `revive`
exported)
- Rename unused parameters to `_` in integration test helpers
> 🤖 This PR was created using Coder Agents and reviewed by me.
## Issue context
On `dev.coder.com`, users could successfully log in, briefly see the web
UI, and then get redirected back to `/login`.
We traced the most reliable repro to viewing Tracy's workspaces on the
`/workspaces` page. That page eagerly issues authenticated per-row
requests such as:
- `POST /api/v2/authcheck`
- `GET /api/v2/workspacebuilds/:workspacebuild/parameters`
One confirmed failing request was for Tracy's workspace
`nav-scroll-fix-1f6b`:
- route: `GET
/api/v2/workspacebuilds/f2104ae6-7d53-457c-a8df-de831bee76db/parameters`
- build owner/workspace: `tracy/nav-scroll-fix-1f6b`
The failing response body was:
- message: `An internal error occurred. Please try again or contact the
system administrator.`
- detail: `Internal error fetching API key by id. fetch object: pq:
password authentication failed for user "coder"`
That showed the request was not actually unauthorized. The server hit an
internal database/authentication problem while resolving the session API
key. The underlying issue was that DB password rotation had been
enabled, it has since been disabled.
However, the logout cascade happened because:
1. `APIKeyFromRequest()` returned `ok=false` for both genuine auth
failures and internal backend failures.
2. `ValidateAPIKey()` wrapped every `!ok` result as `401 Unauthorized`.
3. `RequireAuth.tsx` signs the user out on any `401` response.
So a transient backend/database failure was being misreported as an auth
failure, which made the client forcibly log the user out.
A useful extra clue was that the installed PWA did not repro. The PWA
starts on `/agents`, which avoids the `/workspaces` request fan-out.
That helped narrow the problem to the eager authenticated requests on
the workspace list rather than to cookies or the login flow itself.
## What changed
This PR now fixes the bug without changing the exported
`APIKeyFromRequest()` surface:
- `ValidateAPIKey()` now uses a new internal helper that returns a typed
`ValidateAPIKeyError`
- the exported `APIKeyFromRequest()` helper remains compatible for
existing callers like `userauth.go`
- internal API-key lookup failures are classified as `500 Internal
Server Error` plus `Hard: true`
- internal `UserRBACSubject()` failures now return `500 Internal Server
Error` instead of `401 Unauthorized`
- a focused regression test verifies that an internal `GetAPIKeyByID`
failure surfaces as `500`
This removes the brittle message-based classification and makes the
internal-auth-failure path robust for all API-key lookup failures
handled by auth middleware.
## Problem
Rate limiting by user is broken (#20857). The rate limit middleware runs
before API key extraction, so user ID is never in the request context.
This causes:
- Rate limiting falls back to IP address for all requests
- `X-Coder-Bypass-Ratelimit` header for Owners is ignored (can't verify
role without identity)
## Solution
Adds `PrecheckAPIKey`, a **root-level middleware** that fully validates
the API key on every request (expiry, OIDC refresh, DB updates, role
lookup) and stores the result in context. Added **once** at the root
router — not duplicated per route group.
### Architecture
```
Request → Root middleware stack:
→ ExtractRealIP, Logger, ...
→ PrecheckAPIKey(...) ← validates key, stores result, never rejects
→ HandleSubdomain(apiRateLimiter) ← workspace apps now also benefit
→ CORS, CSRF
→ /api/v2 or /api/experimental:
→ apiRateLimiter ← reads prechecked result from context
→ route handlers:
→ ExtractAPIKeyMW ← reuses prechecked data, adds route-specific logic
→ handler
```
### Key design decisions
| Decision | Rationale |
|---|---|
| **Full validation, not lightweight** | Spike's review: "the whole idea
of a 'lightweight' extraction that skips security checks is
fundamentally flawed." Only fully validated keys are used for rate
limiting — expired/invalid keys fall back to IP. |
| **Structured error results** | `ValidateAPIKeyError` has a `Hard` flag
that maps to `write` vs `optionalWrite`. Hard errors (5xx, OAuth refresh
failures) surface even on optional-auth routes. Soft errors
(missing/expired token) are swallowed on optional routes. |
| **Added once at the root** | Spike's review: "Why can't we add it once
at the root?" Root placement means workspace app rate limiters also
benefit. |
| **Skip prechecked when `SessionTokenFunc != nil`** |
`workspaceapps/db.go` uses a custom `SessionTokenFunc` that extracts
from `issueReq.SessionToken`. The prechecked result may have validated a
different token. Falls back to `ValidateAPIKey` with the custom func. |
| **User status check stays in `ExtractAPIKey`** | Dormant activation is
route-specific — `ValidateAPIKey` stores status but doesn't enforce it.
|
| **Audience validation stays in `ExtractAPIKey`** | Depends on
`cfg.AccessURL` and request path, uses `optionalWrite(403)` which
depends on route config. |
### Changes
- **`coderd/httpmw/apikey.go`**:
- New `ValidateAPIKey` function — extracted core validation logic,
returns structured errors instead of writing HTTP responses
- New `PrecheckAPIKey` middleware — calls `ValidateAPIKey`, stores
result in `apiKeyPrecheckedContextKey`, never rejects
- New types: `ValidateAPIKeyConfig`, `ValidateAPIKeyResult`,
`ValidateAPIKeyError`, `APIKeyPrechecked`
- Refactored `ExtractAPIKey` — consumes prechecked result from context
(skipping redundant validation), falls back to `ValidateAPIKey` when no
precheck available
- Removed `ExtractAPIKeyForRateLimit` and `preExtractedAPIKey`
- **`coderd/httpmw/ratelimit.go`**: Rate limiter checks
`apiKeyPrecheckedContextKey` first, then `apiKeyContextKey` fallback
(for unit tests / workspace apps), then IP
- **`coderd/coderd.go`**: Added `PrecheckAPIKey` once at root
`r.Use(...)` block, removed `ExtractAPIKeyForRateLimit` from `/api/v2`
and `/api/experimental`
- **`coderd/coderd_test.go`**: `TestRateLimitByUser` regression test
with `BypassOwner` subtest
Fixes#20857
## Summary
Remove the `workspace_agent_id` column from the `chats` table and
dynamically look up the first workspace agent instead.
## Problem
When a workspace is stopped and restarted, the workspace agent gets a
new ID. The `workspace_agent_id` stored on the chat at creation time
becomes stale, making the agent unreachable. This caused chats to break
after workspace restarts.
## Solution
Instead of persisting the agent ID, dynamically look up the first agent
from the workspace's latest build via
`GetWorkspaceAgentsInLatestBuildByWorkspaceID` whenever an agent
connection is needed. The `workspace_id` on the chat remains stable
across restarts.
This behavior may be refined later (e.g., agent selection heuristics),
but picking the first agent resolves the immediate breakage.
## Changes
- **Migration 000425**: Drop `workspace_agent_id` column from `chats`
- **SQL queries**: Remove `workspace_agent_id` from `InsertChat` and
`UpdateChatWorkspace`
- **chatd.go**: `getWorkspaceConn` and `resolveInstructions` now look up
agents dynamically from workspace ID
- **chatd.go**: Remove `refreshChatWorkspaceSnapshot` (no longer needed)
- **createworkspace.go**: Stop persisting agent ID when associating
workspace with chat
- **subagent.go**: Stop passing agent ID to child chats
- **SDK/frontend**: Remove `WorkspaceAgentID` / `workspace_agent_id`
from Chat type
---------
Co-authored-by: Kyle Carberry <kylecarbs@gmail.com>
## Summary
Harden the OAuth2 provider with multiple security fixes addressing
`coder/security#121` (CSRF session takeover) and converge on OAuth 2.1
compliance.
### Security Fixes
| Fix | Description | Commits |
|-----|-------------|---------|
| **CSRF on `/oauth2/authorize`** | Enforce CSRF protection on the
authorize endpoint POST (consent form submission) | `ba7d646`, `b94a64e`
|
| **Clickjacking: `frame-ancestors` CSP** | Prevent consent page from
being iframed (`Content-Security-Policy: frame-ancestors 'none'` +
`X-Frame-Options: DENY`) | `597aeb2` |
| **Exact redirect URI matching** | Changed from prefix matching to full
string exact matching per OAuth 2.1 §4.1.2.1 | `73d64b1`, `93897f1` |
| **Store & verify `redirect_uri`** | Store redirect_uri with auth code
in DB, verify at token exchange matches exactly (RFC 6749 §4.1.3) |
`50569b9`, `d7ca315` |
| **Mandatory PKCE** | Require `code_challenge` at authorization (for
`response_type=code`) + unconditional `code_verifier` verification at
token exchange | `d7ca315`, `1cda1a9` |
| **Reject implicit grant** | `response_type=token` now returns
`unsupported_response_type` error page (OAuth 2.1 removes implicit flow)
| `d7ca315`, `91b8863` |
### Changes by File
**`coderd/httpmw/csrf.go`** — Extended the CSRF `ExemptFunc` to enforce
CSRF on `/oauth2/authorize` in addition to `/api` routes. The consent
form POST is now CSRF-protected to prevent cross-site authorization code
theft.
**`site/site.go`** — Added `Content-Security-Policy: frame-ancestors
'none'` and `X-Frame-Options: DENY` headers to `RenderOAuthAllowPage`
(consent page only — does not affect the SPA/global CSP used by AI
tasks).
**`coderd/httpapi/queryparams.go`** — Changed `RedirectURL` from prefix
matching (`strings.HasPrefix(v.Path, base.Path)`) to full URI exact
matching (`v.String() != base.String()`), comparing scheme, host, path,
and query.
**`coderd/oauth2provider/authorize.go`** — Added PKCE enforcement:
`code_challenge` is required when `response_type=code` (via a
conditional check, not `RequiredNotEmpty`, so `response_type=token` can
reach the explicit rejection path). `ShowAuthorizePage` (GET) validates
`response_type` before rendering and returns a 400 error page for
unsupported types. `ProcessAuthorize` (POST) stores the `redirect_uri`
with the auth code when explicitly provided.
**`coderd/oauth2provider/tokens.go`** — PKCE verification is now
unconditional (not gated on `code_challenge` being present in DB). If
the stored code has a `redirect_uri`, the token endpoint verifies it
matches exactly — mismatch returns `errBadCode` → `invalid_grant`.
Missing `code_verifier` returns `invalid_grant`.
**`codersdk/oauth2.go`** — `OAuth2ProviderResponseTypeToken` constant
and `Valid()` acceptance are **kept** so the authorize handler can parse
`response_type=token` and return the proper `unsupported_response_type`
error rather than failing at parameter validation.
**`coderd/database/migrations/000421_*`** — Added `redirect_uri text`
column to `oauth2_provider_app_codes`.
### Design Decisions
**`state` parameter remains optional** — The plan initially required
`state` via `RequiredNotEmpty`, but this was reverted in `376a753` to
avoid breaking existing clients. The `state` is still hashed and stored
when provided (via `state_hash` column), securing clients that opt in.
**`response_type=token` kept in `Valid()`** — Removing it from `Valid()`
would cause the parameter parser to reject the request before the
authorize handler can return the proper `unsupported_response_type`
error. The constant is kept for correct error handling flow.
**CSP scoped to consent page only** — `frame-ancestors 'none'` is set
only on the OAuth consent page renderer, not globally. The SPA/global
CSP was previously changed to allow framing for AI tasks
([#18102](https://github.com/coder/coder/pull/18102)); this change does
not regress that.
### Out of Scope (follow-up PRs)
- Bearer tokens in query strings (needs internal caller audit)
- Scope enforcement on OAuth2 tokens
- Rate limiting on dynamic client registration
---
<details>
<summary>📋 Implementation Plan</summary>
# Plan: Harden OAuth2 Provider — Security Fixes + OAuth 2.1 Compliance
## Context & Why
Security issue `coder/security#121` reports a critical session takeover
via CSRF on the OAuth2 provider. This plan covers all remaining security
fixes from that issue **plus** convergence on OAuth 2.1 requirements.
The goal is a single PR that closes all actionable gaps.
## Current State (already committed on branch `csrf-sjx1`)
| Fix | Status | Commits |
|-----|--------|---------|
| Fix 1: CSRF on `/oauth2/authorize` | ✅ Done | `ba7d646`, `b94a64e` |
| CSRF token in consent form HTML | ✅ Done | `b94a64e` |
| `state_hash` column + storage | ✅ Done (hash stored, but state still
optional) | `9167d83`, `b94a64e` |
| Tests for CSRF + state hash | ✅ Done | `e4119b5` |
## Remaining Work
### ~~Fix 2 — Require `state` parameter~~ (DROPPED)
> **Decision:** Do not enforce `state` as required. The `state`
parameter is still hashed and stored when provided (via
`hashOAuth2State` / `state_hash` column from prior commits), but clients
are not forced to supply it. This avoids breaking existing integrations
that omit state.
**Rollback:** Remove `"state"` from the `RequiredNotEmpty` call in
`coderd/oauth2provider/authorize.go:42`:
```go
// BEFORE (current on branch)
p.RequiredNotEmpty("response_type", "client_id", "state", "code_challenge")
// AFTER
p.RequiredNotEmpty("response_type", "client_id", "code_challenge")
```
No test changes needed — tests already pass `state` voluntarily.
### Fix 4 — Exact redirect URI matching
Currently `coderd/httpapi/queryparams.go:233` uses prefix matching:
```go
// CURRENT — prefix match
if v.Host != base.Host || !strings.HasPrefix(v.Path, base.Path) {
```
OAuth 2.1 requires **exact string matching**. Change to:
```go
// AFTER — exact match (OAuth 2.1 §4.1.2.1)
if v.Host != base.Host || v.Path != base.Path {
```
**File: `coderd/httpapi/queryparams.go` — `RedirectURL` method**
Also update the error message from "must be a subset of" to "must
exactly match".
**Additionally**, store `redirect_uri` with the auth code and verify at
the token endpoint (RFC 6749 §4.1.3):
1. **New migration** (same migration file or a new `000421`): Add
`redirect_uri text` column to `oauth2_provider_app_codes`
2. **Update INSERT query** in `coderd/database/queries/oauth2.sql` to
include `redirect_uri`
3. **`coderd/oauth2provider/authorize.go`**: Store
`params.redirectURL.String()` when inserting the code
4. **`coderd/oauth2provider/tokens.go`**: After retrieving the code from
DB, verify that `redirect_uri` from the token request matches the stored
value exactly. Currently `tokens.go:103` calls `p.RedirectURL(vals,
callbackURL, "redirect_uri")` for prefix validation only — it must
compare against the stored redirect_uri from the code, not just the
app's callback URL.
<details>
<summary>Why both exact match AND store+verify?</summary>
Exact matching at the authorize endpoint prevents open redirectors
(attacker can't use a sub-path).
Storing and verifying at the token endpoint prevents code injection — an
attacker who steals a code can't exchange it with a different
redirect_uri than was originally authorized. This is required by RFC
6749 §4.1.3 and OAuth 2.1.
</details>
### Fix 7 — `frame-ancestors` CSP on consent page
The consent page can be iframed by a workspace app (same-site), which is
the attack vector. Add a `Content-Security-Policy` header to prevent
framing.
**File: `site/site.go` — `RenderOAuthAllowPage` function (~line 731)**
Before writing the response, add:
```go
func RenderOAuthAllowPage(rw http.ResponseWriter, r *http.Request, data RenderOAuthAllowData) {
rw.Header().Set("Content-Type", "text/html; charset=utf-8")
// Prevent the consent page from being framed to mitigate
// clickjacking attacks (coder/security#121).
rw.Header().Set("Content-Security-Policy", "frame-ancestors 'none'")
rw.Header().Set("X-Frame-Options", "DENY")
...
```
Both headers for defense-in-depth (CSP for modern browsers,
X-Frame-Options for legacy).
### OAuth 2.1 — Mandatory PKCE
Currently PKCE is checked only when `code_challenge` was provided during
authorization (`tokens.go:258`):
```go
// CURRENT — conditional check
if dbCode.CodeChallenge.Valid && dbCode.CodeChallenge.String != "" {
// verify PKCE
}
```
OAuth 2.1 requires PKCE for ALL authorization code flows. Change to:
**File: `coderd/oauth2provider/authorize.go`** — Add `"code_challenge"`
to required params:
```go
p.RequiredNotEmpty("response_type", "client_id", "code_challenge")
```
**File: `coderd/oauth2provider/tokens.go:257-265`** — Make PKCE
verification unconditional:
```go
// AFTER — PKCE always required (OAuth 2.1)
if req.CodeVerifier == "" {
return codersdk.OAuth2TokenResponse{}, errInvalidPKCE
}
if !dbCode.CodeChallenge.Valid || dbCode.CodeChallenge.String == "" {
// Code was issued without a challenge — should not happen
// with the authorize endpoint enforcement, but defend in
// depth.
return codersdk.OAuth2TokenResponse{}, errInvalidPKCE
}
if !VerifyPKCE(dbCode.CodeChallenge.String, req.CodeVerifier) {
return codersdk.OAuth2TokenResponse{}, errInvalidPKCE
}
```
**File: `codersdk/oauth2.go`** — Remove
`OAuth2ProviderResponseTypeToken` from the enum or reject it explicitly
in the authorize handler. Currently it's defined at line 216 but the
handler ignores `response_type` and always issues a code. We should
either:
- (a) Remove the `"token"` variant from the enum and reject it with
`unsupported_response_type`, OR
- (b) Add an explicit check in `ProcessAuthorize` that rejects
`response_type=token`
Option (b) is simpler and more backwards-compatible:
```go
// In ProcessAuthorize, after extracting params:
if params.responseType != codersdk.OAuth2ProviderResponseTypeCode {
httpapi.WriteOAuth2Error(ctx, rw, http.StatusBadRequest,
codersdk.OAuth2ErrorCodeUnsupportedResponseType,
"Only response_type=code is supported")
return
}
```
### OAuth 2.1 — Bearer tokens in query strings
`coderd/httpmw/apikey.go:743` accepts `access_token` from URL query
parameters. OAuth 2.1 prohibits this. However, this may be used
internally (e.g., workspace apps, DERP). Need to audit callers before
removing.
**Approach:** This is a larger change with potential breakage. Mark as a
**separate follow-up issue** rather than including in this PR. Document
the finding.
### OAuth 2.1 — Removed flows
✅ **Already compliant.** `tokens.go` only supports `authorization_code`
and `refresh_token` grant types. The implicit grant
(`response_type=token`) will be explicitly rejected per the PKCE section
above.
### OAuth 2.1 — Refresh token rotation
✅ **Already compliant.** `tokens.go:442` deletes the old API key when a
refresh token is used.
## Migration Plan
All DB changes can go in a single new migration (or extend 000420 if the
branch is rebased before merge). Columns to add:
- `redirect_uri text` on `oauth2_provider_app_codes`
The `state_hash` column is already added by migration 000420.
## Implementation Order
1. **Fix 7** — CSP headers on consent page (isolated, no deps)
2. ~~**Fix 2** — Require `state` parameter~~ (DROPPED — state stays
optional)
3. **Fix 4** — Exact redirect URI matching + store/verify redirect_uri
4. **PKCE mandatory** — Require `code_challenge` + reject
`response_type=token`
5. **Rollback** — Remove `"state"` from `RequiredNotEmpty` in
`authorize.go`
6. **Tests** — Update/add tests for all changes
7. **`make gen`** after DB changes
## Out of Scope (separate PRs)
- Bearer tokens in query strings (needs internal caller audit)
- Scope enforcement on OAuth2 tokens
- Rate limiting / quota on dynamic client registration
</details>
---
_Generated with [`mux`](https://github.com/coder/mux) • Model:
`anthropic:claude-opus-4-6` • Thinking: `xhigh`_
Since Go 1.22, the loop variable capture issue is resolved. Variables
declared by for loops are now per-iteration rather than per-loop, making
the 'v := v' pattern unnecessary.
Operators need to know which API key was used in HTTP requests.
For example, if a key is leaking and a DDOS is underway using that key, operators need a way to identify the key in use and take steps to expire the key (see https://github.com/coder/coder/issues/21782).
_Disclaimer: created using Claude Opus 4.5_
Agents were losing authentication during workspace shutdown, causing
shutdown scripts to fail. The auth query required agents to belong to
the latest build, but during shutdown a `stop` build becomes latest while
the `start` build's agents are still running.
Modified the auth query to allow `start` build agents to authenticate
temporarily during `stop` execution. The query allows auth when:
- Agent's `start` build job succeeded
- Latest build is `stop` with `pending`/`running` job status
- Builds are adjacent (`stop` is `build_number + 1`)
- Template versions match
Auth closes once `stop` completes.
Renamed `GetWorkspaceAgentAndLatestBuildByAuthToken` to
`GetAuthenticatedWorkspaceAgentAndBuildByAuthToken` since it returns the
agent's build (not always latest) during shutdown.
Closes coder/internal#1249
Fixes#19467
Relates to https://github.com/coder/internal/issues/1214
The `ExtractWorkspaceAgentParam` middleware ends up making 4 database
queries to follow the chain of `WorkspaceAgent` -> `WorkspaceResource`
-> `ProvisionerJob` -> `WorkspaceBuild` -- but then dropping all that
hard work on the floor. The `api.workspaceAgent` handler that references
this middleware then has to do all of that work again, plus one more
query to get the related `User` so we can get the username. This pattern
is also mirrored in `getDatabaseTerminal` but without the middleware.
This PR:
* Adds a new query `GetWorkspaceAgentAndWorkspaceByID` to fetch all
this information at once to avoid the multiple round-trips,
* Updates the existing usage of `GetWorkspaceAgentByID` to this new
query instead,
* Updates `ExtractWorkspaceAgentParam` to also store the workspace in
the request context
Dalibo: [0.63ms](https://explain.dalibo.com/plan/40bb597f3539gc6c)
Extracts part of the prometheus middleware that stores the route
information in the request context into its own middleware. Also adds
request method information to context.
Relates to https://github.com/coder/internal/issues/1214
Add comprehensive OAuth2 enum types to codersdk following RFC specifications:
- OAuth2ProviderGrantType (RFC 6749)
- OAuth2ProviderResponseType (RFC 6749)
- OAuth2TokenEndpointAuthMethod (RFC 7591)
- OAuth2PKCECodeChallengeMethod (RFC 7636)
- OAuth2TokenType (RFC 6749, RFC 9449)
- OAuth2RevocationTokenTypeHint (RFC 7009)
- OAuth2ErrorCode (RFC 6749, RFC 7009, RFC 8707)
Add OAuth2TokenRequest, OAuth2TokenResponse, OAuth2TokenRevocationRequest,
and OAuth2Error structs to the SDK. Update OAuth2ClientRegistrationRequest,
OAuth2ClientRegistrationResponse, OAuth2ClientConfiguration, and
OAuth2AuthorizationServerMetadata to use typed enums instead of raw strings.
This makes codersdk the single source of truth for OAuth2 types, eliminating
duplication between SDK and server-side structs.
Closes#21476
Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example:
```
import (
"context"
"time"
"github.com/prometheus/client_golang/prometheus"
"golang.org/x/xerrors"
"gopkg.in/natefinch/lumberjack.v2"
"cdr.dev/slog/v3"
"github.com/coder/coder/v2/codersdk/agentsdk"
"github.com/coder/serpent"
)
```
3 groups: standard library, 3rd partly libs, Coder libs.
This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.
Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptible call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder).
It also updates dependencies that also use slog and were updated.
I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule.
Other dependencies, I pushed new tags.
Related to
[`internal#1139`](https://github.com/coder/internal/issues/1139)
Continuation of #21074
This implements some RBAC role specificity for `dbpurge`, ensuring that
we follow the least-privileged model for removing data from the
database. It is specified as following.
```go
Site: rbac.Permissions(map[string][]policy.Action{
// DeleteOldWorkspaceAgentLogs
// DeleteOldWorkspaceAgentStats
// DeleteOldProvisionerDaemons
// DeleteOldTelemetryLocks
// DeleteOldAuditLogConnectionEvents
// DeleteOldConnectionLogs
rbac.ResourceSystem.Type: {policy.ActionDelete},
// DeleteOldNotificationMessages
rbac.ResourceNotificationMessage.Type: {policy.ActionDelete},
// ExpirePrebuildsAPIKeys
// DeleteExpiredAPIKeys
rbac.ResourceApiKey.Type: {policy.ActionDelete},
// DeleteOldAIBridgeRecords
rbac.ResourceAibridgeInterception.Type: {policy.ActionDelete},
}),
```
| Position | Pull-request |
| -------- | ------------ |
| | [feat: add prometheus observability metrics for
`dbpurge`](https://github.com/coder/coder/pull/21074) |
| ✅ | [feat: add rbac specificity for
`dbpurge`](https://github.com/coder/coder/pull/21088) |
I noticed while looking at scale test metrics that we don't always
report a useful path in the API request metrics.

There are a lot of requests with path `/*`. I chased this problem to the
workspace proxy, where we mount a the proxy router as a child of a
"root" router to support some high level endpoints like `latency-check`.
Because we query the path from the Chi route context in the prometheus
middleware _before_ the request is actually handled, we can have a
partially resolved pattern match only corresponding to the root router.
The fix is to always re-resolve the path, rather than accept a partially
resolved path.
**Breaking Change:** Existing oauth apps might now use PKCE. If an
unknown IdP type was being used, and it does not support PKCE, it will
break.
To fix, set the PKCE methods on the external auth to `none`
```
export CODER_EXTERNAL_AUTH_1_PKCE_METHODS=none
```
## Summary
This adds configurable overload protection to the AI Bridge daemon to
prevent the server from being overwhelmed during periods of high load.
Partially addresses coder/internal#1153 (rate limits and concurrency
control; circuit breakers are deferred to a follow-up).
## New Configuration Options
| Option | Environment Variable | Description | Default |
|--------|---------------------|-------------|---------|
| `--aibridge-max-concurrency` | `CODER_AIBRIDGE_MAX_CONCURRENCY` |
Maximum number of concurrent AI Bridge requests. Set to 0 to disable
(unlimited). | `0` |
| `--aibridge-rate-limit` | `CODER_AIBRIDGE_RATE_LIMIT` | Maximum number
of AI Bridge requests per second. Set to 0 to disable rate limiting. |
`0` |
## Behavior
When limits are exceeded:
- **Concurrency limit**: Returns HTTP `503 Service Unavailable` with
message "AI Bridge is currently at capacity. Please try again later."
- **Rate limit**: Returns HTTP `429 Too Many Requests` with
`Retry-After` header.
Both protections are optional and disabled by default (0 values).
## Implementation
The overload protection is implemented as reusable middleware in
`coderd/httpmw/ratelimit.go`:
1. **`RateLimitByAuthToken`**: Per-user rate limiting that uses
`APITokenFromRequest` to extract the authentication token, with fallback
to `X-Api-Key` header for AI provider compatibility (e.g., Anthropic).
Falls back to IP-based rate limiting if no token is present. Includes
`Retry-After` header for backpressure signaling.
2. **`ConcurrencyLimit`**: Uses an atomic counter to track in-flight
requests and reject when at capacity.
The middleware is applied in `enterprise/coderd/aibridge.go` via
`r.Group` in the following order:
1. Concurrency check (faster rejection for load shedding)
2. Rate limit check
**Note**: Rate limiting currently applies to all AI Bridge requests,
including pass-through requests. Ideally only actual interceptions
should count, but this would require changes in the aibridge library.
## Testing
Added comprehensive tests for:
- Rate limiting by auth token (Bearer token, X-Api-Key, no token
fallback to IP)
- Different tokens not rate limited against each other
- Disabled when limit is zero
- Retry-After header is set on 429 responses
- Concurrency limiting (allows within limit, rejects over limit,
disabled when zero)
This is to debug context timeouts on API requests to the agent.
Because rbac and database cannot be imported in slim, split the logger
middleware into slim and non-slim versions and break out the recovery
middleware.
* Adds a `GetTaskByOwnerIDAndName` query
* Updates `httpmw.TaskParam` to fall back to task name if no task by
UUID found.
* Updates the `TaskByIdentifier` used in `cli/` to use direct lookup instead of searching.
This PR uses the same sha256 hashing technique as we use for APIKeys. So
now all randomly generated secrets will be hashed with sha256 for
consistency.
This is a breaking change for the oauth tokens. Since oauth is only
allowed for dev builds and experimental, this is ok.
The authz recorder is causing a lot of memory to be allocated, and is a
memory leak for websocket connections.
This change makes it opt-in on a per request basis (ontop of `isDev`).
To get the authz headers, use `Copy as cURL` on chrome and append the
header `x-authz-checks=true`.
# Add API key allow_list for resource-scoped tokens
This PR adds support for API key allow lists, enabling tokens to be scoped to specific resources. The implementation:
1. Adds a new `allow_list` field to the `CreateTokenRequest` struct, allowing clients to specify resource-specific scopes when creating API tokens
2. Implements `APIAllowListTarget` type to represent resource targets in the format `<type>:<id>` with support for wildcards
3. Adds validation and normalization logic for allow lists to handle wildcards and deduplication
4. Integrates with RBAC by creating an `APIKeyEffectiveScope` that merges API key scopes with allow list restrictions
5. Updates API documentation and TypeScript types to reflect the new functionality
This feature enables creating tokens that are limited to specific resources (like workspaces or templates) by ID, making it possible to create more granular API tokens with limited access.
# Canonicalize API Key Scopes
This PR introduces canonical API key scopes with a `coder:` namespace prefix to avoid collisions with low-level resource:action names. It:
1. Renames special API key scopes in the database:
- `all` → `coder:all`
- `application_connect` → `coder:application_connect`
2. Adds support for a new `scopes` field in the API key creation request, allowing multiple scopes to be specified while maintaining backward compatibility with the singular `scope` field.
3. Updates the API documentation to reflect these changes, including the new endpoint for listing public API key scopes.
4. Ensures backward compatibility by mapping between legacy and canonical scope names in relevant code paths.
Blink helped here but it's suggestion was to have a set map of sensitive
fields based on predefined constants in various files, such as the api
token string names. For now we'll add additional query param logging for fields we know are safe/that we want to log, such as query pagination/limit fields and ID list counts which may help identify P99 DB query latencies.
---------
Signed-off-by: Callum Styan <callumstyan@gmail.com>
PProf labels segment the code into groups for determing the source of
cpu/memory profiles. Since the web server and background jobs share a
lot of the same code (eg wsbuilder), it helps to know if the load is
user induced, or background job based.
# Enhanced OAuth2 and MCP Compliance for API Authentication
This PR improves OAuth2 and MCP (Microsoft Cloud for Sovereignty)
compliance by:
1. Adding RFC 9728 compliant `WWW-Authenticate` headers with resource
metadata URLs
2. Passing the configured `AccessURL` to API key middleware for proper
audience validation
3. Creating specialized CORS handling for OAuth2 and MCP endpoints with
appropriate headers
4. Making the `state` parameter optional in OAuth2 authorization
requests
These changes ensure proper OAuth2 token audience validation against the
configured access URL and improve interoperability with OAuth2 clients
by providing better error responses and metadata discovery.
Signed-off-by: Thomas Kosiewski <tk@coder.com>
## Problem
Users were being automatically logged out when deleting OAuth2
applications.
## Root Cause
1. User deletes OAuth2 app successfully
2. React Query automatically refetches the app data
3. Management API incorrectly returned **401 Unauthorized** for the
missing app
4. Frontend axios interceptor sees 401 and calls `signOut()`
5. User gets logged out unexpectedly
## Solution
- Change management API to return **404 Not Found** for missing OAuth2
apps
- OAuth2 protocol endpoints continue returning 401 per RFC 6749
- Rename `writeInvalidClient` to `writeClientNotFound` for clarity
## Additional Changes
- Add conditional OAuth2 navigation when experiment is enabled or in dev
builds
- Add `isDevBuild()` utility and `buildInfo` to dashboard context
- Minor improvements to format script and warning dialogs
Signed-off-by: Thomas Kosiewski <tk@coder.com>
# Add MCP HTTP Server Experiment
This PR adds a new experiment flag `mcp-server-http` to enable the MCP HTTP server functionality. The changes include:
1. Added a new experiment constant `ExperimentMCPServerHTTP` with the value "mcp-server-http"
2. Added display name and documentation for the new experiment
3. Improved the experiment middleware to:
- Support requiring multiple experiments
- Provide better error messages with experiment display names
- Add a development mode bypass option
4. Applied the new experiment requirement to the MCP HTTP endpoint
5. Replaced the custom OAuth2 middleware with the standard experiment middleware
The PR also improves the `Enabled()` method on the `Experiments` type by using `slices.Contains()` for better readability.
# Add RFC 6750 Bearer Token Authentication Support
This PR implements RFC 6750 Bearer Token authentication as an additional authentication method for Coder's API. This allows clients to authenticate using standard OAuth 2.0 Bearer tokens in two ways:
1. Using the `Authorization: Bearer <token>` header
2. Using the `access_token` query parameter
Key changes:
- Added support for extracting tokens from both Bearer headers and access_token query parameters
- Implemented proper WWW-Authenticate headers for 401/403 responses with appropriate error descriptions
- Added comprehensive test coverage for the new authentication methods
- Updated the OAuth2 protected resource metadata endpoint to advertise Bearer token support
- Enhanced the OAuth2 testing script to verify Bearer token functionality
These authentication methods are added as fallback options, maintaining backward compatibility with Coder's existing authentication mechanisms. The existing authentication methods (cookies, session token header, etc.) still take precedence.
This implementation follows the OAuth 2.0 Bearer Token specification (RFC 6750) and improves interoperability with standard OAuth 2.0 clients.