coder

mirror of https://github.com/coder/coder.git synced 2026-06-03 04:58:23 +00:00

Author	SHA1	Message	Date
Zach	81e2be69e9	test: use typed atomics in test files (#25071 ) Use typed atomics (atomic.Int64, atomic.Int32, etc.) in test files to prevent mixing atomic and non-atomic access on the same value, guarantee 64-bit alignment on 32-bit platforms, and provide a cleaner API.	2026-05-11 08:41:17 -06:00
Cian Johnston	2f855904be	refactor: add dbgen chat generators and migrate test boilerplate (#24497 ) - Adds chat-related dbgen generators covering defaults, overrides, and message field mapping. - Replaces raw single-row chat, message, provider, and model-config setup in tests with dbgen helpers. - Simplifies chat seed helpers after moving fixture setup into dbgen. > Generated with [Coder Agents](https://coder.com/agents).	2026-05-01 13:29:33 +01:00
Jakub Domeracki	411ed21059	fix(coderd): omit frame-ancestors CSP for embed routes (#24529 )	2026-04-20 15:38:52 +02:00
Jakub Domeracki	615be176b8	fix(coderd): add frame-ancestors CSP directive to prevent clickjacking (#24474 )	2026-04-20 13:01:46 +02:00
Dean Sheather	3452ab3166	chore: add client_type field to chats and telemetry (#24342 ) Add a `chat_client_type` enum (`ui` \| `api`) and `client_type` column to the `chats` table. The column defaults to `api` for new rows so API callers don't need to set it explicitly. Existing rows are backfilled to `ui`. The field flows through `CreateChatRequest`, `chatd.CreateOptions`, `InsertChat`, and is returned in the `Chat` response via `db2sdk`. <details> <summary>Implementation notes (Coder Agents generated)</summary> ### Changes Database migration (000469) - New enum `chat_client_type` with values `ui`, `api`. - New `client_type` column, `NOT NULL DEFAULT 'api'`. - Backfill: `UPDATE chats SET client_type = 'ui'`. SQL query — `InsertChat` now includes `client_type`. SDK — `ChatClientType` type added; `ClientType` field added to both `CreateChatRequest` (optional, defaults server-side to `api`) and `Chat` response. Handler — `postChats` maps the request field (defaulting to `api`) and passes it through `chatd.CreateOptions`. Sub-agent — Child chats inherit their parent's `client_type`. db2sdk — Maps the database value to the SDK type. ### Decision log - Default is `api` (not `ui`) so existing API integrations get the correct value without code changes. - Backfill sets existing rows to `ui` per requirement. - Child chats inherit `client_type` from parent rather than defaulting. </details>	2026-04-16 23:57:05 +10:00
Cian Johnston	22062ec52e	feat: add organization scoping to chats (#23827) Fixes https://github.com/coder/internal/issues/1436 * Adds organization_id to chats with backfill (workspace org → user org membership → default org) * No support yet for ACLs (follow-up issue) - Cross-org workspace binding rejected (both in `CreateChatRequest` and in the `create_workspace` tool) - Adds `OrganizationAutocomplete` to `AgentCreateForm` - Docs updated with `organization_id` in chats-api.md > 🤖 Written by a Coder Agent. Reviewed by many humans and many agents. --------- Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>	2026-04-13 12:31:25 +01:00
Michael Suchacz	7d0a0c6495	feat: provider key policies and user provider settings (#23751 )	2026-04-02 19:46:42 +02:00
Ethan	7757cd8e08	refactor(coderd/x/chatd): insert chats directly as pending on creation (#23888 ) Previously, `CreateChat` inserted the `chats` row with the DB default status (`waiting`), then updated it to `pending` in the same transaction via `setChatPendingWithStore`. This wasted two extra queries per chat creation (`GetChatByID` + `UpdateChatStatus`) and rewrote the same row immediately after inserting it. Now `CreateChat` passes the status directly to `InsertChat`, so the row is written once in its final create-time state. The `setChatPendingWithStore` helper is removed entirely. `InsertChat` now requires an explicit `status` parameter at all callsites instead of relying on a DB column default. ## Motivation On an experimental branch we're trialing firing all chatd notifications from plpgsql triggers. The old two-step insert made that awkward: in an `AFTER INSERT` trigger, `NEW` only contained the insert-time row (`waiting`), not the final committed state (`pending`). To emit the correct event payload the trigger had to be deferred and re-read the row from `chats` at commit time. With this change, `NEW` already contains the correct row to publish — no deferred trigger, no extra `SELECT`, simpler and cheaper trigger logic. That said, this seems like a worthwhile change regardless of the trigger experiment: writing the final row state once removes unnecessary DB work on every chat creation and makes the create path easier to reason about.	2026-04-02 14:13:51 +11:00
Cian Johnston	3f55b35f68	refactor: replace AsSystemRestricted with narrower actors (#23712 ) Replace overly-broad `AsSystemRestricted` with purpose-built actors: - OAuth2 provider paths → `AsSystemOAuth2` (13 call sites across `tokens.go`, `registration.go`, `apikey.go`) - Provisioner daemon health read → `AsSystemReadProvisionerDaemons` (1 site in `healthcheck/provisioner.go`) - Provisionerd file cache paths → `AsProvisionerd` (2 sites in `provisionerdserver.go`, matching existing usage nearby) <details> <summary>Implementation notes</summary> Each replacement actor is a strict subset of `AsSystemRestricted`. Every DB method at each call site is already covered by the narrower actor's permissions: - `subjectSystemOAuth2`: OAuth2App/Secret/CodeToken (all), ApiKey (Read, Delete), User (Read), Organization (Read) - `subjectSystemReadProvisionerDaemons`: ProvisionerDaemon (Read) - `subjectProvisionerd`: File (Create, Read) plus provisionerd-scoped resources No new permissions added. `nolint:gocritic` comments updated to reflect the new actors. </details> > 🤖 Created by a Coder Agent, reviewed by me.	2026-03-27 15:08:30 +00:00
Cian Johnston	847a88c6ca	chore: clean up stale and dangerous //nolint comments (#23643 ) ## Changes - Commit 1: Remove 17 unnecessary `//nolint` directives: - `//nolint:varnamelen` — linter not active - `//nolint:unused` on exported `SlimUnsupported` - `//nolint:govet` in `coderd/httpmw/csrf` — no longer fires - `//nolint:revive` on functions refactored since the nolint was added - `//nolint:paralleltest` citing Go 1.22 loop variable capture (obsolete) - Bare `//nolint` narrowed to specific `//nolint:gocritic` with justification - Commit 2: Fix root causes behind 5 dangerous nolint suppressions: - Add `MinVersion: tls.VersionTLS12` to TLS client config (removes `gosec` G402) - Delete trivial unexported wrappers `apiKey()`/`normalizeProvider()` in chatprovider (removes `revive` confusing-naming) - Add doc comments to `StartWithAssert` and `Router` (removes `revive` exported) - Rename unused parameters to `_` in integration test helpers > 🤖 This PR was created using Coder Agents and reviewed by me.	2026-03-26 14:13:53 +00:00
Ethan	c1474c7ee2	fix(coderd/httpmw): return 500 for internal auth errors (#23352 ) ## Issue context On `dev.coder.com`, users could successfully log in, briefly see the web UI, and then get redirected back to `/login`. We traced the most reliable repro to viewing Tracy's workspaces on the `/workspaces` page. That page eagerly issues authenticated per-row requests such as: - `POST /api/v2/authcheck` - `GET /api/v2/workspacebuilds/:workspacebuild/parameters` One confirmed failing request was for Tracy's workspace `nav-scroll-fix-1f6b`: - route: `GET /api/v2/workspacebuilds/f2104ae6-7d53-457c-a8df-de831bee76db/parameters` - build owner/workspace: `tracy/nav-scroll-fix-1f6b` The failing response body was: - message: `An internal error occurred. Please try again or contact the system administrator.` - detail: `Internal error fetching API key by id. fetch object: pq: password authentication failed for user "coder"` That showed the request was not actually unauthorized. The server hit an internal database/authentication problem while resolving the session API key. The underlying issue was that DB password rotation had been enabled, it has since been disabled. However, the logout cascade happened because: 1. `APIKeyFromRequest()` returned `ok=false` for both genuine auth failures and internal backend failures. 2. `ValidateAPIKey()` wrapped every `!ok` result as `401 Unauthorized`. 3. `RequireAuth.tsx` signs the user out on any `401` response. So a transient backend/database failure was being misreported as an auth failure, which made the client forcibly log the user out. A useful extra clue was that the installed PWA did not repro. The PWA starts on `/agents`, which avoids the `/workspaces` request fan-out. That helped narrow the problem to the eager authenticated requests on the workspace list rather than to cookies or the login flow itself. 
## What changed This PR now fixes the bug without changing the exported `APIKeyFromRequest()` surface: - `ValidateAPIKey()` now uses a new internal helper that returns a typed `ValidateAPIKeyError` - the exported `APIKeyFromRequest()` helper remains compatible for existing callers like `userauth.go` - internal API-key lookup failures are classified as `500 Internal Server Error` plus `Hard: true` - internal `UserRBACSubject()` failures now return `500 Internal Server Error` instead of `401 Unauthorized` - a focused regression test verifies that an internal `GetAPIKeyByID` failure surfaces as `500` This removes the brittle message-based classification and makes the internal-auth-failure path robust for all API-key lookup failures handled by auth middleware.	2026-03-24 12:37:17 +11:00
Cian Johnston	65b7658568	chore: extract testutil.FakeSink for slog test assertions (#23208 ) Follow-up to [review comment on #23025](https://github.com/coder/coder/pull/23025#discussion_r2930309487) from @mafredri. Extracts the repeated `logSink` / `fakeSink` test pattern into a shared `testutil.FakeSink` and migrates all existing call sites. > 🤖 This PR was created with the help of Coder Agents, and will be reviewed by my human. 🧑‍💻 --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-03-18 17:02:38 +00:00
Kacper Sawicki	49006685b0	fix: rate limit by user instead of IP for authenticated requests (#22049 ) ## Problem Rate limiting by user is broken (#20857). The rate limit middleware runs before API key extraction, so user ID is never in the request context. This causes: - Rate limiting falls back to IP address for all requests - `X-Coder-Bypass-Ratelimit` header for Owners is ignored (can't verify role without identity) ## Solution Adds `PrecheckAPIKey`, a root-level middleware that fully validates the API key on every request (expiry, OIDC refresh, DB updates, role lookup) and stores the result in context. Added once at the root router — not duplicated per route group. ### Architecture ``` Request → Root middleware stack: → ExtractRealIP, Logger, ... → PrecheckAPIKey(...) ← validates key, stores result, never rejects → HandleSubdomain(apiRateLimiter) ← workspace apps now also benefit → CORS, CSRF → /api/v2 or /api/experimental: → apiRateLimiter ← reads prechecked result from context → route handlers: → ExtractAPIKeyMW ← reuses prechecked data, adds route-specific logic → handler ``` ### Key design decisions \| Decision \| Rationale \| \|---\|---\| \| Full validation, not lightweight \| Spike's review: "the whole idea of a 'lightweight' extraction that skips security checks is fundamentally flawed." Only fully validated keys are used for rate limiting — expired/invalid keys fall back to IP. \| \| Structured error results \| `ValidateAPIKeyError` has a `Hard` flag that maps to `write` vs `optionalWrite`. Hard errors (5xx, OAuth refresh failures) surface even on optional-auth routes. Soft errors (missing/expired token) are swallowed on optional routes. \| \| Added once at the root \| Spike's review: "Why can't we add it once at the root?" Root placement means workspace app rate limiters also benefit. \| \| Skip prechecked when `SessionTokenFunc != nil` \| `workspaceapps/db.go` uses a custom `SessionTokenFunc` that extracts from `issueReq.SessionToken`. 
The prechecked result may have validated a different token. Falls back to `ValidateAPIKey` with the custom func. \| \| User status check stays in `ExtractAPIKey` \| Dormant activation is route-specific — `ValidateAPIKey` stores status but doesn't enforce it. \| \| Audience validation stays in `ExtractAPIKey` \| Depends on `cfg.AccessURL` and request path, uses `optionalWrite(403)` which depends on route config. \| ### Changes - `coderd/httpmw/apikey.go`: - New `ValidateAPIKey` function — extracted core validation logic, returns structured errors instead of writing HTTP responses - New `PrecheckAPIKey` middleware — calls `ValidateAPIKey`, stores result in `apiKeyPrecheckedContextKey`, never rejects - New types: `ValidateAPIKeyConfig`, `ValidateAPIKeyResult`, `ValidateAPIKeyError`, `APIKeyPrechecked` - Refactored `ExtractAPIKey` — consumes prechecked result from context (skipping redundant validation), falls back to `ValidateAPIKey` when no precheck available - Removed `ExtractAPIKeyForRateLimit` and `preExtractedAPIKey` - `coderd/httpmw/ratelimit.go`: Rate limiter checks `apiKeyPrecheckedContextKey` first, then `apiKeyContextKey` fallback (for unit tests / workspace apps), then IP - `coderd/coderd.go`: Added `PrecheckAPIKey` once at root `r.Use(...)` block, removed `ExtractAPIKeyForRateLimit` from `/api/v2` and `/api/experimental` - `coderd/coderd_test.go`: `TestRateLimitByUser` regression test with `BypassOwner` subtest Fixes #20857	2026-03-09 13:54:31 +01:00
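The fallback order described in the commit above (validated user first, otherwise IP) can be sketched as below. The `precheck` struct and `rateLimitKey` function are hypothetical simplifications; the actual middleware reads its result from the request context.

```go
package main

import "fmt"

// precheck is a hypothetical, simplified stand-in for the result a
// root-level middleware would store after fully validating an API key.
type precheck struct {
	UserID string
	Valid  bool // false for missing, expired, or otherwise invalid keys
}

// rateLimitKey picks the bucket a request is counted against. Only a
// fully validated key is trusted; anything else falls back to the IP.
func rateLimitKey(p *precheck, remoteIP string) string {
	if p != nil && p.Valid && p.UserID != "" {
		return "user:" + p.UserID
	}
	return "ip:" + remoteIP
}

func main() {
	fmt.Println(rateLimitKey(&precheck{UserID: "alice", Valid: true}, "10.0.0.1"))  // user:alice
	fmt.Println(rateLimitKey(&precheck{UserID: "alice", Valid: false}, "10.0.0.1")) // ip:10.0.0.1
	fmt.Println(rateLimitKey(nil, "10.0.0.1"))                                      // ip:10.0.0.1
}
```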
Kyle Carberry	34d9392e37	chore(db): remove workspace_agent_id from chats table (#22442 ) ## Summary Remove the `workspace_agent_id` column from the `chats` table and dynamically look up the first workspace agent instead. ## Problem When a workspace is stopped and restarted, the workspace agent gets a new ID. The `workspace_agent_id` stored on the chat at creation time becomes stale, making the agent unreachable. This caused chats to break after workspace restarts. ## Solution Instead of persisting the agent ID, dynamically look up the first agent from the workspace's latest build via `GetWorkspaceAgentsInLatestBuildByWorkspaceID` whenever an agent connection is needed. The `workspace_id` on the chat remains stable across restarts. This behavior may be refined later (e.g., agent selection heuristics), but picking the first agent resolves the immediate breakage. ## Changes - Migration 000425: Drop `workspace_agent_id` column from `chats` - SQL queries: Remove `workspace_agent_id` from `InsertChat` and `UpdateChatWorkspace` - chatd.go: `getWorkspaceConn` and `resolveInstructions` now look up agents dynamically from workspace ID - chatd.go: Remove `refreshChatWorkspaceSnapshot` (no longer needed) - createworkspace.go: Stop persisting agent ID when associating workspace with chat - subagent.go: Stop passing agent ID to child chats - SDK/frontend: Remove `WorkspaceAgentID` / `workspace_agent_id` from Chat type --------- Co-authored-by: Kyle Carberry <kylecarbs@gmail.com>	2026-02-28 16:46:51 -05:00
Kyle Carberry	edee917d88	feat: add experimental agents support (#22290 ) feat: add AI chat system with agent tools and chat UI Introduce the chatd subsystem and Agents UI for AI-powered chat within Coder workspaces. - Add chatd package with chat loop, message compaction, prompt management, and LLM provider integration (OpenAI, Anthropic) - Add agent tools: create workspace, list/read templates, read/write/ edit files, execute commands - Add chat API endpoints with streaming, message editing, and durable reconnection - Add database schema and migrations for chats, chat messages, chat providers, and chat model configs - Add RBAC policies and dbauthz enforcement for chat resources - Add Agents UI pages with conversation timeline, queued messages list, diff viewer, and model configuration panel - Add comprehensive test coverage including coderd integration tests, chatd unit tests, and Storybook stories - Gate feature behind experiments flag --------- Co-authored-by: Cian Johnston <cian@coder.com> Co-authored-by: Danielle Maywood <danielle@themaywoods.com> Co-authored-by: Jeremy Ruppel <jeremy@coder.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 16:50:56 +00:00
Thomas Kosiewski	b776a14b46	fix(coderd): harden OAuth2 provider security (#22194 ) ## Summary Harden the OAuth2 provider with multiple security fixes addressing `coder/security#121` (CSRF session takeover) and converge on OAuth 2.1 compliance. ### Security Fixes \| Fix \| Description \| Commits \| \|-----\|-------------\|---------\| \| CSRF on `/oauth2/authorize` \| Enforce CSRF protection on the authorize endpoint POST (consent form submission) \| `ba7d646`, `b94a64e` \| \| Clickjacking: `frame-ancestors` CSP \| Prevent consent page from being iframed (`Content-Security-Policy: frame-ancestors 'none'` + `X-Frame-Options: DENY`) \| `597aeb2` \| \| Exact redirect URI matching \| Changed from prefix matching to full string exact matching per OAuth 2.1 §4.1.2.1 \| `73d64b1`, `93897f1` \| \| Store & verify `redirect_uri` \| Store redirect_uri with auth code in DB, verify at token exchange matches exactly (RFC 6749 §4.1.3) \| `50569b9`, `d7ca315` \| \| Mandatory PKCE \| Require `code_challenge` at authorization (for `response_type=code`) + unconditional `code_verifier` verification at token exchange \| `d7ca315`, `1cda1a9` \| \| Reject implicit grant \| `response_type=token` now returns `unsupported_response_type` error page (OAuth 2.1 removes implicit flow) \| `d7ca315`, `91b8863` \| ### Changes by File `coderd/httpmw/csrf.go` — Extended the CSRF `ExemptFunc` to enforce CSRF on `/oauth2/authorize` in addition to `/api` routes. The consent form POST is now CSRF-protected to prevent cross-site authorization code theft. `site/site.go` — Added `Content-Security-Policy: frame-ancestors 'none'` and `X-Frame-Options: DENY` headers to `RenderOAuthAllowPage` (consent page only — does not affect the SPA/global CSP used by AI tasks). `coderd/httpapi/queryparams.go` — Changed `RedirectURL` from prefix matching (`strings.HasPrefix(v.Path, base.Path)`) to full URI exact matching (`v.String() != base.String()`), comparing scheme, host, path, and query. 
`coderd/oauth2provider/authorize.go` — Added PKCE enforcement: `code_challenge` is required when `response_type=code` (via a conditional check, not `RequiredNotEmpty`, so `response_type=token` can reach the explicit rejection path). `ShowAuthorizePage` (GET) validates `response_type` before rendering and returns a 400 error page for unsupported types. `ProcessAuthorize` (POST) stores the `redirect_uri` with the auth code when explicitly provided. `coderd/oauth2provider/tokens.go` — PKCE verification is now unconditional (not gated on `code_challenge` being present in DB). If the stored code has a `redirect_uri`, the token endpoint verifies it matches exactly — mismatch returns `errBadCode` → `invalid_grant`. Missing `code_verifier` returns `invalid_grant`. `codersdk/oauth2.go` — `OAuth2ProviderResponseTypeToken` constant and `Valid()` acceptance are kept so the authorize handler can parse `response_type=token` and return the proper `unsupported_response_type` error rather than failing at parameter validation. *`coderd/database/migrations/000421_` — Added `redirect_uri text` column to `oauth2_provider_app_codes`. ### Design Decisions `state` parameter remains optional — The plan initially required `state` via `RequiredNotEmpty`, but this was reverted in `376a753` to avoid breaking existing clients. The `state` is still hashed and stored when provided (via `state_hash` column), securing clients that opt in. `response_type=token` kept in `Valid()` — Removing it from `Valid()` would cause the parameter parser to reject the request before the authorize handler can return the proper `unsupported_response_type` error. The constant is kept for correct error handling flow. CSP scoped to consent page only — `frame-ancestors 'none'` is set only on the OAuth consent page renderer, not globally. The SPA/global CSP was previously changed to allow framing for AI tasks ([#18102](https://github.com/coder/coder/pull/18102)); this change does not regress that. 
### Out of Scope (follow-up PRs) - Bearer tokens in query strings (needs internal caller audit) - Scope enforcement on OAuth2 tokens - Rate limiting on dynamic client registration --- <details> <summary>📋 Implementation Plan</summary> # Plan: Harden OAuth2 Provider — Security Fixes + OAuth 2.1 Compliance ## Context & Why Security issue `coder/security#121` reports a critical session takeover via CSRF on the OAuth2 provider. This plan covers all remaining security fixes from that issue plus convergence on OAuth 2.1 requirements. The goal is a single PR that closes all actionable gaps. ## Current State (already committed on branch `csrf-sjx1`) \| Fix \| Status \| Commits \| \|-----\|--------\|---------\| \| Fix 1: CSRF on `/oauth2/authorize` \| ✅ Done \| `ba7d646`, `b94a64e` \| \| CSRF token in consent form HTML \| ✅ Done \| `b94a64e` \| \| `state_hash` column + storage \| ✅ Done (hash stored, but state still optional) \| `9167d83`, `b94a64e` \| \| Tests for CSRF + state hash \| ✅ Done \| `e4119b5` \| ## Remaining Work ### ~~Fix 2 — Require `state` parameter~~ (DROPPED) > Decision: Do not enforce `state` as required. The `state` parameter is still hashed and stored when provided (via `hashOAuth2State` / `state_hash` column from prior commits), but clients are not forced to supply it. This avoids breaking existing integrations that omit state. Rollback: Remove `"state"` from the `RequiredNotEmpty` call in `coderd/oauth2provider/authorize.go:42`: ```go // BEFORE (current on branch) p.RequiredNotEmpty("response_type", "client_id", "state", "code_challenge") // AFTER p.RequiredNotEmpty("response_type", "client_id", "code_challenge") ``` No test changes needed — tests already pass `state` voluntarily. ### Fix 4 — Exact redirect URI matching Currently `coderd/httpapi/queryparams.go:233` uses prefix matching: ```go // CURRENT — prefix match if v.Host != base.Host \|\| !strings.HasPrefix(v.Path, base.Path) { ``` OAuth 2.1 requires exact string matching. 
Change to: ```go // AFTER — exact match (OAuth 2.1 §4.1.2.1) if v.Host != base.Host \|\| v.Path != base.Path { ``` File: `coderd/httpapi/queryparams.go` — `RedirectURL` method Also update the error message from "must be a subset of" to "must exactly match". Additionally, store `redirect_uri` with the auth code and verify at the token endpoint (RFC 6749 §4.1.3): 1. New migration (same migration file or a new `000421`): Add `redirect_uri text` column to `oauth2_provider_app_codes` 2. Update INSERT query in `coderd/database/queries/oauth2.sql` to include `redirect_uri` 3. `coderd/oauth2provider/authorize.go`: Store `params.redirectURL.String()` when inserting the code 4. `coderd/oauth2provider/tokens.go`: After retrieving the code from DB, verify that `redirect_uri` from the token request matches the stored value exactly. Currently `tokens.go:103` calls `p.RedirectURL(vals, callbackURL, "redirect_uri")` for prefix validation only — it must compare against the stored redirect_uri from the code, not just the app's callback URL. <details> <summary>Why both exact match AND store+verify?</summary> Exact matching at the authorize endpoint prevents open redirectors (attacker can't use a sub-path). Storing and verifying at the token endpoint prevents code injection — an attacker who steals a code can't exchange it with a different redirect_uri than was originally authorized. This is required by RFC 6749 §4.1.3 and OAuth 2.1. </details> ### Fix 7 — `frame-ancestors` CSP on consent page The consent page can be iframed by a workspace app (same-site), which is the attack vector. Add a `Content-Security-Policy` header to prevent framing. 
File: `site/site.go`, `RenderOAuthAllowPage` function (~line 731) Before writing the response, add: ```go func RenderOAuthAllowPage(rw http.ResponseWriter, r *http.Request, data RenderOAuthAllowData) { rw.Header().Set("Content-Type", "text/html; charset=utf-8") // Prevent the consent page from being framed to mitigate // clickjacking attacks (coder/security#121). rw.Header().Set("Content-Security-Policy", "frame-ancestors 'none'") rw.Header().Set("X-Frame-Options", "DENY") ... ``` Both headers for defense-in-depth (CSP for modern browsers, X-Frame-Options for legacy). ### OAuth 2.1: Mandatory PKCE Currently PKCE is checked only when `code_challenge` was provided during authorization (`tokens.go:258`): ```go // CURRENT: conditional check if dbCode.CodeChallenge.Valid && dbCode.CodeChallenge.String != "" { // verify PKCE } ``` OAuth 2.1 requires PKCE for ALL authorization code flows. Change to: File: `coderd/oauth2provider/authorize.go`: Add `"code_challenge"` to required params: ```go p.RequiredNotEmpty("response_type", "client_id", "code_challenge") ``` File: `coderd/oauth2provider/tokens.go:257-265`: Make PKCE verification unconditional: ```go // AFTER: PKCE always required (OAuth 2.1) if req.CodeVerifier == "" { return codersdk.OAuth2TokenResponse{}, errInvalidPKCE } if !dbCode.CodeChallenge.Valid || dbCode.CodeChallenge.String == "" { // Code was issued without a challenge; this should not happen // with the authorize endpoint enforcement, but defend in // depth. return codersdk.OAuth2TokenResponse{}, errInvalidPKCE } if !VerifyPKCE(dbCode.CodeChallenge.String, req.CodeVerifier) { return codersdk.OAuth2TokenResponse{}, errInvalidPKCE } ``` File: `codersdk/oauth2.go`: Remove `OAuth2ProviderResponseTypeToken` from the enum or reject it explicitly in the authorize handler. Currently it's defined at line 216 but the handler ignores `response_type` and always issues a code. 
We should either: - (a) Remove the `"token"` variant from the enum and reject it with `unsupported_response_type`, OR - (b) Add an explicit check in `ProcessAuthorize` that rejects `response_type=token` Option (b) is simpler and more backwards-compatible: ```go // In ProcessAuthorize, after extracting params: if params.responseType != codersdk.OAuth2ProviderResponseTypeCode { httpapi.WriteOAuth2Error(ctx, rw, http.StatusBadRequest, codersdk.OAuth2ErrorCodeUnsupportedResponseType, "Only response_type=code is supported") return } ``` ### OAuth 2.1 — Bearer tokens in query strings `coderd/httpmw/apikey.go:743` accepts `access_token` from URL query parameters. OAuth 2.1 prohibits this. However, this may be used internally (e.g., workspace apps, DERP). Need to audit callers before removing. Approach: This is a larger change with potential breakage. Mark as a separate follow-up issue rather than including in this PR. Document the finding. ### OAuth 2.1 — Removed flows ✅ Already compliant. `tokens.go` only supports `authorization_code` and `refresh_token` grant types. The implicit grant (`response_type=token`) will be explicitly rejected per the PKCE section above. ### OAuth 2.1 — Refresh token rotation ✅ Already compliant. `tokens.go:442` deletes the old API key when a refresh token is used. ## Migration Plan All DB changes can go in a single new migration (or extend 000420 if the branch is rebased before merge). Columns to add: - `redirect_uri text` on `oauth2_provider_app_codes` The `state_hash` column is already added by migration 000420. ## Implementation Order 1. Fix 7 — CSP headers on consent page (isolated, no deps) 2. ~~Fix 2 — Require `state` parameter~~ (DROPPED — state stays optional) 3. Fix 4 — Exact redirect URI matching + store/verify redirect_uri 4. PKCE mandatory — Require `code_challenge` + reject `response_type=token` 5. Rollback — Remove `"state"` from `RequiredNotEmpty` in `authorize.go` 6. Tests — Update/add tests for all changes 7. 
`make gen` after DB changes ## Out of Scope (separate PRs) - Bearer tokens in query strings (needs internal caller audit) - Scope enforcement on OAuth2 tokens - Rate limiting / quota on dynamic client registration </details> --- _Generated with [`mux`](https://github.com/coder/mux) • Model: `anthropic:claude-opus-4-6` • Thinking: `xhigh`_	2026-02-23 12:18:44 +01:00
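For readers unfamiliar with the PKCE check that the commit above makes mandatory, here is a minimal sketch of S256 verification as defined in RFC 7636. The function name `verifyS256` is hypothetical and is not the repository's `VerifyPKCE`, whose implementation is not shown in the commit message.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"fmt"
)

// verifyS256 reports whether verifier matches the stored challenge under
// the S256 method: challenge == BASE64URL(SHA256(verifier)), unpadded.
func verifyS256(challenge, verifier string) bool {
	if challenge == "" || verifier == "" {
		// Mandatory PKCE: a missing value is a failure, not a skip.
		return false
	}
	sum := sha256.Sum256([]byte(verifier))
	computed := base64.RawURLEncoding.EncodeToString(sum[:])
	// Constant-time comparison avoids leaking how many bytes matched.
	return subtle.ConstantTimeCompare([]byte(computed), []byte(challenge)) == 1
}

func main() {
	// Verifier and challenge from the worked example in RFC 7636, Appendix B.
	verifier := "dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk"
	challenge := "E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM"
	fmt.Println(verifyS256(challenge, verifier)) // true
	fmt.Println(verifyS256(challenge, "wrong"))  // false
	fmt.Println(verifyS256("", verifier))        // false
}
```

Treating an empty challenge or verifier as a failure mirrors the unconditional verification described in the commit: a code issued without a challenge can never be exchanged.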
Danielle Maywood	92a6d6c2c0	chore: remove unnecessary loop variable captures (#22180 ) Since Go 1.22, the loop variable capture issue is resolved. Variables declared by for loops are now per-iteration rather than per-loop, making the 'v := v' pattern unnecessary.	2026-02-19 09:02:19 +00:00
Steven Masley	01f06671a1	chore: return 404, not 400 if missing or authz deny (#22069 )	2026-02-13 08:19:07 -06:00
Thomas Kosiewski	dd6aec04d7	fix(coderd/oauth2provider): support client_secret_basic client auth (#21793 )	2026-02-02 16:01:33 +01:00
Danny Kopping	536bca7ea9	chore: log api key on each HTTP API request (#21785 ) Operators need to know which API key was used in HTTP requests. For example, if a key is leaking and a DDOS is underway using that key, operators need a way to identify the key in use and take steps to expire the key (see https://github.com/coder/coder/issues/21782). _Disclaimer: created using Claude Opus 4.5_	2026-01-30 14:48:10 +02:00
Mathias Fredriksson	97e8a5b093	fix(coderd): allow agent auth during workspace shutdown (#21538 ) Agents were losing authentication during workspace shutdown, causing shutdown scripts to fail. The auth query required agents to belong to the latest build, but during shutdown a `stop` build becomes latest while the `start` build's agents are still running. Modified the auth query to allow `start` build agents to authenticate temporarily during `stop` execution. The query allows auth when: - Agent's `start` build job succeeded - Latest build is `stop` with `pending`/`running` job status - Builds are adjacent (`stop` is `build_number + 1`) - Template versions match Auth closes once `stop` completes. Renamed `GetWorkspaceAgentAndLatestBuildByAuthToken` to `GetAuthenticatedWorkspaceAgentAndBuildByAuthToken` since it returns the agent's build (not always latest) during shutdown. Closes coder/internal#1249 Fixes #19467	2026-01-21 13:18:43 +00:00
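The four conditions listed in the commit above can be read as a single predicate. The sketch below restates them in Go for illustration only: the real check lives in a SQL query, and the `build` struct, its field names, and `agentMayAuthenticate` are hypothetical.

```go
package main

import "fmt"

// build is a hypothetical, simplified view of a workspace build.
type build struct {
	Number            int32
	Transition        string // "start" or "stop"
	JobStatus         string // e.g. "pending", "running", "succeeded"
	TemplateVersionID string
}

// agentMayAuthenticate reports whether an agent from agentBuild may
// authenticate, given the workspace's latest build.
func agentMayAuthenticate(agentBuild, latest build) bool {
	// Normal case (assumed here to be identified by build number):
	// the agent belongs to the latest build.
	if agentBuild.Number == latest.Number {
		return true
	}
	// Shutdown case: a start build's agent stays authenticated while the
	// immediately following stop build is still pending or running.
	return agentBuild.Transition == "start" &&
		agentBuild.JobStatus == "succeeded" &&
		latest.Transition == "stop" &&
		(latest.JobStatus == "pending" || latest.JobStatus == "running") &&
		latest.Number == agentBuild.Number+1 &&
		latest.TemplateVersionID == agentBuild.TemplateVersionID
}

func main() {
	start := build{Number: 4, Transition: "start", JobStatus: "succeeded", TemplateVersionID: "v1"}
	stopping := build{Number: 5, Transition: "stop", JobStatus: "running", TemplateVersionID: "v1"}
	stopped := build{Number: 5, Transition: "stop", JobStatus: "succeeded", TemplateVersionID: "v1"}

	fmt.Println(agentMayAuthenticate(start, stopping)) // true: stop still running
	fmt.Println(agentMayAuthenticate(start, stopped))  // false: stop completed
}
```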
Cian Johnston	08343a7a9f	perf: reduce number of queries made by /api/v2/workspaceagents/{id} (#21522 ) Relates to https://github.com/coder/internal/issues/1214 The `ExtractWorkspaceAgentParam` middleware ends up making 4 database queries to follow the chain of `WorkspaceAgent` -> `WorkspaceResource` -> `ProvisionerJob` -> `WorkspaceBuild` -- but then dropping all that hard work on the floor. The `api.workspaceAgent` handler that references this middleware then has to do all of that work again, plus one more query to get the related `User` so we can get the username. This pattern is also mirrored in `getDatabaseTerminal` but without the middleware. This PR: * Adds a new query `GetWorkspaceAgentAndWorkspaceByID` to fetch all this information at once to avoid the multiple round-trips, * Updates the existing usage of `GetWorkspaceAgentByID` to this new query instead, * Updates `ExtractWorkspaceAgentParam` to also store the workspace in the request context Dalibo: [0.63ms](https://explain.dalibo.com/plan/40bb597f3539gc6c)	2026-01-19 12:36:33 +00:00
Cian Johnston	ad23ea3561	chore: remove unused ExtractWorkspaceAndAgentParam (#21537 ) While investigating https://github.com/coder/internal/issues/1214 I noticed that `ExtractWorkspaceAndAgentParam` appeared to be unused outside of tests.	2026-01-16 15:11:10 +00:00
Cian Johnston	32354261d3	chore(coderd/httpmw): extract HTTPRoute middleware (#21498 ) Extracts part of the prometheus middleware that stores the route information in the request context into its own middleware. Also adds request method information to context. Relates to https://github.com/coder/internal/issues/1214	2026-01-15 10:26:50 +00:00
Ehab Younes	6683d807ac	refactor: add RFC-compliant enum types and use SDK as source of truth (#21468 ) Add comprehensive OAuth2 enum types to codersdk following RFC specifications: - OAuth2ProviderGrantType (RFC 6749) - OAuth2ProviderResponseType (RFC 6749) - OAuth2TokenEndpointAuthMethod (RFC 7591) - OAuth2PKCECodeChallengeMethod (RFC 7636) - OAuth2TokenType (RFC 6749, RFC 9449) - OAuth2RevocationTokenTypeHint (RFC 7009) - OAuth2ErrorCode (RFC 6749, RFC 7009, RFC 8707) Add OAuth2TokenRequest, OAuth2TokenResponse, OAuth2TokenRevocationRequest, and OAuth2Error structs to the SDK. Update OAuth2ClientRegistrationRequest, OAuth2ClientRegistrationResponse, OAuth2ClientConfiguration, and OAuth2AuthorizationServerMetadata to use typed enums instead of raw strings. This makes codersdk the single source of truth for OAuth2 types, eliminating duplication between SDK and server-side structs. Closes #21476	2026-01-15 12:41:28 +03:00
Cian Johnston	64e7a77983	feat: add user_agent to loggermw (#21485 ) Adds the `user_agent` field to `httpmw/loggermw`.	2026-01-13 10:50:01 +00:00
Spike Curtis	bddb808b25	chore: arrange imports in a standard way (#21452 ) Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example: ``` import ( "context" "time" "github.com/prometheus/client_golang/prometheus" "golang.org/x/xerrors" "gopkg.in/natefinch/lumberjack.v2" "cdr.dev/slog/v3" "github.com/coder/coder/v2/codersdk/agentsdk" "github.com/coder/serpent" ) ``` 3 groups: standard library, third-party libs, Coder libs. This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.	2026-01-08 15:24:11 +04:00
Spike Curtis	49b34a716a	fix: fix slog to always use array of Fields (#21426 ) Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptable call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder). It also updates dependencies that also use slog and were updated. I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule. Other dependencies, I pushed new tags.	2026-01-08 10:29:41 +04:00
Jake Howell	ea00e72063	feat: add rbac specificity for `dbpurge` (#21088 ) Related to [`internal#1139`](https://github.com/coder/internal/issues/1139) Continuation of #21074 This implements some RBAC role specificity for `dbpurge`, ensuring that we follow the least-privilege model for removing data from the database. It is specified as follows: ```go Site: rbac.Permissions(map[string][]policy.Action{ // DeleteOldWorkspaceAgentLogs // DeleteOldWorkspaceAgentStats // DeleteOldProvisionerDaemons // DeleteOldTelemetryLocks // DeleteOldAuditLogConnectionEvents // DeleteOldConnectionLogs rbac.ResourceSystem.Type: {policy.ActionDelete}, // DeleteOldNotificationMessages rbac.ResourceNotificationMessage.Type: {policy.ActionDelete}, // ExpirePrebuildsAPIKeys // DeleteExpiredAPIKeys rbac.ResourceApiKey.Type: {policy.ActionDelete}, // DeleteOldAIBridgeRecords rbac.ResourceAibridgeInterception.Type: {policy.ActionDelete}, }), ``` \| Position \| Pull-request \| \| -------- \| ------------ \| \| \| [feat: add prometheus observability metrics for `dbpurge`](https://github.com/coder/coder/pull/21074) \| \| ✅ \| [feat: add rbac specificity for `dbpurge`](https://github.com/coder/coder/pull/21088) \|	2025-12-20 01:02:39 +11:00
Spike Curtis	c5fc6defb8	fix: report correct request paths from workspace proxy metrics (#21302 ) I noticed while looking at scale test metrics that we don't always report a useful path in the API request metrics. ![image.png](https://app.graphite.com/user-attachments/assets/a5b0dadf-9c2f-46a8-a6c1-3ad5f6201edb.png) There are a lot of requests with path `/*`. I chased this problem to the workspace proxy, where we mount the proxy router as a child of a "root" router to support some high level endpoints like `latency-check`. Because we query the path from the Chi route context in the prometheus middleware _before_ the request is actually handled, we can have a partially resolved pattern match only corresponding to the root router. The fix is to always re-resolve the path, rather than accept a partially resolved path.	2025-12-17 21:08:40 +04:00
Steven Masley	8fefd91e4a	feat!: support PKCE in the oauth2 client's auth/exchange flow (#21215 ) Breaking Change: Existing oauth apps might now use PKCE. If an unknown IdP type was being used, and it does not support PKCE, it will break. To fix, set the PKCE methods on the external auth to `none` ``` export CODER_EXTERNAL_AUTH_1_PKCE_METHODS=none ```	2025-12-15 17:41:47 +00:00
Kacper Sawicki	6f86f67754	feat(coderd): add overload protection with rate limiting and concurrency control (#21161 ) ## Summary This adds configurable overload protection to the AI Bridge daemon to prevent the server from being overwhelmed during periods of high load. Partially addresses coder/internal#1153 (rate limits and concurrency control; circuit breakers are deferred to a follow-up). ## New Configuration Options \| Option \| Environment Variable \| Description \| Default \| \|--------\|---------------------\|-------------\|---------\| \| `--aibridge-max-concurrency` \| `CODER_AIBRIDGE_MAX_CONCURRENCY` \| Maximum number of concurrent AI Bridge requests. Set to 0 to disable (unlimited). \| `0` \| \| `--aibridge-rate-limit` \| `CODER_AIBRIDGE_RATE_LIMIT` \| Maximum number of AI Bridge requests per second. Set to 0 to disable rate limiting. \| `0` \| ## Behavior When limits are exceeded: - Concurrency limit: Returns HTTP `503 Service Unavailable` with message "AI Bridge is currently at capacity. Please try again later." - Rate limit: Returns HTTP `429 Too Many Requests` with `Retry-After` header. Both protections are optional and disabled by default (0 values). ## Implementation The overload protection is implemented as reusable middleware in `coderd/httpmw/ratelimit.go`: 1. `RateLimitByAuthToken`: Per-user rate limiting that uses `APITokenFromRequest` to extract the authentication token, with fallback to `X-Api-Key` header for AI provider compatibility (e.g., Anthropic). Falls back to IP-based rate limiting if no token is present. Includes `Retry-After` header for backpressure signaling. 2. `ConcurrencyLimit`: Uses an atomic counter to track in-flight requests and reject when at capacity. The middleware is applied in `enterprise/coderd/aibridge.go` via `r.Group` in the following order: 1. Concurrency check (faster rejection for load shedding) 2. Rate limit check Note: Rate limiting currently applies to all AI Bridge requests, including pass-through requests. 
Ideally only actual interceptions should count, but this would require changes in the aibridge library. ## Testing Added comprehensive tests for: - Rate limiting by auth token (Bearer token, X-Api-Key, no token fallback to IP) - Different tokens not rate limited against each other - Disabled when limit is zero - Retry-After header is set on 429 responses - Concurrency limiting (allows within limit, rejects over limit, disabled when zero)	2025-12-11 16:38:54 +01:00
Asher	c266bb830c	chore: add debug logging and recovery to agent api requests (#20785 ) This is to debug context timeouts on API requests to the agent. Because rbac and database cannot be imported in slim, split the logger middleware into slim and non-slim versions and break out the recovery middleware.	2025-11-25 14:59:20 -09:00
Cian Johnston	34f6e72879	feat(coderd): add lookup task by name in httpmw.TaskParam (#20647 ) * Adds a `GetTaskByOwnerIDAndName` query * Updates `httpmw.TaskParam` to fall back to task name if no task by UUID found. * Updates the `TaskByIdentifier` used in `cli/` to use direct lookup instead of searching.	2025-11-05 14:28:34 +00:00
Mathias Fredriksson	7ae3fdc749	refactor: use task data model for notifications (#20590 ) Updates coder/internal#973 Updates coder/internal#974	2025-10-31 15:53:27 +02:00
Steven Masley	13ca9ead3a	chore!: ensure consistent secret token generation and hashing (#20388 ) This PR uses the same sha256 hashing technique as we use for APIKeys. So now all randomly generated secrets will be hashed with sha256 for consistency. This is a breaking change for the oauth tokens. Since oauth is only allowed for dev builds and experimental, this is ok.	2025-10-23 15:38:49 -05:00
Mathias Fredriksson	9855460524	feat(coderd): use new data model for task delete (#20334 ) Updates coder/internal#976	2025-10-23 19:45:18 +03:00
Steven Masley	86f0f39863	chore: make authz recorder opt in (#20310 ) The authz recorder is causing a lot of memory to be allocated, and is a memory leak for websocket connections. This change makes it opt-in on a per request basis (on top of `isDev`). To get the authz headers, use `Copy as cURL` in Chrome and append the header `x-authz-checks=true`.	2025-10-21 14:15:37 +00:00
Thomas Kosiewski	ed90ecf00e	feat: add allow_list to resource-scoped API tokens (#19964 ) # Add API key allow_list for resource-scoped tokens This PR adds support for API key allow lists, enabling tokens to be scoped to specific resources. The implementation: 1. Adds a new `allow_list` field to the `CreateTokenRequest` struct, allowing clients to specify resource-specific scopes when creating API tokens 2. Implements `APIAllowListTarget` type to represent resource targets in the format `<type>:<id>` with support for wildcards 3. Adds validation and normalization logic for allow lists to handle wildcards and deduplication 4. Integrates with RBAC by creating an `APIKeyEffectiveScope` that merges API key scopes with allow list restrictions 5. Updates API documentation and TypeScript types to reflect the new functionality This feature enables creating tokens that are limited to specific resources (like workspaces or templates) by ID, making it possible to create more granular API tokens with limited access.	2025-10-09 14:53:08 +02:00
Cian Johnston	ff930ad4f3	feat(coderd): add ability to search org members by user_id, is_system, github_user_id (#20048 ) Adds the ability to search org members by query. Supported fields: `user_id`, `is_system`, `github_user_id`.	2025-09-30 23:54:21 +01:00
Thomas Kosiewski	d0db9ec88f	feat: add multi-scope support to API keys (#19917 ) # Canonicalize API Key Scopes This PR introduces canonical API key scopes with a `coder:` namespace prefix to avoid collisions with low-level resource:action names. It: 1. Renames special API key scopes in the database: - `all` → `coder:all` - `application_connect` → `coder:application_connect` 2. Adds support for a new `scopes` field in the API key creation request, allowing multiple scopes to be specified while maintaining backward compatibility with the singular `scope` field. 3. Updates the API documentation to reflect these changes, including the new endpoint for listing public API key scopes. 4. Ensures backward compatibility by mapping between legacy and canonical scope names in relevant code paths.	2025-09-26 11:56:34 +02:00
Thomas Kosiewski	fb0ce389a6	feat: implement API key scopes database migration (#19861 ) Added database migration for API key scopes. Fixes #19845	2025-09-22 19:26:51 +02:00
Thomas Kosiewski	d238480c7a	fix: trim whitespace from API tokens (#19814 )	2025-09-15 10:02:10 +02:00
Callum Styan	f0cf0adcc8	feat: log additional known non-sensitive query param fields in the httpmw logger (#19532 ) Blink helped here but its suggestion was to have a set map of sensitive fields based on predefined constants in various files, such as the api token string names. For now we'll add additional query param logging for fields we know are safe/that we want to log, such as query pagination/limit fields and ID list counts which may help identify P99 DB query latencies. --------- Signed-off-by: Callum Styan <callumstyan@gmail.com>	2025-08-26 11:14:53 -07:00
Steven Masley	34c46c0748	chore: rename `service` -> `coder_service`, remove `agent_id` label (#19241 ) Pyroscope uses `service` tag for top level distinction. So move our `service` -> `coder_service`	2025-08-07 13:58:39 -05:00
Steven Masley	8ba8b4f061	chore: add profiling labels for pprof analysis (#19232 ) PProf labels segment the code into groups for determining the source of cpu/memory profiles. Since the web server and background jobs share a lot of the same code (eg wsbuilder), it helps to know if the load is user induced, or background job based.	2025-08-07 11:21:17 -05:00
Thomas Kosiewski	071383bbe8	feat: add RFC 9728 OAuth2 resource metadata support (#18920 ) # Enhanced OAuth2 and MCP Compliance for API Authentication This PR improves OAuth2 and MCP (Model Context Protocol) compliance by: 1. Adding RFC 9728 compliant `WWW-Authenticate` headers with resource metadata URLs 2. Passing the configured `AccessURL` to API key middleware for proper audience validation 3. Creating specialized CORS handling for OAuth2 and MCP endpoints with appropriate headers 4. Making the `state` parameter optional in OAuth2 authorization requests These changes ensure proper OAuth2 token audience validation against the configured access URL and improve interoperability with OAuth2 clients by providing better error responses and metadata discovery. Signed-off-by: Thomas Kosiewski <tk@coder.com>	2025-07-19 22:05:15 +02:00
Thomas Kosiewski	3dcd2acf1d	fix: return 404 instead of 401 for missing OAuth2 apps (#18755 ) ## Problem Users were being automatically logged out when deleting OAuth2 applications. ## Root Cause 1. User deletes OAuth2 app successfully 2. React Query automatically refetches the app data 3. Management API incorrectly returned 401 Unauthorized for the missing app 4. Frontend axios interceptor sees 401 and calls `signOut()` 5. User gets logged out unexpectedly ## Solution - Change management API to return 404 Not Found for missing OAuth2 apps - OAuth2 protocol endpoints continue returning 401 per RFC 6749 - Rename `writeInvalidClient` to `writeClientNotFound` for clarity ## Additional Changes - Add conditional OAuth2 navigation when experiment is enabled or in dev builds - Add `isDevBuild()` utility and `buildInfo` to dashboard context - Minor improvements to format script and warning dialogs Signed-off-by: Thomas Kosiewski <tk@coder.com>	2025-07-07 19:57:32 +02:00
Thomas Kosiewski	7fbb3ced5b	feat: add MCP HTTP server experiment and improve experiment middleware (#18712 ) # Add MCP HTTP Server Experiment This PR adds a new experiment flag `mcp-server-http` to enable the MCP HTTP server functionality. The changes include: 1. Added a new experiment constant `ExperimentMCPServerHTTP` with the value "mcp-server-http" 2. Added display name and documentation for the new experiment 3. Improved the experiment middleware to: - Support requiring multiple experiments - Provide better error messages with experiment display names - Add a development mode bypass option 4. Applied the new experiment requirement to the MCP HTTP endpoint 5. Replaced the custom OAuth2 middleware with the standard experiment middleware The PR also improves the `Enabled()` method on the `Experiments` type by using `slices.Contains()` for better readability.	2025-07-03 20:09:18 +02:00
Thomas Kosiewski	09c50559f3	feat: implement RFC 6750 Bearer token authentication (#18644 ) # Add RFC 6750 Bearer Token Authentication Support This PR implements RFC 6750 Bearer Token authentication as an additional authentication method for Coder's API. This allows clients to authenticate using standard OAuth 2.0 Bearer tokens in two ways: 1. Using the `Authorization: Bearer <token>` header 2. Using the `access_token` query parameter Key changes: - Added support for extracting tokens from both Bearer headers and access_token query parameters - Implemented proper WWW-Authenticate headers for 401/403 responses with appropriate error descriptions - Added comprehensive test coverage for the new authentication methods - Updated the OAuth2 protected resource metadata endpoint to advertise Bearer token support - Enhanced the OAuth2 testing script to verify Bearer token functionality These authentication methods are added as fallback options, maintaining backward compatibility with Coder's existing authentication mechanisms. The existing authentication methods (cookies, session token header, etc.) still take precedence. This implementation follows the OAuth 2.0 Bearer Token specification (RFC 6750) and improves interoperability with standard OAuth 2.0 clients.	2025-07-02 19:14:54 +02:00
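The two RFC 6750 token sources named in 09c50559f3 above (the `Authorization: Bearer <token>` header and the `access_token` query parameter) can be sketched like this. It is a minimal illustration, not Coder's implementation: `bearerToken` and `newRequest` are hypothetical names, the existing cookie and session-header methods that take precedence are not shown, and the whitespace trimming mirrors d238480c7a only in spirit.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// bearerToken returns the token from the Authorization header when it uses
// the Bearer scheme (matched case-insensitively), and otherwise falls back to
// the access_token query parameter. It returns "" when neither is present.
func bearerToken(r *http.Request) string {
	const prefix = "Bearer "
	auth := r.Header.Get("Authorization")
	if len(auth) > len(prefix) && strings.EqualFold(auth[:len(prefix)], prefix) {
		return strings.TrimSpace(auth[len(prefix):])
	}
	return strings.TrimSpace(r.URL.Query().Get("access_token"))
}

// newRequest builds a GET request for the examples; an empty authorization
// value leaves the header unset.
func newRequest(target, authorization string) *http.Request {
	r := httptest.NewRequest(http.MethodGet, target, nil)
	if authorization != "" {
		r.Header.Set("Authorization", authorization)
	}
	return r
}

func main() {
	fmt.Println(bearerToken(newRequest("/api/v2/users/me", "Bearer abc123")))       // prints: abc123
	fmt.Println(bearerToken(newRequest("/api/v2/users/me?access_token=xyz789", ""))) // prints: xyz789
}
```

Checking the header before the query parameter follows RFC 6750's preference for the header method; tokens in URLs are more likely to end up in logs and browser history.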


268 Commits