coder

mirror of https://github.com/coder/coder.git synced 2026-06-04 13:38:21 +00:00

Author	SHA1	Message	Date
Kyle Carberry	aba3832b15	fix: update the compaction message to be the "user" role (#22819 ) ## Bug After compaction in the chat loop, the loop re-enters and calls the LLM with a prompt that has no non-system messages. Anthropic (and most providers) require at least one user/assistant/tool message, so the API errors with empty messages. ## Root Cause The compaction summary was stored as `role=system`. After compaction, `GetChatMessagesForPromptByChatID` returns only: - The compressed system summary (matched by the CTE) - Original non-compressed system messages (system prompts) All original user/assistant/tool messages are excluded (they predate the summary). The compaction assistant/tool messages are `compressed=TRUE` and don't match the main query's `compressed=FALSE` clauses. So `ReloadMessages` returned only system messages. The Anthropic provider moves system messages into a separate `system` field, leaving the `messages` API field as `[]`. ## Fix 1. Changed compaction summary from `role=system` to `role=user` — the summary now appears as a user message in the reloaded prompt, giving the model valid conversational context to respond to. 2. Simplified the CTE — removed the `role = 'system'` check and narrowed `visibility IN ('model', 'both')` to just `visibility = 'model'`. The summary is the only compressed message with `visibility=model` (the assistant has `visibility=user`, the tool has `visibility=both`), so the role check was redundant. ## Test `PostRunCompactionReEntryIncludesUserSummary`: verifies the re-entry prompt contains a user message (the compaction summary) after compaction + reload.	2026-03-08 22:25:27 -04:00
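A minimal Go sketch of the failure mode described above. It assumes a provider adapter that lifts `system` messages into a separate field, which is what the commit says the Anthropic provider does. The `Message` type and both helpers are hypothetical simplifications, not chatd's real types. The point is narrow: a reloaded prompt made of a system prompt plus a system-role summary leaves the `messages` array empty, and a user-role summary does not.

```go
package main

// Message is a hypothetical, simplified stand-in for a chat message row.
type Message struct {
	Role    string // "system", "user", "assistant", or "tool"
	Content string
}

// splitForAnthropic mimics the behavior the commit describes: system
// messages move into a separate system field, and everything else becomes
// the messages array sent to the API.
func splitForAnthropic(prompt []Message) (system []string, messages []Message) {
	for _, m := range prompt {
		if m.Role == "system" {
			system = append(system, m.Content)
			continue
		}
		messages = append(messages, m)
	}
	return system, messages
}

// reloadedPrompt models the post-compaction reload: the original system
// prompt plus the compaction summary stored with the given role.
func reloadedPrompt(summaryRole string) []Message {
	return []Message{
		{Role: "system", Content: "You are a helpful agent."},
		{Role: summaryRole, Content: "Summary of the earlier conversation."},
	}
}
```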
Mathias Fredriksson	a104d608a3	feat: add file/image attachment support to chat input (#22604 ) This change adds support for image attachments to chat via add button and clipboard paste. Files are stored in a new `chat_files` table and referenced by ID in message content. File data is resolved from storage at LLM dispatch time, keeping the message content column small. Upload validates MIME types via content type or content sniffing against an allowlist (png, jpeg, gif, webp). The retrieval endpoint serves files with immutable caching headers. On the frontend, uploads start eagerly on attach with a background fetch to pre-warm the browser HTTP cache so the timeline renders instantly after send.	2026-03-06 21:05:26 +02:00
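As an illustration of the upload check, this sketch accepts a file when either its declared content type or Go's `net/http` content sniffing lands on the png/jpeg/gif/webp allowlist. The helper name `detectAllowedType` is hypothetical. The commit also does not say whether Coder uses `http.DetectContentType` specifically, so that choice is an assumption.

```go
package main

import (
	"mime"
	"net/http"
)

// allowedImageTypes mirrors the allowlist named in the commit message.
var allowedImageTypes = map[string]bool{
	"image/png":  true,
	"image/jpeg": true,
	"image/gif":  true,
	"image/webp": true,
}

// detectAllowedType (hypothetical) returns the accepted MIME type, or ""
// if the upload should be rejected. It first tries the declared
// Content-Type, then falls back to sniffing the leading bytes.
func detectAllowedType(declared string, data []byte) string {
	if mediaType, _, err := mime.ParseMediaType(declared); err == nil && allowedImageTypes[mediaType] {
		return mediaType
	}
	if sniffed := http.DetectContentType(data); allowedImageTypes[sniffed] {
		return sniffed
	}
	return ""
}
```

The commit says "content type or content sniffing", which suggests either path suffices. Accepting the declared type trusts the client, so a stricter variant would sniff unconditionally.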
Danny Kopping	13e3df67d6	feat: track client sessions (#22470 ) This change adds support for tracking client session IDs in AI Bridge interceptions to enable better session-based auditing. Depends on https://github.com/coder/aibridge/pull/198 Fixes https://github.com/coder/internal/issues/1337 The session ID field is optional and not universally supported by all clients.	2026-03-06 14:43:53 +02:00
Kayla はな	56bdea73b8	feat: add workspace acls to task rbac objects (#22311 ) To allow tasks to be shareable, we need to share both the `task` resource and the `workspace` resource, and their sharing state needs to be kept in sync. We've already implemented all of the necessary ACL functionality for workspaces, so we can proxy those ACLs back to the task as well.	2026-03-05 13:40:53 -07:00
Mathias Fredriksson	719c24829a	build(Makefile): use atomic writes for remaining gen targets (#22670 ) Follow-up to #22612. Running `git status --short` in a loop during `make -B -j gen` still showed intermediate states for several files. This PR fixes the remaining ones. The main issues: - `generate.sh` ran `gofmt` and `goimports` in-place after moving files into the source tree. Now it formats in a workdir first and only `mv`s the final result. - `protoc` targets wrote directly to the source tree. Wrapped with `scripts/atomic_protoc.sh` which redirects output to a tmpdir. - Several generators used hardcoded `/tmp/` paths. On systems where `/tmp` is tmpfs, `mv` degrades to copy+delete. Switched to a project-local `_gen/` directory (gitignored, same filesystem). - `apidoc/.gen` and `cli/index.md` used `cp` for final output. Replaced with `mv`. - `manifest.json` was written twice (unformatted, then formatted). Now `.gen` writes to a staging file and the manifest target does one formatted atomic write. - `biome_format.sh` silently skipped files in gitignored dirs. Added `--vcs-enabled=false`. Two helpers reduce the Makefile boilerplate: `scripts/atomic_protoc.sh` (wraps protoc) and an `atomic_write` Make define (stdout-to-temp-to-target pattern). `.PRECIOUS` now also covers `.pb.go` and mock files. Verification: `make -B -j gen` x3 with `git status` polling, no changes. Refs #22612	2026-03-05 22:32:18 +02:00
Danielle Maywood	f91475cd51	test: remove unnecessary dbauthz.AsSystemRestricted calls in tests (#22663 )	2026-03-05 20:29:49 +00:00
Mathias Fredriksson	a6a8fd94d7	build(Makefile): enable parallel `make -j gen` with correct dependency graph (#22612 ) `make gen` could not run with `-j` because inter-target dependency edges were missing. Multiple recipes compile `coderd/rbac` (which includes generated files like `object_gen.go`), and without explicit ordering, parallel runs produced syntax errors from mid-write reads. Three main changes: Dependency graph fixes declare the compile-time chain through `coderd/rbac` so that `object_gen.go` is written before anything that imports it is compiled. The DB generation targets use a GNU Make 4.3+ grouped target (`&:`) so Make knows `generate.sh` co-produces `querier.go`, `unique_constraint.go`, `dbmetrics`, and `dbauthz` in a single invocation. `SKIP_DUMP_SQL=1` avoids re-entrant `make` inside `generate.sh` when the Makefile already guarantees `dump.sql` is fresh. `scripts/atomicwrite` package replaces `os.WriteFile` in all gen scripts with a temp-file-in-same-dir + rename pattern, preventing interrupted runs from leaving partial files. `.PRECIOUS` and shell atomic writes protect git-tracked generated files from Make's default delete-on-error behavior. Since these files are committed, deletion is worse than staleness -- `git restore` is the recovery path. CI now runs `make -j --output-sync -B gen` (~32s, down from ~85s serial).
| Scenario | Before | After |
|-----------------------------------|--------------------|----------|
| `make gen` (serial) | 95s | 95s |
| `make -j gen` (parallel) | race error | 22s |
| CI `make -j --output-sync -B gen` | forced serial ~85s | ~32s |
	2026-03-05 11:58:10 +00:00
Mathias Fredriksson	c7dd429bbf	fix(coderd/database/dbfake): prevent cross-test job stealing in WorkspaceBuildBuilder (#22598 ) Previously, WorkspaceBuildBuilder.doInTX() inserted provisioner jobs with empty tags and used a loop in AcquireProvisionerJob that could match other tests' pending jobs when parallel tests share a database. Add a unique tag (jobID -> "true") to each provisioner job at insert time, then use that tag in AcquireProvisionerJob to target only the correct job. This follows the same pattern used in dbgen.ProvisionerJob. Closes coder/internal#1367	2026-03-04 17:47:34 +00:00
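This sketch shows why the unique tag prevents cross-test job stealing. It assumes the usual provisioner matching rule: every tag on a job must be offered, with the same value, by whoever acquires it. `Job`, `tagsMatch`, and `acquire` are hypothetical in-memory stand-ins, not the real `AcquireProvisionerJob` query or the `dbfake` code.

```go
package main

// Job is a hypothetical, in-memory stand-in for a provisioner job row.
type Job struct {
	ID   string
	Tags map[string]string
}

// tagsMatch reports whether every tag on the job is present, with the same
// value, in the offered tag set (the assumed matching rule).
func tagsMatch(jobTags, offered map[string]string) bool {
	for k, v := range jobTags {
		if got, ok := offered[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// acquire returns the first pending job whose tags match the offered set.
func acquire(pending []Job, offered map[string]string) (Job, bool) {
	for _, j := range pending {
		if tagsMatch(j.Tags, offered) {
			return j, true
		}
	}
	return Job{}, false
}
```

A job with empty tags matches any acquirer. That is the pre-fix behavior: one test can pick up another test's job from a shared database.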
Sas Swart	cfcb81fb0f	fix: user status change chart accommodates DST (#22191 ) closes https://github.com/coder/internal/issues/464 # Summary This PR resolves a flaky test that was sensitive to DST transitions in various time zones. The root of the flake was: * a bug; the query and its tests assume 24 hours per day * the tests used local system time, which resulted in failures for dates proximal to DST transitions # Changes Query: The original query assumed 24 hour intervals between each day, which is not a valid assumption. It now increments `1 day` at a time. Database tests: Database level tests for the query all assumed 24 hour days. They now increment in DST-aware days instead. Instead of using time.Now() as a base for testing, the test uses a series of dates over the course of an entire year, to ensure that DST transition dates are present in every test run. # API Endpoint The endpoint that delivers the user status chart now accepts an IANA timezone name as a parameter and passes it, keeping the existing offset as a fallback, to the database query. API level tests were added to ensure the correct response form and error behaviour. Correctness of content is tested at the database level.	2026-03-04 12:54:39 +02:00
Danielle Maywood	d2d956edb1	fix: add archived query parameter to chat list endpoint (#22562 ) Despite the SDK type having an `Archived` field for chats, this data was never fetched from the database — the `GetChatsByOwnerID` query hardcoded `AND archived = false`, and the `convertChat` function never mapped the field. This PR adds an optional `archived` query parameter to `GET /api/experimental/chats`:
| Value | Behavior |
|-------|----------|
| (not provided) | Returns all chats (active and archived) |
| `archived=false` | Returns only non-archived chats |
| `archived=true` | Returns only archived chats |
This follows the same pattern used by template versions (`sqlc.narg('archived')` nullable boolean). Also fixes `convertChat` to populate the `Archived` field in API responses, which was never being set despite existing on the SDK type.	2026-03-03 20:39:19 +00:00
Danny Kopping	1b08bc76a6	feat: store tool call IDs to determine interception lineage (#22246 ) Adds database columns and server-side logic to track interception lineage via tool call IDs. When an interception ends, the server resolves the correlating tool call ID to find the parent interception and links them via `parent_id`. New `provider_tool_call_id` column on `aibridge_tool_usages` and `parent_id` column on `aibridge_interceptions`, with indexes for lookup. `findParentInterceptionID` queries by tool call ID and filters out the current interception to find the parent. Adapted from the [coder/coder `dk/prompt_provenance_poc`](https://github.com/coder/coder/compare/main...dk/prompt_provenance_poc) branch. Depends on [coder/aibridge#188](https://github.com/coder/aibridge/pull/188). Closes https://github.com/coder/internal/issues/1334	2026-03-03 21:04:41 +02:00
Kyle Carberry	2d7009e50d	test: reduce unnecessary sleep durations in tests (#22552 ) ## Summary Removes `time.Sleep` calls in two test files by replacing them with deterministic or event-driven alternatives. ### Changes `coderd/provisionerjobs_test.go` (34.5s → 0.25s) Replaced `time.Sleep(1500ms)` with a direct SQL `UPDATE` to bump `created_at` by 2 seconds. The sleep existed purely to ensure different timestamps for sort-order testing. The fix is deterministic and cannot flake. Uses `NewDBWithSQLDB` (the test already required real Postgres via `WithDumpOnFailure`). `coderd/database/pubsub/pubsub_test.go` (2.05s → 1.3s) Replaced `time.Sleep(1s)` with a `testutil.Eventually` retry loop that publishes and checks for subscriber receipt. This is the idiomatic pattern in the codebase. The old sleep waited for pq.Listener to re-issue LISTEN after reconnect; the new code polls until it actually works.	2026-03-03 10:19:00 -05:00
Kyle Carberry	5eebd3829f	fix: use cursor-based query for chat stream notifications (#22510 ) ## Problem The pubsub notification handler in `chatd` re-fetched all messages from the DB on every new message notification, then filtered in Go with `msg.ID > lastMessageID`. This grows linearly with conversation length — every new message triggers a full table scan of that chat's history. The `AfterMessageID` field in the pubsub notification payload was clearly designed for cursor-based fetching, but no matching query existed. ## Fix - Add `GetChatMessagesByChatIDAfter` SQL query with `WHERE id > @after_id`, so the database does the filtering instead of Go. - Use it in the pubsub notification handler in `chatd.go`, passing `lastMessageID` as the cursor. - Implement the dbauthz wrapper (was a `panic("not implemented")` stub from codegen) with the same read-check-on-parent-chat pattern as adjacent methods. - Add dbauthz test coverage for the new method. Not changed: The initial snapshot in `Subscribe()` still loads all messages — that's correct, since a newly-connecting client needs the full conversation state. The waste was only in the ongoing notification path.	2026-03-02 16:31:04 -05:00
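A minimal sketch of the cursor idea. Given messages ordered by a monotonically increasing ID, each notification only needs the rows with `id > lastMessageID`. `ChatMessage` and `messagesAfter` are hypothetical. In the real change the database does this filtering through `GetChatMessagesByChatIDAfter`, and the binary search here only stands in for that `WHERE id > @after_id` clause.

```go
package main

import "sort"

// ChatMessage is a hypothetical, simplified row. IDs are assumed to increase
// monotonically, which is what makes "id > cursor" a valid cursor.
type ChatMessage struct {
	ID      int64
	Content string
}

// messagesAfter returns the messages with ID > afterID from a slice sorted
// by ID. It finds the start index by binary search instead of filtering
// the whole history.
func messagesAfter(sorted []ChatMessage, afterID int64) []ChatMessage {
	i := sort.Search(len(sorted), func(i int) bool { return sorted[i].ID > afterID })
	return sorted[i:]
}
```

After handling a page, the subscriber would advance `lastMessageID` to the last returned ID.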
Kyle Carberry	0908505348	fix(chats): archive chat tree with single query instead of loop (#22496 ) ## Problem When archiving an agent with subagents, the children briefly flash in the sidebar as root-level items before disappearing. Two issues: 1. Backend: Archive used N+1 queries — a recursive DFS (`archiveChatTree`, no transaction) or BFS loop (`chatd.ArchiveChat`, N+1 queries in a tx) to walk the tree and archive each chat individually. 2. Frontend: The SSE `deleted` event handler only filtered out the parent chat from the cache. Children remained briefly, got promoted to root-level by `buildChatTree`, then disappeared on the next re-fetch. ## Fix Backend: Replace both tree-walk implementations with a single SQL query:
```sql
UPDATE chats
SET archived = true, updated_at = NOW()
WHERE id = @id OR root_chat_id = @id;
```
This leverages the existing `root_chat_id` column (already indexed) to archive the entire tree atomically. Frontend: When a `deleted` event arrives, also filter out any chats whose `root_chat_id` matches the deleted chat, so children vanish from the sidebar immediately with the parent. ## Changes - `coderd/database/queries/chats.sql` — Added `ArchiveChatTreeByID` query - `coderd/chats.go` — Use single query, delete `archiveChatTree` function - `coderd/chatd/chatd.go` — Simplify `ArchiveChat` to use single query - `coderd/database/dbauthz/dbauthz.go` — Auth wrapper for new query - `coderd/chats_test.go` — Added `TestArchiveChat/ArchivesChildren` subtest - `site/src/pages/AgentsPage/AgentsPage.tsx` — Filter children in SSE handler - Generated files updated via `make gen`	2026-03-02 12:00:00 -05:00
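The single-query archive works because every descendant carries its root's ID directly. This in-memory sketch applies the same `id = @id OR root_chat_id = @id` predicate to a slice. The same filter is what the frontend change applies to its cache on a `deleted` event. `Chat` and `archiveTree` are hypothetical. The sketch assumes `id` names a root chat. Under that assumption, a non-root chat's own children point at the top-level root, so the predicate would match only the chat itself.

```go
package main

// Chat is a hypothetical, simplified row. RootChatID is empty for root
// chats and holds the root's ID for every descendant, however deep.
type Chat struct {
	ID         string
	RootChatID string
	Archived   bool
}

// archiveTree marks every chat matching id = @id OR root_chat_id = @id as
// archived and returns how many rows it touched. No recursive walk is needed.
func archiveTree(chats []Chat, id string) int {
	n := 0
	for i := range chats {
		if chats[i].ID == id || chats[i].RootChatID == id {
			chats[i].Archived = true
			n++
		}
	}
	return n
}
```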
Cian Johnston	a62f2fbfc4	feat(rbac): add AsChatd subject to replace AsSystemRestricted in chatd (#22487 ) Add a new SubjectTypeChatd RBAC subject with minimal permissions: - Chat: CRUD - Workspace: Read - DeploymentConfig: Read Replace all 10 AsSystemRestricted calls in coderd/chatd/chatd.go: - Line 890: Use AsChatd instead of AsSystemRestricted for the background processor context. - Subscribe() path (5 calls): Remove system escalation entirely; these run under the authenticated user's context from the HTTP handler. - processChat path (4 calls): Remove redundant per-call wraps; the context already carries AsChatd from the processor start. Add TestAsChatd verifying allowed and denied actions. Created using Mux (Opus 4.6)	2026-03-02 15:57:04 +00:00
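To make "minimal permissions" concrete, here is a deny-by-default allowlist holding exactly the grants listed above: Chat CRUD, Workspace read, and DeploymentConfig read. It is a hypothetical illustration only. The real `coderd/rbac` subject, resource, and action types are not shown in the log and are likely richer than a map of string pairs.

```go
package main

// permission pairs a resource type with an action. Both the type and the
// string values are illustrative, not the real coderd/rbac API.
type permission struct {
	Resource string
	Action   string
}

// chatdPermissions encodes the minimal grant described for the chatd subject.
var chatdPermissions = map[permission]bool{
	{"chat", "create"}:            true,
	{"chat", "read"}:              true,
	{"chat", "update"}:            true,
	{"chat", "delete"}:            true,
	{"workspace", "read"}:         true,
	{"deployment_config", "read"}: true,
}

// chatdAllowed denies by default: anything not explicitly granted is refused.
func chatdAllowed(resource, action string) bool {
	return chatdPermissions[permission{resource, action}]
}
```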
Kyle Carberry	34d9392e37	chore(db): remove workspace_agent_id from chats table (#22442 ) ## Summary Remove the `workspace_agent_id` column from the `chats` table and dynamically look up the first workspace agent instead. ## Problem When a workspace is stopped and restarted, the workspace agent gets a new ID. The `workspace_agent_id` stored on the chat at creation time becomes stale, making the agent unreachable. This caused chats to break after workspace restarts. ## Solution Instead of persisting the agent ID, dynamically look up the first agent from the workspace's latest build via `GetWorkspaceAgentsInLatestBuildByWorkspaceID` whenever an agent connection is needed. The `workspace_id` on the chat remains stable across restarts. This behavior may be refined later (e.g., agent selection heuristics), but picking the first agent resolves the immediate breakage. ## Changes - Migration 000425: Drop `workspace_agent_id` column from `chats` - SQL queries: Remove `workspace_agent_id` from `InsertChat` and `UpdateChatWorkspace` - chatd.go: `getWorkspaceConn` and `resolveInstructions` now look up agents dynamically from workspace ID - chatd.go: Remove `refreshChatWorkspaceSnapshot` (no longer needed) - createworkspace.go: Stop persisting agent ID when associating workspace with chat - subagent.go: Stop passing agent ID to child chats - SDK/frontend: Remove `WorkspaceAgentID` / `workspace_agent_id` from Chat type --------- Co-authored-by: Kyle Carberry <kylecarbs@gmail.com>	2026-02-28 16:46:51 -05:00
Kyle Carberry	0ad2f9ecd7	feat(chatd): persist last_error on chats table (#22436 ) Adds a nullable `last_error` column to the `chats` table so error reasons survive page reloads. Backend: - Migration adds `last_error TEXT` (nullable) to chats - `UpdateChatStatus` writes the error reason when status transitions to `error`, clears it (NULL) on recovery - `convertChat` maps `sql.NullString` to `string` in the SDK Frontend: - Sidebar falls back to `chat.last_error` when no stream error reason is cached - Chat detail page does the same for `persistedErrorReason` - Fixtures updated for new required field	2026-02-28 12:27:26 -05:00
Kyle Carberry	12083441e0	feat(chats): archive chats instead of hard-deleting them (#22406 ) ## Summary The UI has always labeled the action as "Archive agent" but the backend was performing a hard `DELETE`, permanently destroying chats and all their messages. This change replaces the hard delete with a soft archive, consistent with the pattern used by template versions. ## Changes ### Database - Migration 000423: Add `archived boolean DEFAULT false NOT NULL` column to `chats` table - Replace `DeleteChatByID` query with `ArchiveChatByID` (`UPDATE SET archived = true`) - Add `UnarchiveChatByID` query (`UPDATE SET archived = false`) - Filter archived chats from `GetChatsByOwnerID` (`WHERE archived = false`) ### API - Remove `DELETE /api/experimental/chats/{chat}` - Add `POST /api/experimental/chats/{chat}/archive` — archives a chat and all its descendants - Add `POST /api/experimental/chats/{chat}/unarchive` — unarchives a single chat (API only, no UI yet) ### Backend - `archiveChatTree()` recursively archives child chats (replaces `deleteChatTree()` which hard-deleted) - Chat daemon's `ArchiveChat()` archives the full chat tree in a transaction - Authorization uses `ActionUpdate` instead of `ActionDelete` ### SDK - Replace `DeleteChat()` with `ArchiveChat()` and `UnarchiveChat()` - Add `Archived` field to `Chat` struct ### Frontend - `archiveChat` API call uses `POST .../archive` instead of `DELETE` - No UI changes — the "Archive agent" button now actually archives instead of deleting ## Design Decision This follows the template version archive pattern (Pattern B in the codebase): - `archived boolean` column (not `deleted boolean`) - Dedicated `POST .../archive` and `POST .../unarchive` routes (not repurposing `DELETE`) - Reversible — users can unarchive via the API (UI for this will come later)	2026-02-27 16:46:19 -05:00
Kyle Carberry	edee917d88	feat: add experimental agents support (#22290 ) feat: add AI chat system with agent tools and chat UI Introduce the chatd subsystem and Agents UI for AI-powered chat within Coder workspaces. - Add chatd package with chat loop, message compaction, prompt management, and LLM provider integration (OpenAI, Anthropic) - Add agent tools: create workspace, list/read templates, read/write/edit files, execute commands - Add chat API endpoints with streaming, message editing, and durable reconnection - Add database schema and migrations for chats, chat messages, chat providers, and chat model configs - Add RBAC policies and dbauthz enforcement for chat resources - Add Agents UI pages with conversation timeline, queued messages list, diff viewer, and model configuration panel - Add comprehensive test coverage including coderd integration tests, chatd unit tests, and Storybook stories - Gate feature behind experiments flag --------- Co-authored-by: Cian Johnston <cian@coder.com> Co-authored-by: Danielle Maywood <danielle@themaywoods.com> Co-authored-by: Jeremy Ruppel <jeremy@coder.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 16:50:56 +00:00
Jake Howell	d2787df442	feat: add AI Bridge request logs model filter (#22230 ) This pull-request implements simple filtering logic so that we're able to pick which model the user actually used when logs were sent to AI Bridge. - Add `GET /aibridge/models` API endpoint that returns distinct model names from AI Bridge interceptions, with pagination and search support - New `ListAIBridgeModels` SQL query using case-sensitive prefix matching (`LIKE model || '%'`) to allow B-tree index usage - Hand-written `ListAuthorizedAIBridgeModels` in `modelqueries.go` for RBAC authorization filter injection - `AIBridgeModels` search query parser in searchquery/search.go (defaults bare terms to the `model` field) - dbauthz wrappers, dbmetrics, and dbmock implementations for the new query	2026-02-26 02:40:45 +11:00
Cian Johnston	6336fee3a7	feat: add telemetry for task lifecycle events (#21922 ) Relates to https://github.com/coder/internal/issues/1259 Adds new database queries and telemetry collection functions to gather task lifecycle events (pause/resume cycles, idle time) for analytics. Task events track pause/resume activity, idle duration before pausing, paused duration, and time from resume to first app status, filtered to recent activity based on the telemetry snapshot interval. 🤖 Created with Mux (Opus 4.6).	2026-02-24 17:04:42 +00:00
Kacper Sawicki	1e274063d4	feat(coderd): filter expired API tokens server-side (#22263 ) ## Summary Moves expired token filtering from client-side to server-side by adding an `include_expired` parameter to the `GetAPIKeysByLoginType` and `GetAPIKeysByUserID` database queries. This is more efficient for large deployments with many expired/short-lived tokens. ## Changes - Add `include_expired` parameter to SQL queries using `OR` short-circuit - Add `include_expired` query parameter to `GET /users/{user}/keys/tokens` - Add `IncludeExpired` field to `codersdk.TokensFilter` - Remove client-side filtering from CLI `tokens list` command - Add `TestTokensFilterExpired` test Fixes coder/internal#1357	2026-02-24 15:27:03 +00:00
Jon Ayers	0a7a3da178	fix: exclude provisioner_state from workspace_build_with_user view (#22159 ) The provisioner state for a workspace build was being loaded for every long-lived agent rpc connection. Since this state can be anywhere from kilobytes to megabytes this can gradually cause the `coderd` memory footprint to grow over time. It's also a lot of unnecessary allocations for every query that fetches a workspace build since only a few callers ever actually reference the provisioner state. This PR removes it from the returned workspace build and adds a query to fetch the provisioner state explicitly.	2026-02-23 22:46:17 -06:00
Thomas Kosiewski	b776a14b46	fix(coderd): harden OAuth2 provider security (#22194 ) ## Summary Harden the OAuth2 provider with multiple security fixes addressing `coder/security#121` (CSRF session takeover) and converge on OAuth 2.1 compliance. ### Security Fixes
| Fix | Description | Commits |
|-----|-------------|---------|
| CSRF on `/oauth2/authorize` | Enforce CSRF protection on the authorize endpoint POST (consent form submission) | `ba7d646`, `b94a64e` |
| Clickjacking: `frame-ancestors` CSP | Prevent consent page from being iframed (`Content-Security-Policy: frame-ancestors 'none'` + `X-Frame-Options: DENY`) | `597aeb2` |
| Exact redirect URI matching | Changed from prefix matching to full string exact matching per OAuth 2.1 §4.1.2.1 | `73d64b1`, `93897f1` |
| Store & verify `redirect_uri` | Store redirect_uri with auth code in DB, verify at token exchange matches exactly (RFC 6749 §4.1.3) | `50569b9`, `d7ca315` |
| Mandatory PKCE | Require `code_challenge` at authorization (for `response_type=code`) + unconditional `code_verifier` verification at token exchange | `d7ca315`, `1cda1a9` |
| Reject implicit grant | `response_type=token` now returns `unsupported_response_type` error page (OAuth 2.1 removes implicit flow) | `d7ca315`, `91b8863` |
### Changes by File `coderd/httpmw/csrf.go` — Extended the CSRF `ExemptFunc` to enforce CSRF on `/oauth2/authorize` in addition to `/api` routes. The consent form POST is now CSRF-protected to prevent cross-site authorization code theft. `site/site.go` — Added `Content-Security-Policy: frame-ancestors 'none'` and `X-Frame-Options: DENY` headers to `RenderOAuthAllowPage` (consent page only — does not affect the SPA/global CSP used by AI tasks). `coderd/httpapi/queryparams.go` — Changed `RedirectURL` from prefix matching (`strings.HasPrefix(v.Path, base.Path)`) to full URI exact matching (`v.String() != base.String()`), comparing scheme, host, path, and query. 
`coderd/oauth2provider/authorize.go` — Added PKCE enforcement: `code_challenge` is required when `response_type=code` (via a conditional check, not `RequiredNotEmpty`, so `response_type=token` can reach the explicit rejection path). `ShowAuthorizePage` (GET) validates `response_type` before rendering and returns a 400 error page for unsupported types. `ProcessAuthorize` (POST) stores the `redirect_uri` with the auth code when explicitly provided. `coderd/oauth2provider/tokens.go` — PKCE verification is now unconditional (not gated on `code_challenge` being present in DB). If the stored code has a `redirect_uri`, the token endpoint verifies it matches exactly — mismatch returns `errBadCode` → `invalid_grant`. Missing `code_verifier` returns `invalid_grant`. `codersdk/oauth2.go` — `OAuth2ProviderResponseTypeToken` constant and `Valid()` acceptance are kept so the authorize handler can parse `response_type=token` and return the proper `unsupported_response_type` error rather than failing at parameter validation. `coderd/database/migrations/000421_` — Added `redirect_uri text` column to `oauth2_provider_app_codes`. ### Design Decisions `state` parameter remains optional — The plan initially required `state` via `RequiredNotEmpty`, but this was reverted in `376a753` to avoid breaking existing clients. The `state` is still hashed and stored when provided (via `state_hash` column), securing clients that opt in. `response_type=token` kept in `Valid()` — Removing it from `Valid()` would cause the parameter parser to reject the request before the authorize handler can return the proper `unsupported_response_type` error. The constant is kept for correct error handling flow. CSP scoped to consent page only — `frame-ancestors 'none'` is set only on the OAuth consent page renderer, not globally. The SPA/global CSP was previously changed to allow framing for AI tasks ([#18102](https://github.com/coder/coder/pull/18102)); this change does not regress that. 
### Out of Scope (follow-up PRs) - Bearer tokens in query strings (needs internal caller audit) - Scope enforcement on OAuth2 tokens - Rate limiting on dynamic client registration --- <details> <summary>📋 Implementation Plan</summary> # Plan: Harden OAuth2 Provider — Security Fixes + OAuth 2.1 Compliance ## Context & Why Security issue `coder/security#121` reports a critical session takeover via CSRF on the OAuth2 provider. This plan covers all remaining security fixes from that issue plus convergence on OAuth 2.1 requirements. The goal is a single PR that closes all actionable gaps. ## Current State (already committed on branch `csrf-sjx1`)
| Fix | Status | Commits |
|-----|--------|---------|
| Fix 1: CSRF on `/oauth2/authorize` | ✅ Done | `ba7d646`, `b94a64e` |
| CSRF token in consent form HTML | ✅ Done | `b94a64e` |
| `state_hash` column + storage | ✅ Done (hash stored, but state still optional) | `9167d83`, `b94a64e` |
| Tests for CSRF + state hash | ✅ Done | `e4119b5` |
## Remaining Work ### ~~Fix 2 — Require `state` parameter~~ (DROPPED) > Decision: Do not enforce `state` as required. The `state` parameter is still hashed and stored when provided (via `hashOAuth2State` / `state_hash` column from prior commits), but clients are not forced to supply it. This avoids breaking existing integrations that omit state. Rollback: Remove `"state"` from the `RequiredNotEmpty` call in `coderd/oauth2provider/authorize.go:42`:
```go
// BEFORE (current on branch)
p.RequiredNotEmpty("response_type", "client_id", "state", "code_challenge")

// AFTER
p.RequiredNotEmpty("response_type", "client_id", "code_challenge")
```
No test changes needed — tests already pass `state` voluntarily. ### Fix 4 — Exact redirect URI matching Currently `coderd/httpapi/queryparams.go:233` uses prefix matching:
```go
// CURRENT — prefix match
if v.Host != base.Host || !strings.HasPrefix(v.Path, base.Path) {
	// ...
}
```
OAuth 2.1 requires exact string matching. 
Change to:
```go
// AFTER — exact match (OAuth 2.1 §4.1.2.1)
if v.Host != base.Host || v.Path != base.Path {
	// ...
}
```
File: `coderd/httpapi/queryparams.go` — `RedirectURL` method. Also update the error message from "must be a subset of" to "must exactly match". Additionally, store `redirect_uri` with the auth code and verify at the token endpoint (RFC 6749 §4.1.3): 1. New migration (same migration file or a new `000421`): Add `redirect_uri text` column to `oauth2_provider_app_codes` 2. Update INSERT query in `coderd/database/queries/oauth2.sql` to include `redirect_uri` 3. `coderd/oauth2provider/authorize.go`: Store `params.redirectURL.String()` when inserting the code 4. `coderd/oauth2provider/tokens.go`: After retrieving the code from DB, verify that `redirect_uri` from the token request matches the stored value exactly. Currently `tokens.go:103` calls `p.RedirectURL(vals, callbackURL, "redirect_uri")` for prefix validation only — it must compare against the stored redirect_uri from the code, not just the app's callback URL. <details> <summary>Why both exact match AND store+verify?</summary> Exact matching at the authorize endpoint prevents open redirectors (attacker can't use a sub-path). Storing and verifying at the token endpoint prevents code injection — an attacker who steals a code can't exchange it with a different redirect_uri than was originally authorized. This is required by RFC 6749 §4.1.3 and OAuth 2.1. </details> ### Fix 7 — `frame-ancestors` CSP on consent page The consent page can be iframed by a workspace app (same-site), which is the attack vector. Add a `Content-Security-Policy` header to prevent framing. 
File: `site/site.go` — `RenderOAuthAllowPage` function (~line 731). Before writing the response, add:
```go
func RenderOAuthAllowPage(rw http.ResponseWriter, r *http.Request, data RenderOAuthAllowData) {
	rw.Header().Set("Content-Type", "text/html; charset=utf-8")
	// Prevent the consent page from being framed to mitigate
	// clickjacking attacks (coder/security#121).
	rw.Header().Set("Content-Security-Policy", "frame-ancestors 'none'")
	rw.Header().Set("X-Frame-Options", "DENY")
	// ...
}
```
Both headers for defense-in-depth (CSP for modern browsers, X-Frame-Options for legacy). ### OAuth 2.1 — Mandatory PKCE Currently PKCE is checked only when `code_challenge` was provided during authorization (`tokens.go:258`):
```go
// CURRENT — conditional check
if dbCode.CodeChallenge.Valid && dbCode.CodeChallenge.String != "" {
	// verify PKCE
}
```
OAuth 2.1 requires PKCE for ALL authorization code flows. Change to: File: `coderd/oauth2provider/authorize.go` — Add `"code_challenge"` to required params:
```go
p.RequiredNotEmpty("response_type", "client_id", "code_challenge")
```
File: `coderd/oauth2provider/tokens.go:257-265` — Make PKCE verification unconditional:
```go
// AFTER — PKCE always required (OAuth 2.1)
if req.CodeVerifier == "" {
	return codersdk.OAuth2TokenResponse{}, errInvalidPKCE
}
if !dbCode.CodeChallenge.Valid || dbCode.CodeChallenge.String == "" {
	// Code was issued without a challenge — should not happen
	// with the authorize endpoint enforcement, but defend in
	// depth.
	return codersdk.OAuth2TokenResponse{}, errInvalidPKCE
}
if !VerifyPKCE(dbCode.CodeChallenge.String, req.CodeVerifier) {
	return codersdk.OAuth2TokenResponse{}, errInvalidPKCE
}
```
File: `codersdk/oauth2.go` — Remove `OAuth2ProviderResponseTypeToken` from the enum or reject it explicitly in the authorize handler. Currently it's defined at line 216 but the handler ignores `response_type` and always issues a code. 
We should either:

- (a) remove the `"token"` variant from the enum and reject it with `unsupported_response_type`, or
- (b) add an explicit check in `ProcessAuthorize` that rejects `response_type=token`.

Option (b) is simpler and more backwards-compatible:

```go
// In ProcessAuthorize, after extracting params:
if params.responseType != codersdk.OAuth2ProviderResponseTypeCode {
	httpapi.WriteOAuth2Error(ctx, rw, http.StatusBadRequest,
		codersdk.OAuth2ErrorCodeUnsupportedResponseType,
		"Only response_type=code is supported")
	return
}
```

### OAuth 2.1: Bearer tokens in query strings

`coderd/httpmw/apikey.go:743` accepts `access_token` from URL query parameters, which OAuth 2.1 prohibits. However, this may be used internally (e.g., workspace apps, DERP), so callers need to be audited before it is removed.

Approach: this is a larger change with potential breakage. Track it as a separate follow-up issue rather than including it in this PR, and document the finding.

### OAuth 2.1: Removed flows

✅ Already compliant. `tokens.go` only supports the `authorization_code` and `refresh_token` grant types. The implicit grant (`response_type=token`) will be explicitly rejected per the PKCE section above.

### OAuth 2.1: Refresh token rotation

✅ Already compliant. `tokens.go:442` deletes the old API key when a refresh token is used.

## Migration Plan

All DB changes can go in a single new migration (or extend 000420 if the branch is rebased before merge). Columns to add:

- `redirect_uri text` on `oauth2_provider_app_codes`

The `state_hash` column is already added by migration 000420.

## Implementation Order

1. Fix 7: CSP headers on consent page (isolated, no deps)
2. ~~Fix 2: Require `state` parameter~~ (DROPPED: state stays optional)
3. Fix 4: Exact redirect URI matching + store/verify `redirect_uri`
4. PKCE mandatory: require `code_challenge` + reject `response_type=token`
5. Rollback: remove `"state"` from `RequiredNotEmpty` in `authorize.go`
6. Tests: update/add tests for all changes
7. `make gen` after DB changes

## Out of Scope (separate PRs)

- Bearer tokens in query strings (needs internal caller audit)
- Scope enforcement on OAuth2 tokens
- Rate limiting / quota on dynamic client registration

</details>

---

_Generated with [`mux`](https://github.com/coder/mux) • Model: `anthropic:claude-opus-4-6` • Thinking: `xhigh`_	2026-02-23 12:18:44 +01:00
Danielle Maywood	911d734df9	fix: avoid re-using `AuthInstanceID` for sub agents (#22196 ) Parent agents were re-using AuthInstanceID when spawning child agents. This caused GetWorkspaceAgentByInstanceID to return the most recently created sub agent instead of the parent when the parent tried to refetch its own manifest. Fix by not reusing AuthInstanceID for sub agents, and updating GetWorkspaceAgentByInstanceID to filter them out entirely.	2026-02-19 16:56:29 +00:00
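To illustrate why filtering out sub agents fixes the lookup, here is a minimal in-memory sketch. The `agent` type and `agentByInstanceID` function are hypothetical stand-ins for the database row and the `GetWorkspaceAgentByInstanceID` query, and the sketch assumes the query prefers the newest matching row, which is consistent with the bug description.

```go
package main

// agent is a hypothetical, trimmed-down stand-in for a workspace agent row.
type agent struct {
	ID             string
	AuthInstanceID string
	ParentID       string // empty for top-level (parent) agents
}

// agentByInstanceID scans newest-first (agents are assumed to be stored in
// creation order) but only considers top-level agents, so a sub agent can
// never shadow its parent, even if it shared the parent's instance ID.
func agentByInstanceID(agents []agent, instanceID string) (agent, bool) {
	for i := len(agents) - 1; i >= 0; i-- {
		a := agents[i]
		if a.AuthInstanceID == instanceID && a.ParentID == "" {
			return a, true
		}
	}
	return agent{}, false
}
```

Without the `ParentID == ""` condition, the newest-first scan would return the child here, which is the reported symptom.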
Danielle Maywood	92a6d6c2c0	chore: remove unnecessary loop variable captures (#22180 ) Since Go 1.22, the loop variable capture issue is resolved. Variables declared by for loops are now per-iteration rather than per-loop, making the 'v := v' pattern unnecessary.	2026-02-19 09:02:19 +00:00
Danielle Maywood	31c1279202	feat: notify on task auto pause, manual pause and manual resume (#22050 )	2026-02-18 16:30:16 +00:00
Paweł Banaszewski	90c11f3386	feat: add client column to aibridge_interceptions table (#21839 ) Adds a `client` column to the `aibridge_interceptions` table. It is set according to the value passed from AI Bridge in `RecordInterception`. Adds interception filtering by `client` value. Depends on: https://github.com/coder/aibridge/pull/158 Updates the aibridge library to include this change. Fixes: https://github.com/coder/aibridge/issues/31	2026-02-17 15:43:02 +01:00
Cian Johnston	194d79402e	chore: remove dbmem comment references (#22056 ) 👻 The ghost of dbmem managed to live on... until now.	2026-02-12 09:06:33 +00:00
George K	be94af386c	chore(coderd/database): enforce workspace ACL JSON object constraints (#22019 ) The constraints prevent faulty code from saving 'null' as JSON and breaking the `workspaces_expanded` view.	2026-02-10 16:17:29 -08:00
Cian Johnston	c2c2b6f16f	chore: remove call to taskname.Generate in dbgen (#22040 ) I was trying to figure out why `goleak` was complaining about a dangling http2 connection goroutine in tests. Turns out that `taskname.Generate` will call out to Anthropic if an API key is set, and we're calling it in `dbgen`. Modified to use testutil method instead.	2026-02-10 19:16:44 +00:00
Jon Ayers	6035e45cb8	feat: add e2e workspace build duration metric (#21739 ) Adds coderd_template_workspace_build_duration_seconds histogram that tracks the full duration from workspace build creation to agent ready. This captures the complete user-perceived build time including provisioning and agent startup. The metric is emitted when the agent reports ready/error/timeout via the lifecycle API, ensuring each build is counted exactly once per replica.	2026-02-06 16:26:02 -06:00
Zach	a31e476623	fix: make boundary usage telemetry collection atomic (#21907 ) Previously, UpsertBoundaryUsageStats (INSERT...ON CONFLICT DO UPDATE) and GetAndResetBoundaryUsageSummary (DELETE...RETURNING) could race during telemetry period cutover. Without serialization, an upsert concurrent with the delete could lose data (deleted right after being written) or commit after the delete (miscounted in the next period). Both operations now acquire LockIDBoundaryUsageStats within a transaction to ensure a clean cutover.	2026-02-06 09:52:17 -07:00
Mathias Fredriksson	c60c373bc9	fix(coderd): clean up task snapshots on task deletion (#21949 ) Task snapshots were orphaned when tasks were soft-deleted. The `task_snapshots` table has an `ON DELETE CASCADE` foreign key, but that only fires on hard deletes. Modified DeleteTask to use a CTE that atomically soft-deletes the task and removes its snapshot in a single transaction. The query now returns just the task UUID instead of the full row. Closes coder/internal#1283	2026-02-06 11:55:33 +02:00
Cian Johnston	25a0c807cb	chore(coderd/database/dbfake): add support for provisioner job timestamp control (#21944 ) Relates to https://github.com/coder/coder/pull/21922 / https://github.com/coder/internal/issues/1259 * Adds `dbfake.BuilderOption func(WorkspaceBuildBuilder)`. Adds `BuilderOption` methods for setting various provisioner-job-related fields on `WorkspaceBuildBuilder`. * Migrates a number of existing tests that previously depended on provisioner job timing to use these updated methods in the following packages: * `coderd/jobreaper` * `coderd/notifications/reports` * `enterprise/coderd/schedule` * `enterprise/coderd/prebuilds` * `scripts/workspace-runtime-audit` 🤖 Created using Mux (Opus 4.5) --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-02-06 09:44:40 +00:00
Steven Masley	efd98bd93a	chore: add template toggle to disable module caching (#21931 ) There are use cases for disabling the new module caching behavior of workspace builds; disabling it restores the legacy behavior.	2026-02-05 14:38:55 -06:00
Mathias Fredriksson	96695edfed	fix(coderd/database): correct task pending status logic (#21886 ) Previously, tasks with pending provisioner jobs (not yet picked up) were incorrectly reported as "initializing". Refs #21887	2026-02-05 14:08:03 +02:00
Jon Ayers	22ece10a4a	feat: add healthy filter for workspace queries (#21743 ) Adds support for filtering workspaces by health status using healthy:true or healthy:false in the search query. This is done by changing `has-agent` to accept a list of statuses and aliasing `healthy:true` to `has-agent:connected` and `healthy:false` to `has-agent:timeout,disconnected`. Fixes #21623	2026-02-04 20:48:27 -06:00
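A minimal sketch of the alias expansion described above, operating on a single search term. `expandHealthyFilter` is a hypothetical helper, not the actual search-query parser, and its error message is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// expandHealthyFilter rewrites a "healthy:<bool>" search term into the
// equivalent has-agent status list:
//   healthy:true  -> has-agent:connected
//   healthy:false -> has-agent:timeout,disconnected
// Terms that are not healthy:* are returned unchanged.
func expandHealthyFilter(term string) (string, error) {
	key, value, ok := strings.Cut(term, ":")
	if !ok || key != "healthy" {
		return term, nil
	}
	switch value {
	case "true":
		return "has-agent:connected", nil
	case "false":
		return "has-agent:timeout,disconnected", nil
	default:
		return "", fmt.Errorf("invalid value %q for healthy filter, expected true or false", value)
	}
}
```

Expressing `healthy` as an alias keeps a single code path (the multi-status `has-agent` filter) responsible for the actual query.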
Danielle Maywood	af0e171595	feat(coderd/agentapi): support terraform-defined subagent ids (#21837 ) Update `coderd/agentapi` to handle pre-created sub agents	2026-02-04 15:33:48 +00:00
Cian Johnston	91be688e39	chore(coderd/database): remove deprecated db2sdk.List(Lazy)? methods (#21902 ) Removes deprecated methods db2sdk.List and db2sdk.ListLazy.	2026-02-03 17:52:07 +00:00
Cian Johnston	353ebd9664	feat: add link for viewing raw build logs in workspace and template build jobs (#21727 ) * Adds support for parameter `format=text` in the following API routes: * `/api/v2/workspaceagents/:id/logs` * `/api/v2/workspacebuilds/:id/logs` * `/api/v2/templateversions/:id/logs` * `/api/v2/templateversions/:id/dry-run/:id/logs` * Adds links to view raw logs on the following pages: * Workspace build page * Template editor page * Template version page * Refactors existing log formatting in `cli/logs.go` to live in `codersdk`. 🤖 Generated with Claude Opus 4.5, reviewed by me. --------- Co-authored-by: Claude <noreply@anthropic.com>	2026-02-03 09:45:23 +00:00
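A rough sketch of how a `format=text` switch might choose between JSON and plain-text output. `logLine`, `renderLogs`, and the `stage: output` line layout are hypothetical; the real formatting is the code moved from `cli/logs.go` into `codersdk`, which this does not reproduce.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// logLine is a hypothetical, simplified build log entry.
type logLine struct {
	Stage  string `json:"stage"`
	Output string `json:"output"`
}

// renderLogs returns the content type and body for a log response:
// plain text (one "stage: output" line per entry) when format is "text",
// and a JSON array otherwise.
func renderLogs(logs []logLine, format string) (contentType string, body []byte, err error) {
	if format == "text" {
		var b strings.Builder
		for _, l := range logs {
			fmt.Fprintf(&b, "%s: %s\n", l.Stage, l.Output)
		}
		return "text/plain; charset=utf-8", []byte(b.String()), nil
	}
	body, err = json.Marshal(logs)
	if err != nil {
		return "", nil, err
	}
	return "application/json", body, nil
}
```

A handler would pass `r.URL.Query().Get("format")` as `format`, so existing JSON clients are unaffected when the parameter is absent.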
Mathias Fredriksson	f75cbab6ce	fix(coderd/database): prevent AcquireProvisionerJob from grabbing canceled jobs (#21852 ) The AcquireProvisionerJob query only checked started_at IS NULL, allowing it to acquire jobs that were canceled while pending (which have completed_at set but started_at still NULL). Added completed_at IS NULL check to the query to prevent this. Also fixed JobCompleteBuilder.Do() in dbfake to set started_at when completing jobs to match production behavior. Fixes coder/internal#1323	2026-02-03 10:42:17 +02:00
Zach	90aeea5649	fix: handle boundary usage across snapshots and flush races (#21805 ) Previously there were two issues that could cause incorrect boundary usage telemetry data. 1. Bad handling across snapshot intervals: After telemetry snapshot deleted the DB row, the next flush would INSERT the stale cumulative data (which included already-reported usage). This would then be overwritten by subsequent UPDATE flushes, causing the delta between the last snapshot and the reset to be lost (under-reporting usage). Additionally, if there was no new usage after the reset, the tracker would carry over all usage from the previous period into the next period (over-reporting usage). 2. Missed usage from a race condition: Track() calls between the first mutex unlock and second mutex lock in FlushToDB() were lost. The data wasn't included in the current flush (already snapshotted) and was wiped by the subsequent reset. This is likely low impact to overall usage numbers in the real world. Fix by tracking unique workspace/user deltas separately from cumulative values and always tracking delta allowed/denied requests. Deltas are used for INSERT (fresh start after reset), cumulative for UPDATE (accurate unique counts within a period). All counters reset atomically before the DB operation so Track() calls during the operation are preserved for the next flush.	2026-02-02 09:11:54 -07:00
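A simplified sketch of the delta-versus-cumulative bookkeeping, assuming a single mutex-protected tracker. `tracker`, `snapshot`, and their fields are hypothetical; unique users would be tracked the same way as workspaces and are omitted, as is clearing the cumulative set when a new reporting period starts.

```go
package main

import "sync"

// snapshot is what a hypothetical flush hands to the database layer.
type snapshot struct {
	DeltaWorkspaces      int
	CumulativeWorkspaces int
	DeltaAllowed         int64
	DeltaDenied          int64
}

// tracker counts unique workspaces per flush (delta) and per period
// (cumulative), and always counts allowed/denied requests as deltas.
type tracker struct {
	mu                   sync.Mutex
	deltaWorkspaces      map[string]struct{}
	cumulativeWorkspaces map[string]struct{}
	deltaAllowed         int64
	deltaDenied          int64
}

func newTracker() *tracker {
	return &tracker{
		deltaWorkspaces:      map[string]struct{}{},
		cumulativeWorkspaces: map[string]struct{}{},
	}
}

// Track records one batch of requests for a workspace.
func (t *tracker) Track(workspaceID string, allowed, denied int64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.deltaWorkspaces[workspaceID] = struct{}{}
	t.cumulativeWorkspaces[workspaceID] = struct{}{}
	t.deltaAllowed += allowed
	t.deltaDenied += denied
}

// takeSnapshot captures and resets the delta counters in one critical
// section, before any database work, so Track calls made while the flush
// runs are kept for the next flush instead of being wiped.
func (t *tracker) takeSnapshot() snapshot {
	t.mu.Lock()
	defer t.mu.Unlock()
	s := snapshot{
		DeltaWorkspaces:      len(t.deltaWorkspaces),
		CumulativeWorkspaces: len(t.cumulativeWorkspaces),
		DeltaAllowed:         t.deltaAllowed,
		DeltaDenied:          t.deltaDenied,
	}
	t.deltaWorkspaces = map[string]struct{}{}
	t.deltaAllowed, t.deltaDenied = 0, 0
	return s
}
```

In this reading of the commit, a flush would call `takeSnapshot` first, then INSERT using the `Delta*` fields when no row exists (a fresh start after a telemetry reset), or UPDATE using `CumulativeWorkspaces` together with the request deltas otherwise.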
Jake Howell	052bd114a4	fix: resolve missing users in `<UserCombobox />` (#21822 ) Closes #21044 This pull request addresses an issue where we would filter the `<UserCombobox />` by the user's username or email, not their name (which the rendered options show). To highlight this, I created three different users, each with a username that did not contain their `email` or `name`, and attempted to filter. Searching for `John` wouldn't actually show the user, as his username was `x`. In fact, while a subset of users might be returned from the backend for having `john` in the `email`, they would've been filtered out by the frontend for not being in the `name` field. \| Name \| Username \| \| --- \| --- \| \| `Jake` \| `z` \| \| `Jeff` \| `y` \| \| `John` \| `x` \| \| Previously \| Now \| \| --- \| --- \| \| <img width="560" height="547" alt="OLD_USER_COMBOBOX" src="https://github.com/user-attachments/assets/a0567264-0034-42ac-aba0-95b05c4f92dd" /> \| <img width="580" height="548" alt="NEW_USER_COMBOBOX" src="https://github.com/user-attachments/assets/1aa0c942-d340-4b1c-8dde-b97879525bfb" /> \|	2026-02-03 00:13:41 +11:00
Danielle Maywood	37aecda165	feat(coderd/provisionerdserver): insert sub agent resource (#21699 ) Update provisionerdserver to handle the changes introduced to provisionerd in https://github.com/coder/coder/pull/21602 We now create a relationship between `workspace_agent_devcontainers` and `workspace_agents` with the newly created `subagent_id`.	2026-01-30 17:19:19 +00:00
Steven Masley	dfbd541cee	chore: move List util out of db2sdk to avoid circular imports (#21733 )	2026-01-28 13:07:53 -06:00
Spike Curtis	7090a1e205	chore: renumber duplicate migration 000411 (#21720 ) Fixes recent duplicate DB migration in #21607	2026-01-28 08:01:58 +04:00
Spike Curtis	f358a6db11	chore: convert tailnet tables to UNLOGGED for improved write performance (#21607 ) This migration converts all tailnet coordination tables to UNLOGGED: - `tailnet_coordinators` - `tailnet_peers` - `tailnet_tunnels` UNLOGGED tables skip Write-Ahead Log (WAL) writes, significantly improving performance for high-frequency updates like coordinator heartbeats and peer state changes. The trade-off is that UNLOGGED tables are truncated on crash recovery and are not replicated to standby servers. This is acceptable for these tables because the data is ephemeral: 1. Coordinators re-register on startup 2. Peers re-establish connections on reconnect 3. Tunnels are re-created based on current peer state Migration notes: - Child tables must be converted before the parent table because LOGGED child tables cannot reference UNLOGGED parent tables (but the reverse is allowed) - The down migration reverses the order: parent first, then children Fixes https://github.com/coder/coder/issues/21333	2026-01-28 07:12:32 +04:00
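The ordering constraint from the migration notes can be sketched as generated SQL. This assumes `tailnet_coordinators` is the parent table referenced by `tailnet_peers` and `tailnet_tunnels`; the helper names are hypothetical, and the real migration is written as plain SQL.

```go
package main

import "fmt"

// Assumed relationship: both child tables hold foreign keys to the parent.
var (
	childTables = []string{"tailnet_tunnels", "tailnet_peers"}
	parentTable = "tailnet_coordinators"
)

// upStatements converts the children first: an UNLOGGED child may reference
// a LOGGED parent, but a LOGGED child may not reference an UNLOGGED parent.
func upStatements() []string {
	var stmts []string
	for _, t := range childTables {
		stmts = append(stmts, fmt.Sprintf("ALTER TABLE %s SET UNLOGGED;", t))
	}
	return append(stmts, fmt.Sprintf("ALTER TABLE %s SET UNLOGGED;", parentTable))
}

// downStatements reverses the order: the parent goes back to LOGGED first,
// then the children.
func downStatements() []string {
	stmts := []string{fmt.Sprintf("ALTER TABLE %s SET LOGGED;", parentTable)}
	for _, t := range childTables {
		stmts = append(stmts, fmt.Sprintf("ALTER TABLE %s SET LOGGED;", t))
	}
	return stmts
}
```

At every intermediate step in either direction, no LOGGED table references an UNLOGGED one, which is the invariant the notes describe.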
Zach	7dfa33b410	feat: add boundary usage tracking database schema and tracker skeleton (#21670 ) feat: add boundary usage telemetry database schema and RBAC Adds the foundation for tracking boundary usage telemetry across Coder replicas. This includes: - Database schema: `boundary_usage_stats` table with per-replica stats (unique workspaces, unique users, allowed/denied request counts) - Database queries: upsert stats, get aggregated summary, reset stats, delete by replica ID - RBAC: `boundary_usage` resource type with read/update/delete actions, accessible only via system `BoundaryUsageTracker` subject (not regular user roles) - Tracker skeleton + docs: stub implementation in `coderd/boundaryusage/` The tracker accumulates stats in memory and periodically flushes to the database. Stats are aggregated across replicas for telemetry reporting, then reset when a new reporting period begins. The tracker implementation and plumbing will be done in a subsequent commit/PR. --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-27 13:29:21 -07:00
George K	c352a51b22	fix(coderd): authorize workspace start/stop/delete by transition action (#21691 ) Use transition-specific actions when authorizing workspace build parameter inserts in the database layer so start/stop/delete do not require workspace.update. Related to: https://github.com/coder/internal/issues/1299	2026-01-27 09:08:12 -08:00
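A minimal sketch of mapping a build transition to an authorization action. The constant values and `actionForTransition` are hypothetical placeholders, not Coder's RBAC policy identifiers.

```go
package main

import "fmt"

// Hypothetical action names; the real RBAC constants may differ.
const (
	actionWorkspaceStart = "workspace_start"
	actionWorkspaceStop  = "workspace_stop"
	actionDelete         = "delete"
)

// actionForTransition picks the action to authorize when inserting build
// parameters, so start, stop, and delete are each checked against their own
// permission rather than a blanket workspace update.
func actionForTransition(transition string) (string, error) {
	switch transition {
	case "start":
		return actionWorkspaceStart, nil
	case "stop":
		return actionWorkspaceStop, nil
	case "delete":
		return actionDelete, nil
	default:
		return "", fmt.Errorf("unknown workspace transition %q", transition)
	}
}
```

Returning an error for unknown transitions, instead of falling back to a broad action, keeps the check fail-closed.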

1330 Commits