coder

mirror of https://github.com/coder/coder.git synced 2026-06-03 13:08:25 +00:00

Author	SHA1	Message	Date
Kyle Carberry	27cbf5474b	refactor: remove /diff-status endpoint, include diff_status in chat payload (#23082 ) The `/chats/{chat}/diff-status` endpoint was redundant because: - The `Chat` type already has a `DiffStatus` field - Listing chats already resolves and returns `diff_status` - The `getChat` endpoint was the only one not resolving it (passing `nil`) ## Changes Backend: - `getChat` now calls `resolveChatDiffStatus` and includes the result in the response - Removed `getChatDiffStatus` handler, route (`GET /diff-status`), and SDK method - Tests updated to use `GetChat` instead of `GetChatDiffStatus` Frontend: - `AgentDetail.tsx`: uses `chatQuery.data?.diff_status` instead of separate query - `RemoteDiffPanel.tsx`: accepts `diffStatus` as a prop instead of fetching internally - `AgentsPage.tsx`: `diff_status_change` events now invalidate the chat query - Removed `chatDiffStatus` query, `chatDiffStatusKey`, and `getChatDiffStatus` API method	2026-03-16 14:40:22 +00:00
Callum Styan	36665e17b2	feat: add WatchAllWorkspaceBuilds endpoint for autostart scaletests (#22057 ) This PR adds a `WatchAllWorkspaces` function with `watch-all-workspaces` endpoint, which can be used to listen on a single global pubsub channel for _all_ workspace build updates, and makes use of it in the autostart scaletest. This negates the need to use a workspace watch pubsub channel _per_ workspace, which has auth overhead associated with each call. This is especially relevant in situations such as the autostart scaletest, where we need to start/stop a set of workspaces before we can configure their autostart config. The overhead associated with all the watch requests skews the scaletest results and makes it harder to reason about the performance of the autostart feature itself. The autostart scaletest also no longer generates its own metrics nor does it wait for all the workspaces to actually start via autostart. We should update the scaletest dashboard after both PRs are merged to measure autostart performance via the new metrics. The new function/endpoint and its usage in the autostart scaletest are gated behind an experiment feature flag, this is something we should discuss whether we want to enable the endpoint in prod by default or not. If so, we can remove the experiment. --------- Signed-off-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: Callum Styan <callum@coder.com>	2026-03-13 20:37:41 -07:00
Hugo Dutka	84527390c6	feat: chat desktop backend (#23005 ) Implement the backend for the desktop feature for agents. - Adds a new `/api/experimental/chats/$id/desktop` endpoint to coderd which exposes a VNC stream from a [portabledesktop](https://github.com/coder/portabledesktop) process running inside the workspace - Adds a new `spawn_computer_use_agent` tool to chatd, which spawns a subagent that has access to the `computer` tool which lets it interact with the `portabledesktop` process running inside the workspace - Adds the plumbing to make the above possible There's a follow up frontend PR here: https://github.com/coder/coder/pull/23006	2026-03-13 19:49:34 +01:00
Michael Suchacz	c3b6284955	feat: add chat cost analytics backend (#23036 ) Add cost tracking for LLM chat interactions with microdollar precision. ## Changes - Add `chatcost` package for per-message cost calculation using `shopspring/decimal` for intermediate arithmetic - Ceil rounding policy: fractional micros round UP to next whole micro (applied once after summing all components) - Database migration: `total_cost_micros` BIGINT column with historical backfill and `created_at` index - API endpoints: per-user cost summary and admin rollup under `/api/experimental/chats/cost/` - SDK types: `ChatCostSummary`, `ChatCostModelBreakdown`, `ChatCostUserRollup` - Fix `modeloptionsgen` to handle `decimal.Decimal` as opaque numeric type - Update frontend pricing test fixtures for string decimal types ## Design decisions - `NULL` = unpriced (no matching model config), `0` = free - Reasoning tokens included in output tokens (no double-counting) - Integer microdollars (BIGINT) for storage and API responses - Price config uses `decimal.Decimal` for exact parsing; totals use `int64` Frontend: #23037	2026-03-13 18:30:49 +01:00
Kacper Sawicki	df2360f56a	feat(coderd): add consolidated /debug/profile endpoint for pprof collection (#22892) ## Summary Adds a new `GET /api/v2/debug/profile` endpoint that collects multiple pprof profiles in a single request and returns them as a tar.gz archive. This allows collecting profiles (including block and mutex) without requiring `CODER_PPROF_ENABLE` to be set, and without restarting `coderd`. Closes #21679 ## What it does The endpoint: - Temporarily enables block and mutex profiling (normally disabled at runtime) - Runs CPU profile and/or trace for a configurable duration (default 10s, max 60s) - Collects snapshot profiles (heap, allocs, block, mutex, goroutine, threadcreate) - Returns a tar.gz archive containing all requested `.prof` files - Uses an atomic bool to prevent concurrent collections (returns 409 Conflict) - Is protected by the existing debug endpoint RBAC (owner-only) Supported profile types: cpu, heap, allocs, block, mutex, goroutine, threadcreate, trace Query parameters: - `duration`: How long to run timed profiles (default: `10s`, max: `60s`) - `profiles`: Comma-separated list of profile types (default: `cpu,heap,allocs,block,mutex,goroutine`) ## Additional changes - SDK client method (`codersdk.Client.DebugCollectProfile`) for easy programmatic access - `coder support bundle --pprof` integration: tries the consolidated endpoint first, falls back to individual `/debug/pprof/` endpoints for older servers - 8 new tests covering defaults, custom profiles, trace+CPU, validation errors, authorization, and conflict detection	2026-03-13 14:09:39 +00:00
Kyle Carberry	690e3a87d8	feat: move chat messages to dedicated /chats/{id}/messages endpoint (#23021 ) ## Summary Moves the messages response out of `GET /chats/{id}` and into a dedicated `GET /chats/{id}/messages` endpoint. ### Backend - `GET /chats/{id}` now returns just the `Chat` object (no messages) - `GET /chats/{id}/messages` is a new endpoint returning `ChatMessagesResponse` with `messages` and `queued_messages` - Added `ChatMessagesResponse` SDK type and `GetChatMessages` client method ### Frontend - `getChat()` API method returns `Chat` instead of `ChatWithMessages` - Added `getChatMessages()` API method for the new endpoint - Split `chatQuery` into two: `chatQuery` (metadata) and `chatMessagesQuery` (messages) - Updated all cache mutations, optimistic updates, and websocket handlers - Updated tests and stories ### Files changed \| File \| Change \| \|---\|---\| \| `coderd/coderd.go` \| Register `GET /messages` route \| \| `coderd/chats.go` \| Simplify `getChat`, add `getChatMessages` handler \| \| `codersdk/chats.go` \| New type + method, update `GetChat` return \| \| `site/src/api/api.ts` \| New method, update `getChat` \| \| `site/src/api/queries/chats.ts` \| New query, update cache mutations \| \| `site/src/pages/AgentsPage/AgentDetail.tsx` \| Use separate queries \| \| `site/src/pages/AgentsPage/AgentDetail/ChatContext.ts` \| Update types and cache writes \| \| `site/src/pages/AgentsPage/AgentsPage.tsx` \| Update websocket cache handler \|	2026-03-13 08:35:46 -04:00
Cian Johnston	bc27274aba	feat(coderd): refactors github pr sync functionality (#22715 ) - Adds `_API_BASE_URL` to `CODER_EXTERNAL_AUTH_CONFIG_` - Extracts and refactors existing GitHub PR sync logic to new packages `coderd/gitsync` and `coderd/externalauth/gitprovider` - Associated wiring and tests Created using Opus 4.6	2026-03-10 18:46:01 +00:00
Kyle Carberry	b6d1a11c58	feat(chatd): add user-level custom prompt for agent chats (#22896 ) Adds a user-level custom prompt to the database. I'll be doing a follow-up for the UI, as we currently do not have user-level settings (it's just admin). I'll also make it very obvious for chats where there is a user-level prompt, but I don't know how yet.	2026-03-10 11:17:52 -04:00
Cian Johnston	c933ddcffd	fix(agents): persist system prompt server-side instead of localStorage (#22857 ) ## Problem The Admin → Agents → System Prompt textarea saved only to the browser's `localStorage`. The value was never sent to the backend, never stored in the database, and never injected into chats. Entering text, clicking Save, and refreshing the page showed no changes — the prompt was effectively a no-op. ## Root Cause Three disconnected layers: 1. Frontend wrote to `localStorage`, never called an API. 2. `handleCreateChat` never read `savedSystemPrompt`. 3. Backend hardcoded `chatd.DefaultSystemPrompt` on every chat creation — no field in `CreateChatRequest` accepted a custom prompt. ## Changes ### Database - Added `GetChatSystemPrompt` / `UpsertChatSystemPrompt` queries on the existing `site_configs` table (no migration needed). ### API - `GET /api/experimental/chats/system-prompt` — returns the configured prompt (any authenticated user). - `PUT /api/experimental/chats/system-prompt` — sets the prompt (admin-only, `rbac: deployment_config update`). - Input validation: max 32 KiB prompt length. ### Backend - `resolvedChatSystemPrompt(ctx)` checks for a custom prompt in the DB, falls back to `chatd.DefaultSystemPrompt` when empty/unset. - Logs a warning on DB errors instead of silently swallowing them. - Replaced the hardcoded `defaultChatSystemPrompt()` call in chat creation. ### Frontend - Replaced `localStorage` read/write with React Query `useQuery`/`useMutation` backed by the new endpoints. - Fixed `useEffect` draft sync to avoid clobbering in-progress user edits on refetch. - Added `try/catch` error handling on save (draft stays dirty for retry). - Save button disabled during mutation (`isSavingSystemPrompt`). - Query key follows kebab-case convention (`chat-system-prompt`). ### UX - Added hint: "When empty, the built-in default prompt is used." ### Tests - `TestChatSystemPrompt`: GET returns empty when unset, admin can set, non-admin gets 403. 
- dbauthz `TestMethodTestSuite` coverage for both new querier methods.	2026-03-10 11:46:53 +00:00
Kyle Carberry	47846c0ee4	fix(site): inject permissions and organizations metadata to eliminate loading spinners (#22741 ) ## Problem Two network requests were blocking the initial page render with fullscreen `<Loader fullscreen />` spinners: 1. `POST /api/v2/authcheck` (permissions) — blocked in `RequireAuth` via `AuthProvider.isLoading` 2. `GET /api/v2/organizations` — blocked in `DashboardProvider` All other bootstrap queries (`user`, `entitlements`, `appearance`, `experiments`, `build-info`, `regions`) already used server-side metadata injection via `index.html` meta tags and resolved instantly. These two did not. ## Solution Follow the existing `cachedQuery` + `<meta>` tag pattern to inject both datasets server-side: ### Server-side (`site/site.go`) - Add `Permissions` and `Organizations` fields to `htmlState` - Fetch organizations via `GetOrganizationsByUserID` in parallel with existing queries - Evaluate all `permissionChecks` using the RBAC authorizer directly - Inject results as HTML-escaped JSON into `<meta>` tags ### Frontend - Register `permissions` and `organizations` in `useEmbeddedMetadata` - Update `checkAuthorization()` to accept optional metadata and use `disabledRefetchOptions` when available - Update `organizations()` to accept optional metadata and use `cachedQuery` when available - Wire metadata through `AuthProvider` and `DashboardProvider` ### Note The Go `permissionChecks` map in `site/site.go` mirrors `site/src/modules/permissions/index.ts` and must be kept in sync.	2026-03-09 16:12:04 +00:00
Kacper Sawicki	49006685b0	fix: rate limit by user instead of IP for authenticated requests (#22049 ) ## Problem Rate limiting by user is broken (#20857). The rate limit middleware runs before API key extraction, so user ID is never in the request context. This causes: - Rate limiting falls back to IP address for all requests - `X-Coder-Bypass-Ratelimit` header for Owners is ignored (can't verify role without identity) ## Solution Adds `PrecheckAPIKey`, a root-level middleware that fully validates the API key on every request (expiry, OIDC refresh, DB updates, role lookup) and stores the result in context. Added once at the root router — not duplicated per route group. ### Architecture ``` Request → Root middleware stack: → ExtractRealIP, Logger, ... → PrecheckAPIKey(...) ← validates key, stores result, never rejects → HandleSubdomain(apiRateLimiter) ← workspace apps now also benefit → CORS, CSRF → /api/v2 or /api/experimental: → apiRateLimiter ← reads prechecked result from context → route handlers: → ExtractAPIKeyMW ← reuses prechecked data, adds route-specific logic → handler ``` ### Key design decisions \| Decision \| Rationale \| \|---\|---\| \| Full validation, not lightweight \| Spike's review: "the whole idea of a 'lightweight' extraction that skips security checks is fundamentally flawed." Only fully validated keys are used for rate limiting — expired/invalid keys fall back to IP. \| \| Structured error results \| `ValidateAPIKeyError` has a `Hard` flag that maps to `write` vs `optionalWrite`. Hard errors (5xx, OAuth refresh failures) surface even on optional-auth routes. Soft errors (missing/expired token) are swallowed on optional routes. \| \| Added once at the root \| Spike's review: "Why can't we add it once at the root?" Root placement means workspace app rate limiters also benefit. \| \| Skip prechecked when `SessionTokenFunc != nil` \| `workspaceapps/db.go` uses a custom `SessionTokenFunc` that extracts from `issueReq.SessionToken`. 
The prechecked result may have validated a different token. Falls back to `ValidateAPIKey` with the custom func. \| \| User status check stays in `ExtractAPIKey` \| Dormant activation is route-specific — `ValidateAPIKey` stores status but doesn't enforce it. \| \| Audience validation stays in `ExtractAPIKey` \| Depends on `cfg.AccessURL` and request path, uses `optionalWrite(403)` which depends on route config. \| ### Changes - `coderd/httpmw/apikey.go`: - New `ValidateAPIKey` function — extracted core validation logic, returns structured errors instead of writing HTTP responses - New `PrecheckAPIKey` middleware — calls `ValidateAPIKey`, stores result in `apiKeyPrecheckedContextKey`, never rejects - New types: `ValidateAPIKeyConfig`, `ValidateAPIKeyResult`, `ValidateAPIKeyError`, `APIKeyPrechecked` - Refactored `ExtractAPIKey` — consumes prechecked result from context (skipping redundant validation), falls back to `ValidateAPIKey` when no precheck available - Removed `ExtractAPIKeyForRateLimit` and `preExtractedAPIKey` - `coderd/httpmw/ratelimit.go`: Rate limiter checks `apiKeyPrecheckedContextKey` first, then `apiKeyContextKey` fallback (for unit tests / workspace apps), then IP - `coderd/coderd.go`: Added `PrecheckAPIKey` once at root `r.Use(...)` block, removed `ExtractAPIKeyForRateLimit` from `/api/v2` and `/api/experimental` - `coderd/coderd_test.go`: `TestRateLimitByUser` regression test with `BypassOwner` subtest Fixes #20857	2026-03-09 13:54:31 +01:00
Mathias Fredriksson	a104d608a3	feat: add file/image attachment support to chat input (#22604 ) This change adds support for image attachments to chat via add button and clipboard paste. Files are stored in a new `chat_files` table and referenced by ID in message content. File data is resolved from storage at LLM dispatch time, keeping the message content column small. Upload validates MIME types via content type or content sniffing against an allowlist (png, jpeg, gif, webp). The retrieval endpoint serves files with immutable caching headers. On the frontend, uploads start eagerly on attach with a background fetch to pre-warm the browser HTTP cache so the timeline renders instantly after send.	2026-03-06 21:05:26 +02:00
Hugo Dutka	48ab492f49	feat: agents git watch backend (#22565) Adds real-time git status watching for workspace agents, so the frontend can subscribe over WebSocket and show git file changes in near real-time. 1. Subscription is scoped to a chat via `GET /api/experimental/chats/{chat}/git/watch`. 2. The workspace agent automatically determines which paths to watch based on tool calls made by the chat (and its ancestor chats). 3. Workspace agent polls subscribed repo working trees on a 30s interval, on tool calls, and on explicit `refresh` from the client. 4. Scans are rate-limited to at most once per second. 5. Edited paths are tracked in-memory inside the workspace agent. There is no database persistence, so state is lost on agent restart. This will be addressed in a future PR. 6. Messages sent over WebSocket include a full-repo snapshot (unified diff, branch, origin). A new message is emitted only when the snapshot changes. This PR was implemented with AI with me closely controlling what it's doing. The code follows a plan file that was updated continuously during implementation. Here's the file if you'd like to see it: [project.md](https://gist.github.com/hugodutka/8722cf80c92f8a56555f7bc595b770e2). It reflects the current state of the PR.	2026-03-06 10:47:55 +01:00
Jon Ayers	6c44de951d	feat: add Prometheus collector for DERP server expvar metrics (#22583 ) This PR does three things: - Exports derp expvars to the pprof endpoint - Exports the expvar metrics as prometheus metrics in both coderd and wsproxy - Updates our tailscale to a fix I also had to make to avoid a data race condition I generated this with mux but I also manually tested that the metrics were getting properly emitted	2026-03-06 01:57:58 -06:00
Kyle Carberry	6520159045	feat(chatd): add start_workspace tool to agent flow (#22646 ) ## Summary When a chat's workspace is stopped, the LLM previously had no way to start it — `create_workspace` would either create a duplicate workspace or fail. This adds a dedicated `start_workspace` tool to the agent flow. ## Changes ### New: `start_workspace` tool (`coderd/chatd/chattool/startworkspace.go`) - Detects if the chat's workspace is stopped and starts it via a new build with `transition=start` - Reuses the existing `waitForBuild` and `waitForAgent` helpers (shared logic) - Shares the workspace mutex with `create_workspace` to prevent races - Idempotent: returns immediately if the workspace is already running or building - Returns a `no_agent` / `not_ready` status if the agent isn't available yet (non-fatal) ### Updated: `create_workspace` stopped-workspace hint - `checkExistingWorkspace` now returns a `stopped` status with message `"use start_workspace to start it"` when it detects the chat's workspace is stopped, instead of falling through to create a new workspace ### Wiring - `chatd.Config` / `chatd.Server`: new `StartWorkspace` / `startWorkspaceFn` field - `coderd/chats.go`: new `chatStartWorkspace` method that calls `postWorkspaceBuildsInternal` with proper RBAC context - `coderd/coderd.go`: passes `chatStartWorkspace` into chatd config - Tool registered alongside `create_workspace` for root chats only (not subagents) ### Tests (`startworkspace_test.go`) - `NoWorkspace`: error when chat has no workspace - `AlreadyRunning`: idempotent return for workspace with successful start build - `StoppedWorkspace`: verifies StartFn is called, build is waited on, and success response returned	2026-03-05 15:34:24 +00:00
Kyle Carberry	30d534b36b	fix(chatd): fix relay race conditions, extract enterprise relay logic, move pubsub to OSS (#22589 ) ## Summary Fixes a bug where interrupting a streaming chat and sending a new message left the relay connected to the wrong replica. Expanded into a broader refactor that cleanly separates concerns: - OSS owns pubsub subscription, message catch-up, queue updates, status forwarding, and local parts merging. - Enterprise (`enterprise/coderd/chatd`) only manages relay dialing, reconnection, and stale-dial discarding for cross-replica streaming. ## Architecture ### OSS `coderd/chatd/chatd.go` `Subscribe()` builds the initial snapshot then runs a single merge goroutine that handles: - Pubsub subscription for durable events (status, messages, queue, errors) - Message catch-up via `AfterMessageID` - Local `message_part` forwarding - Relay events from enterprise (when `SubscribeFn` is set) - Sends `StatusNotification` to enterprise so it can manage relay lifecycle Key types: - `SubscribeFn` — enterprise hook, returns relay-only events channel - `SubscribeFnParams` — `ChatID`, `Chat`, `WorkerID`, `StatusNotifications`, `RequestHeader`, `DB`, `Logger` - `StatusNotification` — `Status` + `WorkerID`, sent to enterprise on pubsub status changes ### Enterprise `enterprise/coderd/chatd/chatd.go` `NewMultiReplicaSubscribeFn(cfg MultiReplicaSubscribeConfig)` returns a `SubscribeFn` that: - Opens an initial synchronous relay if the chat is running on a remote worker - Reads `StatusNotifications` from OSS to open/close relay connections - Handles async dial, reconnect timers, stale-dial discarding - Returns only relay `message_part` events ## Bug fixes ### Original bug: stale relay dial after interrupt `openRelayAsync` goroutines used `mergedCtx` (subscription-level), not a per-dial context. `closeRelay()` could not cancel in-flight dials. 
When the user interrupts and a new replica picks up the chat, the old dial goroutine could complete after the new one and deliver a stale `relayResult`. Fix: per-dial `dialCtx`/`dialCancel`, `expectedWorkerID` tracking, `workerID` on `relayResult`. `closeRelay()` cancels the dial context and drains `relayReadyCh`. Merge loop rejects mismatched worker IDs. ### Additional fixes - `statusNotifications` send-on-closed-channel race — goroutine now owns `close()` via defer - Enterprise spin-loop on `StatusNotifications` close — two-value receive with nil-out - `hasPubsub` set from `p.pubsub != nil` instead of subscription success — now tracks actual subscription result - `lastMessageID` not initialized from `afterMessageID` — caused duplicate messages on catch-up - `wrappedParts` goroutine leaked remote connection on `dialCtx` cancel - `closeRelay()` did not drain `relayReadyCh` - `setChatWaiting` race with `SendMessage(Interrupt)` — wrapped in `InTx` - `processChat` post-TX side effects fired when chat was taken by another worker — added `errChatTakenByOtherWorker` sentinel - Cancel closure data race on `reconnectTimer` - Bare blocking send on pubsub error path - `localParts` hot-spin after channel close - No-pubsub branch dropped relay events and initial snapshot - Failed relay dial caused permanent stall (no reconnect retry) - DB error during reconnect timer caused permanent stall - `time.NewTimer` replaced with `quartz.Clock` for testable timing ## Tests 9 enterprise tests covering: - Relay reconnect on drop (mock clock) - Async dial does not block merge loop - Relay snapshot delivery - Stale dial discarded after interrupt - Cancel during in-flight dial - Running-to-running worker switch - Failed dial retries (mock clock) - Local worker closes relay - Multiple consecutive reconnects (mock clock) All pass with `-race`.	2026-03-04 18:42:28 -05:00
Ehab Younes	9d2aed88c4	fix: register task pause/resume routes under /api/v2 (#22544 ) The pause/resume endpoints were only registered under /api/experimental but the frontend and Go SDK were calling /api/v2, resulting in 404s. Register the routes in the v2 group, update the SDK client paths, and fix swagger annotations (Accept → Produce) since these POST endpoints have no request body.	2026-03-03 16:34:33 +03:00
Cian Johnston	517cb0ce73	refactor(webpush): use RequireExperimentWithDevBypass middleware (#22525 ) Replace manual experiment checks in web-push handlers with the `RequireExperimentWithDevBypass` middleware on the route group, matching the pattern used by OAuth2, Agents, and MCP experiments. ## Changes - `coderd/coderd.go`: Add `RequireExperimentWithDevBypass` middleware to `/webpush` route group - `coderd/webpush.go`: Remove inline `api.Experiments.Enabled(codersdk.ExperimentWebPush)` checks from all three handlers - `cli/server.go`: Gate webpush dispatcher initialization with `buildinfo.IsDev()` fallback so dev builds always init the real dispatcher - `coderd/webpush_test.go`: Remove experiment enablement from tests (dev bypass handles it) Net effect: -26 lines removed, +5 added. Created using whatchamacallits (Opus 4.6 Max)	2026-03-03 09:49:04 +00:00
Kayla はな	2bdf80d452	fix: disable sharing ui when sharing is unavailable (#22390) Currently the sharing UI is only hidden under certain circumstances, rather than on a permission basis. This makes it permission-based, and makes some backend changes to make sure permissions are correct.	2026-03-03 02:04:55 +00:00
Kyle Carberry	c9ed1e17fc	feat(agents): add desktop notifications via VAPID web push (#22454 ) ## Summary Wire VAPID web push notifications into the Agents (chat) system so users get desktop notifications when an agent finishes running. ### Backend - Add `webpush.Dispatcher` to `chatd.Server` and pass it through from `coderd.Options.WebPushDispatcher` - In `processChat()`'s deferred cleanup, dispatch a web push notification when the chat reaches a terminal state: - `waiting` (success): "Agent has finished running." - `error` (failure): the error message, or "Agent encountered an error." - Sub-agent chats (`ParentChatID.Valid`) are skipped to avoid notification spam from internal delegation - Gracefully no-ops when the dispatcher is nil (web push disabled) ### Frontend - New `WebPushButton` component — a bell icon that uses the existing `useWebpushNotifications` hook - Returns `null` when the `web-push` experiment is off - Three states: loading spinner, green bell (subscribed), muted bell-off (unsubscribed) - Tooltip + toast feedback on toggle - Added to both the Agents page empty state top bar and the AgentDetail top bar - The Agents page has its own layout (no standard Navbar), so it needs its own subscribe button ### End-to-end flow 1. User clicks the bell icon on `/agents` → browser subscribes via VAPID 2. User starts an agent chat → chat enters `running` status 3. Agent finishes → `processChat` defer sets status to `waiting`/`error` → dispatches web push 4. Browser service worker shows a desktop notification with the chat title and status --------- Co-authored-by: Coder <coder@users.noreply.github.com>	2026-02-28 23:40:17 -05:00
Kyle Carberry	12083441e0	feat(chats): archive chats instead of hard-deleting them (#22406 ) ## Summary The UI has always labeled the action as "Archive agent" but the backend was performing a hard `DELETE`, permanently destroying chats and all their messages. This change replaces the hard delete with a soft archive, consistent with the pattern used by template versions. ## Changes ### Database - Migration 000423: Add `archived boolean DEFAULT false NOT NULL` column to `chats` table - Replace `DeleteChatByID` query with `ArchiveChatByID` (`UPDATE SET archived = true`) - Add `UnarchiveChatByID` query (`UPDATE SET archived = false`) - Filter archived chats from `GetChatsByOwnerID` (`WHERE archived = false`) ### API - Remove `DELETE /api/experimental/chats/{chat}` - Add `POST /api/experimental/chats/{chat}/archive` — archives a chat and all its descendants - Add `POST /api/experimental/chats/{chat}/unarchive` — unarchives a single chat (API only, no UI yet) ### Backend - `archiveChatTree()` recursively archives child chats (replaces `deleteChatTree()` which hard-deleted) - Chat daemon's `ArchiveChat()` archives the full chat tree in a transaction - Authorization uses `ActionUpdate` instead of `ActionDelete` ### SDK - Replace `DeleteChat()` with `ArchiveChat()` and `UnarchiveChat()` - Add `Archived` field to `Chat` struct ### Frontend - `archiveChat` API call uses `POST .../archive` instead of `DELETE` - No UI changes — the "Archive agent" button now actually archives instead of deleting ## Design Decision This follows the template version archive pattern (Pattern B in the codebase): - `archived boolean` column (not `deleted boolean`) - Dedicated `POST .../archive` and `POST .../unarchive` routes (not repurposing `DELETE`) - Reversible — users can unarchive via the API (UI for this will come later)	2026-02-27 16:46:19 -05:00
Kyle Carberry	edee917d88	feat: add experimental agents support (#22290 ) feat: add AI chat system with agent tools and chat UI Introduce the chatd subsystem and Agents UI for AI-powered chat within Coder workspaces. - Add chatd package with chat loop, message compaction, prompt management, and LLM provider integration (OpenAI, Anthropic) - Add agent tools: create workspace, list/read templates, read/write/ edit files, execute commands - Add chat API endpoints with streaming, message editing, and durable reconnection - Add database schema and migrations for chats, chat messages, chat providers, and chat model configs - Add RBAC policies and dbauthz enforcement for chat resources - Add Agents UI pages with conversation timeline, queued messages list, diff viewer, and model configuration panel - Add comprehensive test coverage including coderd integration tests, chatd unit tests, and Storybook stories - Gate feature behind experiments flag --------- Co-authored-by: Cian Johnston <cian@coder.com> Co-authored-by: Danielle Maywood <danielle@themaywoods.com> Co-authored-by: Jeremy Ruppel <jeremy@coder.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 16:50:56 +00:00
Sushant P	37a8e61ea2	chore: move Shared Workspaces from experiments to beta (#22206 ) * Removed the shared-workspaces experiment and cleaned up related middleware * Added beta tagging to the UI for shared workspaces	2026-02-23 08:30:32 -08:00
Steven Masley	e5f64eb21d	chore: optionally prefix authentication related cookies (#22148 ) When the deployment option is enabled auth cookies are prefixed with `__HOST-` ([info](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Set-Cookie)). This is all done in a middleware that intercepts all requests and strips the prefix on incoming request cookies.	2026-02-20 09:01:00 -06:00
Garrett Delfosse	e8d6016807	fix: allow users with workspace:create for any owner to list users (#21947 ) ## Summary Custom roles that can create workspaces on behalf of other users need to be able to list users to populate the owner dropdown in the workspace creation UI. Previously, this required a separate `user:read` permission, causing the dropdown to fail for custom roles. ## Changes - Modified `GetUsers` in `dbauthz` to check if the user can create workspaces for any owner (`workspace:create` with `owner_id: *`) - If the user has this permission, they can list all users without needing explicit `user:read` permission - Added tests to verify the new behavior ## Testing - Updated mock tests to assert the new authorization check - Added integration tests for both positive and negative cases Fixes #18203	2026-02-19 13:04:53 -05:00
Cian Johnston	4a3304fc38	feat(cli)!: expire tokens by default (#21783 ) ## Summary > NOTE: Calling this out as a breaking change in case existing consumers of the CLI depend on being able to see expired tokens OR being able to delete tokens immediately. Updates the `coder tokens rm` command to immediately expire a token by ID, preserving the token record for audit trail purposes. Tokens can still be deleted by passing `--delete`. ## Problem During an incident on dev.coder.com, operators needed to urgently expire an API key that was stuck in a hot loop. The only way to do this was via direct database access: ```sql UPDATE api_keys SET expires_at = NOW() WHERE id = '...'; ``` This is not ideal for operators who may not have direct DB access or want to avoid manual SQL. ## Solution This PR adds: - API endpoint: `PUT /api/v2/users/{user}/keys/{keyid}/expire` - Sets the token's `expires_at` to now - SDK method: `ExpireAPIKey(ctx, userID, keyID)` - Updates CLI: `coder tokens rm <name\|id\|token>` now _expires_ by default. You can still delete by passing the `--delete` flag. The `coder tokens list` command now also hides expired tokens by default. You can `--include-expired` if needed to include them. - Audit logging: The expire action is logged with old and new key states ## Test plan - Tests cover: owner expiring own token, admin expiring other user's token, non-admin cannot expire other's token, 404 for non-existent token Closes #21782 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-17 13:16:46 +00:00
Callum Styan	5f3be6b288	feat: add provisioner job queue wait time histogram and jobs enqueued counter (#21869 ) This PR adds some metrics to help identify job enqueue rates and latencies. This work was initiated as a way to help reduce the cost of the observation/measurement itself for autostart scaletests, which impacts our ability to identify/reason about the load caused by autostart. See: https://github.com/coder/internal/issues/1209 I've extended the metrics here to account for regular user initiated builds, prebuilds, autostarts, etc. IMO there is still the question here of whether we want to include or need the `transition` label, which is only present on workspace builds. Including it does lead to an increase in cardinality, and in the case of the histogram (when not using native histograms) that's at least a few extra series for every bucket. We could remove the transition label there but keep it on the counter. Additionally, the histogram is currently observing latencies for other jobs, such as template builds/version imports, those do not have a transition type associated with them. Tested briefly in a workspace, can see metric values like the following: - `coderd_workspace_builds_enqueued_total{build_reason="autostart",provisioner_type="terraform",status="success",transition="start"} 1` - `coderd_provisioner_job_queue_wait_seconds_bucket{build_reason="autostart",job_type="workspace_build",provisioner_type="terraform",transition="start",le="0.025"} 1` --------- Signed-off-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-12 13:40:47 -08:00
Sas Swart	47b8ca940c	feat: add an endpoint to manually resume a coder task (#21948 ) Closes https://github.com/coder/internal/issues/1262. This PR adds: * the `POST /api/experimental/tasks/{user}/{task}/resume` endpoint * follows conventions from https://github.com/coder/internal/issues/1261 * sets the build reason to `task_resume` * a task that is not paused (ie. is already running), cannot be resumed.	2026-02-12 09:59:53 +02:00
Sas Swart	e6fbf501ac	feat: add an endpoint to manually pause a coder task (#21889 ) Closes https://github.com/coder/internal/issues/1261. This pull request adds an endpoint to pause coder tasks by stopping the underlying workspace. * Instead of `POST /api/v2/tasks/{user}/{task}/pause`, the endpoint is currently experimental. * We do not currently set the build reason to `task_manual_pause`, because build reasons are currently only used on stop transitions.	2026-02-09 08:56:41 +02:00
Jon Ayers	6035e45cb8	feat: add e2e workspace build duration metric (#21739 ) Adds coderd_template_workspace_build_duration_seconds histogram that tracks the full duration from workspace build creation to agent ready. This captures the complete user-perceived build time including provisioning and agent startup. The metric is emitted when the agent reports ready/error/timeout via the lifecycle API, ensuring each build is counted exactly once per replica.	2026-02-06 16:26:02 -06:00
Spike Curtis	b84bb43a07	feat: add standard encodings to binary cache (#21921 ) fixes: https://github.com/coder/internal/issues/1300 Adds brotli and zstd compression to the binary cache. Also refactors coderd's streaming encoding middleware to use the same standard set of compression algorithms, so we have them in one place.	2026-02-06 11:28:08 +04:00
Spike Curtis	6b1adb8b12	chore: refactor site handler to take cache dir (#21918 ) relates to: https://github.com/coder/internal/issues/1300 Refactors the options to the site handler to take the cache directory, rather than expecting the caller to call `ExtractOrReadBinFS` and pass the results. This is important in this stack because we need direct access to the cache directory for compressed file caching.	2026-02-06 10:56:48 +04:00
Steven Masley	a4ffafd46d	test: remove provisioner heartbeat from 'AllProvisionersStale' (#21903 ) Provisioner async heartbeat will mark the 'stale' provisioner as ready closes https://github.com/coder/internal/issues/1288	2026-02-04 08:29:44 -06:00
Steven Masley	6759b51cd6	feat: add endpoint to fetch singular org member (#21732 )	2026-02-03 12:48:25 -06:00
Zach	2204731ddb	feat: implement boundary usage tracker and telemetry collection (#21716 ) Implements telemetry for boundary usage tracking across all Coder replicas and reports them via telemetry. Changes: - Implement Tracker with Track(), FlushToDB(), and StartFlushLoop() methods - Add telemetry integration via collectBoundaryUsageSummary() - Use telemetry lock to ensure only one replica collects per period The tracker accumulates unique workspaces, unique users, and request counts (allowed/denied) in memory, then flushes to the database periodically. During telemetry collection, stats are aggregated across all replicas and reset for the next period.	2026-01-27 19:11:40 -07:00
Mathias Fredriksson	25d7f27cdb	feat(coderd): add task log snapshot storage endpoint (#21644 ) This change adds a POST /workspaceagents/me/tasks/{task}/log-snapshot endpoint for agents to upload task conversation history during workspace shutdown. This allows users to view task logs even when the workspace is stopped. The endpoint accepts agentapi format payloads (typically last 10 messages, max 64KB), wraps them in a format envelope, and upserts to the task_snapshots table. Uses agent token auth and validates the task belongs to the agent's workspace. Closes coder/internal#1253	2026-01-27 11:09:24 +02:00
Callum Styan	e195856c43	perf: reduce pg_notify call volume by batching together agent metadata updates (#21330 ) --------- Signed-off-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-01-22 22:47:49 -08:00
Rowan Smith	b163b4c950	feat: support bundle updates to enable pprof and telemetry collection (#21486 ) - Adds pprof collection support now that we have the listeners automatically starting (requires Coder server 2.28.0+, includes a version check). Collects heap, allocs, profile (30s), block, mutex, goroutine, threadcreate, trace (30s), cmdline, symbol. Performs capture for 30 seconds and emits a log line stating as such. Enable capture by supplying the `--pprof` flag or `CODER_SUPPORT_BUNDLE_PPROF` env var. Collection of pprof data from both coderd and the Coder agent occurs. - Adds collection of Prometheus metrics, also requires 2.28.0+ - Adds the ability to include a template in the bundle independently of supplying the details of a running workspace by supplying the `--template` flag or `CODER_SUPPORT_BUNDLE_TEMPLATE` env var - Captures a list of workspaces the user has access to. Defaults to a max of 10, configurable via `--workspaces-total-cap` / `CODER_SUPPORT_BUNDLE_WORKSPACES_TOTAL_CAP` - Collects additional stats from the coderd deployment (aggregated workspace/session metrics), as well as entitlements via license and dismissed health checks. created with help from mux	2026-01-20 10:28:52 +11:00
Cian Johnston	08343a7a9f	perf: reduce number of queries made by /api/v2/workspaceagents/{id} (#21522 ) Relates to https://github.com/coder/internal/issues/1214 The `ExtractWorkspaceAgentParam` middleware ends up making 4 database queries to follow the chain of `WorkspaceAgent` -> `WorkspaceResource` -> `ProvisionerJob` -> `WorkspaceBuild` -- but then dropping all that hard work on the floor. The `api.workspaceAgent` handler that references this middleware then has to do all of that work again, plus one more query to get the related `User` so we can get the username. This pattern is also mirrored in `getDatabaseTerminal` but without the middleware. This PR: * Adds a new query `GetWorkspaceAgentAndWorkspaceByID` to fetch all this information at once to avoid the multiple round-trips, * Updates the existing usage of `GetWorkspaceAgentByID` to this new query instead, * Updates `ExtractWorkspaceAgentParam` to also store the workspace in the request context Dalibo: [0.63ms](https://explain.dalibo.com/plan/40bb597f3539gc6c)	2026-01-19 12:36:33 +00:00
Cian Johnston	3a62a8e70e	chore: improve healthcheck timeout message (#21520 ) Relates to https://github.com/coder/internal/issues/272 This flake has been persisting for a while, and unfortunately there's no detail on which healthcheck in particular is holding things up. This PR adds a concurrency-safe `healthcheck.Progress` and wires it through `healthcheck.Run`. If the healthcheck times out, it will provide information on which healthchecks are completed / running, and how long they took / are still taking. 🤖 Claude Opus 4.5 completed the first round of this implementation, which I then refactored.	2026-01-15 16:37:05 +00:00
Cian Johnston	32354261d3	chore(coderd/httpmw): extract HTTPRoute middleware (#21498 ) Extracts part of the prometheus middleware that stores the route information in the request context into its own middleware. Also adds request method information to context. Relates to https://github.com/coder/internal/issues/1214	2026-01-15 10:26:50 +00:00
George K	cc2efe9e1f	feat(coderd/rbac): make organization-member a per-org system custom role (#21359 ) Migrated the built-in organization-member role to DB storage so it can be customized per org. Closes https://github.com/coder/internal/issues/1073 (part 1)	2026-01-12 18:19:19 -08:00
Kacper Sawicki	6ca70d3618	feat(cli): add --no-build flag to state push for state-only updates (#21374 ) ## Summary Adds a `--no-build` flag to `coder state push` that updates the Terraform state directly without triggering a workspace build. ## Use Case This enables state-only migrations, such as migrating Kubernetes resources from deprecated types (e.g., `kubernetes_config_map`) to versioned types (e.g., `kubernetes_config_map_v1`): ```bash coder state pull my-workspace > state.json terraform init terraform state rm -state=state.json kubernetes_config_map.example terraform import -state=state.json kubernetes_config_map_v1.example default/example coder state push --no-build my-workspace state.json ``` ## Changes - Add `PUT /api/v2/workspacebuilds/{id}/state` endpoint to update state without triggering a build - Add `UpdateWorkspaceBuildState` SDK method - Add `--no-build`/`-n` flag to `coder state push` - Add confirmation prompt (can be skipped with `--yes`/`-y`) since this is a potentially dangerous operation - Add test for `--no-build` functionality Fixes #21336	2026-01-12 15:16:59 +01:00
Spike Curtis	bddb808b25	chore: arrange imports in a standard way (#21452 ) Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example: ``` import ( "context" "time" "github.com/prometheus/client_golang/prometheus" "golang.org/x/xerrors" "gopkg.in/natefinch/lumberjack.v2" "cdr.dev/slog/v3" "github.com/coder/coder/v2/codersdk/agentsdk" "github.com/coder/serpent" ) ``` 3 groups: standard library, 3rd party libs, Coder libs. This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.	2026-01-08 15:24:11 +04:00
Spike Curtis	49b34a716a	fix: fix slog to always use array of Fields (#21426 ) Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptable call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder). It also updates dependencies that also use slog and were updated. I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule. Other dependencies, I pushed new tags.	2026-01-08 10:29:41 +04:00
Asher	4a97df3768	chore: rename flag to disable template insights (#21329 ) Because this affects more than just the template insights page (specifically it also affects the deployment stats endpoint which is shown on bottom bar and Prometheus), the group is being renamed generically to just "stats collection". In the future if we need to affect the other stats we can put those options here. Then, because this change only affects a portion of stats, specifically usage stats like connection and application time, bytes sent, etc, add a new sub-group called "usage stats". Then finally add back the "enable" flag. This also gives us a place to one day place an "anonymize" flag if we need to go that route.	2026-01-05 11:44:06 -09:00
Danielle Maywood	05529139bc	feat(coderd): support deleting dev containers (#21248 ) Add an endpoint to coderd to support deleting dev containers	2025-12-24 12:34:39 +00:00
Steven Masley	8fefd91e4a	feat!: support PKCE in the oauth2 client's auth/exchange flow (#21215 ) Breaking Change: Existing oauth apps might now use PKCE. If an unknown IdP type was being used, and it does not support PKCE, it will break. To fix, set the PKCE methods on the external auth to `none` ``` export CODER_EXTERNAL_AUTH_1_PKCE_METHODS=none ```	2025-12-15 17:41:47 +00:00
Asher	27f0413347	feat: add flag to disable template insights (#20940 ) Closes #20399 To summarize the original commit messages: - Do not log stats to the database. - Return errors on the insight endpoints. - Update the frontend to show those errors. - Also fixes an issue with getting the user status count via codersdk, since I added a test to ensure it was not disabled by this flag and it was sending the wrong payload.	2025-12-14 03:00:03 +00:00
George K	4379230a27	feat: add deployment-wide option to disable workspace sharing (#21172 ) Adds `--disable-workspace-sharing` option. Workspace sharing is disabled by not including user and group ACLs in the workspace RBAC object, which prevents ACL-based authz. Closes https://github.com/coder/internal/issues/1072 The commit also adds saving of workspace user/group ACLs in the test DB data generator.	2025-12-09 08:13:09 -08:00

588 Commits