The pause/resume endpoints were only registered under /api/experimental
but the frontend and Go SDK were calling /api/v2, resulting in 404s.
Register the routes in the v2 group, update the SDK client paths, and
fix swagger annotations (Accept → Produce) since these POST endpoints
have no request body.
Replace manual experiment checks in web-push handlers with the
`RequireExperimentWithDevBypass` middleware on the route group, matching
the pattern used by OAuth2, Agents, and MCP experiments.
## Changes
- **`coderd/coderd.go`**: Add `RequireExperimentWithDevBypass`
middleware to `/webpush` route group
- **`coderd/webpush.go`**: Remove inline
`api.Experiments.Enabled(codersdk.ExperimentWebPush)` checks from all
three handlers
- **`cli/server.go`**: Gate webpush dispatcher initialization with
`buildinfo.IsDev()` fallback so dev builds always init the real
dispatcher
- **`coderd/webpush_test.go`**: Remove experiment enablement from tests
(dev bypass handles it)
Net effect: -26 lines removed, +5 added.
Created using whatchamacallits (Opus 4.6 Max)
Currently the sharing UI is only hidden under certain circumstances,
rather than on a permission basis. This makes it permissions based, and
makes some backend changes to make sure permissions are correct.
## Summary
Wire VAPID web push notifications into the Agents (chat) system so users
get desktop notifications when an agent finishes running.
### Backend
- Add `webpush.Dispatcher` to `chatd.Server` and pass it through from
`coderd.Options.WebPushDispatcher`
- In `processChat()`'s deferred cleanup, dispatch a web push
notification when the chat reaches a terminal state:
- **`waiting`** (success): "Agent has finished running."
- **`error`** (failure): the error message, or "Agent encountered an
error."
- Sub-agent chats (`ParentChatID.Valid`) are skipped to avoid
notification spam from internal delegation
- Gracefully no-ops when the dispatcher is nil (web push disabled)
### Frontend
- New `WebPushButton` component — a bell icon that uses the existing
`useWebpushNotifications` hook
- Returns `null` when the `web-push` experiment is off
- Three states: loading spinner, green bell (subscribed), muted bell-off
(unsubscribed)
- Tooltip + toast feedback on toggle
- Added to both the Agents page empty state top bar and the AgentDetail
top bar
- The Agents page has its own layout (no standard Navbar), so it needs
its own subscribe button
### End-to-end flow
1. User clicks the bell icon on `/agents` → browser subscribes via VAPID
2. User starts an agent chat → chat enters `running` status
3. Agent finishes → `processChat` defer sets status to `waiting`/`error`
→ dispatches web push
4. Browser service worker shows a desktop notification with the chat
title and status
---------
Co-authored-by: Coder <coder@users.noreply.github.com>
## Summary
The UI has always labeled the action as "Archive agent" but the backend
was performing a hard `DELETE`, permanently destroying chats and all
their messages.
This change replaces the hard delete with a soft archive, consistent
with the pattern used by template versions.
## Changes
### Database
- **Migration 000423**: Add `archived boolean DEFAULT false NOT NULL`
column to `chats` table
- Replace `DeleteChatByID` query with `ArchiveChatByID` (`UPDATE SET
archived = true`)
- Add `UnarchiveChatByID` query (`UPDATE SET archived = false`)
- Filter archived chats from `GetChatsByOwnerID` (`WHERE archived =
false`)
### API
- Remove `DELETE /api/experimental/chats/{chat}`
- Add `POST /api/experimental/chats/{chat}/archive` — archives a chat
and all its descendants
- Add `POST /api/experimental/chats/{chat}/unarchive` — unarchives a
single chat (API only, no UI yet)
### Backend
- `archiveChatTree()` recursively archives child chats (replaces
`deleteChatTree()` which hard-deleted)
- Chat daemon's `ArchiveChat()` archives the full chat tree in a
transaction
- Authorization uses `ActionUpdate` instead of `ActionDelete`
### SDK
- Replace `DeleteChat()` with `ArchiveChat()` and `UnarchiveChat()`
- Add `Archived` field to `Chat` struct
### Frontend
- `archiveChat` API call uses `POST .../archive` instead of `DELETE`
- No UI changes — the "Archive agent" button now actually archives
instead of deleting
## Design Decision
This follows the **template version archive pattern** (Pattern B in the
codebase):
- `archived boolean` column (not `deleted boolean`)
- Dedicated `POST .../archive` and `POST .../unarchive` routes (not
repurposing `DELETE`)
- Reversible — users can unarchive via the API (UI for this will come
later)
## Summary
Custom roles that can create workspaces on behalf of other users need to
be able to list users to populate the owner dropdown in the workspace
creation UI. Previously, this required a separate `user:read`
permission, causing the dropdown to fail for custom roles.
## Changes
- Modified `GetUsers` in `dbauthz` to check if the user can create
workspaces for any owner (`workspace:create` with `owner_id: *`)
- If the user has this permission, they can list all users without
needing explicit `user:read` permission
- Added tests to verify the new behavior
## Testing
- Updated mock tests to assert the new authorization check
- Added integration tests for both positive and negative cases
Fixes#18203
## Summary
> NOTE: Calling this out as a breaking change in case existing consumers
of the CLI depend on being able to see expired tokens OR being able to
delete tokens immediately.
Updates the `coder tokens rm` command to immediately expire a token by
ID, preserving the token record for audit trail purposes. Tokens can
still be deleted by passing `--delete`.
## Problem
During an incident on dev.coder.com, operators needed to urgently expire
an API key that was stuck in a hot loop. The only way to do this was via
direct database access:
```sql
UPDATE api_keys SET expires_at = NOW() WHERE id = '...';
```
This is not ideal for operators who may not have direct DB access or
want to avoid manual SQL.
## Solution
This PR adds:
- **API endpoint**: `PUT /api/v2/users/{user}/keys/{keyid}/expire` -
Sets the token's `expires_at` to now
- **SDK method**: `ExpireAPIKey(ctx, userID, keyID)`
- **Updates CLI**: `coder tokens rm <name|id|token>` now _expires_ by
default. You can still delete by passing the `--delete` flag. The `coder
tokens list` command now also hides expired tokens by default. You can
`--include-expired` if needed to include them.
- **Audit logging**: The expire action is logged with old and new key
states
## Test plan
- Tests cover: owner expiring own token, admin expiring other user's
token, non-admin cannot expire other's token, 404 for non-existent token
Closes#21782🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
This PR adds some metrics to help identify job enqueue rates and
latencies. This work was initiated as a way to help reduce the cost of
the observation/measurement itself for autostart scaletests, which
impacts our ability to identify/reason about the load caused by
autostart. See: https://github.com/coder/internal/issues/1209
I've extended the metrics here to account for regular user initiated
builds, prebuilds, autostarts, etc. IMO there is still the question here
of whether we want to include or need the `transition` label, which is
only present on workspace builds. Including it does lead to an increase
in cardinality, and in the case of the histogram (when not using native
histograms) that's at least a few extra series for every bucket. We
could remove the transition label there but keep it on the counter.
Additionally, the histogram is currently observing latencies for other
jobs, such as template builds/version imports, those do not have a
transition type associated with them.
Tested briefly in a workspace, can see metric values like the following:
-
`coderd_workspace_builds_enqueued_total{build_reason="autostart",provisioner_type="terraform",status="success",transition="start"}
1`
-
`coderd_provisioner_job_queue_wait_seconds_bucket{build_reason="autostart",job_type="workspace_build",provisioner_type="terraform",transition="start",le="0.025"}
1`
---------
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Closes https://github.com/coder/internal/issues/1261.
This pull request adds an endpoint to pause coder tasks by stopping the
underlying workspace.
* Instead of `POST /api/v2/tasks/{user}/{task}/pause`, the endpoint is
currently experimental.
* We do not currently set the build reason to `task_manual_pause`,
because build reasons are currently only used on stop transitions.
Adds coderd_template_workspace_build_duration_seconds histogram that
tracks the full duration from workspace build creation to agent ready.
This captures the complete user-perceived build time including
provisioning and agent startup.
The metric is emitted when the agent reports ready/error/timeout via the
lifecycle API, ensuring each build is counted exactly once per replica.
fixes: https://github.com/coder/internal/issues/1300
Adds brotli and zstd compression to the binary cache. Also refactors coderd's streaming encoding middleware to use the same standard set of compression algorithms, so we have them in one place.
relates to: https://github.com/coder/internal/issues/1300
Refactors the options to the site handler to take the cache directory, rather than expecting the caller to call `ExtractOrReadBinFS` and pass the results.
This is important in this stack because we need direct access to the cache directory for compressed file caching.
Implements telemetry for boundary usage tracking across all Coder
replicas and reports them via telemetry.
Changes:
- Implement Tracker with Track(), FlushToDB(), and StartFlushLoop() methods
- Add telemetry integration via collectBoundaryUsageSummary()
- Use telemetry lock to ensure only one replica collects per period
The tracker accumulates unique workspaces, unique users, and request
counts (allowed/denied) in memory, then flushes to the database
periodically. During telemetry collection, stats are aggregated across
all replicas and reset for the next period.
This change adds a POST /workspaceagents/me/tasks/{task}/log-snapshot
endpoint for agents to upload task conversation history during
workspace shutdown. This allows users to view task logs even when the
workspace is stopped.
The endpoint accepts agentapi format payloads (typically last 10
messages, max 64KB), wraps them in a format envelope, and upserts to the
task_snapshots table. Uses agent token auth and validates the task
belongs to the agent's workspace.
Closescoder/internal#1253
- Adds pprof collection support now that we have the listeners
automatically starting (requires Coder server 2.28.0+, includes a
version check). Collects heap, allocs, profile (30s), block, mutex,
goroutine, threadcreate, trace (30s), cmdline, symbol. Performs capture
for 30 seconds and emits a log line stating as such. Enable capture by
supplying the `--pprof` flag or `CODER_SUPPORT_BUNDLE_PPROF` env var.
Collection of pprof data from both coderd and the Coder agent occurs.
- Adds collection of Prometheus metrics, also requires 2.28.0+
- Adds the ability to include a template in the bundle independently of
supplying the details of a running workspace by supplying the
`--template` flag or `CODER_SUPPORT_BUNDLE_TEMPLATE` env var
- Captures a list of workspaces the user has access to. Defaults to a
max of 10, configurable via `--workspaces-total-cap` /
`CODER_SUPPORT_BUNDLE_WORKSPACES_TOTAL_CAP`
- Collects additional stats from the coderd deployment (aggregated
workspace/session metrics), as well as entitlements via license and
dismissed health checks.
created with help from mux
Relates to https://github.com/coder/internal/issues/1214
The `ExtractWorkspaceAgentParam` middleware ends up making 4 database
queries to follow the chain of `WorkspaceAgent` -> `WorkspaceResource`
-> `ProvisionerJob` -> `WorkspaceBuild` -- but then dropping all that
hard work on the floor. The `api.workspaceAgent` handler that references
this middleware then has to do all of that work again, plus one more
query to get the related `User` so we can get the username. This pattern
is also mirrored in `getDatabaseTerminal` but without the middleware.
This PR:
* Adds a new query `GetWorkspaceAgentAndWorkspaceByID` to fetch all
this information at once to avoid the multiple round-trips,
* Updates the existing usage of `GetWorkspaceAgentByID` to this new
query instead,
* Updates `ExtractWorkspaceAgentParam` to also store the workspace in
the request context
Dalibo: [0.63ms](https://explain.dalibo.com/plan/40bb597f3539gc6c)
Relates to https://github.com/coder/internal/issues/272
This flake has been persisting for a while, and unfortunately there's no
detail on which healthcheck in particular is holding things up.
This PR adds a concurrency-safe `healthcheck.Progress` and wires it
through `healthcheck.Run`. If the healthcheck times out, it will provide
information on which healthchecks are completed / running, and how long
they took / are still taking.
🤖 Claude Opus 4.5 completed the first round of this implementation,
which I then refactored.
Extracts part of the prometheus middleware that stores the route
information in the request context into its own middleware. Also adds
request method information to context.
Relates to https://github.com/coder/internal/issues/1214
## Summary
Adds a `--no-build` flag to `coder state push` that updates the
Terraform state directly without triggering a workspace build.
## Use Case
This enables state-only migrations, such as migrating Kubernetes
resources from deprecated types (e.g., `kubernetes_config_map`) to
versioned types (e.g., `kubernetes_config_map_v1`):
```bash
coder state pull my-workspace > state.json
terraform init
terraform state rm -state=state.json kubernetes_config_map.example
terraform import -state=state.json kubernetes_config_map_v1.example default/example
coder state push --no-build my-workspace state.json
```
## Changes
- Add `PUT /api/v2/workspacebuilds/{id}/state` endpoint to update state
without triggering a build
- Add `UpdateWorkspaceBuildState` SDK method
- Add `--no-build`/`-n` flag to `coder state push`
- Add confirmation prompt (can be skipped with `--yes`/`-y`) since this
is a potentially dangerous operation
- Add test for `--no-build` functionality
Fixes#21336
Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example:
```
import (
"context"
"time"
"github.com/prometheus/client_golang/prometheus"
"golang.org/x/xerrors"
"gopkg.in/natefinch/lumberjack.v2"
"cdr.dev/slog/v3"
"github.com/coder/coder/v2/codersdk/agentsdk"
"github.com/coder/serpent"
)
```
3 groups: standard library, 3rd partly libs, Coder libs.
This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.
Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptible call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder).
It also updates dependencies that also use slog and were updated.
I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule.
Other dependencies, I pushed new tags.
Because this affects more than just the template insights
page (specifically it also affects the deployment stats endpoint which
is shown on bottom bar and Prometheus), the group is being renamed
generically to just "stats collection". In the future if we need to
affect the other stats we can put those options here.
Then, because this change only affects a portion of stats, specifically
usage stats like connection and application time, bytes sent, etc, add a
new sub-group called "usage stats".
Then finally add back the "enable" flag. This also gives us a place to
one day place an "anonymize" flag if we need to go that route.
**Breaking Change:** Existing oauth apps might now use PKCE. If an
unknown IdP type was being used, and it does not support PKCE, it will
break.
To fix, set the PKCE methods on the external auth to `none`
```
export CODER_EXTERNAL_AUTH_1_PKCE_METHODS=none
```
Closes#20399
To summarize the original commit messages:
- Do not log stats to the database.
- Return errors on the insight endpoints.
- Update the frontend to show those errors.
- Also fixes an issue with getting the user status count via codersdk,
since I added a test to ensure it was not disabled by this flag and it
was sending the wrong payload.
Adds `--disable-workspace-sharing` option.
Workspace sharing is disabled by not including user and group ACLs in
the workspace RBAC object, which prevents ACL-based authz.
Closes https://github.com/coder/internal/issues/1072
The commit also adds saving of workspace user/group ACLs in the test DB
data generator.
## Problem
Users may not realize that task notifications are disabled by default.
To improve awareness, we show a warning alert on the Tasks page when all
task notifications are disabled.
**Alert visibility logic:**
- Shows when **all** task notification templates (Task Working, Task
Idle, Task Completed, Task Failed) are disabled
- Can be dismissed by the user, which stores the dismissal in the user
preferences API
- If the user later enables any task notification in Account Settings,
the dismissal state is cleared so the alert will show again if they
disable all notifications in the future
<img width="2980" height="1588" alt="Screenshot 2025-11-25 at 17 48 17"
src="https://github.com/user-attachments/assets/316bf097-d9d2-4489-bc16-2987ba45f45c"
/>
## Changes
- Added a warning alert to the Tasks page when all task notifications
are disabled
- Introduced new `/users/{user}/preferences` endpoint to manage user
preferences (stored in `user_configs` table)
- Alert is dismissible and stores the dismissal state via the new user
preferences API endpoint
- Enabling any task notification in Account Settings clears the
dismissal state via the preferences API
- Added comprehensive Storybook stories for both TasksPage and
NotificationsPage to test all alert visibility states and interactions
Closes: https://github.com/coder/internal/issues/1089
This is to debug context timeouts on API requests to the agent.
Because rbac and database cannot be imported in slim, split the logger
middleware into slim and non-slim versions and break out the recovery
middleware.
This PR adds the backend implementation for modifying task prompts. Part
of https://github.com/coder/internal/issues/1084
## Changes
- New `UpdateTaskPrompt` database query to update task prompts
- New PATCH `/api/v2/tasks/{task}/prompt` endpoint
## Notes
This is part 1 of a 2-part PR stack. The frontend UI will be added in a
follow-up PR based on this branch
(https://github.com/coder/coder/pull/20812).
---
🤖 PR was written by Claude Sonnet 4.5 Thinking using [Coder
Mux](https://github.com/coder/cmux) and reviewed by a human 👩
Fixes: #20744
Upsert audit and connection log entries with a context derived from the API context, rather than the individual request so that we don't error out if the request is canceled or the client hangs up (e.g. if we return an error).
Experiments passed to provisioners to determine behavior. This adds
`--experiments` flag to provisioner daemons. Prior to this, provisioners
had no method to turn on/off experiments.
The authz recorder is causing a lot of memory to be allocated, and is a
memory leak for websocket connections.
This change makes it opt-in on a per request basis (ontop of `isDev`).
To get the authz headers, use `Copy as cURL` on chrome and append the
header `x-authz-checks=true`.
Adds the following debug routes for people with the `debug_info:read`
permission:
- `/api/v2/debug/pprof` for `net/http/pprof`
- `/`
- `/cmdline`
- `/profile`
- `/symbol`
- `/trace`
- `/*`
- `/api/v2/debug/metrics` for Prometheus metrics
# Add support for low-level API key scopes
This PR adds support for fine-grained API key scopes based on RBAC resource:action pairs. It includes:
1. A new endpoint `/api/v2/auth/scopes` to list all public low-level API key scopes
2. Generated constants in the SDK for all public scopes
3. Tests to verify scope validation during token creation
4. Updated API documentation to reflect the expanded scope options
The implementation allows users to create API keys with specific permissions like `workspace:read` or `template:use` instead of only the legacy `all` or `application_connect` scopes.
Fixes#19847