This PR adds a `WatchAllWorkspaces` function with `watch-all-workspaces`
endpoint, which can be used to listen on a single global pubsub channel
for _all_ workspace build updates, and makes use of it in the autostart
scaletest.
This negates the need to use a workspace watch pubsub channel _per_
workspace, which has auth overhead associated with each call. This is
especially relevant in situations such as the autostart scaletest, where
we need to start/stop a set of workspaces before we can configure their
autostart config. The overhead associated with all the watch requests
skews the scaletest results and makes it harder to reason about the
performance of the autostart feature itself.
The autostart scaletest also no longer generates its own metrics nor
does it wait for all the workspaces to actually start via autostart. We
should update the scaletest dashboard after both PRs are merged to
measure autostart performance via the new metrics.
The new function/endpoint and its usage in the autostart scaletest are
gated behind an experiment feature flag, this is something we should
discuss whether we want to enable the endpoint in prod by default or
not. If so, we can remove the experiment.
---------
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Callum Styan <callum@coder.com>
The provisioner state for a workspace build was being loaded for every
long-lived agent rpc connection. Since this state can be anywhere from
kilobytes to megabytes this can gradually cause the `coderd` memory
footprint to grow over time. It's also a lot of unnecessary allocations
for every query that fetches a workspace build since only a few callers
ever actually reference the provisioner state.
This PR removes it from the returned workspace build and adds a query to
fetch the provisioner state explicitly.
This PR adds some metrics to help identify job enqueue rates and
latencies. This work was initiated as a way to help reduce the cost of
the observation/measurement itself for autostart scaletests, which
impacts our ability to identify/reason about the load caused by
autostart. See: https://github.com/coder/internal/issues/1209
I've extended the metrics here to account for regular user initiated
builds, prebuilds, autostarts, etc. IMO there is still the question here
of whether we want to include or need the `transition` label, which is
only present on workspace builds. Including it does lead to an increase
in cardinality, and in the case of the histogram (when not using native
histograms) that's at least a few extra series for every bucket. We
could remove the transition label there but keep it on the counter.
Additionally, the histogram is currently observing latencies for other
jobs, such as template builds/version imports, those do not have a
transition type associated with them.
Tested briefly in a workspace, can see metric values like the following:
-
`coderd_workspace_builds_enqueued_total{build_reason="autostart",provisioner_type="terraform",status="success",transition="start"}
1`
-
`coderd_provisioner_job_queue_wait_seconds_bucket{build_reason="autostart",job_type="workspace_build",provisioner_type="terraform",transition="start",le="0.025"}
1`
---------
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Closes https://github.com/coder/internal/issues/1261.
This pull request adds an endpoint to pause coder tasks by stopping the
underlying workspace.
* Instead of `POST /api/v2/tasks/{user}/{task}/pause`, the endpoint is
currently experimental.
* We do not currently set the build reason to `task_manual_pause`,
because build reasons are currently only used on stop transitions.
* Adds support for parameter `format=text` in the following API routes:
* `/api/v2/workspaceagents/:id/logs`
* `/api/v2/workspacebuilds/:id/logs`
* `/api/v2/templateversions/:id/logs`
* `/api/v2/templateversions/:id/dry-run/:id/logs`
* Adds links to view raw logs on the following pages:
* Workspace build page
* Template editor page
* Template version page
* Refactors existing log formatting in `cli/logs.go` to live in `codersdk`.
🤖 Generated with Claude Opus 4.5, reviewed by me.
---------
Co-authored-by: Claude <noreply@anthropic.com>
Relates to https://github.com/coder/internal/issues/1214
The `ExtractWorkspaceAgentParam` middleware ends up making 4 database
queries to follow the chain of `WorkspaceAgent` -> `WorkspaceResource`
-> `ProvisionerJob` -> `WorkspaceBuild` -- but then dropping all that
hard work on the floor. The `api.workspaceAgent` handler that references
this middleware then has to do all of that work again, plus one more
query to get the related `User` so we can get the username. This pattern
is also mirrored in `getDatabaseTerminal` but without the middleware.
This PR:
* Adds a new query `GetWorkspaceAgentAndWorkspaceByID` to fetch all
this information at once to avoid the multiple round-trips,
* Updates the existing usage of `GetWorkspaceAgentByID` to this new
query instead,
* Updates `ExtractWorkspaceAgentParam` to also store the workspace in
the request context
Dalibo: [0.63ms](https://explain.dalibo.com/plan/40bb597f3539gc6c)
## Summary
Adds a `--no-build` flag to `coder state push` that updates the
Terraform state directly without triggering a workspace build.
## Use Case
This enables state-only migrations, such as migrating Kubernetes
resources from deprecated types (e.g., `kubernetes_config_map`) to
versioned types (e.g., `kubernetes_config_map_v1`):
```bash
coder state pull my-workspace > state.json
terraform init
terraform state rm -state=state.json kubernetes_config_map.example
terraform import -state=state.json kubernetes_config_map_v1.example default/example
coder state push --no-build my-workspace state.json
```
## Changes
- Add `PUT /api/v2/workspacebuilds/{id}/state` endpoint to update state
without triggering a build
- Add `UpdateWorkspaceBuildState` SDK method
- Add `--no-build`/`-n` flag to `coder state push`
- Add confirmation prompt (can be skipped with `--yes`/`-y`) since this
is a potentially dangerous operation
- Add test for `--no-build` functionality
Fixes#21336
Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example:
```
import (
"context"
"time"
"github.com/prometheus/client_golang/prometheus"
"golang.org/x/xerrors"
"gopkg.in/natefinch/lumberjack.v2"
"cdr.dev/slog/v3"
"github.com/coder/coder/v2/codersdk/agentsdk"
"github.com/coder/serpent"
)
```
3 groups: standard library, 3rd partly libs, Coder libs.
This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.
Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptible call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder).
It also updates dependencies that also use slog and were updated.
I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule.
Other dependencies, I pushed new tags.
Relates to #20925
This PR modifies the `postWorkspaceBuild` handler to automatically unset
dormancy on a workspace when a start transition is requested.
Previously, the client was responsible for unsetting the dormancy on the
workspace prior to posting a workspace build.
As we're moving away from the SidebarAppID nomenclature, this PR
introduces a new `TaskAppID` field to `codersdk.WorkspaceBuild` and
deprecates the `AITaskSidebarAppID` field. They both contain the same
value.
Closes https://github.com/coder/internal/issues/885
Adds a new database method GetProvisionerJobByIDWithLock that uses FOR
UPDATE without SKIP LOCKED to fix workspace build cancellation returning
500 errors when jobs are locked.
This pull request introduces support for external workspace management, allowing users to register and manage workspaces that are provisioned and managed outside of the Coder.
Depends on: https://github.com/coder/terraform-provider-coder/pull/424
* GET /api/v2/init-script - Gets the agent initialization script
* By default, it returns a script for Linux (amd64), but with query parameters (os and arch) you can get the init script for different platforms
* GET /api/v2/workspaces/{workspace}/external-agent/{agent}/credentials - Gets credentials for an external agent **(enterprise)**
* Updated queries to filter workspaces/templates by the has_external_agent field
This PR introduces new build reason values to identify what type of
connection triggered a workspace build, helping to troubleshoot
workspace-related issues.
## Database Migration
Added migration 000349_extend_workspace_build_reason.up.sql that extends
the build_reason enum with new values:
```
dashboard, cli, ssh_connection, vscode_connection, jetbrains_connection
```
## Implementation
The build reason is specified through the API when creating new
workspace builds:
- Dashboard: Automatically sets reason to `dashboard` when users start
workspaces via the web interface
- CLI `start` command: Sets reason to `cli` when workspaces are started
via the command line
- CLI `ssh` command: Sets reason to ssh_connection when workspaces are
started due to SSH connections
- VS Code connections: Will be set to `vscode_connection` by the VS Code
extension through CLI hidden flag
(https://github.com/coder/vscode-coder/pull/550)
- JetBrains connections: Will be set to `jetbrains_connection` by the
Jetbrains Toolbox
(https://github.com/coder/coder-jetbrains-toolbox/pull/150) and
Jetbrains Gateway extension
(https://github.com/coder/jetbrains-coder/pull/561)
## UI Changes:
* Tooltip with reason in Build history
<img width="309" height="457" alt="image"
src="https://github.com/user-attachments/assets/bde8440b-bf3b-49a1-a244-ed7e8eb9763c"
/>
* Reason in Audit Logs Row tooltip
<img width="906" height="237" alt="image"
src="https://github.com/user-attachments/assets/ebbb62c7-cf07-4398-afbf-323c83fb6426"
/>
<img width="909" height="188" alt="image"
src="https://github.com/user-attachments/assets/1ddbab07-44bf-4dee-8867-b4e2cd56ae96"
/>
- Adds a query for counting managed agent workspace builds between two
timestamps
- The "Actual" field in the feature entitlement for managed agents is
now populated with the value read from the database
- The wsbuilder package now validates AI agent usage against the limit
when a license is installed
Closescoder/internal#777
This is the second PR for moving connection events out of the audit log.
This PR:
- Adds the `/api/v2/connectionlog` endpoint
- Adds filtering for `GetAuthorizedConnectionLogsOffset` and thus the endpoint.
There's quite a few, but I was aiming for feature parity with the audit log.
1. `organization:<id|name>`
2. `workspace_owner:<username>`
3. `workspace_owner_email:<email>`
4. `type:<ssh|vscode|jetbrains|reconnecting_pty|workspace_app|port_forwarding>`
5. `username:<username>`
- Only includes web-based connection events (workspace apps, web port forwarding) as only those include user metadata.
6. `user_email:<email>`
7. `connected_after:<time>`
8. `connected_before:<time>`
9. `workspace_id:<id>`
10. `connection_id:<id>`
- If you have one snapshot of the connection log, and some sessions are ongoing in that snapshot, you could use this filter to check if they've been closed since.
11. `status:<connected|disconnected>`
- If `connected` only sessions with a null `close_time` are returned, if `disconnected`, only those with a non-null `close_time`. If filter is omitted, both are returned.
Future PRs:
- Populate `count` on `ConnectionLogResponse` using a seperate query (to preemptively mitigate the issue described in #17689)
- Implement a table in the Web UI for viewing connection logs.
- Write a query to delete old events from the audit log, call it from dbpurge.
- Write documentation for the endpoint / feature (including these filters)
Closes#17791
This PR adds ability to cancel workspace builds that are in "pending"
status.
Breaking changes:
- CancelWorkspaceBuild method in codersdk now accepts an optional
request parameter
API:
- Added `expect_status` query parameter to the cancel workspace build
endpoint
- This parameter ensures the job hasn't changed state before canceling
- API returns `412 Precondition Failed` if the job is not in the
expected status
- Valid values: `running` or `pending`
- Wrapped the entire cancel method in a database transaction
UI:
- Added confirmation dialog to the `Cancel` button, since it's a
destructive operation


- Enabled cancel action for pending workspaces (`expect_status=pending`
is sent if workspace is in pending status)

---------
Co-authored-by: Dean Sheather <dean@deansheather.com>
## Description
Follow-up from PR https://github.com/coder/coder/pull/18333
Related with:
https://github.com/coder/coder/pull/18333#discussion_r2159300881
This changes the authorization logic to first try the normal workspace
authorization check, and only if the resource is a prebuilt workspace,
fall back to the prebuilt workspace authorization check. Since prebuilt
workspaces are a subset of workspaces, the normal workspace check is
more likely to succeed. This is a small optimization to reduce
unnecessary prebuilt authorization calls.
`BuildError` response from `wsbuilder` does not support rich errors from validation. Changed this to use the `Validations` block of codersdk responses to return all errors for invalid parameters.
When in experimental this was used as an escape hatch. Removed to be
consistent with the template author's intentions
Backwards compatible, removing an experimental api field that is no longer used.
# What does this do?
This does parameter validation for dynamic parameters in `wsbuilder`. All input parameters are validated in `coder/coder` before being sent to terraform.
The heart of this PR is [`ResolveParameters`](https://github.com/coder/coder/blob/b65001e89c0577199a8e470c138c51e91cf2350c/coderd/dynamicparameters/resolver.go#L30-L30).
# What else changes?
`wsbuilder` now needs to load the terraform files into memory to succeed. This does add a larger memory requirement to workspace builds.
# Future work
- Sort autostart handling workspaces by template version id. So workspaces with the same template version only load the terraform files once from the db, and store them in the cache.
Alternate fix for https://github.com/coder/coder/issues/18080
Modifies wsbuilder to complete the provisioner job and mark the
workspace as deleted if it is clear that no provisioner will be able to
pick up the delete build.
This has a significant advantage of not deviating too much from the
current semantics of `POST /api/v2/workspacebuilds`.
https://github.com/coder/coder/pull/18460 ends up returning a 204 on
orphan delete due to no build being created.
Downside is that we have to duplicate some responsibilities of
provisionerdserver in wsbuilder.
There is a slight gotcha to this approach though: if you stop a
provisioner and then immediately try to orphan-delete, the job will
still be created because of the provisioner heartbeat interval. However
you can cancel it and try again.
## Description
This PR adds support for deleting prebuilt workspaces via the
authorization layer. It introduces special-case handling to ensure that
`prebuilt_workspace` permissions are evaluated when attempting to delete
a prebuilt workspace, falling back to the standard `workspace` resource
as needed.
Prebuilt workspaces are a subset of workspaces, identified by having
`owner_id` set to `PREBUILD_SYSTEM_USER`.
This means:
* A user with `prebuilt_workspace.delete` permission is allowed to
**delete only prebuilt workspaces**.
* A user with `workspace.delete` permission can **delete both normal and
prebuilt workspaces**.
⚠️ This implementation is scoped to **deletion operations only**. No
other operations are currently supported for the `prebuilt_workspace`
resource.
To delete a workspace, users must have the following permissions:
* `workspace.read`: to read the current workspace state
* `update`: to modify workspace metadata and related resources during
deletion (e.g., updating the `deleted` field in the database)
* `delete`: to perform the actual deletion of the workspace
## Changes
* Introduced `authorizeWorkspace()` helper to handle prebuilt workspace
authorization logic.
* Ensured both `prebuilt_workspace` and `workspace` permissions are
checked.
* Added comments to clarify the current behavior and limitations.
* Moved `SystemUserID` constant from the `prebuilds` package to the
`database` package `PrebuildsSystemUserID` to resolve an import cycle
(commit
https://github.com/coder/coder/pull/18333/commits/f24e4ab4b6f0a56726fd04be2d7302c9fdb52d53).
* Update middleware `ExtractOrganizationMember` to include system user
members.
Relates to https://github.com/coder/coder/issues/15432
Ensures that no workspace build timings with zero values for started_at or ended_at are inserted into the DB or returned from the API.
Dynamic params skip parameter validation in coder/coder.
This is because conditional parameters cannot be validated
with the static parameters in the database.
Closes https://github.com/coder/coder/issues/17691
`ExtractOrganizationMembersParam` will allow fetching a user with only
organization permissions. If the user belongs to 0 orgs, then the user "does not exist"
from an org perspective. But if you are a site-wide admin, then the user does exist.
This does ~95% of the backend work required to integrate the AI work.
Most left to integrate from the tasks branch is just frontend, which
will be a lot smaller I believe.
The real difference between this branch and that one is the abstraction
-- this now attaches statuses to apps, and returns the latest status
reported as part of a workspace.
This change enables us to have a similar UX to in the tasks branch, but
for agents other than Claude Code as well. Any app can report status
now.
- Update go.mod to use Go 1.24.1
- Update GitHub Actions setup-go action to use Go 1.24.1
- Fix linting issues with golangci-lint by:
- Updating to golangci-lint v1.57.1 (more compatible with Go 1.24.1)
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
Related to #17082
Some notifications ( workspace created and workspace manually updated )
are using wrong variables to build the Action URL. Fixing it.
The experimental functions in `golang.org/x/exp/slices` are now
available in the standard library since Go 1.21.
Reference: https://go.dev/doc/go1.21#slices
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
Closes https://github.com/coder/internal/issues/323
This PR adds an `email` field to the `data.owner` payload for workspace
created and workspace manually updated notifications, as well as user
account created/activated/suspended.
Relates to https://github.com/coder/coder/issues/15845
Rather than sending the notification to the user, we send it to the
template admins. We also do not send it to the person that created the
request.
Relates to https://github.com/coder/coder/issues/15845
When the `/workspace/<name>/builds` endpoint is hit, we check if the
requested template version is different to the previously used template
version. If these values differ, we can assume that the workspace has
been manually updated and send the appropriate notification. Automatic
updates happen in the lifecycle executor and bypasses this endpoint
entirely.
- Refactors `checkProvisioners` into `db2sdk.MatchedProvisioners`
- Adds a separate RBAC subject just for reading provisioner daemons
- Adds matched provisioners information to additional endpoints relating to
workspace builds and templates
-Updates existing unit tests for above endpoints
-Adds API endpoint for matched provisioners of template dry-run job
-Updates CLI to show warning when creating/starting/stopping/deleting
workspaces for which no provisoners are available
---------
Co-authored-by: Danny Kopping <danny@coder.com>
Anyone with authz access to a workspace should be able to read timings
information of its builds.
To do this without `AsSystemContext` would do an extra 4 db calls.
Closes#14716Closes#14717
Adds a new user-scoped tailnet API endpoint (`api/v2/tailnet`) with a new RPC stream for receiving updates on workspaces owned by a specific user, as defined in #14716.
When a stream is started, the `WorkspaceUpdatesProvider` will begin listening on the user-scoped pubsub events implemented in #14964. When a relevant event type is seen (such as a workspace state transition), the provider will query the DB for all the workspaces (and agents) owned by the user. This gets compared against the result of the previous query to produce a set of workspace updates.
Workspace updates can be requested for any user ID, however only workspaces the authorised user is permitted to `ActionRead` will have their updates streamed.
Opening a tunnel to an agent requires that the user can perform `ActionSSH` against the workspace containing it.