Use typed atomics (atomic.Int64, atomic.Int32, etc.) in test files to prevent
mixing atomic and non-atomic access on the same value, guarantee 64-bit
alignment on 32-bit platforms, and provide a cleaner API.
Adds structured `secret_requirements` to dynamic parameter responses and
enforces missing required secrets during workspace start.
Stop, delete, and tag rendering paths skip secret requirement
enforcement so unmet secrets do not prevent cleanup. The SDK, generated
API docs/types, and backend render/resolver/wsbuilder tests are updated
for the new contract.
## Problem
When a template has `require_active_version` enabled and the chat agent
tries to start a workspace that is stopped on an older template version,
the agent gets stuck in an infinite loop: `start_workspace` fails with a
403 (the old version is not the active version and the user lacks
`ActionUpdate` on the template), then `create_workspace` sees the
existing stopped workspace and tells the agent to use `start_workspace`,
repeat forever.
The root cause is that `chatStartWorkspace()` passes the start build
request through without setting `TemplateVersionID`, so `wsbuilder`
defaults to the previous build's template version — which RBAC rejects
when `RequireActiveVersion` is true.
## Fix
In `chatStartWorkspace()` (`coderd/exp_chats.go`), when the template's
access control has `RequireActiveVersion` enabled, explicitly set
`req.TemplateVersionID` to `template.ActiveVersionID` before calling
`postWorkspaceBuildsInternal()`. This mirrors how the autobuild executor
handles the same scenario (`coderd/autobuild/lifecycle_executor.go`).
If the new active version introduces required parameters that cannot be
resolved automatically (no defaults, no previous values), the build
fails at parameter validation before a provisioner job is created. In
that case, a clear error message tells the user to update and start the
workspace from the UI instead of surfacing a raw internal error.
On successful auto-update, the tool response includes
`updated_to_active_version`, `update_reason`, and a human-readable
`message` so the model can explain to the user what happened.
<img width="782" height="122" alt="image"
src="https://github.com/user-attachments/assets/289430d6-066e-41cf-bc97-cd013dcf717d"
/>
### Changes
- **`coderd/exp_chats.go`**: `chatStartWorkspace()` loads the template,
checks `RequireActiveVersion` via `AccessControlStore`, and pins the
build to the active version when required. New
`isChatStartWorkspaceManualUpdateRequiredError()` classifies parameter
validation failures from both the dynamic parameters path
(`DiagnosticError`) and the classic path (`ErrParameterValidation`
sentinel).
- **`coderd/wsbuilder/wsbuilder.go`**: New `ErrParameterValidation`
sentinel error, wrapped into the classic parameter validation
`BuildError` so callers can use `errors.Is` instead of string matching.
- **`coderd/x/chatd/chattool/startworkspace.go`**:
`waitForAgentAndRespond` now returns `map[string]any` instead of
`fantasy.ToolResponse`, letting the caller annotate the result (e.g.
auto-update metadata) before converting. Error handling for `StartFn`
checks for `httperror.Responder` errors to surface clean messages for
the manual-update case.
- **`coderd/x/chatd/chattool/startworkspace_test.go`**: Two new tests —
`StoppedWorkspaceReportsAutoUpdate` (verifies auto-update fields in
response) and `ManualUpdateRequired` (verifies clean error message
without internal wrapping).
### Follow-up
The manual-update error message could include a direct link to the
workspace settings page, but the chattool layer does not currently have
access to the deployment's access URL. Plumbing it through is
straightforward but out of scope for this fix.
Closes CODAGT-192
Replace the old `InTx` ruleguard rule in `scripts/rules.go` with a
custom in-tree `go/analysis` analyzer under `scripts/intxcheck/`. The
new analyzer catches the same direct and pass-through misuse classes as
before, plus two new classes the pattern-matcher couldn't reach:
- **Indirect same-package helper misuse** — flags `p.someHelper(ctx)`
inside `InTx` when the helper body uses the outer store (the PR #24369
bug class).
- **Nested dangerous closures** — descends into `go func() { ... }()`,
`defer func() { ... }()`, and immediately-invoked function literals.
The analyzer uses semantic `types.Object` identity instead of raw
expression string comparison, which avoids false positives from
closure-local shadowing and catches simple aliases like `outer := s.db`
and `alias := s`.
This PR also fixes three real outer-store-inside-transaction bugs the
new analyzer surfaced:
- `coderd/wsbuilder/wsbuilder.go`: `FindMatchingPresetID` and
`getWorkspaceTask` now use the inner transaction store instead of
`b.store`.
- `enterprise/dbcrypt/dbcrypt.go`: `ensureEncrypted` now calls
`s.InsertDBCryptKey` (the tx-wrapped store) instead of
`db.InsertDBCryptKey`. The `dbCrypt.InTx` method wraps the raw tx in a
new `*dbCrypt`, so `s.InsertDBCryptKey` still dispatches through the
encryption layer.
Two call sites need `// intxcheck:ignore` suppressions. Both are one-off
patterns that only look like misuse because the analyzer doesn't track
assignments — proving them safe would require full dataflow analysis,
which is well beyond what a targeted lint like this should attempt:
- `coderd/database/dbfake/dbfake.go` — `b.db` is reassigned to `tx` on
the preceding line, so `b.doInTX()` actually uses the transaction. The
analyzer sees the original `b.db` identity and flags it.
- `coderd/database/db_test.go` — test intentionally passes the outer
store to `require.Equal` to assert that nested `InTx` returns the same
handle.
Suppressions use `// intxcheck:ignore` instead of `//nolint:intxcheck`
because `intxcheck` runs as a standalone `go/analysis` tool outside
golangci-lint. golangci-lint's `nolintlint` checker flags `//nolint`
directives for linters it doesn't control, so we use a custom comment
prefix to avoid that conflict.
## Description
- Updates `wsbuilder` to return a `BuildError` with
`http.StatusBadRequest` to signify a "validation error" on missing or
invalid parameters
- Adds a short-circuit in `prebuilds.StoreReconciler` to mark presets
for which creating a build returns a "validation error" as "validation
failed" and skip further attempts to reconcile.
- Adds a test to verify the above
- Introduces a new Prometheus metric
`coderd_prebuilt_workspaces_preset_validation_failed` to track the above
Closes: https://github.com/coder/coder/issues/21237
---------
Co-authored-by: Cian Johnston <cian@coder.com>
The provisioner state for a workspace build was being loaded for every
long-lived agent rpc connection. Since this state can be anywhere from
kilobytes to megabytes this can gradually cause the `coderd` memory
footprint to grow over time. It's also a lot of unnecessary allocations
for every query that fetches a workspace build since only a few callers
ever actually reference the provisioner state.
This PR removes it from the returned workspace build and adds a query to
fetch the provisioner state explicitly.
This PR adds some metrics to help identify job enqueue rates and
latencies. This work was initiated as a way to help reduce the cost of
the observation/measurement itself for autostart scaletests, which
impacts our ability to identify/reason about the load caused by
autostart. See: https://github.com/coder/internal/issues/1209
I've extended the metrics here to account for regular user initiated
builds, prebuilds, autostarts, etc. IMO there is still the question here
of whether we want to include or need the `transition` label, which is
only present on workspace builds. Including it does lead to an increase
in cardinality, and in the case of the histogram (when not using native
histograms) that's at least a few extra series for every bucket. We
could remove the transition label there but keep it on the counter.
Additionally, the histogram is currently observing latencies for other
jobs, such as template builds/version imports, those do not have a
transition type associated with them.
Tested briefly in a workspace, can see metric values like the following:
-
`coderd_workspace_builds_enqueued_total{build_reason="autostart",provisioner_type="terraform",status="success",transition="start"}
1`
-
`coderd_provisioner_job_queue_wait_seconds_bucket{build_reason="autostart",job_type="workspace_build",provisioner_type="terraform",transition="start",le="0.025"}
1`
---------
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Only task workspaces have the checks in wsbuilder for violating the
managed agent caps in the license.
Stopped tasks that are resumed with a regular workspace start **still
count as usage**.
Use transition-specific actions when authorizing workspace build
parameter inserts in the database layer so start/stop/delete do not
require workspace.update.
Related to: https://github.com/coder/internal/issues/1299
Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example:
```
import (
"context"
"time"
"github.com/prometheus/client_golang/prometheus"
"golang.org/x/xerrors"
"gopkg.in/natefinch/lumberjack.v2"
"cdr.dev/slog/v3"
"github.com/coder/coder/v2/codersdk/agentsdk"
"github.com/coder/serpent"
)
```
3 groups: standard library, 3rd partly libs, Coder libs.
This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.
Closes https://github.com/coder/coder/issues/18356.
This change finds and selects a matching preset if one was not chosen
during workspace creation. This solidifies the relationship between
presets and parameters.
When a workspace is created without in explicitly chosen preset, it will
now still be eligible to claim a prebuilt workspace if one is available.
This PR sets a constraint of 1MB on the provisioner job logs written to
the database. This is consistent with the constraint we place on
workspace agent logs:
https://github.com/coder/coder/blob/4ac6be6d835dc36c242e35a26b584b784040bf28/coderd/database/dump.sql#L2030
It also adds a message printed to the front end about the provisioner
log overflow, and updates the message printed to the front end when
workspace startup logs exceed the max, as it was causing some customers
to think their startup script had failed to run.
- Adds a query for counting managed agent workspace builds between two
timestamps
- The "Actual" field in the feature entitlement for managed agents is
now populated with the value read from the database
- The wsbuilder package now validates AI agent usage against the limit
when a license is installed
Closescoder/internal#777
## Description
Follow-up from PR https://github.com/coder/coder/pull/18333
Related with:
https://github.com/coder/coder/pull/18333#discussion_r2159300881
This changes the authorization logic to first try the normal workspace
authorization check, and only if the resource is a prebuilt workspace,
fall back to the prebuilt workspace authorization check. Since prebuilt
workspaces are a subset of workspaces, the normal workspace check is
more likely to succeed. This is a small optimization to reduce
unnecessary prebuilt authorization calls.
`BuildError` response from `wsbuilder` does not support rich errors from validation. Changed this to use the `Validations` block of codersdk responses to return all errors for invalid parameters.
When in experimental this was used as an escape hatch. Removed to be
consistent with the template author's intentions
Backwards compatible, removing an experimental api field that is no longer used.
# What does this do?
This does parameter validation for dynamic parameters in `wsbuilder`. All input parameters are validated in `coder/coder` before being sent to terraform.
The heart of this PR is [`ResolveParameters`](https://github.com/coder/coder/blob/b65001e89c0577199a8e470c138c51e91cf2350c/coderd/dynamicparameters/resolver.go#L30-L30).
# What else changes?
`wsbuilder` now needs to load the terraform files into memory to succeed. This does add a larger memory requirement to workspace builds.
# Future work
- Sort autostart handling workspaces by template version id. So workspaces with the same template version only load the terraform files once from the db, and store them in the cache.
Use richer `previewtypes.Parameter` for `wsbuilder`. This is a pre-requirement to adding dynamic parameter validation.
The richer type contains more information than the `db` parameter, so the conversion is lossless.
Alternate fix for https://github.com/coder/coder/issues/18080
Modifies wsbuilder to complete the provisioner job and mark the
workspace as deleted if it is clear that no provisioner will be able to
pick up the delete build.
This has a significant advantage of not deviating too much from the
current semantics of `POST /api/v2/workspacebuilds`.
https://github.com/coder/coder/pull/18460 ends up returning a 204 on
orphan delete due to no build being created.
Downside is that we have to duplicate some responsibilities of
provisionerdserver in wsbuilder.
There is a slight gotcha to this approach though: if you stop a
provisioner and then immediately try to orphan-delete, the job will
still be created because of the provisioner heartbeat interval. However
you can cancel it and try again.
## Description
This PR adds support for deleting prebuilt workspaces via the
authorization layer. It introduces special-case handling to ensure that
`prebuilt_workspace` permissions are evaluated when attempting to delete
a prebuilt workspace, falling back to the standard `workspace` resource
as needed.
Prebuilt workspaces are a subset of workspaces, identified by having
`owner_id` set to `PREBUILD_SYSTEM_USER`.
This means:
* A user with `prebuilt_workspace.delete` permission is allowed to
**delete only prebuilt workspaces**.
* A user with `workspace.delete` permission can **delete both normal and
prebuilt workspaces**.
⚠️ This implementation is scoped to **deletion operations only**. No
other operations are currently supported for the `prebuilt_workspace`
resource.
To delete a workspace, users must have the following permissions:
* `workspace.read`: to read the current workspace state
* `update`: to modify workspace metadata and related resources during
deletion (e.g., updating the `deleted` field in the database)
* `delete`: to perform the actual deletion of the workspace
## Changes
* Introduced `authorizeWorkspace()` helper to handle prebuilt workspace
authorization logic.
* Ensured both `prebuilt_workspace` and `workspace` permissions are
checked.
* Added comments to clarify the current behavior and limitations.
* Moved `SystemUserID` constant from the `prebuilds` package to the
`database` package `PrebuildsSystemUserID` to resolve an import cycle
(commit
https://github.com/coder/coder/pull/18333/commits/f24e4ab4b6f0a56726fd04be2d7302c9fdb52d53).
* Update middleware `ExtractOrganizationMember` to include system user
members.
The fields must be nullable because there’s a period of time between
inserting a row into the database and finishing the “plan” provisioner
job when the final value of the field is unknown.
The existing code persists all static parameters and their values. Using
the previous build as the source if no new inputs are found.
Dynamic params do not have a state of the parameters saved to disk. So
instead, all previous values are persisted always, and new inputs
override.
Dynamic params skip parameter validation in coder/coder.
This is because conditional parameters cannot be validated
with the static parameters in the database.
Pass through the user input as is. The previous code only passed through
parameters that existed in the db (static params). This would omit
conditional params.
Validation is enforced by the dynamic params websocket, so validation at
this point is not required.
This pull request allows coder workspace agents to be reinitialized when
a prebuilt workspace is claimed by a user. This facilitates the transfer
of ownership between the anonymous prebuilds system user and the new
owner of the workspace.
Only a single agent per prebuilt workspace is supported for now, but
plumbing has already been done to facilitate the seamless transition to
multi-agent support.
---------
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
Follow-up from a [previous Pull
Request](https://github.com/coder/coder/pull/16965) required some
additional testing of Presets from the API perspective.
In the process of adding the new tests, I updated the API to enforce
preset parameter values based on the selected preset instead of trusting
whichever frontend makes the request. This avoids errors scenarios in
prebuilds where a prebuild might expect a certain preset but find a
different set of actual parameter values.
Using negative permissions, this role prevents a user's ability to
create & delete a workspace within a given organization.
Workspaces are uniquely owned by an org and a user, so the org has to
supercede the user permission with a negative permission.
# Use case
Organizations must be able to restrict a member's ability to create a
workspace. This permission is implicitly granted (see
https://github.com/coder/coder/issues/16546#issuecomment-2655437860).
To revoke this permission, the solution chosen was to use negative
permissions in a built in role called `WorkspaceCreationBan`.
# Rational
Using negative permissions is new territory, and not ideal. However,
workspaces are in a unique position.
Workspaces have 2 owners. The organization and the user. To prevent
users from creating a workspace in another organization, an [implied
negative
permission](https://github.com/coder/coder/blob/36d9f5ddb3d98029fee07d004709e1e51022e979/coderd/rbac/policy.rego#L172-L192)
is used. So the truth table looks like: _how to read this table
[here](https://github.com/coder/coder/blob/36d9f5ddb3d98029fee07d004709e1e51022e979/coderd/rbac/README.md#roles)_
| Role (example) | Site | Org | User | Result |
|-----------------|------|------|------|--------|
| non-org-member | \_ | N | YN\_ | N |
| user | \_ | \_ | Y | Y |
| WorkspaceBan | \_ | N | Y | Y |
| unauthenticated | \_ | \_ | \_ | N |
This new role, `WorkspaceCreationBan` is the same truth table condition
as if the user was not a member of the organization (when doing a
workspace create/delete). So this behavior **is not entirely new**.
<details>
<summary>How to do it without a negative permission</summary>
The alternate approach would be to remove the implied permission, and
grant it via and organization role. However this would add new behavior
that an organizational role has the ability to grant a user permissions
on their own resources?
It does not make sense for an org role to prevent user from changing
their profile information for example. So the only option is to create a
new truth table column for resources that are owned by both an
organization and a user.
| Role (example) | Site | Org |User+Org| User | Result |
|-----------------|------|------|--------|------|--------|
| non-org-member | \_ | N | \_ | \_ | N |
| user | \_ | \_ | \_ | \_ | N |
| WorkspaceAllow | \_ | \_ | Y | \_ | Y |
| unauthenticated | \_ | \_ | \_ | \_ | N |
Now a user has no opinion on if they can create a workspace, which feels
a little wrong. A user should have the authority over what is theres.
There is fundamental _philosophical_ question of "Who does a workspace
belong to?". The user has some set of autonomy, yet it is the
organization that controls it's existence. A head scratcher 🤔
</details>
## Will we need more negative built in roles?
There are few resources that have shared ownership. Only
`ResourceOrganizationMember` and `ResourceGroupMember`. Since negative
permissions is intended to revoke access to a shared resource, then
**no.** **This is the only one we need**.
Classic resources like `ResourceTemplate` are entirely controlled by the
Organization permissions. And resources entirely in the user control
(like user profile) are only controlled by `User` permissions.
![Uploading Screenshot 2025-02-26 at 22.26.52.png…]()
---------
Co-authored-by: Jaayden Halko <jaayden.halko@gmail.com>
Co-authored-by: ケイラ <mckayla@hey.com>
This pull requests adds the necessary migrations and queries to support
presets within the coderd database. Future PRs will build functionality
to the provisioners and the frontend.
Relates to https://github.com/coder/coder/issues/15894:
- Adds `coderdenttest.NewExternalProvisionerDaemonTerraform`
- Adds integration-style test coverage for creating a workspace with
`coder_workspace_tags` specified in `main.tf`
- Modifies `coderd/wsbuilder` to fetch template version variables and
includes them in eval context for evaluating `coder_workspace_tags`
- Refactors `checkProvisioners` into `db2sdk.MatchedProvisioners`
- Adds a separate RBAC subject just for reading provisioner daemons
- Adds matched provisioners information to additional endpoints relating to
workspace builds and templates
-Updates existing unit tests for above endpoints
-Adds API endpoint for matched provisioners of template dry-run job
-Updates CLI to show warning when creating/starting/stopping/deleting
workspaces for which no provisoners are available
---------
Co-authored-by: Danny Kopping <danny@coder.com>
Before db_metrics were all or nothing. Now `InTx` metrics are always recorded, and query metrics are opt in.
Adds instrumentation & logging around serialization failures in the database.
Removes our pseudo rbac resources like `WorkspaceApplicationConnect` in favor of additional verbs like `ssh`. This is to make more intuitive permissions for building custom roles.
The source of truth is now `policy.go`
Just moved `rbac.Action` -> `policy.Action`. This is for the stacked PR to not have circular dependencies when doing autogen. Without this, the autogen can produce broken golang code, which prevents the autogen from compiling.
So just avoiding circular dependencies. Doing this in it's own PR to reduce LoC diffs in the primary PR, since this has 0 functional changes.