Commit Graph

113 Commits

Author SHA1 Message Date
Steven Masley 19573e8aee feat!: patchTemplateMeta to use optional fields (#24984)
Closes https://github.com/coder/coder/issues/13112

**Breaking Change**: Removed status code `StatusNotModified` when no
diffs occur in a patch. Now the patch is always applied and a template
is always returned.
2026-05-11 12:43:52 -05:00
Cian Johnston d7439a9de0 feat: add Prometheus metrics for chatd subsystem (#24371)
Adds 7 Prometheus metrics to the chatd subsystem and introduces typed
`ActivityBumpReason` for deadline bump attribution.

| Metric | Type | Labels |
|--------|------|--------|
| `coderd_chatd_chats` | Gauge | `state` (streaming, waiting) |
| `coderd_chatd_message_count` | Histogram | `provider` |
| `coderd_chatd_prompt_size_bytes` | Histogram | `provider` |
| `coderd_chatd_tool_result_size_bytes` | Histogram | `provider`,
`tool_name` |
| `coderd_chatd_ttft_seconds` | Histogram | `provider` |
| `coderd_chatd_compaction_total` | Counter | `provider`, `result` |
| `coderd_chatd_steps_total` | Counter | `provider` |

> 🤖
2026-04-15 19:53:10 +01:00
Steven Masley 84de391f26 chore: add tallyman events for ai seat tracking (#22689)
AI seat tracking inserted as heartbeat into usage table.
2026-03-18 09:30:22 -05:00
George K e5c19d0af4 feat: backend support for creating and storing service accounts (#22698)
Add is_service_account column to users table with CHECK constraints
enforcing login_type='none' and empty email for service accounts.
Update user creation API to validate service account constraints.

Related to:
https://linear.app/codercom/issue/PLAT-27/feat-backend-support-for-creating-and-storing-service-accounts
2026-03-11 10:19:08 -07:00
Cian Johnston 81468323e0 fix(coderd): use dbtime.Now() instead of time.Now() in test assertions against DB timestamps (#22685)
`time.Now()` has nanosecond precision while Postgres timestamps are
microsecond precision. When tests compare `time.Now()` against
DB-sourced timestamps using `Before`/`After`/`WithinRange`/etc., there
is a non-zero flake risk from the precision mismatch.

This replaces `time.Now()` with `dbtime.Now()` (which rounds to
microsecond precision) in all test assertions that compare against
database timestamps.

Follows from #22684.

## Changes (11 files)

| File | Changes |
|---|---|
| `coderd/apikey_test.go` | 11 comparisons with `ExpiresAt` |
| `coderd/users_test.go` | 2 comparisons with `ExpiresAt` |
| `coderd/oauth2_test.go` | 1 comparison with `token.Expiry` |
| `coderd/workspaces_test.go` | 2 comparisons with `DormantAt` |
| `coderd/workspaceagents_test.go` | 3 comparisons with
`ConnectedAt`/`DisconnectedAt` |
| `coderd/workspaceapps/db_test.go` | 1 comparison with `token.Expiry` |
| `coderd/provisionerdserver/provisionerdserver_test.go` | 1 comparison
with `key.ExpiresAt` |
| `enterprise/coderd/workspaces_test.go` | 1 comparison with `DormantAt`
|
| `enterprise/coderd/license/license_test.go` | 3 `NotBefore` values |
| `enterprise/coderd/licenses_test.go` | 2 `NotBefore` values |
| `enterprise/coderd/users_test.go` | 3 `Next()` comparisons |

## Not changed (intentionally)

- `scaletest/placebo/run_test.go` — compares wall-clock elapsed time,
not DB timestamps
- `cli/server_test.go`, `coderd/jwtutils/jwt_test.go`,
`enterprise/aibridgeproxyd/aibridgeproxyd_test.go` — TLS cert fields,
not DB-stored
- `coderd/azureidentity/azureidentity_test.go` — Azure cert expiry, not
DB


🤖 Generated by Claude Opus 4.6 but reviewed manually.
2026-03-06 09:14:11 +00:00
Kyle Carberry 219d02bdc3 fix(coderd): poll for metrics in TestWorkspaceProvisionerdServerMetrics (#22644)
## Problem

`TestWorkspaceProvisionerdServerMetrics` flakes because metric
assertions run immediately after
`AwaitWorkspaceBuildJobCompleted` returns, but metrics are updated
**asynchronously after the
DB transaction commits** in `completeWorkspaceBuildJob`.

The timeline in the provisioner server:
1. DB transaction commits (`provisionerdserver.go:~2362`) — job marked
completed
2. Audit logging, notifications, DB queries (`~2370-2427`)
3. **Metric `.Observe()`** (`~2463`) — happens ~100 lines later

The test synchronization (`AwaitWorkspaceBuildJobCompleted`) polls for
`CompletedAt != nil`,
which fires at step 1. The metric assertion then executes before step 3,
causing the flake.

## Fix

Wrap all three metric assertions (prebuild creation, prebuild claim,
regular workspace
creation) in `require.Eventually` to poll until the metric appears, then
assert on the value.

## Test

- `go test -run TestWorkspaceProvisionerdServerMetrics -count=5` — all
pass
- `go test -race -run TestWorkspaceProvisionerdServerMetrics -count=1` —
clean
2026-03-04 22:30:36 -05:00
Sushant P 37a8e61ea2 chore: move Shared Workspaces from experiments to beta (#22206)
* Removed the shared-workspaces experiment and cleaned up related
middleware
* Added beta tagging to the UI for shared workspaces
2026-02-23 08:30:32 -08:00
Jake Howell 051ed34580 feat: convert soft_limit to limit (#22048)
In relation to
[`internal#1281`](https://github.com/coder/internal/issues/1281)

Remove the `soft_limit` field from the `Feature` type and simplify
license limit handling. This change:

- Removes the `soft_limit` field from the API and SDK
- Uses the soft limit value as the single `limit` value in the UI and
API
- Simplifies warning logic to only show warnings when the limit is
exceeded
- Updates tests to reflect the new behavior
- Updates the UI to use the single limit value for display
2026-02-20 16:09:12 +11:00
Danielle Maywood 92a6d6c2c0 chore: remove unnecessary loop variable captures (#22180)
Since Go 1.22, the loop variable capture issue is resolved. Variables
declared by for loops are now per-iteration rather than per-loop, making
the 'v := v' pattern unnecessary.
2026-02-19 09:02:19 +00:00
Callum Styan 5f3be6b288 feat: add provisioner job queue wait time histogram and jobs enqueued counter (#21869)
This PR adds some metrics to help identify job enqueue rates and
latencies. This work was initiated as a way to help reduce the cost of
the observation/measurement itself for autostart scaletests, which
impacts our ability to identify/reason about the load caused by
autostart. See: https://github.com/coder/internal/issues/1209

I've extended the metrics here to account for regular user initiated
builds, prebuilds, autostarts, etc. IMO there is still the question here
of whether we want to include or need the `transition` label, which is
only present on workspace builds. Including it does lead to an increase
in cardinality, and in the case of the histogram (when not using native
histograms) that's at least a few extra series for every bucket. We
could remove the transition label there but keep it on the counter.

Additionally, the histogram is currently observing latencies for other
jobs, such as template builds/version imports, those do not have a
transition type associated with them.

Tested briefly in a workspace, can see metric values like the following:
-
`coderd_workspace_builds_enqueued_total{build_reason="autostart",provisioner_type="terraform",status="success",transition="start"}
1`
-
`coderd_provisioner_job_queue_wait_seconds_bucket{build_reason="autostart",job_type="workspace_build",provisioner_type="terraform",transition="start",le="0.025"}
1`

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-12 13:40:47 -08:00
Jon Ayers 3c1db17361 fix: use existing transaction to claim prebuild (#21862)
- Claiming a prebuild was happening outside a transaction
2026-02-02 17:57:59 -06:00
Ethan a464ab67c6 test: use explicit names in TestStartAutoUpdate to prevent flake (#21745)
The test was creating two template versions without explicit names,
relying on `namesgenerator.NameDigitWith()` which can produce
collisions. When both versions got the same random name, the test failed
with a 409 Conflict error.

Fix by giving each version an explicit name (`v1`, `v2`).

Closes https://github.com/coder/internal/issues/1309

---

*Generated by [mux](https://mux.coder.com)*
2026-01-30 13:24:06 +11:00
Steven Masley 799b190dee fix: do not enforce managed agent limit for non-task workspaces (#21689)
Only task workspaces have the checks in wsbuilder for violating the
managed agent caps in the license.

Stopped tasks that are resumed with a regular workspace start **still
count as usage**.
2026-01-27 19:01:17 -06:00
George K d29a168785 fix(coderd/rbac): reinstate deployment-wide workspace.share permission for owner role (#21620)
The removal of that permission from the role broke valid use cases (e.g.
a site owner user creating a workspace owned by a system account and
then trying to share it with another user).

The bulk of the PR is made up of the rollbacks of the previously
introduced test updates necessitated by the removal.

Related to: https://github.com/coder/internal/issues/1285
2026-01-22 08:12:15 -08:00
Mathias Fredriksson 97e8a5b093 fix(coderd): allow agent auth during workspace shutdown (#21538)
Agents were losing authentication during workspace shutdown, causing
shutdown scripts to fail. The auth query required agents to belong to
the latest build, but during shutdown a `stop` build becomes latest while
the `start` build's agents are still running.

Modified the auth query to allow `start` build agents to authenticate
temporarily during `stop` execution. The query allows auth when:

- Agent's `start` build job succeeded
- Latest build is `stop` with `pending`/`running` job status
- Builds are adjacent (`stop` is `build_number + 1`)
- Template versions match

Auth closes once `stop` completes.

Renamed `GetWorkspaceAgentAndLatestBuildByAuthToken` to
`GetAuthenticatedWorkspaceAgentAndBuildByAuthToken` since it returns the
agent's build (not always latest) during shutdown.

Closes coder/internal#1249
Fixes #19467
2026-01-21 13:18:43 +00:00
Susana Ferreira 6ef9670384 fix: limit concurrent database connections in prebuild reconciliation (#20908)
## Description

This PR addresses database connection pool exhaustion during prebuilds
reconciliation by introducing two changes:
* `CanSkipReconciliation`: Filters out presets that don't need
reconciliation before spawning goroutines. This ensures we only create
goroutines for presets that will (_most likely_) perform database
operations, avoiding unnecessary connection pool usage.
* Dynamic `eg.SetLimit`: Limits concurrent goroutines based on the
configured database connection pool size (`CODER_PG_CONN_MAX_OPEN / 2`).
This replaces the previous hardcoded limit of 5, ensuring the
reconciliation loop scales appropriately with the configured pool size
while leaving capacity for other database operations.

## Changes

* Add `CanSkipReconciliation()` method to `PresetSnapshot` that returns
true for inactive presets with no running workspaces, no pending jobs,
or expired prebuilds.
* Add `maxDBConnections` parameter to `NewStoreReconciler` and compute
`reconciliationConcurrency` as half the pool size (minimum 1).
* Add `ReconciliationConcurrency()` getter method to `StoreReconciler`.
* Add `eg.SetLimit(c.reconciliationConcurrency)` to bound concurrent
reconciliation goroutines.
* Add `PresetsTotal` and `PresetsReconciled` to `ReconcileStats` for
observability.
* Add `TestCanSkipReconciliation` unit tests.
* Add `TestReconciliationConcurrency` unit tests.
* Add benchmark tests for reconciliation performance.

## Benchmarks

* `BenchmarkReconcileAll_NoOps`: Tests presets with no reconciliation
actions. All presets are filtered by `CanSkipReconciliation`, resulting
in no goroutines spawned and no database connections used.
* `BenchmarkReconcileAll_ConnectionContention`: Tests presets where all
require reconciliation actions. All presets spawn goroutines, but
concurrency is limited by `eg.SetLimit(reconciliationConcurrency)`.
* `BenchmarkReconcileAll_Mix`: Simulates a realistic scenario with a
large subset of inactive presets (filtered by `CanSkipReconciliation`)
and a smaller subset requiring reconciliation (limited by
`eg.SetLimit`).

Closes: https://github.com/coder/coder/issues/20606
2026-01-21 10:56:31 +00:00
George K cc2efe9e1f feat(coderd/rbac): make organization-member a per-org system custom role (#21359)
Migrated the built-in organization-member role to DB storage so it can be customized per org.

Closes https://github.com/coder/internal/issues/1073 (part 1)
2026-01-12 18:19:19 -08:00
Spike Curtis bddb808b25 chore: arrange imports in a standard way (#21452)
Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example:

```
import (
	"context"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"golang.org/x/xerrors"
	"gopkg.in/natefinch/lumberjack.v2"

	"cdr.dev/slog/v3"
	"github.com/coder/coder/v2/codersdk/agentsdk"
	"github.com/coder/serpent"
)
```

3 groups: standard library, 3rd partly libs, Coder libs.

This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.
2026-01-08 15:24:11 +04:00
Spike Curtis 49b34a716a fix: fix slog to always use array of Fields (#21426)
Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptible call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder).

It also updates dependencies that also use slog and were updated.

I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule.

Other dependencies, I pushed new tags.
2026-01-08 10:29:41 +04:00
Sas Swart 9a0024c45f chore: add tracing to prebuilds (#21443)
The implementation for prebuilt workspaces is complex and conversations
regarding edge cases and bugs frequently get bogged down by minutiae,
because it's hard to reason about the behaviour of the system.

To alleviate this, I've introduced otel tracing to the StoreReconciler
(see attached). We can now directly observe the behaviour of the
prebuilds system under load in order to inform our decisions.

Traces are terminated at the boundary between prebuilds and workspace
builder, because of prebuilt workspaces' "fire and forget" philosophy
and to prevent span explosion.

<img width="3024" height="1718" alt="image"
src="https://github.com/user-attachments/assets/f9b207be-8f2c-475e-98a8-46ef70bda446"
/>
2026-01-07 11:04:40 +02:00
Steven Masley 35f1c44455 test: fix path seperator on windows for unit test (#21382)
Test TestWorkspaceTemplateParamsChange writes a file to disk

Closes https://github.com/coder/internal/issues/1213
2025-12-23 15:13:16 +00:00
Steven Masley 61d7d2983f fix: remove state information from apply (#21373)
Delete builds were not deleting resources as the tf state being sent in the apply request was empty.
State removed from apply request and read from the session instead.
2025-12-22 16:18:53 +00:00
Steven Masley 3194bcfc9e chore: distinct operations for provisioner's 'parse', 'init', 'plan', 'apply', 'graph' (#21064)
Provisioner steps broken into smaller granular actions.
Changes:
- `ExtractArchive` moved to `init` request (was in `configure`)
- Writing `tfstate` moved to `plan` (was in `configure`)
- Moved most plan/apply outputs to `GraphComplete`
2025-12-15 11:26:41 -06:00
George K 103967ed02 feat: add sharing info to /workspaces endpoint (#21049)
closes: https://github.com/coder/internal/issues/858

Similar to https://github.com/coder/coder/pull/19375, this one uses
system permissions for fetching actual user and group data.

Modifies the `workspaces_expanded` view to fetch the required data; this way it's made available to all code paths that make use of it.  

Also fixes a bug in a test helper function that can result in `null` being saved to the DB for `user_acl` or `group_acl` and break tests; a defensive check constraint that prevents this is worth a PR, e.g:

`ALTER TABLE workspaces
   ADD CONSTRAINT group_acl_is_object CHECK (jsonb_typeof(group_acl) = 'object');`

Also adds missing  `OwnerName` in `ConvertWorkspaceRows`.
2025-12-15 08:42:08 -08:00
Danielle Maywood c12303f0b2 fix: allow agents to be created on dormant workspaces (#20909)
Closes https://github.com/coder/coder/issues/20711

We now allow agents to be created on dormant workspaces.

I've ran the test with and without the change. I've confirmed that -
without the fix - it triggers the "rbac: unauthorized" error.
2025-11-25 06:24:33 +00:00
blinkagent[bot] 48b8e22502 fix: add Windows stub for CacheTFProviders (#20840)
Fixes https://github.com/coder/internal/issues/1119

## Description

The `CacheTFProviders` function in `testutil/terraform_cache.go` was
only available on Linux and macOS due to the `//go:build linux ||
darwin` build tag. This caused a compile error on Windows when
`enterprise/coderd/workspaces_test.go` tried to call it:

```
enterprise\coderd\workspaces_test.go:3403:28: undefined: testutil.CacheTFProviders
```

## Changes

1. Added `testutil/terraform_cache_windows.go` with a Windows-specific
stub implementation that returns an empty string
2. Updated `downloadProviders` helper in
`enterprise/coderd/workspaces_test.go` to handle empty paths gracefully

## Behavior

- On Linux/macOS: Terraform providers are cached as before
- On Windows: Provider caching is skipped, tests download providers
normally during `terraform init`

## Testing

This should fix the Windows nightly gauntlet failure. The test will
still run on Windows, just without provider caching optimization.

Co-authored-by: blink-so[bot] <211532188+blink-so[bot]@users.noreply.github.com>
2025-11-20 07:52:07 +00:00
Kacper Sawicki f543a87b78 chore: cache terraform providers for workspaces terraform tests (#20603)
Fixes flaky `TestWorkspaceTagsTerraform` and
`TestWorkspaceTemplateParamsChange` tests that were failing with
`connection reset by peer` errors when downloading the coder/coder
provider.

This applies the same caching solution which was done in
https://github.com/coder/coder/pull/17373

1. Extracts provider caching logic into `testutil/terraform_cache.go`
2. Updates TestProvision to use the shared caching helpers
3. Updates enterprise workspace tests to use the shared caching helpers

The cache is persisted at `~/.cache/coderv2-test/` and automatically
cached between CI runs via existing GitHub Actions cache setup.

Closes https://github.com/coder/internal/issues/607
2025-11-12 08:43:22 +00:00
Hugo Dutka e62c5db678 chore: remove references to dbtestutil.WillUsePostgres (#20436)
Addresses https://github.com/coder/internal/issues/758.

This PR only cleans up dead code, it makes no changes to test logic.
2025-10-23 14:24:54 +02:00
Brett Kolodny 38ca98745b feat: add shared_with_group: and shared_with_user: filters to /workspaces endpoint (#19875)
Adds shared_with_user and shared_with_group filters to the /workspaces
endpoint.

- `shared_with_user`: filters workspaces shared with a specific user.
Accepts a user UUID or username.
- `shared_with_group`: filters workspaces shared with a specific group.
Accepts:
  - a group UUID, or
  - `<organization name>/<group name>`, or
  - `<group name>` (resolved in the default organization).


Closes
[coder/internal#1004](https://github.com/coder/internal/issues/1004)
2025-09-19 16:05:27 -04:00
Brett Kolodny e6b04d1918 feat: add shared filter to workspaces query (#19807)
Adds a `shared:<boolean>` search query to the `/workspaces [get]`
endpoint


https://github.com/user-attachments/assets/ccf84bd9-c1fd-4085-825b-2e3176a2d488

Closes
[coder/internal#972](https://github.com/coder/internal/issues/972)
2025-09-16 12:37:39 -04:00
Ethan 995b330250 test: avoid sharing deployment values between subtests (#19833)
Blink didn't figure out a CI failure on main was caused by a data race; fixing it.


I've also updated the [blink prompt](https://gist.githubusercontent.com/ethanndickson/8dea9f1db3957ac1baf30ae8ce6f1a42/raw/060aea7fabb82bef0029a17dad9a5daee7940760/blink-flake-instructions.md).

https://github.com/coder/coder/actions/runs/17737809615
2025-09-16 13:51:26 +10:00
Brett Kolodny 854f3c0187 feat: add workspaces/acl [delete] endpoint (#19772)
Closes
[coder/internal#971](https://github.com/coder/internal/issues/971)
2025-09-12 12:21:01 -04:00
Susana Ferreira 353f5dedc1 fix(coderd): fix logic for reporting prebuilt workspace duration metric (#19641)
## Description

When creating a prebuilt workspace, both `flags.IsPrebuild` and
`flags.IsFirstBuild` are true. Previously, the logic rejected cases with
multiple flags, so `coderd_workspace_creation_duration_seconds` wasn’t
updated for prebuilt creations. This is the only valid scenario where
two flags can be true.

## Changes

* Fix logic to update `coderd_workspace_creation_duration_seconds`
metric for prebuilt workspaces.
* Add prebuild helper functions to coderdenttest (other prebuild tests
can reuse this).
* Update workspace's provisionerdmetric tests to include this metric.

Follow-up: https://github.com/coder/coder/pull/19503
Related to: https://github.com/coder/coder/issues/19528
2025-08-29 15:48:48 +01:00
Callum Styan 321c2b8fce fix: fix flake in TestExecutorAutostartSkipsWhenNoProvisionersAvailable (#19478)
The flake here had two causes:
1. related to usage of time.Now() in MustWaitForProvisionersAvailable
and
2. the fact that UpdateProvisionerLastSeenAt can not use a time that is
further in the past than the current LastSeenAt time

Previously the test here was calling
`coderdtest.MustWaitForProvisionersAvailable` which was using `time.Now`
rather than the next tick time like the real `hasProvisionersAvailable`
function does. Additionally, when using `UpdateProvisionerLastSeenAt`
the underlying db query enforces that the time we're trying to set
`LastSeenAt` to cannot be older than the current value.

I was able to reliably reproduce the flake by executing both the
`UpdateProvisionerLastSeenAt` call and `tickCh <- next` in their own
goroutines, the former with a small sleep to reliably ensure we'd
trigger the autobuild before we set the `LastSeenAt` time. That's when I
also noticed that `coderdtest.MustWaitForProvisionersAvailable` was
using `time.Now` instead of the tick time. When I updated that function
to take in a tick time + added a 2nd call to
`UpdateProvisionerLastSeenAt` to set an original non-stale time, we
could then never get the test to pass because the later call to set the
stale time would not actually modify `LastSeenAt`. On top of that,
calling the provisioner daemons closer in the middle of the function
doesn't really do anything of value in this test.

**The fix for the flake is to keep the go routines, ensuring there would
be a flake if there was not a relevant fix, but to include the fix which
is to ensure that we explicitly wait for the provisioner to be stale
before passing the time to `tickCh`.**

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2025-08-28 12:07:50 -07:00
Susana Ferreira 0ab345ca84 feat: add prebuild timing metrics to Prometheus (#19503)
## Description

This PR introduces one counter and two histograms related to workspace
creation and claiming. The goal is to provide clearer observability into
how workspaces are created (regular vs prebuild) and the time cost of
those operations.

### `coderd_workspace_creation_total`

* Metric type: Counter
* Name: `coderd_workspace_creation_total`
* Labels: `organization_name`, `template_name`, `preset_name`

This counter tracks whether a regular workspace (not created from a
prebuild pool) was created using a preset or not.
Currently, we already expose `coderd_prebuilt_workspaces_claimed_total`
for claimed prebuilt workspaces, but we lack a comparable metric for
regular workspace creations. This metric fills that gap, making it
possible to compare regular creations against claims.

Implementation notes:
* Exposed as a `coderd_` metric, consistent with other workspace-related
metrics (e.g. `coderd_api_workspace_latest_build`:
https://github.com/coder/coder/blob/main/coderd/prometheusmetrics/prometheusmetrics.go#L149).
* Every `defaultRefreshRate` (1 minute ), DB query
`GetRegularWorkspaceCreateMetrics` is executed to fetch all regular
workspaces (not created from a prebuild pool).
* The counter is updated with the total from all time (not just since
metric introduction). This differs from the histograms below, which only
accumulate from their introduction forward.

### `coderd_workspace_creation_duration_seconds` &
`coderd_prebuilt_workspace_claim_duration_seconds`

* Metric types: Histogram
* Names:
  * `coderd_workspace_creation_duration_seconds`
* Labels: `organization_name`, `template_name`, `preset_name`, `type`
(`regular`, `prebuild`)
  * `coderd_prebuilt_workspace_claim_duration_seconds`
    * Labels: `organization_name`, `template_name`, `preset_name`

We already have `coderd_provisionerd_workspace_build_timings_seconds`,
which tracks build run times for all workspace builds handled by the
provisioner daemon.
However, in the context of this issue, we are only interested in
creation and claim build times, not all transitions; additionally, this
metric does not include `preset_name`, and adding it there would
significantly increase cardinality. Therefore, separate more focused
metrics are introduced here:
* `coderd_workspace_creation_duration_seconds`: Build time to create a
workspace (either a regular workspace or the build into a prebuild pool,
for prebuild initial provisioning build).
* `coderd_prebuilt_workspace_claim_duration_seconds`: Time to claim a
prebuilt workspace from the pool.

The reason for two separate histograms is that:
* Creation (regular or prebuild): provisioning builds with similar time
magnitude, generally expected to take longer than a claim operation.
* Claim: expected to be a much faster provisioning build.

#### Native histogram usage

Provisioning times vary widely between projects. Using static buckets
risks unbalanced or poorly informative histograms.
To address this, these metrics use [Prometheus native
histograms](https://prometheus.io/docs/specs/native_histograms/):
* First introduced in Prometheus v2.40.0
* Recommended stable usage from v2.45+
* Requires Go client `prometheus/client_golang` v1.15.0+
* Experimental and must be explicitly enabled on the server
(`--enable-feature=native-histograms`)

For compatibility, we also retain a classic bucket definition (aligned
with the existing provisioner metric:
https://github.com/coder/coder/blob/main/provisionerd/provisionerd.go#L182-L189).
* If native histograms are enabled, Prometheus ingests the
high-resolution histogram.
* If not, it falls back to the predefined buckets.

Implementation notes:
* Unlike the counter, these histograms are updated in real-time at
workspace build job completion.
* They reflect data only from the point of introduction forward (no
historical backfill).

## Relates to 

Closes: https://github.com/coder/coder/issues/19528
Native histograms tested in observability stack:
https://github.com/coder/observability/pull/50
2025-08-28 15:00:26 +01:00
ケイラ d7ee1019c0 feat: add endpoint for retrieving workspace acl (#19375)
Implements `/acl [get]` for workspaces, with tests.
Blocked by experiment enablement
2025-08-25 07:11:18 -05:00
Dean Sheather 6eb02d1c2a chore: wire up usage tracking for managed agents (#19096)
Wires up the usage collector and publisher to coderd.

Relates to coder/internal#814
2025-08-20 23:38:09 +10:00
Susana Ferreira 560cf84251 fix: prevent activity bump for prebuilt workspaces (#19263)
## Description

This PR ensures that activity-based deadline extensions ("activity
bumping") are not applied to prebuilt workspaces. Prebuilds are managed
by the reconciliation loop and must not have `deadline` or
`max_deadline` values set or extended, as they are not part of the
regular lifecycle executor path.

## Changes

- Update `ActivityBumpWorkspace` SQL query to discard prebuilt
workspaces
- Update application layer to avoid calling activity bump logic on
prebuilt workspaces

Related with: 
* Issue: https://github.com/coder/coder/issues/18898
* PR: https://github.com/coder/coder/pull/19252
2025-08-20 12:19:14 +01:00
Callum Styan f2ee89c36a fix: fix more TestWorkspaceAutobuild flakes (#19417)
made these commits yesterday but apparently I forgot to push so they got
missed in https://github.com/coder/coder/pull/19398

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2025-08-19 10:24:40 -07:00
Callum Styan 9e5c83ae0d fix: fix flakes in TestWorkspaceAutobuild due to incorrect tick time (#19398)
we missed these in the previous PR, we find `tickTime2`
and pass it to the `tickCh`, but we were incorrectly passing `tickTime`
to `UpdateProvisionerLastSeenAt` in some places

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2025-08-19 09:24:40 -07:00
Callum Styan 6c902a7410 fix: don't create autostart workspace builds with no available provisioners (#19067)
This should fix https://github.com/coder/coder/issues/17941 by introducing a check for whether there are any valid (non-stale provisioners for a job in the autobuild executor code path.

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2025-08-15 08:50:51 -07:00
Susana Ferreira 734299de71 fix: disallow lifecycle endpoints for prebuilt workspaces (#19264)
## Description

This PR updates the API to prevent lifecycle configuration endpoints
from being used on prebuilt workspaces. Since prebuilds are managed by
the reconciliation loop and do not participate in the regular workspace
lifecycle, they must not support per-workspace overrides for fields like
deadline, TTL, autostart, or dormancy.

Attempting to use these endpoints on a prebuilt workspace will now
return a clear validation error (`409 Conflict`) with an appropriate
explanation. This prevents accidental misconfiguration and preserves the
lifecycle separation between prebuilds and regular workspaces.

## Changes

The following endpoints now return an error if the target workspace is a
prebuild:

* `PUT /workspaces/{workspace}/extend`
* `PUT /workspaces/{workspace}/ttl`
* `PUT /workspaces/{workspace}/autostart`
* `PUT /workspaces/{workspace}/dormant`

Update endpoints logic to use the API clock in order to allow time
mocking in tests.

Related with: 
* Issue: https://github.com/coder/coder/issues/18898
* PR: https://github.com/coder/coder/pull/19252
2025-08-14 11:30:19 +01:00
Susana Ferreira 8567ecbe52 fix: set prebuilds lifecycle parameters on creation and claim (#19252)
## Description

This PR ensures that prebuilt workspaces are properly excluded from the
lifecycle executor and treated as a separate class of workspaces, fully
managed by the prebuild reconciliation loop.

It introduces two lifecycle guarantees:
* When a prebuilt workspace is created (i.e., when the workspace build
completes), all lifecycle-related fields are unset, ensuring the
workspace does not participate in TTL, autostop, autostart, dormancy, or
auto-deletion logic.
* When a prebuilt workspace is claimed, it transitions into a regular
user workspace. At this point, all lifecycle fields are correctly
populated according to template-level configurations, allowing the
workspace to be managed by the lifecycle executor as expected.

## Changes

* Prebuilt workspaces now have all lifecycle-relevant fields unset
during creation
* When a prebuild is claimed:
* Lifecycle fields are set based on template and workspace level
configurations. This ensures a clean transition into the standard
workspace lifecycle flow.
* Updated lifecycle-related SQL update queries to explicitly exclude
prebuilt workspaces.

## Relates 

Related issue: https://github.com/coder/coder/issues/18898

To reduce the scope of this PR and make the review process more
manageable, the original implementation has been split into the
following focused PRs:
* https://github.com/coder/coder/pull/19259
* https://github.com/coder/coder/pull/19263
* https://github.com/coder/coder/pull/19264
* https://github.com/coder/coder/pull/19265

These PRs should be considered in conjunction with this one to
understand the complete set of lifecycle separation changes for prebuilt
workspaces.
2025-08-13 12:45:46 +01:00
ケイラ 7bb52e1f8a test: add tests for updating workspace acl (#19240) 2025-08-07 17:09:46 -06:00
Dean Sheather 9a6dd73f68 feat: add managed agent license limit checks (#18937)
- Adds a query for counting managed agent workspace builds between two
timestamps
- The "Actual" field in the feature entitlement for managed agents is
now populated with the value read from the database
- The wsbuilder package now validates AI agent usage against the limit
when a license is installed

Closes coder/internal#777
2025-07-22 13:39:26 +10:00
Steven Masley aedc019b4e feat: include template variables in dynamic parameter rendering (#18819)
Closes https://github.com/coder/coder/issues/18671

Template variables now loaded into dynamic parameters.
2025-07-21 13:02:31 -05:00
Steven Masley 1319ae293f chore: support zip filetypes in the file cache (#18750) 2025-07-08 15:46:39 -06:00
Susana Ferreira 211393a69c fix: exclude prebuilt workspaces from lifecycle executor (#18762)
## Description

This PR updates the lifecycle executor to explicitly exclude prebuilt
workspaces from being considered for lifecycle operations such as
`autostart`, `autostop`, `dormancy`, `default TTL` and `failure TTL`.

Prebuilt workspaces (i.e., those owned by the prebuild system user) are
handled separately by the prebuild reconciliation loop. Including them
in the lifecycle executor could lead to unintended behavior such as
incorrect scheduling or state transitions.

## Changes

* Updated the lifecycle executor query
`GetWorkspacesEligibleForTransition` to exclude workspaces with
`owner_id = 'c42fdf75-3097-471c-8c33-fb52454d81c0'` (prebuilds).
* Added tests to verify prebuilt workspaces are not considered in:
  * Autostop
  * Autostart
  * Default TTL
  * Dormancy
  * Failure TTL

Fixes: https://github.com/coder/coder/issues/18740
Related to: https://github.com/coder/coder/issues/18658
2025-07-08 11:35:28 +01:00
Sas Swart c6e0ba12d3 feat: graduate prebuilds to general availability (#18607)
This PR removes the prebuilds experiment and allows the use of prebuilds
without opting into an experiment.
2025-06-26 15:54:52 +02:00
Steven Masley 341b54e604 fix: allow dynamic parameters without requiring org membership (#18531) 2025-06-24 10:33:10 -05:00