coder

mirror of https://github.com/coder/coder.git synced 2026-06-03 21:18:24 +00:00

Author	SHA1	Message	Date
Steven Masley	19573e8aee	feat!: patchTemplateMeta to use optional fields (#24984 ) Closes https://github.com/coder/coder/issues/13112 Breaking Change: Removed status code `StatusNotModified` when no diffs occur in a patch. Now the patch is always applied and a template is always returned.	2026-05-11 12:43:52 -05:00
Cian Johnston	d7439a9de0	feat: add Prometheus metrics for chatd subsystem (#24371 ) Adds 7 Prometheus metrics to the chatd subsystem and introduces typed `ActivityBumpReason` for deadline bump attribution. \| Metric \| Type \| Labels \| \|--------\|------\|--------\| \| `coderd_chatd_chats` \| Gauge \| `state` (streaming, waiting) \| \| `coderd_chatd_message_count` \| Histogram \| `provider` \| \| `coderd_chatd_prompt_size_bytes` \| Histogram \| `provider` \| \| `coderd_chatd_tool_result_size_bytes` \| Histogram \| `provider`, `tool_name` \| \| `coderd_chatd_ttft_seconds` \| Histogram \| `provider` \| \| `coderd_chatd_compaction_total` \| Counter \| `provider`, `result` \| \| `coderd_chatd_steps_total` \| Counter \| `provider` \| > 🤖	2026-04-15 19:53:10 +01:00
Steven Masley	84de391f26	chore: add tallyman events for ai seat tracking (#22689 ) AI seat tracking inserted as heartbeat into usage table.	2026-03-18 09:30:22 -05:00
George K	e5c19d0af4	feat: backend support for creating and storing service accounts (#22698 ) Add is_service_account column to users table with CHECK constraints enforcing login_type='none' and empty email for service accounts. Update user creation API to validate service account constraints. Related to: https://linear.app/codercom/issue/PLAT-27/feat-backend-support-for-creating-and-storing-service-accounts	2026-03-11 10:19:08 -07:00
Cian Johnston	81468323e0	fix(coderd): use dbtime.Now() instead of time.Now() in test assertions against DB timestamps (#22685 ) `time.Now()` has nanosecond precision while Postgres timestamps are microsecond precision. When tests compare `time.Now()` against DB-sourced timestamps using `Before`/`After`/`WithinRange`/etc., there is a non-zero flake risk from the precision mismatch. This replaces `time.Now()` with `dbtime.Now()` (which rounds to microsecond precision) in all test assertions that compare against database timestamps. Follows from #22684. ## Changes (11 files) \| File \| Changes \| \|---\|---\| \| `coderd/apikey_test.go` \| 11 comparisons with `ExpiresAt` \| \| `coderd/users_test.go` \| 2 comparisons with `ExpiresAt` \| \| `coderd/oauth2_test.go` \| 1 comparison with `token.Expiry` \| \| `coderd/workspaces_test.go` \| 2 comparisons with `DormantAt` \| \| `coderd/workspaceagents_test.go` \| 3 comparisons with `ConnectedAt`/`DisconnectedAt` \| \| `coderd/workspaceapps/db_test.go` \| 1 comparison with `token.Expiry` \| \| `coderd/provisionerdserver/provisionerdserver_test.go` \| 1 comparison with `key.ExpiresAt` \| \| `enterprise/coderd/workspaces_test.go` \| 1 comparison with `DormantAt` \| \| `enterprise/coderd/license/license_test.go` \| 3 `NotBefore` values \| \| `enterprise/coderd/licenses_test.go` \| 2 `NotBefore` values \| \| `enterprise/coderd/users_test.go` \| 3 `Next()` comparisons \| ## Not changed (intentionally) - `scaletest/placebo/run_test.go` — compares wall-clock elapsed time, not DB timestamps - `cli/server_test.go`, `coderd/jwtutils/jwt_test.go`, `enterprise/aibridgeproxyd/aibridgeproxyd_test.go` — TLS cert fields, not DB-stored - `coderd/azureidentity/azureidentity_test.go` — Azure cert expiry, not DB 🤖 Generated by Claude Opus 4.6 but reviewed manually.	2026-03-06 09:14:11 +00:00
Kyle Carberry	219d02bdc3	fix(coderd): poll for metrics in TestWorkspaceProvisionerdServerMetrics (#22644 ) ## Problem `TestWorkspaceProvisionerdServerMetrics` flakes because metric assertions run immediately after `AwaitWorkspaceBuildJobCompleted` returns, but metrics are updated asynchronously after the DB transaction commits in `completeWorkspaceBuildJob`. The timeline in the provisioner server: 1. DB transaction commits (`provisionerdserver.go:~2362`) — job marked completed 2. Audit logging, notifications, DB queries (`~2370-2427`) 3. Metric `.Observe()` (`~2463`) — happens ~100 lines later The test synchronization (`AwaitWorkspaceBuildJobCompleted`) polls for `CompletedAt != nil`, which fires at step 1. The metric assertion then executes before step 3, causing the flake. ## Fix Wrap all three metric assertions (prebuild creation, prebuild claim, regular workspace creation) in `require.Eventually` to poll until the metric appears, then assert on the value. ## Test - `go test -run TestWorkspaceProvisionerdServerMetrics -count=5` — all pass - `go test -race -run TestWorkspaceProvisionerdServerMetrics -count=1` — clean	2026-03-04 22:30:36 -05:00
Sushant P	37a8e61ea2	chore: move Shared Workspaces from experiments to beta (#22206 ) * Removed the shared-workspaces experiment and cleaned up related middleware * Added beta tagging to the UI for shared workspaces	2026-02-23 08:30:32 -08:00
Jake Howell	051ed34580	feat: convert `soft_limit` to `limit` (#22048 ) In relation to [`internal#1281`](https://github.com/coder/internal/issues/1281) Remove the `soft_limit` field from the `Feature` type and simplify license limit handling. This change: - Removes the `soft_limit` field from the API and SDK - Uses the soft limit value as the single `limit` value in the UI and API - Simplifies warning logic to only show warnings when the limit is exceeded - Updates tests to reflect the new behavior - Updates the UI to use the single limit value for display	2026-02-20 16:09:12 +11:00
Danielle Maywood	92a6d6c2c0	chore: remove unnecessary loop variable captures (#22180 ) Since Go 1.22, the loop variable capture issue is resolved. Variables declared by for loops are now per-iteration rather than per-loop, making the 'v := v' pattern unnecessary.	2026-02-19 09:02:19 +00:00
Callum Styan	5f3be6b288	feat: add provisioner job queue wait time histogram and jobs enqueued counter (#21869 ) This PR adds some metrics to help identify job enqueue rates and latencies. This work was initiated as a way to help reduce the cost of the observation/measurement itself for autostart scaletests, which impacts our ability to identify/reason about the load caused by autostart. See: https://github.com/coder/internal/issues/1209 I've extended the metrics here to account for regular user initiated builds, prebuilds, autostarts, etc. IMO there is still the question here of whether we want to include or need the `transition` label, which is only present on workspace builds. Including it does lead to an increase in cardinality, and in the case of the histogram (when not using native histograms) that's at least a few extra series for every bucket. We could remove the transition label there but keep it on the counter. Additionally, the histogram is currently observing latencies for other jobs, such as template builds/version imports, those do not have a transition type associated with them. Tested briefly in a workspace, can see metric values like the following: - `coderd_workspace_builds_enqueued_total{build_reason="autostart",provisioner_type="terraform",status="success",transition="start"} 1` - `coderd_provisioner_job_queue_wait_seconds_bucket{build_reason="autostart",job_type="workspace_build",provisioner_type="terraform",transition="start",le="0.025"} 1` --------- Signed-off-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-12 13:40:47 -08:00
Jon Ayers	3c1db17361	fix: use existing transaction to claim prebuild (#21862 ) - Claiming a prebuild was happening outside a transaction	2026-02-02 17:57:59 -06:00
Ethan	a464ab67c6	test: use explicit names in TestStartAutoUpdate to prevent flake (#21745 ) The test was creating two template versions without explicit names, relying on `namesgenerator.NameDigitWith()` which can produce collisions. When both versions got the same random name, the test failed with a 409 Conflict error. Fix by giving each version an explicit name (`v1`, `v2`). Closes https://github.com/coder/internal/issues/1309 --- Generated by [mux](https://mux.coder.com)	2026-01-30 13:24:06 +11:00
Steven Masley	799b190dee	fix: do not enforce managed agent limit for non-task workspaces (#21689 ) Only task workspaces have the checks in wsbuilder for violating the managed agent caps in the license. Stopped tasks that are resumed with a regular workspace start still count as usage.	2026-01-27 19:01:17 -06:00
George K	d29a168785	fix(coderd/rbac): reinstate deployment-wide workspace.share permission for owner role (#21620 ) The removal of that permission from the role broke valid use cases (e.g. a site owner user creating a workspace owned by a system account and then trying to share it with another user). The bulk of the PR is made up of the rollbacks of the previously introduced test updates necessitated by the removal. Related to: https://github.com/coder/internal/issues/1285	2026-01-22 08:12:15 -08:00
Mathias Fredriksson	97e8a5b093	fix(coderd): allow agent auth during workspace shutdown (#21538 ) Agents were losing authentication during workspace shutdown, causing shutdown scripts to fail. The auth query required agents to belong to the latest build, but during shutdown a `stop` build becomes latest while the `start` build's agents are still running. Modified the auth query to allow `start` build agents to authenticate temporarily during `stop` execution. The query allows auth when: - Agent's `start` build job succeeded - Latest build is `stop` with `pending`/`running` job status - Builds are adjacent (`stop` is `build_number + 1`) - Template versions match Auth closes once `stop` completes. Renamed `GetWorkspaceAgentAndLatestBuildByAuthToken` to `GetAuthenticatedWorkspaceAgentAndBuildByAuthToken` since it returns the agent's build (not always latest) during shutdown. Closes coder/internal#1249 Fixes #19467	2026-01-21 13:18:43 +00:00
Susana Ferreira	6ef9670384	fix: limit concurrent database connections in prebuild reconciliation (#20908 ) ## Description This PR addresses database connection pool exhaustion during prebuilds reconciliation by introducing two changes: * `CanSkipReconciliation`: Filters out presets that don't need reconciliation before spawning goroutines. This ensures we only create goroutines for presets that will (_most likely_) perform database operations, avoiding unnecessary connection pool usage. * Dynamic `eg.SetLimit`: Limits concurrent goroutines based on the configured database connection pool size (`CODER_PG_CONN_MAX_OPEN / 2`). This replaces the previous hardcoded limit of 5, ensuring the reconciliation loop scales appropriately with the configured pool size while leaving capacity for other database operations. ## Changes * Add `CanSkipReconciliation()` method to `PresetSnapshot` that returns true for inactive presets with no running workspaces, no pending jobs, or expired prebuilds. * Add `maxDBConnections` parameter to `NewStoreReconciler` and compute `reconciliationConcurrency` as half the pool size (minimum 1). * Add `ReconciliationConcurrency()` getter method to `StoreReconciler`. * Add `eg.SetLimit(c.reconciliationConcurrency)` to bound concurrent reconciliation goroutines. * Add `PresetsTotal` and `PresetsReconciled` to `ReconcileStats` for observability. * Add `TestCanSkipReconciliation` unit tests. * Add `TestReconciliationConcurrency` unit tests. * Add benchmark tests for reconciliation performance. ## Benchmarks * `BenchmarkReconcileAll_NoOps`: Tests presets with no reconciliation actions. All presets are filtered by `CanSkipReconciliation`, resulting in no goroutines spawned and no database connections used. * `BenchmarkReconcileAll_ConnectionContention`: Tests presets where all require reconciliation actions. All presets spawn goroutines, but concurrency is limited by `eg.SetLimit(reconciliationConcurrency)`. 
* `BenchmarkReconcileAll_Mix`: Simulates a realistic scenario with a large subset of inactive presets (filtered by `CanSkipReconciliation`) and a smaller subset requiring reconciliation (limited by `eg.SetLimit`). Closes: https://github.com/coder/coder/issues/20606	2026-01-21 10:56:31 +00:00
George K	cc2efe9e1f	feat(coderd/rbac): make organization-member a per-org system custom role (#21359 ) Migrated the built-in organization-member role to DB storage so it can be customized per org. Closes https://github.com/coder/internal/issues/1073 (part 1)	2026-01-12 18:19:19 -08:00
Spike Curtis	bddb808b25	chore: arrange imports in a standard way (#21452 ) Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example: ``` import ( "context" "time" "github.com/prometheus/client_golang/prometheus" "golang.org/x/xerrors" "gopkg.in/natefinch/lumberjack.v2" "cdr.dev/slog/v3" "github.com/coder/coder/v2/codersdk/agentsdk" "github.com/coder/serpent" ) ``` 3 groups: standard library, 3rd party libs, Coder libs. This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.	2026-01-08 15:24:11 +04:00
Spike Curtis	49b34a716a	fix: fix slog to always use array of Fields (#21426 ) Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptable call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder). It also updates dependencies that also use slog and were updated. I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule. Other dependencies, I pushed new tags.	2026-01-08 10:29:41 +04:00
Sas Swart	9a0024c45f	chore: add tracing to prebuilds (#21443 ) The implementation for prebuilt workspaces is complex and conversations regarding edge cases and bugs frequently get bogged down by minutiae, because it's hard to reason about the behaviour of the system. To alleviate this, I've introduced otel tracing to the StoreReconciler (see attached). We can now directly observe the behaviour of the prebuilds system under load in order to inform our decisions. Traces are terminated at the boundary between prebuilds and workspace builder, because of prebuilt workspaces' "fire and forget" philosophy and to prevent span explosion. <img width="3024" height="1718" alt="image" src="https://github.com/user-attachments/assets/f9b207be-8f2c-475e-98a8-46ef70bda446" />	2026-01-07 11:04:40 +02:00
Steven Masley	35f1c44455	test: fix path separator on windows for unit test (#21382 ) Test TestWorkspaceTemplateParamsChange writes a file to disk Closes https://github.com/coder/internal/issues/1213	2025-12-23 15:13:16 +00:00
Steven Masley	61d7d2983f	fix: remove state information from apply (#21373 ) Delete builds were not deleting resources as the tf state being sent in the apply request was empty. State removed from apply request and read from the session instead.	2025-12-22 16:18:53 +00:00
Steven Masley	3194bcfc9e	chore: distinct operations for provisioner's 'parse', 'init', 'plan', 'apply', 'graph' (#21064 ) Provisioner steps broken into smaller granular actions. Changes: - `ExtractArchive` moved to `init` request (was in `configure`) - Writing `tfstate` moved to `plan` (was in `configure`) - Moved most plan/apply outputs to `GraphComplete`	2025-12-15 11:26:41 -06:00
George K	103967ed02	feat: add sharing info to /workspaces endpoint (#21049 ) closes: https://github.com/coder/internal/issues/858 Similar to https://github.com/coder/coder/pull/19375, this one uses system permissions for fetching actual user and group data. Modifies the `workspaces_expanded` view to fetch the required data; this way it's made available to all code paths that make use of it. Also fixes a bug in a test helper function that can result in `null` being saved to the DB for `user_acl` or `group_acl` and break tests; a defensive check constraint that prevents this is worth a PR, e.g: `ALTER TABLE workspaces ADD CONSTRAINT group_acl_is_object CHECK (jsonb_typeof(group_acl) = 'object');` Also adds missing `OwnerName` in `ConvertWorkspaceRows`.	2025-12-15 08:42:08 -08:00
Danielle Maywood	c12303f0b2	fix: allow agents to be created on dormant workspaces (#20909 ) Closes https://github.com/coder/coder/issues/20711 We now allow agents to be created on dormant workspaces. I've run the test with and without the change. I've confirmed that - without the fix - it triggers the "rbac: unauthorized" error.	2025-11-25 06:24:33 +00:00
blinkagent[bot]	48b8e22502	fix: add Windows stub for CacheTFProviders (#20840 ) Fixes https://github.com/coder/internal/issues/1119 ## Description The `CacheTFProviders` function in `testutil/terraform_cache.go` was only available on Linux and macOS due to the `//go:build linux \|\| darwin` build tag. This caused a compile error on Windows when `enterprise/coderd/workspaces_test.go` tried to call it: ``` enterprise\coderd\workspaces_test.go:3403:28: undefined: testutil.CacheTFProviders ``` ## Changes 1. Added `testutil/terraform_cache_windows.go` with a Windows-specific stub implementation that returns an empty string 2. Updated `downloadProviders` helper in `enterprise/coderd/workspaces_test.go` to handle empty paths gracefully ## Behavior - On Linux/macOS: Terraform providers are cached as before - On Windows: Provider caching is skipped, tests download providers normally during `terraform init` ## Testing This should fix the Windows nightly gauntlet failure. The test will still run on Windows, just without provider caching optimization. Co-authored-by: blink-so[bot] <211532188+blink-so[bot]@users.noreply.github.com>	2025-11-20 07:52:07 +00:00
Kacper Sawicki	f543a87b78	chore: cache terraform providers for workspaces terraform tests (#20603 ) Fixes flaky `TestWorkspaceTagsTerraform` and `TestWorkspaceTemplateParamsChange` tests that were failing with `connection reset by peer` errors when downloading the coder/coder provider. This applies the same caching solution which was done in https://github.com/coder/coder/pull/17373 1. Extracts provider caching logic into `testutil/terraform_cache.go` 2. Updates TestProvision to use the shared caching helpers 3. Updates enterprise workspace tests to use the shared caching helpers The cache is persisted at `~/.cache/coderv2-test/` and automatically cached between CI runs via existing GitHub Actions cache setup. Closes https://github.com/coder/internal/issues/607	2025-11-12 08:43:22 +00:00
Hugo Dutka	e62c5db678	chore: remove references to dbtestutil.WillUsePostgres (#20436 ) Addresses https://github.com/coder/internal/issues/758. This PR only cleans up dead code, it makes no changes to test logic.	2025-10-23 14:24:54 +02:00
Brett Kolodny	38ca98745b	feat: add shared_with_group: and shared_with_user: filters to /workspaces endpoint (#19875 ) Adds shared_with_user and shared_with_group filters to the /workspaces endpoint. - `shared_with_user`: filters workspaces shared with a specific user. Accepts a user UUID or username. - `shared_with_group`: filters workspaces shared with a specific group. Accepts: - a group UUID, or - `<organization name>/<group name>`, or - `<group name>` (resolved in the default organization). Closes [coder/internal#1004](https://github.com/coder/internal/issues/1004)	2025-09-19 16:05:27 -04:00
Brett Kolodny	e6b04d1918	feat: add shared filter to workspaces query (#19807 ) Adds a `shared:<boolean>` search query to the `/workspaces [get]` endpoint https://github.com/user-attachments/assets/ccf84bd9-c1fd-4085-825b-2e3176a2d488 Closes [coder/internal#972](https://github.com/coder/internal/issues/972)	2025-09-16 12:37:39 -04:00
Ethan	995b330250	test: avoid sharing deployment values between subtests (#19833 ) Blink didn't figure out a CI failure on main was caused by a data race; fixing it. I've also updated the [blink prompt](https://gist.githubusercontent.com/ethanndickson/8dea9f1db3957ac1baf30ae8ce6f1a42/raw/060aea7fabb82bef0029a17dad9a5daee7940760/blink-flake-instructions.md). https://github.com/coder/coder/actions/runs/17737809615	2025-09-16 13:51:26 +10:00
Brett Kolodny	854f3c0187	feat: add workspaces/acl [delete] endpoint (#19772 ) Closes [coder/internal#971](https://github.com/coder/internal/issues/971)	2025-09-12 12:21:01 -04:00
Susana Ferreira	353f5dedc1	fix(coderd): fix logic for reporting prebuilt workspace duration metric (#19641 ) ## Description When creating a prebuilt workspace, both `flags.IsPrebuild` and `flags.IsFirstBuild` are true. Previously, the logic rejected cases with multiple flags, so `coderd_workspace_creation_duration_seconds` wasn’t updated for prebuilt creations. This is the only valid scenario where two flags can be true. ## Changes * Fix logic to update `coderd_workspace_creation_duration_seconds` metric for prebuilt workspaces. * Add prebuild helper functions to coderdenttest (other prebuild tests can reuse this). * Update workspace's provisionerdmetric tests to include this metric. Follow-up: https://github.com/coder/coder/pull/19503 Related to: https://github.com/coder/coder/issues/19528	2025-08-29 15:48:48 +01:00
Callum Styan	321c2b8fce	fix: fix flake in TestExecutorAutostartSkipsWhenNoProvisionersAvailable (#19478 ) The flake here had two causes: 1. related to usage of time.Now() in MustWaitForProvisionersAvailable and 2. the fact that UpdateProvisionerLastSeenAt can not use a time that is further in the past than the current LastSeenAt time Previously the test here was calling `coderdtest.MustWaitForProvisionersAvailable` which was using `time.Now` rather than the next tick time like the real `hasProvisionersAvailable` function does. Additionally, when using `UpdateProvisionerLastSeenAt` the underlying db query enforces that the time we're trying to set `LastSeenAt` to cannot be older than the current value. I was able to reliably reproduce the flake by executing both the `UpdateProvisionerLastSeenAt` call and `tickCh <- next` in their own goroutines, the former with a small sleep to reliably ensure we'd trigger the autobuild before we set the `LastSeenAt` time. That's when I also noticed that `coderdtest.MustWaitForProvisionersAvailable` was using `time.Now` instead of the tick time. When I updated that function to take in a tick time + added a 2nd call to `UpdateProvisionerLastSeenAt` to set an original non-stale time, we could then never get the test to pass because the later call to set the stale time would not actually modify `LastSeenAt`. On top of that, calling the provisioner daemons closer in the middle of the function doesn't really do anything of value in this test. The fix for the flake is to keep the go routines, ensuring there would be a flake if there was not a relevant fix, but to include the fix which is to ensure that we explicitly wait for the provisioner to be stale before passing the time to `tickCh`. --------- Signed-off-by: Callum Styan <callumstyan@gmail.com>	2025-08-28 12:07:50 -07:00
Susana Ferreira	0ab345ca84	feat: add prebuild timing metrics to Prometheus (#19503 ) ## Description This PR introduces one counter and two histograms related to workspace creation and claiming. The goal is to provide clearer observability into how workspaces are created (regular vs prebuild) and the time cost of those operations. ### `coderd_workspace_creation_total` * Metric type: Counter * Name: `coderd_workspace_creation_total` * Labels: `organization_name`, `template_name`, `preset_name` This counter tracks whether a regular workspace (not created from a prebuild pool) was created using a preset or not. Currently, we already expose `coderd_prebuilt_workspaces_claimed_total` for claimed prebuilt workspaces, but we lack a comparable metric for regular workspace creations. This metric fills that gap, making it possible to compare regular creations against claims. Implementation notes: * Exposed as a `coderd_` metric, consistent with other workspace-related metrics (e.g. `coderd_api_workspace_latest_build`: https://github.com/coder/coder/blob/main/coderd/prometheusmetrics/prometheusmetrics.go#L149). * Every `defaultRefreshRate` (1 minute ), DB query `GetRegularWorkspaceCreateMetrics` is executed to fetch all regular workspaces (not created from a prebuild pool). * The counter is updated with the total from all time (not just since metric introduction). This differs from the histograms below, which only accumulate from their introduction forward. 
### `coderd_workspace_creation_duration_seconds` & `coderd_prebuilt_workspace_claim_duration_seconds` * Metric types: Histogram * Names: * `coderd_workspace_creation_duration_seconds` * Labels: `organization_name`, `template_name`, `preset_name`, `type` (`regular`, `prebuild`) * `coderd_prebuilt_workspace_claim_duration_seconds` * Labels: `organization_name`, `template_name`, `preset_name` We already have `coderd_provisionerd_workspace_build_timings_seconds`, which tracks build run times for all workspace builds handled by the provisioner daemon. However, in the context of this issue, we are only interested in creation and claim build times, not all transitions; additionally, this metric does not include `preset_name`, and adding it there would significantly increase cardinality. Therefore, separate more focused metrics are introduced here: * `coderd_workspace_creation_duration_seconds`: Build time to create a workspace (either a regular workspace or the build into a prebuild pool, for prebuild initial provisioning build). * `coderd_prebuilt_workspace_claim_duration_seconds`: Time to claim a prebuilt workspace from the pool. The reason for two separate histograms is that: * Creation (regular or prebuild): provisioning builds with similar time magnitude, generally expected to take longer than a claim operation. * Claim: expected to be a much faster provisioning build. #### Native histogram usage Provisioning times vary widely between projects. Using static buckets risks unbalanced or poorly informative histograms. 
To address this, these metrics use [Prometheus native histograms](https://prometheus.io/docs/specs/native_histograms/): * First introduced in Prometheus v2.40.0 * Recommended stable usage from v2.45+ * Requires Go client `prometheus/client_golang` v1.15.0+ * Experimental and must be explicitly enabled on the server (`--enable-feature=native-histograms`) For compatibility, we also retain a classic bucket definition (aligned with the existing provisioner metric: https://github.com/coder/coder/blob/main/provisionerd/provisionerd.go#L182-L189). * If native histograms are enabled, Prometheus ingests the high-resolution histogram. * If not, it falls back to the predefined buckets. Implementation notes: * Unlike the counter, these histograms are updated in real-time at workspace build job completion. * They reflect data only from the point of introduction forward (no historical backfill). ## Relates to Closes: https://github.com/coder/coder/issues/19528 Native histograms tested in observability stack: https://github.com/coder/observability/pull/50	2025-08-28 15:00:26 +01:00
ケイラ	d7ee1019c0	feat: add endpoint for retrieving workspace acl (#19375 ) Implements `/acl [get]` for workspaces, with tests. Blocked by experiment enablement	2025-08-25 07:11:18 -05:00
Dean Sheather	6eb02d1c2a	chore: wire up usage tracking for managed agents (#19096 ) Wires up the usage collector and publisher to coderd. Relates to coder/internal#814	2025-08-20 23:38:09 +10:00
Susana Ferreira	560cf84251	fix: prevent activity bump for prebuilt workspaces (#19263 ) ## Description This PR ensures that activity-based deadline extensions ("activity bumping") are not applied to prebuilt workspaces. Prebuilds are managed by the reconciliation loop and must not have `deadline` or `max_deadline` values set or extended, as they are not part of the regular lifecycle executor path. ## Changes - Update `ActivityBumpWorkspace` SQL query to discard prebuilt workspaces - Update application layer to avoid calling activity bump logic on prebuilt workspaces Related with: * Issue: https://github.com/coder/coder/issues/18898 * PR: https://github.com/coder/coder/pull/19252	2025-08-20 12:19:14 +01:00
Callum Styan	f2ee89c36a	fix: fix more `TestWorkspaceAutobuild` flakes (#19417 ) made these commits yesterday but apparently I forgot to push so they got missed in https://github.com/coder/coder/pull/19398 --------- Signed-off-by: Callum Styan <callumstyan@gmail.com>	2025-08-19 10:24:40 -07:00
Callum Styan	9e5c83ae0d	fix: fix flakes in TestWorkspaceAutobuild due to incorrect tick time (#19398 ) we missed these in the previous PR, we find `tickTime2` and pass it to the `tickCh`, but we were incorrectly passing `tickTime` to `UpdateProvisionerLastSeenAt` in some places Signed-off-by: Callum Styan <callumstyan@gmail.com>	2025-08-19 09:24:40 -07:00
Callum Styan	6c902a7410	fix: don't create autostart workspace builds with no available provisioners (#19067 ) This should fix https://github.com/coder/coder/issues/17941 by introducing a check for whether there are any valid (non-stale) provisioners for a job in the autobuild executor code path. --------- Signed-off-by: Callum Styan <callumstyan@gmail.com>	2025-08-15 08:50:51 -07:00
Susana Ferreira	734299de71	fix: disallow lifecycle endpoints for prebuilt workspaces (#19264 ) ## Description This PR updates the API to prevent lifecycle configuration endpoints from being used on prebuilt workspaces. Since prebuilds are managed by the reconciliation loop and do not participate in the regular workspace lifecycle, they must not support per-workspace overrides for fields like deadline, TTL, autostart, or dormancy. Attempting to use these endpoints on a prebuilt workspace will now return a clear validation error (`409 Conflict`) with an appropriate explanation. This prevents accidental misconfiguration and preserves the lifecycle separation between prebuilds and regular workspaces. ## Changes The following endpoints now return an error if the target workspace is a prebuild: * `PUT /workspaces/{workspace}/extend` * `PUT /workspaces/{workspace}/ttl` * `PUT /workspaces/{workspace}/autostart` * `PUT /workspaces/{workspace}/dormant` Update endpoints logic to use the API clock in order to allow time mocking in tests. Related with: * Issue: https://github.com/coder/coder/issues/18898 * PR: https://github.com/coder/coder/pull/19252	2025-08-14 11:30:19 +01:00
Susana Ferreira	8567ecbe52	fix: set prebuilds lifecycle parameters on creation and claim (#19252 ) ## Description This PR ensures that prebuilt workspaces are properly excluded from the lifecycle executor and treated as a separate class of workspaces, fully managed by the prebuild reconciliation loop. It introduces two lifecycle guarantees: * When a prebuilt workspace is created (i.e., when the workspace build completes), all lifecycle-related fields are unset, ensuring the workspace does not participate in TTL, autostop, autostart, dormancy, or auto-deletion logic. * When a prebuilt workspace is claimed, it transitions into a regular user workspace. At this point, all lifecycle fields are correctly populated according to template-level configurations, allowing the workspace to be managed by the lifecycle executor as expected. ## Changes * Prebuilt workspaces now have all lifecycle-relevant fields unset during creation * When a prebuild is claimed: * Lifecycle fields are set based on template and workspace level configurations. This ensures a clean transition into the standard workspace lifecycle flow. * Updated lifecycle-related SQL update queries to explicitly exclude prebuilt workspaces. ## Relates Related issue: https://github.com/coder/coder/issues/18898 To reduce the scope of this PR and make the review process more manageable, the original implementation has been split into the following focused PRs: * https://github.com/coder/coder/pull/19259 * https://github.com/coder/coder/pull/19263 * https://github.com/coder/coder/pull/19264 * https://github.com/coder/coder/pull/19265 These PRs should be considered in conjunction with this one to understand the complete set of lifecycle separation changes for prebuilt workspaces.	2025-08-13 12:45:46 +01:00
ケイラ	7bb52e1f8a	test: add tests for updating workspace acl (#19240 )	2025-08-07 17:09:46 -06:00
Dean Sheather	9a6dd73f68	feat: add managed agent license limit checks (#18937 ) - Adds a query for counting managed agent workspace builds between two timestamps - The "Actual" field in the feature entitlement for managed agents is now populated with the value read from the database - The wsbuilder package now validates AI agent usage against the limit when a license is installed Closes coder/internal#777	2025-07-22 13:39:26 +10:00
Steven Masley	aedc019b4e	feat: include template variables in dynamic parameter rendering (#18819 ) Closes https://github.com/coder/coder/issues/18671 Template variables now loaded into dynamic parameters.	2025-07-21 13:02:31 -05:00
Steven Masley	1319ae293f	chore: support zip filetypes in the file cache (#18750 )	2025-07-08 15:46:39 -06:00
Susana Ferreira	211393a69c	fix: exclude prebuilt workspaces from lifecycle executor (#18762 ) ## Description This PR updates the lifecycle executor to explicitly exclude prebuilt workspaces from being considered for lifecycle operations such as `autostart`, `autostop`, `dormancy`, `default TTL` and `failure TTL`. Prebuilt workspaces (i.e., those owned by the prebuild system user) are handled separately by the prebuild reconciliation loop. Including them in the lifecycle executor could lead to unintended behavior such as incorrect scheduling or state transitions. ## Changes * Updated the lifecycle executor query `GetWorkspacesEligibleForTransition` to exclude workspaces with `owner_id = 'c42fdf75-3097-471c-8c33-fb52454d81c0'` (prebuilds). * Added tests to verify prebuilt workspaces are not considered in: * Autostop * Autostart * Default TTL * Dormancy * Failure TTL Fixes: https://github.com/coder/coder/issues/18740 Related to: https://github.com/coder/coder/issues/18658	2025-07-08 11:35:28 +01:00
Sas Swart	c6e0ba12d3	feat: graduate prebuilds to general availability (#18607 ) This PR removes the prebuilds experiment and allows the use of prebuilds without opting into an experiment.	2025-06-26 15:54:52 +02:00
Steven Masley	341b54e604	fix: allow dynamic parameters without requiring org membership (#18531 )	2025-06-24 10:33:10 -05:00


113 Commits
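
The #20908 entry in the list above describes bounding reconciliation goroutines at half the configured connection pool size, with a minimum of 1. The sketch below illustrates that rule under stated assumptions: `concurrencyLimit` and `runBounded` are hypothetical names, and a buffered-channel semaphore from the standard library stands in for the `eg.SetLimit` call the commit message mentions.

```go
package main

import (
	"fmt"
	"sync"
)

// concurrencyLimit returns half the pool size, never less than 1.
// Hypothetical helper; it mirrors only the rule stated in the commit message.
func concurrencyLimit(maxOpenConns int) int {
	limit := maxOpenConns / 2
	if limit < 1 {
		return 1
	}
	return limit
}

// runBounded runs every task in its own goroutine while allowing at most
// limit of them to be in flight. It returns the peak concurrency observed.
func runBounded(limit int, tasks []func()) int {
	sem := make(chan struct{}, limit) // semaphore, swapped in for errgroup's SetLimit
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		running int
		peak    int
	)
	for _, task := range tasks {
		sem <- struct{}{} // blocks while limit tasks are already running
		wg.Add(1)
		go func(task func()) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when the task ends

			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			task()

			mu.Lock()
			running--
			mu.Unlock()
		}(task)
	}
	wg.Wait()
	return peak
}

func main() {
	limit := concurrencyLimit(10)
	tasks := make([]func(), 20)
	for i := range tasks {
		tasks[i] = func() {} // placeholder for one preset's reconciliation
	}
	peak := runBounded(limit, tasks)
	fmt.Println(limit, peak <= limit)
}
```

Keeping the limit below the pool size leaves connections free for other database work, which is the motivation the commit message gives.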