Commit Graph

21 Commits

Author SHA1 Message Date
Marcin Tojek 65ef6df1df docs: add documentation for preset invalidation (#21018)
Fixes #17917
2025-12-03 11:43:49 +01:00
Susana Ferreira 14e80022c9 fix(docs): fix 'prebuilds' system user typo (#20356)
## Description

Fix typo on documentation regarding system user `prebuilds`.
2025-10-17 10:34:55 +01:00
Atif Ali ef51e7d07a chore(docs): update numbered lists to be consistent (#20350) 2025-10-16 20:11:18 +00:00
Susana Ferreira 104aa19014 chore(docs): improve prebuild provsioners section (#20321)
## Description

Follow-up from: https://github.com/coder/coder/pull/20305 to include a
note about `coder_workspace_tags` being cumulative and a new step to
validate the status of the prebuild provisioners.
Fix steps formatting.
2025-10-16 11:22:48 +01:00
Susana Ferreira 09e2daf282 chore(docs): add external provisioner configuration for prebuilds (#20305)
## Description

Update the Prebuilds troubleshooting page to include a new section,
“Preventing prebuild queue contention (recommended)”, outlining a
best-practice configuration to prevent prebuild jobs from overwhelming
the provisioner queue.

This setup introduces a dedicated prebuild provisioner pool and has been
successfully tested internally in dogfood:
https://github.com/coder/dogfood/pull/201

Closes: https://github.com/coder/coder/issues/20241
2025-10-15 15:34:21 +01:00
Sas Swart 06db58771f docs: add troubleshooting steps for prebuilt workspaces (#20231)
This PR adds troubleshooting steps to guide Coder operators when they
suspect that prebuilds might have overwhelmed their deployments.

Closes https://github.com/coder/coder/issues/19490

---------

Co-authored-by: Susana Ferreira <susana@coder.com>
2025-10-14 13:20:43 +02:00
Susana Ferreira 0ab345ca84 feat: add prebuild timing metrics to Prometheus (#19503)
## Description

This PR introduces one counter and two histograms related to workspace
creation and claiming. The goal is to provide clearer observability into
how workspaces are created (regular vs prebuild) and the time cost of
those operations.

### `coderd_workspace_creation_total`

* Metric type: Counter
* Name: `coderd_workspace_creation_total`
* Labels: `organization_name`, `template_name`, `preset_name`

This counter tracks whether a regular workspace (not created from a
prebuild pool) was created using a preset or not.
Currently, we already expose `coderd_prebuilt_workspaces_claimed_total`
for claimed prebuilt workspaces, but we lack a comparable metric for
regular workspace creations. This metric fills that gap, making it
possible to compare regular creations against claims.

Implementation notes:
* Exposed as a `coderd_` metric, consistent with other workspace-related
metrics (e.g. `coderd_api_workspace_latest_build`:
https://github.com/coder/coder/blob/main/coderd/prometheusmetrics/prometheusmetrics.go#L149).
* Every `defaultRefreshRate` (1 minute ), DB query
`GetRegularWorkspaceCreateMetrics` is executed to fetch all regular
workspaces (not created from a prebuild pool).
* The counter is updated with the total from all time (not just since
metric introduction). This differs from the histograms below, which only
accumulate from their introduction forward.

### `coderd_workspace_creation_duration_seconds` &
`coderd_prebuilt_workspace_claim_duration_seconds`

* Metric types: Histogram
* Names:
  * `coderd_workspace_creation_duration_seconds`
* Labels: `organization_name`, `template_name`, `preset_name`, `type`
(`regular`, `prebuild`)
  * `coderd_prebuilt_workspace_claim_duration_seconds`
    * Labels: `organization_name`, `template_name`, `preset_name`

We already have `coderd_provisionerd_workspace_build_timings_seconds`,
which tracks build run times for all workspace builds handled by the
provisioner daemon.
However, in the context of this issue, we are only interested in
creation and claim build times, not all transitions; additionally, this
metric does not include `preset_name`, and adding it there would
significantly increase cardinality. Therefore, separate more focused
metrics are introduced here:
* `coderd_workspace_creation_duration_seconds`: Build time to create a
workspace (either a regular workspace or the build into a prebuild pool,
for prebuild initial provisioning build).
* `coderd_prebuilt_workspace_claim_duration_seconds`: Time to claim a
prebuilt workspace from the pool.

The reason for two separate histograms is that:
* Creation (regular or prebuild): provisioning builds with similar time
magnitude, generally expected to take longer than a claim operation.
* Claim: expected to be a much faster provisioning build.

#### Native histogram usage

Provisioning times vary widely between projects. Using static buckets
risks unbalanced or poorly informative histograms.
To address this, these metrics use [Prometheus native
histograms](https://prometheus.io/docs/specs/native_histograms/):
* First introduced in Prometheus v2.40.0
* Recommended stable usage from v2.45+
* Requires Go client `prometheus/client_golang` v1.15.0+
* Experimental and must be explicitly enabled on the server
(`--enable-feature=native-histograms`)

For compatibility, we also retain a classic bucket definition (aligned
with the existing provisioner metric:
https://github.com/coder/coder/blob/main/provisionerd/provisionerd.go#L182-L189).
* If native histograms are enabled, Prometheus ingests the
high-resolution histogram.
* If not, it falls back to the predefined buckets.

Implementation notes:
* Unlike the counter, these histograms are updated in real-time at
workspace build job completion.
* They reflect data only from the point of introduction forward (no
historical backfill).

## Relates to 

Closes: https://github.com/coder/coder/issues/19528
Native histograms tested in observability stack:
https://github.com/coder/observability/pull/50
2025-08-28 15:00:26 +01:00
Sas Swart 4e9ee80882 feat(enterprise/coderd): allow system users to be added to groups (#19518)
closes https://github.com/coder/coder/issues/18274

This pull request makes system users visible in various group related
queries so that they can be added to and removed from groups. This
allows system user quotas to be configured. System users are still
ignored in certain queries, such as when license seat consumption is
determined.

This pull request further ensures the existence of a
"coder_prebuilt_workspaces" group in any organization that needs
prebuilt workspaces

---------

Co-authored-by: Susana Ferreira <susana@coder.com>
2025-08-27 16:57:59 +02:00
Edward Angert 49f32d14eb docs: add dev containers and scheduling to prebuilt workspaces known issues (#18816)
closes #18806 

- [x] scheduling limitation
- [x] dev containers limitation
- [x] edit intro


[preview](https://coder.com/docs/@18806-prebuilds-known-limits/admin/templates/extending-templates/prebuilt-workspaces)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Documentation**
* Clarified the introduction and administrator responsibilities for
prebuilt workspaces.
* Integrated compatibility information about DevContainers and workspace
scheduling more contextually.
* Added explicit notes on limitations with dev containers integration
and workspace autostart/autostop features.
* Improved configuration examples and clarified scheduling instructions.
* Enhanced explanations of scheduling behavior and lifecycle steps for
better understanding.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: EdwardAngert <17991901+EdwardAngert@users.noreply.github.com>
Co-authored-by: Sas Swart <sas.swart.cdk@gmail.com>
Co-authored-by: Susana Ferreira <susana@coder.com>
2025-08-22 15:54:28 +02:00
Sas Swart f9a6adc704 feat: claim prebuilds based on workspace parameters instead of preset id (#19279)
Closes https://github.com/coder/coder/issues/18356.

This change finds and selects a matching preset if one was not chosen
during workspace creation. This solidifies the relationship between
presets and parameters.

When a workspace is created without in explicitly chosen preset, it will
now still be eligible to claim a prebuilt workspace if one is available.
2025-08-20 11:02:53 +02:00
Cian Johnston afb54f6884 chore: revert feat(enterprise/coderd): allow system users to be added to groups (#19254)
This reverts commit b200fc8e67
(https://github.com/coder/coder/pull/18341).
2025-08-08 12:18:07 +01:00
Sas Swart b200fc8e67 feat(enterprise/coderd): allow system users to be added to groups (#18341)
closes https://github.com/coder/coder/issues/18274

This pull request makes system users visible in various group related
queries so that they can be added to and removed from groups. This
allows system user quotas to be configured. System users are still
ignored in certain queries, such as when license seat consumption is
determined.

This pull request further ensures the existence of a
"coder_prebuilt_workspaces" group in any organization that needs
prebuilt workspaces

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
  * Organization and group member listings now include system users.
* **Bug Fixes**
* Updated tests to reflect the inclusion of system users in member and
group queries.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-08-08 11:03:17 +02:00
Susana Ferreira 211393a69c fix: exclude prebuilt workspaces from lifecycle executor (#18762)
## Description

This PR updates the lifecycle executor to explicitly exclude prebuilt
workspaces from being considered for lifecycle operations such as
`autostart`, `autostop`, `dormancy`, `default TTL` and `failure TTL`.

Prebuilt workspaces (i.e., those owned by the prebuild system user) are
handled separately by the prebuild reconciliation loop. Including them
in the lifecycle executor could lead to unintended behavior such as
incorrect scheduling or state transitions.

## Changes

* Updated the lifecycle executor query
`GetWorkspacesEligibleForTransition` to exclude workspaces with
`owner_id = 'c42fdf75-3097-471c-8c33-fb52454d81c0'` (prebuilds).
* Added tests to verify prebuilt workspaces are not considered in:
  * Autostop
  * Autostart
  * Default TTL
  * Dormancy
  * Failure TTL

Fixes: https://github.com/coder/coder/issues/18740
Related to: https://github.com/coder/coder/issues/18658
2025-07-08 11:35:28 +01:00
Susana Ferreira 57a6d59d8d docs: add warning about prebuilds incompatibility with certain features (#18689)
## Description

This PR adds a warning to the prebuilds documentation about
incompatibility with Workspace schedule (autostart/autostop), dormancy,
and DevContainers. These configurations can interfere with prebuild
behavior and should be avoided for now.

Preview:
![Screenshot 2025-07-01 at 12 58
02](https://github.com/user-attachments/assets/e1a837de-b66c-4414-bd0b-471474b43b84)
2025-07-01 13:59:07 +01:00
Sas Swart c6e0ba12d3 feat: graduate prebuilds to general availability (#18607)
This PR removes the prebuilds experiment and allows the use of prebuilds
without opting into an experiment.
2025-06-26 15:54:52 +02:00
Yevhenii Shcherbina 8e3022ed9e docs: add documentation for prebuild scheduling feature (#18462)
Follow-up to https://github.com/coder/coder/pull/18126

Changes:
- address issue mentioned here:
https://github.com/coder/coder/pull/18126#discussion_r2144557600
- add docs for prebuilds scheduling

---------

Co-authored-by: Danny Kopping <danny@coder.com>
Co-authored-by: Atif Ali <atif@coder.com>
2025-06-20 10:08:47 -04:00
Susana Ferreira 6f6e73af03 feat: implement expiration policy logic for prebuilds (#17996)
## Summary 

This PR introduces support for expiration policies in prebuilds. The TTL
(time-to-live) is retrieved from the Terraform configuration
([terraform-provider-coder
PR](https://github.com/coder/terraform-provider-coder/pull/404)):
```
prebuilds = {
	  instances = 2
	  expiration_policy {
		  ttl = 86400
	  }
  }
```
**Note**: Since there is no need for precise TTL enforcement down to the
second, in this implementation expired prebuilds are handled in a single
reconciliation cycle: they are deleted, and new instances are created
only if needed to match the desired count.

## Changes

* The outcome of a reconciliation cycle is now expressed as a slice of
reconciliation actions, instead of a single aggregated action.
* Adjusted reconciliation logic to delete expired prebuilds and
guarantee that the number of desired instances is correct.
* Updated relevant data structures and methods to support expiration
policies parameters.
* Added documentation to `Prebuilt workspaces` page
* Update `terraform-provider-coder` to version 2.5.0:
https://github.com/coder/terraform-provider-coder/releases/tag/v2.5.0

Depends on: https://github.com/coder/terraform-provider-coder/pull/404
Fixes: https://github.com/coder/coder/issues/17916
2025-05-26 20:31:24 +01:00
Danny Kopping d2d21898f2 chore: reduce ignore_changes suggestion scope (#17947)
We probably shouldn't be suggesting `ignore_changes = all`. Only the
attributes which cause drift in prebuilds should be ignored; everything
else can behave as normal.

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Edward Angert <EdwardAngert@users.noreply.github.com>
2025-05-20 22:16:23 +02:00
Danny Kopping 8914f7a95b chore: improve prebuilds docs (#17850)
These items came up in an internal "bug bash" session yesterday.

@EdwardAngert note: I've reverted to the "transparent" phrasing; the
current docs confused a couple folks yesterday, and I feel that
"transparent" is clearly understood in this context.

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Edward Angert <EdwardAngert@users.noreply.github.com>
2025-05-16 19:25:09 +02:00
Danny Kopping 6e967780c9 feat: track resource replacements when claiming a prebuilt workspace (#17571)
Closes https://github.com/coder/internal/issues/369

We can't know whether a replacement (i.e. drift of terraform state
leading to a resource needing to be deleted/recreated) will take place
apriori; we can only detect it at `plan` time, because the provider
decides whether a resource must be replaced and it cannot be inferred
through static analysis of the template.

**This is likely to be the most common gotcha with using prebuilds,
since it requires a slight template modification to use prebuilds
effectively**, so let's head this off before it's an issue for
customers.

Drift details will now be logged in the workspace build logs:


![image](https://github.com/user-attachments/assets/da1988b6-2cbe-4a79-a3c5-ea29891f3d6f)

Plus a notification will be sent to template admins when this situation
arises:


![image](https://github.com/user-attachments/assets/39d555b1-a262-4a3e-b529-03b9f23bf66a)

A new metric - `coderd_prebuilt_workspaces_resource_replacements_total`
- will also increment each time a workspace encounters replacements.

We only track _that_ a resource replacement occurred, not how many. Just
one is enough to ruin a prebuild, but we can't know apriori which
replacement would cause this.
For example, say we have 2 replacements: a `docker_container` and a
`null_resource`; we don't know which one might
cause an issue (or indeed if either would), so we just track the
replacement.

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
2025-05-14 14:52:22 +02:00
Danny Kopping 58adc629fa chore: add prebuild docs (#17580)
Partially addresses https://github.com/coder/internal/issues/593
2025-05-09 07:26:35 +00:00