54 Commits

Author SHA1 Message Date
Mathias Fredriksson 98d5e7948d fix(coderd/autobuild): handle concurrent build number race in lifecycle executor (#25824)
The lifecycle executor did not handle unique-violation errors from
InsertWorkspaceBuild. When a concurrent actor (API handler, another
lifecycle executor, or prebuilds reconciler) inserts a workspace build
with the same build number, PostgreSQL returns a unique constraint
violation on workspace_builds_workspace_id_build_number_key. The
lifecycle executor treated this as a hard error, logging it and storing
it in stats.Errors.

The per-workspace advisory lock (pg_try_advisory_xact_lock) prevents
two lifecycle executors from racing, but does not protect against
races with the CreateWorkspaceBuild API handler or the prebuilds
reconciler, which use different (or no) locking.

Catch the specific unique-violation error after InTx returns (where
the transaction is already rolled back) and clear it. The concurrent
actor's build takes effect; the lifecycle executor treats the
workspace as a no-op for this tick.

Closes coder/internal#455
Closes PLAT-290
2026-05-29 17:12:31 +03:00
Zach ddc0e99c69 chore: remove coder_secret Terraform integration (#25512)
Removes the coder_secret Terraform integration: the data.coder_secret
consumption path through provisionerdserver → provisioner.proto →
provisioner/terraform, the dynamic-parameter secret-requirement
validation, and the workspace-update / resolve-autostart surfaces that
depended on it. This is being done due to a product/feature direction
change (see PLAT-243). User-secret CRUD (DB, REST, CLI, UI, telemetry, audit)
and the agent-manifest secret-injection path are untouched.

The provisionerd API is bumped from v1.17 to v1.18 rather than rolled
back: v1.17 shipped in v2.33.x, so user_secrets field numbers are
reserved and the changelog documents both versions.

Generated with assistance from Coder Agents.
2026-05-21 09:19:29 -06:00
dylanhuff-at-coder fb84e72319 feat: add secret requirement contract to dynamic parameters (#24785)
Adds structured `secret_requirements` to dynamic parameter responses and
enforces missing required secrets during workspace start.

Stop, delete, and tag rendering paths skip secret requirement
enforcement so unmet secrets do not prevent cleanup. The SDK, generated
API docs/types, and backend render/resolver/wsbuilder tests are updated
for the new contract.
2026-04-29 16:38:26 -07:00
Danielle Maywood 974ca3eda6 fix: use "idle timeout" as task auto-pause reason (#22287) 2026-02-24 16:45:56 +00:00
Danielle Maywood 31c1279202 feat: notify on task auto pause, manual pause and manual resume (#22050) 2026-02-18 16:30:16 +00:00
Cian Johnston f8eea54e97 fix(coderd): use BuildReasonTaskAutoPause for task workspaces (#22126)
Relates to https://github.com/coder/internal/issues/1252

When a workspace with a TaskID hits its deadline, use
BuildReasonTaskAutoPause instead of BuildReasonAutostop. This allows
downstream systems to distinguish between regular autostop and task
workspace pauses.

Created by Mux using Opus 4.5.
2026-02-17 15:11:04 +00:00
Callum Styan 5f3be6b288 feat: add provisioner job queue wait time histogram and jobs enqueued counter (#21869)
This PR adds some metrics to help identify job enqueue rates and
latencies. This work was initiated as a way to help reduce the cost of
the observation/measurement itself for autostart scaletests, which
impacts our ability to identify/reason about the load caused by
autostart. See: https://github.com/coder/internal/issues/1209

I've extended the metrics here to account for regular user initiated
builds, prebuilds, autostarts, etc. IMO there is still the question here
of whether we want to include or need the `transition` label, which is
only present on workspace builds. Including it does lead to an increase
in cardinality, and in the case of the histogram (when not using native
histograms) that's at least a few extra series for every bucket. We
could remove the transition label there but keep it on the counter.

Additionally, the histogram is currently observing latencies for other
jobs, such as template builds/version imports, those do not have a
transition type associated with them.

Tested briefly in a workspace, can see metric values like the following:
-
`coderd_workspace_builds_enqueued_total{build_reason="autostart",provisioner_type="terraform",status="success",transition="start"}
1`
-
`coderd_provisioner_job_queue_wait_seconds_bucket{build_reason="autostart",job_type="workspace_build",provisioner_type="terraform",transition="start",le="0.025"}
1`

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-12 13:40:47 -08:00
Spike Curtis bddb808b25 chore: arrange imports in a standard way (#21452)
Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example:

```
import (
	"context"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"golang.org/x/xerrors"
	"gopkg.in/natefinch/lumberjack.v2"

	"cdr.dev/slog/v3"
	"github.com/coder/coder/v2/codersdk/agentsdk"
	"github.com/coder/serpent"
)
```

3 groups: standard library, 3rd partly libs, Coder libs.

This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.
2026-01-08 15:24:11 +04:00
Spike Curtis 49b34a716a fix: fix slog to always use array of Fields (#21426)
Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptible call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder).

It also updates dependencies that also use slog and were updated.

I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule.

Other dependencies, I pushed new tags.
2026-01-08 10:29:41 +04:00
Callum Styan 6c902a7410 fix: don't create autostart workspace builds with no available provisioners (#19067)
This should fix https://github.com/coder/coder/issues/17941 by introducing a check for whether there are any valid (non-stale provisioners for a job in the autobuild executor code path.

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2025-08-15 08:50:51 -07:00
Steven Masley 8ba8b4f061 chore: add profiling labels for pprof analysis (#19232)
PProf labels segment the code into groups for determing the source of
cpu/memory profiles. Since the web server and background jobs share a
lot of the same code (eg wsbuilder), it helps to know if the load is
user induced, or background job based.
2025-08-07 11:21:17 -05:00
Dean Sheather 9a6dd73f68 feat: add managed agent license limit checks (#18937)
- Adds a query for counting managed agent workspace builds between two
timestamps
- The "Actual" field in the feature entitlement for managed agents is
now populated with the value read from the database
- The wsbuilder package now validates AI agent usage against the limit
when a license is installed

Closes coder/internal#777
2025-07-22 13:39:26 +10:00
Susana Ferreira 211393a69c fix: exclude prebuilt workspaces from lifecycle executor (#18762)
## Description

This PR updates the lifecycle executor to explicitly exclude prebuilt
workspaces from being considered for lifecycle operations such as
`autostart`, `autostop`, `dormancy`, `default TTL` and `failure TTL`.

Prebuilt workspaces (i.e., those owned by the prebuild system user) are
handled separately by the prebuild reconciliation loop. Including them
in the lifecycle executor could lead to unintended behavior such as
incorrect scheduling or state transitions.

## Changes

* Updated the lifecycle executor query
`GetWorkspacesEligibleForTransition` to exclude workspaces with
`owner_id = 'c42fdf75-3097-471c-8c33-fb52454d81c0'` (prebuilds).
* Added tests to verify prebuilt workspaces are not considered in:
  * Autostop
  * Autostart
  * Default TTL
  * Dormancy
  * Failure TTL

Fixes: https://github.com/coder/coder/issues/18740
Related to: https://github.com/coder/coder/issues/18658
2025-07-08 11:35:28 +01:00
Steven Masley 7b152cdd91 chore: increase fileCache hit rate in autobuilds lifecycle (#18507)
`wsbuilder` hits the file cache when running validation. This solution is imperfect, but by first sorting workspaces by their template version id, the cache hit rate should improve.
2025-06-24 07:36:39 -05:00
Steven Masley 82af2e019d feat: implement dynamic parameter validation (#18482)
# What does this do?

This does parameter validation for dynamic parameters in `wsbuilder`. All input parameters are validated in `coder/coder` before being sent to terraform.

The heart of this PR is [`ResolveParameters`](https://github.com/coder/coder/blob/b65001e89c0577199a8e470c138c51e91cf2350c/coderd/dynamicparameters/resolver.go#L30-L30).

# What else changes?

`wsbuilder` now needs to load the terraform files into memory to succeed. This does add a larger memory requirement to workspace builds.

# Future work

- Sort autostart handling workspaces by template version id. So workspaces with the same template version only load the terraform files once from the db, and store them in the cache.
2025-06-23 12:35:15 -05:00
Steven Masley e76d58f2b6 chore: disable parameter validatation for dynamic params for all transitions (#17926)
Dynamic params skip parameter validation in coder/coder.
This is because conditional parameters cannot be validated 
with the static parameters in the database.
2025-05-20 10:09:53 -05:00
Danielle Maywood 1267c9c405 fix: ensure reason present for workspace autoupdated notification (#17935)
Fixes https://github.com/coder/coder/issues/17930

Update the `WorkspaceAutoUpdated` notification to only display the
reason if it is present.
2025-05-20 16:01:57 +01:00
Danielle Maywood 50ff06cc3c chore: acquire lock for individual workspace transition (#15859)
When Coder is ran in High Availability mode, each Coder instance has a
lifecycle executor. These lifecycle executors are all trying to do the
same work, and whilst transactions saves us from this causing an issue,
we are still doing extra work that could be prevented.

This PR adds a `TryAcquireLock` call for each attempted workspace
transition, meaning two Coder instances shouldn't duplicate effort.
2024-12-13 16:59:27 +00:00
Danielle Maywood e21a301682 fix: make GetWorkspacesEligibleForTransition return even less false positives (#15594)
Relates to https://github.com/coder/coder/issues/15082

Further to https://github.com/coder/coder/pull/15429, this reduces the
amount of false-positives returned by the 'is eligible for autostart'
part of the query. We achieve this by calculating the 'next start at'
time of the workspace, storing it in the database, and using it in our
`GetWorkspacesEligibleForTransition` query.

The prior implementation of the 'is eligible for autostart' query would
return _all_ workspaces that at some point in the future _might_ be
eligible for autostart. This now ensures we only return workspaces that
_should_ be eligible for autostart.

We also now pass `currentTick` instead of `t` to the
`GetWorkspacesEligibleForTransition` query as otherwise we'll have one
round of workspaces that are skipped by `isEligibleForTransition` due to
`currentTick` being a truncated version of `t`.
2024-12-02 21:02:36 +00:00
Cian Johnston 2b57dcc68c feat(coderd): add matched provisioner daemons information to more places (#15688)
- Refactors `checkProvisioners` into `db2sdk.MatchedProvisioners`
- Adds a separate RBAC subject just for reading provisioner daemons
- Adds matched provisioners information to additional endpoints relating to
  workspace builds and templates
-Updates existing unit tests for above endpoints
-Adds API endpoint for matched provisioners of template dry-run job
-Updates CLI to show warning when creating/starting/stopping/deleting
 workspaces for which no provisoners are available

---------

Co-authored-by: Danny Kopping <danny@coder.com>
2024-12-02 20:54:32 +00:00
Steven Masley 144d3f3e3d chore: record lifecycle duration metric to prometheus (#15279)
`autobuild_execution_duration_seconds` keeps track of how long autobuild
takes and exposes it via prometheus histogram
2024-10-30 10:20:47 -04:00
Steven Masley ccfffc6911 chore: add tx metrics and logs for serialization errors (#15215)
Before db_metrics were all or nothing. Now `InTx` metrics are always recorded, and query metrics are opt in.


Adds instrumentation & logging around serialization failures in the database.
2024-10-25 12:14:15 -04:00
Steven Masley 343f8ec9ab chore: join owner, template, and org in new workspace view (#15116)
Joins in fields like `username`, `avatar_url`, `organization_name`,
`template_name` to `workspaces` via a **view**. 
The view must be maintained moving forward, but this prevents needing to
add RBAC permissions to fetch related workspace fields.
2024-10-22 09:20:54 -05:00
Bruno Quaresma 10327fb3a9 fix(coderd): humanize duration on notifications (#14333) 2024-08-19 15:49:47 -03:00
Danny Kopping c6076d2d0d chore: improve notification template tests' resilience (#14196) 2024-08-07 11:33:26 +02:00
Bruno Quaresma 58b810fb0a fix: fix dormancy notifications (#14029) 2024-07-29 11:20:04 -03:00
Bruno Quaresma 0d9615b4fd feat(coderd): notify when workspace is marked as dormant (#13868) 2024-07-24 13:38:21 -03:00
Marcin Tojek fbd1d7f9a7 feat: notify on successful autoupdate (#13903) 2024-07-18 15:19:12 +02:00
Marcin Tojek 7c41f957de feat: autostop workspaces owned by suspended users (#13790) 2024-07-04 13:35:41 +00:00
Steven Masley 845407fe7a chore: cover deadline crossing autostart border on start (#13115)
When starting a workspace, if the deadline crosses an autostart boundary, the deadline is set to autostart + TTL. 
This copies the behavior in `ActivityBumpWorkspace`, but does not require activity.
2024-05-01 10:43:04 -05:00
Jon Ayers 383eed93f8 fix: use correct logger for lifecycle_executor (#11763) 2024-01-23 14:33:55 -06:00
Jon Ayers ce49a55f56 chore: update build_reason 'autolock' -> 'dormancy' (#11074) 2023-12-07 17:11:57 -06:00
Jon Ayers 48d69c9e60 fix: update autostart context to include querying users (#10929) 2023-11-28 17:56:49 -06:00
Colin Adler 7f39ff854e fix: skip autostart for suspended/dormant users (#10771) 2023-11-22 11:14:32 -06:00
Steven Masley 290180b104 feat!: bump workspace activity by 1 hour (#10704)
Marked as a breaking change as the previous activity bump was always the TTL duration of the workspace/template.

This change is more cost conservative, only bumping by 1 hour for workspace activity. To accommodate wrap around, eg bumping a workspace into the next autostart, the deadline is bumped by the TTL if the workspace crosses the autostart threshold.

This is a niche case that is likely caused by an idle terminal making a workspace survive through a night. The next morning, the workspace will get activity bumped the default TTL on the autostart, being similar to as if the workspace was autostarted again.

In practice, a good way to avoid this is to set a max_deadline of <24hrs to avoid wrap around entirely.
2023-11-15 09:42:27 -06:00
Jon Ayers 75ab16d19a fix: prevent db deadlock when workspaces go dormant (#10618) 2023-11-13 13:40:20 -06:00
Jon Ayers 997493d4ae feat: add template setting to require active template version (#10277) 2023-10-18 17:07:21 -05:00
Colin Adler 1ad998ee3a fix: add requester IP to workspace build audit logs (#10242) 2023-10-18 15:08:02 -05:00
Steven Masley 39c0539d42 feat: add controls to template for determining startup days (#10226)
* feat: template controls which days can autostart
* Add unit test to test blocking autostart with DaysOfWeek
2023-10-13 11:57:18 -05:00
Spike Curtis 983e8c3ae8 feat: add API support for workspace automatic updates (#10099)
* Added automatic_updates to workspaces table

Signed-off-by: Spike Curtis <spike@coder.com>

* Queries and API updates

Signed-off-by: Spike Curtis <spike@coder.com>

* Golden files

Signed-off-by: Spike Curtis <spike@coder.com>

* Enable automatic updates on autostart

Signed-off-by: Spike Curtis <spike@coder.com>

* db migration number

Signed-off-by: Spike Curtis <spike@coder.com>

* fix imports and ts mock

Signed-off-by: Spike Curtis <spike@coder.com>

* code review updates

Signed-off-by: Spike Curtis <spike@coder.com>

---------

Signed-off-by: Spike Curtis <spike@coder.com>
2023-10-06 13:27:12 +04:00
Jon Ayers b32d79ef0b fix: fix failed workspaces continuously auto-deleting (#10069)
- Fixes an issue where workspaces that are eligible for auto-deletion
  are retried every tick (1 minute) even if the previous deletion
  transition failed.

  The updated logic only attempts to delete workspaces that previously
  failed once a day (24 hours since last attempt).
2023-10-05 14:11:39 -05:00
Jon Ayers 91265678ad chore: add auditing to workspace dormancy (#10070)
- Adds an audit log for workspaces automatically transitioned to the dormant
  state.
- Imposes a mininum of 1 minute on cleanup-related fields. This is to
  prevent accidental API misuse from resulting in catastrophe.
2023-10-05 13:41:07 -05:00
Steven Masley 5021e23105 chore: compute job status as column (#10024)
* chore: provisioner job status as column
* use provisioner job status for workspace searching
2023-10-04 20:57:46 -05:00
Spike Curtis 375c70d141 feat: integrate Acquirer for provisioner jobs (#9717)
* chore: add Acquirer to provisionerdserver pkg

Signed-off-by: Spike Curtis <spike@coder.com>

* code review improvements & fixes

Signed-off-by: Spike Curtis <spike@coder.com>

* feat: integrate Acquirer for provisioner jobs

Signed-off-by: Spike Curtis <spike@coder.com>

* Fix imports, whitespace

Signed-off-by: Spike Curtis <spike@coder.com>

* provisionerdserver always closes; remove poll interval from playwright

Signed-off-by: Spike Curtis <spike@coder.com>

* post jobs outside transactions

Signed-off-by: Spike Curtis <spike@coder.com>

* graceful shutdown in test

Signed-off-by: Spike Curtis <spike@coder.com>

* Mark AcquireJob deprecated

Signed-off-by: Spike Curtis <spike@coder.com>

* Graceful shutdown on all provisionerd tests

Signed-off-by: Spike Curtis <spike@coder.com>

* Deprecate, not remove CLI flags

Signed-off-by: Spike Curtis <spike@coder.com>

---------

Signed-off-by: Spike Curtis <spike@coder.com>
2023-09-19 10:25:57 +04:00
Mathias Fredriksson ad23d33f28 refactor(coderd/schedule): move cron schedule to cron package (#9507)
This removes an indirect import of `coderd/database` from the CLI and
results in a logical separation between server related and generalized
schedule.

No size change (yet).

Ref: #9380
2023-09-04 16:48:25 +03:00
Mathias Fredriksson 19d7da3d24 refactor(coderd/database): split Time and Now into dbtime package (#9482)
Ref: #9380
2023-09-01 16:50:12 +00:00
Jon Ayers 7f14b50dbe chore: rename locked to dormant (#9290)
* chore: rename locked to dormant

- The following columns have been updated:
  - workspace.locked_at -> dormant_at
  - template.inactivity_ttl -> time_til_dormant
  - template.locked_ttl -> time_til_dormant_autodelete

This change has also been reflected in the SDK.

A route has also been updated from /workspaces/<id>/lock to /workspaces/<id>/dormant
2023-08-24 13:25:54 -05:00
Kyle Carberry 22e781eced chore: add /v2 to import module path (#9072)
* chore: add /v2 to import module path

go mod requires semantic versioning with versions greater than 1.x

This was a mechanical update by running:
```
go install github.com/marwan-at-work/mod/cmd/mod@latest
mod upgrade
```

Migrate generated files to import /v2

* Fix gen
2023-08-18 18:55:43 +00:00
Jon Ayers e43608395c feat: add frontend for locked workspaces (#8655)
- Fix workspaces query for locked workspaces.
2023-08-03 19:46:02 -05:00
Jon Ayers b47d076756 feat: add deleting_at column to workspaces (#8333) 2023-07-20 22:01:11 -05:00