Commit Graph

124 Commits

Author SHA1 Message Date
Danielle Maywood 06dbadab11 fix(coderd): ensure lifecycle executor has sufficient task permissions (#20539)
We recently made a change to the `wsbuilder` to handle task related
logic. Our test coverage for the lifecycle executor didn't handle this
scenario and so we missed that it had insufficient permissions.

This PR adds `Update` and `Read` permissions for `Task`s in the
lifecycle executor, as well as an autostart/autostop test tailored to
task workspaces to verify the change.

---

Anthropic's Claude Sonnet 4.5 Thinking was involved in writing the tests
2025-10-29 15:44:35 +00:00
Hugo Dutka e62c5db678 chore: remove references to dbtestutil.WillUsePostgres (#20436)
Addresses https://github.com/coder/internal/issues/758.

This PR only cleans up dead code, it makes no changes to test logic.
2025-10-23 14:24:54 +02:00
Callum Styan 6574299fcc fix: fix TestExecutorAutostartSkipsWhenNoProvisionersAvailable flake, part 2 (#19649)
This test still flakes occasionally, see
https://github.com/coder/internal/issues/954#issuecomment-3237154735

The cause appears to be related to the assignment of `time.Now()` as the
`LastSeenAt` time when creating a provisioner which can flake with the
calculated scheduled next autostart and the code to set then
`require.Eventually` the updated provisioner LastSeenAt.

Instead we should simply calculate all time values for the stale portion
of the test based on the provisioners LastSeenAt value to avoid such
issues.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2025-09-01 08:52:34 -07:00
Callum Styan 321c2b8fce fix: fix flake in TestExecutorAutostartSkipsWhenNoProvisionersAvailable (#19478)
The flake here had two causes:
1. related to usage of time.Now() in MustWaitForProvisionersAvailable
and
2. the fact that UpdateProvisionerLastSeenAt can not use a time that is
further in the past than the current LastSeenAt time

Previously the test here was calling
`coderdtest.MustWaitForProvisionersAvailable` which was using `time.Now`
rather than the next tick time like the real `hasProvisionersAvailable`
function does. Additionally, when using `UpdateProvisionerLastSeenAt`
the underlying db query enforces that the time we're trying to set
`LastSeenAt` to cannot be older than the current value.

I was able to reliably reproduce the flake by executing both the
`UpdateProvisionerLastSeenAt` call and `tickCh <- next` in their own
goroutines, the former with a small sleep to reliably ensure we'd
trigger the autobuild before we set the `LastSeenAt` time. That's when I
also noticed that `coderdtest.MustWaitForProvisionersAvailable` was
using `time.Now` instead of the tick time. When I updated that function
to take in a tick time + added a 2nd call to
`UpdateProvisionerLastSeenAt` to set an original non-stale time, we
could then never get the test to pass because the later call to set the
stale time would not actually modify `LastSeenAt`. On top of that,
calling the provisioner daemons closer in the middle of the function
doesn't really do anything of value in this test.

**The fix for the flake is to keep the go routines, ensuring there would
be a flake if there was not a relevant fix, but to include the fix which
is to ensure that we explicitly wait for the provisioner to be stale
before passing the time to `tickCh`.**

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2025-08-28 12:07:50 -07:00
Callum Styan 6c902a7410 fix: don't create autostart workspace builds with no available provisioners (#19067)
This should fix https://github.com/coder/coder/issues/17941 by introducing a check for whether there are any valid (non-stale provisioners for a job in the autobuild executor code path.

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2025-08-15 08:50:51 -07:00
Susana Ferreira 8567ecbe52 fix: set prebuilds lifecycle parameters on creation and claim (#19252)
## Description

This PR ensures that prebuilt workspaces are properly excluded from the
lifecycle executor and treated as a separate class of workspaces, fully
managed by the prebuild reconciliation loop.

It introduces two lifecycle guarantees:
* When a prebuilt workspace is created (i.e., when the workspace build
completes), all lifecycle-related fields are unset, ensuring the
workspace does not participate in TTL, autostop, autostart, dormancy, or
auto-deletion logic.
* When a prebuilt workspace is claimed, it transitions into a regular
user workspace. At this point, all lifecycle fields are correctly
populated according to template-level configurations, allowing the
workspace to be managed by the lifecycle executor as expected.

## Changes

* Prebuilt workspaces now have all lifecycle-relevant fields unset
during creation
* When a prebuild is claimed:
* Lifecycle fields are set based on template and workspace level
configurations. This ensures a clean transition into the standard
workspace lifecycle flow.
* Updated lifecycle-related SQL update queries to explicitly exclude
prebuilt workspaces.

## Relates 

Related issue: https://github.com/coder/coder/issues/18898

To reduce the scope of this PR and make the review process more
manageable, the original implementation has been split into the
following focused PRs:
* https://github.com/coder/coder/pull/19259
* https://github.com/coder/coder/pull/19263
* https://github.com/coder/coder/pull/19264
* https://github.com/coder/coder/pull/19265

These PRs should be considered in conjunction with this one to
understand the complete set of lifecycle separation changes for prebuilt
workspaces.
2025-08-13 12:45:46 +01:00
Steven Masley 8ba8b4f061 chore: add profiling labels for pprof analysis (#19232)
PProf labels segment the code into groups for determing the source of
cpu/memory profiles. Since the web server and background jobs share a
lot of the same code (eg wsbuilder), it helps to know if the load is
user induced, or background job based.
2025-08-07 11:21:17 -05:00
Dean Sheather 9a6dd73f68 feat: add managed agent license limit checks (#18937)
- Adds a query for counting managed agent workspace builds between two
timestamps
- The "Actual" field in the feature entitlement for managed agents is
now populated with the value read from the database
- The wsbuilder package now validates AI agent usage against the limit
when a license is installed

Closes coder/internal#777
2025-07-22 13:39:26 +10:00
Susana Ferreira 211393a69c fix: exclude prebuilt workspaces from lifecycle executor (#18762)
## Description

This PR updates the lifecycle executor to explicitly exclude prebuilt
workspaces from being considered for lifecycle operations such as
`autostart`, `autostop`, `dormancy`, `default TTL` and `failure TTL`.

Prebuilt workspaces (i.e., those owned by the prebuild system user) are
handled separately by the prebuild reconciliation loop. Including them
in the lifecycle executor could lead to unintended behavior such as
incorrect scheduling or state transitions.

## Changes

* Updated the lifecycle executor query
`GetWorkspacesEligibleForTransition` to exclude workspaces with
`owner_id = 'c42fdf75-3097-471c-8c33-fb52454d81c0'` (prebuilds).
* Added tests to verify prebuilt workspaces are not considered in:
  * Autostop
  * Autostart
  * Default TTL
  * Dormancy
  * Failure TTL

Fixes: https://github.com/coder/coder/issues/18740
Related to: https://github.com/coder/coder/issues/18658
2025-07-08 11:35:28 +01:00
Steven Masley 7b152cdd91 chore: increase fileCache hit rate in autobuilds lifecycle (#18507)
`wsbuilder` hits the file cache when running validation. This solution is imperfect, but by first sorting workspaces by their template version id, the cache hit rate should improve.
2025-06-24 07:36:39 -05:00
Steven Masley 82af2e019d feat: implement dynamic parameter validation (#18482)
# What does this do?

This does parameter validation for dynamic parameters in `wsbuilder`. All input parameters are validated in `coder/coder` before being sent to terraform.

The heart of this PR is [`ResolveParameters`](https://github.com/coder/coder/blob/b65001e89c0577199a8e470c138c51e91cf2350c/coderd/dynamicparameters/resolver.go#L30-L30).

# What else changes?

`wsbuilder` now needs to load the terraform files into memory to succeed. This does add a larger memory requirement to workspace builds.

# Future work

- Sort autostart handling workspaces by template version id. So workspaces with the same template version only load the terraform files once from the db, and store them in the cache.
2025-06-23 12:35:15 -05:00
Cian Johnston 49fcffc266 fix!: stop workspace before update (#18425)
Fixes https://github.com/coder/coder/issues/17840

NOTE: calling this out as a breaking change so that it is highly visible
in the changelog.

* CLI: Modifies `coder update` to stop the workspace if already running.
* UI: Modifies "update" button to always stop the workspace if already
running.
2025-06-23 09:12:37 +01:00
ケイラ fae30a00fd chore: remove unnecessary redeclarations in for loops (#18440) 2025-06-20 13:16:55 -06:00
Hugo Dutka d3ed6fe652 chore(coderd/autobuild): use dbtestutil.WillUsePostgres instead of os.Getenv in test (#18145)
Standardizing on `WillUsePostgres` will make it easier to remove the
check entirely once dbmem is removed.

Related to https://github.com/coder/coder/issues/15109.
2025-06-02 13:58:07 +02:00
Spike Curtis 6c0bed0f53 chore: update to coder/quartz v0.2.0 (#18007)
Upgrade to coder/quartz v0.2.0 including fixing up a minor API breaking change.
2025-05-27 16:05:03 +04:00
Steven Masley e76d58f2b6 chore: disable parameter validatation for dynamic params for all transitions (#17926)
Dynamic params skip parameter validation in coder/coder.
This is because conditional parameters cannot be validated 
with the static parameters in the database.
2025-05-20 10:09:53 -05:00
Danielle Maywood 1267c9c405 fix: ensure reason present for workspace autoupdated notification (#17935)
Fixes https://github.com/coder/coder/issues/17930

Update the `WorkspaceAutoUpdated` notification to only display the
reason if it is present.
2025-05-20 16:01:57 +01:00
ケイラ f670bc31f5 chore: update testutil chan helpers (#17408) 2025-04-16 10:37:09 -06:00
Jon Ayers 17ddee05e5 chore: update golang to 1.24.1 (#17035)
- Update go.mod to use Go 1.24.1
- Update GitHub Actions setup-go action to use Go 1.24.1
- Fix linting issues with golangci-lint by:
  - Updating to golangci-lint v1.57.1 (more compatible with Go 1.24.1)

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <claude@anthropic.com>
2025-03-26 01:56:39 -05:00
Mathias Fredriksson e693b66b47 test(coderd/autobuild): fix context initialization in tests (#16173) 2025-01-17 14:51:53 +02:00
Danielle Maywood 7c595e2631 feat: allow removing deadline for running workspace (#16085)
Fixes https://github.com/coder/coder/issues/9775

When a workspace's TTL is removed, and the workspace is running, the
deadline is removed from the workspace.

This also modifies the frontend to not show a confirmation dialog when
the change is to remove autostop.
2025-01-13 21:37:57 +00:00
Cian Johnston 7b88776403 chore(testutil): add testutil.GoleakOptions (#16070)
- Adds `testutil.GoleakOptions` and consolidates existing options to
this location
- Pre-emptively adds required ignore for this Dependabot PR to pass CI
https://github.com/coder/coder/pull/16066
2025-01-08 15:38:37 +00:00
Danielle Maywood f0e81ab455 feat: notify on workspace creation (#15934) 2024-12-20 13:53:10 +00:00
Danielle Maywood 50ff06cc3c chore: acquire lock for individual workspace transition (#15859)
When Coder is ran in High Availability mode, each Coder instance has a
lifecycle executor. These lifecycle executors are all trying to do the
same work, and whilst transactions saves us from this causing an issue,
we are still doing extra work that could be prevented.

This PR adds a `TryAcquireLock` call for each attempted workspace
transition, meaning two Coder instances shouldn't duplicate effort.
2024-12-13 16:59:27 +00:00
Danielle Maywood e21a301682 fix: make GetWorkspacesEligibleForTransition return even less false positives (#15594)
Relates to https://github.com/coder/coder/issues/15082

Further to https://github.com/coder/coder/pull/15429, this reduces the
amount of false-positives returned by the 'is eligible for autostart'
part of the query. We achieve this by calculating the 'next start at'
time of the workspace, storing it in the database, and using it in our
`GetWorkspacesEligibleForTransition` query.

The prior implementation of the 'is eligible for autostart' query would
return _all_ workspaces that at some point in the future _might_ be
eligible for autostart. This now ensures we only return workspaces that
_should_ be eligible for autostart.

We also now pass `currentTick` instead of `t` to the
`GetWorkspacesEligibleForTransition` query as otherwise we'll have one
round of workspaces that are skipped by `isEligibleForTransition` due to
`currentTick` being a truncated version of `t`.
2024-12-02 21:02:36 +00:00
Cian Johnston 2b57dcc68c feat(coderd): add matched provisioner daemons information to more places (#15688)
- Refactors `checkProvisioners` into `db2sdk.MatchedProvisioners`
- Adds a separate RBAC subject just for reading provisioner daemons
- Adds matched provisioners information to additional endpoints relating to
  workspace builds and templates
-Updates existing unit tests for above endpoints
-Adds API endpoint for matched provisioners of template dry-run job
-Updates CLI to show warning when creating/starting/stopping/deleting
 workspaces for which no provisoners are available

---------

Co-authored-by: Danny Kopping <danny@coder.com>
2024-12-02 20:54:32 +00:00
Cian Johnston 30e6fbd35c fix(coderd): ensure correct RBAC when enqueueing notifications (#15478)
- Assert rbac in fake notifications enqueuer
- Move fake notifications enqueuer to separate notificationstest package
- Update dbauthz rbac policy to allow provisionerd and autostart to create and read notification messages
- Update tests as required
2024-11-12 12:40:46 +00:00
Steven Masley 144d3f3e3d chore: record lifecycle duration metric to prometheus (#15279)
`autobuild_execution_duration_seconds` keeps track of how long autobuild
takes and exposes it via prometheus histogram
2024-10-30 10:20:47 -04:00
Steven Masley ccfffc6911 chore: add tx metrics and logs for serialization errors (#15215)
Before db_metrics were all or nothing. Now `InTx` metrics are always recorded, and query metrics are opt in.


Adds instrumentation & logging around serialization failures in the database.
2024-10-25 12:14:15 -04:00
Steven Masley 343f8ec9ab chore: join owner, template, and org in new workspace view (#15116)
Joins in fields like `username`, `avatar_url`, `organization_name`,
`template_name` to `workspaces` via a **view**. 
The view must be maintained moving forward, but this prevents needing to
add RBAC permissions to fetch related workspace fields.
2024-10-22 09:20:54 -05:00
Bruno Quaresma 10327fb3a9 fix(coderd): humanize duration on notifications (#14333) 2024-08-19 15:49:47 -03:00
Danny Kopping c6076d2d0d chore: improve notification template tests' resilience (#14196) 2024-08-07 11:33:26 +02:00
Kayla Washburn-Love bf4b7abf14 chore(coderd): allow creating workspaces without specifying an organization (#14048) 2024-07-30 10:44:02 -06:00
Marcin Tojek cf1fcab514 feat: notify about created user account (#14010) 2024-07-30 15:37:45 +02:00
Bruno Quaresma 58b810fb0a fix: fix dormancy notifications (#14029) 2024-07-29 11:20:04 -03:00
Bruno Quaresma 0d9615b4fd feat(coderd): notify when workspace is marked as dormant (#13868) 2024-07-24 13:38:21 -03:00
Marcin Tojek 91cbe679c0 chore: move notiffake to testutil (#13933) 2024-07-18 13:36:02 +00:00
Marcin Tojek fbd1d7f9a7 feat: notify on successful autoupdate (#13903) 2024-07-18 15:19:12 +02:00
Marcin Tojek 7c41f957de feat: autostop workspaces owned by suspended users (#13790) 2024-07-04 13:35:41 +00:00
Spike Curtis e5268e4551 chore: spin clock library out to coder/quartz repo (#13777)
Code that was in `/clock` has been moved to github.com/coder/quartz.  This PR refactors our use of the clock library to point to the external Quartz repo.
2024-07-03 15:02:54 +04:00
Spike Curtis 0268c7a659 chore: refactor autobuild/notify to use clock test (#13566)
Refactor autobuild/notify and tests to use the clock testing library.

I also rewrote some of the comments because I didn't understand them when I was looking at the package.
2024-06-13 16:01:17 +04:00
Steven Masley 845407fe7a chore: cover deadline crossing autostart border on start (#13115)
When starting a workspace, if the deadline crosses an autostart boundary, the deadline is set to autostart + TTL. 
This copies the behavior in `ActivityBumpWorkspace`, but does not require activity.
2024-05-01 10:43:04 -05:00
Kayla Washburn-Love f705f9a5eb test: ensure RequireActiveVersion is actually set when testing with AGPL store (#12843) 2024-04-02 11:29:22 -06:00
Jon Ayers 383eed93f8 fix: use correct logger for lifecycle_executor (#11763) 2024-01-23 14:33:55 -06:00
Cian Johnston df7ed18e1b chore(coderd/autobuild): wait for active template version and inactive template version (#11210) 2023-12-14 13:58:57 +00:00
Cian Johnston 2883cad6ad fix(coderd/autobuild): wait for template version job in TestExecutorInactiveWorkspace (#11150) 2023-12-12 12:43:02 +00:00
Jon Ayers 02696f2df9 chore: fix flake in TestExecutorAutostopTemplateDisabled (#11096) 2023-12-08 09:02:54 +00:00
Jon Ayers ce49a55f56 chore: update build_reason 'autolock' -> 'dormancy' (#11074) 2023-12-07 17:11:57 -06:00
Jon Ayers 48d69c9e60 fix: update autostart context to include querying users (#10929) 2023-11-28 17:56:49 -06:00
Colin Adler 7f39ff854e fix: skip autostart for suspended/dormant users (#10771) 2023-11-22 11:14:32 -06:00