The provisioner state for a workspace build was being loaded for every
long-lived agent RPC connection. Since this state can be anywhere from
kilobytes to megabytes, this can cause the `coderd` memory footprint to
grow gradually over time. It also means a lot of unnecessary allocations
for every query that fetches a workspace build, since only a few callers
ever actually reference the provisioner state.
This PR removes it from the returned workspace build and adds a query to
fetch the provisioner state explicitly.
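To illustrate the split described above, here is a minimal sketch using an in-memory store. `Store`, `GetProvisionerState`, and `newExampleStore` are hypothetical names chosen for the example and are not the actual generated query names; the point is only that the build lookup no longer carries the large blob and callers opt in to fetching it.

```go
package main

import "fmt"

// WorkspaceBuild is a trimmed-down stand-in for the build row. Note that it
// carries no provisioner state.
type WorkspaceBuild struct {
	ID          string
	BuildNumber int
}

// Store is a hypothetical in-memory substitute for the database layer.
type Store struct {
	builds map[string]WorkspaceBuild
	states map[string][]byte
}

// GetWorkspaceBuild returns only the lightweight build metadata.
func (s *Store) GetWorkspaceBuild(id string) (WorkspaceBuild, bool) {
	b, ok := s.builds[id]
	return b, ok
}

// GetProvisionerState is the explicit, opt-in fetch for the large blob.
func (s *Store) GetProvisionerState(id string) ([]byte, bool) {
	st, ok := s.states[id]
	return st, ok
}

// newExampleStore seeds one build with a 1 MiB state blob.
func newExampleStore() *Store {
	return &Store{
		builds: map[string]WorkspaceBuild{"build-1": {ID: "build-1", BuildNumber: 7}},
		states: map[string][]byte{"build-1": make([]byte, 1<<20)},
	}
}

func main() {
	s := newExampleStore()

	// Most callers stop here and never touch the state.
	build, _ := s.GetWorkspaceBuild("build-1")
	fmt.Println("build number:", build.BuildNumber)

	// Only the few callers that need the state pay for loading it.
	state, _ := s.GetProvisionerState("build-1")
	fmt.Println("state bytes:", len(state))
}
```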
Since Go 1.22, the loop variable capture issue is resolved. Variables
declared by for loops are now per-iteration rather than per-loop, making
the 'v := v' pattern unnecessary.
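A small sketch of what this means in practice: closures created inside the loop are called only after the loop ends, with no `v := v` copy. The `//go:build go1.22` line is there so the file is compiled with the per-iteration semantics even if the surrounding module declares an older `go` version; `collect` is a name made up for the example.

```go
//go:build go1.22

package main

import "fmt"

// collect builds one closure per element and calls them only after the loop
// has finished, so each closure reports whatever its captured variable holds
// at that point.
func collect(values []string) []string {
	var funcs []func() string
	for _, v := range values {
		// No `v := v` here: with Go 1.22+ semantics each iteration gets
		// its own v, so every closure captures a distinct variable.
		funcs = append(funcs, func() string { return v })
	}

	out := make([]string, 0, len(funcs))
	for _, f := range funcs {
		out = append(out, f())
	}
	return out
}

func main() {
	fmt.Println(collect([]string{"a", "b", "c"}))
}
```

With Go 1.22 or newer this prints `[a b c]`. Under the old per-loop semantics every closure would have seen the final value of `v`.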
Relates to https://github.com/coder/coder/pull/21922 /
https://github.com/coder/internal/issues/1259
* Adds `dbfake.BuilderOption func(*WorkspaceBuildBuilder)`
* Adds `BuilderOption` methods for setting various provisioner job
related fields on `WorkspaceBuildBuilder`.
* Migrates a number of existing tests that previously depended on
  provisioner job timing to use these updated methods in the following
  packages:
  * `coderd/jobreaper`
  * `coderd/notifications/reports`
  * `enterprise/coderd/schedule`
  * `enterprise/coderd/prebuilds`
  * `scripts/workspace-runtime-audit`
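For readers unfamiliar with the pattern, the following is a rough sketch of how a `BuilderOption func(*WorkspaceBuildBuilder)` can be used to pin provisioner job fields in a test instead of relying on timing. Only the `BuilderOption` signature comes from this PR; the struct fields, `WithJobStartedAt`, `WithJobCompletedAt`, `NewBuilder`, and `at` are hypothetical stand-ins and not the real `dbfake` API.

```go
package main

import (
	"fmt"
	"time"
)

// ProvisionerJob and WorkspaceBuildBuilder are simplified stand-ins, not the
// real dbfake types.
type ProvisionerJob struct {
	StartedAt   time.Time
	CompletedAt time.Time
}

type WorkspaceBuildBuilder struct {
	job ProvisionerJob
}

// BuilderOption matches the signature named in the PR description.
type BuilderOption func(*WorkspaceBuildBuilder)

// WithJobStartedAt is a hypothetical option that fixes the job start time.
func WithJobStartedAt(t time.Time) BuilderOption {
	return func(b *WorkspaceBuildBuilder) { b.job.StartedAt = t }
}

// WithJobCompletedAt is a hypothetical option that fixes the completion time.
func WithJobCompletedAt(t time.Time) BuilderOption {
	return func(b *WorkspaceBuildBuilder) { b.job.CompletedAt = t }
}

// NewBuilder applies the options in order to a zero-valued builder.
func NewBuilder(opts ...BuilderOption) *WorkspaceBuildBuilder {
	b := &WorkspaceBuildBuilder{}
	for _, opt := range opts {
		opt(b)
	}
	return b
}

// at returns a fixed-date UTC timestamp; it exists only for this example.
func at(hour, minute int) time.Time {
	return time.Date(2025, 1, 1, hour, minute, 0, 0, time.UTC)
}

func main() {
	// The test states the job timing explicitly rather than sleeping.
	b := NewBuilder(WithJobStartedAt(at(12, 0)), WithJobCompletedAt(at(12, 5)))
	fmt.Println("job duration:", b.job.CompletedAt.Sub(b.job.StartedAt))
}
```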
🤖 Created using Mux (Opus 4.5)
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Agents were losing authentication during workspace shutdown, causing
shutdown scripts to fail. The auth query required agents to belong to
the latest build, but during shutdown a `stop` build becomes latest while
the `start` build's agents are still running.
Modified the auth query to allow `start` build agents to authenticate
temporarily during `stop` execution. The query allows auth when:
- Agent's `start` build job succeeded
- Latest build is `stop` with `pending`/`running` job status
- Builds are adjacent (`stop` is `build_number + 1`)
- Template versions match
Auth closes once `stop` completes.
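The conditions above can be sketched as a plain Go predicate. This is only an illustration of the rule as described here, not the SQL itself; the `Build` struct, its string-typed fields, and `canAuthenticate` are assumptions made for the example.

```go
package main

import "fmt"

// Build is a simplified stand-in for a workspace build joined with its
// provisioner job.
type Build struct {
	Number            int
	Transition        string // "start", "stop", or "delete"
	JobStatus         string // "pending", "running", "succeeded", ...
	TemplateVersionID string
}

// canAuthenticate reports whether an agent belonging to agentBuild may
// authenticate, given the workspace's latest build.
func canAuthenticate(agentBuild, latest Build) bool {
	// Normal case: the agent belongs to the latest build.
	if agentBuild.Number == latest.Number {
		return true
	}

	// Shutdown case: all four conditions from the description must hold.
	startSucceeded := agentBuild.Transition == "start" && agentBuild.JobStatus == "succeeded"
	stopInProgress := latest.Transition == "stop" &&
		(latest.JobStatus == "pending" || latest.JobStatus == "running")
	adjacent := latest.Number == agentBuild.Number+1
	sameVersion := latest.TemplateVersionID == agentBuild.TemplateVersionID

	return startSucceeded && stopInProgress && adjacent && sameVersion
}

func main() {
	start := Build{Number: 3, Transition: "start", JobStatus: "succeeded", TemplateVersionID: "v1"}
	stop := Build{Number: 4, Transition: "stop", JobStatus: "running", TemplateVersionID: "v1"}
	fmt.Println("during stop:", canAuthenticate(start, stop))

	// Once the stop job finishes, the window closes.
	stop.JobStatus = "succeeded"
	fmt.Println("after stop:", canAuthenticate(start, stop))
}
```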
Renamed `GetWorkspaceAgentAndLatestBuildByAuthToken` to
`GetAuthenticatedWorkspaceAgentAndBuildByAuthToken` since it returns the
agent's build (not always latest) during shutdown.
Closes coder/internal#1249
Fixes #19467
Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example:
```
import (
	"context"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"golang.org/x/xerrors"
	"gopkg.in/natefinch/lumberjack.v2"

	"cdr.dev/slog/v3"
	"github.com/coder/coder/v2/codersdk/agentsdk"
	"github.com/coder/serpent"
)
```
3 groups: standard library, third-party libs, Coder libs.
This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.
Upgrades to slog v3, which includes a small but backward-incompatible API change to the acceptable call arguments when logging. This change allows us to verify via compile-time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder).
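As a generic illustration of why a typed variadic parameter moves this class of mistake from run time to compile time, here is a sketch that does not use slog at all. `Field`, `logLoose`, and `logTyped` are made-up names and do not reflect the slog API.

```go
package main

import "fmt"

// Field is a stand-in for a structured log field; it is not the slog type.
type Field struct {
	Name  string
	Value any
}

// logLoose accepts anything, so a bad argument is only detected at run time.
func logLoose(msg string, args ...any) string {
	out := msg
	for _, a := range args {
		f, ok := a.(Field)
		if !ok {
			panic(fmt.Sprintf("unexpected log argument of type %T", a))
		}
		out += fmt.Sprintf(" %s=%v", f.Name, f.Value)
	}
	return out
}

// logTyped only accepts Field values, so a stray string fails to compile.
func logTyped(msg string, fields ...Field) string {
	out := msg
	for _, f := range fields {
		out += fmt.Sprintf(" %s=%v", f.Name, f.Value)
	}
	return out
}

func main() {
	fmt.Println(logTyped("build finished",
		Field{Name: "workspace", Value: "dev"},
		Field{Name: "attempt", Value: 2},
	))

	// The following line is rejected by the compiler rather than panicking:
	// logTyped("build finished", "workspace", "dev")
}
```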
It also bumps the dependencies that use slog to their updated versions.
I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping) will tag and update the dependency soon and on their own schedule.
For the other dependencies, I pushed new tags.
Closes https://github.com/coder/coder/issues/20913
I ran the test without the fix, verified that the test caught the issue,
then applied the fix and confirmed the issue no longer happens.
---
🤖 PR was initially written by Claude Opus 4.5 Thinking using Claude Code
and then reviewed by a human 👩
## Description
This PR ensures that lifecycle-related changes made via template
schedule updates do **not affect prebuilt workspaces**. Since prebuilds
are managed by the reconciliation loop and do not participate in the
regular lifecycle executor flow, they must be excluded from any updates
triggered by template configuration changes.
This includes changes to TTL, dormant-deletion scheduling, deadline and
autostart scheduling.
## Changes
- Updated SQL query `UpdateWorkspacesTTLByTemplateID` to exclude
prebuilt workspaces
- Updated SQL query `UpdateWorkspacesDormantDeletingAtByTemplateID` to
exclude prebuilt workspaces
- Updated application-layer logic to skip any updates to lifecycle
parameters if a workspace is a prebuild
- Preserved all existing update behavior for regular user workspaces
This change guarantees that only lifecycle-managed workspaces are
affected when template-level configurations are modified, preserving
strict boundaries between prebuild and user workspace lifecycles.
Related to:
* Issue: https://github.com/coder/coder/issues/18898
* PR: https://github.com/coder/coder/pull/19252
Closes https://github.com/coder/internal/issues/884
We're adding this as a `go run` in `lint/go` for now, since adding it to
golangci-lint ourselves involves recompiling golangci-lint and then
running that new binary. I'll look into proposing it being added to the
public golangci-lint linters.
Doesn't appear to cause the lint ci job to take any longer, which is
nice.
## Description
This PR ensures that prebuilt workspaces are properly excluded from the
lifecycle executor and treated as a separate class of workspaces, fully
managed by the prebuild reconciliation loop.
It introduces two lifecycle guarantees:
* When a prebuilt workspace is created (i.e., when the workspace build
completes), all lifecycle-related fields are unset, ensuring the
workspace does not participate in TTL, autostop, autostart, dormancy, or
auto-deletion logic.
* When a prebuilt workspace is claimed, it transitions into a regular
user workspace. At this point, all lifecycle fields are correctly
populated according to template-level configurations, allowing the
workspace to be managed by the lifecycle executor as expected.
## Changes
* Prebuilt workspaces now have all lifecycle-relevant fields unset
during creation
* When a prebuild is claimed:
  * Lifecycle fields are set based on template and workspace level
    configurations. This ensures a clean transition into the standard
    workspace lifecycle flow.
* Updated lifecycle-related SQL update queries to explicitly exclude
prebuilt workspaces.
## Relates
Related issue: https://github.com/coder/coder/issues/18898
To reduce the scope of this PR and make the review process more
manageable, the original implementation has been split into the
following focused PRs:
* https://github.com/coder/coder/pull/19259
* https://github.com/coder/coder/pull/19263
* https://github.com/coder/coder/pull/19264
* https://github.com/coder/coder/pull/19265
These PRs should be considered in conjunction with this one to
understand the complete set of lifecycle separation changes for prebuilt
workspaces.
- Adds/improves a lot of comments to make the autostop calculation code
clearer
- Changes the behavior of the enterprise template schedule store to
match the behavior of the workspace TTL endpoint when the new TTL is
zero
- Fixes a bug in the workspace TTL endpoint where it could unset the
build deadline, even though a max_deadline was specified
- Adds a new constraint to the workspace_builds table that enforces the
deadline is non-zero and below the max_deadline if it is set
- Adds CHECK constraint enum generation to scripts/dbgen, used for
testing the above constraint
- Adds Dean and Danielle as CODEOWNERS for the autostop calculation code
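The new `workspace_builds` constraint can be read as the following predicate. This is one interpretation of the bullet above, written in Go rather than SQL: a zero time stands for "not set", and a deadline equal to `max_deadline` is treated as allowed, which is an assumption. `deadlineAllowed`, `unset`, and `at` are names made up for the sketch.

```go
package main

import (
	"fmt"
	"time"
)

// unset is the zero time, used here to mean "no value configured".
var unset time.Time

// at returns a fixed-date UTC timestamp; it exists only for this example.
func at(hour, minute int) time.Time {
	return time.Date(2025, 1, 1, hour, minute, 0, 0, time.UTC)
}

// deadlineAllowed mirrors one reading of the constraint: when max_deadline
// is set, the deadline must be non-zero and must not be after it. When
// max_deadline is not set, any deadline is accepted.
func deadlineAllowed(deadline, maxDeadline time.Time) bool {
	if maxDeadline.IsZero() {
		return true
	}
	return !deadline.IsZero() && !deadline.After(maxDeadline)
}

func main() {
	fmt.Println("within max:", deadlineAllowed(at(17, 0), at(18, 0)))
	fmt.Println("past max:", deadlineAllowed(at(19, 0), at(18, 0)))
	fmt.Println("unset deadline with max:", deadlineAllowed(unset, at(18, 0)))
	fmt.Println("no max at all:", deadlineAllowed(unset, unset))
}
```

The third case is the one the TTL endpoint bug could produce: the deadline was cleared even though a `max_deadline` was present.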
- Update go.mod to use Go 1.24.1
- Update GitHub Actions setup-go action to use Go 1.24.1
- Fix linting issues with golangci-lint by:
  - Updating to golangci-lint v1.57.1 (more compatible with Go 1.24.1)
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
Relates to https://github.com/coder/coder/issues/15390
Currently when a user creates a workspace, their workspace's TTL is
determined by the template's default TTL. If the Coder instance is AGPL,
or if the template has disallowed the user from configuring autostop,
then it is not possible to change the workspace's TTL after creation.
Any changes to the template's default TTL only take effect on _new_
workspaces.
This PR modifies the behaviour slightly so that on AGPL Coder, or on
enterprise when a template does not allow users to configure their
workspace's TTL, updating the template's default TTL will also update
the TTL of its workspaces to match this value.
Relates to https://github.com/coder/coder/issues/15082
Further to https://github.com/coder/coder/pull/15429, this reduces the
number of false positives returned by the 'is eligible for autostart'
part of the query. We achieve this by calculating the 'next start at'
time of the workspace, storing it in the database, and using it in our
`GetWorkspacesEligibleForTransition` query.
The prior implementation of the 'is eligible for autostart' query would
return _all_ workspaces that at some point in the future _might_ be
eligible for autostart. This now ensures we only return workspaces that
_should_ be eligible for autostart.
We also now pass `currentTick` instead of `t` to the
`GetWorkspacesEligibleForTransition` query as otherwise we'll have one
round of workspaces that are skipped by `isEligibleForTransition` due to
`currentTick` being a truncated version of `t`.
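To make the `currentTick` versus `t` mismatch concrete, here is a small simulation. It assumes the tick is `t` truncated to the minute and that both the query and the in-process check are simple "next start is not after the cutoff" comparisons; `queryEligible`, `passesCheck`, `skipped`, `demo`, and `at` are hypothetical helpers, not the real executor code.

```go
package main

import (
	"fmt"
	"time"
)

// at returns a fixed-date UTC timestamp; it exists only for this example.
func at(hour, minute, second int) time.Time {
	return time.Date(2025, 1, 1, hour, minute, second, 0, time.UTC)
}

// queryEligible stands in for the SQL filter: keep workspaces whose next
// start time is at or before the cutoff passed to the query.
func queryEligible(nextStartAt []time.Time, cutoff time.Time) []time.Time {
	var out []time.Time
	for _, n := range nextStartAt {
		if !n.After(cutoff) {
			out = append(out, n)
		}
	}
	return out
}

// passesCheck stands in for the in-process eligibility check, which compares
// against the truncated tick.
func passesCheck(nextStartAt, currentTick time.Time) bool {
	return !nextStartAt.After(currentTick)
}

// skipped counts rows that the query returned but the check then rejected.
func skipped(nextStartAt []time.Time, cutoff, currentTick time.Time) int {
	n := 0
	for _, ws := range queryEligible(nextStartAt, cutoff) {
		if !passesCheck(ws, currentTick) {
			n++
		}
	}
	return n
}

// demo returns the number of skipped workspaces when the query is given t
// and when it is given currentTick.
func demo() (withT, withTick int) {
	t := at(9, 0, 30)
	currentTick := t.Truncate(time.Minute) // 09:00:00
	workspaces := []time.Time{at(8, 59, 0), at(9, 0, 15)}
	return skipped(workspaces, t, currentTick), skipped(workspaces, currentTick, currentTick)
}

func main() {
	withT, withTick := demo()
	fmt.Println("skipped when querying with t:", withT)
	fmt.Println("skipped when querying with currentTick:", withTick)
}
```

In this scenario the workspace due at 09:00:15 is fetched when the query uses `t` but is then rejected by the check; querying with `currentTick` keeps the two filters consistent.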
- Assert rbac in fake notifications enqueuer
- Move fake notifications enqueuer to separate notificationstest package
- Update dbauthz rbac policy to allow provisionerd and autostart to create and read notification messages
- Update tests as required
Joins in fields like `username`, `avatar_url`, `organization_name`,
`template_name` to `workspaces` via a **view**.
The view must be maintained moving forward, but this prevents needing to
add RBAC permissions to fetch related workspace fields.
When starting a workspace, if the deadline crosses an autostart boundary, the deadline is set to autostart + TTL.
This copies the behavior in `ActivityBumpWorkspace`, but does not require activity.
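A minimal sketch of the rule, assuming "crosses an autostart boundary" means that the next scheduled autostart falls after the build start and no later than the initially computed deadline. `computeDeadline`, `noAutostart`, and `at` are hypothetical names, and the exact boundary comparisons are an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// noAutostart is the zero time, meaning the workspace has no autostart.
var noAutostart time.Time

// at returns a fixed-date UTC timestamp; it exists only for this example.
func at(hour, minute int) time.Time {
	return time.Date(2025, 1, 1, hour, minute, 0, 0, time.UTC)
}

// computeDeadline starts from buildStart + ttl. If the next autostart falls
// inside that window, the deadline is moved to nextAutostart + ttl so the
// workspace is not stopped shortly after it would have been started anyway.
func computeDeadline(buildStart time.Time, ttl time.Duration, nextAutostart time.Time) time.Time {
	deadline := buildStart.Add(ttl)
	crosses := !nextAutostart.IsZero() &&
		nextAutostart.After(buildStart) &&
		!nextAutostart.After(deadline)
	if crosses {
		return nextAutostart.Add(ttl)
	}
	return deadline
}

func main() {
	ttl := 8 * time.Hour

	// Started manually at 07:00 with autostart at 09:00: the 15:00 deadline
	// crosses the boundary, so it becomes 09:00 + 8h.
	fmt.Println(computeDeadline(at(7, 0), ttl, at(9, 0)).Format("15:04"))

	// Autostart at 16:00 is outside the window, so the deadline stays put.
	fmt.Println(computeDeadline(at(7, 0), ttl, at(16, 0)).Format("15:04"))
}
```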
* chore: remove max_ttl from templates
Completely removing max_ttl as a feature on template scheduling. Must use other template scheduling features to achieve autostop.
* chore: add org ID as optional param to AcquireJob
* chore: plumb through organization id to provisioner daemons
* add org id to provisioner domain key
* enforce org id argument
* dbgen provisioner jobs defaults to default org
* fix: do not set max deadline for workspaces on template update
When templates are updated and schedule data is changed, we update all
running workspaces to have up-to-date scheduling information that sticks
to the new policy.
When updating the max_deadline for existing running workspaces, if the
max_deadline was before now()+2h we would set the max_deadline to
now()+2h.
Builds that don't/shouldn't have a max_deadline have it set to 0, which
is always before now()+2h, and thus would always have the max_deadline
updated.
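The failure mode can be reproduced with a few lines of Go. The "buggy" function follows the description above; the "guarded" variant shows one plausible way to skip builds without a max_deadline and is an assumption, since the actual fix may be structured differently. All names here are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// unset is the zero time, which marks a build with no max_deadline.
var unset time.Time

// at returns a fixed-date UTC timestamp; it exists only for this example.
func at(hour, minute int) time.Time {
	return time.Date(2025, 1, 1, hour, minute, 0, 0, time.UTC)
}

// bumpMaxDeadlineBuggy raises any max_deadline that is before now+2h. The
// zero time is always before now+2h, so unset values get a deadline too.
func bumpMaxDeadlineBuggy(maxDeadline, now time.Time) time.Time {
	floor := now.Add(2 * time.Hour)
	if maxDeadline.Before(floor) {
		return floor
	}
	return maxDeadline
}

// bumpMaxDeadlineGuarded leaves an unset max_deadline untouched and only
// applies the now+2h floor to builds that actually have one.
func bumpMaxDeadlineGuarded(maxDeadline, now time.Time) time.Time {
	if maxDeadline.IsZero() {
		return maxDeadline
	}
	return bumpMaxDeadlineBuggy(maxDeadline, now)
}

func main() {
	now := at(12, 0)
	fmt.Println("buggy sets a max_deadline:", !bumpMaxDeadlineBuggy(unset, now).IsZero())
	fmt.Println("guarded sets a max_deadline:", !bumpMaxDeadlineGuarded(unset, now).IsZero())
}
```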
* test: add unit test to exercise template schedule bug
---------
Co-authored-by: Steven Masley <stevenmasley@gmail.com>
Fixes #9823.
- Decomposes UpdateWorkspaceBuildByID into UpdateWorkspaceBuildProvisionerStateByID and UpdateWorkspaceBuildDeadlineByID.
- Replaces existing invocations of UpdateWorkspaceBuildByID with the newer queries where applicable.
- Modifies GetActiveWorkspaceBuildsByTemplateID to not return incomplete workspace builds.
This removes an indirect import of `coderd/database` from the CLI and
results in a logical separation between server-related and
general-purpose schedule code.
No size change (yet).
Ref: #9380
* chore: rename locked to dormant
- The following columns have been updated:
  - workspace.locked_at -> dormant_at
  - template.inactivity_ttl -> time_til_dormant
  - template.locked_ttl -> time_til_dormant_autodelete
This change has also been reflected in the SDK.
A route has also been updated from /workspaces/<id>/lock to /workspaces/<id>/dormant
* chore: add /v2 to import module path
Go modules require a major version suffix in the module path for versions v2 and above.
This was a mechanical update by running:
```
go install github.com/marwan-at-work/mod/cmd/mod@latest
mod upgrade
```
Migrate generated files to import /v2
* Fix gen