## Problem
With the new tasks data model, a task starts with an `initializing`
status. However, the API returns `current_state: null` to represent the
agent state, causing the frontend to display "No message available".
This PR updates `codersdk.Task` to return a `current_state` when the
task is initializing with meaningful messages about what's happening
during task initialization.
**Previous message**
<img width="2764" height="288" alt="Screenshot 2025-11-07 at 09 06 13"
src="https://github.com/user-attachments/assets/feec9f15-91ca-4378-8565-5f9de062d11a"
/>
**New message**
<img width="2726" height="226" alt="Screenshot 2025-11-12 at 11 00 15"
src="https://github.com/user-attachments/assets/2f9bee3e-7ac4-4382-b1c3-1d06bbc2906e"
/>
## Changes
- Populate `current_state` with descriptive initialization messages when
task status is `initializing` and no valid app status exists for the
current build
- **dbfake**: Fix `WorkspaceBuild` builder to properly handle
pending/running jobs by linking tasks without requiring agent/app
resources
**Note:** UI Storybook changes to reflect these new messages will be
addressed in a follow-up PR.
Closes: https://github.com/coder/internal/issues/1063
Adds some extra meta data sent to provisioners. Also adds a field
`reuse_terraform_workspace` to tell the provisioner whether or not to
use the caching experiment.
While investigating a flake I noticed that the dbfake workspace builder
executes all database inserts without a transaction. Since our real
wsbuilder implementation utilizes one it makes sense to do here as well.
For example, our normal workspace <-> build relationship is such that a
workspace cannot exist with at least one build. However, our
GetWorkspaces query left joins workspace builds but has types that are
non-nullable, leading to flakes like coder/internal#1103.
## Description
This PR introduces an optimization to automatically cancel pending
prebuild-related jobs from non-active template versions in the
reconciliation loop.
## Problem
Currently, when a template is configured with more prebuild instances
than available provisioners, the provisioner queue can become flooded
with pending prebuild jobs. This issue is worsened when
provisioning/deprovisioning operations take a long time.
When the prebuild reconciliation loop generates jobs faster than
provisioners can process them, pending jobs accumulate in the queue.
Since prebuilt workspaces should always run the latest active template
version, pending prebuild jobs from non-active versions become obsolete
once a new version is promoted.
## Solution
The reconciliation loop cancels pending prebuild-related jobs from
non-active template versions that match the following criteria:
* Build number: 1 (initial build created by the reconciliation loop)
* Job status: `pending`
* Not yet picked up by a provisioner (`worker_id` is `NULL`)
* Owned by the prebuilds system user
* Workspace transition: `start`
This prevents the queue from being cluttered with stale prebuild jobs
that would provision workspaces on an outdated template version that
would consequently need to be deprovisioned.
## Changes
* Added new SQL query `CountPendingNonActivePrebuilds` to identify
presets with pending jobs from non-active versions
* Added new SQL query `UpdatePrebuildProvisionerJobWithCancel` to cancel
jobs for a specific preset
* New reconciliation action type `ActionTypeCancelPending` handles the
cancellation logic
* Cancellation is non-blocking: failures to cancel prebuild jobs are
logged as errors and don't prevent other reconciliation actions
## Follow-up PR
Canceling pending prebuild jobs leaves workspaces in a Canceled state.
While no Terraform resources need to be destroyed (since jobs were
canceled before provisioning started), these database records should
still be cleaned up. This will be addressed in a follow-up PR.
Closes: https://github.com/coder/coder/issues/20242
relates to #778
Somehow in `TestWorkspaceAgent` the agent with the test instance identifier is not being added to the database, or is getting deleted.
I'm adding some additional logging to `dbfake` and setting the affected tests to dump postgres on error, to see if we can get to the bottom of the issue.
This PR sets a constraint of 1MB on the provisioner job logs written to
the database. This is consistent with the constraint we place on
workspace agent logs:
https://github.com/coder/coder/blob/4ac6be6d835dc36c242e35a26b584b784040bf28/coderd/database/dump.sql#L2030
It also adds a message printed to the front end about the provisioner
log overflow, and updates the message printed to the front end when
workspace startup logs exceed the max, as it was causing some customers
to think their startup script had failed to run.
## Description
This PR adds support for `description` and `icon` fields to
`template_version_presets`. These fields will allow displaying richer
information for presets in the UI, improving the user experience when
creating a workspace.
Both fields are optional, non-nullable, and default to empty strings.
## Changes
* Database migration with the addition of `description VARCHAR(128)` and
`icon VARCHAR(256)` columns to the `template_version_presets` table.
* Updated the `CreateWorkspacePageView` in the UI
Note: UI changes will be addressed in a separate PR
Closes https://github.com/coder/internal/issues/312
Depends on https://github.com/coder/terraform-provider-coder/pull/408
This PR adds support for defining an **autoscaling block** for
prebuilds, allowing number of desired instances to scale dynamically
based on a schedule.
Example usage:
```
data "coder_workspace_preset" "us-nix" {
...
prebuilds = {
instances = 0 # default to 0 instances
scheduling = {
timezone = "UTC" # a single timezone is used for simplicity
# Scale to 3 instances during the work week
schedule {
cron = "* 8-18 * * 1-5" # from 8AM–6:59PM, Mon–Fri, UTC
instances = 3 # scale to 3 instances
}
# Scale to 1 instance on Saturdays for urgent support queries
schedule {
cron = "* 8-14 * * 6" # from 8AM–2:59PM, Sat, UTC
instances = 1 # scale to 1 instance
}
}
}
}
```
### Behavior
- Multiple `schedule` blocks per `prebuilds` block are supported.
- If the current time matches any defined autoscaling schedule, the
corresponding number of instances is used.
- If no schedule matches, the **default instance count**
(`prebuilds.instances`) is used as a fallback.
### Why
This feature allows prebuild instance capacity to adapt to predictable
usage patterns, such as:
- Scaling up during business hours or high-demand periods
- Reducing capacity during off-hours to save resources
### Cron specification
The cron specification is interpreted as a **continuous time range.**
For example, the expression:
```
* 9-18 * * 1-5
```
is intended to represent a continuous range from **09:00 to 18:59**,
Monday through Friday.
However, due to minor implementation imprecision, it is currently
interpreted as a range from **08:59:00 to 18:58:59**, Monday through
Friday.
This slight discrepancy arises because the evaluation is based on
whether a specific **point in time** falls within the range, using the
`github.com/coder/coder/v2/coderd/schedule/cron` library, which performs
per-minute matching rather than strict range evaluation.
---------
Co-authored-by: Danny Kopping <danny@coder.com>
Deletion of data is uncommon in our database, so the introduction of sub agents
and the deletion of them introduced issues with foreign key assumptions, as can
be seen in coder/internal#685. We could have only addressed the specific case by
allowing cascade deletion of stats as well as handling in the stats collector,
but it's unclear how many more such edge-cases we could run into.
In this change, we mark the rows as deleted via boolean instead, and filter them
out in all relevant queries.
Fixescoder/internal#685
This pull request allows coder workspace agents to be reinitialized when
a prebuilt workspace is claimed by a user. This facilitates the transfer
of ownership between the anonymous prebuilds system user and the new
owner of the workspace.
Only a single agent per prebuilt workspace is supported for now, but
plumbing has already been done to facilitate the seamless transition to
multi-agent support.
---------
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
Deleted organizations are still attempting to sync members. This causes
an error on inserting the member, and would likely cause issues later in
the sync process even if that member is inserted. Deleted orgs should be
skipped.
- Refactors existing `mcp` package to use `kylecarbs/aisdk-go` and moves
to `codersdk/toolsdk` package.
- Updates existing MCP server implementation to use `codersdk/toolsdk`
Co-authored-by: Kyle Carberry <kyle@coder.com>
- Update go.mod to use Go 1.24.1
- Update GitHub Actions setup-go action to use Go 1.24.1
- Fix linting issues with golangci-lint by:
- Updating to golangci-lint v1.57.1 (more compatible with Go 1.24.1)
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
Underscores and double hyphens are now blocked. The regex is almost the
exact same as the `coder_app` `slug` regex, but uppercase characters are
still permitted.
The failure condition being fixed is `w1` and `w2` could belong
to different users, organizations, and templates and still cause a
serializable failure if run concurrently. This is because the old query
did a `seq scan` on the `workspace_builds` table. Since that is the
table being updated, we really want to prevent that.
So before this would fail for any 2 workspaces. Now it only fails if
`w1` and `w2` are owned by the same user and organization.
Closes#14716Closes#14717
Adds a new user-scoped tailnet API endpoint (`api/v2/tailnet`) with a new RPC stream for receiving updates on workspaces owned by a specific user, as defined in #14716.
When a stream is started, the `WorkspaceUpdatesProvider` will begin listening on the user-scoped pubsub events implemented in #14964. When a relevant event type is seen (such as a workspace state transition), the provider will query the DB for all the workspaces (and agents) owned by the user. This gets compared against the result of the previous query to produce a set of workspace updates.
Workspace updates can be requested for any user ID, however only workspaces the authorised user is permitted to `ActionRead` will have their updates streamed.
Opening a tunnel to an agent requires that the user can perform `ActionSSH` against the workspace containing it.
We currently send empty payloads to pubsub channels of the form `workspace:<workspace_id>` to notify listeners of updates to workspaces (such as for refreshing the workspace dashboard).
To support https://github.com/coder/coder/issues/14716, we'll instead send `WorkspaceEvent` payloads to pubsub channels of the form `workspace_owner:<owner_id>`. This enables a listener to receive events for all workspaces owned by a user.
This PR replaces the usage of the old channels without modifying any existing behaviors.
```
type WorkspaceEvent struct {
Kind WorkspaceEventKind `json:"kind"`
WorkspaceID uuid.UUID `json:"workspace_id" format:"uuid"`
// AgentID is only set for WorkspaceEventKindAgent* events
// (excluding AgentTimeout)
AgentID *uuid.UUID `json:"agent_id,omitempty" format:"uuid"`
}
```
We've defined `WorkspaceEventKind`s based on how the old channel was used, but it's not yet necessary to inspect the types of any of the events, as the existing listeners are designed to fire off any of them.
```
WorkspaceEventKindStateChange WorkspaceEventKind = "state_change"
WorkspaceEventKindStatsUpdate WorkspaceEventKind = "stats_update"
WorkspaceEventKindMetadataUpdate WorkspaceEventKind = "mtd_update"
WorkspaceEventKindAppHealthUpdate WorkspaceEventKind = "app_health"
WorkspaceEventKindAgentLifecycleUpdate WorkspaceEventKind = "agt_lifecycle_update"
WorkspaceEventKindAgentLogsUpdate WorkspaceEventKind = "agt_logs_update"
WorkspaceEventKindAgentConnectionUpdate WorkspaceEventKind = "agt_connection_update"
WorkspaceEventKindAgentLogsOverflow WorkspaceEventKind = "agt_logs_overflow"
WorkspaceEventKindAgentTimeout WorkspaceEventKind = "agt_timeout"
```
Joins in fields like `username`, `avatar_url`, `organization_name`,
`template_name` to `workspaces` via a **view**.
The view must be maintained moving forward, but this prevents needing to
add RBAC permissions to fetch related workspace fields.
* chore: create type for unique role names
Using `string` was confusing when something should be combined with
org context, and when not to. Naming this new name, "RoleIdentifier"
* chore: add org ID as optional param to AcquireJob
* chore: plumb through organization id to provisioner daemons
* add org id to provisioner domain key
* enforce org id argument
* dbgen provisioner jobs defaults to default org
Drop "New" and "Builder" from the function names, in favor of the top-level resource created. This shortens tests and gives a nice syntax. Since everything is a builder, the prefix and suffix don't add much value and just make things harder to read.
I've also chosen to leave `Do()` as the function to insert into the database. Even though it's a builder pattern, I fear `.Build()` might be confusing with Workspace Builds. One other idea is `Insert()` but if we later add dbfake functions that update, this might be inconsistent.
Convert to builder for consistency with rest of the package. This will make it easier to use, and means we can drop "Builder" from function arguments since they are all builders in the package.
I'd like to convert dbfake into a builder pattern to prevent a proliferation of XXXWithYYY methods. This is one step of the way by removing the Non-builder function.
Refactors SSH tests to skip provisionerd and instead use dbfake to insert workspaces and builds. This should make tests faster and more reliable.
dbfake.WorkspaceBuild is refactored to use a "builder" pattern with "fluent" options, as the number of options and variants was starting to get out of hand.
* feat: add dbfakedata for workspace builds and resources
This creates `coderdtest.NewWithDatabase` and adds a series of
helper functions to `dbfake` that insert structured fake data
for resources into the database.
It allows us to remove provisionerd from a significant amount of
tests which should speed them up and reduce flakes.
* Rename dbfakedata to dbfake
* Migrate workspaceagents_test.go to use the new dbfake
* Migrate agent_test.go to use the new fakes
* Fix comments