Commit Graph

135 Commits

Author SHA1 Message Date
Zach ddc0e99c69 chore: remove coder_secret Terraform integration (#25512)
Removes the coder_secret Terraform integration: the data.coder_secret
consumption path through provisionerdserver → provisioner.proto →
provisioner/terraform, the dynamic-parameter secret-requirement
validation, and the workspace-update / resolve-autostart surfaces that
depended on it. This is being done due to a product/feature direction
change (see PLAT-243). User-secret CRUD (DB, REST, CLI, UI, telemetry, audit)
and the agent-manifest secret-injection path are untouched.

The provisionerd API is bumped from v1.17 to v1.18 rather than rolled
back: v1.17 shipped in v2.33.x, so user_secrets field numbers are
reserved and the changelog documents both versions.

Generated with assistance from Coder Agents.
2026-05-21 09:19:29 -06:00
Zach 79735f2d45 feat: plumb user secrets through provisioner chain to terraform (#24542)
This change passes user secrets from coderd to the Terraform process at
workspace build time so the `data.coder_secret` data source in
terraform-provider-coder can resolve values at plan time.

Secrets traverse two proto hops: `provisionerdserver` fetches them
via`ListUserSecretsWithValues`, attaches them to
`AcquiredJob.WorkspaceBuild.user_secrets` on `provisionerd.proto`;
`runner.go` forwards into `PlanRequest.user_secrets` on
`provisioner.proto`; the Terraform provisioner encodes each as
`CODER_SECRET_ENV_<name>` or `CODER_SECRET_FILE_<hex(path)>` before
invoking `terraform plan`. Only plan requests carry secrets; apply runs
with `nil` because values are baked into plan state.

Fetch is gated on a workspace transitioning to start. stop and delete
transitions never carry secrets, so revoking or deleting a stored secret
cannot make a workspace unstoppable. DB errors on the fetch fail the job
outright rather than silently continuing with an empty secret set.

Note that user secrets will be stored in the workspace_builds table in
provisioner_state with other Terraform state (including other sensitive data).
2026-04-27 08:26:07 -06:00
Michael Suchacz e5707a13d6 feat: support multiple agents with shared instance-identity auth (#24325)
> This PR was authored by Mux on behalf of Mike.

## Summary

Adds support for multiple peer root workspace agents sharing the same
`auth_instance_id`, so AWS, Azure, and GCP instance-identity auth can
issue the correct session token for a selected agent instead of assuming
a
single root agent per instance.

## Problem

When a Terraform template attaches two or more `coder_agent` resources
(with `auth = "aws-instance-identity"`) to a single compute instance,
every agent shares the same cloud instance ID. The existing singular
lookup picks whichever agent was created most recently, silently
ignoring
the others.

## Solution

Introduce an optional pre-auth agent selector (`CODER_AGENT_NAME`) and
make the server-side lookup ambiguity-aware.

**Database layer:**
- `GetWorkspaceAgentsByInstanceID` (`:many`): returns all matching root
  agents for an instance ID.
- `GetWorkspaceAgentByInstanceIDAndName` (`:one`): returns the named
root
  agent for disambiguation.

**SDK and CLI:**
- `agent_name` field added to AWS, Azure, and GCP request structs
  (`omitempty` for backward compatibility).
- `CODER_AGENT_NAME` env var and `--agent-name` flag wired into the
agent
  bootstrap before instance-identity auth runs.

**Server handler (`handleAuthInstanceID`):**
- When `agent_name` is present: direct lookup by (instance ID, name).
- When absent: legacy lookup, then resource-scoped ambiguity check.
  Returns 409 with available agent names if multiple root agents match.
- Whitespace-only names are trimmed and treated as unspecified.
- Sub-agents remain excluded (`parent_id IS NULL` filter).

**Verification template:**
- `examples/templates/aws-multi-agent/` provisions one EC2 instance with
  two agents (`main` and `dev`), both using instance-identity auth with
  `CODER_AGENT_NAME` set in the cloud-init user data.

## Backward compatibility

Existing single-agent deployments work unchanged. The `agent_name` field
is optional with `omitempty`, and the unnamed path preserves today's
behavior when only one root agent matches.
2026-04-16 13:59:09 +02:00
Sas Swart 5b6b7719df fix: make prebuild claiming durable and idempotent (#23108)
## Problem

When a prebuilt workspace is claimed, the agent reinitializes via a
single fire-and-forget pubsub event over SSE. If the agent's SSE
connection is interrupted at claim time, the event is permanently lost —
the workspace is stuck with no self-healing path.

Additionally, regular (non-prebuild) workspaces had no way to opt out of
the `/reinit` polling loop — agents would reconnect indefinitely to an
endpoint that would never send them anything useful.

## Root Cause

`workspaceAgentReinit` fetches the workspace (with its current
`owner_id`) via `GetWorkspaceByAgentID`, but never checked whether a
claim already happened. It only subscribed to pubsub for future events.
The database already has durable claim state (`owner_id` changes from
`PrebuildsSystemUserID` to the real user), but no layer ever consulted
it on reconnection.

## Solution

### Server-side durable check with first-build-initiator gating

**TOCTOU-safe ordering**: Subscribe to pubsub claim events *before* any
durable checks, so a claim that fires during the check is buffered in
the channel rather than lost.

**First-build-initiator gating**: When `!workspace.IsPrebuild()` (owner
is no longer the system user), look up the first build's `InitiatorID`.
The prebuild reconciler always uses `PrebuildsSystemUserID` as the
initiator. This distinguishes claimed prebuilds from regular workspaces
without any SQL schema changes.

- **Regular workspace** (first build initiator ≠ system user) → **409
Conflict**, agent stops reconnecting
- **Claimed prebuild, build completed** → pre-seed channel with reinit
event and close it, transmitter delivers one-shot then exits
- **Claimed prebuild, build in-progress** → fall through to pubsub
subscription, agent waits for completion event
- **Unclaimed prebuild** → pubsub subscription (existing happy path)

### Declarative reinit events (defense-in-depth)

- Added `UserID` field to `ReinitializationEvent` with JSON tags
- Switched pubsub serialization from raw string to JSON (with
backward-compat fallback for rolling upgrades)
- Populated `UserID` at both the publish site and the durable check

### Agent SDK: 409 handling

`WaitForReinitLoop` detects 409 Conflict from the server and closes the
`reinitEvents` channel, cleanly exiting the retry goroutine.

### Agent CLI: fixed two bugs + added reinitCtx

- **Closed channel (`!ok`)**: now blocks on `<-ctx.Done()` instead of
`continue`, keeping the current agent running. Previously this would
leak agents by skipping `agnt.Close()` and re-entering the loop.
- **Duplicate owner reinit**: cancels `reinitCtx` (stops the reinit
goroutine), then blocks on `<-ctx.Done()`. Previously `continue` would
skip cleanup and create a new agent on the next loop iteration.
- **`reinitCtx`**: a cancellable child of `ctx` passed to
`WaitForReinitLoop`, allowing the agent to stop the reinit HTTP polling
after reinit completes.

### Agent-side idempotency

Tracks `lastOwnerID` in the agent reinit loop — duplicate events for the
same owner are skipped.

## Testing

- **"unclaimed prebuild receives reinit via pubsub"**: prebuild owned by
system user, pubsub event triggers reinit
- **"claimed prebuild receives one-shot reinit on reconnect"**: first
build by system user, owner changed, build completed → immediate reinit
(no pubsub needed)
- **"claimed prebuild waits during in-progress claim build"**: claimed
but build still running → no reinit until build completes
- **"regular workspace gets 409"**: first build by real user → 409
Conflict, agent stops polling
- Updated claim publisher/listener tests: verify `UserID` survives JSON
round-trip + backward compat with raw string payloads
- Updated SSE round-trip test: verify `UserID` survives transmit →
receive cycle

Fixes #22359

## Rolling upgrade note

During a rolling deploy where old coderd instances coexist with new
ones, the pubsub `ReinitializationEvent` has a new `workspace_id` field
(JSON key `workspace_id`). Old publishers send a raw reason string
instead of JSON; the new listener gracefully falls back by treating the
entire payload as the reason and filling in `WorkspaceID` from context.
The only visible effect during the upgrade window is that `WorkspaceID`
may be the zero UUID in agent-side logs — this is cosmetic and resolves
once all instances are updated.
2026-04-02 23:51:02 +02:00
Steven Masley 84de391f26 chore: add tallyman events for ai seat tracking (#22689)
AI seat tracking inserted as heartbeat into usage table.
2026-03-18 09:30:22 -05:00
Steven Masley 537260aa22 fix: early oidc refresh with fake idp tests (#22712)
Wrote unit tests that implement a fake idp to verify the oauth package
actually refreshes the token
2026-03-06 16:51:27 +00:00
Cian Johnston 81468323e0 fix(coderd): use dbtime.Now() instead of time.Now() in test assertions against DB timestamps (#22685)
`time.Now()` has nanosecond precision while Postgres timestamps are
microsecond precision. When tests compare `time.Now()` against
DB-sourced timestamps using `Before`/`After`/`WithinRange`/etc., there
is a non-zero flake risk from the precision mismatch.

This replaces `time.Now()` with `dbtime.Now()` (which rounds to
microsecond precision) in all test assertions that compare against
database timestamps.

Follows from #22684.

## Changes (11 files)

| File | Changes |
|---|---|
| `coderd/apikey_test.go` | 11 comparisons with `ExpiresAt` |
| `coderd/users_test.go` | 2 comparisons with `ExpiresAt` |
| `coderd/oauth2_test.go` | 1 comparison with `token.Expiry` |
| `coderd/workspaces_test.go` | 2 comparisons with `DormantAt` |
| `coderd/workspaceagents_test.go` | 3 comparisons with
`ConnectedAt`/`DisconnectedAt` |
| `coderd/workspaceapps/db_test.go` | 1 comparison with `token.Expiry` |
| `coderd/provisionerdserver/provisionerdserver_test.go` | 1 comparison
with `key.ExpiresAt` |
| `enterprise/coderd/workspaces_test.go` | 1 comparison with `DormantAt`
|
| `enterprise/coderd/license/license_test.go` | 3 `NotBefore` values |
| `enterprise/coderd/licenses_test.go` | 2 `NotBefore` values |
| `enterprise/coderd/users_test.go` | 3 `Next()` comparisons |

## Not changed (intentionally)

- `scaletest/placebo/run_test.go` — compares wall-clock elapsed time,
not DB timestamps
- `cli/server_test.go`, `coderd/jwtutils/jwt_test.go`,
`enterprise/aibridgeproxyd/aibridgeproxyd_test.go` — TLS cert fields,
not DB-stored
- `coderd/azureidentity/azureidentity_test.go` — Azure cert expiry, not
DB


🤖 Generated by Claude Opus 4.6 but reviewed manually.
2026-03-06 09:14:11 +00:00
Jon Ayers 0a7a3da178 fix: exclude provisioner_state from workspace_build_with_user view (#22159)
The provisioner state for a workspace build was being loaded for every
long-lived agent rpc connection. Since this state can be anywhere from
kilobytes to megabytes this can gradually cause the `coderd` memory
footprint to grow over time. It's also a lot of unnecessary allocations
for every query that fetches a workspace build since only a few callers
ever actually reference the provisioner state.

This PR removes it from the returned workspace build and adds a query to
fetch the provisioner state explicitly.
2026-02-23 22:46:17 -06:00
Danielle Maywood 911d734df9 fix: avoid re-using AuthInstanceID for sub agents (#22196)
Parent agents were re-using AuthInstanceID when spawning child agents.
This caused GetWorkspaceAgentByInstanceID to return the most recently
created sub agent instead of the parent when the parent tried to refetch
its own manifest.

Fix by not reusing AuthInstanceID for sub agents, and updating
GetWorkspaceAgentByInstanceID to filter them out entirely.
2026-02-19 16:56:29 +00:00
Danielle Maywood 37aecda165 feat(coderd/provisionerdserver): insert sub agent resource (#21699)
Update provisionerdserver to handle the changes introduced to
provisionerd in https://github.com/coder/coder/pull/21602

We now create a relationship between `workspace_agent_devcontainers` and
`workspace_agents` with the newly created `subagent_id`.
2026-01-30 17:19:19 +00:00
Steven Masley e13f2a9869 chore: remove extra stop_modules from provisionerd proto (#21706)
Was a duplicate of start_modules

Closes https://github.com/coder/coder/issues/21206
2026-01-28 09:25:47 -06:00
Cian Johnston 7b44976618 fix(coderd/provisionerdserver): correct managed agent tracking (#21696)
Relates to https://github.com/coder/internal/issues/1282

Updates tracking of managed agents to be predicated instead on the
presence of a related `task_id` instead of the presence of a
`coder_ai_task` resource.
2026-01-27 12:14:52 +00:00
Steven Masley 89f4d60e7b chore: remove experiment "terraform-directory-reuse" (#21397)
Experiment is no longer required, the new method will be released without an experiment and without a toggle

Main PR is: https://github.com/coder/coder/pull/21398
2026-01-09 11:13:16 -06:00
Spike Curtis bddb808b25 chore: arrange imports in a standard way (#21452)
Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example:

```
import (
	"context"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"golang.org/x/xerrors"
	"gopkg.in/natefinch/lumberjack.v2"

	"cdr.dev/slog/v3"
	"github.com/coder/coder/v2/codersdk/agentsdk"
	"github.com/coder/serpent"
)
```

3 groups: standard library, 3rd partly libs, Coder libs.

This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.
2026-01-08 15:24:11 +04:00
Spike Curtis 49b34a716a fix: fix slog to always use array of Fields (#21426)
Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptible call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder).

It also updates dependencies that also use slog and were updated.

I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule.

Other dependencies, I pushed new tags.
2026-01-08 10:29:41 +04:00
Steven Masley 9ca5b44b56 chore: implement persistent terraform directories (experimental) (#20563)
Prior to this, every workspace build ran `terraform init` in a fresh
directory. This would mean the `modules` are downloaded fresh. If the
module is not pinned, subsequent workspace builds would have different
modules.
2025-11-13 07:50:17 -06:00
Steven Masley 04727c06e8 chore: add experiment toggle for terraform workspace caching (#20559)
Experiments passed to provisioners to determine behavior. This adds
`--experiments` flag to provisioner daemons. Prior to this, provisioners
had no method to turn on/off experiments.
2025-11-12 14:26:15 -06:00
Steven Masley 9149c1e9f2 chore: append template metadata to protobuf config (#20558)
Adds some extra meta data sent to provisioners. Also adds a field
`reuse_terraform_workspace` to tell the provisioner whether or not to
use the caching experiment.
2025-11-12 12:46:39 -06:00
Mathias Fredriksson ce04f6cc5d fix(coderd): remove deprecated AITaskSidebarApp column (#20680)
This column was no longer used in `v2.28` and the codersdk field
deprecated. Both can now be dropped in `v2.29`.

Closes coder/internal#974
2025-11-07 12:45:45 +02:00
Mathias Fredriksson a6b0eae38d refactor(coderd): drop sidebar app constraint and simplify provisionerdserver for tasks (#20591)
Updates coder/internal#973
Updates coder/internal#974
2025-11-03 13:46:38 +02:00
Cian Johnston 1961252918 chore(coderd/provisionerdserver): address flake in TestServer_ExpirePrebuildsSessionToken (#20648)
Addresses a flake seen locally by @mafredri:

```
panic: interface conversion: proto.isAcquiredJob_Type is nil, not *proto.AcquiredJob_WorkspaceBuild_ [recovered]
        panic: interface conversion: proto.isAcquiredJob_Type is nil, not *proto.AcquiredJob_WorkspaceBuild_

goroutine 77 [running]:
testing.tRunner.func1.2({0x35ba440, 0xc000f15620})
        /usr/local/go/src/testing/testing.go:1734 +0x21c
testing.tRunner.func1()
        /usr/local/go/src/testing/testing.go:1737 +0x35e
panic({0x35ba440?, 0xc000f15620?})
        /usr/local/go/src/runtime/panic.go:792 +0x132
github.com/coder/coder/v2/coderd/provisionerdserver_test.TestServer_ExpirePrebuildsSessionToken(0xc00010d500)                                 /home/coder/coder/coderd/provisionerdserver/provisionerdserver_test.go:4128 +0xc4b
testing.tRunner(0xc00010d500, 0x4bd8450)
        /usr/local/go/src/testing/testing.go:1792 +0xf4
created by testing.(*T).Run in goroutine 1
        /usr/local/go/src/testing/testing.go:1851 +0x413
FAIL    github.com/coder/coder/v2/coderd/provisionerdserver     20.830s
FAIL
```

It's unclear why this would happen in the first place.
2025-11-03 11:39:02 +00:00
Danielle Maywood 5a31c590e6 fix(coderd/provisionerdserver): pipe through task id and prompt (#20408)
Pipes through the Task's ID and prompt into the provisioner. This is
required to support the new `coder_ai_task.prompt` field and modified
`coder_ai_task.id` field.
2025-10-24 09:43:48 +01:00
Cian Johnston dc6e50d6b7 feat(coderd/telemetry): add telemetry for database Tasks (#20279)
Adds Tasks to telemetry snapshots

Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
2025-10-17 10:48:56 +01:00
Mathias Fredriksson a8f87c2625 feat(coderd): implement task to app linking (#20237)
This change adds workspace build/agent/app linking to tasks and wires it
into `wsbuilder` and `provisionerdserver`.

Closes coder/internal#948
Closes coder/coder#20212
Closes coder/coder#19773
2025-10-13 12:57:06 +03:00
Cian Johnston 06cbb2890f fix: expire token for prebuilds user when regenerating session token (#19667)
* provisionerdserver: Expires prebuild user token for workspace, if it
exists, when regenerating session token.
* dbauthz: disallow prebuilds user from creating api keys
* dbpurge: added functionality to expire stale api keys owned by the
prebuilds user
2025-09-02 09:38:43 +01:00
Susana Ferreira 0ab345ca84 feat: add prebuild timing metrics to Prometheus (#19503)
## Description

This PR introduces one counter and two histograms related to workspace
creation and claiming. The goal is to provide clearer observability into
how workspaces are created (regular vs prebuild) and the time cost of
those operations.

### `coderd_workspace_creation_total`

* Metric type: Counter
* Name: `coderd_workspace_creation_total`
* Labels: `organization_name`, `template_name`, `preset_name`

This counter tracks whether a regular workspace (not created from a
prebuild pool) was created using a preset or not.
Currently, we already expose `coderd_prebuilt_workspaces_claimed_total`
for claimed prebuilt workspaces, but we lack a comparable metric for
regular workspace creations. This metric fills that gap, making it
possible to compare regular creations against claims.

Implementation notes:
* Exposed as a `coderd_` metric, consistent with other workspace-related
metrics (e.g. `coderd_api_workspace_latest_build`:
https://github.com/coder/coder/blob/main/coderd/prometheusmetrics/prometheusmetrics.go#L149).
* Every `defaultRefreshRate` (1 minute ), DB query
`GetRegularWorkspaceCreateMetrics` is executed to fetch all regular
workspaces (not created from a prebuild pool).
* The counter is updated with the total from all time (not just since
metric introduction). This differs from the histograms below, which only
accumulate from their introduction forward.

### `coderd_workspace_creation_duration_seconds` &
`coderd_prebuilt_workspace_claim_duration_seconds`

* Metric types: Histogram
* Names:
  * `coderd_workspace_creation_duration_seconds`
* Labels: `organization_name`, `template_name`, `preset_name`, `type`
(`regular`, `prebuild`)
  * `coderd_prebuilt_workspace_claim_duration_seconds`
    * Labels: `organization_name`, `template_name`, `preset_name`

We already have `coderd_provisionerd_workspace_build_timings_seconds`,
which tracks build run times for all workspace builds handled by the
provisioner daemon.
However, in the context of this issue, we are only interested in
creation and claim build times, not all transitions; additionally, this
metric does not include `preset_name`, and adding it there would
significantly increase cardinality. Therefore, separate more focused
metrics are introduced here:
* `coderd_workspace_creation_duration_seconds`: Build time to create a
workspace (either a regular workspace or the build into a prebuild pool,
for prebuild initial provisioning build).
* `coderd_prebuilt_workspace_claim_duration_seconds`: Time to claim a
prebuilt workspace from the pool.

The reason for two separate histograms is that:
* Creation (regular or prebuild): provisioning builds with similar time
magnitude, generally expected to take longer than a claim operation.
* Claim: expected to be a much faster provisioning build.

#### Native histogram usage

Provisioning times vary widely between projects. Using static buckets
risks unbalanced or poorly informative histograms.
To address this, these metrics use [Prometheus native
histograms](https://prometheus.io/docs/specs/native_histograms/):
* First introduced in Prometheus v2.40.0
* Recommended stable usage from v2.45+
* Requires Go client `prometheus/client_golang` v1.15.0+
* Experimental and must be explicitly enabled on the server
(`--enable-feature=native-histograms`)

For compatibility, we also retain a classic bucket definition (aligned
with the existing provisioner metric:
https://github.com/coder/coder/blob/main/provisionerd/provisionerd.go#L182-L189).
* If native histograms are enabled, Prometheus ingests the
high-resolution histogram.
* If not, it falls back to the predefined buckets.

Implementation notes:
* Unlike the counter, these histograms are updated in real-time at
workspace build job completion.
* They reflect data only from the point of introduction forward (no
historical backfill).

## Relates to 

Closes: https://github.com/coder/coder/issues/19528
Native histograms tested in observability stack:
https://github.com/coder/observability/pull/50
2025-08-28 15:00:26 +01:00
Cian Johnston bd139f3a43 fix(coderd/provisionerdserver): workaround lack of coder_ai_task resource on stop transition (#19560)
This works around the issue where a task may "disappear" on stop.
Re-using the previous value of `has_ai_task` and `sidebar_app_id` on a
stop transition.

---------

Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
2025-08-27 10:33:17 +01:00
Dean Sheather 1a601c30ad chore: move usage types to new package (#19103) 2025-08-20 23:48:38 +10:00
Dean Sheather 6eb02d1c2a chore: wire up usage tracking for managed agents (#19096)
Wires up the usage collector and publisher to coderd.

Relates to coder/internal#814
2025-08-20 23:38:09 +10:00
Cian Johnston 155c7bbc65 chore(coderd/provisionerdserver): avoid fk error on invalid ai_task_sidebar_app_id (#19253)
This is a workaround for https://github.com/coder/coder/issues/18776

We avoid the foreign key issue by checking the previously inserted
workspace applications before calling UpdateWorkspaceAITask. Now,
affected workspaces will show as "not running an AI task" on the single
task view, which is technically correct.

We also insert a provisioner job log at WARN level to ensure that the
user sees some information that they have run into this issue, as well
as logging on the server side.

Longer term, we plan to modify how the workspace tasks view is
presented. This is a stopgap measure until we solidify that plan.

NOTE: this does **not** address the fact that stopping a workspace with
`has_ai_task: true` will result in the completed stop build no longer
having `has_ai_task: true`, resulting in tasks "disappearing" on stop.
2025-08-08 18:06:32 +01:00
Cian Johnston b0ba798f78 chore(coderd/provisionerdserver): remove dangerous usage of dbtestutil.DisableForeignKeysAndTriggers (#19122)
May have caught https://github.com/coder/coder/issues/18776
2025-08-04 21:51:33 +01:00
Benjamin Peinhardt e4dc2d9418 fix: add constraint and runtime check for provisioner logs size limit (#18893)
This PR sets a constraint of 1MB on the provisioner job logs written to
the database. This is consistent with the constraint we place on
workspace agent logs:
https://github.com/coder/coder/blob/4ac6be6d835dc36c242e35a26b584b784040bf28/coderd/database/dump.sql#L2030

It also adds a message printed to the front end about the provisioner
log overflow, and updates the message printed to the front end when
workspace startup logs exceed the max, as it was causing some customers
to think their startup script had failed to run.
2025-07-30 19:09:53 -05:00
Danny Kopping 0238f2926d feat: persist AI task state in template imports & workspace builds (#18449) 2025-06-24 10:36:37 +00:00
ケイラ fae30a00fd chore: remove unnecessary redeclarations in for loops (#18440) 2025-06-20 13:16:55 -06:00
Hugo Dutka 910858b731 chore(coderd/provisionerdserver): convert dbmem tests to use postgres (#18278) 2025-06-09 10:05:29 +02:00
Steven Masley 0428c5ec1c chore: include 'everyone' group in template importing (#18257) 2025-06-05 19:25:36 +00:00
Steven Masley 8387dd27ab chore: add form_type parameter argument to db (#17920)
`form_type` is a new parameter field in the terraform provider. Bring
that field into coder/coder.

Validation for `multi-select` has also been added.
2025-05-29 08:55:19 -05:00
Cian Johnston 776c144128 fix(coderd): ensure agent timings are non-zero on insert (#18065)
Relates to https://github.com/coder/coder/issues/15432

Ensures that no workspace build timings with zero values for started_at or ended_at are inserted into the DB or returned from the API.
2025-05-29 13:36:06 +01:00
Michael Suchacz b7462fb256 feat: improve transaction safety in CompleteJob function (#17970)
This PR refactors the CompleteJob function to use database transactions
more consistently for better atomicity guarantees. The large function
was broken down into three specialized handlers:

- completeTemplateImportJob
- completeWorkspaceBuildJob
- completeTemplateDryRunJob

Each handler now uses the Database.InTx wrapper to ensure all database
operations for a job completion are performed within a single
transaction, preventing partial updates in case of failures.

Added comprehensive tests for transaction behavior for each job type.

Fixes #17694

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-05-21 16:48:51 +02:00
Steven Masley 789c4beba7 chore: add dynamic parameter error if missing metadata from provisioner (#17809) 2025-05-14 12:21:36 -05:00
Danny Kopping 6e967780c9 feat: track resource replacements when claiming a prebuilt workspace (#17571)
Closes https://github.com/coder/internal/issues/369

We can't know whether a replacement (i.e. drift of terraform state
leading to a resource needing to be deleted/recreated) will take place
apriori; we can only detect it at `plan` time, because the provider
decides whether a resource must be replaced and it cannot be inferred
through static analysis of the template.

**This is likely to be the most common gotcha with using prebuilds,
since it requires a slight template modification to use prebuilds
effectively**, so let's head this off before it's an issue for
customers.

Drift details will now be logged in the workspace build logs:


![image](https://github.com/user-attachments/assets/da1988b6-2cbe-4a79-a3c5-ea29891f3d6f)

Plus a notification will be sent to template admins when this situation
arises:


![image](https://github.com/user-attachments/assets/39d555b1-a262-4a3e-b529-03b9f23bf66a)

A new metric - `coderd_prebuilt_workspaces_resource_replacements_total`
- will also increment each time a workspace encounters replacements.

We only track _that_ a resource replacement occurred, not how many. Just
one is enough to ruin a prebuild, but we can't know apriori which
replacement would cause this.
For example, say we have 2 replacements: a `docker_container` and a
`null_resource`; we don't know which one might
cause an issue (or indeed if either would), so we just track the
replacement.

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
2025-05-14 14:52:22 +02:00
Sas Swart 425ee6fa55 feat: reinitialize agents when a prebuilt workspace is claimed (#17475)
This pull request allows coder workspace agents to be reinitialized when
a prebuilt workspace is claimed by a user. This facilitates the transfer
of ownership between the anonymous prebuilds system user and the new
owner of the workspace.

Only a single agent per prebuilt workspace is supported for now, but
plumbing has already been done to facilitate the seamless transition to
multi-agent support.

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
2025-05-14 14:15:36 +02:00
Danielle Maywood 0b5f27f566 feat: add parent_id column to workspace_agents table (#17758)
Adds a new nullable column `parent_id` to `workspace_agents` table. This
lays the groundwork for having child agents.
2025-05-13 00:01:31 +01:00
Danny Kopping af2941bb92 feat: add is_prebuild_claim to distinguish post-claim provisioning (#17757)
Used in combination with
https://github.com/coder/terraform-provider-coder/pull/396

This is required by both https://github.com/coder/coder/pull/17475 and
https://github.com/coder/coder/pull/17571

Operators may need to conditionalize their templates to perform certain
operations once a prebuilt workspace has been claimed. This value will
**only** be set once a claim takes place and a subsequent `terraform
apply` occurs. Any `terraform apply` runs thereafter will be
indistinguishable from a normal run on a workspace.

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
2025-05-12 14:19:03 +00:00
Steven Masley ea65ddc17d fix: correct user roles being passed into terraform context (#17460)
Roles were being passed into the workspace context incorrectly. Site
wide scopes were being org scoped. Roles outside the org should also not
be sent.
2025-04-17 15:42:23 -05:00
ケイラ f670bc31f5 chore: update testutil chan helpers (#17408) 2025-04-16 10:37:09 -06:00
Sas Swart a98605913a feat: mark prebuilds as such and set their preset ids (#16965)
This pull request closes https://github.com/coder/internal/issues/513
2025-04-14 15:34:50 +02:00
Sas Swart 0b2b643ce2 feat: persist prebuild definitions on template import (#16951)
This PR allows provisioners to recognise and report prebuild definitions
to the coder control plane. It also allows the coder control plane to
then persist these to its store.

closes https://github.com/coder/internal/issues/507

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: evgeniy-scherbina <evgeniy.shcherbina.es@gmail.com>
2025-04-07 10:35:28 +02:00
Mathias Fredriksson 5c8cac9fb7 feat: add name to workspace agent devcontainers (#17089)
In the presence of multiple devcontainers, it would be nice to
differentiate them by name. This change inherits the resource name from
terraform.

Refs #17076
2025-03-25 12:59:20 +00:00
ケイラ 5b3eda6719 chore: persist template import terraform plan in postgres (#17012) 2025-03-24 10:01:50 -06:00