`NewDataBuilder` allocated `make([]byte, 0, req.FileSize)` using the
client-supplied `int64` with no upper-bound check. The DRPC 4 MiB wire
cap limits message size but not the integer value, so a crafted message
with `FileSize = 1<<40` forces a 1 TiB allocation, triggering an
unrecoverable `runtime.throw` that kills the entire `coderd` process.
Add a `MaxFileSize` constant (100 MiB, matching `HTTPFileMaxBytes` in
`coderd/files.go`) and reject negative or oversized `FileSize`, plus
negative or excessive `Chunks`, before the allocation.
`BytesToDataUpload` also returns an error for oversized data to preserve
the encode/decode round-trip contract. Fix a pre-existing reversed
subtraction in the `Add()` overflow error message.
Closes https://linear.app/codercom/issue/PLAT-231
<details>
<summary>Implementation details</summary>
- `provisionersdk/proto/dataupload.go`: New exported `MaxFileSize`
constant; validation in `NewDataBuilder` and `BytesToDataUpload`. Fixed
reversed subtraction in `Add()` error.
- `provisionersdk/proto/dataupload_test.go`: New
`TestNewDataBuilderValidation` with 7 subtests.
- Updated all 5 callers of `BytesToDataUpload` for new error return.
- Audited all `make([]byte, ...)` in provisioner paths; no other
client-supplied sizes.
</details>
> Generated by Coder Agents on behalf of @f0ssel
My agent added `//nolint:testpackage` to a test file on one of my PRs.
Again. This PR cleans it up across the entire repo and updates the
in-repo conventions so future agents stop doing it.
The repo already has a precedent for white-box tests that need to touch
unexported symbols: `*_internal_test.go` (145+ existing files). The
`testpackage` linter's default `skip-regexp` exempts that filename
suffix, so the `//nolint:testpackage` directive is unnecessary in every
case where someone reached for it. This PR renames 51 such files to
`*_internal_test.go` via `git mv` so blame and history follow, and
strips the dead directive from 2 files that were already correctly named
(`coderd/oauth2provider/authorize_internal_test.go`,
`coderd/x/chatd/advisor_internal_test.go`).
`.claude/docs/TESTING.md` now documents the rule explicitly under *Test
Package Naming*, which is imported into the root `AGENTS.md` via
`@.claude/docs/TESTING.md`. The rule: prefer `package foo_test`; if you
need internal access, rename the file to `*_internal_test.go` rather
than adding a nolint directive.
Removes the coder_secret Terraform integration: the data.coder_secret
consumption path through provisionerdserver → provisioner.proto →
provisioner/terraform, the dynamic-parameter secret-requirement
validation, and the workspace-update / resolve-autostart surfaces that
depended on it. This is being done due to a product/feature direction
change (see PLAT-243). User-secret CRUD (DB, REST, CLI, UI, telemetry, audit)
and the agent-manifest secret-injection path are untouched.
The provisionerd API is bumped from v1.17 to v1.18 rather than rolled
back: v1.17 shipped in v2.33.x, so user_secrets field numbers are
reserved and the changelog documents both versions.
Generated with assistance from Coder Agents.
## Summary
When a workspace build fails because the user is over their group quota,
the chat tools currently surface the failure as a bare `"workspace build
failed: insufficient quota"` string with no machine-readable error code
and no visibility into the user's current usage. Agents and the UI
cannot distinguish quota failures from any other Terraform error, so
users see an opaque message and have no clear path to recovery.
This PR tags quota failures with a typed error code at the source and
propagates it through the chat tool layer so callers can react to it
explicitly.
Relates to CODAGT-20
## Changes
**Provisioner runner**
- Add `InsufficientQuotaErrorCode = "INSUFFICIENT_QUOTA"` and set it
explicitly at the `commitQuota` failure site via a new
`failedWorkspaceBuildfCode` helper, so `provisioner_jobs.error_code` is
populated only on the genuine quota path. The substring matcher used for
externally produced sentinels (e.g. `"missing parameter"`, `"required
template variables"`) is intentionally not extended; provider errors
that happen to mention "insufficient quota" stay classified as generic
build failures.
**SDK and API contract**
- Add `JobErrorCodeInsufficientQuota` and a
`JobIsInsufficientQuotaErrorCode` helper to `codersdk`.
- Extend the swagger `enums` tag on `ProvisionerJob.ErrorCode` to
include `INSUFFICIENT_QUOTA`.
- Regenerate `coderd/apidoc`, `docs/reference/api/*`, and
`site/src/api/typesGenerated.ts`.
**chattool create_workspace / start_workspace**
- `waitForBuild` now returns a typed `*workspaceBuildError` carrying
both the message and the `JobErrorCode`, instead of a bare error string.
- New `quotaerror.go` introduces a structured `quotaErrorResult` (with
`error_code`, `title`, `message`, `build_id`, and optional `quota`) and
a best-effort `workspaceQuotaDetails` lookup that wraps owner
authorization internally and fetches `credits_consumed` and `budget`
from the database. Quota lookup failures (including authorization
failures) never block the failure payload.
- On quota-coded build failures, both `create_workspace` and
`start_workspace` now return the structured response (with the recovery
guidance inlined into `message`) instead of the bare `"insufficient
quota"` string. This applies to all three failure paths: post-creation,
an in-progress existing build, and a freshly triggered start build.
Non-quota build failures continue to use the existing
`buildToolResponse` / `newBuildError` path.
- Owner authorization is wrapped only on the call sites that need it
(the `CreateFn` and `StartFn` invocations and the quota-detail lookup),
so idempotent fast paths (already running, already in progress,
existing-workspace early returns) do not pay for an extra RBAC
round-trip or fail when role lookup is transient.
## Out of scope
- No changes to quota math, allowances, or bypass behavior.
- No automatic retries.
- No new quota-inspection tools and no changes to MCP
`coder_create_workspace` (which returns immediately and never observed
the build outcome here).
- No frontend UI changes; those will land in a follow-up PR that
consumes the new `INSUFFICIENT_QUOTA` code.
This change passes user secrets from coderd to the Terraform process at
workspace build time so the `data.coder_secret` data source in
terraform-provider-coder can resolve values at plan time.
Secrets traverse two proto hops: `provisionerdserver` fetches them
via`ListUserSecretsWithValues`, attaches them to
`AcquiredJob.WorkspaceBuild.user_secrets` on `provisionerd.proto`;
`runner.go` forwards into `PlanRequest.user_secrets` on
`provisioner.proto`; the Terraform provisioner encodes each as
`CODER_SECRET_ENV_<name>` or `CODER_SECRET_FILE_<hex(path)>` before
invoking `terraform plan`. Only plan requests carry secrets; apply runs
with `nil` because values are baked into plan state.
Fetch is gated on a workspace transitioning to start. stop and delete
transitions never carry secrets, so revoking or deleting a stored secret
cannot make a workspace unstoppable. DB errors on the fetch fail the job
outright rather than silently continuing with an empty secret set.
Note that user secrets will be stored in the workspace_builds table in
provisioner_state with other Terraform state (including other sensitive data).
## Description
Implements the server-side merge logic for the `merge_strategy`
attribute added to `coder_env` in [terraform-provider-coder
v2.15.0](https://github.com/coder/terraform-provider-coder/pull/489).
This allows template authors to control how duplicate environment
variable names are combined across multiple `coder_env` resources.
Relates to https://github.com/coder/coder/issues/21885
## Supported strategies
| Strategy | Behavior |
|----------|----------|
| `replace` (default) | Last value wins — backward compatible |
| `append` | Joins values with `:` separator (e.g. PATH additions) |
| `prepend` | Prepends value with `:` separator |
| `error` | Fails the build if the variable is already defined |
## Example
```hcl
resource "coder_env" "path_tools" {
agent_id = coder_agent.dev.id
name = "PATH"
value = "/home/coder/tools/bin"
merge_strategy = "append"
}
```
## Changes
- **Proto**: Added `merge_strategy` field to `Env` message in
`provisioner.proto`
- **State reader**: Updated `agentEnvAttributes` struct and proto
construction in `resources.go`
- **Merge logic**: Added `mergeExtraEnvs()` function in
`provisionerdserver.go` with strategy-aware merging for both agent envs
and devcontainer subagent envs
- **Tests**: 15 unit tests covering all strategies, edge cases (empty
values, mixed strategies, multiple appends)
- **Dependency**: Bumped `terraform-provider-coder` v2.14.0 → v2.15.0
- **Fixtures**: Updated `duplicate-env-keys` test fixtures and golden
files
## Ordering
When multiple resources `append` or `prepend` to the same key, they are
processed in alphabetical order by Terraform resource address (per the
determinism fix in #22706).
# What this does
Dynamic parameters caches the `./terraform/modules` directory for parameter usage. What this PR does is send over this archive to the provisioner when building workspaces.
This allow terraform to skip downloading modules from their registries, a step that takes seconds.
<img width="1223" height="429" alt="Screenshot From 2025-12-29 12-57-52" src="https://github.com/user-attachments/assets/16066e0a-ac79-4296-819d-924f4b0418dc" />
# Wire protocol
The wire protocol reuses the same mechanism used to download the modules `provisoner -> coder`. It splits up large archives into multiple protobuf messages so larger archives can be sent under the message size limit.
# 🚨 Behavior Change (Breaking Change) 🚨
**Before this PR** modules were downloaded on every workspace build. This means unpinned modules always fetched the latest version
**After this PR** modules are cached at template import time, and their versions are effectively pinned for all subsequent workspace builds.
**This is just the protobuf changes for the PR https://github.com/coder/coder/pull/21398**
Moved `UploadFileRequest` from `provisionerd.proto` -> `provisioner.proto`.
Renamed to `FileUpload` because it is now bi-directional.
This **is backwards compatible**. I tested it to confirm the payloads are identical. Types were just renamed and moved around.
```golang
func TestTypeUpgrade(t *testing.T) {
t.Parallel()
x := &proto2.UploadFileRequest{
Type: &proto2.UploadFileRequest_ChunkPiece{
ChunkPiece: &proto.ChunkPiece{
Data: []byte("Hello World!"),
FullDataHash: []byte("Foobar"),
PieceIndex: 42,
},
},
}
data, err := protobuf.Marshal(x)
require.NoError(t, err)
// Exactly the same output
// EhgKDEhlbGxvIFdvcmxkIRIGRm9vYmFyGCo= on `main`
// EhgKDEhlbGxvIFdvcmxkIRIGRm9vYmFyGCo= on this branch
fmt.Println(base64.StdEncoding.EncodeToString(data))
}
```
# What this does
This allows provisioner daemons to download files from `coderd`'s `files` table. This is used to send over cached module files and prevent the need of downloading these modules on each workspace build.
Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example:
```
import (
"context"
"time"
"github.com/prometheus/client_golang/prometheus"
"golang.org/x/xerrors"
"gopkg.in/natefinch/lumberjack.v2"
"cdr.dev/slog/v3"
"github.com/coder/coder/v2/codersdk/agentsdk"
"github.com/coder/serpent"
)
```
3 groups: standard library, 3rd partly libs, Coder libs.
This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.
Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptible call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder).
It also updates dependencies that also use slog and were updated.
I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule.
Other dependencies, I pushed new tags.
Provisioner steps broken into smaller granular actions.
Changes:
- `ExtractArchive` moved to `init` request (was in `configure`)
- Writing `tfstate` moved to `plan` (was in `configure`)
- Moved most plan/apply outputs to `GraphComplete`
Prior to this, every workspace build ran `terraform init` in a fresh
directory. This would mean the `modules` are downloaded fresh. If the
module is not pinned, subsequent workspace builds would have different
modules.
Adds some extra meta data sent to provisioners. Also adds a field
`reuse_terraform_workspace` to tell the provisioner whether or not to
use the caching experiment.
Refactors all Terraform file path logic into a centralized tfpath package. This consolidates all path construction into a single, testable Layout type.
Instead of passing around `string` for directories, pass around the `Layout` which has the file location methods on it.
Closes https://github.com/coder/internal/issues/978
- Introduce `CODER_TASK_ID` and `CODER_TASK_PROMPT` to the provisioner
environment
- Make use of new `app_id` field in provider, with a fallback to
`sidebar_app.id` for backwards compatibility
**For now** I've left the `taskPrompt` and `taskID` as a TODO as we do
not yet create these values.
This pull request introduces support for external workspace management, allowing users to register and manage workspaces that are provisioned and managed outside of the Coder.
Depends on: https://github.com/coder/terraform-provider-coder/pull/424
* GET /api/v2/init-script - Gets the agent initialization script
* By default, it returns a script for Linux (amd64), but with query parameters (os and arch) you can get the init script for different platforms
* GET /api/v2/workspaces/{workspace}/external-agent/{agent}/credentials - Gets credentials for an external agent **(enterprise)**
* Updated queries to filter workspaces/templates by the has_external_agent field
## Description
This PR adds support for `description` and `icon` fields to
`template_version_presets`. These fields will allow displaying richer
information for presets in the UI, improving the user experience when
creating a workspace.
Both fields are optional, non-nullable, and default to empty strings.
## Changes
* Database migration with the addition of `description VARCHAR(128)` and
`icon VARCHAR(256)` columns to the `template_version_presets` table.
* Updated the `CreateWorkspacePageView` in the UI
Note: UI changes will be addressed in a separate PR
Closes https://github.com/coder/internal/issues/312
Depends on https://github.com/coder/terraform-provider-coder/pull/408
This PR adds support for defining an **autoscaling block** for
prebuilds, allowing number of desired instances to scale dynamically
based on a schedule.
Example usage:
```
data "coder_workspace_preset" "us-nix" {
...
prebuilds = {
instances = 0 # default to 0 instances
scheduling = {
timezone = "UTC" # a single timezone is used for simplicity
# Scale to 3 instances during the work week
schedule {
cron = "* 8-18 * * 1-5" # from 8AM–6:59PM, Mon–Fri, UTC
instances = 3 # scale to 3 instances
}
# Scale to 1 instance on Saturdays for urgent support queries
schedule {
cron = "* 8-14 * * 6" # from 8AM–2:59PM, Sat, UTC
instances = 1 # scale to 1 instance
}
}
}
}
```
### Behavior
- Multiple `schedule` blocks per `prebuilds` block are supported.
- If the current time matches any defined autoscaling schedule, the
corresponding number of instances is used.
- If no schedule matches, the **default instance count**
(`prebuilds.instances`) is used as a fallback.
### Why
This feature allows prebuild instance capacity to adapt to predictable
usage patterns, such as:
- Scaling up during business hours or high-demand periods
- Reducing capacity during off-hours to save resources
### Cron specification
The cron specification is interpreted as a **continuous time range.**
For example, the expression:
```
* 9-18 * * 1-5
```
is intended to represent a continuous range from **09:00 to 18:59**,
Monday through Friday.
However, due to minor implementation imprecision, it is currently
interpreted as a range from **08:59:00 to 18:58:59**, Monday through
Friday.
This slight discrepancy arises because the evaluation is based on
whether a specific **point in time** falls within the range, using the
`github.com/coder/coder/v2/coderd/schedule/cron` library, which performs
per-minute matching rather than strict range evaluation.
---------
Co-authored-by: Danny Kopping <danny@coder.com>
This PR implements protobuf streaming to handle large module files by:
1. **Streaming large payloads**: When module files exceed the 4MB limit,
they're streamed in chunks using a new UploadFile RPC method
2. **Database storage**: Streamed files are stored in the database and
referenced by hash for deduplication
3. **Backward compatibility**: Small module files continue using the
existing direct payload method
## Summary
This PR introduces support for expiration policies in prebuilds. The TTL
(time-to-live) is retrieved from the Terraform configuration
([terraform-provider-coder
PR](https://github.com/coder/terraform-provider-coder/pull/404)):
```
prebuilds = {
instances = 2
expiration_policy {
ttl = 86400
}
}
```
**Note**: Since there is no need for precise TTL enforcement down to the
second, in this implementation expired prebuilds are handled in a single
reconciliation cycle: they are deleted, and new instances are created
only if needed to match the desired count.
## Changes
* The outcome of a reconciliation cycle is now expressed as a slice of
reconciliation actions, instead of a single aggregated action.
* Adjusted reconciliation logic to delete expired prebuilds and
guarantee that the number of desired instances is correct.
* Updated relevant data structures and methods to support expiration
policies parameters.
* Added documentation to `Prebuilt workspaces` page
* Update `terraform-provider-coder` to version 2.5.0:
https://github.com/coder/terraform-provider-coder/releases/tag/v2.5.0
Depends on: https://github.com/coder/terraform-provider-coder/pull/404
Fixes: https://github.com/coder/coder/issues/17916
`v1.5` is going out with release `v2.22`
I had to reorder `module_files` and `resource_replacements` because of
this.
---------
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Closes https://github.com/coder/internal/issues/369
We can't know whether a replacement (i.e. drift of terraform state
leading to a resource needing to be deleted/recreated) will take place
apriori; we can only detect it at `plan` time, because the provider
decides whether a resource must be replaced and it cannot be inferred
through static analysis of the template.
**This is likely to be the most common gotcha with using prebuilds,
since it requires a slight template modification to use prebuilds
effectively**, so let's head this off before it's an issue for
customers.
Drift details will now be logged in the workspace build logs:

Plus a notification will be sent to template admins when this situation
arises:

A new metric - `coderd_prebuilt_workspaces_resource_replacements_total`
- will also increment each time a workspace encounters replacements.
We only track _that_ a resource replacement occurred, not how many. Just
one is enough to ruin a prebuild, but we can't know apriori which
replacement would cause this.
For example, say we have 2 replacements: a `docker_container` and a
`null_resource`; we don't know which one might
cause an issue (or indeed if either would), so we just track the
replacement.
---------
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
This pull request allows coder workspace agents to be reinitialized when
a prebuilt workspace is claimed by a user. This facilitates the transfer
of ownership between the anonymous prebuilds system user and the new
owner of the workspace.
Only a single agent per prebuilt workspace is supported for now, but
plumbing has already been done to facilitate the seamless transition to
multi-agent support.
---------
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
fixes https://github.com/coder/internal/issues/584
Ignore canceled error when sending an acquired job, since dRPC is racy and will sometimes return this error even after successfully sending the job, if the test is quickly finished.
- Update go.mod to use Go 1.24.1
- Update GitHub Actions setup-go action to use Go 1.24.1
- Fix linting issues with golangci-lint by:
- Updating to golangci-lint v1.57.1 (more compatible with Go 1.24.1)
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
This change allows specifying devcontainers in terraform and plumbs it
through to the agent via agent manifest.
This will be used for autostarting devcontainers in a workspace.
Depends on coder/terraform-provider-coder#368
Updates #16423