
72 Commits

Author SHA1 Message Date
Zach 091d31224d fix: replace moby/moby namesgenerator with internal implementation (#21377)
Replace the external moby/moby/pkg/namesgenerator dependency with an
internal implementation using gofakeit/v7. The moby package has ~25k
unique name combinations, and with its retry parameter only adds a
random digit 0-9, giving ~250k possibilities. In parallel tests, this
has led to collisions (flakes).

The new internal API at coderd/util/namesgenerator eliminates the
external dependency and offers functions with explicit uniqueness
guarantees. This PR also consolidates fragmented name generation in a
few places to use the new package.

| Old (moby/moby)                     | New                    |
|-------------------------------------|------------------------|
| namesgenerator.GetRandomName(0)     | NameWith("_")          |
| namesgenerator.GetRandomName(>0)    | NameDigitWith("_")     |
| testutil.GetRandomName(t)           | UniqueName()           |
| testutil.GetRandomNameHyphenated(t) | UniqueNameWith("-")    |

namesgenerator package API:
- NameWith(delim): random name, not unique
- NameDigitWith(delim): random name with 1-9 suffix, not unique
- UniqueName(): guaranteed unique via atomic counter
- UniqueNameWith(delim): unique with custom delimiter

Names continue to be docker style `[adjective][delim][surname]`. Unique
names are truncated to 32 characters (preserving the numeric suffix) to
fit common name length limits in Coder.
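A minimal sketch of how an atomic counter plus suffix-preserving truncation can guarantee uniqueness, assuming tiny hypothetical word lists in place of gofakeit/v7 and assuming the counter is appended directly after the surname. `nameWith`, `uniqueNameWith`, `adjectives`, `surnames` and `maxNameLen` are illustrative names, not the package's real code.

```go
package main

import (
	"fmt"
	"math/rand"
	"strconv"
	"sync/atomic"
)

// Hypothetical word lists; the real package draws its words from gofakeit/v7.
var (
	adjectives = []string{"admiring", "brave", "clever", "eloquent"}
	surnames   = []string{"lovelace", "hopper", "turing", "ritchie"}
)

// maxNameLen mirrors the 32-character limit described above.
const maxNameLen = 32

// counter is process-wide, so every uniqueNameWith call gets a distinct
// suffix, even across parallel tests.
var counter atomic.Int64

// nameWith returns "[adjective][delim][surname]". It is NOT unique.
func nameWith(delim string) string {
	return adjectives[rand.Intn(len(adjectives))] + delim + surnames[rand.Intn(len(surnames))]
}

// uniqueNameWith appends the counter value and, if needed, truncates the word
// part (never the numeric suffix) so the result fits in maxNameLen bytes.
func uniqueNameWith(delim string) string {
	suffix := strconv.FormatInt(counter.Add(1), 10)
	base := nameWith(delim)
	if room := maxNameLen - len(suffix); len(base) > room {
		base = base[:room]
	}
	return base + suffix
}

func main() {
	fmt.Println(nameWith("_"))       // e.g. brave_turing
	fmt.Println(uniqueNameWith("-")) // e.g. clever-hopper1
}
```

Because the word lists and delimiter contain no digits, the trailing digit run is exactly the counter, so two different counter values can never yield the same string, whatever words were drawn.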

Related test flakes:
https://github.com/coder/internal/issues/1212
https://github.com/coder/internal/issues/118
https://github.com/coder/internal/issues/1068
2026-01-09 15:40:26 -07:00
Spike Curtis bddb808b25 chore: arrange imports in a standard way (#21452)
Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example:

```
import (
	"context"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"golang.org/x/xerrors"
	"gopkg.in/natefinch/lumberjack.v2"

	"cdr.dev/slog/v3"
	"github.com/coder/coder/v2/codersdk/agentsdk"
	"github.com/coder/serpent"
)
```

Three groups: standard library, third-party libraries, and Coder libraries.
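The grouping rule in the example can be sketched as a small classifier. This assumes standard-library paths are those whose first path element has no dot, and that "Coder libraries" means paths under `github.com/coder/` or `cdr.dev/` as in the example block; `importGroup` is a hypothetical helper, not the formatter the follow-up PR adds.

```go
package main

import (
	"fmt"
	"strings"
)

// importGroup returns 0 for standard library, 1 for third-party and 2 for
// Coder imports. The Coder prefixes are an assumption based on the example.
func importGroup(path string) int {
	first, _, _ := strings.Cut(path, "/")
	switch {
	case !strings.Contains(first, "."):
		// Standard-library paths have no dot in their first element.
		return 0
	case strings.HasPrefix(path, "github.com/coder/"), strings.HasPrefix(path, "cdr.dev/"):
		return 2
	default:
		return 1
	}
}

func main() {
	for _, p := range []string{"context", "golang.org/x/xerrors", "cdr.dev/slog/v3", "github.com/coder/serpent"} {
		fmt.Println(importGroup(p), p)
	}
}
```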

This PR makes the change across the codebase. The PR above it in the stack updates our formatting to maintain this layout; it is a separate PR so that it can be reviewed in detail.
2026-01-08 15:24:11 +04:00
Spike Curtis 49b34a716a fix: fix slog to always use array of Fields (#21426)
Upgrades to slog v3, which includes a small but backward-incompatible change to the acceptable call arguments when logging. This change lets compile-time type checking verify that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder).
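To illustrate why a typed field slice catches mistakes at compile time, here is a hedged sketch contrasting an untyped `...any` signature (where a bad argument only fails at run time) with a `...Field` signature. `Field`, `F`, `info` and `infoAny` are hypothetical stand-ins, not slog's actual API.

```go
package main

import (
	"fmt"
	"strings"
)

// Field is a hypothetical structured-logging field.
type Field struct {
	Name  string
	Value any
}

// F builds a Field.
func F(name string, value any) Field { return Field{Name: name, Value: value} }

// infoAny mimics an untyped variadic API: infoAny("msg", "port") compiles,
// but the type assertion below panics at run time.
func infoAny(msg string, args ...any) string {
	var sb strings.Builder
	sb.WriteString(msg)
	for _, a := range args {
		f := a.(Field) // panics if a is not a Field
		fmt.Fprintf(&sb, " %s=%v", f.Name, f.Value)
	}
	return sb.String()
}

// info accepts only Fields, so info("msg", "port") is a compile-time error.
func info(msg string, fields ...Field) string {
	var sb strings.Builder
	sb.WriteString(msg)
	for _, f := range fields {
		fmt.Fprintf(&sb, " %s=%v", f.Name, f.Value)
	}
	return sb.String()
}

func main() {
	fmt.Println(info("listening", F("port", 8080))) // listening port=8080
}
```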

It also bumps the dependencies that use slog, which were updated alongside it.

I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule.

For the other dependencies, I pushed new tags.
2026-01-08 10:29:41 +04:00
Steven Masley 04727c06e8 chore: add experiment toggle for terraform workspace caching (#20559)
Experiments are passed to provisioners to determine their behavior. This
adds an `--experiments` flag to provisioner daemons. Prior to this,
provisioners had no way to turn experiments on or off.
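A minimal sketch of how a provisioner might turn an `--experiments` value into a lookup set, assuming a comma-separated flag value. The `experiments` type, `parseExperiments` and the experiment names are hypothetical; the daemon's real flag parsing is not shown in this log.

```go
package main

import (
	"fmt"
	"strings"
)

// experiments is a hypothetical set of experiment names enabled on a
// provisioner daemon.
type experiments map[string]bool

// parseExperiments splits a comma-separated flag value such as
// "workspace-caching,other", ignoring blanks and surrounding whitespace.
func parseExperiments(value string) experiments {
	out := experiments{}
	for _, name := range strings.Split(value, ",") {
		if name = strings.TrimSpace(name); name != "" {
			out[name] = true
		}
	}
	return out
}

// Enabled reports whether the named experiment was turned on.
func (e experiments) Enabled(name string) bool { return e[name] }

func main() {
	exps := parseExperiments("workspace-caching, other")
	fmt.Println(exps.Enabled("workspace-caching"), exps.Enabled("missing")) // true false
}
```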
2025-11-12 14:26:15 -06:00
Susana Ferreira 0ab345ca84 feat: add prebuild timing metrics to Prometheus (#19503)
## Description

This PR introduces one counter and two histograms related to workspace
creation and claiming. The goal is to provide clearer observability into
how workspaces are created (regular vs prebuild) and the time cost of
those operations.

### `coderd_workspace_creation_total`

* Metric type: Counter
* Name: `coderd_workspace_creation_total`
* Labels: `organization_name`, `template_name`, `preset_name`

This counter tracks whether a regular workspace (not created from a
prebuild pool) was created using a preset or not.
Currently, we already expose `coderd_prebuilt_workspaces_claimed_total`
for claimed prebuilt workspaces, but we lack a comparable metric for
regular workspace creations. This metric fills that gap, making it
possible to compare regular creations against claims.

Implementation notes:
* Exposed as a `coderd_` metric, consistent with other workspace-related
metrics (e.g. `coderd_api_workspace_latest_build`:
https://github.com/coder/coder/blob/main/coderd/prometheusmetrics/prometheusmetrics.go#L149).
* Every `defaultRefreshRate` (1 minute), the DB query
`GetRegularWorkspaceCreateMetrics` is executed to fetch all regular
workspaces (those not created from a prebuild pool).
* The counter is updated with the total from all time (not just since
metric introduction). This differs from the histograms below, which only
accumulate from their introduction forward.

### `coderd_workspace_creation_duration_seconds` &
`coderd_prebuilt_workspace_claim_duration_seconds`

* Metric types: Histogram
* Names:
  * `coderd_workspace_creation_duration_seconds`
    * Labels: `organization_name`, `template_name`, `preset_name`, `type`
      (`regular`, `prebuild`)
  * `coderd_prebuilt_workspace_claim_duration_seconds`
    * Labels: `organization_name`, `template_name`, `preset_name`

We already have `coderd_provisionerd_workspace_build_timings_seconds`,
which tracks build run times for all workspace builds handled by the
provisioner daemon.
However, in the context of this issue, we are only interested in
creation and claim build times, not all transitions; additionally, this
metric does not include `preset_name`, and adding it there would
significantly increase cardinality. Therefore, separate more focused
metrics are introduced here:
* `coderd_workspace_creation_duration_seconds`: Build time to create a
workspace (either a regular workspace, or the initial provisioning build
of a prebuilt workspace into the pool).
* `coderd_prebuilt_workspace_claim_duration_seconds`: Time to claim a
prebuilt workspace from the pool.

The reason for two separate histograms is that:
* Creation (regular or prebuild): provisioning builds with similar time
magnitude, generally expected to take longer than a claim operation.
* Claim: expected to be a much faster provisioning build.

#### Native histogram usage

Provisioning times vary widely between projects. Using static buckets
risks unbalanced or poorly informative histograms.
To address this, these metrics use [Prometheus native
histograms](https://prometheus.io/docs/specs/native_histograms/):
* First introduced in Prometheus v2.40.0
* Recommended stable usage from v2.45+
* Requires Go client `prometheus/client_golang` v1.15.0+
* Experimental and must be explicitly enabled on the server
(`--enable-feature=native-histograms`)

For compatibility, we also retain a classic bucket definition (aligned
with the existing provisioner metric:
https://github.com/coder/coder/blob/main/provisionerd/provisionerd.go#L182-L189).
* If native histograms are enabled, Prometheus ingests the
high-resolution histogram.
* If not, it falls back to the predefined buckets.
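Native histograms avoid hand-picked buckets by using exponential bucket boundaries. Per the Prometheus native histogram spec, for schema `s` the base is `2^(2^-s)` and bucket `i` covers `(base^(i-1), base^i]`. The stdlib-only sketch below computes those indices for positive observations only (the zero bucket and negative buckets are omitted); it illustrates the bucket layout and is not client_golang's internal code.

```go
package main

import (
	"fmt"
	"math"
)

// bucketIndex returns the index i of the native-histogram bucket holding a
// positive observation v: base^(i-1) < v <= base^i, base = 2^(2^-schema).
func bucketIndex(v float64, schema int) int {
	return int(math.Ceil(math.Log2(v) * math.Exp2(float64(schema))))
}

// upperBound returns base^i for the given schema.
func upperBound(i, schema int) float64 {
	return math.Exp2(float64(i) / math.Exp2(float64(schema)))
}

func main() {
	// Schema 3: base = 2^(1/8), about 1.09, so each bucket is roughly 9%
	// wider than the previous one, whether builds take seconds or minutes.
	for _, secs := range []float64{0.5, 3, 45, 900} {
		i := bucketIndex(secs, 3)
		fmt.Printf("%6.1fs -> bucket %3d, upper bound %.3fs\n", secs, i, upperBound(i, 3))
	}
}
```

Relative resolution stays constant across the whole range, which is why widely varying provisioning times still land in informative buckets.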

Implementation notes:
* Unlike the counter, these histograms are updated in real-time at
workspace build job completion.
* They reflect data only from the point of introduction forward (no
historical backfill).

## Relates to 

Closes: https://github.com/coder/coder/issues/19528
Native histograms tested in observability stack:
https://github.com/coder/observability/pull/50
2025-08-28 15:00:26 +01:00
Dean Sheather 6eb02d1c2a chore: wire up usage tracking for managed agents (#19096)
Wires up the usage collector and publisher to coderd.

Relates to coder/internal#814
2025-08-20 23:38:09 +10:00
Steven Masley 1d1070d051 chore: ensure proper rbac permissions on 'Acquire' file in the cache (#18348)
The file cache was caching `Unauthorized` errors if a user without the
right perms opened the file first, so all subsequent opens would fail.

Now the cache always opens files with a subject that can read them, and
authz is checked per user on each Acquire.
2025-06-16 13:40:45 +00:00
Mathias Fredriksson 70723d3b51 fix(coderd): fix panics by always checking for non-nil request logger (#18228) 2025-06-12 13:50:50 +03:00
Steven Masley 789c4beba7 chore: add dynamic parameter error if missing metadata from provisioner (#17809) 2025-05-14 12:21:36 -05:00
Danny Kopping 6e967780c9 feat: track resource replacements when claiming a prebuilt workspace (#17571)
Closes https://github.com/coder/internal/issues/369

We can't know whether a replacement (i.e. drift of terraform state
leading to a resource needing to be deleted/recreated) will take place
a priori; we can only detect it at `plan` time, because the provider
decides whether a resource must be replaced and it cannot be inferred
through static analysis of the template.

**This is likely to be the most common gotcha with using prebuilds,
since it requires a slight template modification to use prebuilds
effectively**, so let's head this off before it's an issue for
customers.

Drift details will now be logged in the workspace build logs:


![image](https://github.com/user-attachments/assets/da1988b6-2cbe-4a79-a3c5-ea29891f3d6f)

Plus a notification will be sent to template admins when this situation
arises:


![image](https://github.com/user-attachments/assets/39d555b1-a262-4a3e-b529-03b9f23bf66a)

A new metric, `coderd_prebuilt_workspaces_resource_replacements_total`,
will also be incremented each time a workspace encounters replacements.

We only track _that_ a resource replacement occurred, not how many. Just
one is enough to ruin a prebuild, but we can't know a priori which
replacement would cause this.
For example, say we have 2 replacements: a `docker_container` and a
`null_resource`; we don't know which one might
cause an issue (or indeed if either would), so we just track the
replacement.
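A sketch of detecting replacements from a plan, assuming Terraform's JSON plan format (`terraform show -json`), where a replaced resource has `change.actions` of `["delete","create"]` or `["create","delete"]`. The types and helpers are hypothetical, and Coder's provisioner may detect drift differently. Like the metric, the caller only cares whether the list is non-empty.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// plan models only the parts of the JSON plan output used here.
type plan struct {
	ResourceChanges []struct {
		Address string `json:"address"`
		Change  struct {
			Actions []string `json:"actions"`
		} `json:"change"`
	} `json:"resource_changes"`
}

// isReplace reports whether an action list is a replacement: delete+create,
// in either order depending on create_before_destroy.
func isReplace(actions []string) bool {
	return len(actions) == 2 &&
		((actions[0] == "delete" && actions[1] == "create") ||
			(actions[0] == "create" && actions[1] == "delete"))
}

// replacedResources returns the addresses of resources the plan would replace.
func replacedResources(planJSON []byte) ([]string, error) {
	var p plan
	if err := json.Unmarshal(planJSON, &p); err != nil {
		return nil, err
	}
	var out []string
	for _, rc := range p.ResourceChanges {
		if isReplace(rc.Change.Actions) {
			out = append(out, rc.Address)
		}
	}
	return out, nil
}

func main() {
	raw := []byte(`{"resource_changes":[
		{"address":"docker_container.workspace","change":{"actions":["delete","create"]}},
		{"address":"null_resource.init","change":{"actions":["create","delete"]}},
		{"address":"docker_volume.home","change":{"actions":["no-op"]}}]}`)
	replaced, err := replacedResources(raw)
	if err != nil {
		panic(err)
	}
	// Only the fact that a replacement happened matters, not how many.
	if len(replaced) > 0 {
		fmt.Println("replacements detected:", replaced)
	}
}
```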

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
2025-05-14 14:52:22 +02:00
Steven Masley 64807e1d61 chore: apply the 4mb max limit on drpc protocol message size (#17771)
Respects the 4 MB maximum size limit on proto messages.
2025-05-13 11:24:51 -05:00
Michael Suchacz 06d39151dc feat: extend request logs with auth & DB info (#17304)
Closes #16903
2025-04-15 13:27:23 +02:00
Michael Suchacz ce22de8d15 feat: log long-lived connections acceptance (#17219)
Closes #16904
2025-04-08 08:30:05 +00:00
Cian Johnston 95363c9041 fix(enterprise/coderd): remove useless provisioner daemon id from request (#16723)
`ServeProvisionerDaemonRequest` has had an ID field for quite a while
now.
This field is only used for telemetry purposes; the actual daemon ID is
created upon insertion in the database. There's no reason to set it, and
it's confusing to do so. Deprecating the field and removing references
to it.
2025-02-27 09:08:08 +00:00
Mathias Fredriksson 071bb26018 feat(coderd): add endpoint to list provisioner daemons (#16028)
Updates #15190
Updates #15084
Supersedes #15940
2025-01-14 16:40:26 +00:00
Spike Curtis 2c7f8ac65f chore: migrate to coder/websocket 1.8.12 (#15898)
Migrates us to `coder/websocket` v1.8.12 rather than `nhooyr/websocket` on an older version.

Works around https://github.com/coder/websocket/issues/504 by adding an explicit test for `xerrors.Is(err, io.EOF)` where we were previously getting `io.EOF` from the netConn.
2024-12-19 00:51:30 +04:00
Danielle Maywood 0896f339c4 refactor(coderd/provisionerdserver): use quartz.Clock instead of TimeNowFn (#15642)
Replace `TimeNowFn` in `provisionerdserver` with `quartz.Clock` as
well as pass `coderd`'s `Clock` to `provisionerdserver`.
2024-11-25 16:25:36 +00:00
Sas Swart 814dd6f854 feat(coderd): update API to allow filtering provisioner daemons by tags (#15448)
This PR adds new parameters to an endpoint; they will be needed for
#15048.
2024-11-15 11:33:22 +02:00
Garrett Delfosse 50124fefdc feat: remove org flag requirement for provisioners (#14722) 2024-09-20 12:45:31 -04:00
Garrett Delfosse 335eb05223 feat: add keys to organization provision daemons (#14627) 2024-09-16 20:02:08 +00:00
Steven Masley 93eef7b542 chore: keep entitlements in the options only, simplify fields (#14434)
* chore: refactor entitlements to keep it in just the options

Duplicating the reference did not feel valuable, just confusing
2024-08-26 13:05:03 -05:00
Steven Masley af125c3795 chore: refactor entitlements to be a safe object to use (#14406)
* chore: refactor entitlements to be passable as an argument

Previously, all usage of entitlements required locking a mutex on the
api struct directly. This prevented passing the entitlements to
a subpackage. It also created the possibility for misuse.
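A minimal sketch of an entitlements object that owns its own lock, so a `*Entitlements` can be handed to another package without exposing the API struct's mutex. `Entitlements`, `Update`, `Enabled` and the feature name are hypothetical; the real type tracks far more than a boolean per feature.

```go
package main

import (
	"fmt"
	"sync"
)

// Entitlements guards its state internally; callers never touch the mutex.
type Entitlements struct {
	mu       sync.RWMutex
	features map[string]bool
}

func NewEntitlements() *Entitlements {
	return &Entitlements{features: map[string]bool{}}
}

// Update replaces the feature set with a private copy, so later changes to
// the caller's map cannot race with readers.
func (e *Entitlements) Update(features map[string]bool) {
	cp := make(map[string]bool, len(features))
	for k, v := range features {
		cp[k] = v
	}
	e.mu.Lock()
	defer e.mu.Unlock()
	e.features = cp
}

// Enabled reads under a read lock and is safe from any goroutine.
func (e *Entitlements) Enabled(feature string) bool {
	e.mu.RLock()
	defer e.mu.RUnlock()
	return e.features[feature]
}

func main() {
	ent := NewEntitlements()
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = ent.Enabled("external_provisioner_daemons")
		}()
	}
	ent.Update(map[string]bool{"external_provisioner_daemons": true})
	wg.Wait()
	fmt.Println(ent.Enabled("external_provisioner_daemons")) // true
}
```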
2024-08-23 16:21:58 -05:00
Garrett Delfosse 2279441517 feat: add --key flag to provisionerd start (#14002) 2024-07-25 15:26:26 -04:00
Garrett Delfosse ca83017dc1 feat: accept provisioner keys for provisioner auth (#13972) 2024-07-25 10:22:55 -04:00
Garrett Delfosse 0a07c7e554 feat: get org scoped provisioners (#13953) 2024-07-23 14:56:46 +00:00
Danny Kopping 1691768fb9 chore: use store enqueuer with external provisioners (#13881) 2024-07-12 13:51:13 +02:00
Danny Kopping bdd2caf95d feat: implement thin vertical slice of system-generated notifications (#13537) 2024-07-08 15:38:50 +02:00
Steven Masley cb6b5e8fbd chore: push rbac actions to policy package (#13274)
Just moved `rbac.Action` -> `policy.Action`. This is for the stacked PR to not have circular dependencies when doing autogen. Without this, the autogen can produce broken golang code, which prevents the autogen from compiling.

So this just avoids circular dependencies. I'm doing this in its own PR to reduce the LoC diff in the primary PR, since it has no functional changes.
2024-05-15 09:46:35 -05:00
Steven Masley eeb3d63be6 chore: merge authorization contexts (#12816)
* chore: merge authorization contexts

Instead of 2 auth contexts from apikey and dbauthz, merge them to
just use dbauthz. It is annoying to have two.

* fixup authorization reference
2024-03-29 10:14:27 -05:00
Steven Masley f0f9569d51 chore: enforce that provisioners can only acquire jobs in their own organization (#12600)
* chore: add org ID as optional param to AcquireJob
* chore: plumb through organization id to provisioner daemons
* add org id to provisioner domain key
* enforce org id argument
* dbgen provisioner jobs defaults to default org
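The rule being enforced, sketched in memory: a daemon only receives jobs whose organization matches its own. The PR does this by passing the organization ID to `AcquireJob`; the `job`, `daemon` and `acquire` names below are hypothetical and omit tags, locking and the database entirely.

```go
package main

import "fmt"

// job and daemon are hypothetical, trimmed-down records.
type job struct {
	ID    int
	OrgID string
}

type daemon struct {
	Name  string
	OrgID string
}

// acquire returns the first queued job in the daemon's own organization.
// Jobs from other organizations are skipped, never handed out.
func acquire(d daemon, queue []job) (job, bool) {
	for _, j := range queue {
		if j.OrgID == d.OrgID {
			return j, true
		}
	}
	return job{}, false
}

func main() {
	queue := []job{{ID: 1, OrgID: "org-b"}, {ID: 2, OrgID: "org-a"}}
	j, ok := acquire(daemon{Name: "d1", OrgID: "org-a"}, queue)
	fmt.Println(j.ID, ok) // 2 true
}
```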
2024-03-18 12:48:13 -05:00
Steven Masley b5f866c1cb chore: add organization_id column to provisioner daemons (#12356)
* chore: add organization_id column to provisioner daemons
* Update upsert to include organization id on set
2024-03-06 12:04:50 -06:00
Steven Masley 5c6974e55f feat: implement provisioner auth middleware and proper org params (#12330)
* feat: provisioner auth in mw to allow ExtractOrg

Step to enable org scoped provisioner daemons

* chore: handle default org handling for provisioner daemons
2024-03-04 15:15:41 -06:00
Kayla Washburn-Love 475c3650ca feat: add support for optional external auth providers (#12021) 2024-02-21 11:18:38 -07:00
Cian Johnston 643c3ee54b refactor(provisionerd): move provisionersdk.VersionCurrent -> provisionerdproto.VersionCurrent (#12225) 2024-02-20 12:44:19 +00:00
Cian Johnston a2cbb0f87f fix(enterprise/coderd): check provisionerd API version on connection (#12191) 2024-02-16 18:43:07 +00:00
Spike Curtis 1f5a6d59ba chore: consolidate websocketNetConn implementations (#12065)
Consolidates websocketNetConn from multiple packages in favor of a central one in codersdk
2024-02-09 11:39:08 +04:00
Cian Johnston 04fd96a014 feat(coderd): add provisioner_daemons to /debug/health endpoint (#11393)
Adds a healthcheck for provisioner daemons to /debug/health endpoint.
2024-01-08 09:29:04 +00:00
Cian Johnston 1ef96022b0 feat(coderd): add provisioner build version and api_version on serve (#11369)
* assert provisioner daemon version and api_version in unit tests
* add build info in HTTP header, extract codersdk.BuildVersionHeader
* add api_version to codersdk.ProvisionerDaemon
* testutil.MustString -> testutil.MustRandString
2024-01-03 09:01:57 +00:00
Cian Johnston 213b768785 feat(coderd): insert provisioner daemons (#11207)
* Adds UpdateProvisionerDaemonLastSeenAt
* Adds heartbeat to provisioner daemons
* Inserts provisioner daemons to database upon start
* Ensures TagOwner is an empty string and not nil
* Adds COALESCE() in idx_provisioner_daemons_name_owner_key
2023-12-18 16:44:52 +00:00
Cian Johnston b02796655e fix(coderd/database): remove column updated_at from provisioner_daemons table (#11108) 2023-12-12 11:19:28 +00:00
Cian Johnston 2b19a2369f chore(coderd): move provisionerd tags to provisionersdk (#11100) 2023-12-08 12:10:25 +00:00
Cian Johnston 1e349f0d50 feat(cli): allow specifying name of provisioner daemon (#11077)
- Adds a --name argument to provisionerd start
- Plumbs through name to integrated and external provisioners
- Defaults to hostname if not specified for external, hostname-N for integrated
- Adds cliutil.Hostname
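The naming defaults above can be sketched as a pure function. `defaultDaemonName` and its parameters are hypothetical (the real logic lives in the CLI and `cliutil.Hostname`), and whether `N` starts at 0 or 1 is left to the caller because the commit does not say.

```go
package main

import (
	"fmt"
	"os"
)

// defaultDaemonName: an explicit --name wins; otherwise external daemons use
// the hostname and the N-th integrated daemon uses "hostname-N".
func defaultDaemonName(flagName, hostname string, integrated bool, n int) string {
	if flagName != "" {
		return flagName
	}
	if integrated {
		return fmt.Sprintf("%s-%d", hostname, n)
	}
	return hostname
}

func main() {
	host, err := os.Hostname()
	if err != nil {
		host = "unknown" // fallback chosen for this sketch only
	}
	fmt.Println(defaultDaemonName("", host, false, 0))
	fmt.Println(defaultDaemonName("", host, true, 2))
	fmt.Println(defaultDaemonName("my-daemon", host, false, 0))
}
```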
2023-12-07 16:59:13 +00:00
Cian Johnston a235644046 fix(codersdk): make codersdk.ProvisionerDaemon.UpdatedAt a codersdk.NullTime (#11037) 2023-12-05 15:40:45 +00:00
Cian Johnston 5fad611020 feat(coderd): add last_seen_at and version to provisioner_daemons table (#11033)
Related to #10676

- Adds columns last_seen_at and version to provisioner_daemons table
- Adds the above to codersdk.ProvisionerDaemons struct
2023-12-05 13:54:38 +00:00
Colin Adler 504cedf15a feat: add telemetry for external provisioners (#10322) 2023-10-18 14:20:30 -05:00
Cian Johnston 6875faf238 fix(coderd/provisionerdserver): pass through api ctx to provisionerdserver (#10259)
Passes through coderd API ctx to provisionerd server so we can cancel workspace updates when API is shutting down.
2023-10-16 13:50:07 +01:00
Kyle Carberry 8abca9bea7 chore: rename git_auth to external_auth in our schema (#9935)
* chore: rename `git_auth` to `external_auth` in our schema

We're changing Git auth to be external auth. It will support
any OAuth2 or OIDC provider.

To split up the larger change I want to contribute the schema
changes first, and I'll add the feature itself in another PR.

* Fix names

* Fix outdated view

* Rename some additional places

* Fix sort order

* Fix template versions auth route

* Fix types

* Fix dbauthz
2023-09-29 19:13:20 +00:00
Spike Curtis 375c70d141 feat: integrate Acquirer for provisioner jobs (#9717)
* chore: add Acquirer to provisionerdserver pkg

Signed-off-by: Spike Curtis <spike@coder.com>

* code review improvements & fixes

Signed-off-by: Spike Curtis <spike@coder.com>

* feat: integrate Acquirer for provisioner jobs

Signed-off-by: Spike Curtis <spike@coder.com>

* Fix imports, whitespace

Signed-off-by: Spike Curtis <spike@coder.com>

* provisionerdserver always closes; remove poll interval from playwright

Signed-off-by: Spike Curtis <spike@coder.com>

* post jobs outside transactions

Signed-off-by: Spike Curtis <spike@coder.com>

* graceful shutdown in test

Signed-off-by: Spike Curtis <spike@coder.com>

* Mark AcquireJob deprecated

Signed-off-by: Spike Curtis <spike@coder.com>

* Graceful shutdown on all provisionerd tests

Signed-off-by: Spike Curtis <spike@coder.com>

* Deprecate, not remove CLI flags

Signed-off-by: Spike Curtis <spike@coder.com>

---------

Signed-off-by: Spike Curtis <spike@coder.com>
2023-09-19 10:25:57 +04:00
Spike Curtis 8d7eb1728c fix: stop inserting provisioner daemons into the database (#9108)
Signed-off-by: Spike Curtis <spike@coder.com>
2023-09-08 10:37:36 +00:00
Mathias Fredriksson 19d7da3d24 refactor(coderd/database): split Time and Now into dbtime package (#9482)
Ref: #9380
2023-09-01 16:50:12 +00:00