Commit Graph

50 Commits

Author SHA1 Message Date
Spike Curtis bddb808b25 chore: arrange imports in a standard way (#21452)
Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example:

```
import (
	"context"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"golang.org/x/xerrors"
	"gopkg.in/natefinch/lumberjack.v2"

	"cdr.dev/slog/v3"
	"github.com/coder/coder/v2/codersdk/agentsdk"
	"github.com/coder/serpent"
)
```

3 groups: standard library, 3rd partly libs, Coder libs.

This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.
2026-01-08 15:24:11 +04:00
Spike Curtis 49b34a716a fix: fix slog to always use array of Fields (#21426)
Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptible call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder).

It also updates dependencies that also use slog and were updated.

I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule.

Other dependencies, I pushed new tags.
2026-01-08 10:29:41 +04:00
Steven Masley 3194bcfc9e chore: distinct operations for provisioner's 'parse', 'init', 'plan', 'apply', 'graph' (#21064)
Provisioner steps broken into smaller granular actions.
Changes:
- `ExtractArchive` moved to `init` request (was in `configure`)
- Writing `tfstate` moved to `plan` (was in `configure`)
- Moved most plan/apply outputs to `GraphComplete`
2025-12-15 11:26:41 -06:00
Callum Styan 5a18cf4c86 fix: remove unintentionally added print in test code (#20391)
accidentally added in https://github.com/coder/coder/pull/19786

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2025-10-20 18:51:15 -07:00
Callum Styan 141ef23c81 fix: introduce dedicated queries for workspaces and workspace agents metrics (#19786)
aid in differentiation between sources of calls to `GetWorkspaces` but introducing new queries for metrics specific use cases

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2025-10-17 13:40:10 -07:00
Spike Curtis 1354d84eb4 chore: refactor instance identity to be a SessionTokenProvider (#19566)
Refactors Agent instance identity to be a SessionTokenProvider.

Refactors the CLI to create Agent clients via a centralized function, rather than add-hoc via individual command handlers and their flags.

This allows commands besides `coder agent`, but which still use the agent identity, to support instance identity authentication.

Fixes #19111 by unifying all API requests to go thru the SessionTokenProvider for auth credentials.
2025-09-03 10:38:42 +04:00
Susana Ferreira 0ab345ca84 feat: add prebuild timing metrics to Prometheus (#19503)
## Description

This PR introduces one counter and two histograms related to workspace
creation and claiming. The goal is to provide clearer observability into
how workspaces are created (regular vs prebuild) and the time cost of
those operations.

### `coderd_workspace_creation_total`

* Metric type: Counter
* Name: `coderd_workspace_creation_total`
* Labels: `organization_name`, `template_name`, `preset_name`

This counter tracks whether a regular workspace (not created from a
prebuild pool) was created using a preset or not.
Currently, we already expose `coderd_prebuilt_workspaces_claimed_total`
for claimed prebuilt workspaces, but we lack a comparable metric for
regular workspace creations. This metric fills that gap, making it
possible to compare regular creations against claims.

Implementation notes:
* Exposed as a `coderd_` metric, consistent with other workspace-related
metrics (e.g. `coderd_api_workspace_latest_build`:
https://github.com/coder/coder/blob/main/coderd/prometheusmetrics/prometheusmetrics.go#L149).
* Every `defaultRefreshRate` (1 minute ), DB query
`GetRegularWorkspaceCreateMetrics` is executed to fetch all regular
workspaces (not created from a prebuild pool).
* The counter is updated with the total from all time (not just since
metric introduction). This differs from the histograms below, which only
accumulate from their introduction forward.

### `coderd_workspace_creation_duration_seconds` &
`coderd_prebuilt_workspace_claim_duration_seconds`

* Metric types: Histogram
* Names:
  * `coderd_workspace_creation_duration_seconds`
* Labels: `organization_name`, `template_name`, `preset_name`, `type`
(`regular`, `prebuild`)
  * `coderd_prebuilt_workspace_claim_duration_seconds`
    * Labels: `organization_name`, `template_name`, `preset_name`

We already have `coderd_provisionerd_workspace_build_timings_seconds`,
which tracks build run times for all workspace builds handled by the
provisioner daemon.
However, in the context of this issue, we are only interested in
creation and claim build times, not all transitions; additionally, this
metric does not include `preset_name`, and adding it there would
significantly increase cardinality. Therefore, separate more focused
metrics are introduced here:
* `coderd_workspace_creation_duration_seconds`: Build time to create a
workspace (either a regular workspace or the build into a prebuild pool,
for prebuild initial provisioning build).
* `coderd_prebuilt_workspace_claim_duration_seconds`: Time to claim a
prebuilt workspace from the pool.

The reason for two separate histograms is that:
* Creation (regular or prebuild): provisioning builds with similar time
magnitude, generally expected to take longer than a claim operation.
* Claim: expected to be a much faster provisioning build.

#### Native histogram usage

Provisioning times vary widely between projects. Using static buckets
risks unbalanced or poorly informative histograms.
To address this, these metrics use [Prometheus native
histograms](https://prometheus.io/docs/specs/native_histograms/):
* First introduced in Prometheus v2.40.0
* Recommended stable usage from v2.45+
* Requires Go client `prometheus/client_golang` v1.15.0+
* Experimental and must be explicitly enabled on the server
(`--enable-feature=native-histograms`)

For compatibility, we also retain a classic bucket definition (aligned
with the existing provisioner metric:
https://github.com/coder/coder/blob/main/provisionerd/provisionerd.go#L182-L189).
* If native histograms are enabled, Prometheus ingests the
high-resolution histogram.
* If not, it falls back to the predefined buckets.

Implementation notes:
* Unlike the counter, these histograms are updated in real-time at
workspace build job completion.
* They reflect data only from the point of introduction forward (no
historical backfill).

## Relates to 

Closes: https://github.com/coder/coder/issues/19528
Native histograms tested in observability stack:
https://github.com/coder/observability/pull/50
2025-08-28 15:00:26 +01:00
Mathias Fredriksson 1b66495b70 fix(coderd/prometheusmetrics)!: filter deleted wsbuilds to reduce db load (#19197)
This change removes the `GetLatestWorkspaceBuilds` query which includes
all workspaces for all time (including deleted). This allows us to also
stop using `GetProvisionerJobsByIDs` for said builds as the job status
is included in `GetWorkspaces` called separately.

**BREAKING CHANGE**: The `coderd_api_workspace_latest_build` Prometheus
metric no longer includes builds belonging to deleted workspaces, as
such, this metric will show fewer statuses.

Fixes coder/internal#717
2025-08-11 14:48:31 +03:00
Callum Styan ffbfaf2a6f feat: allow bypassing current CORS magic based on template config (#18706)
Solves https://github.com/coder/coder/issues/15096

This is a slight rework/refactor of the earlier PRs from @dannykopping
and @Emyrk:
- https://github.com/coder/coder/pull/15669
- https://github.com/coder/coder/pull/15684
- https://github.com/coder/coder/pull/17596

Rather than having a per-app CORS behaviour setting and additionally a
template level setting for ports, this PR adds a single template level
CORS behaviour setting that is then used by all apps/ports for
workspaces created from that template.

The main changes are in `proxy.go` and `request.go` to:
a) get the CORS behaviour setting from the template
b) have `HandleSubdomain` bypass the CORS middleware handler if the
selected behaviour is `passthru`
c) in `proxyWorkspaceApp`, do not modify the response if the selected
behaviour is `passthru`

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added support for configuring CORS behavior ("simple" or "passthru")
at the template level for all shared ports.
* Introduced a new "CORS Behavior" setting in the template creation and
settings forms.
* API endpoints and responses now include the optional `cors_behavior`
property for templates.
* Workspace apps and proxy now honor the specified CORS behavior,
enabling conditional CORS middleware application.
* Enhanced workspace app tests with comprehensive scenarios covering
CORS behaviors and authentication states.

* **Bug Fixes**
  * None.

* **Documentation**
* Updated API and admin documentation to describe the new
`cors_behavior` property and its usage.
* Added examples and schema references for CORS behavior in relevant API
docs.

* **Tests**
* Extended automated tests to cover different CORS behavior scenarios
for templates and workspace apps.

* **Chores**
* Updated audit logging to track changes to the `cors_behavior` field on
templates.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2025-07-30 13:42:39 -07:00
ケイラ fae30a00fd chore: remove unnecessary redeclarations in for loops (#18440) 2025-06-20 13:16:55 -06:00
Hugo Dutka 277c2c7ea7 chore(coderd/prometheusmetrics): remove dbmem from tests (#18238) 2025-06-05 09:30:27 +02:00
Steven Masley ca38729840 chore: revert dynamic params as a safe experiment (#17510) 2025-04-22 16:21:15 +00:00
Jon Ayers 17ddee05e5 chore: update golang to 1.24.1 (#17035)
- Update go.mod to use Go 1.24.1
- Update GitHub Actions setup-go action to use Go 1.24.1
- Fix linting issues with golangci-lint by:
  - Updating to golangci-lint v1.57.1 (more compatible with Go 1.24.1)

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <claude@anthropic.com>
2025-03-26 01:56:39 -05:00
Spike Curtis 5861e516b9 chore: add standard test logger ignoring db canceled (#15556)
Refactors our use of `slogtest` to instantiate a "standard logger" across most of our tests.  This standard logger incorporates https://github.com/coder/slog/pull/217 to also ignore database query canceled errors by default, which are a source of low-severity flakes.

Any test that has set non-default `slogtest.Options` is left alone. In particular, `coderdtest` defaults to ignoring all errors. We might consider revisiting that decision now that we have better tools to target the really common flaky Error logs on shutdown.
2024-11-18 14:09:22 +04:00
Colin Adler 3de98c25db feat: add prometheus metric for tracking user statuses (#15281) 2024-10-30 18:41:16 +00:00
Garrett Delfosse 922f4c545f fix: handle new agent stat format correctly (#14576)
---------

Co-authored-by: Ethan Dickson <ethan@coder.com>
2024-09-20 01:52:14 +10:00
Kayla Washburn-Love bf4b7abf14 chore(coderd): allow creating workspaces without specifying an organization (#14048) 2024-07-30 10:44:02 -06:00
Ethan dd243686e4 chore!: remove deprecated agent v1 routes (#13486) 2024-06-11 12:22:59 +10:00
Garrett Delfosse 5b9a65e5c1 chore: move Batcher and Tracker to workspacestats (#13418) 2024-06-10 15:35:23 -04:00
Garrett Delfosse c550d0641d feat: move shared ports out of experiment (#13120) 2024-05-02 14:11:33 -04:00
Pavel Aseev 4682355eed chore: deprecate gauge metrics with _total suffix (#12744) (#12976)
* chore: deprecate gauge metrics with _total suffix (#12744)

Deprecated metrics:
- coderd_oauth2_external_requests_rate_limit_total
- coderd_api_workspace_latest_build_total

* Apply suggestions from code review

add link to follow-up issue

Co-authored-by: Cian Johnston <public@cianjohnston.ie>

---------

Co-authored-by: Cian Johnston <public@cianjohnston.ie>
2024-04-24 11:23:24 +03:00
Steven Masley 0a8c8ce5cc chore: remove InsertWorkspaceAgentStat query (#12869)
* chore: remove InsertWorkspaceAgentStat query

InsertWorkspaceAgentStats (batch) exists. We only used the singular in
a single unit test place. Removing the single for the batch, reducing
the interface size.
2024-04-09 12:35:27 -05:00
Danny Kopping 79fb8e43c5 feat: expose workspace statuses (with details) as a prometheus metric (#12762)
Implements #12462
2024-04-02 09:57:36 +02:00
Danny Kopping 9cfd5baa91 feat(coderd): export metric indicating each experiment's status (#12657) 2024-03-19 14:11:27 +02:00
Steven Masley f0f9569d51 chore: enforce that provisioners can only acquire jobs in their own organization (#12600)
* chore: add org ID as optional param to AcquireJob
* chore: plumb through organization id to provisioner daemons
* add org id to provisioner domain key
* enforce org id argument
* dbgen provisioner jobs defaults to default org
2024-03-18 12:48:13 -05:00
Danny Kopping 7a7105ad66 feat: make agent stats' cardinality configurable (#12535) 2024-03-13 12:03:36 +02:00
Cian Johnston 8f40ee3465 Revert "feat: make agent stats' cardinality configurable (#12468)" (#12533)
This reverts commit 21d1873d97.
2024-03-11 14:33:36 +00:00
Danny Kopping 21d1873d97 feat: make agent stats' cardinality configurable (#12468)
Closes #12221
2024-03-11 16:04:08 +02:00
Steven Masley fe867d02e0 fix: correct perms for forbidden error in TemplateScheduleStore.Load (#11286)
* chore: TemplateScheduleStore.Load() throwing forbidden error
* fix: workspace agent scope to include template
2023-12-20 11:38:49 -06:00
Kyle Carberry 5abfe5afd0 chore: rename dbfake to dbmem (#10432) 2023-10-30 17:42:20 +00:00
Kayla Washburn c194119689 chore: rename AwaitTemplateVersionJobCompleted and AwaitWorkspaceBuildJobCompleted (#10003) 2023-10-03 11:02:56 -06:00
Mathias Fredriksson 19d7da3d24 refactor(coderd/database): split Time and Now into dbtime package (#9482)
Ref: #9380
2023-09-01 16:50:12 +00:00
Cian Johnston 22f31e721c fix(coderd/prometheusmetrics): close batcher to force flush before asserting agent stats (#9465) 2023-08-31 11:40:57 +01:00
Spike Curtis 60d5002eb6 refactor: change template archive extraction to be on provisioner (#9264)
* refactor provisionersdk protocol

Signed-off-by: Spike Curtis <spike@coder.com>

* refactor provisioners to use new protocol

Signed-off-by: Spike Curtis <spike@coder.com>

* refactor provisionerd to use new protocol

Signed-off-by: Spike Curtis <spike@coder.com>

* refactor tests & proto renames

* Fixes from self-review

Signed-off-by: Spike Curtis <spike@coder.com>

* appease fmt & link

Signed-off-by: Spike Curtis <spike@coder.com>

* code review fixes & e2e fixes

Signed-off-by: Spike Curtis <spike@coder.com>

* More fmt

Signed-off-by: Spike Curtis <spike@coder.com>

* Code review fixes

Signed-off-by: Spike Curtis <spike@coder.com>

* new gen; use uuid for session workdir

Signed-off-by: Spike Curtis <spike@coder.com>

* Revert nix-based gen CI task until dogfood is on nix

Signed-off-by: Spike Curtis <spike@coder.com>

* revert deleting dogfood Docker stuff

Signed-off-by: Spike Curtis <spike@coder.com>

* Revert "revert deleting dogfood Docker stuff"

This reverts commit 9762158167.

---------

Signed-off-by: Spike Curtis <spike@coder.com>
2023-08-25 06:10:15 +00:00
Kyle Carberry 22e781eced chore: add /v2 to import module path (#9072)
* chore: add /v2 to import module path

go mod requires semantic versioning with versions greater than 1.x

This was a mechanical update by running:
```
go install github.com/marwan-at-work/mod/cmd/mod@latest
mod upgrade
```

Migrate generated files to import /v2

* Fix gen
2023-08-18 18:55:43 +00:00
Cian Johnston 9fb18f3ae5 feat(coderd): batch agent stats inserts (#8875)
This PR adds support for batching inserts to the workspace_agents_stats table.
Up to 1024 stats are batched, and flushed every second in a batch.
2023-08-04 17:00:42 +01:00
Dean Sheather 2f0a9996e7 chore: add derpserver to wsproxy, add proxies to derpmap (#7311) 2023-07-27 02:21:04 +10:00
Steven Masley de1a7a9210 chore: join user information to workspace_build and template_version (#8625)
* include minimial user on template version and build
* Add unit test to ensure join is superset
2023-07-25 09:14:38 -04:00
Colin Adler c47b78c44b chore: replace wsconncache with a single tailnet (#8176) 2023-07-12 17:37:31 -05:00
goodspark dd4aafb350 feat: add template info tags to coderd_agents_up metric (#7942)
Co-authored-by: Colin Adler <colin1adler@gmail.com>
2023-07-11 12:39:14 -05:00
Spike Curtis b6666cf1cf chore: tailnet debug logging (#7260)
* Enable discovery (disco) debug

Signed-off-by: Spike Curtis <spike@coder.com>

* Better debug on reconnectingPTY

Signed-off-by: Spike Curtis <spike@coder.com>

* Agent logging in appstest

Signed-off-by: Spike Curtis <spike@coder.com>

* More reconnectingPTY logging

Signed-off-by: Spike Curtis <spike@coder.com>

* Add logging to coordinator

Signed-off-by: Spike Curtis <spike@coder.com>

* Update agent/agent.go

Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>

* Update agent/agent.go

Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>

* Update agent/agent.go

Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>

* Update agent/agent.go

Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>

* Clarify logs; remove unrelated changes

Signed-off-by: Spike Curtis <spike@coder.com>

---------

Signed-off-by: Spike Curtis <spike@coder.com>
Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
2023-04-27 13:59:01 +04:00
Marcin Tojek 942aba3a66 feat: expose agent stats via Prometheus endpoint (#7115)
* WIP

* WIP

* WIP

* Agents

* fix

* 1min

* fix

* WIP

* Test

* docs

* fmt

* Add timer to measure the metrics collection

* Use CachedGaugeVec

* Unit tests

* WIP

* WIP

* db: GetWorkspaceAgentStatsAndLabels

* fmt

* WIP

* gauges

* feat: collect

* fix

* fmt

* minor fixes

* Prometheus flag

* fix

* WIP

* fix tests

* WIP

* fix json

* Rx Tx bytes

* CloseFunc

* fix

* fix

* Fixes

* fix

* fix: IgnoreErrors

* Fix: Windows

* fix

* reflect.DeepEquals
2023-04-14 16:14:52 +02:00
Marcin Tojek 0347231bb8 feat: expose agent metrics via Prometheus endpoint (#7011)
* WIP

* WIP

* WIP

* Agents

* fix

* 1min

* fix

* WIP

* Test

* docs

* fmt

* Add timer to measure the metrics collection

* Use CachedGaugeVec

* Unit tests

* Address PR comments
2023-04-07 17:48:52 +02:00
Steven Masley 8b424f03c2 chore: Rename databasefake --> dbfake (#6011) 2023-02-02 19:28:55 -06:00
Steven Masley 4a6fc40949 feat: Add database data generator to make fakedbs easier to populate (#5922)
* feat: Add database data generator to make fakedbs easier to populate
2023-01-31 15:10:03 -06:00
Mathias Fredriksson 8afdf24d10 chore: Update sqlc to v1.16.0 (#5788)
* chore: Update sqlc to v1.16.0

* chore: Fix cases where types became Null-types

* chore: Set parameter_schemas default_destination_scheme and default_source_scheme to NOT NULL

* chore: Add enum validation to database fake

* chore: Fix all tests that skipping enum values

* fix: Use correct err in providionerdserver audit log failure log
2023-01-23 13:14:47 +02:00
Marcin Tojek 534bff2ff5 fix: coderd/prometheusmetrics wait for all metrics in require.Eventually (#5338) 2022-12-07 17:50:17 +01:00
Dean Sheather 29d804e692 feat: add API key scopes and application_connect scope (#4067) 2022-09-19 17:39:02 +00:00
Kyle Carberry 7bdb8ff9cf feat: Add workspace metrics export to Prometheus (#3421)
This adds workspace totals indexed by status. It could be any
codersdk.ProvisionerJobStatus.
2022-08-09 01:08:42 +00:00
Kyle Carberry 3279504cbe feat: Add active users prometheus metric (#3406)
This  allows deployments using our Prometheus export t determine
the number of active users in the past hour.

The interval is an hour to align with API key last used refresh times.

SSH connections poll to check shutdown time, so this will be accurate
even on long-running connections without dashboard requests.
2022-08-08 10:09:46 -05:00