Commit Graph

33 Commits

Author SHA1 Message Date
Marcin Tojek 036ed5672f fix!: remove deprecated prometheus metrics (#21788)
## Description

Removes the following deprecated Prometheus metrics:

- `coderd_api_workspace_latest_build_total` → use
`coderd_api_workspace_latest_build` instead
- `coderd_oauth2_external_requests_rate_limit_total` → use
`coderd_oauth2_external_requests_rate_limit` instead

These metrics were deprecated in #12976 because gauge metrics should
avoid the `_total` suffix per [Prometheus naming
conventions](https://prometheus.io/docs/practices/naming/).

## Changes

- Removed deprecated metric `coderd_api_workspace_latest_build_total`
from `coderd/prometheusmetrics/prometheusmetrics.go`
- Removed deprecated metric
`coderd_oauth2_external_requests_rate_limit_total` from
`coderd/promoauth/oauth2.go`
- Updated tests to use the non-deprecated metric name

Fixes #12999
2026-01-30 13:30:06 +01:00
Spike Curtis bddb808b25 chore: arrange imports in a standard way (#21452)
Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example:

```
import (
	"context"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"golang.org/x/xerrors"
	"gopkg.in/natefinch/lumberjack.v2"

	"cdr.dev/slog/v3"
	"github.com/coder/coder/v2/codersdk/agentsdk"
	"github.com/coder/serpent"
)
```

3 groups: standard library, 3rd partly libs, Coder libs.

This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.
2026-01-08 15:24:11 +04:00
Spike Curtis 49b34a716a fix: fix slog to always use array of Fields (#21426)
Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptible call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder).

It also updates dependencies that also use slog and were updated.

I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule.

Other dependencies, I pushed new tags.
2026-01-08 10:29:41 +04:00
Callum Styan 141ef23c81 fix: introduce dedicated queries for workspaces and workspace agents metrics (#19786)
aid in differentiation between sources of calls to `GetWorkspaces` but introducing new queries for metrics specific use cases

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2025-10-17 13:40:10 -07:00
Susana Ferreira 0ab345ca84 feat: add prebuild timing metrics to Prometheus (#19503)
## Description

This PR introduces one counter and two histograms related to workspace
creation and claiming. The goal is to provide clearer observability into
how workspaces are created (regular vs prebuild) and the time cost of
those operations.

### `coderd_workspace_creation_total`

* Metric type: Counter
* Name: `coderd_workspace_creation_total`
* Labels: `organization_name`, `template_name`, `preset_name`

This counter tracks whether a regular workspace (not created from a
prebuild pool) was created using a preset or not.
Currently, we already expose `coderd_prebuilt_workspaces_claimed_total`
for claimed prebuilt workspaces, but we lack a comparable metric for
regular workspace creations. This metric fills that gap, making it
possible to compare regular creations against claims.

Implementation notes:
* Exposed as a `coderd_` metric, consistent with other workspace-related
metrics (e.g. `coderd_api_workspace_latest_build`:
https://github.com/coder/coder/blob/main/coderd/prometheusmetrics/prometheusmetrics.go#L149).
* Every `defaultRefreshRate` (1 minute ), DB query
`GetRegularWorkspaceCreateMetrics` is executed to fetch all regular
workspaces (not created from a prebuild pool).
* The counter is updated with the total from all time (not just since
metric introduction). This differs from the histograms below, which only
accumulate from their introduction forward.

### `coderd_workspace_creation_duration_seconds` &
`coderd_prebuilt_workspace_claim_duration_seconds`

* Metric types: Histogram
* Names:
  * `coderd_workspace_creation_duration_seconds`
* Labels: `organization_name`, `template_name`, `preset_name`, `type`
(`regular`, `prebuild`)
  * `coderd_prebuilt_workspace_claim_duration_seconds`
    * Labels: `organization_name`, `template_name`, `preset_name`

We already have `coderd_provisionerd_workspace_build_timings_seconds`,
which tracks build run times for all workspace builds handled by the
provisioner daemon.
However, in the context of this issue, we are only interested in
creation and claim build times, not all transitions; additionally, this
metric does not include `preset_name`, and adding it there would
significantly increase cardinality. Therefore, separate more focused
metrics are introduced here:
* `coderd_workspace_creation_duration_seconds`: Build time to create a
workspace (either a regular workspace or the build into a prebuild pool,
for prebuild initial provisioning build).
* `coderd_prebuilt_workspace_claim_duration_seconds`: Time to claim a
prebuilt workspace from the pool.

The reason for two separate histograms is that:
* Creation (regular or prebuild): provisioning builds with similar time
magnitude, generally expected to take longer than a claim operation.
* Claim: expected to be a much faster provisioning build.

#### Native histogram usage

Provisioning times vary widely between projects. Using static buckets
risks unbalanced or poorly informative histograms.
To address this, these metrics use [Prometheus native
histograms](https://prometheus.io/docs/specs/native_histograms/):
* First introduced in Prometheus v2.40.0
* Recommended stable usage from v2.45+
* Requires Go client `prometheus/client_golang` v1.15.0+
* Experimental and must be explicitly enabled on the server
(`--enable-feature=native-histograms`)

For compatibility, we also retain a classic bucket definition (aligned
with the existing provisioner metric:
https://github.com/coder/coder/blob/main/provisionerd/provisionerd.go#L182-L189).
* If native histograms are enabled, Prometheus ingests the
high-resolution histogram.
* If not, it falls back to the predefined buckets.

Implementation notes:
* Unlike the counter, these histograms are updated in real-time at
workspace build job completion.
* They reflect data only from the point of introduction forward (no
historical backfill).

## Relates to 

Closes: https://github.com/coder/coder/issues/19528
Native histograms tested in observability stack:
https://github.com/coder/observability/pull/50
2025-08-28 15:00:26 +01:00
Callum Styan 014a2d5b0f perf: don't call GetUserByID unnecessarily for Agents metrics loops (#19395)
At the moment, the loop which retrieves and updates the values of the
agents metrics excessively calls `GetUserByID` (a DB query). First it
retrieves a list of all workspaces, filtering out inactive agents (not
entirely clear to me whether this is non-running workspaces, or just
dead agents), and then iterates over those workspaces to get the rest of
the relevant data for the metrics. The next call is `GetUserByID` for
`workspace.OwnerID`. This is unnecessary because the `workspaces_visible` view we pull workspaces from has already been joined with the users table to get the username/name/etc.

This should at least partially resolve
https://github.com/coder/internal/issues/726 
---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2025-08-21 11:01:32 -07:00
Mathias Fredriksson 1b66495b70 fix(coderd/prometheusmetrics)!: filter deleted wsbuilds to reduce db load (#19197)
This change removes the `GetLatestWorkspaceBuilds` query which includes
all workspaces for all time (including deleted). This allows us to also
stop using `GetProvisionerJobsByIDs` for said builds as the job status
is included in `GetWorkspaces` called separately.

**BREAKING CHANGE**: The `coderd_api_workspace_latest_build` Prometheus
metric no longer includes builds belonging to deleted workspaces, as
such, this metric will show fewer statuses.

Fixes coder/internal#717
2025-08-11 14:48:31 +03:00
Steven Masley ca38729840 chore: revert dynamic params as a safe experiment (#17510) 2025-04-22 16:21:15 +00:00
Colin Adler 3de98c25db feat: add prometheus metric for tracking user statuses (#15281) 2024-10-30 18:41:16 +00:00
Steven Masley 343f8ec9ab chore: join owner, template, and org in new workspace view (#15116)
Joins in fields like `username`, `avatar_url`, `organization_name`,
`template_name` to `workspaces` via a **view**. 
The view must be maintained moving forward, but this prevents needing to
add RBAC permissions to fetch related workspace fields.
2024-10-22 09:20:54 -05:00
Garrett Delfosse 922f4c545f fix: handle new agent stat format correctly (#14576)
---------

Co-authored-by: Ethan Dickson <ethan@coder.com>
2024-09-20 01:52:14 +10:00
Colin Adler 40390ecc30 chore: fix TestServer/Prometheus/DBMetricsDisabled test flake (#13453)
See: https://github.com/coder/coder/actions/runs/9352137263/job/25739550487#step:5:368
2024-06-03 15:38:59 -05:00
Pavel Aseev 4682355eed chore: deprecate gauge metrics with _total suffix (#12744) (#12976)
* chore: deprecate gauge metrics with _total suffix (#12744)

Deprecated metrics:
- coderd_oauth2_external_requests_rate_limit_total
- coderd_api_workspace_latest_build_total

* Apply suggestions from code review

add link to follow-up issue

Co-authored-by: Cian Johnston <public@cianjohnston.ie>

---------

Co-authored-by: Cian Johnston <public@cianjohnston.ie>
2024-04-24 11:23:24 +03:00
Danny Kopping 79fb8e43c5 feat: expose workspace statuses (with details) as a prometheus metric (#12762)
Implements #12462
2024-04-02 09:57:36 +02:00
Danny Kopping 9cfd5baa91 feat(coderd): export metric indicating each experiment's status (#12657) 2024-03-19 14:11:27 +02:00
Danny Kopping 7a7105ad66 feat: make agent stats' cardinality configurable (#12535) 2024-03-13 12:03:36 +02:00
Cian Johnston 8f40ee3465 Revert "feat: make agent stats' cardinality configurable (#12468)" (#12533)
This reverts commit 21d1873d97.
2024-03-11 14:33:36 +00:00
Danny Kopping 21d1873d97 feat: make agent stats' cardinality configurable (#12468)
Closes #12221
2024-03-11 16:04:08 +02:00
Steven Masley b7bdb17460 feat: add metrics to workspace agent scripts (#11132)
* push startup script metrics to agent
2023-12-13 11:45:43 -06:00
Steven Masley 5021e23105 chore: compute job status as column (#10024)
* chore: provisioner job status as column
* use provisioner job status for workspace searching
2023-10-04 20:57:46 -05:00
Mathias Fredriksson 19d7da3d24 refactor(coderd/database): split Time and Now into dbtime package (#9482)
Ref: #9380
2023-09-01 16:50:12 +00:00
Kyle Carberry 22e781eced chore: add /v2 to import module path (#9072)
* chore: add /v2 to import module path

go mod requires semantic versioning with versions greater than 1.x

This was a mechanical update by running:
```
go install github.com/marwan-at-work/mod/cmd/mod@latest
mod upgrade
```

Migrate generated files to import /v2

* Fix gen
2023-08-18 18:55:43 +00:00
Dean Sheather 2f0a9996e7 chore: add derpserver to wsproxy, add proxies to derpmap (#7311) 2023-07-27 02:21:04 +10:00
goodspark dd4aafb350 feat: add template info tags to coderd_agents_up metric (#7942)
Co-authored-by: Colin Adler <colin1adler@gmail.com>
2023-07-11 12:39:14 -05:00
Marcin Tojek b1d1b63113 chore: ensure logs consistency across Coder (#8083) 2023-06-20 12:30:45 +02:00
Colin Adler 8f736fe5f5 fix(prometheusmetrics): ensure periodic metrics tick on startup (#7585) 2023-06-02 11:56:37 -05:00
Spike Curtis cd416c86dd refactor: workspace builds (#7541)
* refactor workspace builds

Signed-off-by: Spike Curtis <spike@coder.com>

* make gen

Signed-off-by: Spike Curtis <spike@coder.com>

* Remove ParameterResolver from typescript

Signed-off-by: Spike Curtis <spike@coder.com>

* rename conversion -> database/db2sdk

Signed-off-by: Spike Curtis <spike@coder.com>

* tests for db2sdk

Signed-off-by: Spike Curtis <spike@coder.com>

* Tests for ParameterResolver

Signed-off-by: Spike Curtis <spike@coder.com>

* wsbuilder tests

Signed-off-by: Spike Curtis <spike@coder.com>

* Move parameter validation tests to richparameters_test.go

Signed-off-by: Spike Curtis <spike@coder.com>

* Fix CI generation; rename mock->dbmock

Signed-off-by: Spike Curtis <spike@coder.com>

* Fix test imports

Signed-off-by: Spike Curtis <spike@coder.com>

---------

Signed-off-by: Spike Curtis <spike@coder.com>
2023-05-23 12:06:33 +04:00
Marcin Tojek bb0a38b161 feat: Implement aggregator for agent metrics (#7259) 2023-04-27 12:34:00 +02:00
Marcin Tojek 942aba3a66 feat: expose agent stats via Prometheus endpoint (#7115)
* WIP

* WIP

* WIP

* Agents

* fix

* 1min

* fix

* WIP

* Test

* docs

* fmt

* Add timer to measure the metrics collection

* Use CachedGaugeVec

* Unit tests

* WIP

* WIP

* db: GetWorkspaceAgentStatsAndLabels

* fmt

* WIP

* gauges

* feat: collect

* fix

* fmt

* minor fixes

* Prometheus flag

* fix

* WIP

* fix tests

* WIP

* fix json

* Rx Tx bytes

* CloseFunc

* fix

* fix

* Fixes

* fix

* fix: IgnoreErrors

* Fix: Windows

* fix

* reflect.DeepEquals
2023-04-14 16:14:52 +02:00
Marcin Tojek 0347231bb8 feat: expose agent metrics via Prometheus endpoint (#7011)
* WIP

* WIP

* WIP

* Agents

* fix

* 1min

* fix

* WIP

* Test

* docs

* fmt

* Add timer to measure the metrics collection

* Use CachedGaugeVec

* Unit tests

* Address PR comments
2023-04-07 17:48:52 +02:00
Colin Adler e740aebf26 feat: add provisionerd prometheus metrics (#4909) 2022-11-04 19:03:01 -05:00
Kyle Carberry 7bdb8ff9cf feat: Add workspace metrics export to Prometheus (#3421)
This adds workspace totals indexed by status. It could be any
codersdk.ProvisionerJobStatus.
2022-08-09 01:08:42 +00:00
Kyle Carberry 3279504cbe feat: Add active users prometheus metric (#3406)
This  allows deployments using our Prometheus export t determine
the number of active users in the past hour.

The interval is an hour to align with API key last used refresh times.

SSH connections poll to check shutdown time, so this will be accurate
even on long-running connections without dashboard requests.
2022-08-08 10:09:46 -05:00