coder

mirror of https://github.com/coder/coder.git synced 2026-06-04 05:28:20 +00:00

Author	SHA1	Message	Date
Susana Ferreira	df84cea924	feat(scripts/metricsdocgen): support merging static and generated metrics files (#21464 ) ## Description This PR refactors `scripts/metricsdocgen/main.go` to support merging static and generated metrics files for documentation generation. The static `metrics` file remains necessary for metrics not defined in the coder codebase (`go_`, `process_`, `promhttp_`, `coder_aibridged_`), as well as edge cases the scanner cannot handle (e.g., metrics with runtime-determined labels or function-local variable references for fields, ...). Handling these edge cases in the scanner would make it significantly more complex, so we keep this hybrid approach to accommodate them. In such cases, developers need to update the `metrics` file directly, so there is still a risk of out-of-date information in the documentation. However, this solution should already encompass most cases. Static metrics take priority over generated metrics when both files contain the same metric name, allowing manual overrides without modifying the scanner. Some of these edge cases could be easily fixed by updating the codebase to use one of the supported patterns. ## Changes * Update `scripts/metricsdocgen/main.go` to read from two separate metrics files: * `metrics`: static, manually maintained metrics (e.g., `go_`, `process_`, `promhttp_`, `coder_aibridged_`) * `generated_metrics`: auto-generated by the AST scanner * Update `metrics` file to contain only static and edge-case metrics * Skip metrics with empty HELP descriptions in the scanner * Update `generated_metrics` to reflect skipped metrics * Update `docs/admin/integrations/prometheus.md` with merged metrics Related to: https://github.com/coder/coder/issues/13223 Disclosure: This PR was mainly developed with Claude Sonnet 4, with iterative review and refinement by @ssncferreira	2026-02-13 12:19:33 +00:00
Callum Styan	5f3be6b288	feat: add provisioner job queue wait time histogram and jobs enqueued counter (#21869 ) This PR adds some metrics to help identify job enqueue rates and latencies. This work was initiated as a way to help reduce the cost of the observation/measurement itself for autostart scaletests, which impacts our ability to identify/reason about the load caused by autostart. See: https://github.com/coder/internal/issues/1209 I've extended the metrics here to account for regular user initiated builds, prebuilds, autostarts, etc. IMO there is still the question here of whether we want to include or need the `transition` label, which is only present on workspace builds. Including it does lead to an increase in cardinality, and in the case of the histogram (when not using native histograms) that's at least a few extra series for every bucket. We could remove the transition label there but keep it on the counter. Additionally, the histogram is currently observing latencies for other jobs, such as template builds/version imports, those do not have a transition type associated with them. Tested briefly in a workspace, can see metric values like the following: - `coderd_workspace_builds_enqueued_total{build_reason="autostart",provisioner_type="terraform",status="success",transition="start"} 1` - `coderd_provisioner_job_queue_wait_seconds_bucket{build_reason="autostart",job_type="workspace_build",provisioner_type="terraform",transition="start",le="0.025"} 1` --------- Signed-off-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-12 13:40:47 -08:00
Jon Ayers	6035e45cb8	feat: add e2e workspace build duration metric (#21739 ) Adds coderd_template_workspace_build_duration_seconds histogram that tracks the full duration from workspace build creation to agent ready. This captures the complete user-perceived build time including provisioning and agent startup. The metric is emitted when the agent reports ready/error/timeout via the lifecycle API, ensuring each build is counted exactly once per replica.	2026-02-06 16:26:02 -06:00
Marcin Tojek	036ed5672f	fix!: remove deprecated prometheus metrics (#21788 ) ## Description Removes the following deprecated Prometheus metrics: - `coderd_api_workspace_latest_build_total` → use `coderd_api_workspace_latest_build` instead - `coderd_oauth2_external_requests_rate_limit_total` → use `coderd_oauth2_external_requests_rate_limit` instead These metrics were deprecated in #12976 because gauge metrics should avoid the `_total` suffix per [Prometheus naming conventions](https://prometheus.io/docs/practices/naming/). ## Changes - Removed deprecated metric `coderd_api_workspace_latest_build_total` from `coderd/prometheusmetrics/prometheusmetrics.go` - Removed deprecated metric `coderd_oauth2_external_requests_rate_limit_total` from `coderd/promoauth/oauth2.go` - Updated tests to use the non-deprecated metric name Fixes #12999	2026-01-30 13:30:06 +01:00
Marcin Tojek	04b0253e8a	feat: add Prometheus metrics for license warnings and errors (#21749 ) Fixes: coder/internal#767 Adds two new Prometheus metrics for license health monitoring: - `coderd_license_warnings` - count of active license warnings - `coderd_license_errors` - count of active license errors Metrics endpoint after startup of a deployment with license enabled: ``` ... # HELP coderd_license_errors The number of active license errors. # TYPE coderd_license_errors gauge coderd_license_errors 0 ... # HELP coderd_license_warnings The number of active license warnings. # TYPE coderd_license_warnings gauge coderd_license_warnings 0 ... ```	2026-01-29 13:50:15 +01:00
Callum Styan	806d7e4c11	docs: update metrics docs to include metadata batcher metrics (#21665 ) This updates the metrics docs to include metrics added in https://github.com/coder/coder/pull/21330 Signed-off-by: Callum Styan <callumstyan@gmail.com>	2026-01-26 09:22:14 -08:00
Danny Kopping	c6631e1e50	feat: expose `aibridged` metrics (#20865 ) Upgrades `coder/aibridge` to v0.2.0 which includes https://github.com/coder/aibridge/pull/62. Creates a `prometheus.Registerer` with a prefix `coder_aibridged_` and passes that along to coder/aibridge which actually exposes the metrics. Also includes a side-effect of a change described in https://github.com/coder/aibridge/pull/62#discussion_r2550017470. --------- Signed-off-by: Danny Kopping <danny@coder.com>	2025-11-24 18:16:06 +02:00
Susana Ferreira	c1f8465de6	fix: add missing provisionerd metrics to docs (#20358 ) ## Description Add missing provisionerd metrics to Prometheus documentation: * `coderd_provisionerd_num_daemons`: The number of provisioner daemons. * `coderd_provisionerd_workspace_build_timings_seconds`: The time taken for a workspace to build. Related to internal thread: https://codercom.slack.com/archives/C07GRNNRW03/p1760642020583019	2025-10-20 11:33:45 +01:00
Susana Ferreira	0ab345ca84	feat: add prebuild timing metrics to Prometheus (#19503 ) ## Description This PR introduces one counter and two histograms related to workspace creation and claiming. The goal is to provide clearer observability into how workspaces are created (regular vs prebuild) and the time cost of those operations. ### `coderd_workspace_creation_total` * Metric type: Counter * Name: `coderd_workspace_creation_total` * Labels: `organization_name`, `template_name`, `preset_name` This counter tracks whether a regular workspace (not created from a prebuild pool) was created using a preset or not. Currently, we already expose `coderd_prebuilt_workspaces_claimed_total` for claimed prebuilt workspaces, but we lack a comparable metric for regular workspace creations. This metric fills that gap, making it possible to compare regular creations against claims. Implementation notes: * Exposed as a `coderd_` metric, consistent with other workspace-related metrics (e.g. `coderd_api_workspace_latest_build`: https://github.com/coder/coder/blob/main/coderd/prometheusmetrics/prometheusmetrics.go#L149). * Every `defaultRefreshRate` (1 minute ), DB query `GetRegularWorkspaceCreateMetrics` is executed to fetch all regular workspaces (not created from a prebuild pool). * The counter is updated with the total from all time (not just since metric introduction). This differs from the histograms below, which only accumulate from their introduction forward. 
### `coderd_workspace_creation_duration_seconds` & `coderd_prebuilt_workspace_claim_duration_seconds` * Metric types: Histogram * Names: * `coderd_workspace_creation_duration_seconds` * Labels: `organization_name`, `template_name`, `preset_name`, `type` (`regular`, `prebuild`) * `coderd_prebuilt_workspace_claim_duration_seconds` * Labels: `organization_name`, `template_name`, `preset_name` We already have `coderd_provisionerd_workspace_build_timings_seconds`, which tracks build run times for all workspace builds handled by the provisioner daemon. However, in the context of this issue, we are only interested in creation and claim build times, not all transitions; additionally, this metric does not include `preset_name`, and adding it there would significantly increase cardinality. Therefore, separate more focused metrics are introduced here: * `coderd_workspace_creation_duration_seconds`: Build time to create a workspace (either a regular workspace or the build into a prebuild pool, for prebuild initial provisioning build). * `coderd_prebuilt_workspace_claim_duration_seconds`: Time to claim a prebuilt workspace from the pool. The reason for two separate histograms is that: * Creation (regular or prebuild): provisioning builds with similar time magnitude, generally expected to take longer than a claim operation. * Claim: expected to be a much faster provisioning build. #### Native histogram usage Provisioning times vary widely between projects. Using static buckets risks unbalanced or poorly informative histograms. 
To address this, these metrics use [Prometheus native histograms](https://prometheus.io/docs/specs/native_histograms/): * First introduced in Prometheus v2.40.0 * Recommended stable usage from v2.45+ * Requires Go client `prometheus/client_golang` v1.15.0+ * Experimental and must be explicitly enabled on the server (`--enable-feature=native-histograms`) For compatibility, we also retain a classic bucket definition (aligned with the existing provisioner metric: https://github.com/coder/coder/blob/main/provisionerd/provisionerd.go#L182-L189). * If native histograms are enabled, Prometheus ingests the high-resolution histogram. * If not, it falls back to the predefined buckets. Implementation notes: * Unlike the counter, these histograms are updated in real-time at workspace build job completion. * They reflect data only from the point of introduction forward (no historical backfill). ## Relates to Closes: https://github.com/coder/coder/issues/19528 Native histograms tested in observability stack: https://github.com/coder/observability/pull/50	2025-08-28 15:00:26 +01:00
Ethan	fb28979537	fix(docs): add `coderd_workspace_latest_build_status` prometheus metric (#14828 )	2024-09-27 02:55:24 +10:00
Ethan	c8580a415a	feat: expose current agent connections by type via prometheus (#14612 )	2024-09-11 14:13:30 +10:00
Pavel Aseev	4682355eed	chore: deprecate gauge metrics with _total suffix (#12744 ) (#12976 ) * chore: deprecate gauge metrics with _total suffix (#12744) Deprecated metrics: - coderd_oauth2_external_requests_rate_limit_total - coderd_api_workspace_latest_build_total * Apply suggestions from code review add link to follow-up issue Co-authored-by: Cian Johnston <public@cianjohnston.ie> --------- Co-authored-by: Cian Johnston <public@cianjohnston.ie>	2024-04-24 11:23:24 +03:00
Steven Masley	13359aa16f	chore: drop github per user rate limit tracking (#12286 ) * chore: drop github per user rate limit tracking Rate limits for authenticated requests are per user. This would be an excessive number of prometheus labels, so we only track the unauthorized limit.	2024-02-23 11:17:52 -06:00
Steven Masley	89ab659114	chore: add oauth2 prometheus metrics to documentation (#11534 )	2024-01-10 15:46:37 +00:00
Steven Masley	b7bdb17460	feat: add metrics to workspace agent scripts (#11132 ) * push startup script metrics to agent	2023-12-13 11:45:43 -06:00
Eric Paulsen	167c759149	docs: add license and template insights prom metrics (#11109 ) * docs: add license and template insights prom metrics * add: coderd_insights_applications_usage_seconds	2023-12-08 14:17:14 -05:00
Marcin Tojek	942aba3a66	feat: expose agent stats via Prometheus endpoint (#7115 ) * WIP * WIP * WIP * Agents * fix * 1min * fix * WIP * Test * docs * fmt * Add timer to measure the metrics collection * Use CachedGaugeVec * Unit tests * WIP * WIP * db: GetWorkspaceAgentStatsAndLabels * fmt * WIP * gauges * feat: collect * fix * fmt * minor fixes * Prometheus flag * fix * WIP * fix tests * WIP * fix json * Rx Tx bytes * CloseFunc * fix * fix * Fixes * fix * fix: IgnoreErrors * Fix: Windows * fix * reflect.DeepEquals	2023-04-14 16:14:52 +02:00
Marcin Tojek	0347231bb8	feat: expose agent metrics via Prometheus endpoint (#7011 ) * WIP * WIP * WIP * Agents * fix * 1min * fix * WIP * Test * docs * fmt * Add timer to measure the metrics collection * Use CachedGaugeVec * Unit tests * Address PR comments	2023-04-07 17:48:52 +02:00
Cian Johnston	43e8ba0811	feat(api): add prometheus metric coderd_workspace_builds_total (#6314 ) This PR adds the prometheus metric coderd_workspace_builds_total. It measures the total number of workspace builds, along with a number of labels intended to be useful for an operator debugging a failed workspace build trying to discover the scope of the issue.	2023-02-23 01:28:10 +00:00
Steven Masley	f76ef98a32	chore!: Standardize prometheus time metrics to seconds (#5709 ) * chore!: Standardize prometheus time metrics to seconds * Update prometheus docs	2023-01-13 11:15:25 -06:00
Marcin Tojek	883cf8afa9	chore: Add missing metrics description (#5212 ) * chore: Add missing metrics description * Update provisionerd/provisionerd.go Co-authored-by: Mathias Fredriksson <mafredri@gmail.com> * Fix Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>	2022-12-01 12:50:57 +01:00
Marcin Tojek	38bdae7016	docs: Prometheus metrics + generator (#5179 ) * docs: Prometheus metrics * Fix * Typo * Typo * Typo * Fix: link * Update docs/admin/prometheus.md Co-authored-by: Dean Sheather <dean@deansheather.com> * Update docs/admin/prometheus.md Co-authored-by: Dean Sheather <dean@deansheather.com> * Update docs/admin/prometheus.md Co-authored-by: Dean Sheather <dean@deansheather.com> * Update docs/admin/prometheus.md Co-authored-by: Dean Sheather <dean@deansheather.com> * Update docs/admin/prometheus.md Co-authored-by: Dean Sheather <dean@deansheather.com> * Rephrase * notice * use ```shell * Generator * gosec * fix: lint * PR comments * not needed anymore Co-authored-by: Dean Sheather <dean@deansheather.com> Co-authored-by: Geoffrey Huntley <ghuntley@ghuntley.com>	2022-11-30 17:39:51 +01:00

22 Commits
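
Commit df84cea924 states that static metrics take priority over generated metrics when both files contain the same metric name. The following is a minimal Go sketch of that precedence rule only. The `metric` struct and the `mergeMetrics` helper are hypothetical names introduced for illustration and are not taken from `scripts/metricsdocgen/main.go`. Sorting the result by name is likewise an assumption, made here so the output is stable.

```go
package main

import (
	"fmt"
	"sort"
)

// metric is a hypothetical, simplified stand-in for one parsed metric entry.
type metric struct {
	Name string
	Type string
	Help string
}

// mergeMetrics combines static and generated entries keyed by metric name.
// Generated entries are stored first and static entries second, so a static
// entry overwrites a generated one with the same name.
// The result is sorted by name (an assumption, for deterministic output).
func mergeMetrics(static, generated []metric) []metric {
	byName := make(map[string]metric, len(static)+len(generated))
	for _, m := range generated {
		byName[m.Name] = m
	}
	for _, m := range static {
		byName[m.Name] = m // static wins on a name collision
	}

	merged := make([]metric, 0, len(byName))
	for _, m := range byName {
		merged = append(merged, m)
	}
	sort.Slice(merged, func(i, j int) bool {
		return merged[i].Name < merged[j].Name
	})
	return merged
}

func main() {
	// Illustrative inputs; the overridden HELP text is made up for the example.
	static := []metric{
		{Name: "go_goroutines", Type: "gauge", Help: "Number of goroutines that currently exist."},
		{Name: "coderd_license_errors", Type: "gauge", Help: "Manually overridden description."},
	}
	generated := []metric{
		{Name: "coderd_license_errors", Type: "gauge", Help: "The number of active license errors."},
		{Name: "coderd_license_warnings", Type: "gauge", Help: "The number of active license warnings."},
	}

	for _, m := range mergeMetrics(static, generated) {
		fmt.Printf("%s (%s): %s\n", m.Name, m.Type, m.Help)
	}
}
```

In this sketch, `coderd_license_errors` appears in both inputs and the merged list keeps the static description, which mirrors the manual-override behavior the commit message describes.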