Commit Graph

44 Commits

Author SHA1 Message Date
Cian Johnston 4b585465b8 feat: label chatd metrics by model, add stream-state diagnostics (#24475)
Adds production-observability metrics to coderd/x/chatd/ for
model-level correlation and a chatStreams memory-leak investigation.

- Label per-request chatd metrics (steps_total, message_count,
  prompt_size_bytes, tool_result_size_bytes, ttft_seconds,
  compaction_total) with `model` and enrich the per-turn logger
  with provider/model.
- Add `coderd_chatd_stream_retries_total{provider, model, kind}`
  counter incremented in chatloop before OnRetry.
- Register a prometheus.Collector exposing `streams_active`,
  `stream_buffer_size_max`, `stream_buffer_events`,
  `stream_subscribers` from p.chatStreams.
- Add `coderd_chatd_stream_buffer_dropped_total` counter,
  incremented per publishToStream drop independently of the
  existing log-rate-limited bufferDropCount.
- Snapshot logger/model before the title-generation goroutine to
  avoid a data race with the logger/model rebind below it.

> 🤖
2026-04-17 16:16:30 +01:00
Cian Johnston d7439a9de0 feat: add Prometheus metrics for chatd subsystem (#24371)
Adds 7 Prometheus metrics to the chatd subsystem and introduces typed
`ActivityBumpReason` for deadline bump attribution.

| Metric | Type | Labels |
|--------|------|--------|
| `coderd_chatd_chats` | Gauge | `state` (streaming, waiting) |
| `coderd_chatd_message_count` | Histogram | `provider` |
| `coderd_chatd_prompt_size_bytes` | Histogram | `provider` |
| `coderd_chatd_tool_result_size_bytes` | Histogram | `provider`,
`tool_name` |
| `coderd_chatd_ttft_seconds` | Histogram | `provider` |
| `coderd_chatd_compaction_total` | Counter | `provider`, `result` |
| `coderd_chatd_steps_total` | Counter | `provider` |

> 🤖
2026-04-15 19:53:10 +01:00
Danny Kopping 48b90f8cc8 feat: add coder_build_info metric (#24365)
_Disclaimer: produced by Claude Opus 4.6_

Adds a `coder_build_info` metric which allows operators to see which
versions of Coder are currently running.

---------

Signed-off-by: Danny Kopping <danny@coder.com>
2026-04-15 12:48:38 +00:00
J. Scott Miller 20b953a99d feat: add Prometheus metric for agent first connection duration (#24179)
## Summary

Add `coderd_agents_first_connection_seconds` histogram metric that
records the
duration from workspace agent creation to first connection. This fills
an
observability gap — provisioner job timings and startup script metrics
exist,
but the agent connection phase (which can take several minutes) was not
exposed
to Prometheus.

Closes https://github.com/coder/coder/issues/21282

## Changes

- **`coderd/prometheusmetrics/prometheusmetrics.go`** — Define and
register a
  `HistogramVec` in the existing `Agents()` polling loop. Observe
`first_connected_at - created_at` exactly once per agent via a
deduplication
  map, pruned each tick to prevent unbounded memory growth.
- **`coderd/prometheusmetrics/prometheusmetrics_test.go`** — Update
`TestAgents`
to set `first_connected_at` on the test agent and assert the histogram
is
  collected with correct labels, sample count, and sample sum.
- **`docs/admin/integrations/prometheus.md`**,
**`scripts/metricsdocgen/generated_metrics`** —
  Auto-generated documentation updates from `make gen`.

## Metric details

| Property | Value |
|---|---|
| Name | `coderd_agents_first_connection_seconds` |
| Type | histogram |
| Labels | `template_name`, `agent_name`, `username`, `workspace_name` |
| Buckets | 1s, 10s, 30s, 1m, 2m, 5m, 10m, 30m, 1h |

## Example PromQL

```promql
# P95 agent connection time by template
histogram_quantile(0.95,
  sum(rate(coderd_agents_first_connection_seconds_bucket[1h])) by (le, template_name)
)
```

<details>
<summary>Implementation notes</summary>

### Design decisions

- **Histogram over gauge**: Enables `histogram_quantile()` for
percentile queries.
- **Observe in `Agents()` polling loop**: All required data is already
fetched by
  `GetWorkspaceAgentsForMetrics()` — no new DB queries.
- **Dedup via `map[uuid.UUID]struct{}`**: Prevents re-observing the same
agent
  across polling ticks. Pruned each cycle to bound memory.
- **Buckets**: Aligned with
`coderd_provisionerd_workspace_build_timings_seconds`
  range (1s–1h).

### Overhead at scale (100k active workspaces)

The deduplication map (`observedFirstConnection`) and per-tick pruning
map
(`currentAgentIDs`) are both `map[[16]byte]struct{}`. At 100k agents:

- **Memory**: ~2.25 MB persistent + ~2.25 MB transient per tick = **~4.5
MB peak**.
- **CPU**: ~25 ms of map operations per tick (one tick per minute) =
**<0.05% of one core**.

Both are negligible relative to the existing cost of the `Agents()` loop
(the DB
query, per-agent `GetWorkspaceAppsByAgentID` calls, and coordinator node
lookups
dominate).

</details>

> 🤖 Generated by Coder Agents
2026-04-14 12:00:46 -05:00
Cian Johnston f164463c6a fix(scripts/metricsdocgen): shush the prometheus scanner in CI (#23642)
- Suppress informational `log.Printf` messages from the metrics scanner
when stdout is not a TTY (i.e. piped via `atomic_write` in `make gen` or
CI)
- Genuine warnings (`warnf`) still print unconditionally so real
problems remain visible
- `log.Fatalf` for fatal errors is unchanged

> 🤖 Created by Coder Agents and reviewed by a human
2026-03-26 12:58:02 +00:00
Cian Johnston f1d333f0e6 refactor: deduplicate utility helpers across the codebase (#23338)
Audited exported helpers in `coderd/util/*`, `testutil`, `cryptorand`,
and friends, then replaced duplicated implementations with canonical
versions.

- **fix: `maps.SortedKeys` generic signature** — value type was
hardcoded to `any`, making it impossible to actually call. Added second
type parameter `V any`. Added table-driven tests with `cmp.Diff`.
- **refactor: replace ad-hoc ptr helpers with `ptr.Ref`** — removed
`int64Ptr`, `stringPtr`, `boolPtr`, `i64ptr`, `strPtr`, `PtrInt32`
across 6 files.
- **refactor: replace local `sortedKeys`/`sortKeys` with
`maps.SortedKeys`** — now that the signature is fixed, scripts can use
it.
- **refactor: replace hand-rolled `capitalize` with
`strings.Capitalize`** — the typegen version was also not UTF-8 safe.

> 🤖 This PR was created with the help of Coder Agents, and was reviewed
by my human. 🧑‍💻
2026-03-20 15:12:41 +00:00
Jon Ayers 6c44de951d feat: add Prometheus collector for DERP server expvar metrics (#22583)
This PR does three things:
- Exports derp expvars to the pprof endpoint
- Exports the expvar metrics as prometheus metrics in both coderd and
wsproxy
- Updates our tailscale to a fix I also had to make to avoid a data race
condition

I generated this with mux but I also manually tested that the metrics
were getting properly emitted
2026-03-06 01:57:58 -06:00
Mathias Fredriksson a6a8fd94d7 build(Makefile): enable parallel make -j gen with correct dependency graph (#22612)
`make gen` could not run with `-j` because inter-target dependency edges
were missing. Multiple recipes compile `coderd/rbac` (which includes
generated files like `object_gen.go`), and without explicit ordering,
parallel runs produced syntax errors from mid-write reads.

Three main changes:

**Dependency graph fixes** declare the compile-time chain through
`coderd/rbac` so that `object_gen.go` is written before anything that
imports it is compiled. The DB generation targets use a GNU Make 4.3+
grouped target (`&:`) so Make knows `generate.sh` co-produces
`querier.go`, `unique_constraint.go`, `dbmetrics`, and `dbauthz` in a
single invocation. `SKIP_DUMP_SQL=1` avoids re-entrant `make` inside
`generate.sh` when the Makefile already guarantees `dump.sql` is fresh.

**`scripts/atomicwrite` package** replaces `os.WriteFile` in all gen
scripts with a temp-file-in-same-dir + rename pattern, preventing
interrupted runs from leaving partial files.

**`.PRECIOUS` and shell atomic writes** protect git-tracked generated
files from Make's default delete-on-error behavior. Since these files
are committed, deletion is worse than staleness -- `git restore` is the
recovery path.

CI now runs `make -j --output-sync -B gen` (~32s, down from ~85s
serial).

| Scenario                          | Before             | After    |
|-----------------------------------|--------------------|----------|
| `make gen` (serial)               | 95s                | 95s      |
| `make -j gen` (parallel)          | race error         | **22s**  |
| CI `make -j --output-sync -B gen` | forced serial ~85s | **~32s** |
2026-03-05 11:58:10 +00:00
Zach 5b7377c375 feat: add Prometheus metrics for boundary log drop reporting (#22521)
Add Prometheus metrics to the boundary log proxy for observability:
- batches_dropped_total (reason: buffer_full, forward_failed)
- logs_dropped_total (reason: buffer_full, forward_failed,
  boundary_channel_full, boundary_batch_full)
- batches_forwarded_total

Also add BoundaryStatus to the BoundaryMessage envelope so boundary
can report dropped log counts as a separate wire message. The agent
records these as Prometheus metrics, making boundary-side data loss
visible. Backwards compatibility for older versions of boundary is maintained.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 12:42:34 -07:00
Susana Ferreira ca234f346d fix: mark presets as validation_failed to prevent endless prebuild retries (#22085)
## Description

- Updates `wsbuilder` to return a `BuildError` with
`http.StatusBadRequest` to signify a "validation error" on missing or
invalid parameters
- Adds a short-circuit in `prebuilds.StoreReconciler` to mark presets
for which creating a build returns a "validation error" as "validation
failed" and skip further attempts to reconcile.
- Adds a test to verify the above
- Introduces a new Prometheus metric
`coderd_prebuilt_workspaces_preset_validation_failed` to track the above

Closes: https://github.com/coder/coder/issues/21237

---------

Co-authored-by: Cian Johnston <cian@coder.com>
2026-02-27 14:26:48 +00:00
Garrett Delfosse 4057363f78 fix(coderd): add organization_name label to insights Prometheus metrics (#22296)
## Description

When multiple organizations have templates with the same name, the
Prometheus `/metrics` endpoint returns HTTP 500 because Prometheus
rejects duplicate label combinations. The three `coderd_insights_*`
metrics (`coderd_insights_templates_active_users`,
`coderd_insights_applications_usage_seconds`,
`coderd_insights_parameters`) used only `template_name` as a
distinguishing label, so two templates named e.g. `"openstack-v1"` in
different orgs would produce duplicate metric series.

This adds `organization_name` as a label to all three insight metric
descriptors to disambiguate templates across organizations.

## Changes

**`coderd/prometheusmetrics/insights/metricscollector.go`**:
- Added `organization_name` label to all three metric descriptors
- Added `organizationNames` field (template ID → org name) to the
`insightsData` struct
- In `doTick`: after fetching templates, collect unique org IDs, fetch
organizations via `GetOrganizations`, and build a
template-ID-to-org-name mapping
- In `Collect()`: pass the organization name as an additional label
value in every `MustNewConstMetric` call

**`coderd/prometheusmetrics/insights/testdata/insights-metrics.json`**:
Updated golden file to include `organization_name=coder` in all metric
label keys.

Fixes #21748
2026-02-25 08:58:50 +00:00
Susana Ferreira a613ffa3d6 chore: integrate metrics scanner into Makefile (#21465)
## Description

This PR wires up the metrics scanner in the Makefile to automatically regenerate metrics documentation when source files change.

## Changes

* Add Makefile target `scripts/metricsdocgen/generated_metrics` to run the AST scanner to generate the metrics file
* Update `docs/admin/integrations/prometheus.md` Makefile target to depend on `scripts/metricsdocgen/generated_metrics`
* Add `scripts/metricsdocgen/README.md` documenting the metrics generation process

Closes: https://github.com/coder/coder/issues/13223
2026-02-13 12:31:33 +00:00
Susana Ferreira df84cea924 feat(scripts/metricsdocgen): support merging static and generated metrics files (#21464)
## Description

This PR refactors `scripts/metricsdocgen/main.go` to support merging static and generated metrics files for documentation generation.

The static `metrics` file remains necessary for metrics not defined in the coder codebase (`go_*`, `process_*`, `promhttp_*`, `coder_aibridged_*`), as well as **edge cases** the scanner cannot handle (e.g.,  such as metrics with runtime-determined labels or function-local variable references for fields, ...). Handling these edge cases in the scanner would make it significantly more complex, so we keep this hybrid approach to accommodate them. This means that in such cases, developers need to update the `metrics` file directly, meaning there is still a risk of out-of-date information in the documentation. However, this solution should already encompass most cases.

Static metrics take priority over generated metrics when both files contain the same metric name, allowing manual overrides without modifying the scanner. Some of these edge cases could be easily fixed by updating the codebase to use one of the supported patterns.

## Changes

* Update `scripts/metricsdocgen/main.go` to read from two separate metrics files:
  * `metrics`: static, manually maintained metrics (e.g., `go_*`, `process_*`, `promhttp_*`, `coder_aibridged_*`)
  * `generated_metrics`: auto-generated by the AST scanner
* Update `metrics` file to contain only static and edge-case metrics
* Skip metrics with empty HELP descriptions in the scanner
* Update `generated_metrics` to reflect skipped metrics
* Update `docs/admin/integrations/prometheus.md` with merged metrics

Related to: https://github.com/coder/coder/issues/13223

**Disclosure:** This PR was mainly developed with Claude Sonnet 4, with iterative review and refinement by @ssncferreira
2026-02-13 12:19:33 +00:00
Susana Ferreira 55d1a32424 feat(scripts/metricsdocgen): add promauto.With() pattern to metrics scanner (#21463)
## Description

This PR implements extraction of metrics defined using `promauto.With()` factory patterns.

## Changes

* Add `extractPromautoMetric()` to handle:
  * `promauto.With(reg).NewCounterVec(prometheus.CounterOpts{...}, labels)`
  * `factory.NewGaugeVec(prometheus.GaugeOpts{...}, labels)`
* Script generates an updated `scripts/metricsdocgen/generated_metrics` file

Related to: https://github.com/coder/coder/issues/13223

**Disclosure:** This PR was mainly developed with Claude Sonnet 4, with iterative review and refinement by @ssncferreira
2026-02-13 11:24:33 +00:00
Susana Ferreira bcb437d281 feat(scripts/metricsdocgen): add prometheus.New*() and New*Vec() patterns to metrics scanner (#21462)
## Description

This PR implements extraction of metrics defined using `prometheus.New*()` and `prometheus.New*Vec()` patterns with `*Opts{}` structs.

## Changes

* Add `extractOptsMetric()` to handle:
  * `prometheus.NewGauge(prometheus.GaugeOpts{...})`
  * `prometheus.NewCounter(prometheus.CounterOpts{...})`
  * `prometheus.NewHistogram(prometheus.HistogramOpts{...})`
  * `prometheus.NewSummary(prometheus.SummaryOpts{...})`
  * `prometheus.New*Vec(prometheus.*Opts{...}, labels)`
* Script generates an updated `scripts/metricsdocgen/generated_metrics` file

Related to: https://github.com/coder/coder/issues/13223

**Disclosure:** This PR was mainly developed with Claude Sonnet 4, with iterative review and refinement by @ssncferreira
2026-02-13 11:13:55 +00:00
Susana Ferreira 45280d5516 feat(scripts/metricsdocgen): add prometheus.NewDesc() pattern to metrics scanner (#21461)
## Description

This PR implements extraction of metrics defined using the `prometheus.NewDesc()` pattern.

## Changes

* Add `extractNewDescMetric()` to extract metrics from `prometheus.NewDesc()` calls
* Script generates an updated `scripts/metricsdocgen/generated_metrics` file

Related to: https://github.com/coder/coder/issues/13223

**Disclosure:** This PR was mainly developed with Claude Sonnet 4, with iterative review and refinement by @ssncferreira
2026-02-13 11:01:34 +00:00
Susana Ferreira a9180d406e feat(scripts/metricsdocgen): add AST scanner core for metrics doc generation (#21460)
## Description

This PR adds an AST-based scanner to automatically generate Prometheus metrics documentation from the coder source code.

## Changes

* Add `scripts/metricsdocgen/scanner/scanner.go` with:
  * Directory walking for `agent/`, `coderd/`, `enterprise/`, `provisionerd/`
  * Go file parsing (skipping `*_test.go` files)
  * AST inspection for metric extraction
  * `Metric.String()` for Prometheus text exposition format rendering
  * `writeMetrics()` to output metrics to stdout
  * Placeholder `extractMetricFromCall()` (implemented in subsequent PRs)
* Empty `scripts/metricsdocgen/generated_metrics` placeholder (populated by subsequent PRs)

**Note:** To facilitate the review process, this was separated into scoped stacked PRs. The division was based on the main structure, the different Prometheus patterns currently present in the codebase, and updates to the build process.

Related to: https://github.com/coder/coder/issues/13223

**Disclosure:** This PR was mainly developed with Claude Sonnet 4, with iterative review and refinement by @ssncferreira
2026-02-13 10:48:55 +00:00
Callum Styan 5f3be6b288 feat: add provisioner job queue wait time histogram and jobs enqueued counter (#21869)
This PR adds some metrics to help identify job enqueue rates and
latencies. This work was initiated as a way to help reduce the cost of
the observation/measurement itself for autostart scaletests, which
impacts our ability to identify/reason about the load caused by
autostart. See: https://github.com/coder/internal/issues/1209

I've extended the metrics here to account for regular user initiated
builds, prebuilds, autostarts, etc. IMO there is still the question here
of whether we want to include or need the `transition` label, which is
only present on workspace builds. Including it does lead to an increase
in cardinality, and in the case of the histogram (when not using native
histograms) that's at least a few extra series for every bucket. We
could remove the transition label there but keep it on the counter.

Additionally, the histogram is currently observing latencies for other
jobs, such as template builds/version imports, those do not have a
transition type associated with them.

Tested briefly in a workspace, can see metric values like the following:
-
`coderd_workspace_builds_enqueued_total{build_reason="autostart",provisioner_type="terraform",status="success",transition="start"}
1`
-
`coderd_provisioner_job_queue_wait_seconds_bucket{build_reason="autostart",job_type="workspace_build",provisioner_type="terraform",transition="start",le="0.025"}
1`

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-12 13:40:47 -08:00
Jon Ayers 6035e45cb8 feat: add e2e workspace build duration metric (#21739)
Adds coderd_template_workspace_build_duration_seconds histogram that
tracks the full duration from workspace build creation to agent ready.
This captures the complete user-perceived build time including
provisioning and agent startup.

The metric is emitted when the agent reports ready/error/timeout via the
lifecycle API, ensuring each build is counted exactly once per replica.
2026-02-06 16:26:02 -06:00
Marcin Tojek 036ed5672f fix!: remove deprecated prometheus metrics (#21788)
## Description

Removes the following deprecated Prometheus metrics:

- `coderd_api_workspace_latest_build_total` → use
`coderd_api_workspace_latest_build` instead
- `coderd_oauth2_external_requests_rate_limit_total` → use
`coderd_oauth2_external_requests_rate_limit` instead

These metrics were deprecated in #12976 because gauge metrics should
avoid the `_total` suffix per [Prometheus naming
conventions](https://prometheus.io/docs/practices/naming/).

## Changes

- Removed deprecated metric `coderd_api_workspace_latest_build_total`
from `coderd/prometheusmetrics/prometheusmetrics.go`
- Removed deprecated metric
`coderd_oauth2_external_requests_rate_limit_total` from
`coderd/promoauth/oauth2.go`
- Updated tests to use the non-deprecated metric name

Fixes #12999
2026-01-30 13:30:06 +01:00
Marcin Tojek 04b0253e8a feat: add Prometheus metrics for license warnings and errors (#21749)
Fixes: coder/internal#767

Adds two new Prometheus metrics for license health monitoring:

- `coderd_license_warnings` - count of active license warnings
- `coderd_license_errors` - count of active license errors

Metrics endpoint after startup of a deployment with license enabled:

```
...
# HELP coderd_license_errors The number of active license errors.
# TYPE coderd_license_errors gauge
coderd_license_errors 0
...
# HELP coderd_license_warnings The number of active license warnings.
# TYPE coderd_license_warnings gauge
coderd_license_warnings 0
...
```
2026-01-29 13:50:15 +01:00
Callum Styan 806d7e4c11 docs: update metrics docs to include metadata batcher metrics (#21665)
This updates the metrics docs to include metrics added in
https://github.com/coder/coder/pull/21330

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2026-01-26 09:22:14 -08:00
Danny Kopping c6631e1e50 feat: expose aibridged metrics (#20865)
Upgrades `coder/aibridge` to v0.2.0 which includes
https://github.com/coder/aibridge/pull/62.

Creates a `prometheus.Registerer` with a prefix `coder_aibridged_` and
passes that along to coder/aibridge which actually exposes the metrics.

Also includes a side-effect of a change described in
https://github.com/coder/aibridge/pull/62#discussion_r2550017470.

---------

Signed-off-by: Danny Kopping <danny@coder.com>
2025-11-24 18:16:06 +02:00
Susana Ferreira c1f8465de6 fix: add missing provisionerd metrics to docs (#20358)
## Description

Add missing provisionerd metrics to Prometheus documentation:
* `coderd_provisionerd_num_daemons`: The number of provisioner daemons.
* `coderd_provisionerd_workspace_build_timings_seconds`: The time taken
for a workspace to build.

Related to internal thread:
https://codercom.slack.com/archives/C07GRNNRW03/p1760642020583019
2025-10-20 11:33:45 +01:00
Susana Ferreira 0ab345ca84 feat: add prebuild timing metrics to Prometheus (#19503)
## Description

This PR introduces one counter and two histograms related to workspace
creation and claiming. The goal is to provide clearer observability into
how workspaces are created (regular vs prebuild) and the time cost of
those operations.

### `coderd_workspace_creation_total`

* Metric type: Counter
* Name: `coderd_workspace_creation_total`
* Labels: `organization_name`, `template_name`, `preset_name`

This counter tracks whether a regular workspace (not created from a
prebuild pool) was created using a preset or not.
Currently, we already expose `coderd_prebuilt_workspaces_claimed_total`
for claimed prebuilt workspaces, but we lack a comparable metric for
regular workspace creations. This metric fills that gap, making it
possible to compare regular creations against claims.

Implementation notes:
* Exposed as a `coderd_` metric, consistent with other workspace-related
metrics (e.g. `coderd_api_workspace_latest_build`:
https://github.com/coder/coder/blob/main/coderd/prometheusmetrics/prometheusmetrics.go#L149).
* Every `defaultRefreshRate` (1 minute ), DB query
`GetRegularWorkspaceCreateMetrics` is executed to fetch all regular
workspaces (not created from a prebuild pool).
* The counter is updated with the total from all time (not just since
metric introduction). This differs from the histograms below, which only
accumulate from their introduction forward.

### `coderd_workspace_creation_duration_seconds` &
`coderd_prebuilt_workspace_claim_duration_seconds`

* Metric types: Histogram
* Names:
  * `coderd_workspace_creation_duration_seconds`
* Labels: `organization_name`, `template_name`, `preset_name`, `type`
(`regular`, `prebuild`)
  * `coderd_prebuilt_workspace_claim_duration_seconds`
    * Labels: `organization_name`, `template_name`, `preset_name`

We already have `coderd_provisionerd_workspace_build_timings_seconds`,
which tracks build run times for all workspace builds handled by the
provisioner daemon.
However, in the context of this issue, we are only interested in
creation and claim build times, not all transitions; additionally, this
metric does not include `preset_name`, and adding it there would
significantly increase cardinality. Therefore, separate more focused
metrics are introduced here:
* `coderd_workspace_creation_duration_seconds`: Build time to create a
workspace (either a regular workspace or the build into a prebuild pool,
for prebuild initial provisioning build).
* `coderd_prebuilt_workspace_claim_duration_seconds`: Time to claim a
prebuilt workspace from the pool.

The reason for two separate histograms is that:
* Creation (regular or prebuild): provisioning builds with similar time
magnitude, generally expected to take longer than a claim operation.
* Claim: expected to be a much faster provisioning build.

#### Native histogram usage

Provisioning times vary widely between projects. Using static buckets
risks unbalanced or poorly informative histograms.
To address this, these metrics use [Prometheus native
histograms](https://prometheus.io/docs/specs/native_histograms/):
* First introduced in Prometheus v2.40.0
* Recommended stable usage from v2.45+
* Requires Go client `prometheus/client_golang` v1.15.0+
* Experimental and must be explicitly enabled on the server
(`--enable-feature=native-histograms`)

For compatibility, we also retain a classic bucket definition (aligned
with the existing provisioner metric:
https://github.com/coder/coder/blob/main/provisionerd/provisionerd.go#L182-L189).
* If native histograms are enabled, Prometheus ingests the
high-resolution histogram.
* If not, it falls back to the predefined buckets.

Implementation notes:
* Unlike the counter, these histograms are updated in real-time at
workspace build job completion.
* They reflect data only from the point of introduction forward (no
historical backfill).

## Relates to 

Closes: https://github.com/coder/coder/issues/19528
Native histograms tested in observability stack:
https://github.com/coder/observability/pull/50
2025-08-28 15:00:26 +01:00
Muhammad Atif Ali 419eba5fb6 docs: restructure docs (#14421)
Closes #13434 
Supersedes #14182

---------

Co-authored-by: Ethan <39577870+ethanndickson@users.noreply.github.com>
Co-authored-by: Ethan Dickson <ethan@coder.com>
Co-authored-by: Ben Potter <ben@coder.com>
Co-authored-by: Stephen Kirby <58410745+stirby@users.noreply.github.com>
Co-authored-by: Stephen Kirby <me@skirby.dev>
Co-authored-by: EdwardAngert <17991901+EdwardAngert@users.noreply.github.com>
Co-authored-by: Edward Angert <EdwardAngert@users.noreply.github.com>
2024-10-05 10:52:04 -05:00
Ethan fb28979537 fix(docs): add coderd_workspace_latest_build_status prometheus metric (#14828) 2024-09-27 02:55:24 +10:00
Ethan c8580a415a feat: expose current agent connections by type via prometheus (#14612) 2024-09-11 14:13:30 +10:00
dependabot[bot] c41d0efff9 chore: bump github.com/prometheus/client_golang from 1.18.0 to 1.19.1 (#13232)
* chore: bump github.com/prometheus/client_golang from 1.18.0 to 1.19.1
2024-05-13 13:01:28 +00:00
Pavel Aseev 4682355eed chore: deprecate gauge metrics with _total suffix (#12744) (#12976)
* chore: deprecate gauge metrics with _total suffix (#12744)

Deprecated metrics:
- coderd_oauth2_external_requests_rate_limit_total
- coderd_api_workspace_latest_build_total

* Apply suggestions from code review

add link to follow-up issue

Co-authored-by: Cian Johnston <public@cianjohnston.ie>

---------

Co-authored-by: Cian Johnston <public@cianjohnston.ie>
2024-04-24 11:23:24 +03:00
Steven Masley 13359aa16f chore: drop github per user rate limit tracking (#12286)
* chore: drop github per user rate limit tracking

Rate limits for authenticated requests are per user.
This would be an excessive number of prometheus labels,
so we only track the unauthorized limit.
2024-02-23 11:17:52 -06:00
Steven Masley 89ab659114 chore: add oauth2 prometheus metrics for to documentation (#11534) 2024-01-10 15:46:37 +00:00
Steven Masley b7bdb17460 feat: add metrics to workspace agent scripts (#11132)
* push startup script metrics to agent
2023-12-13 11:45:43 -06:00
Eric Paulsen 167c759149 docs: add license and template insights prom metrics (#11109)
* docs: add license and template insights prom metrics

* add: coderd_insights_applications_usage_seconds
2023-12-08 14:17:14 -05:00
Colin Adler bc862fa493 chore: upgrade tailscale to v1.46.1 (#8913) 2023-08-09 19:50:26 +00:00
Marcin Tojek 942aba3a66 feat: expose agent stats via Prometheus endpoint (#7115)
* WIP

* WIP

* WIP

* Agents

* fix

* 1min

* fix

* WIP

* Test

* docs

* fmt

* Add timer to measure the metrics collection

* Use CachedGaugeVec

* Unit tests

* WIP

* WIP

* db: GetWorkspaceAgentStatsAndLabels

* fmt

* WIP

* gauges

* feat: collect

* fix

* fmt

* minor fixes

* Prometheus flag

* fix

* WIP

* fix tests

* WIP

* fix json

* Rx Tx bytes

* CloseFunc

* fix

* fix

* Fixes

* fix

* fix: IgnoreErrors

* Fix: Windows

* fix

* reflect.DeepEquals
2023-04-14 16:14:52 +02:00
Marcin Tojek 0347231bb8 feat: expose agent metrics via Prometheus endpoint (#7011)
* WIP

* WIP

* WIP

* Agents

* fix

* 1min

* fix

* WIP

* Test

* docs

* fmt

* Add timer to measure the metrics collection

* Use CachedGaugeVec

* Unit tests

* Address PR comments
2023-04-07 17:48:52 +02:00
Cian Johnston 43e8ba0811 feat(api): add prometheus metric coderd_workspace_builds_total (#6314)
This PR adds the prometheus metric coderd_workspace_builds_total.
It measures the total number of workspace builds, along with a number of labels intended to be useful for an operator debugging a failed workspace build trying to discover the scope of the issue.
2023-02-23 01:28:10 +00:00
Ammar Bandukwala f05609b4da chore: format Go more aggressively 2023-02-18 18:32:09 -06:00
Kyle Carberry 026b1cd2a4 chore: update to go 1.20 (#5968)
Co-authored-by: Colin Adler <colin1adler@gmail.com>
2023-02-02 12:36:27 -06:00
Steven Masley f76ef98a32 chore!: Standardize prometheus time metrics to seconds (#5709)
* chore!: Standardize prometheus time metrics to seconds
* Update prometheus docs
2023-01-13 11:15:25 -06:00
Marcin Tojek dc6d271293 feat: Build framework for generating API docs (#5383)
* WIP

* Gen

* WIP

* chi swagger

* WIP

* WIP

* WIP

* GetWorkspaces

* GetWorkspaces

* Markdown

* Use widdershins

* WIP

* WIP

* WIP

* Markdown template

* Fix: makefile

* fmt

* Fix: comment

* Enable swagger conditionally

* fix: site

* Default false

* Flag tests

* fix

* fix

* template fixes

* Fix

* Fix

* Fix

* WIP

* Formatted

* Cleanup

* Templates

* BEGIN END SECTION

* subshell exit code

* Fix

* Fix merge

* WIP

* Fix

* Fix fmt

* Fix

* Generic api.md page

* Fix merge

* Link pages

* Fix

* Fix

* Fix: links

* Add icon

* Write manifest file

* Fix fmt

* Fix: enterprise

* Fix: Swagger.Enable

* Fix: rename apidocs to apidoc

* Fix: find -not -prune

* Fix: json not available

* Fix: rename Coderd API to Coder API

* Fix: npm exec

* Fix: api dir

* Fix: by ID

* Fix: string uuid

* Fix: include deleted

* Fix: indirect go.mod

* Fix: source lib.sh

* Fix: shellcheck

* Fix: pushd popd

* Fix: fmt

* Fix: improve workspaces

* Fix: swagger-enable

* Fix

* Fix: mention only HTTP 200

* Fix: IDs

* Fix: https

* Fix: icon

* More APis

* Fix: format swagger.json

* Fix: SwaggerEndpoint

* Fix: SCRIPT_DIR

* Fix: PROJECT_ROOT

* Fix: use code tags in schemas.md

* Fix: examples

* Fix: examples

* Fix: improve format

* Fix: date-time,enums

* Fix: include_deleted

* Fix: array of

* Fix: parameter, response

* Fix: string time or null

* Workspaces: more docs

* Workspaces: more docs

* Fix: renderDisplayName

* Fix: ActiveUserCount

* Fix

* Fix: typo

* Templates: docs

* Notice: incomplete
2022-12-19 18:43:46 +01:00
Marcin Tojek 883cf8afa9 chore: Add missing metrics description (#5212)
* chore: Add missing metrics description

* Update provisionerd/provisionerd.go

Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>

* Fix

Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
2022-12-01 12:50:57 +01:00
Marcin Tojek 38bdae7016 docs: Prometheus metrics + generator (#5179)
* docs: Prometheus metrics

* Fix

* Typo

* Typo

* Typo

* Fix: link

* Update docs/admin/prometheus.md

Co-authored-by: Dean Sheather <dean@deansheather.com>

* Update docs/admin/prometheus.md

Co-authored-by: Dean Sheather <dean@deansheather.com>

* Update docs/admin/prometheus.md

Co-authored-by: Dean Sheather <dean@deansheather.com>

* Update docs/admin/prometheus.md

Co-authored-by: Dean Sheather <dean@deansheather.com>

* Update docs/admin/prometheus.md

Co-authored-by: Dean Sheather <dean@deansheather.com>

* Rephrase

* notice

* use ```shell

* Generator

* gosec

* fix: lint

* PR comments

* not needed anymore

Co-authored-by: Dean Sheather <dean@deansheather.com>
Co-authored-by: Geoffrey Huntley <ghuntley@ghuntley.com>
2022-11-30 17:39:51 +01:00