J. Scott Miller 20b953a99d feat: add Prometheus metric for agent first connection duration (#24179)
## Summary

Add `coderd_agents_first_connection_seconds` histogram metric that records the duration from workspace agent creation to first connection. This fills an observability gap: provisioner job timings and startup script metrics exist, but the agent connection phase (which can take several minutes) was not exposed to Prometheus.

Closes https://github.com/coder/coder/issues/21282

## Changes

- **`coderd/prometheusmetrics/prometheusmetrics.go`**: Define and register a `HistogramVec` in the existing `Agents()` polling loop. Observe `first_connected_at - created_at` exactly once per agent via a deduplication map, pruned each tick to prevent unbounded memory growth.
- **`coderd/prometheusmetrics/prometheusmetrics_test.go`**: Update `TestAgents` to set `first_connected_at` on the test agent and assert the histogram is collected with correct labels, sample count, and sample sum.
- **`docs/admin/integrations/prometheus.md`**, **`scripts/metricsdocgen/generated_metrics`**: Auto-generated documentation updates from `make gen`.

## Metric details

| Property | Value |
|---|---|
| Name | `coderd_agents_first_connection_seconds` |
| Type | histogram |
| Labels | `template_name`, `agent_name`, `username`, `workspace_name` |
| Buckets | 1s, 10s, 30s, 1m, 2m, 5m, 10m, 30m, 1h |
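
Prometheus histogram buckets are cumulative: each `le` bucket counts every observation less than or equal to its upper bound, and the implicit `+Inf` bucket counts all observations. The sketch below assumes the boundaries in the table are expressed in seconds as a plain `[]float64` (the actual slice in `prometheusmetrics.go` may be written differently); `firstConnectionBuckets` and `cumulativeCounts` are hypothetical names used only to show where samples land.

```go
package main

// firstConnectionBuckets holds the table's upper bounds converted to seconds
// (1s, 10s, 30s, 1m, 2m, 5m, 10m, 30m, 1h).
var firstConnectionBuckets = []float64{1, 10, 30, 60, 120, 300, 600, 1800, 3600}

// cumulativeCounts returns, for each upper bound, how many samples were
// <= that bound, followed by a final +Inf entry equal to the total count.
// This mirrors the cumulative "le" semantics of Prometheus histogram buckets.
func cumulativeCounts(bounds, samples []float64) []uint64 {
	counts := make([]uint64, len(bounds)+1)
	for _, s := range samples {
		for i, b := range bounds {
			if s <= b {
				counts[i]++
			}
		}
		counts[len(bounds)]++ // the +Inf bucket counts every sample
	}
	return counts
}
```

For example, a 45 s connection increments the 60 s bucket and every larger one, while a 5000 s connection lands only in `+Inf`.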

## Example PromQL

```promql
# P95 agent connection time by template
histogram_quantile(0.95,
  sum(rate(coderd_agents_first_connection_seconds_bucket[1h])) by (le, template_name)
)
```

<details>
<summary>Implementation notes</summary>

### Design decisions

- **Histogram over gauge**: Enables `histogram_quantile()` for percentile queries.
- **Observe in `Agents()` polling loop**: All required data is already fetched by `GetWorkspaceAgentsForMetrics()`, so no new DB queries are needed.
- **Dedup via `map[uuid.UUID]struct{}`**: Prevents re-observing the same agent across polling ticks. Pruned each cycle to bound memory.
- **Buckets**: Aligned with the `coderd_provisionerd_workspace_build_timings_seconds` range (1s–1h).

### Overhead at scale (100k active workspaces)

The deduplication map (`observedFirstConnection`) and per-tick pruning map (`currentAgentIDs`) are both `map[[16]byte]struct{}` (a `uuid.UUID` is a `[16]byte` array). At 100k agents:

- **Memory**: ~2.25 MB persistent + ~2.25 MB transient per tick = **~4.5 MB peak**.
- **CPU**: ~25 ms of map operations per tick (one tick per minute) = **<0.05% of one core**.

Both are negligible relative to the existing cost of the `Agents()` loop (the DB query, per-agent `GetWorkspaceAppsByAgentID` calls, and coordinator node lookups dominate).
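
The overhead figures follow from simple arithmetic, taking 1 MB as 10^6 bytes (an assumption; the unit is not stated):

```latex
\begin{aligned}
\text{CPU share} &= \frac{25\ \text{ms}}{60\ \text{s}} = \frac{0.025}{60} \approx 4.2 \times 10^{-4} \approx 0.042\,\% < 0.05\,\% \\
\text{Peak memory} &\approx 2.25\ \text{MB} + 2.25\ \text{MB} = 4.5\ \text{MB} \\
\text{Cost per entry} &\approx \frac{2.25 \times 10^{6}\ \text{B}}{10^{5}\ \text{entries}} = 22.5\ \text{B}
\end{aligned}
```

The roughly 22.5 B per entry covers the 16-byte key plus Go's map bookkeeping, which is why the estimate exceeds the raw 1.6 MB of keys alone.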

</details>

> 🤖 Generated by Coder Agents
2026-04-14 12:00:46 -05:00

Metrics Documentation Generator

This tool generates the Prometheus metrics documentation at docs/admin/integrations/prometheus.md.

How It Works

The documentation is generated from two metrics files:

  1. metrics (static, manually maintained)
  2. generated_metrics (auto-generated, do not edit)

These files are merged and used to produce the final documentation.

metrics (static)

Contains metrics that are not directly defined in the coder source code:

  • go_*: Go runtime metrics
  • process_*: Process metrics from prometheus/client_golang
  • promhttp_*: Prometheus HTTP handler metrics
  • coder_aibridged_*: Metrics from external dependencies

Note

This file also contains edge cases where metric metadata cannot be accurately extracted by the scanner (e.g., labels determined by runtime logic). Static metrics take priority over generated metrics when both files contain the same metric name.

Edit this file to add metrics that should appear in the documentation but are not scanned from the coder codebase, or to manually override metrics for which the scanner generates incorrect metadata (e.g., missing runtime-determined labels, as in agent_scripts_executed_total).
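
The merge rule, including the static-wins priority described in the note above, can be sketched as a map keyed by metric name. The `metric` struct and `mergeMetrics` function are hypothetical: the real files' format and fields are not described here, and sorting by name is this sketch's choice for deterministic output, not necessarily what the tool does.

```go
package main

import "sort"

// metric is a hypothetical, simplified view of one metric entry; the real
// files may carry different fields and use a different format.
type metric struct {
	Name   string
	Type   string
	Help   string
	Labels []string
}

// mergeMetrics combines static and generated entries. When both contain the
// same name, the static entry wins. The result is sorted by name.
func mergeMetrics(static, generated []metric) []metric {
	byName := make(map[string]metric, len(static)+len(generated))
	for _, m := range generated {
		byName[m.Name] = m
	}
	for _, m := range static {
		byName[m.Name] = m // overwrite: static takes priority
	}
	merged := make([]metric, 0, len(byName))
	for _, m := range byName {
		merged = append(merged, m)
	}
	sort.Slice(merged, func(i, j int) bool { return merged[i].Name < merged[j].Name })
	return merged
}
```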

generated_metrics (auto-generated)

Contains metrics extracted from the coder source code by the AST scanner (scanner/scanner.go).

Do not edit this file directly. It is regenerated by running:

make scripts/metricsdocgen/generated_metrics
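
The general approach of extracting metric names from Go source with the standard `go/ast` package might look like the following. This is a hypothetical sketch, much simpler than `scanner/scanner.go`: it only recognizes composite literals whose type is a selector ending in `Opts` (such as `prometheus.HistogramOpts`) with string-literal `Namespace`, `Subsystem`, and `Name` fields, and joins the non-empty parts with `_` the way `prometheus.BuildFQName` does. It skips names built from constants or at runtime; the static `metrics` file exists for cases the real scanner cannot extract accurately.

```go
package main

import (
	"go/ast"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// metricNames parses Go source and returns fully qualified metric names
// found in literals like prometheus.HistogramOpts{Namespace: ..., Name: ...}.
func metricNames(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "src.go", src, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	ast.Inspect(file, func(n ast.Node) bool {
		lit, ok := n.(*ast.CompositeLit)
		if !ok {
			return true
		}
		// Heuristic: only consider literals typed as <pkg>.<Something>Opts.
		sel, ok := lit.Type.(*ast.SelectorExpr)
		if !ok || !strings.HasSuffix(sel.Sel.Name, "Opts") {
			return true
		}
		fields := map[string]string{}
		for _, elt := range lit.Elts {
			kv, ok := elt.(*ast.KeyValueExpr)
			if !ok {
				continue
			}
			key, ok := kv.Key.(*ast.Ident)
			if !ok {
				continue
			}
			val, ok := kv.Value.(*ast.BasicLit)
			if !ok || val.Kind != token.STRING {
				continue // constants and expressions are not resolved here
			}
			if s, err := strconv.Unquote(val.Value); err == nil {
				fields[key.Name] = s
			}
		}
		if fields["Name"] == "" {
			return true
		}
		// Join non-empty parts with "_", as prometheus.BuildFQName does.
		var parts []string
		for _, k := range []string{"Namespace", "Subsystem", "Name"} {
			if fields[k] != "" {
				parts = append(parts, fields[k])
			}
		}
		names = append(names, strings.Join(parts, "_"))
		return true
	})
	return names, nil
}
```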

Updating Metrics Documentation

To regenerate the documentation after code changes:

make docs/admin/integrations/prometheus.md

This will:

  • Run the scanner to update generated_metrics
  • Merge the metrics and generated_metrics files
  • Update the documentation file