feat: event driven agent connection metric (#24355)

Moves the `coderd_agents_first_connection_seconds` histogram from the
polling-based `prometheusmetrics.Agents()` loop to the event-driven
`agentConnectionMonitor.init()` path. The metric is now recorded exactly
once, when an agent first connects over the RPC websocket, instead of
being retroactively recomputed on every polling tick.

The `username` and `workspace_name` labels are removed to reduce
cardinality; only `template_name` and `agent_name` are retained.
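
To make the cardinality argument concrete, here is a rough sketch of how the
series count of one histogram scales with its labels. The bucket count and the
number of distinct label values are hypothetical, picked only for illustration;
the helper `histogramSeries` is not part of this change. It relies on the fact
that a Prometheus histogram exposes, per label combination, one `_bucket`
series per configured bucket, one `+Inf` bucket, and the `_sum` and `_count`
series.

```go
package main

import "fmt"

// histogramSeries returns an upper bound on the number of time series exposed
// by a single histogram: every label combination gets one _bucket series per
// configured bucket, one +Inf bucket, plus _sum and _count.
func histogramSeries(buckets int, distinctValues ...int) int {
	combos := 1
	for _, n := range distinctValues {
		combos *= n
	}
	return combos * (buckets + 3)
}

func main() {
	const buckets = 10 // hypothetical bucket count

	// Hypothetical deployment: 5 templates, 2 agent names,
	// 200 usernames, 3 workspace names.
	before := histogramSeries(buckets, 5, 2, 200, 3)
	after := histogramSeries(buckets, 5, 2)

	fmt.Println("with username and workspace_name:", before)
	fmt.Println("template_name and agent_name only:", after)
}
```

The figure is an upper bound because not every label combination occurs in
practice, but it shows that per-user and per-workspace labels multiply the
series count, while template and agent names stay bounded.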

Adds unit tests covering both the happy path (first connection recorded)
and the negative-duration guard (clock skew logs a warning, no sample
emitted).
J. Scott Miller
2026-05-11 14:27:40 -05:00
committed by GitHub
parent e56381eb61
commit 3e46c7986f
7 changed files with 145 additions and 118 deletions
+2 -2
@@ -157,9 +157,9 @@ coderd_agents_connection_latencies_seconds{agent_name="",username="",workspace_n
 # HELP coderd_agents_connections Agent connections with statuses.
 # TYPE coderd_agents_connections gauge
 coderd_agents_connections{agent_name="",username="",workspace_name="",status="",lifecycle_state="",tailnet_node=""} 0
-# HELP coderd_agents_first_connection_seconds Duration from agent creation to first connection to the control plane in seconds.
+# HELP coderd_agents_first_connection_seconds Duration from agent creation to first connection in seconds.
 # TYPE coderd_agents_first_connection_seconds histogram
-coderd_agents_first_connection_seconds{template_name="",agent_name="",username="",workspace_name=""} 0
+coderd_agents_first_connection_seconds{template_name="",agent_name=""} 0
 # HELP coderd_agents_up The number of active agents per workspace.
 # TYPE coderd_agents_up gauge
 coderd_agents_up{username="",workspace_name="",template_name="",template_version=""} 0