Closes#15129
Adds a generated **Beta features** table on the feature stages doc,
using the same mainline/stable sparse-checkout approach as the
early-access experiments list.
- Walk `docs/manifest.json` for routes with `state: ["beta"]` and render
a table (title, description, mainline vs stable).
- Inject output between `<!-- BEGIN: available-beta-features -->` /
`END` in `docs/install/releases/feature-stages.md`.
- Rename `scripts/release/docs_update_experiments.sh` →
`docs_update_feature_stages.sh`, refresh the script header, and use
`build/docs/feature-stages` for clone output.
<img width="1624" height="1061" alt="image"
src="https://github.com/user-attachments/assets/5fa811dd-9b80-446b-ae65-ec6e6cfedd6a"
/>
- Suppress informational `log.Printf` messages from the metrics scanner
when stdout is not a TTY (i.e. piped via `atomic_write` in `make gen` or
CI)
- Genuine warnings (`warnf`) still print unconditionally so real
problems remain visible
- `log.Fatalf` for fatal errors is unchanged
> 🤖 Created by Coder Agents and reviewed by a human
GPG emits an "untrusted key" warning when signing with a key that hasn't
been assigned a trust level, which can cause verification steps to fail
or produce noisy output.
Example:
```sh
gpg: Signature made Tue Mar 24 20:56:59 2026 UTC
gpg: using RSA key 21C96B1CB950718874F64DBD6A5A671B5E40A3B9
gpg: Good signature from "Coder Release Signing Key <security@coder.com>" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: 21C9 6B1C B950 7188 74F6 4DBD 6A5A 671B 5E40 A3B9
```
After importing the release key, derive its fingerprint from the keyring
and mark it as ultimately trusted via `--import-ownertrust`.
The fingerprint is extracted dynamically rather than hard-coded, so this
works for any key supplied via `CODER_GPG_RELEASE_KEY_BASE64`.
## Summary
The `build` CI job on `main` is failing with:
```
ERROR: Computed invalid windows version format: 2.32.0-rc.0.1
```
This started when the `v2.32.0-rc.0` tag was created, making `git
describe` produce versions like `2.32.0-rc.0-devel+4f571f8ff`.
## Root cause
`scripts/build_go.sh` converts the version to a Windows-compatible
`X.Y.Z.{0,1}` format by stripping pre-release segments. It uses
`${var%-*}` (shortest suffix match), which only removes the last
`-segment`. For RC versions this leaves `-rc.0` intact:
```
2.32.0-rc.0-devel → strip %-* → 2.32.0-rc.0 → + .1 → 2.32.0-rc.0.1 ✗
```
## Fix
Switch to `${var%%-*}` (longest suffix match) so all pre-release
segments are stripped from the first hyphen onward:
```
2.32.0-rc.0-devel → strip %%-* → 2.32.0 → + .1 → 2.32.0.1 ✓
```
Verified all version patterns produce valid output:
| Input | Output |
|---|---|
| `2.32.0` | `2.32.0.0` |
| `2.32.0-devel` | `2.32.0.1` |
| `2.32.0-rc.0-devel` | `2.32.0.1` |
| `2.32.0-rc.0` | `2.32.0.1` |
Fixes
https://github.com/coder/coder/actions/runs/23511163474/job/68434008241
When developers switch branches, the database may have migrations
from the other branch that don't exist in the current binary.
This causes coder server to fail at startup, leaving developers
stuck.
The develop script now detects this before starting the server:
1. Connects to postgres (starts temp embedded instance for
built-in postgres, or uses CODER_PG_CONNECTION_URL).
2. Compares DB version against the source's latest migration.
3. If DB is ahead, searches git history for the missing down
SQL files and applies them in a transaction.
4. If git recovery fails (ambiguous versions across branches,
missing files), falls back to resetting the public schema.
Also adds --reset-db and --skip-db-recovery flags.
The slices package provides type-safe generic replacements for the
old typed sort convenience functions. The codebase already uses
slices.Sort in 43 call sites; this finishes the migration for the
remaining 29.
- sort.Strings(x) -> slices.Sort(x)
- sort.Float64s(x) -> slices.Sort(x)
- sort.StringsAreSorted(x) -> slices.IsSorted(x)
Audited exported helpers in `coderd/util/*`, `testutil`, `cryptorand`,
and friends, then replaced duplicated implementations with canonical
versions.
- **fix: `maps.SortedKeys` generic signature** — value type was
hardcoded to `any`, making it impossible to actually call. Added second
type parameter `V any`. Added table-driven tests with `cmp.Diff`.
- **refactor: replace ad-hoc ptr helpers with `ptr.Ref`** — removed
`int64Ptr`, `stringPtr`, `boolPtr`, `i64ptr`, `strPtr`, `PtrInt32`
across 6 files.
- **refactor: replace local `sortedKeys`/`sortKeys` with
`maps.SortedKeys`** — now that the signature is fixed, scripts can use
it.
- **refactor: replace hand-rolled `capitalize` with
`strings.Capitalize`** — the typegen version was also not UTF-8 safe.
> 🤖 This PR was created with the help of Coder Agents, and was reviewed
by my human. 🧑💻
Pre-commit classifies staged files and runs make pre-commit-light
when no Go, TypeScript, or Makefile changes are present. This
skips gen, lint/go, lint/ts, fmt/go, fmt/ts, and the binary
build. A markdown-only commit takes seconds instead of minutes.
Pre-push uses the same heuristic: if only light files changed
(docs, shell, terraform, etc.), tests are skipped entirely.
Falls back to the full make targets when Go/TS/Makefile changes
are detected, CODER_HOOK_RUN_ALL=1 is set, or the diff range
can't be determined.
Also adds test-storybook to make pre-push (vitest with the
storybook project in Playwright browser mode).
Always deploy from main for now. We want to keep testing commits to main
as soon as they're merged, so we're going to disable the release
freezing behavior. We will test cut releases on a separate deployment,
upgraded manually.
The flat ChatMessagePart interface had 20+ optional fields, preventing
TypeScript from narrowing types on switch(part.type). Each consumer
needed runtime validation, type assertions, or defensive ?. chains.
Add `variants` struct tags to ChatMessagePart fields declaring which
union variants include each field. A codegen mutation in apitypings
reads these tags via reflect and generates per-variant sub-interfaces
(ChatTextPart, ChatReasoningPart, etc.) plus a union type alias.
A test validates every field has a variants tag or is explicitly
excluded, and every part type is covered.
Remove dead frontend code: normalizeBlockType, alias case branches
("thinking", "toolcall", "toolresult"), legacy field fallbacks
(line_number, typedBlock.name/id/input/output), and result_delta
handling. Add test coverage for args_delta streaming, provider_executed
skip logic, and source part parsing.
Previously main.go used syscall.SysProcAttr{Setpgid: true} and
syscall.Kill, both undefined on Windows. This broke GOOS=windows
cross-compilation.
Add a //go:build !windows constraint to the package since it is
a dev-only tool that requires Unix utilities (bash, make, etc.)
and is not intended to run on Windows.
Refs #23054Fixescoder/internal#1407
- Add `--starter-template` option and properly create starter template
with name and icon
- Add Coder Desktop URLs to listening banner
- Makefile tweak to avoid rebuilding `scripts/develop` every time Go
code changes
Replace the ~370-line bash develop.sh with a Go program using
serpent for CLI flags, errgroup for process lifecycle, and
codersdk for setup. develop.sh becomes a thin make + exec wrapper.
- Process groups for clean shutdown of child trees
- Docker template auto-creation via SDK ExampleID
- Idempotent setup (users, orgs, templates)
- Configurable --port, --web-port, --proxy-port
- Preflight runs lib.sh dependency checks
- TCP dial for port-busy checks
- Make target (build/.bin/develop) for build caching
Restore PR title validation that was removed in 828f33a when
cdr-bot was expected to handle it. That bot has since been disabled.
The new title job in contrib.yaml validates:
- Conventional commit format (type(scope): description)
- Type from the same set used by release notes generation
- Scope validity derived from the changed files in the PR diff
- All changed files fall under the declared scope
Uses actions/github-script (no third-party marketplace actions).
Also fixes feat(api) examples across docs (no api folder exists)
and consolidates commit rules into CONTRIBUTING.md as the single
source of truth.
Add cost tracking for LLM chat interactions with microdollar precision.
## Changes
- Add `chatcost` package for per-message cost calculation using
`shopspring/decimal` for intermediate arithmetic
- **Ceil rounding policy**: fractional micros round UP to next whole
micro (applied once after summing all components)
- Database migration: `total_cost_micros` BIGINT column with historical
backfill and `created_at` index
- API endpoints: per-user cost summary and admin rollup under
`/api/experimental/chats/cost/`
- SDK types: `ChatCostSummary`, `ChatCostModelBreakdown`,
`ChatCostUserRollup`
- Fix `modeloptionsgen` to handle `decimal.Decimal` as opaque numeric
type
- Update frontend pricing test fixtures for string decimal types
## Design decisions
- `NULL` = unpriced (no matching model config), `0` = free
- Reasoning tokens included in output tokens (no double-counting)
- Integer microdollars (BIGINT) for storage and API responses
- Price config uses `decimal.Decimal` for exact parsing; totals use
`int64`
Frontend: #23037
When SSH disconnects, the output reader subshells fail writing to the
dead terminal (EIO) and exit due to set -e. This breaks the pipe to
the server, which receives SIGPIPE and dies before graceful shutdown
can stop embedded PostgreSQL.
Disable errexit in readers so they survive terminal death, add HUP
traps so the script handles SSH disconnect, and shield children from
direct SIGHUP via SIG_IGN before exec (Go resets this to caught once
signal.Notify registers). Also make fatal() exit immediately instead
of relying on async signal delivery, and remove the broken process
group kill (ppid was never a PGID).
The pre-push hook was removed in #22956. This restores it with a
reduced scope (tests + site build) and an allowlist so it only runs
for developers who opt in.
Two opt-in mechanisms:
- git config coder.pre-push true (local, not committed)
- CODER_WORKSPACE_OWNER_NAME allowlist in the hook script
git config takes priority and also supports explicit opt-out for
allowlisted users (git config coder.pre-push false).
Refs #22956
---------
Co-authored-by: Cian Johnston <cian@coder.com>
pre-commit was noisy: every sub-target dumped full stdout/stderr to the
terminal, burying failures in pages of compiler output and lint details.
Teach timed-shell.sh a quiet mode via MAKE_LOGDIR: when set, recipe
output is redirected to per-target log files and a one-line status is
printed instead. When unset, behavior is unchanged (with a refreshed
output format).
Makefile changes:
- pre-commit creates a tmpdir, passes MAKE_LOGDIR to sub-makes
- Drop --output-sync=target (log files eliminate interleaving)
- Add --no-print-directory to suppress Entering/Leaving noise
- Split check-unstaged and check-untracked into separate defines
- Restyle both with colored indicators and clearer instructions
- Clean up tmpdir on success, preserve on failure for debugging
## Summary
- add a `--port` flag to `scripts/develop.sh` so the API can run on a
non-default port
- make the script use the selected API port for access URL defaults,
readiness checks, login, proxy wiring, and printed URLs
- reject oversized `--port` values early and treat an existing dev
frontend on port 8080 as a conflict when the requested API is not
already running
- pin the frontend dev server to port 8080 so inherited `PORT`
environment variables do not move it to a different port
## Testing
- `bash -n scripts/develop.sh`
- `shellcheck -x scripts/develop.sh`
- `bash scripts/develop.sh --port abc`
- `bash scripts/develop.sh --port 8080`
- `bash scripts/develop.sh --port 999999999999999999999`
- started `./scripts/develop.sh --port 3001` and verified:
- `http://127.0.0.1:3001/healthz`
- `http://127.0.0.1:3001/api/v2/buildinfo`
- `http://127.0.0.1:8080/healthz`
- `http://127.0.0.1:8080/api/v2/buildinfo`
- simulated an existing dev frontend on `127.0.0.1:8080` and verified
`./scripts/develop.sh --port 3001` exits with a conflict error
## Summary
- remove the `pre-push` git hook script from the repository
- remove the `make pre-push` target and related Makefile documentation
- update contributor and agent docs so they only describe the remaining
`pre-commit` hook
## Validation
- `make pre-commit`
- `git diff --check`
---
_Generated with [`mux`](https://github.com/coder/mux) • Model:
`openai:gpt-5.4` • Thinking: `high`_
pre-commit and pre-push only reported total elapsed time at the end,
making it hard to identify which jobs are slow.
Add a `MAKE_TIMED=1` mode that replaces `SHELL` with a wrapper
(`scripts/lib/timed-shell.sh`) to print wall-clock time for each
recipe. pre-commit and pre-push enable this on their sub-makes.
Ad-hoc use: `make MAKE_TIMED=1 test`
Agents hit short shell timeouts on `git commit` (~13s) before
`make pre-commit` finishes (~20s warm), then disable hooks via
`git config core.hooksPath /dev/null`. This bypasses all local checks
and, because it writes to shared `.git/config`, silently disables hooks
for every other worktree too.
Add explicit timing guidance to AGENTS.md, and write worktree-scoped
`core.hooksPath` in post-checkout, pre-commit, and pre-push hooks to
make the bypass ineffective.
This change adds git hooks and Makefile targets that mirror CI required
checks locally, catching issues before they reach CI.
This is for use by AI agents (documented in AGENTS.md).
- **pre-commit** (every commit): gen, fmt, lint, typos, slim binary
build. Fast checks without Docker or Playwright.
- **pre-push** (before push): full CI suite including site build, tests,
sqlc-vet, offlinedocs.
To use:
```sh
git config core.hooksPath scripts/githooks
```
Works in worktrees (where `.git` is a file). Bypass with `--no-verify`.
## Problem
Bootstrap scripts under `provisionersdk/scripts/` are inlined into
templates via `sh -c '${init_script}'`. Any single quote (apostrophe) in
these `.sh` files silently breaks the shell quoting, causing the agent
to never start — with near-invisible error output.
## Changes
- **`scripts/check_bootstrap_quotes.sh`** — new lint script that scans
all `.sh` files under `provisionersdk/scripts/` for single quotes and
fails with a clear error if any are found. Only checks shell scripts
(not `.ps1`, which legitimately uses single quotes).
- **`Makefile`** — added `lint/bootstrap` target wired into the `lint`
dependency list.
Fixes#22062
This PR does three things:
- Exports derp expvars to the pprof endpoint
- Exports the expvar metrics as prometheus metrics in both coderd and
wsproxy
- Updates our tailscale to a fix I also had to make to avoid a data race
condition
I generated this with mux but I also manually tested that the metrics
were getting properly emitted
Follow-up to #22612. Running `git status --short` in a loop during `make
-B -j gen` still showed intermediate states for several files. This PR
fixes the remaining ones.
The main issues:
- `generate.sh` ran `gofmt` and `goimports` in-place after moving files
into the source tree. Now it formats in a workdir first and only `mv`s
the final result.
- `protoc` targets wrote directly to the source tree. Wrapped with
`scripts/atomic_protoc.sh` which redirects output to a tmpdir.
- Several generators used hardcoded `/tmp/` paths. On systems where
`/tmp` is tmpfs, `mv` degrades to copy+delete. Switched to a
project-local `_gen/` directory (gitignored, same filesystem).
- `apidoc/.gen` and `cli/index.md` used `cp` for final output. Replaced
with `mv`.
- `manifest.json` was written twice (unformatted, then formatted). Now
`.gen` writes to a staging file and the manifest target does one
formatted atomic write.
- `biome_format.sh` silently skipped files in gitignored dirs. Added
`--vcs-enabled=false`.
Two helpers reduce the Makefile boilerplate: `scripts/atomic_protoc.sh`
(wraps protoc) and an `atomic_write` Make define
(stdout-to-temp-to-target pattern). `.PRECIOUS` now also covers `.pb.go`
and mock files.
Verification: `make -B -j gen` x3 with `git status` polling, no changes.
Refs #22612
`make gen` could not run with `-j` because inter-target dependency edges
were missing. Multiple recipes compile `coderd/rbac` (which includes
generated files like `object_gen.go`), and without explicit ordering,
parallel runs produced syntax errors from mid-write reads.
Three main changes:
**Dependency graph fixes** declare the compile-time chain through
`coderd/rbac` so that `object_gen.go` is written before anything that
imports it is compiled. The DB generation targets use a GNU Make 4.3+
grouped target (`&:`) so Make knows `generate.sh` co-produces
`querier.go`, `unique_constraint.go`, `dbmetrics`, and `dbauthz` in a
single invocation. `SKIP_DUMP_SQL=1` avoids re-entrant `make` inside
`generate.sh` when the Makefile already guarantees `dump.sql` is fresh.
**`scripts/atomicwrite` package** replaces `os.WriteFile` in all gen
scripts with a temp-file-in-same-dir + rename pattern, preventing
interrupted runs from leaving partial files.
**`.PRECIOUS` and shell atomic writes** protect git-tracked generated
files from Make's default delete-on-error behavior. Since these files
are committed, deletion is worse than staleness -- `git restore` is the
recovery path.
CI now runs `make -j --output-sync -B gen` (~32s, down from ~85s
serial).
| Scenario | Before | After |
|-----------------------------------|--------------------|----------|
| `make gen` (serial) | 95s | 95s |
| `make -j gen` (parallel) | race error | **22s** |
| CI `make -j --output-sync -B gen` | forced serial ~85s | **~32s** |
## Summary
The macOS `.dylib` is only used by Coder Desktop macOS v0.7.2 or older.
v0.7.2 was released in August 2025. v0.8.0 of Coder Desktop macOS, also
released in August 2025, uses a signed Coder slim binary from the
deployment instead.
It's unlikely customers will be using Coder Desktop macOS v0.7.2 and the
next release of Coder simultaneously, so I think we can safely remove
this process, given it slows down CI & release processes.
## Changes
- **Makefile**: Remove `DYLIB_ARCHES`, `CODER_DYLIBS` variables and
`build/coder-dylib` target
- **scripts/build_go.sh**: Remove `--dylib` flag and all dylib-specific
logic (c-shared buildmode, CGO, plist embedding, vpn/dylib entrypoint)
- **scripts/sign_darwin.sh**: Remove dylib-specific comment
- **CI (ci.yaml)**: Remove `build-dylib` job, artifact download/insert
steps, and `build-dylib` dependency from `build` job
- **Release (release.yaml)**: Remove `build-dylib` job, artifact
download/insert steps, and `build-dylib` dependency from `release` job
- **vpn/dylib/**: Delete entire directory (`lib.go` + `info.plist.tmpl`)
- **vpn/router.go, vpn/dns.go**: Clean up comments referencing dylib
The slim and fat binary builds are completely unaffected — the dylib was
an independent build target with its own CI job.
_Generated by mux but reviewed by a human_
Add Prometheus metrics to the boundary log proxy for observability:
- batches_dropped_total (reason: buffer_full, forward_failed)
- logs_dropped_total (reason: buffer_full, forward_failed,
boundary_channel_full, boundary_batch_full)
- batches_forwarded_total
Also add BoundaryStatus to the BoundaryMessage envelope so boundary
can report dropped log counts as a separate wire message. The agent
records these as Prometheus metrics, making boundary-side data loss
visible. Backwards compatibility for older versions of boundary is maintained.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Description
- Updates `wsbuilder` to return a `BuildError` with
`http.StatusBadRequest` to signify a "validation error" on missing or
invalid parameters
- Adds a short-circuit in `prebuilds.StoreReconciler` to mark presets
for which creating a build returns a "validation error" as "validation
failed" and skip further attempts to reconcile.
- Adds a test to verify the above
- Introduces a new Prometheus metric
`coderd_prebuilt_workspaces_preset_validation_failed` to track the above
Closes: https://github.com/coder/coder/issues/21237
---------
Co-authored-by: Cian Johnston <cian@coder.com>
## Description
When multiple organizations have templates with the same name, the
Prometheus `/metrics` endpoint returns HTTP 500 because Prometheus
rejects duplicate label combinations. The three `coderd_insights_*`
metrics (`coderd_insights_templates_active_users`,
`coderd_insights_applications_usage_seconds`,
`coderd_insights_parameters`) used only `template_name` as a
distinguishing label, so two templates named e.g. `"openstack-v1"` in
different orgs would produce duplicate metric series.
This adds `organization_name` as a label to all three insight metric
descriptors to disambiguate templates across organizations.
## Changes
**`coderd/prometheusmetrics/insights/metricscollector.go`**:
- Added `organization_name` label to all three metric descriptors
- Added `organizationNames` field (template ID → org name) to the
`insightsData` struct
- In `doTick`: after fetching templates, collect unique org IDs, fetch
organizations via `GetOrganizations`, and build a
template-ID-to-org-name mapping
- In `Collect()`: pass the organization name as an additional label
value in every `MustNewConstMetric` call
**`coderd/prometheusmetrics/insights/testdata/insights-metrics.json`**:
Updated golden file to include `organization_name=coder` in all metric
label keys.
Fixes#21748
Description:
This PR updates the bundled Terraform binary and related version pins
from 1.14.1 to 1.14.5 (base image, installer fallback, and CI/test
fixtures). Terraform is statically built with an embedded Go runtime.
Moving to 1.14.5 updates the embedded toolchain and is intended to
address Go stdlib CVEs reported by security scanning.
Notes:
- Change is version-only; no functional Coder logic changes.
- Backport-friendly: intended to be cherry-picked to release branches
after merge.
## Description
This PR wires up the metrics scanner in the Makefile to automatically regenerate metrics documentation when source files change.
## Changes
* Add Makefile target `scripts/metricsdocgen/generated_metrics` to run the AST scanner to generate the metrics file
* Update `docs/admin/integrations/prometheus.md` Makefile target to depend on `scripts/metricsdocgen/generated_metrics`
* Add `scripts/metricsdocgen/README.md` documenting the metrics generation process
Closes: https://github.com/coder/coder/issues/13223
## Description
This PR refactors `scripts/metricsdocgen/main.go` to support merging static and generated metrics files for documentation generation.
The static `metrics` file remains necessary for metrics not defined in the coder codebase (`go_*`, `process_*`, `promhttp_*`, `coder_aibridged_*`), as well as **edge cases** the scanner cannot handle (e.g., such as metrics with runtime-determined labels or function-local variable references for fields, ...). Handling these edge cases in the scanner would make it significantly more complex, so we keep this hybrid approach to accommodate them. This means that in such cases, developers need to update the `metrics` file directly, meaning there is still a risk of out-of-date information in the documentation. However, this solution should already encompass most cases.
Static metrics take priority over generated metrics when both files contain the same metric name, allowing manual overrides without modifying the scanner. Some of these edge cases could be easily fixed by updating the codebase to use one of the supported patterns.
## Changes
* Update `scripts/metricsdocgen/main.go` to read from two separate metrics files:
* `metrics`: static, manually maintained metrics (e.g., `go_*`, `process_*`, `promhttp_*`, `coder_aibridged_*`)
* `generated_metrics`: auto-generated by the AST scanner
* Update `metrics` file to contain only static and edge-case metrics
* Skip metrics with empty HELP descriptions in the scanner
* Update `generated_metrics` to reflect skipped metrics
* Update `docs/admin/integrations/prometheus.md` with merged metrics
Related to: https://github.com/coder/coder/issues/13223
**Disclosure:** This PR was mainly developed with Claude Sonnet 4, with iterative review and refinement by @ssncferreira
## Description
This PR implements extraction of metrics defined using `promauto.With()` factory patterns.
## Changes
* Add `extractPromautoMetric()` to handle:
* `promauto.With(reg).NewCounterVec(prometheus.CounterOpts{...}, labels)`
* `factory.NewGaugeVec(prometheus.GaugeOpts{...}, labels)`
* Script generates an updated `scripts/metricsdocgen/generated_metrics` file
Related to: https://github.com/coder/coder/issues/13223
**Disclosure:** This PR was mainly developed with Claude Sonnet 4, with iterative review and refinement by @ssncferreira
## Description
This PR implements extraction of metrics defined using `prometheus.New*()` and `prometheus.New*Vec()` patterns with `*Opts{}` structs.
## Changes
* Add `extractOptsMetric()` to handle:
* `prometheus.NewGauge(prometheus.GaugeOpts{...})`
* `prometheus.NewCounter(prometheus.CounterOpts{...})`
* `prometheus.NewHistogram(prometheus.HistogramOpts{...})`
* `prometheus.NewSummary(prometheus.SummaryOpts{...})`
* `prometheus.New*Vec(prometheus.*Opts{...}, labels)`
* Script generates an updated `scripts/metricsdocgen/generated_metrics` file
Related to: https://github.com/coder/coder/issues/13223
**Disclosure:** This PR was mainly developed with Claude Sonnet 4, with iterative review and refinement by @ssncferreira
## Description
This PR implements extraction of metrics defined using the `prometheus.NewDesc()` pattern.
## Changes
* Add `extractNewDescMetric()` to extract metrics from `prometheus.NewDesc()` calls
* Script generates an updated `scripts/metricsdocgen/generated_metrics` file
Related to: https://github.com/coder/coder/issues/13223
**Disclosure:** This PR was mainly developed with Claude Sonnet 4, with iterative review and refinement by @ssncferreira
## Description
This PR adds an AST-based scanner to automatically generate Prometheus metrics documentation from the coder source code.
## Changes
* Add `scripts/metricsdocgen/scanner/scanner.go` with:
* Directory walking for `agent/`, `coderd/`, `enterprise/`, `provisionerd/`
* Go file parsing (skipping `*_test.go` files)
* AST inspection for metric extraction
* `Metric.String()` for Prometheus text exposition format rendering
* `writeMetrics()` to output metrics to stdout
* Placeholder `extractMetricFromCall()` (implemented in subsequent PRs)
* Empty `scripts/metricsdocgen/generated_metrics` placeholder (populated by subsequent PRs)
**Note:** To facilitate the review process, this was separated into scoped stacked PRs. The division was based on the main structure, the different Prometheus patterns currently present in the codebase, and updates to the build process.
Related to: https://github.com/coder/coder/issues/13223
**Disclosure:** This PR was mainly developed with Claude Sonnet 4, with iterative review and refinement by @ssncferreira
This PR adds some metrics to help identify job enqueue rates and
latencies. This work was initiated as a way to help reduce the cost of
the observation/measurement itself for autostart scaletests, which
impacts our ability to identify/reason about the load caused by
autostart. See: https://github.com/coder/internal/issues/1209
I've extended the metrics here to account for regular user initiated
builds, prebuilds, autostarts, etc. IMO there is still the question here
of whether we want to include or need the `transition` label, which is
only present on workspace builds. Including it does lead to an increase
in cardinality, and in the case of the histogram (when not using native
histograms) that's at least a few extra series for every bucket. We
could remove the transition label there but keep it on the counter.
Additionally, the histogram is currently observing latencies for other
jobs, such as template builds/version imports, those do not have a
transition type associated with them.
Tested briefly in a workspace, can see metric values like the following:
-
`coderd_workspace_builds_enqueued_total{build_reason="autostart",provisioner_type="terraform",status="success",transition="start"}
1`
-
`coderd_provisioner_job_queue_wait_seconds_bucket{build_reason="autostart",job_type="workspace_build",provisioner_type="terraform",transition="start",le="0.025"}
1`
---------
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Adds a standalone command that acts as a mock telemetry server,
receiving snapshots and printing them as a JSON stream to stdout. Useful
for local development testing with scripts/develop.sh by setting
CODER_TELEMETRY_ENABLE and CODER_TELEMETRY_URL environment variabless.