Commit Graph

631 Commits

Author SHA1 Message Date
Danny Kopping 12520ee964 feat: add ai provider status and reload freshness metrics (#25770)
Add metrics for `aibridged` and `aibridgeproxyd`'s provider statuses. AI providers can be modified, and possibly misconfigured, at runtime. These metrics help operators understand the state of these provider definitions in case unexpected behaviour is observed.
2026-05-28 14:57:33 +02:00
Mathias Fredriksson 3770176b7f fix(scripts): use merge-base in emdash lint to avoid false positives (#25726)
When GITHUB_BASE_REF is set, the emdash lint compared against the tip
of main instead of the merge-base. For PRs behind main, this produced
a diff covering all divergent files, flagging pre-existing emdashes the
PR never touched.

Query the PR commit count via gh, deepen HEAD by that amount, and
resolve HEAD~N as the merge-base. Falls back to the branch tip when
the merge-base cannot be determined.
2026-05-28 13:45:01 +03:00
blinkagent[bot] 1bfc1ce2c4 chore: update terraform to v1.15.5 (#25746)
Bumps bundled Terraform from `1.15.2` to `1.15.5` across all pinned
locations:

- `.github/actions/setup-tf/action.yaml`
- `scripts/Dockerfile.base`
- `install.sh`
- `flake.nix` (+ updated SRI hash for the linux_amd64 zip)
- `mise.toml`
- `mise.lock` (+ updated per-platform SHA256 checksums)
- `provisioner/terraform/testdata/version.txt`
-
`provisioner/terraform/testdata/resources/ai-tasks-disabled/ai-tasks-disabled.tfplan.json`

## Why

Terraform 1.15.5 is built with Go 1.25.10, while the 1.15.2 we currently
ship was built with Go 1.25.8. The newer Go runtime addresses recent
stdlib CVEs flagged by security scanners.

Releases included: 1.15.3 (provider install crash fix, nested-module
stack migration fix), 1.15.4 (Linux s390x builds, symlinked provider dir
fix), 1.15.5.

Release notes:
https://github.com/hashicorp/terraform/releases/tag/v1.15.5

## Cherry-pick

#25747 mirrors this PR against `release/2.34`.

Created on behalf of @Shelnutt2

Co-authored-by: blink-so[bot] <211532188+blink-so[bot]@users.noreply.github.com>
2026-05-27 16:46:25 -04:00
Thomas Kosiewski 51836e681e refactor: build dogfood image as base + mise oci layers (#25448)
Splits the dogfood image into two artifacts:

- `ghcr.io/coder/oss-dogfood-base:<distro>-<base-sha>`: Ubuntu base with
apt packages, chrome, rustup, brew, gh, and the mise binary. The
base-sha is a cache key over `Dockerfile.base` and `files/`, so commits
that don't touch those inputs reuse the previous build.
- `codercom/oss-dogfood:<final-sha>-<distro>` and rolling tags
(`:22.04`, `:26.04`, `:latest`, `:<branch>`): produced by `mise oci
build` on top of the base, with one content-addressed OCI layer per mise
tool. The rolling tag scheme is unchanged, so the workspace template
doesn't need updating.

Single-tool version bumps now invalidate only that tool's OCI layer, so
workspaces re-pull just what changed instead of the entire 5-6 GB image
on every recreate.

Also:

- Drops the build-time `pnpm dlx playwright@1.47.0 install --with-deps
chromium` step (~400 MB) and the equivalent `playwright-driver.browsers`
install from `flake.nix`. `@playwright/mcp` (used by the claude-code and
codex MCP servers in `dogfood/coder/main.tf`) does NOT auto-install
browsers, so the existing `install-deps` `coder_script` now runs two
installs on workspace start: `pnpm exec playwright install chromium` for
the site's pinned `@playwright/test`, and `npx
--package=@playwright/mcp@latest playwright-core install --no-shell
chromium` so the MCP servers find their matching browser revision.
Browser revisions coexist under
`~/.cache/ms-playwright/chromium-<rev>/`, which lives on the home volume
so both downloads happen once per workspace recreate and persist across
restarts. Net effect: same MCP behavior as before, +~1-2 min on first
workspace start. Nix devshell users running site e2e tests locally now
need `pnpm exec playwright install` once (instead of getting browsers
via nixpkgs).
- Bumps the pinned mise binary to v2026.5.12 (matching main after
#25521) and adds top-level `min_version = "2026.5.12"` to `mise.toml` so
every consumer (devs, CI, the embedded mise inside the dogfood image,
mise oci builds) fails fast on an older mise.
- Adds bison, flex, libicu-dev, libreadline-dev, uuid-dev, and
zlib1g-dev to both Ubuntu base images for source-build use cases (e.g.,
building Postgres from source).
- Replaces skopeo with crane as the registry client `mise oci push`
shells out to: crane is added to `mise.toml`, the workflow drops its
`apt-get install skopeo` and forces `--tool crane`, and the local
wrapper image stops bundling skopeo. One source of truth for tool
versions, no apt drift, smaller wrapper image, and workspace users get a
registry client on PATH for free via mise oci's tool layers.
- Removes `nix.hash`/`mise.hash` and their Makefile rules. The registry
digest already captures every effective change since CI rebuilds when
any baked-in input moves; the per-file `filesha1()` entries in
`pull_triggers` are redundant.

Supersedes #25400 (the `mise.hash` pull trigger landed there in
`2b612abe7b`; this PR removes it as part of the broader simplification).

> [!NOTE]
> `mise oci build` is experimental and requires `MISE_EXPERIMENTAL=1`
(set at job level in the workflow). The local-only
`scripts/dogfood/mise-oci-wrapper.sh` builds a tiny
`coderdev/mise-oci-wrapper:<version>` Debian image with curl-installed
mise on first invocation (cached by version tag thereafter); we don't
reuse `jdxcode/mise:latest` because that tag lags upstream GitHub
releases by days and would defeat the `min_version` enforcement above.

> [!NOTE]
> `compute-base-sha.sh` and `compute-final-sha.sh` are cache keys, not
strict content addresses: the base Dockerfile still pulls dynamic
resources at build time (gh/buildx `releases/latest`, chrome
`stable_current_amd64.deb`, apt mirror state). Two runs with identical
checked-in files can produce slightly different bytes, which is
acceptable here because the cache-hit savings on irrelevant commits
outweigh that drift.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Signed-off-by: Thomas Kosiewski <tk@coder.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 14:52:21 +02:00
Cian Johnston 0a45f96d30 ci: validate dogfood image tooling by running gen, fmt, lint, build (#25475)
Adds a `test_image` job that runs `make gen`, `make fmt`, `make lint`, and `make build` inside the
newly built image via `docker run`. This helps detect breaking changes before merge. 

> [!NOTE]
> Generated with [Coder Agents](https://coder.com/agents)
2026-05-25 17:02:13 +01:00
Cian Johnston a4afb9dfc6 feat: add --env-file flag to develop.sh (#25621)
Adds `--env-file` to `scripts/develop.sh` to allow reading environment 
from a given file. This makes it easier to configure things like external 
auth providers, access URLs, and other dev-time settings without 
exporting a wall of environment variables in every shell session.

> Generated with [Coder Agents](https://coder.com/agents)
2026-05-25 11:54:57 +01:00
Michael Suchacz ca1f6b19a2 feat: remove legacy chat provider tables (#25416) 2026-05-22 09:50:01 +02:00
Spike Curtis 8dc4d76890 chore: add agent-connection-watch for workspaces (#24507)
<!--

If you have used AI to produce some or all of this PR, please ensure you have read our [AI Contribution guidelines](https://coder.com/docs/about/contributing/AI_CONTRIBUTING) before submitting.

-->

relates to GRU-18  
  
Adds basic implementation for Workspace Agent Connection Watch and tests.  
  
Missing are handling of logs.
2026-05-20 13:09:11 -04:00
Steven Masley 19a1fa5c13 chore: disable access url to get a try.coder.app in dev (#25510) 2026-05-20 08:49:23 -05:00
Thomas Kosiewski 5f9b3220b5 chore: install dogfood image tooling via mise.toml (#25282)
This PR replaces the hand-rolled `curl | tar | go install | cargo
install` chains in the dogfood Ubuntu 22.04 and 26.04 Dockerfiles with a
single `mise install` driven by a new repo-root `mise.toml`.

The previous Dockerfiles installed ~25 CLIs across three multi-stage
builds with versions hardcoded inline. Version bumps were scattered
across the Dockerfiles, the root `mise.toml` (added in #24618 but
otherwise unused at runtime), and CI's setup actions; build-time network
failures came from a dozen distinct endpoints; and `mise` itself sat in
the image with no manifest to install from.

The new flow:

- The repo's `mise.toml` is the single source of truth for image tool
versions. The Dockerfiles `COPY` it to `/etc/mise/config.toml` and run a
single `mise install` as the `coder` user.
- Tools are installed into `/opt/mise/data` rather than the default
`/home/coder/.local/share/mise`, so they live in the image (not on the
persistent home volume) and reach every workspace on recreate.
- Build context moves to the repo root so the Dockerfile can `COPY
mise.toml`; an allowlist `.dockerignore` keeps the transferred context
to ~24 kB.
- Optional `--secret id=github_token` plumbing through the Makefile and
`.github/workflows/dogfood.yaml` lifts aqua's GitHub API quota from
60/hr unauthenticated to 1000/hr with `secrets.GITHUB_TOKEN`.
- `MISE_TRUSTED_CONFIG_PATHS=/home/coder:/etc/mise` is set as an ENV so
users who clone the coder repo into their workspace home aren't prompted
to `mise trust`.

Net diff for the two Ubuntu Dockerfiles: -399 / +244 lines (~200 lines
shorter each). The `FROM rust-utils`, `FROM go`, and `FROM proto`
multi-stage builds are gone; so are the NVM/Node block, the bulk
binary-install block (golangci-lint, helm, kubectx, syft, cosign, bun),
the gh `.deb`/lazygit/doctl tarball installs, the gofmt
`update-alternatives` line, and the `yq`→`yq4` rename
(`scripts/lib.sh:267-275` already auto-detects either name).

Both images were built and smoke-tested with Apple's `container` CLI on
macOS — every migrated tool resolves to the expected pinned version
including outside the cloned coder repo (e.g. `gh` from `/home/coder`,
matching the workspace startup script in `dogfood/coder/main.tf`),
`sqlc` runs (proving `CGO_ENABLED=1` was honoured at install), `yq
--version` reports v4 for `scripts/lib.sh`'s detection, and `gofmt`
resolves via the mise shim.

Follow-ups (out of scope here):

- Commit a multi-platform `mise.lock` so `gh = "latest"` and the other
floating versions resolve deterministically across rebuilds and dev
machines.
- Migrate CI's `setup-go` / `setup-node` actions to consume `mise.toml`
so image and CI versions stop being able to drift.

---------

Signed-off-by: Thomas Kosiewski <tk@coder.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 11:36:22 +02:00
Garrett Delfosse d97f5ae2a6 fix: add ESR support to release calendar script (#25205)
The `update-release-calendar.sh` script did not account for Extended
Support Release (ESR) versions. Running it would drop ESR entries (e.g.
2.24) from the calendar entirely or mark them as "Not Supported" instead
of "Extended Support Release".

## Changes

- Add `ESR_VERSIONS` array for tracking active ESR minor versions
- Add `is_esr_version()` helper to check ESR membership
- Extract `generate_release_row()` to reduce duplication
- Prepend ESR versions older than the standard window
- Override "Not Supported" status for ESR versions within the window

> [!NOTE]
> When new ESR versions are designated or old ones reach end of life,
update the `ESR_VERSIONS` array at the top of the script.

<!-- This PR was authored by Coder Agents -->
2026-05-14 15:35:30 -04:00
Seth Shelnutt 8eb7051987 fix(scripts/ironbank): update base image to UBI9 and remove urllib3 (CVE-2026-44431) (#25217)
The IronBank Dockerfile used UBI8-minimal:8.7 as its base image.
IronBank has migrated images to UBI9 base, and the bundled urllib3
1.26.5 in the image triggers CVE-2026-44431 (sensitive headers leaked on
cross-origin redirects via the low-level API).

This updates the base image from UBI8-minimal to UBI9-minimal and
explicitly removes python3-urllib3 after package installation. Coder is
a Go binary and does not invoke Python at runtime, so urllib3 is unused.

Refs
[ENT-4](https://linear.app/codercom/issue/ENT-4/ironbank-v23111-update-urllib3-from-1265-to-fix-cve-2026-44431),
[ENT-51](https://linear.app/codercom/issue/ENT-51/ironbank-main-update-base-image-urllib3-cve-2026-44431),
[CVE-2026-44431](https://nvd.nist.gov/vuln/detail/CVE-2026-44431)

> Generated by Coder Agents

<details><summary>Decision log</summary>

- **Base image**: Moved from `ubi8-minimal:8.7` to `ubi9-minimal:9.6` to
align with IronBank's UBI9 migration and reduce overall vulnerability
surface.
- **urllib3 removal**: Added explicit `microdnf remove python3-urllib3`
with error suppression (`|| true`) so the build succeeds whether or not
the package is present in the base image. This handles both the minimal
and full UBI9 base image variants that IronBank may use.
- **Crypto policies**: RHEL 9 uses the same
`/etc/crypto-policies/back-ends/*.config` paths as RHEL 8; no changes
needed.
- **Build script**: Updated the `registry.access.redhat.com` override
from `ubi8/ubi-minimal:8.7` to `ubi9/ubi-minimal:9.6` for local builds.

</details>
2026-05-13 10:41:56 -04:00
Garrett Delfosse 566dace1bc fix(scripts/releaser): use last stable release as changelog base for .0 releases (#24988)
When releasing a `.0` version (e.g. `v2.33.0`) from a release branch,
the release notes diff was comparing against the most recent RC (e.g.
`v2.33.0-rc.3`) instead of the last stable release from the previous
minor series (e.g. `v2.32.X`).

## The bug

`prevVersion` is set to the latest tag matching the branch's
`major.minor` from merged tags. For a `.0` release, this is the latest
RC (e.g. `v2.33.0-rc.3`). The commit range for release notes then
becomes `v2.33.0-rc.3..HEAD` instead of `v2.32.X..HEAD`, so the notes
only show the delta from the last RC rather than all changes since the
previous real release. The compare link also points to
`v2.33.0-rc.3...v2.33.0`.

## The fix

After all semver sanity checks have run (so version suggestion and
validation are unaffected), when the new version is a `.0` release and
`prevVersion` is an RC, override `prevVersion` with the last stable
release from the previous minor series. This makes both the commit range
and compare link use the correct base (e.g. `v2.32.X..HEAD` and
`v2.32.X...v2.33.0`).

> Generated with [Coder Agents](https://coder.com/agents)

---------

Co-authored-by: Zach <3724288+zedkipp@users.noreply.github.com>
2026-05-11 22:18:09 +00:00
Michael Suchacz bb8c40e764 feat: stream go test failure summary and drop raw json artifact (#25146)
This follows up on
https://github.com/coder/coder/actions/runs/25684936801/job/75406131184?pr=25139
by replacing the large raw Go test JSON artifact with inline structured
summaries and a compact failures-only artifact.

## What changed

- Added `scripts/gotestsummary`, a streaming Go tool that reads
gotestsum JSON and renders failed tests as Markdown.
- Updated the three Go test jobs to publish per-test `<details>`
sections in the job summary.
- Removed upload of the raw `go-test.json` artifact.
- Added upload of `go-test-failures-*.ndjson` with compact failure
records for deeper inspection.
- Deleted the old bash and `jq` summary script.

## Why

- The previous raw artifact was about 35 MB compressed and 445 MB raw in
the linked run.
- Passing-test output made the artifact noisy and slow to inspect.
- The old summary truncated output to 600 characters.
- The new path keeps streaming, bounded output and writes structured
diagnostics for only final failed tests.

## Validation

- `gofmt -w scripts/gotestsummary`
- `gofmt -l scripts/gotestsummary`
- `go test ./scripts/gotestsummary/...`
- `go vet ./scripts/gotestsummary/...`
- `grep -rn 'go-test-failure-summary.sh' . || true`
- `grep -rn 'go-test-failure-summary.sh\|go-test.json\|go-test-json-'
.claude .agents docs AGENTS.md || true`
- `make lint/agents`
- `make lint/emdash`
- `make lint/markdown`
- `make lint/shellcheck`
- `git diff --check origin/main..HEAD`

> This PR was prepared by Mux working on Mike's behalf.
2026-05-12 00:08:37 +02:00
J. Scott Miller 3e46c7986f feat: event driven agent connection metric (#24355)
Moves the `coderd_agents_first_connection_seconds` histogram from the
polling-based `prometheusmetrics.Agents()` loop to the event-driven
`agentConnectionMonitor.init()` path. The metric is now recorded exactly
once when an agent first connects over the RPC websocket, instead of
being retroactively computed each polling tick.

The `username` and `workspace_name` labels are removed to reduce
cardinality; only `template_name` and `agent_name` are retained.

Adds unit tests covering both the happy path (first connection recorded)
and the negative-duration guard (clock skew logs a warning, no sample
emitted).
2026-05-11 14:27:40 -05:00
Cian Johnston e8508b2d90 fix: recover chatd from poisoned chain anchor on retry (#25097)
When OpenAI's Responses API returns `Previous response with id ... not
found` for a chained turn, classify it as a `ChainBroken` retry, clear
`previous_response_id`, exit chain mode, reload full history, and let
`chatretry` retry. Self-heals chats whose anchor was poisoned before
#25074 stopped truncated streams from being persisted as a successful
turn with a stored response id.

The new state is exposed via the existing
`coderd_chatd_stream_retries_total` counter as a
`chain_broken="true"|"false"` label. Aggregating queries (`sum`, `rate`
over `provider`/`model`/`kind`) keep working without changes; raw-series
matchers without aggregation will now see two series per `(provider,
model, kind)` where they previously saw one. The metric is internal-only
so the blast radius should be small, but if you have dashboards that
index by exact label matchers without aggregation they will need an
extra `sum` or an explicit `chain_broken` selector.

> 🤖 This PR was created with the help of Coder Agents, and was reviewed by a human 🧑‍💻
2026-05-11 17:43:40 +01:00
Michael Suchacz 85792d08bc feat: add harness engineering layer for agent workflows (#24791)
This PR adds an opinionated harness-engineering layer for agent-driven
workflows: a small set of agent-readable docs, mechanical structure
checks, structured CI failure summaries, an architecture-lint umbrella,
and per-worktree dev-server isolation. The goal is to make local dev,
tests, and CI mechanically inspectable by agents without changing app
runtime behavior.

## What landed

**Agent docs and navigation**
- `.claude/docs/OBSERVABILITY.md`, `.claude/docs/DEV_ISOLATION.md`,
`.claude/docs/AGENT_FAILURES.md`: task-oriented guides for logs,
tracing, Prometheus, dev-server isolation, and a seeded failure catalog.
- `AGENTS.md`: added an `Agent navigation` block, then trimmed the file
from 375 to 229 lines by migrating duplicated detail into
`WORKFLOWS.md`, `GO.md`, `TESTING.md`, and `DATABASE.md`. The
user-managed custom-instructions block is preserved.
- `.agents/docs`: symlink mirror of `.claude/docs` for agent runtimes
that look under `.agents`.

**Mechanical checks**
- `scripts/check_agents_structure.sh`: validates `@...` references in
tracked `AGENTS.md` files and warns when root grows past 600 lines.
Wired as `make lint/agents` and into `make lint`.
- `scripts/audit-agent-readiness.sh`: report-first audit of harness
readiness. Currently `10 ok, 0 warn, 0 fail`.
- `scripts/check_architecture.sh` / `make lint/architecture`: umbrella
architecture-lint target. Consolidates the existing
`check_enterprise_imports.sh` and `check_codersdk_imports.sh` so they
run exactly once via the umbrella. Slot is open for new high-confidence
rules.

**Structured CI failure summaries**
- `scripts/playwright-failure-summary.sh`: parses
`site/test-results/results.json` and writes Markdown to
`$GITHUB_STEP_SUMMARY` on failure. Wired into the `test-e2e` matrix job.
- `scripts/go-test-failure-summary.sh`: parses `go test -json`
line-delimited output the same way. Wired into `test-go-pg`,
`test-go-pg-17`, and `test-go-race-pg` by injecting `gotestsum
--jsonfile` in the workflow without touching `Makefile`. JSON also
uploaded as a CI artifact on failure.
- `site/e2e/playwright.config.ts`: enables `screenshot:
only-on-failure`, `trace: retain-on-failure`, JSON reporter, and HTML
reporter alongside existing reporters.
- `.github/workflows/ci.yaml`: failure artifact uploads for Playwright
now use `if: failure()` and predictable names
(`playwright-artifacts-<variant>-<sha>`).

**Per-worktree dev-server isolation** (`scripts/develop/main.go`)
- Deterministic FNV-64a hash of the worktree path produces a port offset
in `[0, 1000)` (50 buckets, step 20 to avoid API/proxy overlap across
adjacent buckets).
- Offset is applied only to defaults; both env vars (`CODER_DEV_PORT`,
`CODER_DEV_WEB_PORT`, `CODER_DEV_PROXY_PORT`,
`CODER_DEV_PROMETHEUS_PORT`) and CLI flags retain priority.
- Hardcoded ports `9090` (embedded Prometheus UI) and `12345` (Delve)
are unchanged by design.
- Startup banner shows each port's source: `default`, `offset`, or
`explicit`.
- Unit tests in `scripts/develop/main_test.go` cover determinism,
bounds, no-overlap across the four ports, and explicit-skip behavior.
- State (`.coderv2/`) was already worktree-isolated via `os.Getwd()`, so
no state-dir changes were needed.

## Validation

`make lint/agents`, `make lint/architecture`, `make lint/emdash`, `bash
scripts/audit-agent-readiness.sh` (10 ok, 0 warn, 0 fail), `shellcheck`
on all 5 new scripts, `go test ./scripts/develop/...`, and `js-yaml`
parse of `ci.yaml` all pass. Synthetic fixtures verify both
failure-summary scripts handle empty/missing input (silent exit 0),
ANSI-stripped output, and parent/subtest formatting.

## Known follow-ups (deferred)

- Frontend Storybook/Vitest failure summary: lowest-leverage slice of
the failure-summary work. Skipping until observed pain.
- Architecture lint currently only delegates to existing import checks;
new rules (`InTx` outer-store detection, swagger-annotation lint) plug
in as needed.
- 50 port-offset buckets means two worktree paths can occasionally
collide. The DEV_ISOLATION doc tells users to set the relevant env var
when this happens.

> Mux opened this PR on Mike's behalf.
2026-05-11 17:27:29 +02:00
Yevhenii Shcherbina 4124d1137d feat: add ai_model_prices table (#24932)
# Summary

Implements
https://linear.app/codercom/issue/AIGOV-282/add-ai-model-price-table-and-seed-generator

This PR lays the groundwork for AI Bridge cost controls (per the AI
Governance RFC). It adds the foundation needed for future cost tracking:
a place to store per-model token prices, a way to keep those prices in
sync with upstream pricing data, and a startup mechanism that ensures
every deployment has prices loaded before AI Bridge starts processing
requests.

The price data comes from [models.dev](https://models.dev/), a
community-maintained catalogue of AI provider pricing. A generator
script fetches the latest prices, filters to Anthropic and OpenAI for
now, and produces a seed file checked into the repository.

On every server startup the seed is applied to the database, so new
releases automatically pick up any price corrections that landed since
the previous one. Existing rows are overwritten with the latest prices;
rows for models no longer in the seed are left untouched.

# Batching the AI model price seed: three approaches

Context: at server startup we seed the `ai_model_prices` table from an
embedded JSON price book (~70 rows today, will grow as we add providers,
potentially 4000+).

Each row is:

```text
(provider, model, input_price, output_price, cache_read_price, cache_write_price)
```

Any of the four price columns can be:

- `NULL` → “price unknown for this dimension”
- explicit `0` → “free”

The batch must be an UPSERT so re-running is idempotent and existing
rows pick up new prices.

We considered three implementations.

---

## Approach 1 — Per-row UPSERT in a Go loop

```go
for _, row := range rows {
    if err := db.UpsertAIModelPrice(ctx, database.UpsertAIModelPriceParams{
        Provider:   row.Provider,
        Model:      row.Model,
        InputPrice: nullInt64(row.InputPrice),
        // ...
    }); err != nil {
        return err
    }
}
```

### Pros

- Trivial.
- NULL handling falls out naturally from `sql.NullInt64`.

### Cons

- `N` round-trips per seed.
- With ~70 rows that means ~70 statement executions on every startup,
even inside a transaction.
- Doesn't scale gracefully as the price book grows, potentially 4000+.

---

## Approach 2 — `UNNEST` with parallel arrays

Pass each column as a separate Go slice. Postgres unnests them in
parallel into a virtual table, then `INSERT ... SELECT`.

```sql
INSERT INTO ai_model_prices (
    provider,
    model,
    input_price,
    output_price,
    cache_read_price,
    cache_write_price
)
SELECT
    UNNEST(@providers::text[]),
    UNNEST(@models::text[]),
    NULLIF(UNNEST(@input_prices::bigint[]), -1),
    NULLIF(UNNEST(@output_prices::bigint[]), -1),
    NULLIF(UNNEST(@cache_read_prices::bigint[]), -1),
    NULLIF(UNNEST(@cache_write_prices::bigint[]), -1)
ON CONFLICT (provider, model) DO UPDATE SET
    input_price       = EXCLUDED.input_price,
    output_price      = EXCLUDED.output_price,
    cache_read_price  = EXCLUDED.cache_read_price,
    cache_write_price = EXCLUDED.cache_write_price,
    updated_at        = NOW();
```

Go side: flatten rows into six parallel slices.

Use a sentinel (`-1`) for “missing”, since `lib/pq` can't encode `NULL`
into a `bigint[]` element.

```go
providers := make([]string, len(rows))
models    := make([]string, len(rows))
inputs    := make([]int64,  len(rows))
outputs   := make([]int64,  len(rows))
cacheR    := make([]int64,  len(rows))
cacheW    := make([]int64,  len(rows))

for i, r := range rows {
    providers[i] = r.Provider
    models[i]    = r.Model

    inputs[i] = -1
    if r.InputPrice != nil {
        inputs[i] = *r.InputPrice
    }

    outputs[i] = -1
    if r.OutputPrice != nil {
        outputs[i] = *r.OutputPrice
    }

    cacheR[i] = -1
    if r.CacheReadPrice != nil {
        cacheR[i] = *r.CacheReadPrice
    }

    cacheW[i] = -1
    if r.CacheWritePrice != nil {
        cacheW[i] = *r.CacheWritePrice
    }
}

return db.UpsertAIModelPrices(ctx, database.UpsertAIModelPricesParams{
    Providers:        providers,
    Models:           models,
    InputPrices:      inputs,
    OutputPrices:     outputs,
    CacheReadPrices:  cacheR,
    CacheWritePrices: cacheW,
})
```

### Pros

- Single round-trip.

### Cons

- The generated `sqlc` params become plain `[]int64`, which can't
represent `NULL`.

---

## Approach 3 — `jsonb_array_elements` over a single `@seed::jsonb`
(chosen)

Pass the raw seed JSON as one parameter; let Postgres expand and parse
it.

```sql
INSERT INTO ai_model_prices (
    provider,
    model,
    input_price,
    output_price,
    cache_read_price,
    cache_write_price
)
SELECT
    elem->>'provider',
    elem->>'model',
    (elem->>'input_price')::bigint,
    (elem->>'output_price')::bigint,
    (elem->>'cache_read_price')::bigint,
    (elem->>'cache_write_price')::bigint
FROM jsonb_array_elements(@seed::jsonb) AS elem
ON CONFLICT (provider, model) DO UPDATE SET
    input_price       = EXCLUDED.input_price,
    output_price      = EXCLUDED.output_price,
    cache_read_price  = EXCLUDED.cache_read_price,
    cache_write_price = EXCLUDED.cache_write_price,
    updated_at        = NOW();
```

Go side reduces to:

```go
return db.UpsertAIModelPrices(ctx, seedJSON)
```

### Pros

- Single round-trip.
- NULLs fall out naturally:
  - `(elem->>'cache_write_price')::bigint` becomes `NULL`
  - no sentinels
- The seed is already JSON:
- Existing precedent:
  - `jsonb_array_elements` is already used elsewhere in the codebase

### Cons

- Less type-safe at the SQL boundary than `UNNEST`
- Slightly less standard than `UNNEST`
- Readers need familiarity with:
  - `jsonb_array_elements`
  - `->>` extraction syntax
- Postgres pays JSON parse cost
  - negligible at our scale

---

---

# Decision

We picked Approach 3.

It collapses the round-trips like `UNNEST` does, but without:

- nullable-array workarounds
- sentinel values
2026-05-08 16:45:14 -04:00
Jon Ayers 400374992c fix: add pnpm overrides for vulnerable transitive dependencies (#25064) 2026-05-07 15:11:32 -05:00
Jon Ayers eef09f3d98 chore: update terraform to v1.15.2 (#25045) 2026-05-07 11:07:19 -05:00
david-fraley e7360da974 docs: generate Chats API docs from swagger annotations (#24830) 2026-05-05 18:52:54 +00:00
Garrett Delfosse a8222e02e5 fix(scripts/releaser): fix tag sorting and changelog blurb for older branches (#24798)
Fixes two bugs in the release tool.

## 1. RC tags chosen over release tags on release branches

`allSemverTags()` and `mergedSemverTags()` rely on `git tag
--sort=-v:refname` for ordering. Git's version sort treats pre-release
suffixes (e.g. `-rc.0`) as *greater* than the base release version,
which is the opposite of semver where `v2.32.0 > v2.32.0-rc.0`.

When the release branch code iterates the tag list looking for the first
matching `major.minor`, it finds the RC tag first, leading to incorrect
version suggestions (e.g. suggesting `v2.32.0` again instead of
`v2.32.1`).

**Fix:** Re-sort parsed tags using the existing `GreaterThan` method via
a new `sortVersionsDesc` helper.

## 2. Misleading mainline changelog blurb on ESR/older branch patches

When releasing a patch on an older branch (e.g. `release/2.29` for ESR),
the version is neither mainline nor stable. Declining the stable prompt
would always produce the mainline changelog note ("This is a mainline
Coder release..."), which is incorrect.

**Fix:** Only emit the mainline note when the version's minor matches
the current mainline series. For older branches the changelog omits the
note entirely.

> Generated by Coder Agents
2026-05-01 14:41:09 -04:00
Dean Sheather e57525002c chore: remove agents experiment flag and mark feature as beta (#24432)
Remove the `ExperimentAgents` feature flag so the Agents feature is
always available without requiring `--experiments=agents`. The feature
is now in beta.

Existing deployments that still pass `--experiments=agents` will get a
harmless "ignoring unknown experiment" warning on startup.

### Changes

**Backend:**
- Remove `RequireExperimentWithDevBypass` middleware from chat and MCP
server routes
- Always include `AgentsAccessRole` in assignable site roles (later
refactored to org-scoped on main; rebase keeps that)
- Always set `AgentsTabVisible = true`, then drop the entire dead
`AgentsTabVisible` metadata pipeline (Go htmlState field,
populateHTMLState goroutine, HTML meta tag, useEmbeddedMetadata
registration, mock); no production consumer reads it. `AgentsNavItem`
already gates on `permissions.createChat`.
- Make `blob:` CSP `img-src` addition unconditional
- Remove `ExperimentAgents` constant, `DisplayName` case, and
`ExperimentsKnown` entry

**CLI:**
- Graduate the agents TUI from `coder exp agents` to `coder agents`
(moved from `AGPLExperimental()` to `CoreSubcommands()`)
- Drop the `agent` alias so it does not collide with the hidden
workspace-agent command
- Rename implementation files `cli/exp_agents_*.go` -> `cli/agents_*.go`
and internal identifiers (`expChatsTUIModel` -> `chatsTUIModel`,
`newExpChatsTUIModel` -> `newChatsTUIModel`, `setupExpAgentsBackend` ->
`setupAgentsBackend`, `startExpAgentsSession` -> `startAgentsSession`,
`expAgentsPtr` -> `agentsPtr`, `expAgentsSession` -> `agentsSession`,
`TestExpAgents*` -> `TestAgents*`). `expClient` (the
`*codersdk.ExperimentalClient` local) is kept; `coderd/exp_chats*.go`
and other still-experimental `cli/exp_*.go` commands are intentionally
untouched.

**Frontend:**
- Remove experiment check from `AgentsNavItem` - render when
`canCreateChat` is true
- Remove `agentsEnabled` experiment check from `WorkspacesPage`, then
gate `chatsByWorkspace` on `permissions.createChat` so users without
chat access don't trigger the per-page DB query (Copilot review
feedback)
- Add `FeatureStageBadge` (beta) next to the Coder logo in the Agents
sidebar (desktop + mobile)

**Docs:**
- Remove experiment flag setup instructions from `early-access.md` and
`getting-started.md` (and rename `early-access.md`'s "Enable Coder
Agents" heading to "Set up Coder Agents", since there is no enablement
step left)
- Update `chats-api.md` and `getting-started.md`'s Chats API note to say
"beta" instead of "experimental"
- `docs/manifest.json`: drop "experimental" from the Chats API sidebar
description
- `make gen` regenerated `docs/reference/cli/agents.md` and the CLI
index
- `scripts/check_emdash.sh`: exclude `cli/testdata/*.golden` and
`enterprise/cli/testdata/*.golden` from the new repo-wide emdash lint,
since serpent emits emdash borders in every generated `--help` golden
file

**Tests:**
- Remove `ExperimentAgents` setup from all test files (14 occurrences
across 7 files)
- Update stale "with the agents experiment" comments in
`coderd/x/chatd/integration_test.go` and `coderd/mcp_test.go`


<img width="1185" height="900" alt="image"
src="https://github.com/user-attachments/assets/b420bc8f-41d6-42c6-abd8-ad572533d651"
/>


> 🤖 Generated by Coder Agents
2026-05-01 01:49:00 +10:00
Thomas Kosiewski 06bad73df4 feat: add admin-configurable advisor API, SDK, and queries (#24621)
## Summary

Add the **admin-configurable advisor configuration**: database-backed storage, SDK types, and the experimental HTTP handlers that back the admin settings UI (later PRs). Follows the same "site-configs" pattern as Virtual Desktop.

## Motivation

The advisor needs runtime-tunable knobs (enable/disable, per-run cap, max output tokens, reasoning effort, optional model override) without a service restart or redeploy. Using the existing `site_configs` K/V table keeps this pattern consistent with other admin features and avoids a bespoke schema.

## Changes

### Database (`coderd/database/queries/siteconfig.sql`)
- `GetChatAdvisorConfig` returns the stored JSON blob (default `'{}'`) under key `agents_advisor_config`.
- `UpsertChatAdvisorConfig` uses the standard `INSERT ... ON CONFLICT` pattern.
- Regenerated via `make gen` (queries.sql.go + mocks).

### SDK (`codersdk/chats.go`)
- `AdvisorConfig` type with `Enabled`, `MaxUsesPerRun`, `MaxOutputTokens`, `ReasoningEffort` (`""` / `low` / `medium` / `high`), `ModelConfigID uuid.UUID`.
- Client methods: `ChatAdvisorConfig(ctx)` / `UpdateChatAdvisorConfig(ctx, cfg)`.

### API (`coderd/exp_chats.go`)
- `GET /api/experimental/chats/config/advisor`: reads current config; relies on `ActorFromContext` validation.
- `PUT /api/experimental/chats/config/advisor`: requires `policy.ActionUpdate` on `rbac.ResourceDeploymentConfig`.
- Handlers unmarshal `{}` to a typed zero value and re-marshal on upsert for schema stability.
- Tests in `exp_chats_test.go` cover empty defaults, round-trip update, unauthorized update, and invalid body.

## Stack context

This is **PR 3 of 6** in the advisor feature stack. Consumed by:
- PR 4 (`feat/advisor-04-chatd-runtime`), which reads this config on every `runChat`.
- PR 6 (`feat/advisor-06-admin-settings-ui`), which renders the admin form.

## Scope / non-goals

- No `chatd` read path (lands in PR 4).
- No UI (lands in PR 6).
- `agents_advisor_config` remains a single-row JSON blob; we intentionally do not shard per-org/per-template yet.

## Validation

- `make gen`
- `go test ./coderd/database/... -run TestChatAdvisor`
- `go test ./coderd/... -run TestChatAdvisorConfig`
- `make lint`

---

<details>
<summary>📋 Implementation Plan (shared across the advisor stack)</summary>

# Plan: Add a Mux-style advisor tool to coder agents/chatd

## Outcome

Add a first-class `advisor` tool to agent chats in `coderd/x/chatd` that feels native to Coder:

- it is a built-in server-side tool, not an MCP/dynamic-tool workaround;
- it performs a nested **tool-less** model call for strategic advice;
- it is exposed only when eligible, and the prompt mentions it only when it is actually available;
- it is treated as a **planning-only** tool so it does not run alongside action tools in the same batch;
- it tracks usage/cost separately enough for operators to reason about it;
- it has a minimally polished UI in the Agents page;
- and it ships with explicit dogfooding evidence, including screenshots and repro videos.

## Design decisions to lock before coding

1. **Primary architecture:** native built-in tool in `chattool/`, backed by a small `chatadvisor` package.
2. **Nested model execution:** reuse chatd's existing model/provider stack for a one-step, tool-less advisor call rather than inventing a new provider pathway.
3. **Execution policy:** treat `advisor` as an exclusive/planning-only tool; mixed batches must return structured policy errors and force the model to retry cleanly.
4. **Availability:** initial rollout is for root agent chats only; disable for child/sub-agent chats until recursion/cost policy is proven.
5. **Prompt sync:** use one eligibility boolean to drive both tool registration and advisor guidance injection.
6. **Persistence/cost split:** MVP should keep advisor usage visible in result metadata and server metrics; only add DB schema if product/billing explicitly needs queryable advisor-specific cost.
7. **UI scope:** generic tool rendering is an acceptable temporary milestone during backend bring-up, but the release candidate should include a dedicated lightweight advisor renderer.

## Delivery model

The work should be executed as coordinated workstreams with one integration owner and parallel contributors for low-conflict areas. The integration owner should own `coderd/x/chatd/chatd.go` because prompt assembly, tool registration, and model resolution all converge there.

## Detailed workstreams

### Repo evidence used for this plan

<details>
<summary>Mux reference and current chatd seams</summary>

**Mux reference implementation**

- `src/node/services/tools/advisor.ts` — native advisor tool implementation.
- `src/common/constants/advisor.ts` — advisor prompt/constants and truncation policy.
- `src/common/utils/tools/tools.ts` — conditional tool registration.
- `src/node/services/streamContextBuilder.ts` — injects advisor guidance only when the tool is available.

**Current chatd seams**

- `coderd/x/chatd/chatd.go`
  - `processChat()` — tool assembly, prompt assembly, and chatloop invocation.
  - `resolveChatModel()` — current model/provider/key resolution seam.
  - `type Config struct` — server-level chatd configuration surface.
- `coderd/x/chatd/chatloop/chatloop.go`
  - `Run()` — main streaming/model loop.
  - `executeTools()` — built-in tool execution/batching seam.
- `coderd/x/chatd/chattool/` — built-in tool implementations.
- `site/src/pages/AgentsPage/components/ChatElements/tools/Tool.tsx` — tool renderer dispatch.
- `site/src/pages/AgentsPage/components/ChatConversation/messageParsing.ts` and `ConversationTimeline.tsx` — tool/result merge and rendering flow.

</details>

### Workstream map and ownership

| Workstream | Primary owner | Main files | Can run in parallel? | Done when |
|---|---|---|---|---|
| 0. Integration + gating | Integration lead | `coderd/x/chatd/chatd.go` | No; central merge lane | Tool registration, prompt sync, and model selection are wired together |
| 1. Advisor runtime + tool | Backend agent | new `coderd/x/chatd/chatadvisor/`, new `coderd/x/chatd/chattool/advisor.go` | Yes | Tool can perform a tool-less advisor call in memory and return structured results |
| 2. Planning-only execution policy | Chatloop agent | `coderd/x/chatd/chatloop/chatloop.go`, related tests | Yes | Mixed `advisor` + action-tool batches are rejected cleanly and deterministically |
| 3. Metrics/usage/config | Backend/telemetry agent | `chatd.go`, `chatloop/metrics.go`, optional config plumbing | Partially; coordinate with integration lead | Advisor usage is separately visible in metadata/metrics and limits are enforced |
| 4. Frontend rendering | Frontend agent | `site/.../tools/Tool.tsx`, new `AdvisorTool.tsx`, stories | Yes after result schema stabilizes | Advisor renders as a readable card and story tests pass |
| 5. Dogfood + QA evidence | QA agent | dev server, Storybook, dogfood output | After backend + UI are usable | Repro videos, screenshots, and a concise QA report exist |

### Parallelization rules

- **Do not split `coderd/x/chatd/chatd.go` across multiple execution agents without an integration lead.** That file owns prompt building, tool registration, model resolution, and cost persistence.
- Workstreams 1 and 2 can be developed in parallel and then stacked onto the integration branch.
- Workstream 4 should begin once the backend result schema is agreed on, even if the backend is still behind a feature flag.
- Any agent that needs to re-check Mux behavior should clone `coder/mux` into a temporary directory (for example, `$(mktemp -d)/mux`) and inspect it read-only; do not vendor or copy code from Mux directly.

## Phase 0 — Preflight and guardrails

### Goals

- Align the team on the smallest shippable architecture.
- Prevent scope creep into MCP/dynamic-tool/sub-agent variants.
- Decide upfront what is MVP vs. follow-up.

### Tasks

1. **Confirm the MVP boundary.**
   - Ship a built-in advisor tool first.
   - Do **not** make MCP, dynamic tools, or sub-agents the primary implementation.
   - Do **not** add transient streaming phases in the first backend PR unless they fall out almost for free.

2. **Confirm local workflow hygiene before coding.**
   - Ensure the repo is using the project git hooks from `scripts/githooks`.
   - Do not bypass hooks with `--no-verify`.
   - Use `./scripts/develop.sh` for the full dev server rather than manual build/run commands.

3. **Lock the model-selection policy.**
   - **Recommended MVP:** advisor uses the same resolved provider/model/cost config as the current chat, with advisor-specific max-output and usage caps.
   - **Follow-up only if required:** add a separate `AdvisorModelConfigID`-style override that resolves through the existing `configCache`/model-config path. Do not invent a new free-form `provider:model` parser if chatd already stores provider/model separately.

4. **Lock the persistence policy.**
   - **Recommended MVP:** no DB migration. Persist advisor-visible metadata in the tool result and record separate metrics in memory/Prometheus.
   - **Only if product/billing explicitly asks for queryable advisor cost:** add a later DB migration or usage table, following the normal `queries/*.sql` + `make gen` workflow.

5. **Create an execution ADR note in the work item or tracking doc.**
   - Capture: built-in tool, tool-less nested call, root-chat-only rollout, exclusive execution policy, MVP no-DB-migration default.

### Quality gate

- Everyone on the team can state the same answers to these questions:
  - Is advisor a built-in tool? **Yes.**
  - Can advisor run with action tools in the same batch? **No.**
  - Does advisor get tools of its own? **No.**
  - Is a DB migration required for MVP? **No, unless billing insists.**

## Phase 1 — Build the advisor runtime and tool wrapper

### Goals

Create the core advisor implementation in a way that is easy to test and keeps `chattool/` thin.

### Files to add

- `coderd/x/chatd/chatadvisor/types.go`
- `coderd/x/chatd/chatadvisor/guidance.go`
- `coderd/x/chatd/chatadvisor/handoff.go`
- `coderd/x/chatd/chatadvisor/runtime.go`
- `coderd/x/chatd/chatadvisor/runner.go`
- `coderd/x/chatd/chattool/advisor.go`

### Responsibilities by file

1. **`types.go`**
   - Define the input/result schema used by the tool and UI.
   - Keep the result shape close to Mux so the UI and model both have predictable cases.
   - Recommended result variants:
     - `advice`
     - `limit_reached`
     - `error`

   Recommended shape:

   ```go
   type AdvisorArgs struct {
       Question string `json:"question"`
   }

   type AdvisorResult struct {
       Type          string              `json:"type"`
       Advice        string              `json:"advice,omitempty"`
       Error         string              `json:"error,omitempty"`
       AdvisorModel  string              `json:"advisor_model,omitempty"`
       RemainingUses int                 `json:"remaining_uses,omitempty"`
       Usage         *AdvisorUsageResult `json:"usage,omitempty"`
   }
   ```

2. **`guidance.go`**
   - Hold two strings:
     - the nested advisor system prompt;
     - the parent-agent guidance block to inject into the outer system prompt.
   - The nested advisor prompt must say, in plain language:
     - you are advising the parent agent;
     - you do not address the end user directly;
     - you do not claim actions happened;
     - you return concise strategic guidance and tradeoffs.

3. **`runtime.go`**
   - Define the per-run runtime state.
   - Recommended fields:
     - resolved model + model config;
     - provider keys/options reused from the outer chat;
     - `MaxUsesPerRun`;
     - `MaxOutputTokens`;
     - atomic/current call counter;
     - callback(s) to obtain the current prompt snapshot and current-step snapshot;
     - optional metrics/usage hook.
   - Add fail-fast validation for impossible config: nil model, non-positive limits, empty prompt builders, etc.

4. **`handoff.go`**
   - Build the advisor handoff message from:
     - the explicit question;
     - the exact prompt/messages the parent model just used;
     - the current step's text/reasoning snapshot, if available;
     - the most recent relevant tool outputs, if they are already in the prompt snapshot.
   - **Important:** use the already-prepared outer prompt tail, not a fresh DB reload. That keeps the advisor aligned with compaction and the exact context the outer model saw.
   - Apply hard truncation budgets with recent-context bias.

5. **`runner.go`**
   - Execute the nested advisor call.
   - **Recommended implementation:** call `chatloop.Run()` in an in-memory, one-step mode:
     - `Tools: nil`
     - `ProviderTools: nil`
     - `MaxSteps: 1`
     - `PersistStep`: capture the assistant output in memory instead of writing DB rows
   - Reuse the existing provider/model/cost path instead of building a second provider runner.
   - Assert that no tool definitions are passed to the nested call.

6. **`chattool/advisor.go`**
   - Keep this file thin and consistent with other built-ins.
   - Responsibilities:
     - decode `AdvisorArgs`;
     - validate `Question` is non-empty and bounded;
     - call the `chatadvisor` runner;
     - return a structured tool response.

### Defensive programming requirements

- Assert `Question` is non-empty after trimming.
- Assert runtime limits are positive.
- Assert the nested advisor call runs with zero tools/provider tools.
- Assert `AdvisorResult.Type` is one of the known variants before returning.
- Assert remaining uses never goes negative.

### Acceptance criteria

- A unit test can call the advisor tool with a fake model and receive a stable `advice` result.
- The nested advisor call is impossible to run with tools accidentally attached.
- The core logic lives in `chatadvisor/`, not embedded inside `chatd.go`.

## Phase 2 — Wire advisor into chatd and keep prompt/tool availability in sync

### Goals

Register the tool in the right place, expose it only when eligible, and inject system guidance only when the tool is present.

### Files to modify

- `coderd/x/chatd/chatd.go`
- optionally a small helper file if `chatd.go` becomes too crowded

### Tasks

1. **Compute one eligibility boolean in `processChat()`.**
   Recommended inputs:
   - server-level advisor enabled flag;
   - root chat only (`chat.ParentChatID == uuid.Nil` or equivalent existing root/child check);
   - a usable resolved model/provider exists;
   - optional experiment/workspace/org gate if product wants staged rollout.

2. **Create the runtime once per outer chat run.**
   - Use the model/config/keys resolved by `resolveChatModel()`.
   - Reuse provider options from the current chat's `ChatModelCallConfig`.
   - Set `MaxUsesPerRun` and `MaxOutputTokens` from advisor config defaults.

3. **Register the tool in the built-in tool block.**
   - Insert after the skill tools and before MCP tools in `processChat()`.
   - Record `builtinToolNames["advisor"] = true` so metrics stay bounded.

4. **Inject advisor guidance into the outer system prompt using the same boolean.**
   - Use `chatprompt.InsertSystem()` in the same prompt assembly path that already injects user/system instructions.
   - Place the block near the existing instruction insertion, before plan-path/skill context blocks.
   - Wrap the guidance in an explicit tag like `<advisor-guidance>` so it is easy to spot in tests and future refactors.

5. **Keep advisor out of child chats for the first release.**
   - That avoids recursion/cost blowups with `spawn_agent` / `wait_agent` flows.
   - Document this explicitly in the rollout notes and tests.

### Acceptance criteria

- If advisor is disabled, neither the tool nor the prompt guidance appears.
- If advisor is enabled, both the tool and the prompt guidance appear.
- Root chats can use advisor; child chats cannot.
- Built-in tool names include `advisor` so metrics do not collapse it into the generic `mcp` label.

## Phase 3 — Enforce planning-only execution policy in `chatloop`

### Goals

Prevent the model from calling `advisor` and action tools in the same execution batch.

### Files to modify

- `coderd/x/chatd/chatloop/chatloop.go`
- related chatloop tests

### Recommended implementation

Keep the MVP small; do **not** build a general policy engine yet.

1. Add a minimal field to `chatloop.RunOptions`, for example:

   ```go
   ExclusiveToolName *string
   ```

2. In `Run()` / `executeTools()`, detect the case where the exclusive tool appears in the same local-tool batch as any other locally executed tool.

3. When that happens, synthesize structured tool-result errors for the affected calls instead of executing anything in the batch.
   - `advisor` should receive a clear error like: _advisor must be called by itself before action tools_.
   - The sibling action tools should receive a paired policy error like: _this tool was skipped because advisor must run alone_.

4. Let the outer model see those tool errors and retry cleanly.
   - This is simpler and safer than partial execution or hidden deferral.
   - It preserves deterministic transcript history for debugging.

5. Pass the just-finished step snapshot into the tool execution context.
   - The advisor runtime should be able to see the current step's text/reasoning content, because that is often the best hint about what the outer model is trying to decide.

### Why this is the right fit

- It matches the intended semantics: advisor is consulted **before** taking action.
- It avoids subtle race conditions caused by concurrent built-in tool execution.
- It keeps the behavior easy to test with fake models.

### Acceptance criteria

- A model-emitted batch containing only `advisor` succeeds.
- A model-emitted batch containing `advisor` plus any other locally executed tool returns deterministic policy errors and executes nothing.
- Non-advisor tool execution stays unchanged for normal chats.

## Phase 4 — Usage limits, metrics, and configuration

### Goals

Make advisor safe to operate without over-designing billing/storage in the first release.

### Files to modify

- `coderd/x/chatd/chatd.go`
- `coderd/x/chatd/chatloop/metrics.go` as needed
- `coderd/x/chatd/chatd.go` `Config` struct and constructor path
- optional follow-up config/db files only if a separate advisor model or persistent billing is required

### Tasks

1. **Add explicit server config knobs for MVP.**
   Recommended fields on `chatd.Config` or a nested advisor config struct:
   - `AdvisorEnabled bool`
   - `AdvisorMaxUsesPerRun int`
   - `AdvisorMaxOutputTokens int64`

2. **Track usage per outer run.**
   - Reset the counter for each `processChat()` invocation.
   - Return `remaining_uses` in the tool result.
   - Return `limit_reached` when the cap is exhausted.

3. **Expose advisor usage metadata in the tool result.**
   - Include model name and token/cost summary if available.
   - Use the same `callConfig.Cost` calculation path as the outer chat for MVP if advisor reuses the same model.

4. **Record server-side metrics.**
   - Count advisor invocations, failures, and latency.
   - Ensure they show up under the built-in tool label `advisor`.

5. **Optional decision gate: separate advisor model.**
   - If product insists on a stronger/different advisor model, add a follow-up config hook that resolves another existing chat model config through the same `configCache` path.
   - Keep that out of the first landing PR unless it is required for acceptance.

6. **Optional decision gate: queryable advisor cost.**
   - If this becomes required, spin a follow-up DB task:
     - update `coderd/database/queries/*.sql`;
     - add migration files;
     - run `make gen`;
     - update audit mappings if a new auditable type/field is introduced.

### Acceptance criteria

- Advisor calls are capped per outer run.
- Limit exhaustion is user-visible in the tool result.
- Metrics distinguish advisor calls from other built-in tools.
- MVP does not require a schema migration unless explicitly approved.

## Phase 5 — Frontend rendering and Storybook coverage

### Goals

Make advisor feel intentional in the Agents UI without blocking the backend on fancy streaming UI.

### Files to modify

- `site/src/pages/AgentsPage/components/ChatElements/tools/Tool.tsx`
- new `site/src/pages/AgentsPage/components/ChatElements/tools/AdvisorTool.tsx`
- Storybook story file(s) in the same tools directory

### Delivery strategy

1. **Intermediate milestone during backend bring-up:** rely on the existing generic tool renderer if needed.
   - This is acceptable only as a short-lived integration checkpoint.

2. **Release milestone:** add a dedicated lightweight `AdvisorTool` renderer.
   - Reuse existing primitives:
     - `ToolCollapsible`
     - `ToolIcon`
     - `Response` for markdown/prose rendering
     - `ScrollArea` if the advice can be long
   - Keep styling light and consistent with the Agents page.
   - Do not add unnecessary React memoization in `site/src/pages/AgentsPage/`; that area is already React-Compiler aware.

3. **Render the structured result states cleanly.**
   - `advice` — readable prose/markdown with optional metadata footer.
   - `limit_reached` — warning-style message.
   - `error` — error state with visible fallback text.
   - `running` — existing tool loading state/spinner is enough for MVP.

4. **Add Storybook coverage instead of ad-hoc component tests.**
   Recommended stories:
   - successful advice;
   - running/loading;
   - limit reached;
   - error.

5. **Keep the UI contract narrow.**
   - Prefer one text field like `advice` plus small metadata rather than a deeply nested schema.
   - That keeps the UI resilient to prompt iteration.

### Acceptance criteria

- The advisor tool card renders readable content rather than raw quoted JSON in the final release branch.
- Running, limit, and error states are visibly distinct.
- Storybook stories and play assertions cover the new states.
- Existing tool rendering flows remain unchanged.

## Phase 6 — Automated tests and validation gates

### Backend tests to add

1. **Advisor runtime/tool tests**
   - question validation;
   - tool-less nested execution assertion;
   - success result shaping;
   - limit-reached result shaping;
   - error result shaping.

2. **Prompt/gating tests in chatd**
   - advisor disabled ⇒ no tool, no guidance;
   - advisor enabled/root chat ⇒ tool + guidance;
   - child chat ⇒ advisor absent.

3. **Chatloop policy tests**
   - advisor alone runs;
   - advisor + action tool mixed batch returns deterministic policy errors;
   - non-advisor tools still execute normally.

4. **Usage/metrics tests**
   - per-run cap resets correctly;
   - builtin tool labeling includes `advisor`;
   - returned metadata includes model/usage summary when available.

### Frontend tests to add

- Storybook `play()` assertions for the advisor renderer states.
- Verify expand/collapse behavior and visible fallback text.
- Verify the message timeline still renders adjacent tools correctly.

### Recommended command sequence

Run these as the implementation matures, not only at the end:

1. Backend-focused gate after phases 1–4:
   - `make test RUN=TestAdvisor`
   - `make test RUN=TestChatloopAdvisor`
   - `make lint`

2. Frontend-focused gate after phase 5:
   - `pnpm test:storybook src/pages/AgentsPage/components/ChatElements/tools/AdvisorTool.stories.tsx`
   - `pnpm lint`
   - `pnpm format`

3. Final repo gate before handoff:
   - `make pre-commit`
   - run any additional targeted `make test RUN=...` selections covering touched chatd paths

> Use the exact new test names the implementing agents create; the names above are recommended anchors, not existing tests.

## Dogfooding plan

### Principle

Dogfood the change as a real agent feature, not just a unit-tested backend. Per the dogfood and `agent-browser` skills, the reviewer should get **watchable repro videos** plus screenshots that make the behavior obvious without reading logs.

### Required setup

1. Start the full dev environment with:
   - `./scripts/develop.sh`
2. If the frontend renderer changes, also start Storybook from `site/` with:
   - `pnpm storybook --no-open`
3. Use `agent-browser` directly — **never `npx agent-browser`**.
4. Use named browser sessions and an output folder such as:
   - `./dogfood-output/advisor/`
   - with subfolders `screenshots/` and `videos/`

### Evidence protocol

For every interactive scenario below:

1. Start video recording **before** the action.
2. Capture step-by-step screenshots at human pace.
3. Capture one annotated screenshot of the final state.
4. Stop the recording.
5. Note the exact pass/fail observation in the QA report.

For static UI states (for example Storybook error/limit cards), an annotated screenshot is sufficient; video is optional but still encouraged by this project’s review preference.

### Dogfood scenarios

#### Scenario A — Happy path in the real Agents UI

**Goal:** prove that a root agent chat can invoke advisor and produce a readable recommendation before taking further action.

Steps:

1. Open the Agents page with an advisor-enabled root chat.
2. Start a repro video.
3. Send a prompt that should reasonably trigger strategic planning, such as an architecture or multi-tradeoff question.
4. Capture screenshots of:
   - the prompt before send;
   - the running advisor state;
   - the completed advisor card and the assistant’s follow-up response.
5. Stop recording.

Pass criteria:

- advisor appears in the timeline;
- the rendered result is readable;
- the assistant can continue after consuming the advisor output.

#### Scenario B — Advisor unavailable path

**Goal:** prove the feature is truly gated.

Suggested variants (at least one is required, both are better):

- feature flag/config off;
- child/sub-agent chat.

Evidence:

- annotated screenshot of the chat/tool state showing advisor is absent;
- short video if toggling the gate live is part of the repro.

Pass criteria:

- no advisor tool is available;
- no advisor-specific prompt behavior leaks through.

#### Scenario C — UI states in Storybook

**Goal:** prove the renderer handles non-happy states cleanly.

Required story states:

- success/advice;
- running;
- limit reached;
- error.

Evidence:

- one screenshot per state;
- at least one short video showing collapse/expand behavior.

Pass criteria:

- success renders readable advice;
- limit/error have visible fallback text;
- the component behaves like the other tool cards.

#### Scenario D — Regression sweep of nearby tools

**Goal:** ensure advisor does not break the surrounding chat timeline.

Check at minimum:

- another existing built-in tool still renders correctly near advisor;
- sub-agent/tool cards still expand/collapse normally;
- no obvious console errors appear in the Agents page during the advisor flow.

Evidence:

- screenshots of adjacent tool cards;
- console/error capture if anything suspicious appears.

### `agent-browser` usage notes for the QA agent

- Prefer `agent-browser batch` for 2+ sequential commands when no intermediate parsing is needed.
- Use `snapshot -i` to discover interactive refs.
- Re-snapshot after navigation or major DOM changes.
- Avoid `wait --load networkidle` unless the page is known to go idle; prefer explicit element/text waits or short fixed waits.
- Record videos at human pace and include pauses that a reviewer can follow.

## Rollout plan

### Initial rollout

- Gate behind a server-side advisor-enabled flag.
- Enable only for selected internal/root agent chats first.
- Watch metrics for:
  - invocation count;
  - failure rate;
  - latency;
  - obvious retry loops.

### Expansion conditions

Expand beyond the initial rollout only after the following are true:

- mixed-batch policy behavior is stable;
- cost impact is understood;
- frontend UX is readable in production-like dogfood;
- no recursion surprises have appeared with sub-agent flows.

### Explicit non-goals for the first release

- advisor inside child/sub-agent chats;
- provider-agnostic streaming phase UI;
- MCP-based external advisor implementation;
- mandatory DB-backed advisor cost reporting.

## Final acceptance checklist

- [ ] `advisor` is a built-in chatd tool, not an MCP/dynamic-tool substitute.
- [ ] The nested advisor call is tool-less and bounded to one in-memory step.
- [ ] One eligibility boolean controls both tool registration and prompt guidance injection.
- [ ] Root chats can use advisor; child chats cannot in the initial rollout.
- [ ] Mixed advisor/action batches produce deterministic policy errors instead of partial execution.
- [ ] Per-run usage caps and limit-reached behavior work.
- [ ] Advisor usage is visible in metadata/metrics without forcing a DB migration for MVP.
- [ ] The Agents UI has a readable advisor card and Storybook coverage.
- [ ] Dogfooding produced screenshots and repro videos for the required scenarios.
- [ ] Validation commands (`make lint`, targeted `make test`, Storybook tests, `make pre-commit`) passed before handoff.

## Suggested PR split

1. **PR 1 — Backend foundation**
   - `chatadvisor/` package
   - `chattool/advisor.go`
   - `chatloop` exclusive policy
   - chatd gating/prompt sync
   - backend tests

2. **PR 2 — Frontend + QA**
   - advisor renderer
   - stories/play assertions
   - dogfood artifacts and QA notes

3. **PR 3 — Optional follow-ups only if demanded by stakeholders**
   - separate advisor model override
   - persistent advisor billing/queryability
   - transient phase-stream UX


</details>

---
_Generated with [`mux`](https://github.com/coder/mux) • Model: `anthropic:claude-opus-4-7` • Thinking: `max`_
2026-04-30 14:53:08 +02:00
Kayla はな dcb32165fa feat: add --skip-setup flag to develop script (#24794) 2026-04-28 13:49:43 -06:00
Cian Johnston a876287d36 feat: auto-archive inactive chats with audit trail (#24642)
Adds a background job in `dbpurge` that periodically archives chats
inactive beyond a configurable threshold. Each archived root chat gets a
background audit entry tagged `chat_auto_archive`. Disabled by default.

* New `AutoArchiveInactiveChats` SQL query with LATERAL last-activity
subquery and partial index on archive candidates
* `site_configs`-backed `auto_archive_days` setting with admin-only PUT,
any-authenticated-user GET
* Cascade archive via `root_chat_id`; pinned chats and active threads
exempt
* Root-only audit dispatch on detached context, matching manual archive
(`patchChat`) behavior
* 11 subtests covering disabled no-op, boundary, deleted messages, child
activity, pinned exemption, multi-owner, idempotency, and batch
pagination

PR #24643 adds per-owner digest notifications.
PR #24704 adds the requisite UI controls.

> 🤖
2026-04-24 14:18:28 +01:00
Cian Johnston 72e3ae9c5f feat: add chatd tool call error metrics and logging (#24559)
- Add `coderd_chatd_tool_errors_total` prometheus counter (labels:
provider, model, tool_name)
- Log tool call errors at warn level with correlation fields: chat_id,
owner_id, organization_id, workspace_id, agent_id, parent_chat_id,
trigger_message_id, tool_name, tool_call_id, provider, model
- Thread enriched logger from chatd.go into chatloop via
`RunOptions.Logger`
- Remove squashing of all MCP tool calls to the `mcp` bucket

> 🤖
2026-04-22 16:19:56 +00:00
Paweł Banaszewski e00e85765b chore: move aibridge library code into coder repo (#24190)
This PR merges code from `coder/aibridge` repository into `coder/coder`.
It was split into 4 PRs for easier review but stacked PRs will need to
be merged into this PR so all checks pass.

* https://github.com/coder/coder/pull/24190 -> raw code copy (this PR,
before merging PRs on top of it, it was just 1 commit:
https://github.com/coder/coder/commit/70d33f33200c7e77df910957595715f81f9bec24)
* https://github.com/coder/coder/pull/24570 -> update imports in
`coder/coder` to use copied code
* https://github.com/coder/coder/pull/24586 -> linter fixes and CI
integration (also added README.md)
* https://github.com/coder/coder/pull/24571 -> added exclude to
scripts/check_emdash.sh check

Original PR message (before PR squash):
Moves coder/aibridge code into coder/coder repository.

Omitted files:

- `go.mod`, `go.sum`, `.gitignore`, `.github/workflows/ci.yml,`
`Makefile`, `LICENSE`, `README.md` (modified README.md is added later)
- `.github`, `example`, `buildinfo,` `scripts` directories

Simple verification script (will list omitted files)

```
tmp=$(mktemp -d)
echo "$tmp"
git clone --depth=1 https://github.com/coder/aibridge "$tmp/aibridge"
git clone --depth=1 --branch pb/aibridge-code-move https://github.com/coder/coder "$tmp/coder"
diff -rq --exclude=.git "$tmp/aibridge" "$tmp/coder/aibridge"
# rm -rf "$tmp"
```
2026-04-22 17:01:01 +02:00
Mathias Fredriksson 623e72d72d chore: add no-emdash/endash rule to agent instructions and CI lint (#24375)
Add a lint check that prevents introduction of Unicode emdash (U+2014)
and endash (U+2013) characters. These are almost exclusively introduced
by AI agents and conflict with the project writing style.

The lint script (scripts/check_emdash.sh) checks only added lines in
the current diff by default, so existing violations do not block CI.
Pass --all to scan the entire repo for auditing.

Agent instructions in AGENTS.md, site/AGENTS.md, and the docs style
guide now explicitly ban emdash, endash, and " -- " as punctuation,
with guidance to use commas, semicolons, or periods instead.
2026-04-21 13:55:24 +03:00
Cian Johnston d99949df43 feat(scripts/develop): add --prometheus-server flag to run Prometheus UI (#24408)
Builds on #24389. Adds `--prometheus-server` flag that starts a
`prom/prometheus:v3.11.2` container alongside the dev environment.

> 🤖
2026-04-20 14:08:13 +00:00
Cian Johnston 4b585465b8 feat: label chatd metrics by model, add stream-state diagnostics (#24475)
Adds production-observability metrics to coderd/x/chatd/ for
model-level correlation and a chatStreams memory-leak investigation.

- Label per-request chatd metrics (steps_total, message_count,
  prompt_size_bytes, tool_result_size_bytes, ttft_seconds,
  compaction_total) with `model` and enrich the per-turn logger
  with provider/model.
- Add `coderd_chatd_stream_retries_total{provider, model, kind}`
  counter incremented in chatloop before OnRetry.
- Register a prometheus.Collector exposing `streams_active`,
  `stream_buffer_size_max`, `stream_buffer_events`,
  `stream_subscribers` from p.chatStreams.
- Add `coderd_chatd_stream_buffer_dropped_total` counter,
  incremented per publishToStream drop independently of the
  existing log-rate-limited bufferDropCount.
- Snapshot logger/model before the title-generation goroutine to
  avoid a data race with the logger/model rebind below it.

> 🤖
2026-04-17 16:16:30 +01:00
Cian Johnston a23a38c1f3 feat(scripts/develop): enable prometheus metrics by default (#24389)
- Adds `--prometheus-port` flag to develop.sh (default: 2114). Passes `--prometheus-enable=true` and `--prometheus-address=0.0.0.0:2114` to dev coderd.
- Show Prometheus URL in dev banner + group by service

> 🤖
2026-04-17 13:06:10 +01:00
Ethan 55e525fc28 ci: add InTx linter replacing ruleguard rule (#24422)
Replace the old `InTx` ruleguard rule in `scripts/rules.go` with a
custom in-tree `go/analysis` analyzer under `scripts/intxcheck/`. The
new analyzer catches the same direct and pass-through misuse classes as
before, plus two new classes the pattern-matcher couldn't reach:

- **Indirect same-package helper misuse** — flags `p.someHelper(ctx)`
inside `InTx` when the helper body uses the outer store (the PR #24369
bug class).
- **Nested dangerous closures** — descends into `go func() { ... }()`,
`defer func() { ... }()`, and immediately-invoked function literals.

The analyzer uses semantic `types.Object` identity instead of raw
expression string comparison, which avoids false positives from
closure-local shadowing and catches simple aliases like `outer := s.db`
and `alias := s`.

This PR also fixes three real outer-store-inside-transaction bugs the
new analyzer surfaced:

- `coderd/wsbuilder/wsbuilder.go`: `FindMatchingPresetID` and
`getWorkspaceTask` now use the inner transaction store instead of
`b.store`.
- `enterprise/dbcrypt/dbcrypt.go`: `ensureEncrypted` now calls
`s.InsertDBCryptKey` (the tx-wrapped store) instead of
`db.InsertDBCryptKey`. The `dbCrypt.InTx` method wraps the raw tx in a
new `*dbCrypt`, so `s.InsertDBCryptKey` still dispatches through the
encryption layer.

Two call sites need `// intxcheck:ignore` suppressions. Both are one-off
patterns that only look like misuse because the analyzer doesn't track
assignments — proving them safe would require full dataflow analysis,
which is well beyond what a targeted lint like this should attempt:

- `coderd/database/dbfake/dbfake.go` — `b.db` is reassigned to `tx` on
the preceding line, so `b.doInTX()` actually uses the transaction. The
analyzer sees the original `b.db` identity and flags it.
- `coderd/database/db_test.go` — test intentionally passes the outer
store to `require.Equal` to assert that nested `InTx` returns the same
handle.

Suppressions use `// intxcheck:ignore` instead of `//nolint:intxcheck`
because `intxcheck` runs as a standalone `go/analysis` tool outside
golangci-lint. golangci-lint's `nolintlint` checker flags `//nolint`
directives for linters it doesn't control, so we use a custom comment
prefix to avoid that conflict.
2026-04-17 00:07:30 +10:00
Kayla はな d23a6959fc chore: upgrade to ubuntu 26.04 (#24267) 2026-04-15 15:02:47 -06:00
Cian Johnston d7439a9de0 feat: add Prometheus metrics for chatd subsystem (#24371)
Adds 7 Prometheus metrics to the chatd subsystem and introduces typed
`ActivityBumpReason` for deadline bump attribution.

| Metric | Type | Labels |
|--------|------|--------|
| `coderd_chatd_chats` | Gauge | `state` (streaming, waiting) |
| `coderd_chatd_message_count` | Histogram | `provider` |
| `coderd_chatd_prompt_size_bytes` | Histogram | `provider` |
| `coderd_chatd_tool_result_size_bytes` | Histogram | `provider`,
`tool_name` |
| `coderd_chatd_ttft_seconds` | Histogram | `provider` |
| `coderd_chatd_compaction_total` | Counter | `provider`, `result` |
| `coderd_chatd_steps_total` | Counter | `provider` |

> 🤖
2026-04-15 19:53:10 +01:00
Danny Kopping 48b90f8cc8 feat: add coder_build_info metric (#24365)
_Disclaimer: produced by Claude Opus 4.6_

Adds a `coder_build_info` metric which allows operators to see which
versions of Coder are currently running.

---------

Signed-off-by: Danny Kopping <danny@coder.com>
2026-04-15 12:48:38 +00:00
J. Scott Miller 20b953a99d feat: add Prometheus metric for agent first connection duration (#24179)
## Summary

Add `coderd_agents_first_connection_seconds` histogram metric that
records the
duration from workspace agent creation to first connection. This fills
an
observability gap — provisioner job timings and startup script metrics
exist,
but the agent connection phase (which can take several minutes) was not
exposed
to Prometheus.

Closes https://github.com/coder/coder/issues/21282

## Changes

- **`coderd/prometheusmetrics/prometheusmetrics.go`** — Define and
register a
  `HistogramVec` in the existing `Agents()` polling loop. Observe
`first_connected_at - created_at` exactly once per agent via a
deduplication
  map, pruned each tick to prevent unbounded memory growth.
- **`coderd/prometheusmetrics/prometheusmetrics_test.go`** — Update
`TestAgents`
to set `first_connected_at` on the test agent and assert the histogram
is
  collected with correct labels, sample count, and sample sum.
- **`docs/admin/integrations/prometheus.md`**,
**`scripts/metricsdocgen/generated_metrics`** —
  Auto-generated documentation updates from `make gen`.

## Metric details

| Property | Value |
|---|---|
| Name | `coderd_agents_first_connection_seconds` |
| Type | histogram |
| Labels | `template_name`, `agent_name`, `username`, `workspace_name` |
| Buckets | 1s, 10s, 30s, 1m, 2m, 5m, 10m, 30m, 1h |

## Example PromQL

```promql
# P95 agent connection time by template
histogram_quantile(0.95,
  sum(rate(coderd_agents_first_connection_seconds_bucket[1h])) by (le, template_name)
)
```

<details>
<summary>Implementation notes</summary>

### Design decisions

- **Histogram over gauge**: Enables `histogram_quantile()` for
percentile queries.
- **Observe in `Agents()` polling loop**: All required data is already
fetched by
  `GetWorkspaceAgentsForMetrics()` — no new DB queries.
- **Dedup via `map[uuid.UUID]struct{}`**: Prevents re-observing the same
agent
  across polling ticks. Pruned each cycle to bound memory.
- **Buckets**: Aligned with
`coderd_provisionerd_workspace_build_timings_seconds`
  range (1s–1h).

### Overhead at scale (100k active workspaces)

The deduplication map (`observedFirstConnection`) and per-tick pruning
map
(`currentAgentIDs`) are both `map[[16]byte]struct{}`. At 100k agents:

- **Memory**: ~2.25 MB persistent + ~2.25 MB transient per tick = **~4.5
MB peak**.
- **CPU**: ~25 ms of map operations per tick (one tick per minute) =
**<0.05% of one core**.

Both are negligible relative to the existing cost of the `Agents()` loop
(the DB
query, per-agent `GetWorkspaceAppsByAgentID` calls, and coordinator node
lookups
dominate).

</details>

> 🤖 Generated by Coder Agents
2026-04-14 12:00:46 -05:00
Cian Johnston 497f637f58 chore: revert force deploying main (#23290) (#24072)
⚠️ DO NOT MERGE UNTIL @f0ssel SAYS SO ⚠️ 

This reverts commit 8f78c5145f
(https://github.com/coder/coder/pull/23290).
2026-04-08 11:19:14 -04:00
Ethan be686a8d0d fix(scripts/githooks): clear all repo-local Git env vars in hooks (#24138)
## Problem

In linked worktrees, Git hooks inherit multiple repo-local environment
variables: `GIT_DIR`, `GIT_COMMON_DIR`, `GIT_INDEX_FILE`, and others.
The
pre-commit and pre-push hooks only unset `GIT_DIR`, leaving the rest in
place.

When `make pre-commit` runs `go build`, Go tries to stamp VCS info by
shelling
out to `git`. With the leftover partial Git environment, `git` exits 128
and
the build fails:

```
error obtaining VCS status: exit status 128
    Use -buildvcs=false to disable VCS stamping.
```

This only happens inside hooks in a linked worktree — running `make
pre-commit`
directly from the terminal works fine because the repo-local vars are
not set.

## Fix

Replace the bare `unset GIT_DIR` in both hooks with a loop that clears
every
variable reported by `git rev-parse --local-env-vars`:

```sh
while IFS= read -r var; do
    unset "$var"
done < <(git rev-parse --local-env-vars)
```

This covers all 15 repo-local variables Git may inject (`GIT_DIR`,
`GIT_COMMON_DIR`, `GIT_INDEX_FILE`, `GIT_OBJECT_DIRECTORY`, etc.) and is
forward-compatible — if Git adds new local vars in the future, the loop
picks
them up automatically.
2026-04-09 01:06:12 +10:00
Garrett Delfosse ab77154975 ci: add cherry-pick to latest release workflow (#24051)
Adds a GitHub Actions workflow that cherry-picks merged PRs to the
latest release branch when the `cherry-pick` label is applied.

## How it works

1. Add the `cherry-pick` label to any PR targeting `main` (before or
after merge).
2. On merge (or on label if already merged), the workflow detects the
latest `release/*` branch.
3. It cherry-picks the merge commit (`-x -m1`) and opens a PR.

This complements the `backport` label (see #24025) which targets the
latest **3** release branches. `cherry-pick` targets only the **latest**
one — useful for getting fixes into the current release.

Created PRs follow existing repo conventions:
- **Branch:** `backport/<pr>-to-<version>`
- **Title:** `<original PR title> (#<pr>)` — e.g. `fix(site): correct
button alignment (#12345)`
- **Body:** links back to the original PR and merge commit

If the cherry-pick encounters conflicts, the workflow aborts the
cherry-pick, creates an empty commit with resolution instructions, and
opens the PR with a `[CONFLICT]` prefix so the author can resolve
manually.

Also:
- Removes `scripts/backport-pr.sh` (replaced by this workflow)
- Removes `.github/cherry-pick-bot.yml` (old bot config)
- Adds a section to the contributing docs explaining the `cherry-pick`
label

> [!NOTE]
> Generated with [Coder Agents](https://coder.com/agents)
2026-04-08 10:22:33 -04:00
Garrett Delfosse 48bc215f20 chore: tag RCs on main, cut release branch only for releases (#24001)
RC tags are now created directly on `main`. The `release/X.Y` branch is
only cut when the actual release is ready. This eliminates the need to
cherry-pick hundreds of commits from main onto the release branch
between the first RC and the release.

## Workflow

```
main:  ──●──●──●──●──●──●──●──●──●──
              ↑           ↑     ↑
           rc.0        rc.1    cut release/2.34, tag v2.34.0
                                     \
                               release/2.34:  ──●── v2.34.1 (patch)
```

1. **RC:** On `main`, run `./scripts/release.sh`. The tool detects main
(or a detached HEAD reachable from main), prompts for the commit SHA to
tag, suggests the next RC version, and tags it.
2. **Release:** When the RC is blessed, create `release/X.Y` from `main`
(or the specific RC commit). Switch to that branch and run
`./scripts/release.sh`, which suggests `vX.Y.0`.
3. **Patch:** Cherry-pick fixes onto `release/X.Y` and run
`./scripts/release.sh` from that branch.

## Changes

### `scripts/releaser/release.go`
- Two modes based on branch:
- **`main` (or detached HEAD from main)** — RC tagging. Prompts for the
commit SHA to tag (defaults to HEAD). Always checks out the target
commit so the flow operates in detached HEAD. Suggests the next RC based
on existing RC tags.
- **`release/X.Y`** — Release/patch mode. Suggests `vX.Y.0` if the
latest tag is an RC, or the next patch otherwise.
- Detached HEAD support: if `git branch --show-current` is empty, checks
whether HEAD is an ancestor of `origin/main` and enters RC mode
automatically.
- Commit selection prompt in RC mode: shows current commit, lets the
user confirm or provide a different SHA.
- Warns if you try to tag a non-RC on main, or an RC on a release
branch.
- Skips open-PR check and branch sync check in RC mode (not useful on
main).

### `scripts/releaser/main.go`
- Updated help text.

### `.github/workflows/release.yaml`
- RC tags (`*-rc.*`): skip the release-branch validation (they live on
main).
- Non-RC tags: still require the corresponding `release/X.Y` branch.

### `docs/about/contributing/CONTRIBUTING.md`
- Rewrote the Releases section with the new workflow, release types
table, and ASCII diagram.
- Replaced the old "Creating a release" / "Creating a release (via
workflow dispatch)" subsections.

<details><summary>Decision log</summary>

### Why this approach?

Previously, cutting a release branch early for an RC meant
cherry-picking all of main's progress onto that branch before the actual
release — often hundreds of commits. This approach avoids that entirely:
RCs are just tagged snapshots of main, and the release branch only
exists once you need it for stabilization and backports.

### Files NOT changed

- **`scripts/release/publish.sh`** — `--rc` flag controls GitHub
prerelease marking (tag-level, not branch-level). `target_commitish`
already defaults to `main` when the tag isn't on a release branch.
- **`scripts/release/tag_version.sh`** — No RC-specific branch logic.
- **`scripts/releaser/version.go`** — Version parsing/comparison
unchanged.
- **`docs/install/releases/index.md`** — Public-facing docs describe RC
as a release channel with no branch-level detail.

</details>

> Generated by Coder Agents
2026-04-07 15:21:22 -04:00
Garrett Delfosse 5453a6c6d6 fix(scripts/releaser): simplify branch regex and fix changelog range (#23947)
Two fixes for the release script:

**1. Branch regex cleanup** — Simplified to only match `release/X.Y`.
Removed
support for `release/X.Y.Z` and `release/X.Y-rc.N` branch formats. RCs
are
now tagged from main (not from release branches), and the three-segment
`release/X.Y.Z` format will not be used going forward.

**2. Changelog range for first release on a new minor** — When no tags
match
the branch's major.minor, the commit range fell back to `HEAD` (entire
git
history, ~13k lines of changelog). Now computes `git merge-base` with
the
previous minor's release branch (e.g. `origin/release/2.32`) as the
changelog
starting point. This works even when that branch has no tags pushed yet.
Falls
back to the latest reachable tag from a previous minor if the branch
doesn't
exist.
2026-04-07 17:07:21 +00:00
Kyle Carberry 500fc5e2a4 feat: polish model config form UI (#24047)
Polishes the AI model configuration form (add/edit model) with tighter
layout and better input affordances.

**Frontend changes:**
- Replace "Unset" with "Default" in select dropdowns to communicate
system fallback
- Show pricing fields inline instead of behind a collapsible toggle
- Use flat section dividers (`border-t`) instead of bordered fieldsets
- Move field descriptions into info-icon tooltips to fix input
misalignment
- Add InputGroup adornments: `$` prefix + `/1M` suffix on pricing,
`tokens` suffix on token fields, `%` suffix on compression threshold,
range placeholders on temperature/penalty fields
- Shorter pricing labels (Input, Output, Cache Read, Cache Write)
- Compact JSON textareas (1-row height, resizable)
- Smart grid layouts by field type (3-col provider, 4-col pricing, 3-col
advanced)
- Boolean fields render as a segmented control (Default · On · Off)
instead of a dropdown

**Backend changes:**
- Add `enum` tags to OpenAI `service_tier`
(`auto,default,flex,scale,priority`) and `reasoning_summary`
(`auto,concise,detailed`) so they render as select dropdowns instead of
free-text inputs

> 🤖 Generated by Coder Agents
2026-04-06 10:49:42 -04:00
Zach 796e8e4e18 feat: use -x option in backport script to improve tracability (#23984)
Add `-x` to backport script `git cherry-pick` command to include a
commit message reference to the original commit. This makes it easier to
trace where a cherry picked commit actually came from.
2026-04-02 10:23:31 -06:00
Danny Kopping fce05d0428 feat: add backport PR script (#23973)
_Disclaimer: created using Claude Opus 4.6._

```
# Examples:
#   ./scripts/backport-pr.sh 2.30 23969
#   ./scripts/backport-pr.sh --dry-run 2.30 23969
```

Here's one I created: https://github.com/coder/coder/pull/23972

Signed-off-by: Danny Kopping <danny@coder.com>
2026-04-02 15:19:06 +02:00
Garrett Delfosse be2e641162 feat: add release candidate (RC) support to release tooling (#23600)
This adds full RC release support to the release scripts and GitHub
Actions workflow. Previously, the tooling only supported stable and
mainline releases with strict vMAJOR.MINOR.PATCH semver tags.

Changes:
- scripts/releaser/version.go: Add Pre field to version struct for
prerelease suffixes (e.g. "rc.0"), update regex, parsing, String(),
comparison methods, and add IsRC()/rcNumber() helpers.
- scripts/releaser/release.go: Detect RC branches (release/X.Y-rc.N),
suggest RC version numbers, auto-set "rc" channel (skipping
stable/mainline prompt), add RC advisory to release notes, skip docs
update for RC releases.
- .github/workflows/release.yaml: Add "rc" channel option, fix branch
derivation for RC tags (v2.32.0-rc.0 -> release/2.32-rc.0 instead of
broken release/2.32.0-rc), skip homebrew/winget/package publishing for
RC releases.
- scripts/release/publish.sh: Add --rc flag, pass --prerelease to gh
release create for RC releases.
- scripts/releaser/version_test.go: Add comprehensive unit tests for
version parsing, string formatting, IsRC, rcNumber, GreaterThan, and
Equal with RC versions.

<!--

If you have used AI to produce some or all of this PR, please ensure you
have read our [AI Contribution
guidelines](https://coder.com/docs/about/contributing/AI_CONTRIBUTING)
before submitting.

-->
2026-04-01 16:00:49 -04:00
Michael Suchacz cf500b95b9 chore: move docker-chat-sandbox under templates/x (#23777)
Adds the experimental `docker-chat-sandbox` example template under
`examples/templates/x/`. It provisions a regular dev agent plus a
chat-designated agent that runs inside bubblewrap with a read-only root,
writable `/home/coder`, and outbound TCP restricted to the Coder
control-plane endpoint via `iptables`.

The chat agent still appears in dashboard and API responses, but the
template reserves it for chatd-managed sessions rather than normal user
interaction. `lint/examples` now walks nested template directories, so
experimental templates can live under `examples/templates/x/` without
treating `x/` itself as a template.
2026-03-30 15:17:55 +02:00
Jake Howell 8f73e46c2f feat: automatically generate beta features (#23549)
Closes #15129

Adds a generated **Beta features** table on the feature stages doc,
using the same mainline/stable sparse-checkout approach as the
early-access experiments list.

- Walk `docs/manifest.json` for routes with `state: ["beta"]` and render
a table (title, description, mainline vs stable).
- Inject output between `<!-- BEGIN: available-beta-features -->` /
`END` in `docs/install/releases/feature-stages.md`.
- Rename `scripts/release/docs_update_experiments.sh` →
`docs_update_feature_stages.sh`, refresh the script header, and use
`build/docs/feature-stages` for clone output.

<img width="1624" height="1061" alt="image"
src="https://github.com/user-attachments/assets/5fa811dd-9b80-446b-ae65-ec6e6cfedd6a"
/>
2026-03-30 21:14:52 +11:00
Cian Johnston f164463c6a fix(scripts/metricsdocgen): shush the prometheus scanner in CI (#23642)
- Suppress informational `log.Printf` messages from the metrics scanner
when stdout is not a TTY (i.e. piped via `atomic_write` in `make gen` or
CI)
- Genuine warnings (`warnf`) still print unconditionally so real
problems remain visible
- `log.Fatalf` for fatal errors is unchanged

> 🤖 Created by Coder Agents and reviewed by a human
2026-03-26 12:58:02 +00:00
Jakub Domeracki 6bc6e2baa6 fix: explicitly trust our own GPG key (#23556)
GPG emits an "untrusted key" warning when signing with a key that hasn't
been assigned a trust level, which can cause verification steps to fail
or produce noisy output.

Example:
```sh
gpg: Signature made Tue Mar 24 20:56:59 2026 UTC
gpg:                using RSA key 21C96B1CB950718874F64DBD6A5A671B5E40A3B9
gpg: Good signature from "Coder Release Signing Key <security@coder.com>" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 21C9 6B1C B950 7188 74F6  4DBD 6A5A 671B 5E40 A3B9
```

After importing the release key, derive its fingerprint from the keyring
and mark it as ultimately trusted via `--import-ownertrust`.
The fingerprint is extracted dynamically rather than hard-coded, so this
works for any key supplied via `CODER_GPG_RELEASE_KEY_BASE64`.
2026-03-25 10:24:31 +01:00