Commit Graph

308 Commits

Author SHA1 Message Date
Thomas Kosiewski fe257666d7 ci: refactor CI to use mise for shared tool setup (#25727) 2026-06-01 15:55:19 +02:00
Ethan ca7f07142e ci: add Go test flake detector workflow (#25667)
Adds a `flake-go` workflow that hunts for ordering-dependent and racy Go
tests on pull requests. The workflow runs only on PRs (cancelling
earlier runs on new commits) and skips test execution when no Go test
files changed.

A single `flake_go` job uses
[coder/whichtests](https://github.com/coder/whichtests) with
`--coalesce` to compute the directly-modified `Test*` functions from the
PR diff and emit them as one target row. The same job then runs those
selected tests on a deliberately resource-constrained 4-vCPU runner with
4x parallelism oversubscription, `-count=25`, and `-shuffle=on` to
amplify contention and surface flakes.

Pinned at
[coder/whichtests@ec33bab](https://github.com/coder/whichtests/commit/ec33bab1ec04cd86beb7a61a069db4463dba63f5).

Reuses the `test-go-pg` composite (with its new `run-regex`,
`test-shuffle`, and `gotestsum-json-file` inputs) and the
`go-test-failure-report` composite, both introduced on the base branch
(#25670), so this workflow shares one implementation of the gotestsum +
failure-report path with the existing CI jobs.

`Makefile` adds `TEST_SHUFFLE` support and single-quotes `RUN` so
whichtests' regex survives shell parsing.

Stacked on top of #25670.

Demo @
https://github.com/coder/coder/actions/runs/26494322649/job/78018779381?pr=25667

Closes CODAGT-381
2026-05-28 12:35:37 +10:00
Thomas Kosiewski 51836e681e refactor: build dogfood image as base + mise oci layers (#25448)
Splits the dogfood image into two artifacts:

- `ghcr.io/coder/oss-dogfood-base:<distro>-<base-sha>`: Ubuntu base with
apt packages, chrome, rustup, brew, gh, and the mise binary. The
base-sha is a cache key over `Dockerfile.base` and `files/`, so commits
that don't touch those inputs reuse the previous build.
- `codercom/oss-dogfood:<final-sha>-<distro>` and rolling tags
(`:22.04`, `:26.04`, `:latest`, `:<branch>`): produced by `mise oci
build` on top of the base, with one content-addressed OCI layer per mise
tool. The rolling tag scheme is unchanged, so the workspace template
doesn't need updating.

Single-tool version bumps now invalidate only that tool's OCI layer, so
workspaces re-pull just what changed instead of the entire 5-6 GB image
on every recreate.

Also:

- Drops the build-time `pnpm dlx playwright@1.47.0 install --with-deps
chromium` step (~400 MB) and the equivalent `playwright-driver.browsers`
install from `flake.nix`. `@playwright/mcp` (used by the claude-code and
codex MCP servers in `dogfood/coder/main.tf`) does NOT auto-install
browsers, so the existing `install-deps` `coder_script` now runs two
installs on workspace start: `pnpm exec playwright install chromium` for
the site's pinned `@playwright/test`, and `npx
--package=@playwright/mcp@latest playwright-core install --no-shell
chromium` so the MCP servers find their matching browser revision.
Browser revisions coexist under
`~/.cache/ms-playwright/chromium-<rev>/`, which lives on the home volume
so both downloads happen once per workspace recreate and persist across
restarts. Net effect: same MCP behavior as before, +~1-2 min on first
workspace start. Nix devshell users running site e2e tests locally now
need `pnpm exec playwright install` once (instead of getting browsers
via nixpkgs).
- Bumps the pinned mise binary to v2026.5.12 (matching main after
#25521) and adds top-level `min_version = "2026.5.12"` to `mise.toml` so
every consumer (devs, CI, the embedded mise inside the dogfood image,
mise oci builds) fails fast on an older mise.
- Adds bison, flex, libicu-dev, libreadline-dev, uuid-dev, and
zlib1g-dev to both Ubuntu base images for source-build use cases (e.g.,
building Postgres from source).
- Replaces skopeo with crane as the registry client `mise oci push`
shells out to: crane is added to `mise.toml`, the workflow drops its
`apt-get install skopeo` and forces `--tool crane`, and the local
wrapper image stops bundling skopeo. One source of truth for tool
versions, no apt drift, smaller wrapper image, and workspace users get a
registry client on PATH for free via mise oci's tool layers.
- Removes `nix.hash`/`mise.hash` and their Makefile rules. The registry
digest already captures every effective change since CI rebuilds when
any baked-in input moves; the per-file `filesha1()` entries in
`pull_triggers` are redundant.

Supersedes #25400 (the `mise.hash` pull trigger landed there in
`2b612abe7b`; this PR removes it as part of the broader simplification).

> [!NOTE]
> `mise oci build` is experimental and requires `MISE_EXPERIMENTAL=1`
(set at job level in the workflow). The local-only
`scripts/dogfood/mise-oci-wrapper.sh` builds a tiny
`coderdev/mise-oci-wrapper:<version>` Debian image with curl-installed
mise on first invocation (cached by version tag thereafter); we don't
reuse `jdxcode/mise:latest` because that tag lags upstream GitHub
releases by days and would defeat the `min_version` enforcement above.

> [!NOTE]
> `compute-base-sha.sh` and `compute-final-sha.sh` are cache keys, not
strict content addresses: the base Dockerfile still pulls dynamic
resources at build time (gh/buildx `releases/latest`, chrome
`stable_current_amd64.deb`, apt mirror state). Two runs with identical
checked-in files can produce slightly different bytes, which is
acceptable here because the cache-hit savings on irrelevant commits
outweigh that drift.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Signed-off-by: Thomas Kosiewski <tk@coder.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 14:52:21 +02:00
Cian Johnston 0a45f96d30 ci: validate dogfood image tooling by running gen, fmt, lint, build (#25475)
Adds a `test_image` job that runs `make gen`, `make fmt`, `make lint`, and `make build` inside the
newly built image via `docker run`. This helps detect breaking changes before merge. 

> [!NOTE]
> Generated with [Coder Agents](https://coder.com/agents)
2026-05-25 17:02:13 +01:00
Danny Kopping ddec110b0e refactor: move aibridged out of enterprise to AGPL (#25570)
To allow Coder Agents to use AI Gateway in OSS, this moves the `aibridged`-related code out of enterprise and into the AGPL path.

The HTTP API is still registered only under enterprise, so using it still requires the AI Governance Add-on. Coder Agents instead reaches the same handlers through an in-memory pipe.
2026-05-22 09:11:37 +02:00
Spike Curtis 8dc4d76890 chore: add agent-connection-watch for workspaces (#24507)

Relates to GRU-18.

Adds a basic implementation of Workspace Agent Connection Watch, with tests.

Log handling is not yet implemented.
2026-05-20 13:09:11 -04:00
Thomas Kosiewski 2b612abe7b feat: trigger image pull on mise.toml or mise.lock changes (#25400)
The dogfood Dockerfiles consume the repo-root `mise.toml` and
`mise.lock` at build time (see `.dockerignore` allowlist), but the
template's `pull_triggers` list ignored them, so mise-only changes (tool
bumps, new tools) didn't roll out to existing workspaces.

Mirror the `nix.hash` pattern: a Makefile rule writes the sha256 of both
files into `dogfood/coder/mise.hash`, and `main.tf` hashes that
in-module file via `filesha1`. Run `make dogfood/coder/mise.hash` after
editing `mise.toml`/`mise.lock`.

Signed-off-by: Thomas Kosiewski <tk@coder.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 16:27:39 +00:00
Thomas Kosiewski 5f9b3220b5 chore: install dogfood image tooling via mise.toml (#25282)
This PR replaces the hand-rolled `curl | tar | go install | cargo
install` chains in the dogfood Ubuntu 22.04 and 26.04 Dockerfiles with a
single `mise install` driven by a new repo-root `mise.toml`.

The previous Dockerfiles installed ~25 CLIs across three multi-stage
builds with versions hardcoded inline. Version bumps were scattered
across the Dockerfiles, the root `mise.toml` (added in #24618 but
otherwise unused at runtime), and CI's setup actions; build-time network
failures came from a dozen distinct endpoints; and `mise` itself sat in
the image with no manifest to install from.

The new flow:

- The repo's `mise.toml` is the single source of truth for image tool
versions. The Dockerfiles `COPY` it to `/etc/mise/config.toml` and run a
single `mise install` as the `coder` user.
- Tools are installed into `/opt/mise/data` rather than the default
`/home/coder/.local/share/mise`, so they live in the image (not on the
persistent home volume) and reach every workspace on recreate.
- Build context moves to the repo root so the Dockerfile can `COPY
mise.toml`; an allowlist `.dockerignore` keeps the transferred context
to ~24 kB.
- Optional `--secret id=github_token` plumbing through the Makefile and
`.github/workflows/dogfood.yaml` lifts aqua's GitHub API quota from
60/hr unauthenticated to 1000/hr with `secrets.GITHUB_TOKEN`.
- `MISE_TRUSTED_CONFIG_PATHS=/home/coder:/etc/mise` is set as an ENV so
users who clone the coder repo into their workspace home aren't prompted
to `mise trust`.

Net diff for the two Ubuntu Dockerfiles: -399 / +244 lines (~200 lines
shorter each). The `FROM rust-utils`, `FROM go`, and `FROM proto`
multi-stage builds are gone; so are the NVM/Node block, the bulk
binary-install block (golangci-lint, helm, kubectx, syft, cosign, bun),
the gh `.deb`/lazygit/doctl tarball installs, the gofmt
`update-alternatives` line, and the `yq`→`yq4` rename
(`scripts/lib.sh:267-275` already auto-detects either name).

Both images were built and smoke-tested with Apple's `container` CLI on
macOS — every migrated tool resolves to the expected pinned version
including outside the cloned coder repo (e.g. `gh` from `/home/coder`,
matching the workspace startup script in `dogfood/coder/main.tf`),
`sqlc` runs (proving `CGO_ENABLED=1` was honoured at install), `yq
--version` reports v4 for `scripts/lib.sh`'s detection, and `gofmt`
resolves via the mise shim.

Follow-ups (out of scope here):

- Commit a multi-platform `mise.lock` so `gh = "latest"` and the other
floating versions resolve deterministically across rebuilds and dev
machines.
- Migrate CI's `setup-go` / `setup-node` actions to consume `mise.toml`
so image and CI versions stop being able to drift.

---------

Signed-off-by: Thomas Kosiewski <tk@coder.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 11:36:22 +02:00
Michael Suchacz 85792d08bc feat: add harness engineering layer for agent workflows (#24791)
This PR adds an opinionated harness-engineering layer for agent-driven
workflows: a small set of agent-readable docs, mechanical structure
checks, structured CI failure summaries, an architecture-lint umbrella,
and per-worktree dev-server isolation. The goal is to make local dev,
tests, and CI mechanically inspectable by agents without changing app
runtime behavior.

## What landed

**Agent docs and navigation**
- `.claude/docs/OBSERVABILITY.md`, `.claude/docs/DEV_ISOLATION.md`,
`.claude/docs/AGENT_FAILURES.md`: task-oriented guides for logs,
tracing, Prometheus, dev-server isolation, and a seeded failure catalog.
- `AGENTS.md`: added an `Agent navigation` block, then trimmed the file
from 375 to 229 lines by migrating duplicated detail into
`WORKFLOWS.md`, `GO.md`, `TESTING.md`, and `DATABASE.md`. The
user-managed custom-instructions block is preserved.
- `.agents/docs`: symlink mirror of `.claude/docs` for agent runtimes
that look under `.agents`.

**Mechanical checks**
- `scripts/check_agents_structure.sh`: validates `@...` references in
tracked `AGENTS.md` files and warns when root grows past 600 lines.
Wired as `make lint/agents` and into `make lint`.
- `scripts/audit-agent-readiness.sh`: report-first audit of harness
readiness. Currently `10 ok, 0 warn, 0 fail`.
- `scripts/check_architecture.sh` / `make lint/architecture`: umbrella
architecture-lint target. Consolidates the existing
`check_enterprise_imports.sh` and `check_codersdk_imports.sh` so they
run exactly once via the umbrella. Slot is open for new high-confidence
rules.

**Structured CI failure summaries**
- `scripts/playwright-failure-summary.sh`: parses
`site/test-results/results.json` and writes Markdown to
`$GITHUB_STEP_SUMMARY` on failure. Wired into the `test-e2e` matrix job.
- `scripts/go-test-failure-summary.sh`: parses `go test -json`
line-delimited output the same way. Wired into `test-go-pg`,
`test-go-pg-17`, and `test-go-race-pg` by injecting `gotestsum
--jsonfile` in the workflow without touching `Makefile`. JSON also
uploaded as a CI artifact on failure.
- `site/e2e/playwright.config.ts`: enables `screenshot:
only-on-failure`, `trace: retain-on-failure`, JSON reporter, and HTML
reporter alongside existing reporters.
- `.github/workflows/ci.yaml`: failure artifact uploads for Playwright
now use `if: failure()` and predictable names
(`playwright-artifacts-<variant>-<sha>`).

**Per-worktree dev-server isolation** (`scripts/develop/main.go`)
- Deterministic FNV-64a hash of the worktree path produces a port offset
in `[0, 1000)` (50 buckets, step 20 to avoid API/proxy overlap across
adjacent buckets).
- Offset is applied only to defaults; both env vars (`CODER_DEV_PORT`,
`CODER_DEV_WEB_PORT`, `CODER_DEV_PROXY_PORT`,
`CODER_DEV_PROMETHEUS_PORT`) and CLI flags retain priority.
- Hardcoded ports `9090` (embedded Prometheus UI) and `12345` (Delve)
are unchanged by design.
- Startup banner shows each port's source: `default`, `offset`, or
`explicit`.
- Unit tests in `scripts/develop/main_test.go` cover determinism,
bounds, no-overlap across the four ports, and explicit-skip behavior.
- State (`.coderv2/`) was already worktree-isolated via `os.Getwd()`, so
no state-dir changes were needed.

## Validation

`make lint/agents`, `make lint/architecture`, `make lint/emdash`, `bash
scripts/audit-agent-readiness.sh` (10 ok, 0 warn, 0 fail), `shellcheck`
on all 5 new scripts, `go test ./scripts/develop/...`, and `js-yaml`
parse of `ci.yaml` all pass. Synthetic fixtures verify both
failure-summary scripts handle empty/missing input (silent exit 0),
ANSI-stripped output, and parent/subtest formatting.

## Known follow-ups (deferred)

- Frontend Storybook/Vitest failure summary: lowest-leverage slice of
the failure-summary work. Skipping until observed pain.
- Architecture lint currently only delegates to existing import checks;
new rules (`InTx` outer-store detection, swagger-annotation lint) plug
in as needed.
- 50 port-offset buckets means two worktree paths can occasionally
collide. The DEV_ISOLATION doc tells users to set the relevant env var
when this happens.

> Mux opened this PR on Mike's behalf.
2026-05-11 17:27:29 +02:00
Ethan 063c06ca5f test: prevent expired contexts in chatd parallel subtests (#25107)
Parallel subtests in `coderd/x/chatd` reused a parent test context with
a `testutil.WaitLong` deadline, so the context could expire before a
subtest was scheduled under load. That made the subagent lifecycle tools
return plain-text context errors instead of the expected JSON payload,
causing flaky JSON unmarshal failures.

Create fresh `chatdTestContext` values inside the affected parallel
subtests and add `chatdTestContext` to the `paralleltestctx` custom
function list so this pattern is caught by `make lint`.

Closes https://github.com/coder/internal/issues/1494
2026-05-11 17:48:27 +10:00
Yevhenii Shcherbina 4124d1137d feat: add ai_model_prices table (#24932)
# Summary

Implements
https://linear.app/codercom/issue/AIGOV-282/add-ai-model-price-table-and-seed-generator

This PR lays the groundwork for AI Bridge cost controls (per the AI
Governance RFC). It adds the foundation needed for future cost tracking:
a place to store per-model token prices, a way to keep those prices in
sync with upstream pricing data, and a startup mechanism that ensures
every deployment has prices loaded before AI Bridge starts processing
requests.

The price data comes from [models.dev](https://models.dev/), a
community-maintained catalogue of AI provider pricing. A generator
script fetches the latest prices, filters to Anthropic and OpenAI for
now, and produces a seed file checked into the repository.

On every server startup the seed is applied to the database, so new
releases automatically pick up any price corrections that landed since
the previous one. Existing rows are overwritten with the latest prices;
rows for models no longer in the seed are left untouched.

# Batching the AI model price seed: three approaches

Context: at server startup we seed the `ai_model_prices` table from an
embedded JSON price book (~70 rows today; this will grow as we add
providers, potentially to 4000+).

Each row is:

```text
(provider, model, input_price, output_price, cache_read_price, cache_write_price)
```

Any of the four price columns can be:

- `NULL` → “price unknown for this dimension”
- explicit `0` → “free”

The batch must be an UPSERT so re-running is idempotent and existing
rows pick up new prices.

We considered three implementations.

---

## Approach 1 — Per-row UPSERT in a Go loop

```go
for _, row := range rows {
    if err := db.UpsertAIModelPrice(ctx, database.UpsertAIModelPriceParams{
        Provider:   row.Provider,
        Model:      row.Model,
        InputPrice: nullInt64(row.InputPrice),
        // ...
    }); err != nil {
        return err
    }
}
```

### Pros

- Trivial.
- NULL handling falls out naturally from `sql.NullInt64`.

### Cons

- `N` round-trips per seed.
- With ~70 rows that means ~70 statement executions on every startup,
even inside a transaction.
- Doesn't scale gracefully as the price book grows (potentially to 4000+ rows).

---

## Approach 2 — `UNNEST` with parallel arrays

Pass each column as a separate Go slice. Postgres unnests them in
parallel into a virtual table, then `INSERT ... SELECT`.

```sql
INSERT INTO ai_model_prices (
    provider,
    model,
    input_price,
    output_price,
    cache_read_price,
    cache_write_price
)
SELECT
    UNNEST(@providers::text[]),
    UNNEST(@models::text[]),
    NULLIF(UNNEST(@input_prices::bigint[]), -1),
    NULLIF(UNNEST(@output_prices::bigint[]), -1),
    NULLIF(UNNEST(@cache_read_prices::bigint[]), -1),
    NULLIF(UNNEST(@cache_write_prices::bigint[]), -1)
ON CONFLICT (provider, model) DO UPDATE SET
    input_price       = EXCLUDED.input_price,
    output_price      = EXCLUDED.output_price,
    cache_read_price  = EXCLUDED.cache_read_price,
    cache_write_price = EXCLUDED.cache_write_price,
    updated_at        = NOW();
```

Go side: flatten rows into six parallel slices.

Use a sentinel (`-1`) for “missing”, since `lib/pq` can't encode `NULL`
into a `bigint[]` element.

```go
providers := make([]string, len(rows))
models := make([]string, len(rows))
inputs := make([]int64, len(rows))
outputs := make([]int64, len(rows))
cacheR := make([]int64, len(rows))
cacheW := make([]int64, len(rows))

for i, r := range rows {
    providers[i] = r.Provider
    models[i] = r.Model

    inputs[i] = -1
    if r.InputPrice != nil {
        inputs[i] = *r.InputPrice
    }

    outputs[i] = -1
    if r.OutputPrice != nil {
        outputs[i] = *r.OutputPrice
    }

    cacheR[i] = -1
    if r.CacheReadPrice != nil {
        cacheR[i] = *r.CacheReadPrice
    }

    cacheW[i] = -1
    if r.CacheWritePrice != nil {
        cacheW[i] = *r.CacheWritePrice
    }
}

return db.UpsertAIModelPrices(ctx, database.UpsertAIModelPricesParams{
    Providers:        providers,
    Models:           models,
    InputPrices:      inputs,
    OutputPrices:     outputs,
    CacheReadPrices:  cacheR,
    CacheWritePrices: cacheW,
})
```

### Pros

- Single round-trip.

### Cons

- The generated `sqlc` params become plain `[]int64`, which can't
represent `NULL`.

---

## Approach 3 — `jsonb_array_elements` over a single `@seed::jsonb` (chosen)

Pass the raw seed JSON as one parameter; let Postgres expand and parse
it.

```sql
INSERT INTO ai_model_prices (
    provider,
    model,
    input_price,
    output_price,
    cache_read_price,
    cache_write_price
)
SELECT
    elem->>'provider',
    elem->>'model',
    (elem->>'input_price')::bigint,
    (elem->>'output_price')::bigint,
    (elem->>'cache_read_price')::bigint,
    (elem->>'cache_write_price')::bigint
FROM jsonb_array_elements(@seed::jsonb) AS elem
ON CONFLICT (provider, model) DO UPDATE SET
    input_price       = EXCLUDED.input_price,
    output_price      = EXCLUDED.output_price,
    cache_read_price  = EXCLUDED.cache_read_price,
    cache_write_price = EXCLUDED.cache_write_price,
    updated_at        = NOW();
```

Go side reduces to:

```go
return db.UpsertAIModelPrices(ctx, seedJSON)
```

### Pros

- Single round-trip.
- NULLs fall out naturally:
  - `(elem->>'cache_write_price')::bigint` becomes `NULL`
  - no sentinels
- The seed is already JSON, so the Go side passes it through without flattening.
- Existing precedent:
  - `jsonb_array_elements` is already used elsewhere in the codebase

### Cons

- Less type-safe at the SQL boundary than `UNNEST`
- Slightly less standard than `UNNEST`
- Readers need familiarity with:
  - `jsonb_array_elements`
  - `->>` extraction syntax
- Postgres pays JSON parse cost
  - negligible at our scale

---

# Decision

We picked Approach 3.

It collapses the round-trips like `UNNEST` does, but without:

- nullable-array workarounds
- sentinel values
2026-05-08 16:45:14 -04:00
Ethan dc14ab6b97 fix(Makefile): rebuild helper binaries when inputs change (#24954)
## Summary

This fixes the stale helper-binary class of generator bugs in the
Makefile by adding the repo packages and embedded files that are
compiled into each affected `_gen/bin/*` helper as real prerequisites of
the helper binary target.

The concrete issue that prompted this was an audit docs regeneration
after a rebase. `docs/admin/security/audit-logs.md` depends on
`enterprise/audit/table.go`, so the docs target reran, but
`_gen/bin/auditdocgen` was only an order-only prerequisite and its own
rule only depended on `scripts/auditdocgen/*.go`. Because the stale
local `auditdocgen` binary had been compiled before `UserSecret` was
added to `enterprise/audit/table.go`, it regenerated the audit docs
without the `UserSecret` row even though the source table still
contained it.

This is the same failure mode I recently fixed for `_gen/bin/clidocgen`
in #24302 and `_gen/bin/modeloptionsgen` in #24543. Those fixes made the
binaries depend on the package sources and embedded template files whose
compile-time data they read at runtime, rather than relying on output
targets to mention those files. This PR applies that pattern to the
other high-value helper binaries with the same risk.

## Changes

- Rebuild `_gen/bin/auditdocgen` when `enterprise/audit/*.go` changes,
so audit docs are generated from the current `AuditableResources` and
`AuditActionMap` data.
- Rebuild `_gen/bin/apitypings` when `codersdk/*.go` changes, and make
`typesGenerated.ts` rerun when the health packages it emits change.
- Rebuild `_gen/bin/check-scopes` and `_gen/bin/apikeyscopesgen` when
RBAC or policy sources change.
- Rebuild `_gen/bin/dbdump` when migration Go or SQL files change, since
the migrations package embeds SQL into the binary.
- Rebuild `_gen/bin/typegen` when its Go sources, embedded templates,
RBAC/policy inputs, string helper, or country data change. Generated
RBAC files are deliberately excluded from the typegen binary input set
to avoid cycles with typegen outputs.

## Why this covers the class

Most generated output targets keep helper binaries as order-only
prerequisites. That is fine for avoiding unnecessary output churn, but
it means the helper binary target must be the cache boundary and must
list everything baked into the compiled binary. The affected helpers
import repo packages that expose maps, constants, struct tags, embedded
templates, or embedded SQL. Without those files on the binary rule, Make
can rerun an output target with an old executable and write semantically
stale generated content.

The fix keeps the existing order-only output structure and instead makes
each binary rule track its compile-time inputs directly. That matches
the previous clidocgen and modeloptionsgen fixes while avoiding a broad
`$(GO_SRC_FILES)` dependency for helpers that only need a small set of
packages.


> Written by Mux, reviewed by a human
2026-05-06 11:57:36 +10:00
Nick Vigilante a7377f7613 fix(Makefile): map arm64 to aarch64 for typos binary download (#24986)
macOS on Apple Silicon reports `arm64` via `uname -m`, but typos' GitHub
release assets use `aarch64` in their filenames. The mismatch produces a
404, so the `build/typos-$(VERSION)` target fails silently and Apple Silicon users
fall back to whatever typos binary their environment provides, such as
the one from nix. That binary may be a different version than the one
pinned in CI, creating a skew where local lint/typos rejects strings
that CI accepts.

2026-05-05 20:41:50 +00:00
Mathias Fredriksson ce125831d3 fix(Makefile): run storybook tests after Go tests in pre-push (#24703)
Rolldown's tokio workers stall when competing with Go compilation
and the production site build for CPU, causing Vite transform
requests to hang. Vitest browser mode has no import-phase timeout,
so a stalled browser import() blocks the run indefinitely.
2026-04-24 13:40:37 +03:00
Ethan ef2b3a7263 fix: rebuild modeloptionsgen when codersdk changes (#24543)
`_gen/bin/modeloptionsgen` reflects over `codersdk` struct tags, but
reflection doesn't read source — Go struct tags are compile-time string
literals folded into the type descriptor and emitted into the binary's
`.rodata` section. `reflect.TypeOf(...).Field(i).Tag.Get("enum")` reads
from that baked-in table; the generator cannot consult
`codersdk/chats.go` on disk even if it wanted to.
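A small program makes this concrete. The tag string is part of the compiled type, so the only way to change what `Tag.Get` returns is to recompile. The `ChatOptions` type and its `enum` values below are hypothetical, not the real `codersdk/chats.go` definitions.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// ChatOptions is a hypothetical struct. The enum tag is a string literal
// that the compiler stores in the type's metadata inside the binary.
type ChatOptions struct {
	Effort string `json:"effort" enum:"low,medium,high"`
}

// enumValues reads the tag via reflection. It never opens a source file,
// so editing the tag on disk has no effect until the binary is rebuilt.
func enumValues(v any, field string) []string {
	f, ok := reflect.TypeOf(v).FieldByName(field)
	if !ok {
		return nil
	}
	tag := f.Tag.Get("enum")
	if tag == "" {
		return nil
	}
	return strings.Split(tag, ",")
}

func main() {
	fmt.Println(enumValues(ChatOptions{}, "Effort")) // [low medium high]
}
```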

That means the binary has to be rebuilt whenever those tags change. The
existing Makefile rule only depends on `scripts/modeloptionsgen/*.go`,
and the JSON target lists the binary as an order-only prereq, so `make
gen` happily runs the stale binary after edits to `codersdk/chats.go`
and writes outdated enum values.

Fix: add `$(wildcard codersdk/*.go)` to the binary's prereqs, matching
`clidocgen`. The whole-package wildcard is deliberate — a narrower
prereq would break if someone splits `chats.go`. Cost is negligible:
Go's per-package build cache means unrelated edits recompile one package
and re-link, and the JSON target's own prereqs are unchanged so it
doesn't regenerate.
2026-04-22 00:11:09 +10:00
Mathias Fredriksson 623e72d72d chore: add no-emdash/endash rule to agent instructions and CI lint (#24375)
Add a lint check that prevents introduction of Unicode emdash (U+2014)
and endash (U+2013) characters. These are almost exclusively introduced
by AI agents and conflict with the project writing style.

The lint script (scripts/check_emdash.sh) checks only added lines in
the current diff by default, so existing violations do not block CI.
Pass --all to scan the entire repo for auditing.

Agent instructions in AGENTS.md, site/AGENTS.md, and the docs style
guide now explicitly ban emdash, endash, and " -- " as punctuation,
with guidance to use commas, semicolons, or periods instead.
2026-04-21 13:55:24 +03:00
Ethan 55e525fc28 ci: add InTx linter replacing ruleguard rule (#24422)
Replace the old `InTx` ruleguard rule in `scripts/rules.go` with a
custom in-tree `go/analysis` analyzer under `scripts/intxcheck/`. The
new analyzer catches the same direct and pass-through misuse classes as
before, plus two new classes the pattern-matcher couldn't reach:

- **Indirect same-package helper misuse** — flags `p.someHelper(ctx)`
inside `InTx` when the helper body uses the outer store (the PR #24369
bug class).
- **Nested dangerous closures** — descends into `go func() { ... }()`,
`defer func() { ... }()`, and immediately-invoked function literals.

The analyzer uses semantic `types.Object` identity instead of raw
expression string comparison, which avoids false positives from
closure-local shadowing and catches simple aliases like `outer := s.db`
and `alias := s`.
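A stripped-down illustration of the bug class follows, using a hypothetical `Store` interface rather than Coder's real `database.Store`. Both closures compile, but only the second keeps the write inside the transaction. The alias in `misuse` is the kind of case that identity-based matching can catch and plain expression-string comparison would miss.

```go
package main

import "fmt"

// Store is a hypothetical, heavily simplified stand-in for a database store.
type Store interface {
	Insert(key string) error
	InTx(fn func(tx Store) error) error
}

// memStore records which handle each write went through.
type memStore struct {
	name   string
	writes *[]string
}

func (m *memStore) Insert(key string) error {
	*m.writes = append(*m.writes, m.name+":"+key)
	return nil
}

// InTx passes a distinct transaction-scoped handle to fn.
func (m *memStore) InTx(fn func(tx Store) error) error {
	return fn(&memStore{name: "tx", writes: m.writes})
}

// misuse ignores tx and writes through an alias of the outer store, so the
// write escapes the transaction: the shape intxcheck is meant to flag.
func misuse(s Store) error {
	outer := s
	return s.InTx(func(tx Store) error {
		return outer.Insert("a")
	})
}

// correct routes the write through the transaction handle.
func correct(s Store) error {
	return s.InTx(func(tx Store) error {
		return tx.Insert("a")
	})
}

func main() {
	var writes []string
	s := &memStore{name: "outer", writes: &writes}
	_ = misuse(s)
	_ = correct(s)
	fmt.Println(writes) // [outer:a tx:a]
}
```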

This PR also fixes three real outer-store-inside-transaction bugs the
new analyzer surfaced:

- `coderd/wsbuilder/wsbuilder.go`: `FindMatchingPresetID` and
`getWorkspaceTask` now use the inner transaction store instead of
`b.store`.
- `enterprise/dbcrypt/dbcrypt.go`: `ensureEncrypted` now calls
`s.InsertDBCryptKey` (the tx-wrapped store) instead of
`db.InsertDBCryptKey`. The `dbCrypt.InTx` method wraps the raw tx in a
new `*dbCrypt`, so `s.InsertDBCryptKey` still dispatches through the
encryption layer.

Two call sites need `// intxcheck:ignore` suppressions. Both are one-off
patterns that only look like misuse because the analyzer doesn't track
assignments — proving them safe would require full dataflow analysis,
which is well beyond what a targeted lint like this should attempt:

- `coderd/database/dbfake/dbfake.go` — `b.db` is reassigned to `tx` on
the preceding line, so `b.doInTX()` actually uses the transaction. The
analyzer sees the original `b.db` identity and flags it.
- `coderd/database/db_test.go` — test intentionally passes the outer
store to `require.Equal` to assert that nested `InTx` returns the same
handle.

Suppressions use `// intxcheck:ignore` instead of `//nolint:intxcheck`
because `intxcheck` runs as a standalone `go/analysis` tool outside
golangci-lint. golangci-lint's `nolintlint` checker flags `//nolint`
directives for linters it doesn't control, so we use a custom comment
prefix to avoid that conflict.
2026-04-17 00:07:30 +10:00
Kayla はな d23a6959fc chore: upgrade to ubuntu 26.04 (#24267) 2026-04-15 15:02:47 -06:00
Ethan 0080bcbf33 fix(Makefile): rebuild clidocgen when Go sources or template change (#24302)
The `_gen/bin/clidocgen` binary only declared `scripts/clidocgen/*.go`
as prerequisites. Since it reflects over the full CLI tree (227
transitive internal packages via `enterprise/cli` → `cli/` → `codersdk/`
→ …), any change to CLI flags, SDK structs, or command definitions could
alter its output — but Make would keep serving the stale binary until it
was manually deleted (or `-B` was passed).

This caused a recurring developer-facing bug: after merging main (or
rebasing onto new CLI/SDK changes), the pre-commit hook would use the
stale binary, commit wrong docs, `make gen` would see no diff (same
stale binary), and CI would fail because it builds fresh.

Add `$(GO_SRC_FILES)` and the embedded `command.tpl` to the prerequisite
list so Make invalidates the binary whenever its inputs change. Move
`FIND_EXCLUSIONS` and `GO_SRC_FILES` above the helper-binary block so
both variables are defined before first use.
2026-04-15 12:59:07 +10:00
Mathias Fredriksson a1ef3043bb fix: prevent site storybook tests from hanging after completion (#23936)
The vitest process hung after all 2132 story tests passed because
leftover refetchInterval polls kept the Node.js event loop alive.
Components that set per-query refetchInterval override the
QueryClient default, causing HTTP requests through vite's proxy
to localhost:3000 (no backend) that never resolve cleanly.

Three fixes:

- preview.tsx: disable all automatic refetching defaults and cancel
  in-flight queries on story unmount via useEffect cleanup
- storybook.tsx: save/restore the original window.WebSocket in the
  withWebSocket decorator, clear pending timers in close()
- vite.config.mts: add explicit testTimeout, hookTimeout, bail, and
  retry settings to the storybook vitest project

Also fix 5 story files that imported from @testing-library/react
instead of storybook/test.
2026-04-14 12:19:55 +00:00
Michael Suchacz f8e8f979a2 chore(Makefile): use go build -o for helper binaries to reduce GOCACHE growth (#24197)
## Problem

`go run` caches the final linked executable in `~/.cache/go-build`.
Every helper invocation via `go run ./scripts/<tool>` stores a copy, and
because the cache key includes build metadata, the same tool accumulates
multiple cached executables over time. With 12+ helper binaries invoked
during `make gen` and `make pre-commit`, this is a meaningful contributor
to GOCACHE growth.

## Fix

Replace `go run` with `go build -o _gen/bin/<tool>` for 12 repo-local
helper packages (16 Makefile callsites). Each helper is an explicit Make
file target with `$(wildcard *.go)` prerequisites, so `make -j`
serializes builds correctly instead of racing on shared output paths.

Helpers converted: `apitypings`, `auditdocgen`, `check-scopes`,
`clidocgen`, `dbdump`, `examplegen`, `gensite`, `apikeyscopesgen`,
`metricsdocgen`, `metricsdocgen-scanner`, `modeloptionsgen`, `typegen`.

Left on `go run` (intentionally): `migrate-ci` and `migrate-test`
(CI/test-only, not on common developer paths).

`_gen/` is already in `.gitignore`. The `clean` target removes
`_gen/bin`.

## GOCACHE growth (isolated cache, single `make gen`)

|  | Old (`go run`) | New (`go build -o`) |
|--|----------------|---------------------|
| Total cache size | 2.9 GB | 2.6 GB |
| Cached executables | 11 | 4 |
| Executable bytes | 401 MB | 25 MB |

The 4 remaining executables come from tools outside this change
(`dbgen` and `goimports` from `generate.sh`, plus two `main` binaries
from deferred helpers). Helper binaries now live in `_gen/bin/`
(581 MB, gitignored, cleaned by `make clean`).

## Build time benchmarks

**Source changed** (content hash invalidated, forces recompile):

| Helper | `go run` | `go build -o` + run | Overhead |
|--------|---------|---------------------|----------|
| typegen | 1.50s | 2.03s | +0.52s |
| examplegen | 1.37s | 1.67s | +0.30s |
| apikeyscopesgen | 1.21s | 1.71s | +0.50s |
| modeloptionsgen | 1.23s | 1.64s | +0.41s |

**Repeat invocation** (no source change, the common `make gen` / `make
pre-commit` path):

| Helper | `go run` (cache lookup) | Cached binary | Speedup |
|--------|------------------------|---------------|---------|
| typegen | 0.346s | 0.037s | 9.4x |
| examplegen | 0.368s | 0.037s | 9.9x |
| modeloptionsgen | 0.342s | 0.021s | 16.3x |
| apikeyscopesgen | 0.298s | 0.030s | 9.9x |

When source changes, `go build -o` is 0.3-0.5s slower per helper (it
writes a local binary instead of caching in GOCACHE). On repeat runs
(the common path), the pre-built binary is roughly 9-16x faster because
`go run` still does a staleness check while the binary just executes.
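
The Speedup column is the `go run` time divided by the cached-binary time. A quick Go check reproduces it from the measured values. The `speedup` helper is illustrative only.

```go
package main

import (
	"fmt"
	"math"
)

// speedup is the `go run` time divided by the cached-binary time,
// rounded to one decimal place like the table.
func speedup(goRunSec, cachedSec float64) float64 {
	return math.Round(goRunSec/cachedSec*10) / 10
}

func main() {
	rows := []struct {
		helper        string
		goRun, cached float64
	}{
		{"typegen", 0.346, 0.037},
		{"examplegen", 0.368, 0.037},
		{"modeloptionsgen", 0.342, 0.021},
		{"apikeyscopesgen", 0.298, 0.030},
	}
	for _, r := range rows {
		fmt.Printf("%-16s %.1fx\n", r.helper, speedup(r.goRun, r.cached))
	}
	// typegen 9.4x, examplegen 9.9x, modeloptionsgen 16.3x, apikeyscopesgen 9.9x
}
```

Across the four helpers, this gives 9.4x to 16.3x.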

> This PR was authored by Mux on behalf of Mike.
2026-04-09 16:04:06 +02:00
Kyle Carberry ee855f9618 feat: make agent context paths configurable via env vars (#23878)
Replace hardcoded paths for instruction files, skills, and MCP config
with values read from `CODER_AGENT_EXP_*` environment variables.
Template authors configure paths via the existing `coder_agent` `env`
block. The agent resolves `~`, relative, and absolute paths locally, then
serves the resolved config over `GET /api/v0/context-config`. `chatd`
fetches this once per workspace attach and falls back to today's
defaults for older agents.

The plural path env vars (`*_DIRS`, `*_FILES`) are comma-separated, allowing multiple paths:

| Env Var | Default | Controls |
|---|---|---|
| `CODER_AGENT_EXP_INSTRUCTIONS_DIRS` | `~/.coder` | Dirs containing the instruction file |
| `CODER_AGENT_EXP_INSTRUCTIONS_FILE` | `AGENTS.md` | Instruction file name |
| `CODER_AGENT_EXP_SKILLS_DIRS` | `.agents/skills` | Skills directories |
| `CODER_AGENT_EXP_SKILL_META_FILE` | `SKILL.md` | Skill metadata file name |
| `CODER_AGENT_EXP_MCP_CONFIG_FILES` | `.mcp.json` | MCP config files |

### Example

```hcl
resource "coder_agent" "main" {
  os   = "linux"
  arch = "amd64"
  env = {
    CODER_AGENT_EXP_INSTRUCTIONS_DIRS  = "/opt/company/agent-config,~/.coder"
    CODER_AGENT_EXP_INSTRUCTIONS_FILE  = "CLAUDE.md"
    CODER_AGENT_EXP_SKILLS_DIRS        = "/opt/company/ai-skills,.agents/skills"
    CODER_AGENT_EXP_MCP_CONFIG_FILES   = "/opt/company/mcp.json,.mcp.json"
  }
}
```

<details>
<summary>Implementation Details</summary>

### Architecture

Follows the same pattern as MCP tool discovery:
agent resolves locally → exposes via HTTP → chatd consumes.

**Agent-side** (`agent/agentcontextconfig/`):
- `ResolvePath` / `ResolvePaths` handle `~`, relative, and absolute path
forms; `ResolvePath` returns `""` for relative paths when `baseDir` is empty
- `Config` reads env vars, falls back to defaults, resolves all paths
- `GET /api/v0/context-config` serves the resolved config as JSON
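
Below is a sketch of the resolution rules described above. It makes two assumptions:

- The home directory is passed in explicitly. A real implementation would likely look it up itself.
- `baseDir` plays the role of the working directory.

The lowercase names are stand-ins, not the actual `agentcontextconfig` signatures. Paths are assumed to be Unix-style.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// resolvePath turns a configured path into an absolute one:
//   - "~" or "~/..." is expanded against home
//   - absolute paths are cleaned and returned
//   - relative paths are joined to baseDir, or dropped ("") when baseDir is empty
func resolvePath(p, home, baseDir string) string {
	switch {
	case p == "~":
		return home
	case strings.HasPrefix(p, "~/"):
		return filepath.Join(home, p[2:])
	case filepath.IsAbs(p):
		return filepath.Clean(p)
	case baseDir == "":
		return ""
	default:
		return filepath.Join(baseDir, p)
	}
}

// resolvePaths resolves each entry and filters out entries that resolved to "".
func resolvePaths(ps []string, home, baseDir string) []string {
	var out []string
	for _, p := range ps {
		if r := resolvePath(p, home, baseDir); r != "" {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	home := "/home/coder"
	fmt.Println(resolvePaths([]string{"~/.coder", "/opt/company/agent-config", ".agents/skills"}, home, "/workspace/repo"))
	// [/home/coder/.coder /opt/company/agent-config /workspace/repo/.agents/skills]
	fmt.Println(resolvePaths([]string{".agents/skills"}, home, ""))
	// []
}
```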

**chatd-side** (`coderd/x/chatd/`):
- Calls `conn.ContextConfig()` once on first workspace attach
- Falls back to hardcoded defaults on 404 (older agents)
- Iterates over instruction and skills dirs using resolved absolute paths
- `LSRelativityRoot` everywhere — no more home/root juggling

### Key design decisions

- **`EXP_` prefix**: env vars use `CODER_AGENT_EXP_*` to indicate
experimental status
- **Plural names**: comma-separated vars use plural names (`DIRS`,
`FILES`); single-value vars use singular (`FILE`)
- **Defaults in `workspacesdk`**: default constants live in
`codersdk/workspacesdk/` so both agent and server reference them without
cross-layer imports
- **`skillMetaFile` persistence**: stored on context-file parts via
`ContextFileSkillMetaFile` and restored on subsequent chat turns so
custom values survive across turns
- **Working dir dedup**: `slices.Contains` guard prevents reading the
same instruction file from both `InstructionsDirs` and the working
directory
- **MCP server dedup**: first-occurrence-wins dedup prevents leaking
duplicate connections from overlapping config files
- **ResolvePath safety**: returns `""` for relative paths when `baseDir`
is empty, so `ResolvePaths` filters them out
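
Consider `CODER_AGENT_EXP_MCP_CONFIG_FILES=/opt/company/mcp.json,.mcp.json`. With that setting, the same server name can appear in both files. Below is a minimal sketch of first-occurrence-wins deduplication by name. It assumes files are processed in the order listed. The types and function are illustrative, not the `agentmcp` manager's API.

```go
package main

import "fmt"

// mcpServer is a hypothetical, trimmed-down MCP server entry.
type mcpServer struct {
	Name    string
	Command string
}

// dedupServers merges servers from several config files in order. When two
// files define the same name, the first definition wins and later ones are
// skipped, so at most one connection per name is opened.
func dedupServers(files ...[]mcpServer) []mcpServer {
	seen := make(map[string]bool)
	var out []mcpServer
	for _, servers := range files {
		for _, s := range servers {
			if seen[s.Name] {
				continue
			}
			seen[s.Name] = true
			out = append(out, s)
		}
	}
	return out
}

func main() {
	company := []mcpServer{{"github", "company-github-mcp"}}
	repo := []mcpServer{{"github", "repo-github-mcp"}, {"linear", "linear-mcp"}}
	for _, s := range dedupServers(company, repo) {
		fmt.Println(s.Name, s.Command)
	}
	// github company-github-mcp
	// linear linear-mcp
}
```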

### Files changed

| File | Change |
|---|---|
| `agent/agentcontextconfig/` | New package: path resolution + HTTP endpoint |
| `codersdk/workspacesdk/agentconn.go` | `ContextConfigResponse` type, default constants, client method |
| `agent/agent.go` + `agent/api.go` | Wire up endpoint, pass config to MCP |
| `agent/x/agentmcp/manager.go` | Accept `[]string` MCP config paths, dedup by name |
| `coderd/x/chatd/chatd.go` | Fetch config, thread through, named returns |
| `coderd/x/chatd/instruction.go` | Accept configurable dir + file name, `skillMetaFileFromParts` |
| `coderd/x/chatd/chattool/skill.go` | Accept configurable dirs + meta file |
| `codersdk/chats.go` | `ContextFileSkillMetaFile` field for persistence |

### Test coverage

- `TestConfig` (4 cases): defaults, custom env vars, whitespace
trimming, comma-separated dirs
- `TestResolvePath` / `TestResolvePaths`: including empty baseDir edge
case
- `TestPersistInstructionFilesFallbackOnOlderAgent`: backward-compat
path when `ContextConfig` returns 404
- `TestChatMessagePartVariantTags`: updated exclusion list for new
internal field

### Backward compatibility

Older agents return 404 for the new endpoint. `chatd` catches this and
falls back to today's defaults via `readHomeInstructionFile` (using
`LSRelativityHome`). Existing workspaces work with no changes.

</details>
2026-04-01 12:28:47 -04:00
Mathias Fredriksson 7fb93dbf0e build: lock provider version in provisioner/terraform/testdata (#23776)
The terraform testdata fixtures silently drift when the coder provider
releases a new version. The .terraform.lock.hcl files are gitignored,
.tf files use loose constraints (>= 2.0.0), and generate.sh always
runs terraform init -upgrade. The Makefile only re-runs generate.sh
when the terraform CLI version changes, not the provider version.

Track a canonical lockfile and provider-version.txt in git. Change
generate.sh to respect the lockfile by default (terraform init without
-upgrade). Add --upgrade flag for intentional provider bumps, --check
for cheap staleness detection in the Makefile, and a new
update-terraform-testdata make target.
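
The `--check` mode only needs to compare two small files. Below is a hedged Go sketch of that comparison. It makes two assumptions:

- `provider-version.txt` holds a bare version string.
- The lockfile uses Terraform's standard layout, where `version` is the first attribute of the provider block.

The function name and the sample version `2.13.1` are hypothetical. The real check lives in `generate.sh`.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// coderProviderVersion captures the pinned version of the coder provider in
// a .terraform.lock.hcl file.
var coderProviderVersion = regexp.MustCompile(
	`provider "registry\.terraform\.io/coder/coder" \{\s*version\s*=\s*"([^"]+)"`)

// lockMatchesPin reports whether the lockfile pins the same provider version
// as the tracked provider-version.txt contents.
func lockMatchesPin(lockfile, pinned string) (bool, error) {
	m := coderProviderVersion.FindStringSubmatch(lockfile)
	if m == nil {
		return false, fmt.Errorf("coder provider not found in lockfile")
	}
	return m[1] == strings.TrimSpace(pinned), nil
}

func main() {
	lock := `provider "registry.terraform.io/coder/coder" {
  version     = "2.13.1"
  constraints = ">= 2.0.0"
}
`
	ok, err := lockMatchesPin(lock, "2.13.1\n")
	fmt.Println(ok, err) // true <nil>
}
```
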
2026-03-30 16:37:25 +03:00
Mathias Fredriksson b23aed034f fix: make terraform ConvertState fully deterministic (#23459)
All map iterations in ConvertState now use sorted helpers instead of
ranging over Go maps directly. Previously only coder_env and
coder_script were sorted (via sortedResourcesByType). This extends
the pattern to coder_agent, coder_devcontainer, coder_agent_instance,
coder_app, coder_metadata, coder_external_auth, and the main
resource output list.

Also fixes generate.sh writing version.txt to the wrong directory
(resources/ instead of testdata/), which caused the Makefile version
check to silently desync and trigger unnecessary regeneration.

Adds TestConvertStateDeterministic that calls ConvertState 10 times
per fixture and asserts byte-identical JSON output without any
post-hoc sorting.
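
The fix replaces direct `range` over maps with iteration over sorted keys. Below is a minimal sketch of the pattern. It includes a repeat-and-compare check in the spirit of `TestConvertStateDeterministic`. The types are hypothetical, not the real Terraform state structures.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// resource is a hypothetical, simplified stand-in for a converted resource.
type resource struct {
	Name string `json:"name"`
	Type string `json:"type"`
}

// sortedKeys returns the keys of m in ascending order so callers can iterate
// deterministically instead of ranging over the map directly.
func sortedKeys[V any](m map[string]V) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// convert builds an ordered resource list from a map keyed by address.
// Ranging over byAddr directly could yield a different order on every call.
func convert(byAddr map[string]resource) []byte {
	out := make([]resource, 0, len(byAddr))
	for _, k := range sortedKeys(byAddr) {
		out = append(out, byAddr[k])
	}
	b, err := json.Marshal(out)
	if err != nil {
		panic(err)
	}
	return b
}

func main() {
	in := map[string]resource{
		"coder_agent.main": {"main", "coder_agent"},
		"coder_app.code":   {"code", "coder_app"},
		"coder_env.path":   {"path", "coder_env"},
	}
	first := convert(in)
	for i := 0; i < 10; i++ {
		if string(convert(in)) != string(first) {
			panic("non-deterministic output")
		}
	}
	fmt.Println(string(first))
}
```
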
2026-03-24 11:02:45 +00:00
Mathias Fredriksson 145817e8d3 fix(Makefile): install playwright browsers before storybook tests (#23456)
The test-storybook target uses @vitest/browser-playwright with
Chromium but never installs the browser binaries. pnpm install
only fetches the npm package; the actual browser must be
downloaded separately via playwright install. This mirrors what
test-e2e already does.
2026-03-23 20:57:03 +00:00
Mathias Fredriksson 23542cb6af feat: smart file-based target selection for scripts/githooks (#23358)
Pre-commit classifies staged files and runs make pre-commit-light
when no Go, TypeScript, or Makefile changes are present. This
skips gen, lint/go, lint/ts, fmt/go, fmt/ts, and the binary
build. A markdown-only commit takes seconds instead of minutes.

Pre-push uses the same heuristic: if only light files changed
(docs, shell, terraform, etc.), tests are skipped entirely.
Falls back to the full make targets when Go/TS/Makefile changes
are detected, CODER_HOOK_RUN_ALL=1 is set, or the diff range
can't be determined.

Also adds test-storybook to make pre-push (vitest with the
storybook project in Playwright browser mode).
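
Below is a minimal sketch of the classification step, written in Go for illustration. The real logic lives in the `scripts/githooks` hook scripts. The file patterns come from the description above, and the hook's exact globs may differ.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// isHeavy reports whether a changed file needs the full pre-commit target:
// Go, TypeScript, or Makefile changes. Docs, shell, terraform, and so on
// count as light.
func isHeavy(path string) bool {
	if filepath.Base(path) == "Makefile" {
		return true
	}
	switch filepath.Ext(path) {
	case ".go", ".ts", ".tsx":
		return true
	}
	return false
}

// selectTarget picks the make target for the staged files. runAll models
// CODER_HOOK_RUN_ALL=1, which always forces the full target.
func selectTarget(staged []string, runAll bool) string {
	if runAll {
		return "pre-commit"
	}
	for _, f := range staged {
		if isHeavy(f) {
			return "pre-commit"
		}
	}
	return "pre-commit-light"
}

func main() {
	fmt.Println(selectTarget([]string{"docs/install.md", "scripts/lib.sh"}, false))   // pre-commit-light
	fmt.Println(selectTarget([]string{"docs/install.md", "coderd/coderd.go"}, false)) // pre-commit
}
```
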
2026-03-20 17:05:44 +02:00
Mathias Fredriksson a797a494ef feat: add starter template option and Coder Desktop URLs to scripts/develop (#23149)
- Add `--starter-template` option and properly create starter template
  with name and icon
- Add Coder Desktop URLs to listening banner
- Makefile tweak to avoid rebuilding `scripts/develop` every time Go
  code changes
2026-03-17 15:34:03 +02:00
Mathias Fredriksson 3a3537a642 refactor: rewrite develop.sh orchestrator in Go (#23054)
Replace the ~370-line bash develop.sh with a Go program using
serpent for CLI flags, errgroup for process lifecycle, and
codersdk for setup. develop.sh becomes a thin make + exec wrapper.

- Process groups for clean shutdown of child trees
- Docker template auto-creation via SDK ExampleID
- Idempotent setup (users, orgs, templates)
- Configurable --port, --web-port, --proxy-port
- Preflight runs lib.sh dependency checks
- TCP dial for port-busy checks
- Make target (build/.bin/develop) for build caching
2026-03-16 16:13:57 +02:00
Mathias Fredriksson 660a3dad21 feat(scripts/githooks): restore pre-push hook with allowlist (#22980)
The pre-push hook was removed in #22956. This restores it with a
reduced scope (tests + site build) and an allowlist so it only runs
for developers who opt in.

Two opt-in mechanisms:

- git config coder.pre-push true (local, not committed)
- CODER_WORKSPACE_OWNER_NAME allowlist in the hook script

git config takes priority and also supports explicit opt-out for
allowlisted users (git config coder.pre-push false).
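
The precedence can be sketched as a small Go function. It is hypothetical; the real hook is a shell script. The sketch assumes literal `true`/`false` values. By contrast, `git config --type=bool` would also normalize forms like `yes` or `1`.

```go
package main

import (
	"fmt"
	"slices"
)

// shouldRunPrePush decides whether the hook runs. gitConfig is the value of
// `git config coder.pre-push` ("" when unset). An explicit true/false always
// wins; otherwise the workspace owner must be on the allowlist.
func shouldRunPrePush(gitConfig, owner string, allowlist []string) bool {
	switch gitConfig {
	case "true":
		return true
	case "false":
		return false
	}
	return owner != "" && slices.Contains(allowlist, owner)
}

func main() {
	allowlist := []string{"alice"} // hypothetical allowlisted owner
	fmt.Println(shouldRunPrePush("", "alice", allowlist))      // true: allowlisted
	fmt.Println(shouldRunPrePush("false", "alice", allowlist)) // false: explicit opt-out wins
	fmt.Println(shouldRunPrePush("true", "bob", allowlist))    // true: explicit opt-in
	fmt.Println(shouldRunPrePush("", "bob", allowlist))        // false: not opted in
}
```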

Refs #22956

---------

Co-authored-by: Cian Johnston <cian@coder.com>
2026-03-12 12:13:55 +02:00
Mathias Fredriksson e7e2de99ba build(Makefile): capture pre-commit output to log files (#22978)
pre-commit was noisy: every sub-target dumped full stdout/stderr to the
terminal, burying failures in pages of compiler output and lint details.

Teach timed-shell.sh a quiet mode via MAKE_LOGDIR: when set, recipe
output is redirected to per-target log files and a one-line status is
printed instead. When unset, behavior is unchanged (with a refreshed
output format).

Makefile changes:

- pre-commit creates a tmpdir, passes MAKE_LOGDIR to sub-makes
- Drop --output-sync=target (log files eliminate interleaving)
- Add --no-print-directory to suppress Entering/Leaving noise
- Split check-unstaged and check-untracked into separate defines
- Restyle both with colored indicators and clearer instructions
- Clean up tmpdir on success, preserve on failure for debugging
2026-03-12 11:16:31 +02:00
Thomas Kosiewski e96cd5cbb2 chore(githooks): remove pre-push hook (#22956)
## Summary
- remove the `pre-push` git hook script from the repository
- remove the `make pre-push` target and related Makefile documentation
- update contributor and agent docs so they only describe the remaining
`pre-commit` hook

## Validation
- `make pre-commit`
- `git diff --check`

---
_Generated with [`mux`](https://github.com/coder/mux) • Model:
`openai:gpt-5.4` • Thinking: `high`_
2026-03-11 17:44:19 +01:00
Kyle Carberry d3986b53b9 perf(ci): use fast zstd compression for non-release CI builds (#22907)
## Problem

The `build` job on `main` takes ~7m28s for the Build step alone (~13m
total). Analysis of 10 recent CI runs on `main` shows the zstd
compression of the slim binary archive is the second largest bottleneck:

| Phase | Avg Duration | % of Build Step |
|-------|-------------|----------------|
| Fat Go builds (7 binaries w/ embed) | ~205s | 45.8% |
| **zstd compression (`-22 --ultra`)** | **~123s** | **27.4%** |
| Parallel block (vite + slim Go builds) | ~65s | 14.5% |
| Packaging + signing | ~55s | 12.3% |

The `zstd -22 --ultra` setting compresses a ~350 MB tar to ~71 MB, but
it is **single-threaded** and takes ~102s on 8-core CI runners. Adding
`-T8` does not help at level 22 — it remains CPU-bound on a single
thread.

## Solution

Use `zstd -6 -T0` (multithreaded, auto-detect cores) for non-release CI
builds. Release builds (`CODER_RELEASE=true`) continue using `-22
--ultra`.

### Benchmarks (349 MB slim binary tar, 8 cores)

| Setting | Wall Time | Output Size | Use Case |
|---------|----------|------------|----------|
| `-22 --ultra` | **102.4s** | 71 MB | Release builds |
| `-6 -T0` | **0.8s** | 94 MB | CI builds (new) |
| `-6` | 2.4s | 94 MB | Local dev (unchanged) |

The 23 MB size increase is negligible for the main branch preview images
(`ghcr.io/coder/coder-preview:main`). The archive is embedded in fat
binaries and extracted once by the agent at startup — decompression time
is identical regardless of compression ratio.

### Expected impact

~120s savings on the Build step, bringing it from ~7m28s to ~5m30s.

## Verification

All code paths confirmed (including the dry-run case):
- `CODER_RELEASE=true CI=true` → `-22 --ultra` 
- `CI=true` (no `CODER_RELEASE`) → `-6 -T0` 
- Local (no `CI`) → `-6` 
- `CODER_RELEASE=false CI=true` (dry run) → `-6 -T0` 
2026-03-10 15:54:32 +00:00
Mathias Fredriksson abdfadf8cb build(Makefile): fix lint/go recipe by using bash subshell (#22874)
The `lint/go` recipe used `$(shell)` inside a recipe to extract the
golangci-lint version. When `MAKE_TIMED=1` (set by pre-commit/pre-push),
make expands `.SHELLFLAGS = $@ -ceu` for `$(shell)` calls, passing the
target name as the first argument to `timed-shell.sh`. Since the target
name doesn't start with `-`, the timing code path runs and its banner
output contaminates the captured value, causing intermittent failures:

```
bash: line 3: lint/go: No such file or directory
```

Replace with bash command substitution (`$$()`), which is the correct
approach under `.ONESHELL` and avoids the `SHELL`/`.SHELLFLAGS`
interaction entirely. Also replaces deprecated `egrep` with `grep -oE`.
2026-03-10 12:07:44 +02:00
Mathias Fredriksson 56960585af build(Makefile): add per-target timing via SHELL wrapper (#22862)
pre-commit and pre-push only reported total elapsed time at the end,
making it hard to identify which jobs are slow.

Add a `MAKE_TIMED=1` mode that replaces `SHELL` with a wrapper
(`scripts/lib/timed-shell.sh`) to print wall-clock time for each
recipe. pre-commit and pre-push enable this on their sub-makes.

Ad-hoc use: `make MAKE_TIMED=1 test`
2026-03-09 23:07:33 +02:00
Mathias Fredriksson 1a2eea5e76 build(Makefile): harden make pre-push (#22849)
- Fix dead docker pull retry loop (Make ate bash expansions)
- Make test-postgres-docker idempotent so Phase 2 stops restarting it
  mid-test
- Run migrate-ci at recipe time, not parse time
- Install Playwright browsers before e2e tests
- Set test timeout to 20m, 5m shy of CI's 25m job limit
- Cap parallelism at nproc/4 via PARALLEL_JOBS
- Add phase banners and elapsed time
2026-03-09 17:26:34 +00:00
Mathias Fredriksson a96ec4c397 build: remove defunct test-postgres rule (#22839)
The `test-postgres` Makefile rule was redundant — CI never used it (it
runs `test-postgres-docker` + `make test` via the `test-go-pg` action),
and `make test` auto-starts a Postgres Docker container when needed via
`dbtestutil`.

- Remove the `test-postgres` rule from Makefile
- Update `pre-push` to run `test-postgres-docker` in the first phase
(alongside gen/fmt) and `make test` in the second phase
- Fix stale comments in CI workflows referencing `make test-postgres`
- Remove redundant "Test Postgres" entries from docs since `make test`
handles Postgres automatically
2026-03-09 16:24:40 +02:00
Mathias Fredriksson a48e4a43e2 fix(Makefile): align test-race with CI configuration (#22727)
Follow-up to #22705 (pre-commit/pre-push hooks).

Unifies `test` and `test-race` into the same structure and lets CI call
`make test-race` instead of reproducing the gotestsum command.

**Parallelism**: Extracted from `GOTEST_FLAGS` into
`TEST_PARALLEL_PACKAGES` / `TEST_PARALLEL_TESTS` (default 8x8).
`test-race` overrides to 4x4 via target-specific Make variables.
`TEST_NUM_PARALLEL_PACKAGES` and `TEST_NUM_PARALLEL_TESTS` env vars
continue to work for both targets.

**GOTEST_FLAGS**: Changed from simply expanded (`:=`) to recursively
expanded (`=`) so target-specific overrides take effect at recipe time.

**CI**: `.github/actions/test-go-pg/action.yaml` now calls
`make test-race` / `make test` instead of hand-rolling the gotestsum
command, eliminating drift between local and CI configurations.

Refs #22705
2026-03-09 10:39:13 +00:00
Mathias Fredriksson 752e6ecc16 build: add pre-commit/push hooks mirroring CI checks (#22705)
This change adds git hooks and Makefile targets that mirror CI required
checks locally, catching issues before they reach CI.

This is for use by AI agents (documented in AGENTS.md).

- **pre-commit** (every commit): gen, fmt, lint, typos, slim binary
  build. Fast checks without Docker or Playwright.
- **pre-push** (before push): full CI suite including site build, tests,
  sqlc-vet, offlinedocs.
  
To use:

```sh
git config core.hooksPath scripts/githooks
```

Works in worktrees (where `.git` is a file). Bypass with `--no-verify`.
2026-03-06 16:56:11 +02:00
Kacper Sawicki ba05188934 ci: add lint check to prevent single quotes in bootstrap scripts (#22664)
## Problem

Bootstrap scripts under `provisionersdk/scripts/` are inlined into
templates via `sh -c '${init_script}'`. Any single quote (apostrophe) in
these `.sh` files silently breaks the shell quoting, causing the agent
to never start — with near-invisible error output.

## Changes

- **`scripts/check_bootstrap_quotes.sh`** — new lint script that scans
all `.sh` files under `provisionersdk/scripts/` for single quotes and
fails with a clear error if any are found. Only checks shell scripts
(not `.ps1`, which legitimately uses single quotes).
- **`Makefile`** — added `lint/bootstrap` target wired into the `lint`
dependency list.

Fixes #22062
2026-03-06 13:09:56 +01:00
Mathias Fredriksson 719c24829a build(Makefile): use atomic writes for remaining gen targets (#22670)
Follow-up to #22612. Running `git status --short` in a loop during `make
-B -j gen` still showed intermediate states for several files. This PR
fixes the remaining ones.

The main issues:

- `generate.sh` ran `gofmt` and `goimports` in-place after moving files
  into the source tree. Now it formats in a workdir first and only `mv`s 
  the final result.
- `protoc` targets wrote directly to the source tree. Wrapped with
  `scripts/atomic_protoc.sh` which redirects output to a tmpdir.
- Several generators used hardcoded `/tmp/` paths. On systems where
  `/tmp` is tmpfs, `mv` degrades to copy+delete. Switched to a
  project-local `_gen/` directory (gitignored, same filesystem).
- `apidoc/.gen` and `cli/index.md` used `cp` for final output. Replaced
  with `mv`.
- `manifest.json` was written twice (unformatted, then formatted). Now
  `.gen` writes to a staging file and the manifest target does one
  formatted atomic write.
- `biome_format.sh` silently skipped files in gitignored dirs. Added
  `--vcs-enabled=false`.

Two helpers reduce the Makefile boilerplate: `scripts/atomic_protoc.sh`
(wraps protoc) and an `atomic_write` Make define
(stdout-to-temp-to-target pattern). `.PRECIOUS` now also covers `.pb.go`
and mock files.

Verification: `make -B -j gen` x3 with `git status` polling, no changes.

Refs #22612
2026-03-05 22:32:18 +02:00
Mathias Fredriksson a6a8fd94d7 build(Makefile): enable parallel make -j gen with correct dependency graph (#22612)
`make gen` could not run with `-j` because inter-target dependency edges
were missing. Multiple recipes compile `coderd/rbac` (which includes
generated files like `object_gen.go`), and without explicit ordering,
parallel runs produced syntax errors from mid-write reads.

Three main changes:

**Dependency graph fixes** declare the compile-time chain through
`coderd/rbac` so that `object_gen.go` is written before anything that
imports it is compiled. The DB generation targets use a GNU Make 4.3+
grouped target (`&:`) so Make knows `generate.sh` co-produces
`querier.go`, `unique_constraint.go`, `dbmetrics`, and `dbauthz` in a
single invocation. `SKIP_DUMP_SQL=1` avoids re-entrant `make` inside
`generate.sh` when the Makefile already guarantees `dump.sql` is fresh.

**`scripts/atomicwrite` package** replaces `os.WriteFile` in all gen
scripts with a temp-file-in-same-dir + rename pattern, preventing
interrupted runs from leaving partial files.

**`.PRECIOUS` and shell atomic writes** protect git-tracked generated
files from Make's default delete-on-error behavior. Since these files
are committed, deletion is worse than staleness -- `git restore` is the
recovery path.

CI now runs `make -j --output-sync -B gen` (~32s, down from ~85s
serial).

| Scenario                          | Before             | After    |
|-----------------------------------|--------------------|----------|
| `make gen` (serial)               | 95s                | 95s      |
| `make -j gen` (parallel)          | race error         | **22s**  |
| CI `make -j --output-sync -B gen` | forced serial ~85s | **~32s** |
2026-03-05 11:58:10 +00:00
Spike Curtis 7cc2b22568 chore: expose UpdateAppStatus on agentsocket (#22353)
relates to #21335

Adds UpdateAppStatus on the agentsocket, wired up to forward to Coderd over the dRPC connection the agent maintains.

Disclosure: I used AI to generate significant portions of this PR, but hand-reviewed and tweaked the code. I consider it approximately indistinguishable from what I would have done by hand.
2026-03-04 21:18:17 +04:00
Ethan e738ff5299 ci: remove dylib build pipeline (#22592)
## Summary

The macOS `.dylib` is only used by Coder Desktop macOS v0.7.2 or older.
v0.7.2 was released in August 2025. v0.8.0 of Coder Desktop macOS, also
released in August 2025, uses a signed Coder slim binary from the
deployment instead.

It's unlikely customers will be using Coder Desktop macOS v0.7.2 and the
next release of Coder simultaneously, so I think we can safely remove
this process, given it slows down CI & release processes.

## Changes

- **Makefile**: Remove `DYLIB_ARCHES`, `CODER_DYLIBS` variables and
`build/coder-dylib` target
- **scripts/build_go.sh**: Remove `--dylib` flag and all dylib-specific
logic (c-shared buildmode, CGO, plist embedding, vpn/dylib entrypoint)
- **scripts/sign_darwin.sh**: Remove dylib-specific comment
- **CI (ci.yaml)**: Remove `build-dylib` job, artifact download/insert
steps, and `build-dylib` dependency from `build` job
- **Release (release.yaml)**: Remove `build-dylib` job, artifact
download/insert steps, and `build-dylib` dependency from `release` job
- **vpn/dylib/**: Delete entire directory (`lib.go` + `info.plist.tmpl`)
- **vpn/router.go, vpn/dns.go**: Clean up comments referencing dylib

The slim and fat binary builds are completely unaffected — the dylib was
an independent build target with its own CI job.

_Generated by mux but reviewed by a human_
2026-03-05 01:50:50 +11:00
Kyle Carberry f758443f44 feat(codersdk): generate chat model provider options schema from Go structs (#22568) 2026-03-03 21:29:58 +00:00
Zach 66954aead0 feat: add TagV2 BoundaryMessage envelope protocol (#22520)
Extend the wire protocol for the boundary <-> agent unix socket with
a message envelope.

The envelope creates a boundary <-> agent data path that is separate
from the agent <-> coderd path. This lets boundary send operational
metadata (drop counts, configuration like jail type, capabilities)
that the agent can act on locally (e.g. Prometheus metrics) or use
to enrich outbound requests, without polluting the coderd-facing proto
with fields coderd never consumes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 09:13:11 -07:00
Kyle Carberry edee917d88 feat: add experimental agents support (#22290)
feat: add AI chat system with agent tools and chat UI

Introduce the chatd subsystem and Agents UI for AI-powered chat
within Coder workspaces.

- Add chatd package with chat loop, message compaction, prompt
  management, and LLM provider integration (OpenAI, Anthropic)
- Add agent tools: create workspace, list/read templates, read/write/
  edit files, execute commands
- Add chat API endpoints with streaming, message editing, and
  durable reconnection
- Add database schema and migrations for chats, chat messages, chat
  providers, and chat model configs
- Add RBAC policies and dbauthz enforcement for chat resources
- Add Agents UI pages with conversation timeline, queued messages
  list, diff viewer, and model configuration panel
- Add comprehensive test coverage including coderd integration tests,
  chatd unit tests, and Storybook stories
- Gate feature behind experiments flag

---------

Co-authored-by: Cian Johnston <cian@coder.com>
Co-authored-by: Danielle Maywood <danielle@themaywoods.com>
Co-authored-by: Jeremy Ruppel <jeremy@coder.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 16:50:56 +00:00
Susana Ferreira a613ffa3d6 chore: integrate metrics scanner into Makefile (#21465)
## Description

This PR wires up the metrics scanner in the Makefile to automatically regenerate metrics documentation when source files change.

## Changes

* Add Makefile target `scripts/metricsdocgen/generated_metrics` to run the AST scanner to generate the metrics file
* Update `docs/admin/integrations/prometheus.md` Makefile target to depend on `scripts/metricsdocgen/generated_metrics`
* Add `scripts/metricsdocgen/README.md` documenting the metrics generation process

Closes: https://github.com/coder/coder/issues/13223
2026-02-13 12:31:33 +00:00
Marcin Tojek 456c0bced9 fix: enable strict mode for swagger generation & upgrade swag (#21975)
Adds a Go wrapper (`scripts/apidocgen/swaginit/main.go`) that calls
swag's Go API with `Strict: true`. The `--strict` flag isn't available
in swag's CLI in any version, so the wrapper is the only way to enable
it.

Also upgrades swag from v1.16.2 to v1.16.6 (better generics support,
precise numeric formats, `x-enum-descriptions`, CVE-2024-45338 fix).
2026-02-06 13:04:35 +01:00
Dean Sheather bcc57632dd ci: split lint-actions into separate job to reduce flakes (#21834)
## Summary

The `lint/actions/zizmor` target flakes in CI due to network
connectivity issues when running on depot runners
(https://github.com/coder/internal/issues/1233). The zizmor tool needs
to reach GitHub's API but intermittently fails with "Connection refused"
errors.

## Changes

- Creates a new `lint-actions` CI job that only runs when `.github/**`
files are touched (using existing `ci` filter)
- Removes zizmor from the main `lint` job  
- Uses a Makefile conditional to include actionlint in `make lint`
locally but skip it in CI (where `lint-actions` handles it)

This reduces unnecessary flake exposure for PRs that don't modify GitHub
Actions files.

## Testing

- `actionlint` passes on the modified ci.yaml
- Verified Makefile conditional works: actionlint included locally,
skipped when `CI=true`

Fixes https://github.com/coder/internal/issues/1233
2026-02-03 00:32:09 +11:00
blinkagent[bot] d5296a4855 chore: add lint/migrations to detect hardcoded public schema (#21496)
## Problem

Migration 000401 introduced a hardcoded `public.` schema qualifier which
broke deployments using non-public schemas (see #21493). We need to
prevent this from happening again.

## Solution

Adds a new `lint/migrations` Make target that validates database
migrations do not hardcode the `public` schema qualifier. Migrations
should rely on `search_path` instead to support deployments using
non-public schemas.

## Changes

- Added `scripts/check_migrations_schema.sh` - a linter script that
checks for `public.` references in migration files (excluding test
fixtures)
- Added `lint/migrations` target to the Makefile
- Added `lint/migrations` to the main `lint` target so it runs in CI
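
The core of such a check is a word-bounded, case-insensitive match on `public.`. Below is a Go sketch of that match, assuming line-by-line scanning. The real linter is the shell script `scripts/check_migrations_schema.sh`, and its exact pattern may differ. `findPublicRefs` is hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// publicRef matches a schema-qualified reference such as `public.users`,
// case-insensitively. The \b keeps words like "republic." from matching.
var publicRef = regexp.MustCompile(`(?i)\bpublic\.`)

// findPublicRefs returns the 1-based line numbers in a migration's SQL that
// hardcode the public schema.
func findPublicRefs(sql string) []int {
	var lines []int
	sc := bufio.NewScanner(strings.NewReader(sql))
	for n := 1; sc.Scan(); n++ {
		if publicRef.MatchString(sc.Text()) {
			lines = append(lines, n)
		}
	}
	return lines
}

func main() {
	migration := "ALTER TABLE public.workspaces ADD COLUMN foo text;\nALTER TABLE users ADD COLUMN bar text;\n"
	fmt.Println(findPublicRefs(migration)) // [1]
}
```

A plain regex also flags `public.` inside comments and string literals. Excluding test fixtures happens when selecting which files to scan, not in the pattern.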

## Testing

- Verified the linter **fails** on current `main` (which has the
hardcoded `public.` in migration 000401)
- Verified the linter **passes** after applying the fix from #21493

```bash
# On main (fails)
$ make lint/migrations
ERROR: Migrations must not hardcode the 'public' schema. Use unqualified table names instead.

# After fix (passes)
$ make lint/migrations
Migration schema references OK
```

## Depends on

- #21493 must be merged first (or this PR will fail CI until it is)

---------

Signed-off-by: Danny Kopping <danny@coder.com>
Co-authored-by: blink-so[bot] <211532188+blink-so[bot]@users.noreply.github.com>
Co-authored-by: Danny Kopping <danny@coder.com>
2026-01-15 14:17:16 +02:00