## Description
This PR wires up the metrics scanner in the Makefile to automatically regenerate metrics documentation when source files change.
## Changes
* Add Makefile target `scripts/metricsdocgen/generated_metrics` to run the AST scanner to generate the metrics file
* Update `docs/admin/integrations/prometheus.md` Makefile target to depend on `scripts/metricsdocgen/generated_metrics`
* Add `scripts/metricsdocgen/README.md` documenting the metrics generation process
Closes: https://github.com/coder/coder/issues/13223
Adds a Go wrapper (`scripts/apidocgen/swaginit/main.go`) that calls
swag's Go API with `Strict: true`. The `--strict` flag isn't available
in swag's CLI in any version, so the wrapper is the only way to enable
it.
Also upgrades swag from v1.16.2 to v1.16.6 (better generics support,
precise numeric formats, `x-enum-descriptions`, CVE-2024-45338 fix).
## Summary
The `lint/actions/zizmor` target flakes in CI due to network
connectivity issues when running on depot runners
(https://github.com/coder/internal/issues/1233). The zizmor tool needs
to reach GitHub's API but intermittently fails with "Connection refused"
errors.
## Changes
- Creates a new `lint-actions` CI job that only runs when `.github/**`
files are touched (using existing `ci` filter)
- Removes zizmor from the main `lint` job
- Uses a Makefile conditional to include actionlint in `make lint`
locally but skip it in CI (where `lint-actions` handles it)
This reduces unnecessary flake exposure for PRs that don't modify GitHub
Actions files.
## Testing
- `actionlint` passes on the modified ci.yaml
- Verified Makefile conditional works: actionlint included locally,
skipped when `CI=true`
Fixes https://github.com/coder/internal/issues/1233
## Problem
Migration 000401 introduced a hardcoded `public.` schema qualifier which
broke deployments using non-public schemas (see #21493). We need to
prevent this from happening again.
## Solution
Adds a new `lint/migrations` Make target that validates database
migrations do not hardcode the `public` schema qualifier. Migrations
should rely on `search_path` instead to support deployments using
non-public schemas.
## Changes
- Added `scripts/check_migrations_schema.sh` - a linter script that
checks for `public.` references in migration files (excluding test
fixtures)
- Added `lint/migrations` target to the Makefile
- Added `lint/migrations` to the main `lint` target so it runs in CI
## Testing
- Verified the linter **fails** on current `main` (which has the
hardcoded `public.` in migration 000401)
- Verified the linter **passes** after applying the fix from #21493
```bash
# On main (fails)
$ make lint/migrations
ERROR: Migrations must not hardcode the 'public' schema. Use unqualified table names instead.
# After fix (passes)
$ make lint/migrations
Migration schema references OK
```
## Depends on
- #21493 must be merged first (or this PR will fail CI until it is)
---------
Signed-off-by: Danny Kopping <danny@coder.com>
Co-authored-by: blink-so[bot] <211532188+blink-so[bot]@users.noreply.github.com>
Co-authored-by: Danny Kopping <danny@coder.com>
Usage example:
```bash
$ make test TEST_CPUPROFILE=cpu.prof TEST_MEMPROFILE=mem.prof TEST_PACKAGES=./coderd
```
Note that `TEST_PACKAGES` has to be specified, otherwise you get `cannot
use -{cpu,memory}profile flag with multiple packages`.
Signed-off-by: Danny Kopping <danny@coder.com>
Go 1.24 adds [tool
dependencies](https://go.dev/doc/modules/managing-dependencies#tools).
This allows us to track versions of tools in our `go.mod` instead of
sprinkling various `go run` commands throughout our codebase.
NOTE: there are still various hard-coded `go install` commands in our
dogfood Dockerfile. As that list is likely severely outdated, will leave
that for a separate PR.
Modifies `make fmt/go` to also use https://github.com/daixiang0/gci to format our imports to a standard format.
It introduces a new shell script to do the formatting so that our formatting tools are in one place.
relates to: https://github.com/coder/internal/issues/1094
This is number 2 of 5 pull requests in an effort to add agent script
ordering. It adds a drpc API that is exposed via a local socket. This
API serves access to a lightweight DAG based dependency manager that was
inspired by systemd.
In follow-up PRs:
* This unit manager will be plumbed into the workspace agent struct.
* CLI commands will use this agentsocket api to express dependencies
between coder scripts
I used an LLM to produce some of these changes, but I have conducted
thorough self review and consider this contribution to be ready for an
external reviewer.
Adds columns to track package and test name to test_databases table, and populates them as databases are created using the Broker.
In order to seamlessly work with existing `coder_database` databases with the old schema, the SQL that creates the table and columns is additive and idempotent, so we run it every time we initialize the Broker (once per test binary execution).
We include a transaction level advisorly lock to prevent deadlocks before attempting to alter the schema. I was seeing deadlocks without this.
Relates to https://github.com/coder/internal/issues/1093
This is the first of N pull requests to allow coder script ordering.
It introduces what is for now dead code, but paves the way for various
interfaces that allow coder scripts and other processes to depend on one
another via CLI commands and terraform configurations.
The next step is to add reactivity to the graph, such that changes in
the status of one vertex will propagate and allow other vertices to
change their own statuses.
Concurrency and stress testing yield the following:
CPU Profile:
<img width="1512" height="862" alt="Screenshot 2025-10-17 at 10 38 52"
src="https://github.com/user-attachments/assets/f46cf1a2-a0b2-4c02-81a0-069798108ee5"
/>
Mem Profile:
<img width="1512" height="862" alt="Screenshot 2025-10-17 at 10 38 01"
src="https://github.com/user-attachments/assets/45be1235-fff6-45ba-a50d-db9880377bd0"
/>
Predictably, lock contention and memory allocation are the largest
components of this system under stress. Nothing seems untoward.
fixes https://github.com/coder/internal/issues/1026
Thru a (perhaps too-) clever hack of `init()` functions, I've managed to remove the need to separately compile the cleaner binary. This should fix the flakes we are seeing were the binary compilation takes 10s of seconds on macOS. The cleaner is encorporated directly into the test binary and we self-exec as the subprocess.
We're seeing some timeouts from starting the db cleaner, e.g. https://github.com/coder/internal/issues/1026
My suspicion is that in CI the go build cache might not be warm, and so it can take a while to compile and run the dbcleaner subprocess.
This fix builds the cleaner once prior to starting a test run via `make test` to ensure we have a warm cache.
# Canonicalize API Key Scopes
This PR introduces canonical API key scopes with a `coder:` namespace prefix to avoid collisions with low-level resource:action names. It:
1. Renames special API key scopes in the database:
- `all` → `coder:all`
- `application_connect` → `coder:application_connect`
2. Adds support for a new `scopes` field in the API key creation request, allowing multiple scopes to be specified while maintaining backward compatibility with the singular `scope` field.
3. Updates the API documentation to reflect these changes, including the new endpoint for listing public API key scopes.
4. Ensures backward compatibility by mapping between legacy and canonical scope names in relevant code paths.
# Add support for low-level API key scopes
This PR adds support for fine-grained API key scopes based on RBAC resource:action pairs. It includes:
1. A new endpoint `/api/v2/auth/scopes` to list all public low-level API key scopes
2. Generated constants in the SDK for all public scopes
3. Tests to verify scope validation during token creation
4. Updated API documentation to reflect the expanded scope options
The implementation allows users to create API keys with specific permissions like `workspace:read` or `template:use` instead of only the legacy `all` or `application_connect` scopes.
Fixes#19847
# Generate RBAC scope name constants
This PR adds a new generated file `coderd/rbac/scopes_constants_gen.go` that contains typed constants for all RBAC scope names in the format `Scope<Resource><Action>`. For example, `ScopeWorkspaceRead` for the scope "workspace:read".
These constants make it easier to reference specific scopes in code without using string literals, improving type safety and making refactoring easier.
The PR:
- Adds a new template file `scripts/typegen/scopenames.gotmpl`
- Updates the typegen script to support generating scope name constants
- Updates the Makefile to include the new generated file in build targets
Added a script/linter to ensure all `policy.RBACPermissions` entries are part of the `api_key_scope` enumerated in the `coderd/database/dump.sql` file.
Fixes#19846
In order to get `make test` to reliably pass again on our dogfood
workspaces, we're having to resort to setting parallelism.
It also reworks our CI to call the `make test` target, instead of
rolling a different command.
Behavior changes:
* sets 8 packages x 8 tests in parallel by default on `make test`
* by default, removes the `-short` flag. In my testing it makes only a
few seconds difference on ~200s, or 1-2%
* by default, removes the `-count=1` flag that busts Go's test cache.
With a fresh cache and no code changes, `make test` executes in ~15
seconds.
Signed-off-by: Spike Curtis <spike@coder.com>
Fixes https://github.com/coder/internal/issues/907
We convert `workspacesdk.AgentConn` to an interface and generate a mock
for it. This allows writing `coderd` tests that rely on the agent's HTTP
api to not have to set up an entire tailnet networking stack.
Closes https://github.com/coder/internal/issues/884
We're adding this as a `go run` in `lint/go` for now, since adding it to
golangci-lint ourselves involves recompiling golangci-lint and then
running that new binary. I'll look into proposing it being added to the
public golangci-lint linters.
Doesn't appear to cause the lint ci job to take any longer, which is
nice.
### Description
This PR introduces GPG signing for all Coder *slim-binaries*.
Detached signatures will allow users to verify the integrity and
authenticity of the binaries they download.
### Changes
* `scripts/sign_with_gpg.sh`: New script to sign a given binary
using GPG. It imports the release key, signs the binary, and
verifies the signature.
* `scripts/build_go.sh`: Updated to call `sign_with_gpg.sh` when the
`CODER_SIGN_GPG` environment variable is set to 1.
* `.github/workflows/release.yaml`: The` CODER_SIGN_GPG` environment
variable is now set to 1 during the release build, enabling GPG
signing for all release binaries.
* `.github/workflows/ci.yaml`: The `CODER_SIGN_GPG` environment
variable is now set to 1 during the CI build, enabling GPG
signing for all CI binaries.
* `Makefile`: Detached signatures are moved to the `/site/out/bin/
`directory
# Organize Development Documentation into Separate Files
This PR reorganizes the development documentation by splitting the monolithic CLAUDE.md file into multiple focused documents. The main file now provides a concise overview with essential commands and critical patterns, while importing detailed content from specialized guides.
Key improvements:
- Created separate documentation files for specific domains:
- Database development patterns
- OAuth2 implementation guidelines
- Testing best practices
- Troubleshooting common issues
- Development workflows and guidelines
- Restructured the main CLAUDE.md to be more scannable with improved formatting
- Added quick-reference tables for common commands
- Maintained all existing content while making it more accessible
- Highlighted critical patterns that must be followed
This organization makes the documentation more maintainable and easier to navigate, allowing developers to quickly find relevant information for their specific tasks.
This PR introduces failing test retries in CI for e2e tests, Go tests
with the in-memory database, Go tests with Postgres, and the CLI tests.
Retries are not enabled for race tests.
The goal is to reduce how often flakes disrupt developers' workflows.
We replace timestamps in our golden files to keep the values constant.
However, if a non-UTC timezone is used then the timestamp will still be
replaced but the whitespace will be messed up (since it was aligned to
the original value).

Therefore we must force a timezone when generating golden files.
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Updating golden files is an unnecessary extra step in addition to gen
that is easily overlooked, leading to the developer noticing the issue
in CI leading to lost developer time waiting for tests to complete.
Fixes https://github.com/coder/coder/issues/16268
- Adds `/api/v2/workspaceagents/:id/containers` coderd endpoint that allows listing containers
visible to the agent. Optional filtering by labels is supported.
- Adds go tools to the `coder-dylib` CI step so we can generate mocks if needed
Replace Depot build action with Nix for Nix dogfood image builds
The dogfood Nix image is now built using Nix's native container tooling instead of Depot. This change:
- Adds Nix setup steps to the GitHub Actions workflow
- Removes the Dockerfile.nix in favor of a Nix-native container build
- Updates the flake.nix to support building Docker images
- Introduces a hash file to track Nix-related changes
- Updates the vendorHash for Go dependencies
Change-Id: I4e011fe3a19d9a1375fbfd5223c910e59d66a5d9
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Using `go run` inside of a test is fragile, because it means we have to
wait for `go` to compile the binary while also constrained on resources
by the fact that Playwright and coderd are already running. We should
instead compile a coder binary for the current platform before the tests
and use it directly.
We have an effort underway to replace `dbmem` (#15109), and consequently
we've begun running our full test-suite (with Postgres) on all supported
OSs - Windows, MacOS, and Linux, since #15520.
Since this change, we've seen a marked decrease in the success rate of
our builds on `main` (note how the Windows/MacOS failures account for
the vast majority of failed builds):

We're still investigating why these OSs are a lot less reliable. It's
likely that the VMs on which the builds are run have different
characteristics from our Ubuntu runners such as disk I/O, network
latency, or something else.
**In the meantime, we need to start trusting CI failures in `main`
again, as the current failures are too noisy / vague for us to
correct.**
We've also considered hosting our own runners where possible so we can
get OS-level observability to rule out some possibilities.
See the [meeting
notes](https://www.notion.so/coderhq/CI-Investigation-Call-Notes-17dd579be59280d8897cc9fe4bb46695?pvs=6&utm_content=17dd579b-e592-80d8-897c-c9fe4bb46695&utm_campaign=T1ZPT2FL0&n=slack&n=slack_link_unfurl)
where we linked into this for more detail.
This PR introduces several changes:
1. Moves the full test-suite with Postgres on Windows/MacOS to the
`nightly-gauntlet` workflow
tradeoff: this means that any regressions may be more difficult to
discover since we merge to main several times a day
2. Run only the CLI test-suite on each PR / merge to `main` on
Windows/MacOS
3. `test-go` is still running the full test-suite against all OSs
(including the CLI ones), but will soon be removed once #15109 is
completed since it uses `dbmem`
4. Changes `nightly-gauntlet` to run at 4AM: we've seen several
instances of the runner being stopped externally, and we're _guessing_
this may have something to do with the midnight UTC execution time, when
other cron jobs may run
5. Removes the existing `nightly-gauntlet` jobs since they haven't
passed in a long time, indicating that nobody cares enough to fix them
and they don't provide diagnostic value; we can restore them later if
necessary
I've manually run both these new workflows successfully:
- `ci`:
https://github.com/coder/coder/actions/runs/12825874176/job/35764724907
- `nightly-gauntlet`:
https://github.com/coder/coder/actions/runs/12825539092
---------
Signed-off-by: Danny Kopping <danny@coder.com>
Co-authored-by: Muhammad Atif Ali <atif@coder.com>