related to: https://github.com/coder/internal/issues/1387
Sometimes our tests using DERP flake like the above, but we have no information at the DERP layer about why. This enables verbose DERP logging in CI.
Logs are kinda spammy, but most tests do not force DERP and so only use it for discovery (disco).
A test like TestWorkspaceAgentListeningPorts/OK_BlockDirect drops around 50 DERP packets e.g.
```
t.go:111: 2026-03-13 15:20:52.201 [debu] agent.net.tailnet.net.wgengine: magicsock: got derp-999 packet: "\x04\x00\x00\x00C}\xe0\xb9\x00\x00\x00\x00\x00\x00\x00\x00}\xae;l\xd9Hk8\xa8l\x1cK\xedO{狦)\x18NIw\xc4k\xd2-\x19\xbf\xfb\xdd\x17\xa9b\xac\xfd#\xf7\xcaC\xbe\vq(u=\xa7\x16\xe9\x9aLjS\x1fXL\x19y\xf4\x1dE%\xb3\xff\x9d:8\xa9\x95X\xfe\xf8\x95\x7f\x9dv$\f\xf4\xbe"
t.go:111: 2026-03-13 15:20:52.201 [debu] agent.net.tailnet.net.wgengine: magicsock: got derp-999 packet: "\x04\x00\x00\x00C}\xe0\xb9\x01\x00\x00\x00\x00\x00\x00\x00\x05x\xb6RD;\xb4\x80\xb6\x0f\xf6KŠc\xfb1\xbd\x06\xb70K3\x97`\x8d\xd2\x14\xed\xc5\xd6\xc6\xcaV\xbf\x878\xb2Ƥ\xf0\xd5\xf7\xc0\x1b\x9f\x04Y\x03\x17\xd4\x06\xee\xb2G\r){\x9f\xde\xe0(\xb5N\xfejR_\xf6q\xa4\xfaT\x9a\xd8\xcbk\xba\x16K"
t.go:111: 2026-03-13 15:20:52.201 [debu] agent.net.tailnet.net.wgengine: magicsock: got derp-999 packet: "\x04\x00\x00\x00C}\xe0\xb9\x02\x00\x00\x00\x00\x00\x00\x00\x1e\x93a\x15\xfev\x81'\xa9?\xe8nR\xce<\x91\x86\xcau@\xb9\xcfɩ\xef\xd1眓\x95\xf3*X^7\x99\x88\xb0|\x8cS\xe4@[\x16\xda\xca\xd4\xd9\x1dP\xd0\xfe\xd9r\x8c\xfcp~dP\xfaK\xe0\xf9y\xb2\x11\x15\xfe\xdcx鷽\xdeF\xf7\x92\xe8"
```
Follow-up to #22705 (pre-commit/pre-push hooks).
Unifies `test` and `test-race` into the same structure and lets CI call
`make test-race` instead of reproducing the gotestsum command.
**Parallelism**: Extracted from `GOTEST_FLAGS` into
`TEST_PARALLEL_PACKAGES`
/ `TEST_PARALLEL_TESTS` (default 8x8). `test-race` overrides to 4x4 via
target-specific Make variables. `TEST_NUM_PARALLEL_PACKAGES` and
`TEST_NUM_PARALLEL_TESTS` env vars continue to work for both targets.
**GOTEST_FLAGS**: Changed from simply-expanded (`:=`) to
recursively-expanded
(`=`) so target-specific overrides take effect at recipe time.
**CI**: `.github/actions/test-go-pg/action.yaml` now calls `make
test-race`
/ `make test` instead of hand-rolling the gotestsum command, eliminating
drift between local and CI configurations.
Refs #22705
setup-go has been sporadically failing to download Go, and we were advised
by a member of the Go team that downloading Go from `storage.googleapis.com`
is not guaranteed (which is what setup-go <= v5.6.0 does).
Also remove the use-preinstalled-go optimization for Windows runners.
setup-go v6 sets GOTOOLCHAIN=local, which prevents the pre-installed
Go from auto-downloading the toolchain specified in go.mod. The windows
optimization with v5 relied on GOTOOLCHAIN=auto. setup-go uses the runner
cache, which is a different caching path but should serve the same purpose.
Description:
This PR updates the bundled Terraform binary and related version pins
from 1.14.1 to 1.14.5 (base image, installer fallback, and CI/test
fixtures). Terraform is statically built with an embedded Go runtime.
Moving to 1.14.5 updates the embedded toolchain and is intended to
address Go stdlib CVEs reported by security scanning.
Notes:
- Change is version-only; no functional Coder logic changes.
- Backport-friendly: intended to be cherry-picked to release branches
after merge.
macOS runners lack GNU toolchain dependencies (bash 4+, GNU getopt, make
4+) required by `scripts/lib.sh`. When any script sources `lib.sh`, it
checks for these dependencies and fails if they're missing.
This caused consistent failures in the `test-go-pg (macos-latest)` job
in `nightly-gauntlet.yaml`, which didn't have the GNU tools setup that
`ci.yaml` had. Commit 9a417df ("ci: add retry logic for Go module
operations") added a macOS GNU tools step to `ci.yaml`, but
`nightly-gauntlet.yaml` was not updated.
This PR adds a reusable `setup-gnu-tools` action and uses it
consistently across all workflows with macOS jobs, replacing the inline
brew install steps.
Closes https://github.com/coder/internal/issues/1133
## Description
Add exponential backoff retries to all `go install` and `go mod
download` commands across CI workflows and actions.
## Why
Fixes
[coder/internal#1276](https://github.com/coder/internal/issues/1276) -
CI fails when `sum.golang.org` returns 500 errors during Go module
verification. This is an infrastructure-level flake that can't be
controlled.
## Changes
- Created `.github/scripts/retry.sh` - reusable retry helper with
exponential backoff (2s, 4s, 8s delays, max 3 attempts), using
`scripts/lib.sh` helpers
- Wrapped all `go install` and `go mod download` commands with retry in:
- `.github/actions/setup-go/action.yaml`
- `.github/actions/setup-sqlc/action.yaml`
- `.github/actions/setup-go-tools/action.yaml`
- `.github/workflows/ci.yaml`
- `.github/workflows/release.yaml`
- `.github/workflows/security.yaml`
- Added GNU tools setup (bash 4+, GNU getopt, make 4+) for macOS in
`test-go-pg` job, since `retry.sh` uses `lib.sh` which requires these
tools
Go 1.24 adds [tool
dependencies](https://go.dev/doc/modules/managing-dependencies#tools).
This allows us to track versions of tools in our `go.mod` instead of
sprinkling various `go run` commands throughout our codebase.
NOTE: there are still various hard-coded `go install` commands in our
dogfood Dockerfile. As that list is likely severely outdated, will leave
that for a separate PR.
# Update Node.js from 20.19.4 to 22.19.0
This PR updates Node.js from v20.19.4 to v22.19.0 across the codebase. The change includes:
- Updated Node.js version in GitHub Actions setup-node workflow
- Updated Node.js version in the dogfood Dockerfile
- Changed from `pkgs.nodejs_20` to `unstablePkgs.nodejs_22` in the Nix flake
- Updated the Node.js engine version constraints in package.json files to allow Node.js 22
- Updated Playwright from v1.47.0 to v1.50.1
- Updated tzdata dependency from v1.0.44 to v1.0.46
- Updated the flake.lock file with latest nixpkgs references
The PR also improves the error message for Playwright version mismatches by showing the actual versions in the error.
Updates Terraform from 1.11.4 to 1.12.2 across all relevant files.
Changes include:
- GitHub Actions setup-tf configuration
- Dockerfile configurations (dogfood and base)
- Install script
- Provisioner install.go with version constants
- Test data files (tfstate.json, tfplan.json, version.txt)
Follows the same pattern as PR #17323 which updated to 1.11.4.
Co-authored-by: blink-so[bot] <211532188+blink-so[bot]@users.noreply.github.com>
Co-authored-by: sreya <4856196+sreya@users.noreply.github.com>
Updates all Go version references in the codebase to use Go 1.24.4.
## Changes
- Update `go.mod` to use Go 1.24.4
- Update `dogfood/coder/Dockerfile` GO_VERSION to 1.24.4
- Update `.github/actions/setup-go/action.yaml` default version to
1.24.4
- Update `examples/parameters-dynamic-options/variables.yml` to use
golang:1.24
## Testing
- ✅ All Go version references are consistent (verified with
`scripts/check_go_versions.sh`)
- ✅ Build tested successfully with Go 1.24.4
- ✅ Binary runs correctly
Co-authored-by: blink-so[bot] <211532188+blink-so[bot]@users.noreply.github.com>
Co-authored-by: sreya <4856196+sreya@users.noreply.github.com>
This PR starts running test-go-pg on macOS and Windows in regular CI.
Previously this suite was only run in the nightly gauntlet for 2
reasons:
- it was flaky
- it was slow (took 17 minutes)
We've since stabilized the flakiness by switching to depot runners,
using ram disks, optimizing the number of tests run in parallel, and
automatically re-running failing tests. We've also [brought
down](https://github.com/coder/coder/pull/17756) the time to run the
suite to 9 minutes. Additionally, this PR allows test-go-pg to use cache
from previous runs, which speeds it up further. The cache is only used
on PRs, `main` will still run tests without it.
This PR also:
- removes the nightly gauntlet since all tests now run in regular CI
- removes the `test-cli` job for the same reason
- removes the `setup-imdisk` action which is now fully replaced by
[coder/setup-ramdisk-action](https://github.com/coder/setup-ramdisk-action)
- makes 2 minor changes which could be separate PRs, but I rolled them
into this because they were helpful when iterating on it:
- replace the `if: always()` condition on the `gen` job with a `if: ${{
!cancelled() }}` to allow the job to be cancelled. Previously the job
would run to completion even if the entire workflow was cancelled. See
[the GitHub
docs](https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/evaluate-expressions-in-workflows-and-actions#always)
for more details.
- disable the recently added `TestReinitializeAgent` since it does not
pass on Windows with Postgres. There's an open issue to fix it:
https://github.com/coder/internal/issues/642
This PR will:
- unblock https://github.com/coder/coder/issues/15109
- alleviate https://github.com/coder/internal/issues/647
I tested caching by temporarily enabling cache upload on this PR: here's
[a
run](https://github.com/coder/coder/actions/runs/15119046903/job/42496939341?pr=17853#step:13:1296)
showing cache being used.
This PR speeds up the "Upload tests to datadog" step by downloading the
`datadog-ci` binary directly from GitHub releases. Most of the time used
to be spent in `npm install`, which consistently timed out on Windows
after a minute. [Now it takes 3
seconds](https://github.com/coder/coder/actions/runs/14834976784/job/41644230049?pr=17668#step:10:1).
I updated it to version v2.48.0 because v2.21.0 didn't have the
artifacts for arm64 macOS.
This PR focuses on optimizing go-test CI times on Windows. It:
- backs the `$RUNNER_TEMP` directory with a RAM disk. This directory is
used by actions like cache, setup-go, and setup-terraform as a staging
area
- backs `GOCACHE`, `GOMODCACHE`, and `GOPATH` with a RAM disk
- backs `$GITHUB_WORKSPACE` with a RAM disk - that's where the
repository is checked out
- uses preinstalled Go on Windows runners
- starts using the depot Windows runner
From what I've seen, these changes bring test times down to be on par
with Linux and macOS. The biggest improvement comes from backing
frequently accessed paths with RAM disks. The C drive is surprisingly
slow - I ran some performance tests with
[fio](https://fio.readthedocs.io/en/latest/fio_doc.html#) where I tested
IOPS on many small files, and the RAM disk was 100x faster.
Additionally, the depot runners seem to have more consistent performance
than the ones provided by GitHub.
Addresses https://github.com/coder/internal/issues/322.
This PR starts caching Terraform providers used by `TestProvision` in
`provisioner/terraform/provision_test.go`. The goal is to improve the
reliability of this test by cutting down on the number of network calls
to external services. It leverages GitHub Actions cache, which [on depot
runners is persisted for 14 days by
default](https://depot.dev/docs/github-actions/overview#cache-retention-policy).
Other than the aforementioned `TestProvision`, I couldn't find any other
tests which depend on external terraform providers.
I think using an older version of mockgen on the schmoder CI broke the
workflow, so I'm gonna sync it via this action, like we do with the
other `make build` dependencies.
this updates `go` to the latest stable patch version `1.24.2` in:
- `go.mod`
- `dogfood/coder/Dockerfile`
- `.github/actions/setup-go/action.yaml`
- `flake.nix`
written with the assistance of ClaudeCode.
---------
Co-authored-by: Thomas Kosiewski <tk@coder.com>
- Update go.mod to use Go 1.24.1
- Update GitHub Actions setup-go action to use Go 1.24.1
- Fix linting issues with golangci-lint by:
- Updating to golangci-lint v1.57.1 (more compatible with Go 1.24.1)
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
This pull request adds new GitHub Actions for installing `cosign` and
`syft`, and updates the CI, release, and security workflows.
**New Actions:**
- [`install-cosign`](.github/actions/install-cosign/action.yaml):
Installs `cosign` with a configurable version.
- [`install-syft`](.github/actions/install-syft/action.yaml): Installs
`syft` with a configurable version.
**Workflow Updates:**
- CI, release, and security workflows now use `install-cosign` and
`install-syft`.
Replace Depot build action with Nix for Nix dogfood image builds
The dogfood Nix image is now built using Nix's native container tooling instead of Depot. This change:
- Adds Nix setup steps to the GitHub Actions workflow
- Removes the Dockerfile.nix in favor of a Nix-native container build
- Updates the flake.nix to support building Docker images
- Introduces a hash file to track Nix-related changes
- Updates the vendorHash for Go dependencies
Change-Id: I4e011fe3a19d9a1375fbfd5223c910e59d66a5d9
Signed-off-by: Thomas Kosiewski <tk@coder.com>
This PR is the second in a series aimed at closing
https://github.com/coder/coder/issues/15109.
## Changes
- adds `scripts/embedded-pg/main.go`, which can start a native Postgres
database. This is used to set up PG on Windows and macOS, as these
platforms don't support Docker in Github Actions.
- runs the `test-go-pg` job on macOS and Windows too
- adds the `test-go-race-go` job, which runs race tests with Postgres on
Linux
Use specific commit SHAs for GitHub actions across various workflows to
enhance reliability and reproducibility. This change ensures that
actions run against a known version, reducing the risk of unexpected
issues due to updates in the third-party action repositories.
This contributes to improving the score in #14879