Commit Graph

40 Commits

Author SHA1 Message Date
Spike Curtis 2ea27e897b chore: split Pubsub interface into Publisher and Subscriber (#24442)
<!--

If you have used AI to produce some or all of this PR, please ensure you have read our [AI Contribution guidelines](https://coder.com/docs/about/contributing/AI_CONTRIBUTING) before submitting.

-->

Splits the Pubsub into Publisher and Subscriber interfaces. Allows components to scope down their needs if they only publish or only subscribe. This allows smaller fakes/mocks and generally better encapsulation.
2026-04-17 22:58:33 -04:00
Kyle Carberry 2d7009e50d test: reduce unnecessary sleep durations in tests (#22552)
## Summary

Removes `time.Sleep` calls in two test files by replacing them with
deterministic or event-driven alternatives.

### Changes

**`coderd/provisionerjobs_test.go`** (34.5s → 0.25s)

Replaced `time.Sleep(1500ms)` with a direct SQL `UPDATE` to bump
`created_at` by 2 seconds. The sleep existed purely to ensure different
timestamps for sort-order testing. The fix is deterministic and cannot
flake. Uses `NewDBWithSQLDB` (the test already required real Postgres
via `WithDumpOnFailure`).

**`coderd/database/pubsub/pubsub_test.go`** (2.05s → 1.3s)

Replaced `time.Sleep(1s)` with a `testutil.Eventually` retry loop that
publishes and checks for subscriber receipt. This is the idiomatic
pattern in the codebase. The old sleep waited for pq.Listener to
re-issue LISTEN after reconnect; the new code polls until it actually
works.
2026-03-03 10:19:00 -05:00
Spike Curtis bddb808b25 chore: arrange imports in a standard way (#21452)
Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example:

```
import (
	"context"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"golang.org/x/xerrors"
	"gopkg.in/natefinch/lumberjack.v2"

	"cdr.dev/slog/v3"
	"github.com/coder/coder/v2/codersdk/agentsdk"
	"github.com/coder/serpent"
)
```

3 groups: standard library, 3rd partly libs, Coder libs.

This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.
2026-01-08 15:24:11 +04:00
Spike Curtis 49b34a716a fix: fix slog to always use array of Fields (#21426)
Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptible call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder).

It also updates dependencies that also use slog and were updated.

I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule.

Other dependencies, I pushed new tags.
2026-01-08 10:29:41 +04:00
Hugo Dutka e62c5db678 chore: remove references to dbtestutil.WillUsePostgres (#20436)
Addresses https://github.com/coder/internal/issues/758.

This PR only cleans up dead code, it makes no changes to test logic.
2025-10-23 14:24:54 +02:00
ケイラ 09cc906981 chore: remove unnecessary redeclarations in for loops (part 2) (#18593) 2025-06-26 12:28:00 -06:00
Hugo Dutka 623dcd97dc fix(cli): fix flakes related to context cancellation when establishing pg connections (#18246)
Since https://github.com/coder/coder/pull/18195 was merged, we started
running CLI tests with postgres instead of just dbmem. This surfaced
errors related to context cancellation while establishing postgres
connections.

This PR should fix https://github.com/coder/internal/issues/672. Related
to https://github.com/coder/coder/issues/15109.
2025-06-05 15:54:13 +02:00
Spike Curtis 6c0bed0f53 chore: update to coder/quartz v0.2.0 (#18007)
Upgrade to coder/quartz v0.2.0 including fixing up a minor API breaking change.
2025-05-27 16:05:03 +04:00
ケイラ f670bc31f5 chore: update testutil chan helpers (#17408) 2025-04-16 10:37:09 -06:00
Spike Curtis 5f3a53f01b fix: fix race condition in pubsub startup (#17088)
fixes https://github.com/coder/internal/issues/525

If the context is canceled, the goroutine that is supposed to read from the `errCh` could exit prematurely, leading to a goroutine leak. Refactors this code so it cannot block.
2025-03-25 16:45:45 +04:00
Yevhenii Shcherbina 98dfc70f31 fix(coderd/database): remove linux build tags from db package (#16633)
Remove linux build tags from database package to make sure we can run
tests on Mac OS.
2025-02-25 11:39:37 -05:00
Mathias Fredriksson b77b5432c6 test(coderd/database/pubsub): ensure db closure on unhappy paths (#16327) 2025-01-29 14:47:38 +00:00
Thomas Kosiewski 1336925c9f feat(flake.nix): switch dogfood dev image to buildNixShellImage from dockerTools (#16223)
Replace Depot build action with Nix for Nix dogfood image builds

The dogfood Nix image is now built using Nix's native container tooling instead of Depot. This change:

- Adds Nix setup steps to the GitHub Actions workflow
- Removes the Dockerfile.nix in favor of a Nix-native container build
- Updates the flake.nix to support building Docker images
- Introduces a hash file to track Nix-related changes
- Updates the vendorHash for Go dependencies

Change-Id: I4e011fe3a19d9a1375fbfd5223c910e59d66a5d9
Signed-off-by: Thomas Kosiewski <tk@coder.com>
2025-01-28 16:38:37 +01:00
Spike Curtis 5861e516b9 chore: add standard test logger ignoring db canceled (#15556)
Refactors our use of `slogtest` to instantiate a "standard logger" across most of our tests.  This standard logger incorporates https://github.com/coder/slog/pull/217 to also ignore database query canceled errors by default, which are a source of low-severity flakes.

Any test that has set non-default `slogtest.Options` is left alone. In particular, `coderdtest` defaults to ignoring all errors. We might consider revisiting that decision now that we have better tools to target the really common flaky Error logs on shutdown.
2024-11-18 14:09:22 +04:00
Hugo Dutka 1bfa7d42e8 chore: add postgres template caching for tests (#15336)
This PR is the first in a series aimed at closing
[#15109](https://github.com/coder/coder/issues/15109).

### Changes

- **Template Database Creation:**  
`dbtestutil.Open` now has the ability to create a template database if
none is provided via `DB_FROM`. The template database’s name is derived
from a hash of the migration files, ensuring that it can be reused
across tests and is automatically updated whenever migrations change.

- **Optimized Database Handling:**  
Previously, `dbtestutil.Open` would spin up a new container for each
test when `DB_FROM` was unset. Now, it first checks for an active
PostgreSQL instance on `localhost:5432`. If none is found, it creates a
single container that remains available for subsequent tests,
eliminating repeated container startups.

These changes address the long individual test times (10+ seconds)
reported by some users, likely due to the time Docker took to start and
complete migrations.
2024-11-04 17:23:31 +01:00
Spike Curtis 005ea536a5 fix: fix Listen/Unlisten race on Pubsub (#15315)
Fixes #15312

When we need to `Unlisten()` for an event, instead of immediately removing the event from the `p.queues`, we store a channel to signal any goroutines trying to Subscribe to the same event when we are done. On `Subscribe`, if the channel is present, wait for it before calling `Listen` to ensure the ordering is correct.
2024-11-01 14:35:26 +04:00
Mathias Fredriksson bd9151d224 fix(coderd/database/pubsub): prevent listeners read outside mutex lock (#15303)
https://github.com/coder/coder/actions/runs/11611105362/job/32331771969#logs

```
2024-10-31T11:36:45.9225038Z WARNING: DATA RACE
2024-10-31T11:36:45.9225120Z Write at 0x00c0000d8030 by goroutine 26:
2024-10-31T11:36:45.9225200Z   runtime.mapdelete()
2024-10-31T11:36:45.9225412Z       /opt/hostedtoolcache/go/1.22.8/x64/src/runtime/map.go:696 +0x0
2024-10-31T11:36:45.9225647Z   github.com/coder/coder/v2/coderd/database/pubsub.(*PGPubsub).subscribeQueue.func2()
2024-10-31T11:36:45.9225906Z       /home/runner/work/coder/coder/coderd/database/pubsub/pubsub.go:277 +0x131
2024-10-31T11:36:45.9225993Z   runtime.deferreturn()
2024-10-31T11:36:45.9226210Z       /opt/hostedtoolcache/go/1.22.8/x64/src/runtime/panic.go:602 +0x5d
2024-10-31T11:36:45.9226283Z   testing.tRunner()
2024-10-31T11:36:45.9226519Z       /opt/hostedtoolcache/go/1.22.8/x64/src/testing/testing.go:1689 +0x21e
2024-10-31T11:36:45.9226603Z   testing.(*T).Run.gowrap1()
2024-10-31T11:36:45.9226831Z       /opt/hostedtoolcache/go/1.22.8/x64/src/testing/testing.go:1742 +0x44
2024-10-31T11:36:45.9226836Z 
2024-10-31T11:36:45.9226934Z Previous read at 0x00c0000d8030 by goroutine 112:
2024-10-31T11:36:45.9227159Z   github.com/coder/coder/v2/coderd/database/pubsub.(*PGPubsub).subscribeQueue.func2()
2024-10-31T11:36:45.9227462Z       /home/runner/work/coder/coder/coderd/database/pubsub/pubsub.go:284 +0x1b6
2024-10-31T11:36:45.9227661Z   github.com/coder/coder/v2/enterprise/replicasync.(*Manager).subscribe.func3()
2024-10-31T11:36:45.9227936Z       /home/runner/work/coder/coder/enterprise/replicasync/replicasync.go:228 +0x53
2024-10-31T11:36:45.9227941Z 
2024-10-31T11:36:45.9228019Z Goroutine 26 (running) created at:
2024-10-31T11:36:45.9228096Z   testing.(*T).Run()
2024-10-31T11:36:45.9228318Z       /opt/hostedtoolcache/go/1.22.8/x64/src/testing/testing.go:1742 +0x825
2024-10-31T11:36:45.9228498Z   github.com/coder/coder/v2/enterprise/replicasync_test.TestReplica()
2024-10-31T11:36:45.9228777Z       /home/runner/work/coder/coder/enterprise/replicasync/replicasync_test.go:33 +0x4b
2024-10-31T11:36:45.9228847Z   testing.tRunner()
2024-10-31T11:36:45.9229063Z       /opt/hostedtoolcache/go/1.22.8/x64/src/testing/testing.go:1689 +0x21e
2024-10-31T11:36:45.9229142Z   testing.(*T).Run.gowrap1()
2024-10-31T11:36:45.9229366Z       /opt/hostedtoolcache/go/1.22.8/x64/src/testing/testing.go:1742 +0x44
2024-10-31T11:36:45.9229369Z 
2024-10-31T11:36:45.9229443Z Goroutine 112 (finished) created at:
2024-10-31T11:36:45.9229685Z   github.com/coder/coder/v2/enterprise/replicasync.(*Manager).subscribe()
2024-10-31T11:36:45.9229952Z       /home/runner/work/coder/coder/enterprise/replicasync/replicasync.go:226 +0x568
2024-10-31T11:36:45.9230092Z   github.com/coder/coder/v2/enterprise/replicasync.New()
2024-10-31T11:36:45.9230361Z       /home/runner/work/coder/coder/enterprise/replicasync/replicasync.go:101 +0x1344
2024-10-31T11:36:45.9230547Z   github.com/coder/coder/v2/enterprise/replicasync_test.TestReplica.func1()
2024-10-31T11:36:45.9230836Z       /home/runner/work/coder/coder/enterprise/replicasync/replicasync_test.go:48 +0x26a
2024-10-31T11:36:45.9230904Z   testing.tRunner()
2024-10-31T11:36:45.9231127Z       /opt/hostedtoolcache/go/1.22.8/x64/src/testing/testing.go:1689 +0x21e
2024-10-31T11:36:45.9231207Z   testing.(*T).Run.gowrap1()
2024-10-31T11:36:45.9231431Z       /opt/hostedtoolcache/go/1.22.8/x64/src/testing/testing.go:1742 +0x44
```
2024-11-01 11:24:29 +04:00
Mathias Fredriksson 14565615be test(coderd/database/pubsub): fix data race in err assignment (#15306) 2024-10-31 22:37:19 +02:00
Ethan b8944074c4 chore: improve coder server ux (#14761) 2024-09-24 13:16:36 +10:00
Garrett Delfosse ded612d3ec fix: use authenticated urls for pubsub (#14261) 2024-08-26 15:04:04 +00:00
Spike Curtis e5268e4551 chore: spin clock library out to coder/quartz repo (#13777)
Code that was in `/clock` has been moved to github.com/coder/quartz.  This PR refactors our use of the clock library to point to the external Quartz repo.
2024-07-03 15:02:54 +04:00
Spike Curtis 8326a3a675 chore: change mock clock to allow Advance() within timer/tick functions (#13500) 2024-06-10 15:27:24 +04:00
Spike Curtis fade8ba759 fix: fix MeasureLatencyRecvTimeout to accept send=0 (#13477)
Fixes the flake seen here: https://github.com/coder/coder/runs/25832852690

Linux is not a real time operating system, and so there is no guarantee that subsequent `time.Now()` `time.Since()` calls will return a non-zero time.  This assert is mainly there to ensure we don't return `-1`.
2024-06-05 18:27:56 +04:00
Spike Curtis 9c3fd5dd26 chore: add explicit Wait() to clock.Advance() (#13464) 2024-06-05 15:37:16 +04:00
Spike Curtis 42324b386a chore: add clock pkg for testing time (#13461)
Adds a package for testing time/timer/ticker functions.  Implementation is limited to `NewTimer` and `NewContextTicker`, but will eventually be expanded to all `time` functions from the standard library as well as `context.WithTimeout()`, `context.WithDeadline()`.

Replaces `benbjohnson/clock` for the pubsub watchdog, as a proof of concept.

Eventually, as we expand functionality, we will replace most time-related functions with this library for testing.
2024-06-05 13:55:45 +04:00
Colin Adler 85de0e966d chore: fix TestMeasureLatency/MeasureLatencyRecvTimeout flake (#13301) 2024-05-16 13:42:42 -05:00
Danny Kopping 4671ebb330 feat: measure pubsub latencies and expose metrics (#13126) 2024-05-10 12:31:49 +00:00
Cian Johnston 5454f4997b chore(ci): clean up databases after test finishes in CI (#12702) 2024-03-21 14:53:16 +00:00
Spike Curtis 51707446d0 fix: stop holding Pubsub mutex while calling pq.Listener (#12518)
fixes #11950

https://github.com/coder/coder/issues/11950#issuecomment-1987756088 explains the bug

We were also calling into `Unlisten()` and `Close()` while holding the mutex.  I don't believe that `Close()` depends on the notification loop being unblocked, but it's hard to be sure, and the safest thing to do is assume it could block.

So, I added a unit test that fakes out `pq.Listener` and sends a bunch of notifies every time we call into it to hopefully prevent regression where we hold the mutex while calling into these functions.

It also removes the use of a `context.Context` to stop the PubSub -- it must be explicitly `Closed()`.  This simplifies a bunch of the logic, and is how we use the pubsub anyway.
2024-03-12 09:44:12 +04:00
Spike Curtis 213ae69bee fix: start timer before subscribing to avoid test race (#12031)
Fixes #12030

This is a good example of the kind of thing I'd like to address with a time-testing lib.  The problem is that there is a race between the watchdog starting it's timer and the test incrementing the time.  What would make this easier is if the time-testing library could wait for and assert the call to start the timer before incrementing the time.
2024-02-06 20:21:23 +04:00
Dean Sheather 98b86f3cd6 chore: add logs to pq notification dialer (#12020) 2024-02-06 15:21:48 +00:00
Spike Curtis e09cd2c6bd feat: add watchdog to pubsub (#12011)
adds a watchdog to our pubsub and runs it for Coder server.

If the watchdog times out, it triggers a graceful exit in `coder server` to give any provisioner jobs a chance to shut down.

c.f. #11950
2024-02-06 16:58:45 +04:00
Colin Adler c7f52b73bb feat(coderd): add prometheus metrics to servertailnet (#11988) 2024-02-05 23:57:18 -06:00
Spike Curtis d5a98cc6d7 fix: avoid race in TestPGPubsub_Metrics by using Eventually (#11973)
Annoyingly, prometheus Registry collects metrics async, which is causing our test to be racy.  They also don't export enough from the Metric interface for us to replicate a synchronous collect, so we have to use Eventually to test.
2024-02-01 12:10:19 +04:00
Spike Curtis 5a359d50dd feat: add metrics to PGPubsub (#11971)
Adds prometheus metrics to PGPubsub for monitoring its health and performance in production.

Related to #11950 --- additional diagnostics to help figure out what's happening
2024-02-01 11:25:03 +04:00
Spike Curtis a34cada09a feat: add logging to pgPubsub (#11953)
Should be helpful for #11950

Adds a logger to pgPubsub and logs various events, most especially connection and disconnection from postgres.
2024-01-31 15:49:16 +04:00
Spike Curtis 387723a596 fix: close pg PubSub listener to avoid race (#11640)
Fixes flake as seen here: https://github.com/coder/coder/runs/20528529187
2024-01-18 09:18:59 +04:00
Kyle Carberry 22e781eced chore: add /v2 to import module path (#9072)
* chore: add /v2 to import module path

go mod requires semantic versioning with versions greater than 1.x

This was a mechanical update by running:
```
go install github.com/marwan-at-work/mod/cmd/mod@latest
mod upgrade
```

Migrate generated files to import /v2

* Fix gen
2023-08-18 18:55:43 +00:00
Spike Curtis b4057bd74a feat: make pgCoordinator generally available (#8419)
* pgCoord to GA, fix tests

Signed-off-by: Spike Curtis <spike@coder.com>

* Fix generation and coordinator delete RBAC

Signed-off-by: Spike Curtis <spike@coder.com>

* Fix fakeQuerier -> FakeQuerier

Signed-off-by: Spike Curtis <spike@coder.com>

---------

Signed-off-by: Spike Curtis <spike@coder.com>
2023-07-12 13:35:29 +04:00
Kyle Carberry e4b6f5695b chore: separate pubsub into a new package (#8017)
* chore: rename store to dbmock for consistency

* chore: remove redundant dbtype package

This wasn't necessary and forked how we do DB types.

* chore: separate pubsub into a new package

This didn't need to be in database and was bloating it.
2023-06-14 15:34:54 +00:00