Commit Graph

40 Commits

Author SHA1 Message Date
Hugo Dutka e62c5db678 chore: remove references to dbtestutil.WillUsePostgres (#20436)
Addresses https://github.com/coder/internal/issues/758.

This PR only cleans up dead code, it makes no changes to test logic.
2025-10-23 14:24:54 +02:00
Ethan 50704a5014 ci: improve 'tfail in goroutine' ruleguard rule (#19682)
This PR improves the ruleguard rule for detecting `t.Fail` calls in goroutines. It picks up additional violations, which are fixed in this PR.
See self-review for details.

The motivation for fixing this comes from a flake I fixed in https://github.com/coder/coder/pull/19599, where tests would fail from a `require` in an `Eventually`.
2025-09-04 14:28:29 +10:00
Spike Curtis 6c0bed0f53 chore: update to coder/quartz v0.2.0 (#18007)
Upgrade to coder/quartz v0.2.0 including fixing up a minor API breaking change.
2025-05-27 16:05:03 +04:00
Spike Curtis 345435a04c feat: modify coordinators to send errors and peers to log them (#17467)
Adds support to our coordinator implementations to send Error updates before disconnecting clients.

I was recently debugging a connection issue where the client was getting repeatedly disconnected from the Coordinator, but since we never send any error information it was really hard to diagnose without server logs.

This PR aims to correct that, by sending a CoordinateResponse with `Error` set in cases where we disconnect a client without them asking us to.

It also logs the error whenever we get one in the client controller.
2025-04-21 11:40:56 +04:00
ケイラ f670bc31f5 chore: update testutil chan helpers (#17408) 2025-04-16 10:37:09 -06:00
Mathias Fredriksson c069563af1 test: fix use of t.Logf where t.Log would suffice (#16328) 2025-01-29 14:35:04 +00:00
Cian Johnston 7b88776403 chore(testutil): add testutil.GoleakOptions (#16070)
- Adds `testutil.GoleakOptions` and consolidates existing options to
this location
- Pre-emptively adds required ignore for this Dependabot PR to pass CI
https://github.com/coder/coder/pull/16066
2025-01-08 15:38:37 +00:00
Spike Curtis 63572d9f53 fix: loosen timing checks for heartbeats (#15923)
Fixes #15782.

I believe that Windows doesn't always have high-resolution timers available, so this PR loosens the check for PG Coordinator heartbeats, to avoid flakes like:

https://github.com/coder/coder/actions/runs/12397381823/job/34607639048
2024-12-19 13:49:01 +04:00
Hugo Dutka 83c493e832 chore: fix more flaky tests on Windows with Postgres (#15629)
Addresses the following flakes:

- https://github.com/coder/internal/issues/222
- https://github.com/coder/internal/issues/223
- https://github.com/coder/internal/issues/224
- https://github.com/coder/internal/issues/225
- https://github.com/coder/internal/issues/226
- https://github.com/coder/internal/issues/227
- https://github.com/coder/internal/issues/228
- https://github.com/coder/internal/issues/229
- https://github.com/coder/internal/issues/230
2024-11-26 11:56:07 +01:00
Spike Curtis 5861e516b9 chore: add standard test logger ignoring db canceled (#15556)
Refactors our use of `slogtest` to instantiate a "standard logger" across most of our tests.  This standard logger incorporates https://github.com/coder/slog/pull/217 to also ignore database query canceled errors by default, which are a source of low-severity flakes.

Any test that has set non-default `slogtest.Options` is left alone. In particular, `coderdtest` defaults to ignoring all errors. We might consider revisiting that decision now that we have better tools to target the really common flaky Error logs on shutdown.
2024-11-18 14:09:22 +04:00
Hugo Dutka 1bfa7d42e8 chore: add postgres template caching for tests (#15336)
This PR is the first in a series aimed at closing
[#15109](https://github.com/coder/coder/issues/15109).

### Changes

- **Template Database Creation:**  
`dbtestutil.Open` now has the ability to create a template database if
none is provided via `DB_FROM`. The template database’s name is derived
from a hash of the migration files, ensuring that it can be reused
across tests and is automatically updated whenever migrations change.

- **Optimized Database Handling:**  
Previously, `dbtestutil.Open` would spin up a new container for each
test when `DB_FROM` was unset. Now, it first checks for an active
PostgreSQL instance on `localhost:5432`. If none is found, it creates a
single container that remains available for subsequent tests,
eliminating repeated container startups.

These changes address the long individual test times (10+ seconds)
reported by some users, likely due to the time Docker took to start and
complete migrations.
2024-11-04 17:23:31 +01:00
Ethan b1298a3c1e feat: add WorkspaceUpdates tailnet RPC (#14847)
Closes #14716
Closes #14717

Adds a new user-scoped tailnet API endpoint (`api/v2/tailnet`) with a new RPC stream for receiving updates on workspaces owned by a specific user, as defined in #14716. 

When a stream is started, the `WorkspaceUpdatesProvider` will begin listening on the user-scoped pubsub events implemented in #14964. When a relevant event type is seen (such as a workspace state transition), the provider will query the DB for all the workspaces (and agents) owned by the user. This gets compared against the result of the previous query to produce a set of workspace updates. 

Workspace updates can be requested for any user ID; however, only workspaces the authorised user is permitted to `ActionRead` will have their updates streamed.
Opening a tunnel to an agent requires that the user can perform `ActionSSH` against the workspace containing it.
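
The "compare against the previous query" step can be pictured with a small sketch. The types `workspace` and `update` and the function `diff` are hypothetical names for illustration; the real `WorkspaceUpdatesProvider` also tracks agents and uses its own protobuf types.

```go
package main

import (
	"fmt"
	"sort"
)

// workspace is a hypothetical, simplified row from the DB query.
type workspace struct {
	Name   string
	Status string
}

// update is a hypothetical result of comparing two query snapshots.
type update struct {
	Upserted []string // IDs that are new or whose fields changed
	Deleted  []string // IDs present before but missing now
}

// diff compares the previous snapshot with the latest one, keyed by ID.
func diff(prev, next map[string]workspace) update {
	var u update
	for id, ws := range next {
		if old, ok := prev[id]; !ok || old != ws {
			u.Upserted = append(u.Upserted, id)
		}
	}
	for id := range prev {
		if _, ok := next[id]; !ok {
			u.Deleted = append(u.Deleted, id)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(u.Upserted)
	sort.Strings(u.Deleted)
	return u
}

func main() {
	prev := map[string]workspace{
		"ws-1": {Name: "dev", Status: "running"},
		"ws-2": {Name: "old", Status: "stopped"},
	}
	next := map[string]workspace{
		"ws-1": {Name: "dev", Status: "stopped"}, // state transition
		"ws-3": {Name: "new", Status: "starting"},
	}
	fmt.Printf("%+v\n", diff(prev, next))
}
```

Re-querying and diffing on each event keeps the stream correct even if individual pubsub events are coalesced, at the cost of one query per relevant event.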
2024-11-01 14:53:53 +11:00
Spike Curtis 7d9f5ab81d chore: add Coder service prefix to tailnet (#14943)
re: #14715

This PR introduces the Coder service prefix: `fd60:627a:a42b::/48` and refactors our existing code to refer to the Tailscale service prefix explicitly (rather than implicitly).

Removes the unused `Addresses` agent option. All clients today assume they can compute the Agent's IP address based on its UUID, so an agent started with a custom address would break things.
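
As a rough illustration of computing an address from a UUID under a /48 service prefix, the sketch below overwrites the leading bytes of a 16-byte ID with the prefix. The function `addrFromID` and the exact byte layout are assumptions for the example, not a statement of how the tailnet package does it.

```go
package main

import (
	"fmt"
	"net/netip"
)

// coderServicePrefix is the prefix named in the commit message.
var coderServicePrefix = netip.MustParsePrefix("fd60:627a:a42b::/48")

// addrFromID is a hypothetical helper: it replaces the leading bytes of a
// 16-byte ID with the prefix bytes and returns the result as an IPv6
// address. It assumes the prefix length is a multiple of 8 bits.
func addrFromID(prefix netip.Prefix, id [16]byte) netip.Addr {
	base := prefix.Addr().As16()
	n := prefix.Bits() / 8 // 48 bits -> 6 bytes
	copy(id[:n], base[:n]) // id is a copy, so the caller's value is untouched
	return netip.AddrFrom16(id)
}

func main() {
	// A made-up ID standing in for an agent UUID.
	id := [16]byte{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
	addr := addrFromID(coderServicePrefix, id)
	fmt.Println(addr, coderServicePrefix.Contains(addr))
}
```

With a scheme like this, any client that knows the agent's UUID can derive the same address without asking the agent, which is why a custom `Addresses` option would break that assumption.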
2024-10-04 10:04:10 +04:00
Spike Curtis d6154c4310 chore: remove tailnet v1 API support (#14641)
Drops support for v1 of the tailnet API, which was the original coordination protocol where we only sent node updates, never marked them lost or disconnected.

v2 of the tailnet API went GA for CLI clients in Coder 2.8.0, so clients older than that would stop working.
2024-09-12 07:56:31 +04:00
Spike Curtis fb3523b37f chore: remove legacy AgentIP address (#14640)
Removes the support for the Agent's "legacy IP" which was a hardcoded IP address all agents used to use, before we introduced "single tailnet". Single tailnet went GA in 2.7.0.
2024-09-12 07:40:19 +04:00
Jon Ayers 4fc047954e fix: avoid deleting peers on graceful close (#14165)
* fix: avoid deleting peers on graceful close

- Fixes an issue where a coordinator deletes all
  its peers on shutdown. This can cause disconnects
  whenever a coderd is redeployed.
2024-08-14 15:16:08 -04:00
Spike Curtis e5268e4551 chore: spin clock library out to coder/quartz repo (#13777)
Code that was in `/clock` has been moved to github.com/coder/quartz.  This PR refactors our use of the clock library to point to the external Quartz repo.
2024-07-03 15:02:54 +04:00
Spike Curtis ce7f13c6c3 fix: fix TestPGCoordinatorSingle_MissedHeartbeats flake (#13686) 2024-06-27 19:17:24 +04:00
Spike Curtis 8326a3a675 chore: change mock clock to allow Advance() within timer/tick functions (#13500) 2024-06-10 15:27:24 +04:00
Spike Curtis a0962ba089 fix: wait for PGCoordinator to clean up db state (#13351)
c.f. https://github.com/coder/coder/pull/13192#issuecomment-2097657692

We need to wait for PGCoordinator to finish its work before returning on `Close()`, so that we delete database state (best effort -- if this fails others will filter it out based on heartbeats).
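
A minimal sketch of the "wait before returning from `Close()`" pattern, using a `sync.WaitGroup`. The `coordinator` type and its `cleanup` method are hypothetical stand-ins; the real PGCoordinator has more goroutines and performs database deletes here.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// coordinator is a hypothetical stand-in for a component that owns
// background work which must finish before Close returns.
type coordinator struct {
	cancel    context.CancelFunc
	wg        sync.WaitGroup
	closeOnce sync.Once
	cleaned   bool
}

func newCoordinator() *coordinator {
	ctx, cancel := context.WithCancel(context.Background())
	c := &coordinator{cancel: cancel}
	c.wg.Add(1)
	go func() {
		defer c.wg.Done()
		<-ctx.Done() // run until asked to stop
		c.cleanup()  // best-effort cleanup on the way out
	}()
	return c
}

// cleanup stands in for deleting this coordinator's database state.
func (c *coordinator) cleanup() { c.cleaned = true }

// Close signals the worker to stop and blocks until cleanup has finished.
func (c *coordinator) Close() error {
	c.closeOnce.Do(func() {
		c.cancel()
		c.wg.Wait()
	})
	return nil
}

func main() {
	c := newCoordinator()
	_ = c.Close()
	fmt.Println("cleaned:", c.cleaned)
}
```

Without the `Wait`, `Close()` could return while cleanup is still in flight, and a test or process that exits right afterwards might leave the state behind.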
2024-05-24 12:01:03 +04:00
Colin Adler 205c43da99 fix(enterprise): mark nodes from unhealthy coordinators as lost (#13123)
Instead of removing the mappings of unhealthy coordinators entirely,
mark them as lost instead. This prevents peers from disappearing from
other peers if a coordinator misses a heartbeat.
2024-05-03 14:07:29 -05:00
Colin Adler 777dfbe965 feat(enterprise): add ready for handshake support to pgcoord (#12935) 2024-04-16 15:01:10 -05:00
Colin Adler 4d5a7b2d56 chore(codersdk): move all tailscale imports out of codersdk (#12735)
Currently, importing `codersdk` just to interact with the API requires
importing tailscale, which causes builds to fail unless manually using
our fork.
2024-03-26 12:44:31 -05:00
Colin Adler e5d911462f fix(tailnet): enforce valid agent and client addresses (#12197)
This adds the ability for `TunnelAuth` to also authorize incoming wireguard node IPs, preventing agents from reporting anything other than their static IP generated from the agent ID.
2024-03-01 09:02:33 -06:00
Spike Curtis f01cab9894 feat: use tailnet v2 API for coordination (#11638)
This one is huge, and I'm sorry.

The problem is that once I change `tailnet.Conn` to start doing v2 behavior, I kind of have to change it everywhere, including in CoderSDK (CLI), the agent, wsproxy, and ServerTailnet.

There is still a bit more cleanup to do, and I need to add code so that when we lose connection to the Coordinator, we mark all peers as LOST, but that will be in a separate PR since this is big enough!
2024-01-22 11:07:50 +04:00
Steven Masley dd05a6b13a chore: mockgen archived, moved to new location (#11415)
* chore: mockgen archived, moved to new location
2024-01-04 18:35:56 -06:00
Spike Curtis f2606a78dd fix: avoid converting nil node
fixes: #11276
2023-12-19 13:38:15 +04:00
Spike Curtis ad3fed72bc chore: rename Coordinator to CoordinatorV1 (#11222)
Renames the tailnet.Coordinator to represent both v1 and v2 APIs, so that we can use this interface for the main atomic pointer.

Part of #10532
2023-12-15 11:38:12 +04:00
Spike Curtis 2c86d0bed0 feat: support v2 Tailnet API in AGPL coordinator (#11010)
Fixes #10529
2023-12-06 15:04:28 +04:00
Spike Curtis 612e67a53b feat: add cleanup of lost tailnet peers and tunnels to PGCoordinator (#10939)
Adds the "lost" peer cleanup queries to PGCoordinator, including tests.
2023-12-01 10:13:29 +04:00
Spike Curtis 0cab6e7763 feat: support graceful disconnect in PGCoordinator (#10937)
Adds support for graceful disconnect to PGCoordinator.  When peers gracefully disconnect, they send a disconnect message.  This triggers the peer to be disconnected from all tunneled peers.

The Multi-Agent Client supports graceful disconnect, since it is in memory and we know that when it is closed, we really mean to disconnect.

The v1 agent and client Websocket connections do not support graceful disconnect, since the v1 protocol doesn't have this feature.  That means that if a v1 peer connects to a v2 peer, when the v1 peer's coordinator connection is closed, the v2 peer will see it as "lost" since we don't know whether the v1 peer meant to disconnect, or it just lost connectivity to the coordinator.
2023-12-01 09:55:25 +04:00
Spike Curtis 5c48cb4447 feat: modify PG Coordinator to work with new v2 Tailnet API (#10573)
re: #10528

Refactors PG Coordinator to work with the Tailnet v2 API, including wrappers for the existing v1 API.

The debug endpoint functions but doesn't return sensible data; that will be in another stacked PR.
2023-11-20 14:31:04 +04:00
Spike Curtis fbabb43cbb fix: ignore spurious node updates while waiting for errors (#10175)
fixes #9921
2023-10-11 09:22:20 +04:00
Colin Adler c900b5f8df feat: add single tailnet support to pgcoord (#9351) 2023-09-21 14:30:48 -05:00
Kyle Carberry 22e781eced chore: add /v2 to import module path (#9072)
* chore: add /v2 to import module path

go mod requires semantic versioning with versions greater than 1.x

This was a mechanical update by running:
```
go install github.com/marwan-at-work/mod/cmd/mod@latest
mod upgrade
```

Migrate generated files to import /v2

* Fix gen
2023-08-18 18:55:43 +00:00
Spike Curtis 2f46f2315c fix: fix race in PGCoord at startup (#9144)
Signed-off-by: Spike Curtis <spike@coder.com>
2023-08-18 09:53:03 +04:00
Spike Curtis c7a6d626b4 fix: make PGCoordinator close connections when unhealthy (#9125)
Signed-off-by: Spike Curtis <spike@coder.com>
2023-08-17 09:36:47 +04:00
Spike Curtis c0a01ec81c fix: fix TestPGCoordinatorDual_Mainline flake (#8228)
* fix TestPGCoordinatorDual_Mainline flake

Signed-off-by: Spike Curtis <spike@coder.com>

* use slices.Contains instead of local function

Signed-off-by: Spike Curtis <spike@coder.com>

---------

Signed-off-by: Spike Curtis <spike@coder.com>
2023-06-28 11:37:45 +04:00
Spike Curtis 5d48122f12 fix: fix PG Coordinator to update when heartbeats (re)start (#8178)
* fix: fix PG Coordinator to update when heartbeats (re)start

Signed-off-by: Spike Curtis <spike@coder.com>

* rename resetExpiryTimer(WithLock)

Signed-off-by: Spike Curtis <spike@coder.com>

---------

Signed-off-by: Spike Curtis <spike@coder.com>
2023-06-23 10:38:58 +00:00
Spike Curtis cc17d2feea refactor: add postgres tailnet coordinator (#8044)
* postgres tailnet coordinator

Signed-off-by: Spike Curtis <spike@coder.com>

* Fix db migration; tests

Signed-off-by: Spike Curtis <spike@coder.com>

* Add fixture, regenerate

Signed-off-by: Spike Curtis <spike@coder.com>

* Fix fixtures

Signed-off-by: Spike Curtis <spike@coder.com>

* review comments, run clean gen

Signed-off-by: Spike Curtis <spike@coder.com>

* Rename waitForConn -> cleanupConn

Signed-off-by: Spike Curtis <spike@coder.com>

* code review updates

Signed-off-by: Spike Curtis <spike@coder.com>

* db migration order

Signed-off-by: Spike Curtis <spike@coder.com>

* fix log field name last_heartbeat

Signed-off-by: Spike Curtis <spike@coder.com>

* fix heartbeat_from log field

Signed-off-by: Spike Curtis <spike@coder.com>

* fix slog fields for linting

Signed-off-by: Spike Curtis <spike@coder.com>

---------

Signed-off-by: Spike Curtis <spike@coder.com>
2023-06-21 16:20:58 +04:00