coder

mirror of https://github.com/coder/coder.git synced 2026-06-04 13:38:21 +00:00

Author	SHA1	Message	Date
Spike Curtis	06e396188f	test: subscribe to heartbeats synchronously on PGCoord startup (#21746 ) fixes: https://github.com/coder/internal/issues/1304 Subscribe to heartbeats synchronously on startup of PGCoordinator. This ensures tests that send heartbeats don't race with this subscription.	2026-01-29 13:34:34 +04:00
Spike Curtis	bddb808b25	chore: arrange imports in a standard way (#21452 ) Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example: ``` import ( "context" "time" "github.com/prometheus/client_golang/prometheus" "golang.org/x/xerrors" "gopkg.in/natefinch/lumberjack.v2" "cdr.dev/slog/v3" "github.com/coder/coder/v2/codersdk/agentsdk" "github.com/coder/serpent" ) ``` 3 groups: standard library, 3rd-party libs, Coder libs. This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.	2026-01-08 15:24:11 +04:00
Spike Curtis	49b34a716a	fix: fix slog to always use array of Fields (#21426 ) Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptable call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder). It also updates dependencies that also use slog and were updated. I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule. Other dependencies, I pushed new tags.	2026-01-08 10:29:41 +04:00
Spike Curtis	05b037bdea	fix: avoid deadlock race writing to a disconnected mapper (#20303 ) fixes https://github.com/coder/internal/issues/1045 Fixes a race condition in our PG Coordinator when a peer disconnects. We issue database queries to find the peer mappings (node structures for each peer connected via a tunnel), and then send these to the "mapper" that generates diffs and eventually writes the update to the websocket. Before this change we erroneously used the querier's context for this update, which has the same lifetime as the coordinator itself. If the peer has disconnected, the mapper might not be reading from its channel, and this causes a deadlock in a querier worker. This also prevents us from doing any more work on the peer. I also added some more debug logging that would have been helpful when tracking this down.	2025-10-15 15:56:07 +04:00
ケイラ	caeff49aba	chore: refactor roles to support multiple permission sets scoped by org id (#20186 ) In preparation for adding the "member" permission level, which will also be grouped by org ID, do a bit of a refactor to make room for it and the existing "org" level to live in the same `map`	2025-10-09 11:08:34 -06:00
Spike Curtis	04dfda8a0e	fix: change enqueue error to debug log level (#19686 ) fixes https://github.com/coder/internal/issues/958 Logging was being done at error level, but most likely any errors are from simple races between an update triggered around the same time as a client disconnecting. Debug is fine for these.	2025-09-03 13:42:02 +04:00
Spike Curtis	345435a04c	feat: modify coordinators to send errors and peers to log them (#17467 ) Adds support to our coordinator implementations to send Error updates before disconnecting clients. I was recently debugging a connection issue where the client was getting repeatedly disconnected from the Coordinator, but since we never send any error information it was really hard without server logs. This PR aims to correct that, by sending a CoordinateResponse with `Error` set in cases where we disconnect a client without them asking us to. It also logs the error whenever we get one in the client controller.	2025-04-21 11:40:56 +04:00
Dean Sheather	fbe2fa66f5	chore: add test for coord rolling restart (#14680 ) Closes https://github.com/coder/team-coconut/issues/50 --------- Co-authored-by: Ethan Dickson <ethan@coder.com>	2024-11-20 18:04:33 +11:00
Spike Curtis	8c00ebc6ee	chore: refactor ServerTailnet to use tailnet.Controllers (#15408 ) chore of #14729 Refactors the `ServerTailnet` to use `tailnet.Controller` so that we reuse logic around reconnection and handling control messages, instead of reimplementing. This unifies our "client" use of the tailscale API across CLI, coderd, and wsproxy.	2024-11-08 13:18:56 +04:00
Spike Curtis	d6154c4310	chore: remove tailnet v1 API support (#14641 ) Drops support for v1 of the tailnet API, which was the original coordination protocol where we only sent node updates, never marked them lost or disconnected. v2 of the tailnet API went GA for CLI clients in Coder 2.8.0, so clients older than that would stop working.	2024-09-12 07:56:31 +04:00
Jon Ayers	4fc047954e	fix: avoid deleting peers on graceful close (#14165 ) * fix: avoid deleting peers on graceful close - Fixes an issue where a coordinator deletes all its peers on shutdown. This can cause disconnects whenever a coderd is redeployed.	2024-08-14 15:16:08 -04:00
Spike Curtis	e5268e4551	chore: spin clock library out to coder/quartz repo (#13777 ) Code that was in `/clock` has been moved to github.com/coder/quartz. This PR refactors our use of the clock library to point to the external Quartz repo.	2024-07-03 15:02:54 +04:00
Spike Curtis	ce7f13c6c3	fix: fix TestPGCoordinatorSingle_MissedHeartbeats flake (#13686 )	2024-06-27 19:17:24 +04:00
Steven Masley	5ccf5084e8	chore: create type for unique role names (#13506 ) * chore: create type for unique role names Using `string` was confusing when something should be combined with org context, and when not to. Naming this new name, "RoleIdentifier"	2024-06-11 08:55:28 -05:00
Spike Curtis	8326a3a675	chore: change mock clock to allow Advance() within timer/tick functions (#13500 )	2024-06-10 15:27:24 +04:00
Spike Curtis	a0962ba089	fix: wait for PGCoordinator to clean up db state (#13351 ) c.f. https://github.com/coder/coder/pull/13192#issuecomment-2097657692 We need to wait for PGCoordinator to finish its work before returning on `Close()`, so that we delete database state (best effort -- if this fails others will filter it out based on heartbeats).	2024-05-24 12:01:03 +04:00
Steven Masley	1f5788feff	chore: remove rbac pseudo resources, add custom verbs (#13276 ) Removes our pseudo rbac resources like `WorkspaceApplicationConnect` in favor of additional verbs like `ssh`. This is to make more intuitive permissions for building custom roles. The source of truth is now `policy.go`	2024-05-15 11:09:42 -05:00
Steven Masley	cb6b5e8fbd	chore: push rbac actions to policy package (#13274 ) Just moved `rbac.Action` -> `policy.Action`. This is for the stacked PR to not have circular dependencies when doing autogen. Without this, the autogen can produce broken golang code, which prevents the autogen from compiling. So just avoiding circular dependencies. Doing this in its own PR to reduce LoC diffs in the primary PR, since this has 0 functional changes.	2024-05-15 09:46:35 -05:00
Colin Adler	205c43da99	fix(enterprise): mark nodes from unhealthy coordinators as lost (#13123 ) Instead of removing the mappings of unhealthy coordinators entirely, mark them as lost instead. This prevents peers from disappearing from other peers if a coordinator misses a heartbeat.	2024-05-03 14:07:29 -05:00
Colin Adler	777dfbe965	feat(enterprise): add ready for handshake support to pgcoord (#12935 )	2024-04-16 15:01:10 -05:00
Spike Curtis	06eae954c9	fix: stop sending DeleteTailnetPeer when coordinator is unhealthy (#12925 ) fixes #12923 Prevents Coordinate peer connections from generating spurious database queries like DeleteTailnetPeer when the coordinator is unhealthy. It does this by checking the health of the querier before accepting a connection, rather than unconditionally accepting it only for it to get swatted down later.	2024-04-10 22:49:13 +04:00
Colin Adler	e5d911462f	fix(tailnet): enforce valid agent and client addresses (#12197 ) This adds the ability for `TunnelAuth` to also authorize incoming wireguard node IPs, preventing agents from reporting anything other than their static IP generated from the agent ID.	2024-03-01 09:02:33 -06:00
Spike Curtis	627232eae9	fix: fix pgcoord to delete coordinator row last (#12155 ) Fixes #12141 Fixes #11750 PGCoord shutdown was uncoordinated, so an update at an inopportune time during shutdown would be rejected because the coordinator row was already deleted. This PR ensures that the PGCoord subcomponents that write updates are shut down before we take down the heartbeats, which is responsible for deleting the coordinator row.	2024-02-15 16:34:29 +04:00
Spike Curtis	1c8b803785	feat: add logging to pgcoord subscribe/unsubscribe (#11952 ) Adds logging to unsubscribing from peer and tunnel updates in pgcoordinator, since #11950 seems to be a problem with these subscriptions	2024-01-31 12:15:58 +04:00
Cian Johnston	ecae6f9135	fix(enterprise/tailnet): handle query canceled error in sendBeat() (#11794 )	2024-01-24 18:42:05 +00:00
Spike Curtis	cae095fdb6	fix: stop logging errors on canceled cleanup queries (#11547 ) Fixes flake seen here: https://github.com/coder/coder/actions/runs/7474259128/job/20340051975	2024-01-10 16:20:29 +04:00
Spike Curtis	f2606a78dd	fix: avoid converting nil node fixes: #11276	2023-12-19 13:38:15 +04:00
Dean Sheather	e46431078c	feat: add AgentAPI using DRPC (#10811 ) Co-authored-by: Spike Curtis <spike@coder.com>	2023-12-18 22:53:28 +10:00
Spike Curtis	ad3fed72bc	chore: rename Coordinator to CoordinatorV1 (#11222 ) Renames the tailnet.Coordinator to represent both v1 and v2 APIs, so that we can use this interface for the main atomic pointer. Part of #10532	2023-12-15 11:38:12 +04:00
Spike Curtis	bf3b35b1e2	fix: stop logging context Canceled as error (#11177 ) fixes #11166 and a related log that could have the same problem	2023-12-13 13:08:30 +04:00
Spike Curtis	b34ecf1e9e	fix: fix deadlock of mappingQuery on context canceled Fixes #11078 replace bare channel send with SendCtx so that we properly shut down when context is canceled.	2023-12-07 17:19:18 +04:00
Spike Curtis	2c86d0bed0	feat: support v2 Tailnet API in AGPL coordinator (#11010 ) Fixes #10529	2023-12-06 15:04:28 +04:00
Spike Curtis	612e67a53b	feat: add cleanup of lost tailnet peers and tunnels to PGCoordinator (#10939 ) Adds the "lost" peer cleanup queries to PGCoordinator, including tests.	2023-12-01 10:13:29 +04:00
Spike Curtis	0cab6e7763	feat: support graceful disconnect in PGCoordinator (#10937 ) Adds support for graceful disconnect to PGCoordinator. When peers gracefully disconnect, they send a disconnect message. This triggers the peer to be disconnected from all tunneled peers. The Multi-Agent Client supports graceful disconnect, since it is in memory and we know that when it is closed, we really mean to disconnect. The v1 agent and client Websocket connections do not support graceful disconnect, since the v1 protocol doesn't have this feature. That means that if a v1 peer connects to a v2 peer, when the v1 peer's coordinator connection is closed, the v2 peer will see it as "lost" since we don't know whether the v1 peer meant to disconnect, or it just lost connectivity to the coordinator.	2023-12-01 09:55:25 +04:00
Spike Curtis	52901e1219	feat: implement HTMLDebug for PGCoord with v2 API (#10914 ) Implements HTMLDebug for the PGCoordinator with the new v2 API and related DB tables.	2023-11-28 22:37:20 +04:00
Spike Curtis	5c48cb4447	feat: modify PG Coordinator to work with new v2 Tailnet API (#10573 ) re: #10528 Refactors PG Coordinator to work with the Tailnet v2 API, including wrappers for the existing v1 API. The debug endpoint functions, but doesn't return sensible data; that will be in another stacked PR.	2023-11-20 14:31:04 +04:00
Colin Adler	36f3151b71	fix(enterprise/tailnet): properly detect legacy agents (#10083 )	2023-10-06 16:49:26 +00:00
Colin Adler	c900b5f8df	feat: add single tailnet support to pgcoord (#9351 )	2023-09-21 14:30:48 -05:00
Spike Curtis	a415395e9e	fix: stop dropping error log on context canceled after heartbeat (#9427 ) Signed-off-by: Spike Curtis <spike@coder.com>	2023-08-30 14:44:00 +04:00
Kyle Carberry	22e781eced	chore: add /v2 to import module path (#9072 ) * chore: add /v2 to import module path go mod requires semantic versioning with versions greater than 1.x This was a mechanical update by running: ``` go install github.com/marwan-at-work/mod/cmd/mod@latest mod upgrade ``` Migrate generated files to import /v2 * Fix gen	2023-08-18 18:55:43 +00:00
Spike Curtis	2f46f2315c	fix: fix race in PGCoord at startup (#9144 ) Signed-off-by: Spike Curtis <spike@coder.com>	2023-08-18 09:53:03 +04:00
Spike Curtis	c7a6d626b4	fix: make PGCoordinator close connections when unhealthy (#9125 ) Signed-off-by: Spike Curtis <spike@coder.com>	2023-08-17 09:36:47 +04:00
Colin Adler	bc862fa493	chore: upgrade tailscale to v1.46.1 (#8913 )	2023-08-09 19:50:26 +00:00
Colin Adler	0b4f333a6f	chore: add http debug support to pgcoord (#8795 )	2023-07-28 17:59:31 -05:00
Colin Adler	dd2f79995b	chore(tailnet): rewrite coordinator debug using `html/template` (#8752 )	2023-07-26 22:54:21 +00:00
Colin Adler	6b92abebb9	fix(tailnet): track agent names for http debug (#8744 )	2023-07-26 18:44:10 +00:00
Colin Adler	1cb39fc65d	test: ignore more spurious pgcoord errors (#8628 )	2023-07-20 19:55:25 +00:00
Colin Adler	00b9a3ce58	fix: prevent error log when `pgcoord` query is canceled (#8609 )	2023-07-19 16:40:57 -05:00
Colin Adler	c47b78c44b	chore: replace wsconncache with a single tailnet (#8176 )	2023-07-12 17:37:31 -05:00
Spike Curtis	b4057bd74a	feat: make pgCoordinator generally available (#8419 ) * pgCoord to GA, fix tests Signed-off-by: Spike Curtis <spike@coder.com> * Fix generation and coordinator delete RBAC Signed-off-by: Spike Curtis <spike@coder.com> * Fix fakeQuerier -> FakeQuerier Signed-off-by: Spike Curtis <spike@coder.com> --------- Signed-off-by: Spike Curtis <spike@coder.com>	2023-07-12 13:35:29 +04:00


54 Commits