coder

mirror of https://github.com/coder/coder.git synced 2026-06-03 21:18:24 +00:00

Author	SHA1	Message	Date
Spike Curtis	6238065185	test: use not before in TestAgentConnectionMonitor_* (#21332) Fixes https://github.com/coder/internal/issues/1203 The matcher I wrote for TestAgentConnectionMonitor tested that `last_disconnected_at` was strictly _after_ the start of the test to ensure it was updated. This is too strict a test because Windows in particular doesn't have high-resolution timers, so it's entirely possible to get the exact same timestamp from subsequent calls to `time.Now()`. This PR switches the test to _not before_ to cover this case. The results are just as valid because we always initialize `last_disconnected_at` to something well before the test starts.	2025-12-22 10:21:39 +04:00
Spike Curtis	71c6dc4043	fix: stop disconnecting from coderd early and record disconnect correctly (#21250) Fixes https://github.com/coder/internal/issues/1196 The above issue exposes two different bugs in Coder. In the agent, there is a race where if the agent is closed while starting up networking, it will erroneously disconnect from Coderd, which delays or breaks writing final status and logs. In Coderd, there is a bug where we don't properly record the latest agent disconnection time if the agent had previously disconnected. This causes us to report the agent status as "Connected" even after it has disconnected, up until the inactivity timeout fires. This PR fixes both issues. It also slightly reworks when we send workspace updates based on connection and disconnection. Previously we would send two updates when the agent connected in certain circumstances, even though the status would be the same in both (only times changed). Now we always send exactly one on connect, and then another on disconnect.	2025-12-15 12:04:01 +04:00
Spike Curtis	f2904726a5	test: wait for completion before asserting in TestAgentConnectionMonitor_BuildOutdated (#19959) Follow-up to #19836. Fixes https://github.com/coder/internal/issues/970 Same issue, different (adjacent) test.	2025-09-26 09:24:11 +04:00
Spike Curtis	655a36c392	test: fix TestAgentConnectionMonitor_PingTimeout race with mock assertions (#19836) Fixes https://github.com/coder/internal/issues/970 The test doesn't wait for `monitor()` to complete, and the mock database call that we assert on takes place in a `defer` within `monitor()`. This allows the mock assertions to race with the defer and flake the test. The solution is to explicitly wait for `monitor()` to complete before the end of the test, so that the mock assertions (which happen in a `t.Cleanup()`) don't race.	2025-09-16 21:54:50 +04:00
ケイラ	f670bc31f5	chore: update testutil chan helpers (#17408)	2025-04-16 10:37:09 -06:00
Spike Curtis	2c7f8ac65f	chore: migrate to coder/websocket 1.8.12 (#15898) Migrates us to `coder/websocket` v1.8.12 rather than `nhooyr/websocket` on an older version. Works around https://github.com/coder/websocket/issues/504 by adding an explicit test for `xerrors.Is(err, io.EOF)` where we were previously getting `io.EOF` from the netConn.	2024-12-19 00:51:30 +04:00
Spike Curtis	5861e516b9	chore: add standard test logger ignoring db canceled (#15556) Refactors our use of `slogtest` to instantiate a "standard logger" across most of our tests. This standard logger incorporates https://github.com/coder/slog/pull/217 to also ignore database query canceled errors by default, which are a source of low-severity flakes. Any test that has set non-default `slogtest.Options` is left alone. In particular, `coderdtest` defaults to ignoring all errors. We might consider revisiting that decision now that we have better tools to target the really common flaky Error logs on shutdown.	2024-11-18 14:09:22 +04:00
Ethan	31506e694b	chore: send workspace pubsub events by owner id (#14964 ) We currently send empty payloads to pubsub channels of the form `workspace:<workspace_id>` to notify listeners of updates to workspaces (such as for refreshing the workspace dashboard). To support https://github.com/coder/coder/issues/14716, we'll instead send `WorkspaceEvent` payloads to pubsub channels of the form `workspace_owner:<owner_id>`. This enables a listener to receive events for all workspaces owned by a user. This PR replaces the usage of the old channels without modifying any existing behaviors. ``` type WorkspaceEvent struct { Kind WorkspaceEventKind `json:"kind"` WorkspaceID uuid.UUID `json:"workspace_id" format:"uuid"` // AgentID is only set for WorkspaceEventKindAgent* events // (excluding AgentTimeout) AgentID *uuid.UUID `json:"agent_id,omitempty" format:"uuid"` } ``` We've defined `WorkspaceEventKind`s based on how the old channel was used, but it's not yet necessary to inspect the types of any of the events, as the existing listeners are designed to fire off any of them. ``` WorkspaceEventKindStateChange WorkspaceEventKind = "state_change" WorkspaceEventKindStatsUpdate WorkspaceEventKind = "stats_update" WorkspaceEventKindMetadataUpdate WorkspaceEventKind = "mtd_update" WorkspaceEventKindAppHealthUpdate WorkspaceEventKind = "app_health" WorkspaceEventKindAgentLifecycleUpdate WorkspaceEventKind = "agt_lifecycle_update" WorkspaceEventKindAgentLogsUpdate WorkspaceEventKind = "agt_logs_update" WorkspaceEventKindAgentConnectionUpdate WorkspaceEventKind = "agt_connection_update" WorkspaceEventKindAgentLogsOverflow WorkspaceEventKind = "agt_logs_overflow" WorkspaceEventKindAgentTimeout WorkspaceEventKind = "agt_timeout" ```	2024-11-01 14:17:05 +11:00
Spike Curtis	b79785c86f	feat: move agent v2 API connection monitoring to yamux layer (#11910) Moves monitoring of the agent v2 API connection to the yamux layer. Present behavior monitors this at the websocket layer, and closes the websocket on completion. This can cause yamux to hit unexpected errors since the connection is closed underneath it. This might be the cause of yamux errors that some customers are seeing. In any case, it's more graceful to close yamux first and let yamux close the underlying websocket. That should limit yamux error logging to truly unexpected/error cases. The only downside is that the yamux `Close()` doesn't accept a reason, so if the agent becomes outdated and we close the API connection, the agent just sees the connection close without a reason. I'm not sure we log this at the agent anyway, but it would be nice. I think more accurate logging on Coderd is more important. I've also added some logging when the monitor disconnects for reasons other than the context being canceled (e.g. agent outdated, failed pings).	2024-02-01 08:18:35 +04:00
Cian Johnston	5ecb0db4f2	chore(coderd): fix test flake in TestAgentWebsocketMonitor_SendPings (#11518)	2024-01-10 08:45:46 +00:00
Steven Masley	dd05a6b13a	chore: mockgen archived, moved to new location (#11415)	2024-01-04 18:35:56 -06:00
Spike Curtis	c9b7d61769	chore: refactor agent connection updates (#11301) Refactors the code that handles monitoring an agent websocket with pings and updating the connection times in the DB. Consolidates v1 and v2 agent APIs under the same code for this. One substantive change (not _just_ a refactor) is that I've made it so that we actually disconnect if the agent fails to respond to our pings, rather than the old behavior where we would update the database, but not actually tear down the websocket.	2024-01-02 16:04:37 +04:00

12 Commits