coder

mirror of https://github.com/coder/coder.git synced 2026-06-02 20:48:20 +00:00

Author	SHA1	Message	Date
Spike Curtis	bddb808b25	chore: arrange imports in a standard way (#21452 ) Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example: ``` import ( "context" "time" "github.com/prometheus/client_golang/prometheus" "golang.org/x/xerrors" "gopkg.in/natefinch/lumberjack.v2" "cdr.dev/slog/v3" "github.com/coder/coder/v2/codersdk/agentsdk" "github.com/coder/serpent" ) ``` 3 groups: standard library, third-party libs, Coder libs. This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.	2026-01-08 15:24:11 +04:00
Spike Curtis	49b34a716a	fix: fix slog to always use array of Fields (#21426 ) Upgrades to slog v3, which includes a small but backward-incompatible API change to the acceptable call arguments when logging. This change allows us to verify via compile-time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder). It also updates dependencies that also use slog and were updated. I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule. For the other dependencies, I pushed new tags.	2026-01-08 10:29:41 +04:00
Zach	07924037e7	feat: add boundary log forwarding from agent to coderd (#21345 ) Add agent forwarding of boundary audit logs from workspaces to coderd via agent API, and re-emission of boundary logs to coderd stderr. This change adds a server to the workspace agent that always listens on a unix socket for boundary to connect and send audit logs. coderd log format example: ``` [API] 2025-12-23 18:31:46.755 [info] coderd.agentrpc: boundary_request owner=.. workspace_name=.. agent_name=.. decision=.. workspace_id=.. http_method=.. http_url=.. event_time=.. request_id=.. ``` Corresponding boundary PR: https://github.com/coder/boundary/pull/124 RFC: https://www.notion.so/coderhq/Agent-Boundary-Logs-2afd579be59280f29629fc9823ac41ba https://github.com/coder/coder/issues/21280	2025-12-31 16:38:19 -07:00
Zach	9d1493a13a	feat: add initial API for boundary log forwarding to coderd (#21293 ) Add the AgentAPI changes to support the feature that transmits boundary logs from workspaces to coderd via the agent API for eventual re-emission to stderr. The API handlers are stubs for now because I'm trying to land this feature from multiple smaller PRs. High level architecture: - Boundary records resource access in batches and sends proto message to agent - Agent proxies messages to coderd (captured by the API changes in this PR) - coderd re-emits logs to stderr RFC: https://www.notion.so/coderhq/Agent-Boundary-Logs-2afd579be59280f29629fc9823ac41ba	2025-12-19 10:41:39 -07:00
Callum Styan	8ed1c1d372	perf: reduce calls to GetWorkspaceByAgentID in GetWorkspaceAgentByID (#21046 ) This PR piggybacks on the agent API cached workspace added in an earlier PR to provide a fast path for avoiding `GetWorkspaceByAgentID` calls in dbauthz's `GetWorkspaceAgentByID`. This query is not the most expensive, but it has a significant call volume at ~16 million calls per week. Signed-off-by: Callum Styan <callumstyan@gmail.com>	2025-12-10 14:03:24 -08:00
Callum Styan	d22d34e45b	fix: pass context with authorization to agentapi (#20959 ) The agentapi context needs to be a context with some amount of authorization attached to it via the context so that the cache refresh routine can fetch the workspace from the db via GetWorkspaceForAgentID. --------- Signed-off-by: Callum Styan <callumstyan@gmail.com>	2025-11-26 14:53:16 -08:00
Callum Styan	b0e8384b82	perf: reduce DB calls to `GetWorkspaceByAgentID` via caching workspace info (#20662 ) --------- Signed-off-by: Callum Styan <callumstyan@gmail.com>	2025-11-25 14:45:05 -08:00
Callum Styan	45c43d4ec4	fix: refactor agent resource monitoring API to avoid excessive calls to DB (#20430 ) This should resolve https://github.com/coder/internal/issues/728 by refactoring the ResourceMonitorAPI struct to only require querying the resource monitor once for memory and once for volumes, then using the stored monitors on the API struct from that point on. This should eliminate the vast majority of calls to `GetWorkspaceByAgentID` and `FetchVolumesResourceMonitorsUpdatedAfter`/`FetchMemoryResourceMonitorsUpdatedAfter` (millions of calls per week). Tests passed, and I ran an instance of coder via a workspace with a template that added resource monitoring every 10s. Note that this is the default docker container, so there are other sources of `GetWorkspaceByAgentID` db queries. Note that this workspace was running for ~15 minutes at the time I gathered this data. Over 30s for the `ResourceMonitor` calls: ``` coder@callum-coder-2:~/coder$ curl localhost:19090/metrics | grep ResourceMonitor | grep count coderd_db_query_latencies_seconds_count{query="FetchMemoryResourceMonitorsByAgentID"} 2 coderd_db_query_latencies_seconds_count{query="FetchMemoryResourceMonitorsUpdatedAfter"} 2 coderd_db_query_latencies_seconds_count{query="FetchVolumesResourceMonitorsByAgentID"} 2 coderd_db_query_latencies_seconds_count{query="FetchVolumesResourceMonitorsUpdatedAfter"} 2 coderd_db_query_latencies_seconds_count{query="UpdateMemoryResourceMonitor"} 155 coderd_db_query_latencies_seconds_count{query="UpdateVolumeResourceMonitor"} 155 coder@callum-coder-2:~/coder$ curl localhost:19090/metrics | grep ResourceMonitor | grep count coderd_db_query_latencies_seconds_count{query="FetchMemoryResourceMonitorsByAgentID"} 2 coderd_db_query_latencies_seconds_count{query="FetchMemoryResourceMonitorsUpdatedAfter"} 2 coderd_db_query_latencies_seconds_count{query="FetchVolumesResourceMonitorsByAgentID"} 2 coderd_db_query_latencies_seconds_count{query="FetchVolumesResourceMonitorsUpdatedAfter"} 2 coderd_db_query_latencies_seconds_count{query="UpdateMemoryResourceMonitor"} 158 coderd_db_query_latencies_seconds_count{query="UpdateVolumeResourceMonitor"} 158 ``` And over 1m for the `GetWorkspaceByAgentID` calls, the majority of which are from the workspace metadata stats updates: ``` coder@callum-coder-2:~/coder$ curl localhost:19090/metrics | grep GetWorkspaceByAgentID | grep count coderd_db_query_latencies_seconds_count{query="GetWorkspaceByAgentID"} 876 coder@callum-coder-2:~/coder$ curl localhost:19090/metrics | grep GetWorkspaceByAgentID | grep count coderd_db_query_latencies_seconds_count{query="GetWorkspaceByAgentID"} 918 ``` --------- Signed-off-by: Callum Styan <callumstyan@gmail.com>	2025-10-28 13:38:16 -07:00
Ethan	08e17a07fc	chore!: route connection logs to new table (#18340 ) ### Breaking Change (changelog note): > User connections to workspaces, and the opening of workspace apps or ports will no longer create entries in the audit log. Those events will now be included in the 'Connection Log'. Please see the 'Connection Log' page in the dashboard, and the Connection Log [documentation](https://coder.com/docs/admin/monitoring/connection-logs) for details. Those with permission to view the Audit Log will also be able to view the Connection Log. The new Connection Log has the same licensing restrictions as the Audit Log, and requires a Premium Coder deployment. ### Context This is the first PR of a few for moving connection events out of the audit log, and into a new database table and web UI page called the 'Connection Log'. This PR: - Creates the new table - Adds and tests queries for inserting and reading, including reading with an RBAC filter. - Implements the corresponding RBAC changes, such that anyone who can view the audit log can read from the table - Implements, under the enterprise package, a `ConnectionLogger` abstraction to replace the `Auditor` abstraction for these logs. (No-op'd in AGPL, like the `Auditor`) - Routes SSH connection and Workspace App events into the new `ConnectionLogger` - Updates all existing tests to check the values of the `ConnectionLogger` instead of the `Auditor`. Future PRs: - Add filtering to the query - Add an enterprise endpoint to query the new table - Write a query to delete old events from the audit log, call it from dbpurge. - Implement a table in the Web UI for viewing connection logs. > [!NOTE] > The PRs in this stack obviously won't be (completely) atomic. Whilst they'll each pass CI, the stack is designed to be merged all at once. I'm splitting them up for the sake of those reviewing, and so changes can be reviewed as early as possible. Despite this, it's really hard to make this PR any smaller than it already is. 
I'll be keeping it in draft until it's actually ready to merge.	2025-07-15 14:36:06 +10:00
Danielle Maywood	b712d0b23f	feat(coderd/agentapi): implement sub agent api (#17823 ) Closes https://github.com/coder/internal/issues/619 Implement the `coderd` side of the AgentAPI for the upcoming dev-container agents work. `agent/agenttest/client.go` is left unimplemented for a future PR working to implement the agent side of this feature.	2025-05-29 12:15:47 +01:00
Steven Masley	64807e1d61	chore: apply the 4mb max limit on drpc protocol message size (#17771 ) Respect the 4mb max limit on proto messages	2025-05-13 11:24:51 -05:00
Danielle Maywood	cac130346d	chore: bump debounce from 5 minutes to 30 minutes (#17111 ) To ensure OOM/OOD isn't too spammy we want to have a debounce period of 30 minutes.	2025-03-26 14:33:10 +00:00
Mathias Fredriksson	b07b33ec9d	feat: add agentapi endpoint to report connections for audit (#16507 ) This change adds a new `ReportConnection` endpoint to the `agentapi`. The protocol version was bumped previously, so it has been omitted here. This allows the agent to report connection events, for example when the user connects to the workspace via SSH or VS Code. Updates #15139	2025-02-20 14:52:01 +02:00
Danielle Maywood	d6b9806098	chore: implement oom/ood processing component (#16436 ) Implements the processing logic as set out in the OOM/OOD RFC.	2025-02-17 16:56:52 +00:00
Vincent Vielle	bc609d0056	feat: integrate agentAPI with resources monitoring logic (#16438 ) As part of the new resources monitoring logic, and more specifically for OOM & OOD notifications, we need to update the AgentAPI and the agent logic. This PR does so. Specifically, we are updating the AgentAPI & TailnetAPI to version 24 to add two new methods to the AgentAPI: - One method to fetch the resources monitoring configuration - One method to push the datapoints for the resources monitoring. Also, this PR adds new logic on the agent side: a routine that runs on a ticker, fetching the resource usage on each tick and storing it in a FIFO-like queue. Finally, this PR fixes a problem we had with RBAC logic on the resources monitoring model, applying the same logic as we have for similar entities.	2025-02-14 10:28:15 +01:00
Ethan	31506e694b	chore: send workspace pubsub events by owner id (#14964 ) We currently send empty payloads to pubsub channels of the form `workspace:<workspace_id>` to notify listeners of updates to workspaces (such as for refreshing the workspace dashboard). To support https://github.com/coder/coder/issues/14716, we'll instead send `WorkspaceEvent` payloads to pubsub channels of the form `workspace_owner:<owner_id>`. This enables a listener to receive events for all workspaces owned by a user. This PR replaces the usage of the old channels without modifying any existing behaviors. ``` type WorkspaceEvent struct { Kind WorkspaceEventKind `json:"kind"` WorkspaceID uuid.UUID `json:"workspace_id" format:"uuid"` // AgentID is only set for WorkspaceEventKindAgent* events // (excluding AgentTimeout) AgentID *uuid.UUID `json:"agent_id,omitempty" format:"uuid"` } ``` We've defined `WorkspaceEventKind`s based on how the old channel was used, but it's not yet necessary to inspect the types of any of the events, as the existing listeners are designed to fire off any of them. ``` WorkspaceEventKindStateChange WorkspaceEventKind = "state_change" WorkspaceEventKindStatsUpdate WorkspaceEventKind = "stats_update" WorkspaceEventKindMetadataUpdate WorkspaceEventKind = "mtd_update" WorkspaceEventKindAppHealthUpdate WorkspaceEventKind = "app_health" WorkspaceEventKindAgentLifecycleUpdate WorkspaceEventKind = "agt_lifecycle_update" WorkspaceEventKindAgentLogsUpdate WorkspaceEventKind = "agt_logs_update" WorkspaceEventKindAgentConnectionUpdate WorkspaceEventKind = "agt_connection_update" WorkspaceEventKindAgentLogsOverflow WorkspaceEventKind = "agt_logs_overflow" WorkspaceEventKindAgentTimeout WorkspaceEventKind = "agt_timeout" ```	2024-11-01 14:17:05 +11:00
Steven Masley	343f8ec9ab	chore: join owner, template, and org in new workspace view (#15116 ) Joins in fields like `username`, `avatar_url`, `organization_name`, `template_name` to `workspaces` via a view. The view must be maintained moving forward, but this prevents needing to add RBAC permissions to fetch related workspace fields.	2024-10-22 09:20:54 -05:00
Danielle Maywood	ae522c558d	feat: add agent timings (#14713 ) * feat: begin impl of agent script timings * feat: add job_id and display_name to script timings * fix: increment migration number * fix: rename migrations from 251 to 254 * test: get tests compiling * fix: appease the linter * fix: get tests passing again * fix: drop column from correct table * test: add fixture for agent script timings * fix: typo * fix: use job id used in provisioner job timings * fix: increment migration number * test: behaviour of script runner * test: rewrite test * test: does exit 1 script break things? * test: rewrite test again * fix: revert change Not sure how this came to be, I do not recall manually changing these files. * fix: let code breathe * fix: wrap errors * fix: justify nolint * fix: swap require.Equal argument order * fix: add mutex operations * feat: add 'ran_on_start' and 'blocked_login' fields * fix: update testdata fixture * fix: refer to agent_id instead of job_id in timings * fix: JobID -> AgentID in dbauthz_test * fix: add 'id' to scripts, make timing refer to script id * fix: fix broken tests and convert bug * fix: update testdata fixtures * fix: update testdata fixtures again * feat: capture stage and if script timed out * fix: update migration number * test: add test for script api * fix: fake db query * fix: use UTC time * fix: ensure r.scriptComplete is not nil * fix: move err check to right after call * fix: uppercase sql * fix: use dbtime.Now() * fix: debug log on r.scriptCompleted being nil * fix: ensure correct rbac permissions * chore: remove DisplayName * fix: get tests passing * fix: remove space in sql up * docs: document ExecuteOption * fix: drop 'RETURNING' from sql * chore: remove 'display_name' from timing table * fix: testdata fixture * fix: put r.scriptCompleted call in goroutine * fix: track goroutine for test + use separate context for reporting * fix: appease linter, handle trackCommandGoroutine error * fix: resolve race condition * feat: 
replace timed_out column with status column * test: update testdata fixture * fix: apply suggestions from review * revert: linter changes	2024-09-24 10:51:49 +01:00
Dean Sheather	6c94dd4f23	chore: add DRPC server implementation for network telemetry (#13675 )	2024-07-02 01:50:52 +10:00
Garrett Delfosse	fed668b432	chore: switch ssh session stats based on experiment (#13637 )	2024-06-25 10:58:45 -04:00
Kayla Washburn-Love	b248f125e1	chore: rename notification banners to announcement banners (#13419 )	2024-05-31 10:59:28 -06:00
Garrett Delfosse	5789ea5397	chore: move stat reporting into workspacestats package (#13386 )	2024-05-29 11:49:08 -04:00
Kayla Washburn-Love	d8e0be6ee6	feat: add support for multiple banners (#13081 )	2024-05-08 15:40:43 -06:00
Kayla Washburn-Love	b2413a593c	chore: reimplement activity status and autostop improvements (#12175 )	2024-02-27 11:06:26 -07:00
Cian Johnston	d6b025db14	Revert "feat: add activity status and autostop reason to workspace overview (#11987 )" (#12144 ) Related to https://github.com/coder/coder/pull/11987 This reverts commit `d37b131`.	2024-02-14 17:14:49 +00:00
Kayla Washburn-Love	d37b131426	feat: add activity status and autostop reason to workspace overview (#11987 )	2024-02-13 10:50:17 -07:00
Spike Curtis	2599850e54	feat: use agent v2 API to post startup (#11877 ) Uses the v2 Agent API to post startup information.	2024-01-30 11:23:28 +04:00
Spike Curtis	da8bb1c198	feat: use agent v2 API to fetch manifest (#11832 ) Agent uses the v2 API to obtain the manifest, instead of the HTTP API.	2024-01-30 10:11:28 +04:00
Spike Curtis	207328ca50	feat: use appearance.Fetcher in agentapi (#11770 ) This PR updates the Agent API to use the appearance.Fetcher, which is set by entitlement code in Enterprise coderd. This brings the agentapi into compliance with the Enterprise feature.	2024-01-29 21:22:50 +04:00
Dean Sheather	29707099d7	chore: add agentapi tests (#11269 )	2024-01-26 07:04:19 +00:00
Spike Curtis	36636bb6a5	feat: add tailnet to agent RPC service (#11304 ) Adds tailnet.DRPCService to the agent API. Supports #10531, but we still need to add version negotiation to the websocket endpoint.	2024-01-02 10:10:20 +04:00
Spike Curtis	25f2abf9ab	chore: remove tailnet from agent API and rename client API to tailnet (#11303 ) Refactors our DRPC service definitions slightly. In the previous version, I inserted the RPCs from the tailnet proto directly into the Agent service. This makes things hard to deal with because DRPC then generates a new set of methods with new interfaces prefixed with `DRPCAgent_`. Since you can't have a single method that takes different argument types, we couldn't reuse the implementation of those RPCs without a lot of extra classes and pass-thru methods. Instead, the "right" way to do it is to integrate at the DRPC layer. So, we have two DRPC services available over the Agent websocket, and register them both on the DRPC `mux`. Since the tailnet proto RPC service is now for both clients and agents, I renamed some things to clarify and shorten. This PR also removes the `TailnetAPI` implementation from the `agentapi` package, and the next PR in the stack replaces it with the implementation from the `tailnet` package.	2024-01-02 10:02:45 +04:00
Dean Sheather	e46431078c	feat: add AgentAPI using DRPC (#10811 ) Co-authored-by: Spike Curtis <spike@coder.com>	2023-12-18 22:53:28 +10:00

33 Commits