267 Commits

Author SHA1 Message Date
Mathias Fredriksson 37fc6646ad perf(coderd/database): limit GetLatestWorkspaceAppStatusByAppID to 1 row (#20917)
## Description

This PR fixes an issue where `GetLatestWorkspaceAppStatusesByAppID`
returned an unbounded number of rows for a given app ID, which could
cause performance issues for noisy or long-running AI tasks.

## Impact

This change reduces database query overhead for workspace app status
updates, particularly for busy AI tasks that update their status
frequently. Previously, fetching the latest status would return all
historical statuses; now it returns only the most recent one.

Fixes #20862

---

🤖 This change was written by Claude Sonnet 4.5 Thinking using [mux](https://github.com/coder/mux) and reviewed by a human 🏄🏻‍♂️
2025-11-25 16:56:42 +02:00
Danielle Maywood f2a1a7e8c3 fix(coderd): gate AI task notifications on agent ready state (#20690)
Relates to https://github.com/coder/internal/issues/1098

Currently AgentAPI waits for only 2 seconds worth of identical terminal
screen snapshots before deciding a task has entered a "stable" state. We
interpret this as becoming "idle", resulting in a notification being
triggered. This behavior is not ideal and is ultimately the root cause
of our spammy notifications.

Unfortunately, until we move AgentAPI to use either the Claude Code SDK
or an ACP wrapper around it, we are unable to easily fix the root cause.

This PR instead waits until the agent is ready before it will send state
change notifications. This will at least resolve _some_ of the
complaints about task state notifications being too spammy.

---

🤖 PR was written by Claude Sonnet 4.5 using [Coder
Mux](https://github.com/coder/cmux) and reviewed by a human 👩
2025-11-10 16:00:13 +00:00
Mathias Fredriksson 7ae3fdc749 refactor: use task data model for notifications (#20590)
Updates coder/internal#973
Updates coder/internal#974
2025-10-31 15:53:27 +02:00
Cian Johnston 0faee8e913 feat(coderd): notify on task completion/failure (#20327)
Adds notifications on task transitions to completed or failure state.

Authored by Claude, I reviewed it and it appears to be legit.
2025-10-16 10:21:08 +01:00
Cian Johnston ade3fce0f6 fix(coderd): prevent task working notification for first app status (#20313)
Disclaimer: Claude did all of this, reviewed and committed by me.

I find the "task is working" notification straight after creation to be
unnecessary.
Added logic to skip the notification if the first app status is
"working".
2025-10-15 16:41:16 +01:00
Cian Johnston ffcb7a1693 fix(coderd): truncate task prompt to 160 characters in notifications (#20147)
Truncates the task prompt used in notifications to a maximum of 160
characters. The length of 160 characters was chosen arbitrarily.
2025-10-02 19:54:07 +01:00
Cian Johnston 6e4d903a8e fix(coderd): increase task notification dedupe bypass timestamp to 1 minute (#20043) 2025-09-30 16:05:34 +00:00
Susana Ferreira fdb0267e5d feat: add notification for task status (#19965)
## Description

Send a notification to the workspace owner when an AI task’s app state
becomes `Working` or `Idle`.
An AI task is identified by a workspace build with `HasAITask = true`
and `AITaskSidebarAppID` matching the agent app’s ID.

## Changes

* Add `TemplateTaskWorking` notification template.
* Add `TemplateTaskIdle` notification template.
* Add `GetLatestWorkspaceAppStatusesByAppID` SQL query to get the
workspace app statuses ordered by latest first.
* Update `PATCH /workspaceagents/me/app-status` to enqueue:
  * `TemplateTaskWorking` when state transitions to `working`
  * `TemplateTaskIdle` when state transitions to `idle`
* Notification labels include:
  * `task`: task initial prompt
  * `workspace`: workspace name
* Notification dedupe: include a minute-bucketed timestamp (UTC
truncated to the minute) in the enqueue data to allow identical content
to resend within the same day (but not more than once per minute).

Closes: https://github.com/coder/coder/issues/19776
2025-09-29 16:44:53 +01:00
Danielle Maywood e12b621ff0 fix(coderd): ensure agent WebSocket conn is cleaned up (#19711)
When clients disconnected from the /containers/watch endpoint, the WebSocket 
connection between coderd and the agent stayed open. This caused heartbeat 
traffic every 15s that was incorrectly counted as workspace activity, 
extending workspace lifetimes indefinitely.

Now properly cancels the agent connection context when the client disconnects.
2025-09-05 14:26:46 +01:00
Danielle Maywood 205eb29e60 fix: stop reading closed channel for /watch devcontainers endpoint (#19373)
Fixes https://github.com/coder/coder/issues/19372

We increase the read limit to 4MiB (we use this limit elsewhere). We
also make sure to stop sending messages when `containersCh` becomes
closed.
2025-08-15 12:32:33 +01:00
Cian Johnston 812d72c5bb fix: sanitize app status summary (#19075)
Fixes https://github.com/coder/coder/issues/18875
2025-07-29 15:24:11 +01:00
Danielle Maywood 43b0bb7f61 feat(site): use websocket connection for devcontainer updates (#18808)
Instead of polling every 10 seconds, we use a WebSocket connection for
more timely updates.
2025-07-14 21:35:35 +01:00
Danielle Maywood f2d229eed3 fix!: use devcontainer ID when rebuilding a devcontainer (#18604)
This PR replaces the use of the **container** ID with the
**devcontainer** ID. This is a breaking change. This allows rebuilding a
devcontainer when there is no valid container ID.
2025-06-26 11:41:57 +01:00
Asher 0a483ea2b7 feat: add idle app status (#18415)
"Idle" is more accurate than "complete" since:

1. AgentAPI only knows if the screen is active; it has no way of knowing
    if the task is complete.
2. The LLM might be done with its current prompt, but that does not mean
    the task is complete either (it likely needs refinement).

The "complete" state will be reserved for future definition.

Additionally, in the case where the screen goes idle but the LLM never
reported a status update, we can get an idle icon without a message, and
it looks kinda janky in the UI so if there is no message I display the
state text.

Closes https://github.com/coder/internal/issues/699
2025-06-20 14:34:31 -08:00
Mathias Fredriksson 70723d3b51 fix(coderd): fix panics by always checking for non-nil request logger (#18228) 2025-06-12 13:50:50 +03:00
Mathias Fredriksson a18eb9d08f feat(site): allow recreating devcontainers and showing dirty status (#18049)
This change allows showing the devcontainer dirty status in the UI as
well as a recreate button to update the devcontainer.

Closes #16424
2025-05-27 19:42:24 +03:00
Mathias Fredriksson 0731304905 feat(agent/agentcontainers): recreate devcontainers concurrently (#18042)
This change introduces a refactor of the devcontainers recreation logic
which is now handled asynchronously rather than being request scoped.
The response was consequently changed from "No Content" to "Accepted" to
reflect this.

A new `Status` field was introduced to the devcontainer struct which
replaces `Running` (bool). This reflects that the devcontainer can now
be in various states (starting, running, stopped or errored).

The status field also protects against multiple concurrent recreations,
as long as they are initiated via the API.

Updates #16424
2025-05-26 18:30:52 +03:00
Mathias Fredriksson 98e2ec4417 feat: show devcontainer dirty status and allow recreate (#17880)
Updates #16424
2025-05-19 12:56:10 +03:00
Thomas Kosiewski 1bacd82e80 feat: add API key scope to restrict access to user data (#17692) 2025-05-15 15:32:52 +01:00
Sas Swart 425ee6fa55 feat: reinitialize agents when a prebuilt workspace is claimed (#17475)
This pull request allows coder workspace agents to be reinitialized when
a prebuilt workspace is claimed by a user. This facilitates the transfer
of ownership between the anonymous prebuilds system user and the new
owner of the workspace.

Only a single agent per prebuilt workspace is supported for now, but
plumbing has already been done to facilitate the seamless transition to
multi-agent support.

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
2025-05-14 14:15:36 +02:00
Cian Johnston 2acf0adcf2 chore(codersdk/toolsdk): improve static analyzability of toolsdk.Tools (#17562)
* Refactors toolsdk.Tools to remove opaque `map[string]any` argument in
favour of typed args structs.
* Refactors toolsdk.Tools to remove opaque passing of dependencies via
`context.Context` in favour of a tool dependencies struct.
* Adds panic recovery and clean context middleware to all tools.
* Adds `GenericTool` implementation to allow keeping `toolsdk.All` with
uniform type signature while maintaining type information in handlers.
* Adds stricter checks to `patchWorkspaceAgentAppStatus` handler.
2025-04-29 16:05:23 +01:00
Mathias Fredriksson 1fc74f629e refactor(agent): update agentcontainers api initialization (#17600)
There were too many ways to configure the agentcontainers API resulting
in inconsistent behavior or features not being enabled. This refactor
introduces a control flag for enabling or disabling the containers API.
When disabled, all implementations are no-op and explicit endpoint
behaviors are defined. When enabled, concrete implementations are used
by default but can be overridden by passing options.
2025-04-29 17:53:10 +03:00
Michael Suchacz 06d39151dc feat: extend request logs with auth & DB info (#17304)
Closes #16903
2025-04-15 13:27:23 +02:00
Cian Johnston 979687c37f chore(codersdk): deprecate WorkspaceAppStatus.{NeedsUserAttention,Icon} (#17358)
https://github.com/coder/coder/pull/17163 introduced the
`workspace_app_statuses` table. Two of these fields
(`needs_user_attention`, `icon`) turned out to be surplus to
requirements.

- Removes columns `needs_user_attention` and `icon` from
`workspace_app_statuses`
- Marks the corresponding fields of `codersdk.WorkspaceAppStatus` as
deprecated.
2025-04-15 10:47:42 +01:00
Danny Kopping 0b18e458f4 fix: reduce excessive logging when database is unreachable (#17363)
Fixes #17045

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
2025-04-15 10:55:30 +02:00
Spike Curtis 12dc086628 feat: return hostname suffix on AgentConnectionInfo (#17334)
Adds the Hostname Suffix to `AgentConnectionInfo` --- the VPN provider will use it to control the suffix for DNS hostnames.

part of: #16828
2025-04-11 13:09:51 +04:00
Michael Suchacz ce22de8d15 feat: log long-lived connections acceptance (#17219)
Closes #16904
2025-04-08 08:30:05 +00:00
Kyle Carberry 8ea956fc11 feat: add app status tracking to the backend (#17163)
This does ~95% of the backend work required to integrate the AI work.

Most of what is left to integrate from the tasks branch is frontend
work, which I believe will be a lot smaller.

The real difference between this branch and that one is the abstraction
-- this now attaches statuses to apps, and returns the latest status
reported as part of a workspace.

This change enables us to have a UX similar to the tasks branch, but
for agents other than Claude Code as well. Any app can report status
now.
2025-03-31 10:55:44 -04:00
Michael Smith 9bc727e977 chore: add support for one-way websockets to backend (#16853)
Closes https://github.com/coder/coder/issues/16775

## Changes made
- Added `OneWayWebSocket` function that establishes WebSocket
connections that don't allow client-to-server communication
- Added tests for the new function
- Updated API endpoints to make new WS-based endpoints, and mark
previous SSE-based endpoints as deprecated
- Updated existing SSE handlers to use the same core logic as the new WS
handlers

## Notes
- Frontend changes handled via #16855
2025-03-28 17:13:20 -04:00
Jon Ayers 17ddee05e5 chore: update golang to 1.24.1 (#17035)
- Update go.mod to use Go 1.24.1
- Update GitHub Actions setup-go action to use Go 1.24.1
- Fix linting issues with golangci-lint by:
  - Updating to golangci-lint v1.57.1 (more compatible with Go 1.24.1)

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <claude@anthropic.com>
2025-03-26 01:56:39 -05:00
Spike Curtis 117e4c2fe7 feat: adds device_id, device_os, and coder_desktop_version to telemetry (#17086)
Records the Device ID, Device OS and Coder Desktop version to telemetry.

These values are provided by the Coder Desktop client in the StartRequest method of the VPN protocol. We render them as an HTTP header to transmit to Coderd, where they are decoded and added to telemetry.
2025-03-25 15:26:05 +04:00
Spike Curtis e0ecc28638 feat: add telemetry to user-scoped tailnet API call (#17065)
Adds support for sending telemetry on calls to the User-scoped tailnet RPC endpoint. This is currently used only by Coder Desktop.

Later PRs will fill in the version, OS information, and device ID via HTTP headers.
2025-03-24 16:02:33 +04:00
Mathias Fredriksson 3ac844ad3d chore(codersdk): rename WorkspaceAgent(Dev)container structs (#16996)
This is to free up the devcontainer name space for more targeted
structs.

Updates #16423
2025-03-19 10:16:14 +00:00
Eng Zer Jun 04c33968cf refactor: replace golang.org/x/exp/slices with slices (#16772)
The experimental functions in `golang.org/x/exp/slices` are now
available in the standard library since Go 1.21.

Reference: https://go.dev/doc/go1.21#slices

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2025-03-04 00:46:49 +11:00
Ethan d50e846747 fix: block vpn tailnet endpoint when --browser-only is set (#16647)
The work on CoderVPN required a new user-scoped `/tailnet` endpoint for
coordinating with multiple workspace agents, and receiving workspace
updates. Much like the `/coordinate` endpoint, this needs to respect the
`CODER_BROWSER_ONLY`/`--browser-only` deployment config value.
2025-02-21 12:21:20 +11:00
Cian Johnston 31b1ff7d3b feat(agent): add container list handler (#16346)
Fixes https://github.com/coder/coder/issues/16268

- Adds `/api/v2/workspaceagents/:id/containers` coderd endpoint that allows listing containers
visible to the agent. Optional filtering by labels is supported.
- Adds go tools to the `coder-dylib` CI step so we can generate mocks if needed
2025-02-10 11:29:30 +00:00
Spike Curtis 2c7f8ac65f chore: migrate to coder/websocket 1.8.12 (#15898)
Migrates us to `coder/websocket` v1.8.12 rather than `nhooyr/websocket` on an older version.

Works around https://github.com/coder/websocket/issues/504 by adding an explicit test for `xerrors.Is(err, io.EOF)` where we were previously getting `io.EOF` from the netConn.
2024-12-19 00:51:30 +04:00
Spike Curtis 148a5a3593 fix: fix goroutine leak in log streaming over websocket (#15709)
fixes #14881

Our handlers for streaming logs don't read from the websocket. We don't allow the client to send us any data, but the websocket library we use requires reading from the websocket to properly handle pings and closing. Not doing so [can cause the websocket to hang on write](https://github.com/coder/websocket/issues/405), leaking goroutines, which were noticed in #14881.

This fixes the issue, and in the process refactors our log streaming into an encoder/decoder package which provides generic types for sending JSON over websocket.

I'd also like for us to upgrade to the latest https://github.com/coder/websocket but we should also upgrade our tailscale fork before doing so to avoid including two copies of the websocket library.
2024-12-03 10:12:30 +04:00
Ethan b1298a3c1e feat: add WorkspaceUpdates tailnet RPC (#14847)
Closes #14716
Closes #14717

Adds a new user-scoped tailnet API endpoint (`api/v2/tailnet`) with a new RPC stream for receiving updates on workspaces owned by a specific user, as defined in #14716. 

When a stream is started, the `WorkspaceUpdatesProvider` will begin listening on the user-scoped pubsub events implemented in #14964. When a relevant event type is seen (such as a workspace state transition), the provider will query the DB for all the workspaces (and agents) owned by the user. This gets compared against the result of the previous query to produce a set of workspace updates. 

Workspace updates can be requested for any user ID; however, only workspaces the authorised user is permitted to `ActionRead` will have their updates streamed.
Opening a tunnel to an agent requires that the user can perform `ActionSSH` against the workspace containing it.
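
The compare-against-previous-query step can be sketched as a diff of two snapshots. Everything here is illustrative: the `snapshot` type, `diffSnapshots`, and the use of plain strings for workspace IDs and states are assumptions, not the shape of the real `WorkspaceUpdatesProvider`.

```go
package main

import (
	"fmt"
	"sort"
)

// snapshot is a hypothetical query result: workspace ID -> observed state.
type snapshot map[string]string

// diffSnapshots returns the IDs that are new or changed in curr (upserted)
// and the IDs that existed in prev but are gone from curr (deleted).
// Results are sorted so the output is deterministic.
func diffSnapshots(prev, curr snapshot) (upserted, deleted []string) {
	for id, state := range curr {
		if old, ok := prev[id]; !ok || old != state {
			upserted = append(upserted, id)
		}
	}
	for id := range prev {
		if _, ok := curr[id]; !ok {
			deleted = append(deleted, id)
		}
	}
	sort.Strings(upserted)
	sort.Strings(deleted)
	return upserted, deleted
}

func main() {
	prev := snapshot{"ws-a": "running", "ws-b": "stopped"}
	curr := snapshot{"ws-a": "stopped", "ws-c": "running"}
	up, del := diffSnapshots(prev, curr)
	fmt.Println(up, del) // prints "[ws-a ws-c] [ws-b]"
}
```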
2024-11-01 14:53:53 +11:00
Ethan 31506e694b chore: send workspace pubsub events by owner id (#14964)
We currently send empty payloads to pubsub channels of the form `workspace:<workspace_id>` to notify listeners of updates to workspaces (such as for refreshing the workspace dashboard).

To support https://github.com/coder/coder/issues/14716, we'll instead send `WorkspaceEvent` payloads to pubsub channels of the form `workspace_owner:<owner_id>`. This enables a listener to receive events for all workspaces owned by a user.
This PR replaces the usage of the old channels without modifying any existing behaviors.

```
type WorkspaceEvent struct {
	Kind        WorkspaceEventKind `json:"kind"`
	WorkspaceID uuid.UUID          `json:"workspace_id" format:"uuid"`
	// AgentID is only set for WorkspaceEventKindAgent* events
	// (excluding AgentTimeout)
	AgentID *uuid.UUID `json:"agent_id,omitempty" format:"uuid"`
}
```

We've defined `WorkspaceEventKind`s based on how the old channel was used, but it's not yet necessary to inspect the types of any of the events, as the existing listeners are designed to fire on any of them.

```
WorkspaceEventKindStateChange     WorkspaceEventKind = "state_change"
WorkspaceEventKindStatsUpdate     WorkspaceEventKind = "stats_update"
WorkspaceEventKindMetadataUpdate  WorkspaceEventKind = "mtd_update"
WorkspaceEventKindAppHealthUpdate WorkspaceEventKind = "app_health"

WorkspaceEventKindAgentLifecycleUpdate  WorkspaceEventKind = "agt_lifecycle_update"
WorkspaceEventKindAgentLogsUpdate       WorkspaceEventKind = "agt_logs_update"
WorkspaceEventKindAgentConnectionUpdate WorkspaceEventKind = "agt_connection_update"
WorkspaceEventKindAgentLogsOverflow     WorkspaceEventKind = "agt_logs_overflow"
WorkspaceEventKindAgentTimeout          WorkspaceEventKind = "agt_timeout"
```
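
For illustration, the owner-scoped channel name and JSON payload could be produced as in the sketch below. This is a simplified stand-in: `ownerChannel` and `encodeEvent` are hypothetical helpers, and the IDs are plain strings instead of `uuid.UUID` so the example needs only the standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// workspaceEvent is a simplified copy of the struct in the commit message,
// using strings in place of uuid.UUID (an assumption made for this sketch).
type workspaceEvent struct {
	Kind        string  `json:"kind"`
	WorkspaceID string  `json:"workspace_id"`
	AgentID     *string `json:"agent_id,omitempty"`
}

// ownerChannel builds a channel name of the form workspace_owner:<owner_id>.
func ownerChannel(ownerID string) string {
	return "workspace_owner:" + ownerID
}

// encodeEvent marshals the event; a nil AgentID is omitted from the payload.
func encodeEvent(e workspaceEvent) (string, error) {
	b, err := json.Marshal(e)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	payload, err := encodeEvent(workspaceEvent{Kind: "state_change", WorkspaceID: "ws-1"})
	if err != nil {
		panic(err)
	}
	fmt.Println(ownerChannel("owner-1"), payload)
}
```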
2024-11-01 14:17:05 +11:00
Jon Ayers cd890aa3a0 feat: enable key rotation (#15066)
This PR contains the remaining logic necessary to hook up key rotation
to the product.
2024-10-25 17:14:35 +01:00
Steven Masley 343f8ec9ab chore: join owner, template, and org in new workspace view (#15116)
Joins in fields like `username`, `avatar_url`, `organization_name`,
`template_name` to `workspaces` via a **view**. 
The view must be maintained moving forward, but this prevents needing to
add RBAC permissions to fetch related workspace fields.
2024-10-22 09:20:54 -05:00
Danielle Maywood ae522c558d feat: add agent timings (#14713)
* feat: begin impl of agent script timings

* feat: add job_id and display_name to script timings

* fix: increment migration number

* fix: rename migrations from 251 to 254

* test: get tests compiling

* fix: appease the linter

* fix: get tests passing again

* fix: drop column from correct table

* test: add fixture for agent script timings

* fix: typo

* fix: use job id used in provisioner job timings

* fix: increment migration number

* test: behaviour of script runner

* test: rewrite test

* test: does exit 1 script break things?

* test: rewrite test again

* fix: revert change

Not sure how this came to be, I do not recall manually changing
these files.

* fix: let code breathe

* fix: wrap errors

* fix: justify nolint

* fix: swap require.Equal argument order

* fix: add mutex operations

* feat: add 'ran_on_start' and 'blocked_login' fields

* fix: update testdata fixture

* fix: refer to agent_id instead of job_id in timings

* fix: JobID -> AgentID in dbauthz_test

* fix: add 'id' to scripts, make timing refer to script id

* fix: fix broken tests and convert bug

* fix: update testdata fixtures

* fix: update testdata fixtures again

* feat: capture stage and if script timed out

* fix: update migration number

* test: add test for script api

* fix: fake db query

* fix: use UTC time

* fix: ensure r.scriptComplete is not nil

* fix: move err check to right after call

* fix: uppercase sql

* fix: use dbtime.Now()

* fix: debug log on r.scriptCompleted being nil

* fix: ensure correct rbac permissions

* chore: remove DisplayName

* fix: get tests passing

* fix: remove space in sql up

* docs: document ExecuteOption

* fix: drop 'RETURNING' from sql

* chore: remove 'display_name' from timing table

* fix: testdata fixture

* fix: put r.scriptCompleted call in goroutine

* fix: track goroutine for test + use separate context for reporting

* fix: appease linter, handle trackCommandGoroutine error

* fix: resolve race condition

* feat: replace timed_out column with status column

* test: update testdata fixture

* fix: apply suggestions from review

* revert: linter changes
2024-09-24 10:51:49 +01:00
Danielle Maywood 86f68b220e feat: add 'display_name' column to 'workspace_agent_scripts' (#14747)
* feat: add 'display_name' column to 'workspace_agent_scripts'

* fix: backfill from workspace_agent_log_sources

* fix: run 'make gen'
2024-09-20 14:26:13 +01:00
Spike Curtis 5bd19f8ba3 fix: fix flake in TestWorkspaceAgentClientCoordinate_ResumeToken (#14642)
fixes #14365

I bet what's going on is that in `connectToCoordinatorAndFetchResumeToken()` we call `Coordinate()`, send a message on the `Coordinate` client and then close it in rapid succession. We don't wait around for a response from the coordinator, so dRPC is likely aborting the call `Coordinate()` in the backend because the stream is closed before it even gets a chance.

Instead of using the Coordinator to record the peer ID assigned on the API call, we can wrap the resume token provider, since we call that API _and_ wait for a response. This also affords the opportunity to directly assert we get called with the right token.
2024-09-11 16:32:47 +04:00
Dean Sheather cf8be4eac5 feat: add resume support to coordinator connections (#14234) 2024-08-20 17:16:49 +10:00
Ethan dd243686e4 chore!: remove deprecated agent v1 routes (#13486) 2024-06-11 12:22:59 +10:00
Colin Adler 9d00a26a90 fix: add missing route for codersdk.PostLogSource (#13421) 2024-06-03 12:29:50 -05:00
Steven Masley 24ba81930b chore: return failed refresh errors on external auth as string (was boolean) (#13402)
* chore: return failed refresh errors on external auth

Failed refreshes should return errors. These errors are captured
as validate errors.
2024-06-03 09:33:49 -05:00
Garrett Delfosse 5789ea5397 chore: move stat reporting into workspacestats package (#13386) 2024-05-29 11:49:08 -04:00