Commit Graph

55 Commits

Author SHA1 Message Date
Danielle Maywood 911d734df9 fix: avoid re-using AuthInstanceID for sub agents (#22196)
Parent agents were re-using AuthInstanceID when spawning child agents.
This caused GetWorkspaceAgentByInstanceID to return the most recently
created sub agent instead of the parent when the parent tried to refetch
its own manifest.

Fix by not reusing AuthInstanceID for sub agents, and updating
GetWorkspaceAgentByInstanceID to filter them out entirely.
2026-02-19 16:56:29 +00:00
Danielle Maywood af0e171595 feat(coderd/agentapi): support terraform-defined subagent ids (#21837)
Update `coderd/agentapi` to handle pre-created sub agents
2026-02-04 15:33:48 +00:00
Callum Styan e195856c43 perf: reduce pg_notify call volume by batching together agent metadata updates (#21330)
---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-22 22:47:49 -08:00
Mathias Fredriksson 97e8a5b093 fix(coderd): allow agent auth during workspace shutdown (#21538)
Agents were losing authentication during workspace shutdown, causing
shutdown scripts to fail. The auth query required agents to belong to
the latest build, but during shutdown a `stop` build becomes latest while
the `start` build's agents are still running.

Modified the auth query to allow `start` build agents to authenticate
temporarily during `stop` execution. The query allows auth when:

- Agent's `start` build job succeeded
- Latest build is `stop` with `pending`/`running` job status
- Builds are adjacent (`stop` is `build_number + 1`)
- Template versions match

Auth closes once `stop` completes.

Renamed `GetWorkspaceAgentAndLatestBuildByAuthToken` to
`GetAuthenticatedWorkspaceAgentAndBuildByAuthToken` since it returns the
agent's build (not always latest) during shutdown.

Closes coder/internal#1249
Fixes #19467
2026-01-21 13:18:43 +00:00
Cian Johnston 08343a7a9f perf: reduce number of queries made by /api/v2/workspaceagents/{id} (#21522)
Relates to https://github.com/coder/internal/issues/1214

The `ExtractWorkspaceAgentParam` middleware ends up making 4 database
queries to follow the chain of `WorkspaceAgent` -> `WorkspaceResource`
-> `ProvisionerJob` -> `WorkspaceBuild` -- but then dropping all that
hard work on the floor. The `api.workspaceAgent` handler that references
this middleware then has to do all of that work again, plus one more
query to get the related `User` so we can get the username. This pattern
is also mirrored in `getDatabaseTerminal` but without the middleware.

This PR:
* Adds a new query `GetWorkspaceAgentAndWorkspaceByID` to fetch all
this information at once to avoid the multiple round-trips,
* Updates the existing usage of `GetWorkspaceAgentByID` to this new
query instead,
* Updates `ExtractWorkspaceAgentParam` to also store the workspace in
the request context

Dalibo: [0.63ms](https://explain.dalibo.com/plan/40bb597f3539gc6c)
2026-01-19 12:36:33 +00:00
Mathias Fredriksson ff46917e62 feat: add retention config for workspace_agent_logs (#21039)
Replace hardcoded 7-day retention for workspace agent logs with
configurable retention from deployment settings. Defaults to 7d to
preserve existing behavior.

Depends on #21038
Updates #20743
2025-12-02 16:01:33 +00:00
Mathias Fredriksson 7ae3fdc749 refactor: use task data model for notifications (#20590)
Updates coder/internal#973
Updates coder/internal#974
2025-10-31 15:53:27 +02:00
Callum Styan 141ef23c81 fix: introduce dedicated queries for workspaces and workspace agents metrics (#19786)
aid in differentiation between sources of calls to `GetWorkspaces` but introducing new queries for metrics specific use cases

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2025-10-17 13:40:10 -07:00
Mathias Fredriksson 511fd09582 fix(coderd): mark sub agent deletion via boolean instead of delete (#18411)
Deletion of data is uncommon in our database, so the introduction of sub agents
and the deletion of them introduced issues with foreign key assumptions, as can
be seen in coder/internal#685. We could have only addressed the specific case by
allowing cascade deletion of stats as well as handling in the stats collector,
but it's unclear how many more such edge-cases we could run into.

In this change, we mark the rows as deleted via boolean instead, and filter them
out in all relevant queries.

Fixes coder/internal#685
2025-06-19 13:32:51 +00:00
Danielle Maywood b712d0b23f feat(coderd/agentapi): implement sub agent api (#17823)
Closes https://github.com/coder/internal/issues/619

Implement the `coderd` side of the AgentAPI for the upcoming
dev-container agents work.

`agent/agenttest/client.go` is left unimplemented for a future PR
working to implement the agent side of this feature.
2025-05-29 12:15:47 +01:00
Thomas Kosiewski 1bacd82e80 feat: add API key scope to restrict access to user data (#17692) 2025-05-15 15:32:52 +01:00
Sas Swart 425ee6fa55 feat: reinitialize agents when a prebuilt workspace is claimed (#17475)
This pull request allows coder workspace agents to be reinitialized when
a prebuilt workspace is claimed by a user. This facilitates the transfer
of ownership between the anonymous prebuilds system user and the new
owner of the workspace.

Only a single agent per prebuilt workspace is supported for now, but
plumbing has already been done to facilitate the seamless transition to
multi-agent support.

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
2025-05-14 14:15:36 +02:00
Danielle Maywood 0b5f27f566 feat: add parent_id column to workspace_agents table (#17758)
Adds a new nullable column `parent_id` to `workspace_agents` table. This
lays the groundwork for having child agents.
2025-05-13 00:01:31 +01:00
Danielle Maywood 5762d8add4 fix: return only the first workspace agent script timing per script (#16203)
Fixes https://github.com/coder/coder/issues/16124

If a workspace agent crashes, it is possible for any startup scripts to
be ran again. This PR makes it so that the
`GetWorkspaceAgentScriptTimingsByBuildID` query only returns the first
timing recorded per-script.
2025-01-21 11:54:43 +00:00
Bruno Quaresma e232aee011 feat(site): add agent connection timings (#15276)
Local preview:

<img width="1260" alt="Screenshot 2024-10-29 at 16 16 01"
src="https://github.com/user-attachments/assets/10fdb20d-1f2a-4b0a-a8a1-171050ee620d">


Close https://github.com/coder/internal/issues/116

---------

Co-authored-by: Danny Kopping <danny@coder.com>
2024-11-01 13:29:00 -03:00
Bruno Quaresma 9c8ecb82a3 feat(coderd): return agent script timings (#14923)
Add the agent script timings into the
`/workspacebuilds/:workspacebuild/timings` response.

Close https://github.com/coder/coder/issues/14876
2024-10-14 09:31:03 -03:00
Danielle Maywood ae522c558d feat: add agent timings (#14713)
* feat: begin impl of agent script timings

* feat: add job_id and display_name to script timings

* fix: increment migration number

* fix: rename migrations from 251 to 254

* test: get tests compiling

* fix: appease the linter

* fix: get tests passing again

* fix: drop column from correct table

* test: add fixture for agent script timings

* fix: typo

* fix: use job id used in provisioner job timings

* fix: increment migration number

* test: behaviour of script runner

* test: rewrite test

* test: does exit 1 script break things?

* test: rewrite test again

* fix: revert change

Not sure how this came to be, I do not recall manually changing
these files.

* fix: let code breathe

* fix: wrap errors

* fix: justify nolint

* fix: swap require.Equal argument order

* fix: add mutex operations

* feat: add 'ran_on_start' and 'blocked_login' fields

* fix: update testdata fixture

* fix: refer to agent_id instead of job_id in timings

* fix: JobID -> AgentID in dbauthz_test

* fix: add 'id' to scripts, make timing refer to script id

* fix: fix broken tests and convert bug

* fix: update testdata fixtures

* fix: update testdata fixtures again

* feat: capture stage and if script timed out

* fix: update migration number

* test: add test for script api

* fix: fake db query

* fix: use UTC time

* fix: ensure r.scriptComplete is not nil

* fix: move err check to right after call

* fix: uppercase sql

* fix: use dbtime.Now()

* fix: debug log on r.scriptCompleted being nil

* fix: ensure correct rbac permissions

* chore: remove DisplayName

* fix: get tests passing

* fix: remove space in sql up

* docs: document ExecuteOption

* fix: drop 'RETURNING' from sql

* chore: remove 'display_name' from timing table

* fix: testdata fixture

* fix: put r.scriptCompleted call in goroutine

* fix: track goroutine for test + use separate context for reporting

* fix: appease linter, handle trackCommandGoroutine error

* fix: resolve race condition

* feat: replace timed_out column with status column

* test: update testdata fixture

* fix: apply suggestions from review

* revert: linter changes
2024-09-24 10:51:49 +01:00
Cian Johnston 0f8251be41 feat(coderd/database/dbpurge): retain most recent agent build logs (#14460)
Updates the `DeleteOldWorkspaceAgentLogs` to:
- Retain logs for the most recent build regardless of age,
- Delete logs for agents that never connected and were created before
   the cutoff for deleting logs while still retaining the logs most recent build.
2024-08-30 17:39:09 +01:00
Cian Johnston a74273f1fd chore(coderd/database/dbpurge): replace usage of time.* with quartz (#14480)
Related to #10576

This PR introduces quartz to coderd/database/dbpurge and updates the following unit tests to make use of Quartz's functionality:

- TestPurge
- TestDeleteOldWorkspaceAgentLogs

Additionally, updates DeleteOldWorkspaceAgentLogs to replace the hard-coded interval with a parameter passed into the query. This aids in testing and brings us a step towards allowing operators to configure the cutoff interval for workspace agent logs.
2024-08-30 11:55:47 +01:00
Mathias Fredriksson 79441e3609 perf(coderd/database): optimize GetWorkspaceAgentAndLatestBuildByAuthToken (#12809) 2024-03-28 19:38:16 +02:00
Garrett Delfosse 0723dd3abf fix: ensure agent token is from latest build in middleware (#12443) 2024-03-14 12:27:32 -04:00
Marcin Tojek 7a453608c9 feat: support order property of coder_agent (#12121) 2024-02-15 13:33:13 +01:00
Marcin Tojek c0e169ebf9 feat: support custom order of agent metadata (#12066) 2024-02-08 17:29:34 +01:00
Steven Masley fe867d02e0 fix: correct perms for forbidden error in TemplateScheduleStore.Load (#11286)
* chore: TemplateScheduleStore.Load() throwing forbidden error
* fix: workspace agent scope to include template
2023-12-20 11:38:49 -06:00
Garrett Delfosse 228cbec99b fix: stop updating agent stats from deleted workspaces (#11026)
Co-authored-by: Steven Masley <stevenmasley@gmail.com>
2023-12-07 13:55:29 -05:00
Spike Curtis a7c671ca07 feat: add workspace agent APIVersion (#10419)
Fixes #10339
2023-10-31 10:08:43 +04:00
Mathias Fredriksson 7eeba15d16 feat(coderd): add support for sending batched agent metadata (#10223)
Part of #9782
2023-10-13 16:37:55 +03:00
Kyle Carberry 1262eef2c0 feat: add support for coder_script (#9584)
* Add basic migrations

* Improve schema

* Refactor agent scripts into it's own package

* Support legacy start and stop script format

* Pipe the scripts!

* Finish the piping

* Fix context usage

* It works!

* Fix sql query

* Fix SQL query

* Rename `LogSourceID` -> `SourceID`

* Fix the FE

* fmt

* Rename migrations

* Fix log tests

* Fix lint err

* Fix gen

* Fix story type

* Rename source to script

* Fix schema jank

* Uncomment test

* Rename proto to TimeoutSeconds

* Fix comments

* Fix comments

* Fix legacy endpoint without specified log_source

* Fix non-blocking by default in agent

* Fix resources tests

* Fix dbfake

* Fix resources

* Fix linting I think

* Add fixtures

* fmt

* Fix startup script behavior

* Fix comments

* Fix context

* Fix cancel

* Fix SQL tests

* Fix e2e tests

* Interrupt on Windows

* Fix agent leaking script process

* Fix migrations

* Fix stories

* Fix duplicate logs appearing

* Gen

* Fix log location

* Fix tests

* Fix tests

* Fix log output

* Show display name in output

* Fix print

* Return timeout on start context

* Gen

* Fix fixture

* Fix the agent status

* Fix startup timeout msg

* Fix command using shared context

* Fix timeout draining

* Change signal type

* Add deterministic colors to startup script logs

---------

Co-authored-by: Muhammad Atif Ali <atif@coder.com>
2023-09-25 16:47:17 -05:00
Jon Ayers ee24260614 feat: allow configuring display apps from template (#9100) 2023-08-30 14:53:42 -05:00
Cian Johnston 5d4a17717f refactor(coderd): fetch owner information when authorizing workspace agent (#9123)
* Refactors the existing httpmw tests to use dbtestutil so that we can test them against a real database if desired,
* Modifies the GetWorkspaceAgentByAuthToken to return the owner and associated roles, removing the need for additional queries
2023-08-21 15:49:26 +01:00
Dean Sheather 07fd73c4a0 chore: allow multiple agent subsystems, add exectrace (#8933) 2023-08-08 22:10:28 -07:00
Kyle Carberry bd944e0d21 chore: rename startup logs to agent logs (#8649)
* chore: rename startup logs to agent logs

This also adds a `source` property to every agent log. It
should allow us to group logs and display them nicer in
the UI as they stream in.

* Fix migration order

* Fix naming

* Rename the frontend

* Fix tests

* Fix down migration

* Match enums for workspace agent logs

* Fix inserting log source

* Fix migration order

* Fix logs tests

* Fix psql insert
2023-07-28 15:57:23 +00:00
Mathias Fredriksson 8dac0356ed refactor: replace startup script logs EOF with starting/ready time (#8082)
This commit reverts some of the changes in #8029 and implements an
alternative method of keeping track of when the startup script has ended
and there will be no more logs.

This is achieved by adding new agent fields for tracking when the agent
enters the "starting" and "ready"/"start_error" lifecycle states. The
timestamps simplify logic since we don't need understand if the current
state is before or after the state we're interested in. They can also be
used to show data like how long the startup script took to execute. This
also allowed us to remove the EOF field from the logs as the
implementation was problematic when we returned the EOF log entry in the
response since requesting _after_ that ID would give no logs and the API
would thus lose track of EOF.
2023-06-20 14:41:55 +03:00
Mathias Fredriksson 0c5077464b fix: avoid missed logs when streaming startup logs (#8029)
* feat(coderd,agent): send startup log eof at the end

* fix(coderd): fix edge case in startup log pubsub

* fix(coderd): ensure startup logs are closed on lifecycle state change (fallback)

* fix(codersdk): fix startup log channel shared memory bug

* fix(site): remove the EOF log line
2023-06-16 17:14:22 +03:00
Mathias Fredriksson 660bbb8d38 refactor: deprecate login_before_ready in favor of startup_script_behavior (#7837)
Fixes #7758
2023-06-06 11:58:07 +03:00
Jon Ayers 00a2413c03 feat: add telemetry support for workspace agent subsystem (#7579) 2023-05-17 22:49:25 -05:00
Kyle Carberry 81e2b2500a feat: add level support for startup logs (#7067)
This allows external services like our devcontainer support to display
errors and warnings with custom styles to indicate failures to users.
2023-04-10 14:29:59 -05:00
Ammar Bandukwala ca4fa81570 feat: add agent metadata (#6614) 2023-03-31 15:26:19 -05:00
Dean Sheather 665b84de0d feat: use app tickets for web terminal (#6628) 2023-03-30 23:24:51 +10:00
Kyle Carberry cb7375450b feat: add startup script logs to the ui (#6558)
* Add startup script logs to the database

* Add coderd endpoints for startup script logs

* Push startup script logs from agent

* Pull startup script logs on frontend

* Rename queries

* Add constraint

* Start creating log sending loop

* Add log sending to the agent

* Add tests for streaming logs

* Shorten notify channel name

* Add FE

* Improve bulk log performance

* Finish UI display

* Fix startup log visibility

* Add warning for overflow

* Fix agent queue logs overflow

* Display staartup logs in a virtual DOM for performance

* Fix agent queue with loads of logs

* Fix authorize test

* Remove faulty test

* Fix startup and shutdown reporting error

* Fix gen

* Fix comments

* Periodically purge old database entries

* Add test fixture for migration

* Add Storybook

* Check if there are logs when displaying features

* Fix startup component overflow gap

* Fix startup log wrapping

---------

Co-authored-by: Asher <ash@coder.com>
2023-03-23 14:09:13 -05:00
Mathias Fredriksson 22e3ff96be feat(agent): Add shutdown lifecycle states and shutdown_script support (#6139)
* feat(api): Add agent shutdown lifecycle states

* feat(agent): Add shutdown_script support

* feat(agent): Add shutdown_script timeout

* feat(site): Support new agent lifecycle states

---

Co-authored-by: Marcin Tojek <marcin@coder.com>
2023-03-06 21:34:00 +02:00
Kyle Carberry 691495d761 feat: add expanded_directory to the agent for extension support (#6087)
This will enable opening the default `dir` of an agent in
the VS Code extension!
2023-02-07 21:35:09 +00:00
Mathias Fredriksson 981cac5e28 chore: Invert delay_login_until_ready, now login_before_ready (#5893) 2023-01-27 20:07:47 +00:00
Mathias Fredriksson 138887de7e feat: Add workspace agent lifecycle state reporting (#5785) 2023-01-24 14:24:27 +02:00
Mathias Fredriksson eff99f78fa feat: Add support for MOTD file in coder agents (#5147) 2022-11-24 12:22:20 +00:00
Mathias Fredriksson 90c34b74de feat: Add connection_timeout and troubleshooting_url to agent (#4937)
* feat: Add connection_timeout and troubleshooting_url to agent

This commit adds the connection timeout and troubleshooting url fields
to coder agents.

If an initial connection cannot be established within connection timeout
seconds, then the agent status will be marked as `"timeout"`.

The troubleshooting URL will be present, if configured in the Terraform
template, it can be presented to the user when the agent state is either
`"timeout"` or `"disconnected"`.

Fixes #4678
2022-11-09 17:27:05 +02:00
Kyle Carberry 5be6c7071e feat: Associate connected workspace agents with replicas (#4914)
This will enable displaying a graph that associates agents
to running replicas.
2022-11-06 15:27:09 -06:00
Kyle Carberry 9bd83e5ec7 feat: Add Tailscale networking (#3505)
* fix: Add coder user to docker group on installation

This makes for a simpler setup, and reduces the likelihood
a user runs into a strange issue.

* Add wgnet

* Add ping

* Add listening

* Finish refactor to make this work

* Add interface for swapping

* Fix conncache with interface

* chore: update gvisor

* fix tailscale types

* linting

* more linting

* Add coordinator

* Add coordinator tests

* Fix coordination

* It compiles!

* Move all connection negotiation in-memory

* Migrate coordinator to use net.conn

* Add closed func

* Fix close listener func

* Make reconnecting PTY work

* Fix reconnecting PTY

* Update CI to Go 1.19

* Add CLI flags for DERP mapping

* Fix Tailnet test

* Rename ConnCoordinator to TailnetCoordinator

* Remove print statement from workspace agent test

* Refactor wsconncache to use tailnet

* Remove STUN from unit tests

* Add migrate back to dump

* chore: Upgrade to Go 1.19

This is required as part of #3505.

* Fix reconnecting PTY tests

* fix: update wireguard-go to fix devtunnel

* fix migration numbers

* linting

* Return early for status if endpoints are empty

* Update cli/server.go

Co-authored-by: Colin Adler <colin1adler@gmail.com>

* Update cli/server.go

Co-authored-by: Colin Adler <colin1adler@gmail.com>

* Fix frontend entites

* Fix agent bicopy

* Fix race condition for the last node

* Fix down migration

* Fix connection RBAC

* Fix migration numbers

* Fix forwarding TCP to a local port

* Implement ping for tailnet

* Rename to ForceHTTP

* Add external derpmapping

* Expose DERP region names to the API

* Add global option to enable Tailscale networking for web

* Mark DERP flags hidden while testing

* Update DERP map on reconnect

* Add close func to workspace agents

* Fix race condition in upstream dependency

* Fix feature columns race condition

Co-authored-by: Colin Adler <colin1adler@gmail.com>
2022-08-31 20:09:44 -05:00
Cian Johnston 5362f4636e feat: show agent version in UI and CLI (#3709)
This commit adds the ability for agents to set their version upon start.
This is then reported in the UI and CLI.
2022-08-31 16:33:50 +01:00
Abhineet Jain 9df6bc7ba1 fix: update template updated_at value (#2729)
* fix: update template updated_at value

* use Go time for all updated_at updates
2022-06-30 12:14:51 +00:00