Commit Graph

263 Commits

Author SHA1 Message Date
Kacper Sawicki f016d9e505 fix(coderd): add role param to agent RPC to prevent false connectivity (#22052)
## Summary

coder-logstream-kube and other tools that use the agent token to connect
to the RPC endpoint were incorrectly triggering connection monitoring,
causing false connected/disconnected timestamps on the agent. This led
to VSCode/JetBrains disconnections and incorrect dashboard status.

## Changes

Add a `role` query parameter to `/api/v2/workspaceagents/me/rpc`:
- `role=agent`: triggers connection monitoring (default for the agent
SDK)
- any other value (e.g. `logstream-kube`): skips connection monitoring
- omitted: triggers monitoring for backward compatibility with older
agents

The agent SDK now sends `role=agent` by default. A new `Role` field on
the `agentsdk.Client` allows non-agent callers to specify a different
role.

## Required follow-up

coder-logstream-kube needs to set `client.Role = "logstream-kube"`
before calling `ConnectRPC20()`. Without that change, it will still send
`role=agent` and trigger monitoring.

Fixes #21625
2026-02-18 09:44:06 +01:00
Danielle Maywood 2de8cdf160 feat(agent): add subagent ID fields to devcontainers in manifest (#21848)
Update the agent protobuf schema (agent/proto/agent.proto) to include:
- subagent_id field in WorkspaceAgentDevcontainer message
- id field in CreateSubAgentRequest message

Bump the Agent API version from v2.7 to v2.8 and update all client
references throughout the codebase (ConnectRPC27 -> ConnectRPC28,
DRPCAgentClient27 -> DRPCAgentClient28).
2026-02-03 12:37:30 +00:00
Spike Curtis 3398833919 test: don't drop error on blank IP address in report (#21642)
fixes https://github.com/coder/internal/issues/1286

We can get blank IP address from the net connection if the client has
already disconnected, as was the case in this flake. Fix is to only log
error if we get something non-empty we can't parse.

---------

Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
2026-01-23 10:26:44 +00:00
Asher ff9ed91811 chore: move agent's file API into separate package (#21531)
This makes it so we can test it directly without having to go through
Tailnet, which appears to be causing flakes in CI where the requests
time out and never make it to the agent.

Takes inspiration from the container-related API endpoints.

Would probably make sense to refactor the ls tests to also go through
the API (rather than be internal tests like they are currently) but I
left those alone for now to keep the diff minimal.
2026-01-16 17:03:17 -09:00
Spike Curtis bddb808b25 chore: arrange imports in a standard way (#21452)
Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example:

```
import (
	"context"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"golang.org/x/xerrors"
	"gopkg.in/natefinch/lumberjack.v2"

	"cdr.dev/slog/v3"
	"github.com/coder/coder/v2/codersdk/agentsdk"
	"github.com/coder/serpent"
)
```

3 groups: standard library, 3rd partly libs, Coder libs.

This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.
2026-01-08 15:24:11 +04:00
Spike Curtis 49b34a716a fix: fix slog to always use array of Fields (#21426)
Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptible call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder).

It also updates dependencies that also use slog and were updated.

I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule.

Other dependencies, I pushed new tags.
2026-01-08 10:29:41 +04:00
Zach 07924037e7 feat: add boundary log forwarding from agent to coderd (#21345)
Add agent forwarding of boundary audit logs from workspaces to coderd
via agent API, and re-emission of boundary logs to coderd stderr. This
change adds a server to the workspace agent that always listens on a
unix socket for boundary to connect and send audit logs.

coderd log format example:
```
[API] 2025-12-23 18:31:46.755 [info] coderd.agentrpc: boundary_request owner=.. workspace_name=.. agent_name=.. decision=.. workspace_id=.. http_method=.. http_url=.. event_time=.. request_id=..
```

Corresponding boundary PR: https://github.com/coder/boundary/pull/124
RFC:
https://www.notion.so/coderhq/Agent-Boundary-Logs-2afd579be59280f29629fc9823ac41ba
https://github.com/coder/coder/issues/21280
2025-12-31 16:38:19 -07:00
Zach 9d1493a13a feat: add initial API for boundary log forwarding to coderd (#21293)
Add the AgentAPI changes to support the feature that transmits boundary
logs from workspaces to coderd via the agent API for eventual re-emission to
stderr. The API handlers are stubs for now because I'm trying to land
this feature from multiple smaller PRs.

High level architecture:
- Boundary records resource access in batches and sends proto message to
agent
- Agent proxies messages to coderd **(captured by the API changes in
this PR)**
- coderd re-emits logs to stderr

RFC:
https://www.notion.so/coderhq/Agent-Boundary-Logs-2afd579be59280f29629fc9823ac41ba
2025-12-19 10:41:39 -07:00
Spike Curtis 71c6dc4043 fix: stop disconnecting from coderd early and record disconnect correctly (#21250)
fixes https://github.com/coder/internal/issues/1196

The above issue exposes two different bugs in Coder.

In the agent, there is a race where if the agent is closed while starting up networking, it will erroneously disconnect from Coderd, which delays or breaks writing final status and logs.

In Coderd, there is a bug where we don't properly record the latest agent disconnection time if the agent had previously disconnected. This causes us to report the agent status as "Connected" even after it has disconnected up until the inactivity timeout fires.

This PR fixes both issues.

It also slightly reworks when we send workspace updates based on connection and disconnection. Previously we would send two updates when the agent connected in certain circumstances, even though the status would be the same in both (only times changed).  Now we universally only send one on connect, and then another on disconnect.
2025-12-15 12:04:01 +04:00
Spike Curtis ce9e7ad909 fix(agent): ignore EOF errors during shutdown (#21187)
fixes: https://github.com/coder/internal/issues/1179

The problem in that flake is that dRPC doensn't consistently return
`context.Canceled` if you make an RPC call and then cancel it: sometimes
it returns EOF.

Without this PR, if we get an EOF on one of the routines that uses the
agentapi connection, we tear down the whole connection and reconnect to
coderd --- even if we are in the middle of a graceful shutdown.

What happened in the linked flake is that writing stats failed with EOF,
which then caused us to reconnect and write the lifecycle "SHUTTING
DOWN" twice.
2025-12-09 17:32:38 +04:00
Spike Curtis 40df21ed62 fix: fixes use of possibly nil RemoteAddr() and LocalAddr() return values (#21076)
fixes: https://github.com/coder/internal/issues/1143

Both gVisor and the Go standard library implementations of `net.Conn` can under certain circumstances return `nil` for `RemoteAddr()` and `LocalAddr()` calls. If we call their methods, we segfault.

This PR fixes these calls and adds ruleguard rules.

Note that `slog.F("remote_addr", conn.RemoteAddr())` is fine because slog detects the `nil` before attempting to stringify the type.
2025-12-03 15:06:00 +04:00
Sas Swart ce627bf23f feat: implement agent socket api, client and cli (#20758)
closes: https://github.com/coder/coder/issues/10352
closes: https://github.com/coder/internal/issues/1094
closes: https://github.com/coder/internal/issues/1095

In this pull request, we enable a new set of experimental cli commands
grouped under `coder exp sync`.
These commands allow any process acting within a coder workspace to
inform the coder agent of its requirements and execution progress. The
coder agent will then relay this information to other processes that
have subscribed.

These commands are:
```
# Check if this feature is enabled in your environment 
coder exp sync ping

# express that your unit depends on another
coder exp sync want <unit> <dependency_unit> 

# express that your unit intends to start a portion of the script that requires 
# other units to have completed first. This command blocks until all dependencies have been met
coder exp sync start <unit> 

# express that your unit has completes its work, allowing dependent units to begin their execution
coder exp sync complete <unit>
```

Example:

In order to automatically run claude code in a new workspace, it must
first have a git repository cloned. The scripts responsible for cloning
the repository and for running claude code would coordinate in the
following way:

```bash
# Script A: Claude code

# Inform the agent that the claude script wants the git script.
# That is, the git script must have completed before the claude script can begin its execution
coder exp sync want claude git

# Inform the agent that we would now like to begin execution of claude.
# This command will block until the git script (and any other defined dependencies)
# have completed
coder exp sync start claude

# Now we run claude code and any other commands we need
claude ...

# Once our script has completed, we inform the agent, so that any scripts that depend on this one
# may begin their execution

coder exp sync complete claude
```

```bash
# Script B: Git

# Because the git script does not have any dependencies, we can simply inform the agent that we 
# intend to start
coder exp sync start git

git clone ssh://git@github.com/coder/coder

# Once the repository have been cloned, we inform the agent that this script is complete, so that
# scripts that depend on it may begin their execution.
coder exp sync complete git
```

Notes:
* Unit names (ie. `claude` and `git`) given as input to the sync
commands are arbitrary strings. You do not have to conform to specific
identifiers. We recommend naming your scripts descriptively, but
succinctly.
* Scripts unit names should be well documented. Other scripts will need
to know the names you've chosen in order to depend on yours. Therefore,
you

---------

Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
2025-11-28 08:33:50 +02:00
Sas Swart 1d726c81bb fix: remove a sensitive field from an agent log line (#20968)
This PR removes a log field that could expose sensitive information in
agent logs for workspaces that pass such information to the agent via
its manifest.
2025-11-27 16:12:03 +02:00
Spike Curtis afd40436f0 fix: mock Agent querying OS for listening ports in tests (#20842)
fixes https://github.com/coder/internal/issues/1123

We want to tests that ports are not included after they are no longer used, but this isn't safe on the real OS networking stack because there is no way to guarantee a port _won't_ be used. Instead, we introduce an interface and fake implementation for testing.

On order to leave the filtering logic in the test path, this PR also does some refactoring.

Caching logic is left in the real OS querying implementation and a new test case is added for it in this PR.
2025-11-25 14:25:24 +04:00
Dean Sheather 6c99d5eca2 fix: avoid connection logging crashes in agent (#20307)
- Ignore errors when reporting a connection from the server, just log
them instead
- Translate connection log IP `localhost` to `127.0.0.1` on both the
server and the agent

Note that the temporary fix for converting invalid IPs to localhost is
not required in main since the database no longer forbids NULL for the
IP column since https://github.com/coder/coder/pull/19788

Relates to #20194
2025-10-16 01:56:43 +11:00
Spike Curtis 1354d84eb4 chore: refactor instance identity to be a SessionTokenProvider (#19566)
Refactors Agent instance identity to be a SessionTokenProvider.

Refactors the CLI to create Agent clients via a centralized function, rather than add-hoc via individual command handlers and their flags.

This allows commands besides `coder agent`, but which still use the agent identity, to support instance identity authentication.

Fixes #19111 by unifying all API requests to go thru the SessionTokenProvider for auth credentials.
2025-09-03 10:38:42 +04:00
Danielle Maywood f41275eb39 feat(agent/agentcontainers): auto detect dev containers (#18950)
Relates to https://github.com/coder/internal/issues/711

This PR implements a project discovery mechanism that searches for any
dev container projects and makes them visible in the UI so that they can
be started. To make the wording on the site more clear, "Rebuild" has
been changed to "Start" when there is no container associated with a
known dev container configuration. I've also made it so that site will
show the dev container config path when there is no other name
available.

### Design decisions

Just want to ensure my explanation for a few design decisions are noted
down:
- We only search for dev container configurations inside git
repositories
- We only search for these git repositories if they're at the top level
or a direct child of the agent directory.

This limited approach is to reduce the amount of files we ultimately
walk when trying to find these projects. It makes sense to limit it to
only the agent directory, although I'm open to expanding how deep we
search.
2025-07-22 19:02:43 +01:00
Dean Sheather a1b87a67c6 fix: use client preferred URL for the default DERP (#18911)
The agentsdk currently does a remap of the DERP map to change the
EmbeddedRelay node's URL to match the agent's access URL.

This PR makes changes to the `workspacesdk` (used by clients like the
CLI) and `vpn` (used by Coder Desktop) to match this behavior.

This enables us the ability to try Coder clients in dogfood over a VPN
without changing the global access URL.
2025-07-17 20:17:44 +10:00
Danielle Maywood 0118e75009 fix(agent): disable dev container integration inside sub agents (#18781)
It appears we accidentally broke this logic in a previous PR. This
should now correctly disable the agent api as we'd expect.
2025-07-08 11:05:30 +01:00
Mathias Fredriksson 0f3a1e9849 fix(agent/agentcontainers): split Init into Init and Start for early API responses (#18640)
Previously in #18635 we delayed the containers API `Init` to avoid producing
errors due to Docker and `@devcontainers/cli` not yet being installed by startup
scripts. This had an adverse effect on the UX via UI responsiveness as the
detection of devcontainers was greatly delayed.

This change splits `Init` into `Init` and `Start` so that we can immediately
after `Init` start serving known devcontainers (defined in Terraform), improving
the UX.

Related #18635
Related #18640
2025-06-27 19:01:50 +03:00
Mathias Fredriksson 8ee2668b39 fix(agent): fix script filtering for devcontainers (#18635) 2025-06-27 16:59:31 +03:00
Mathias Fredriksson 7e99fb7d7e fix(agent): delay containerAPI init to ensure startup scripts run before (#18630) 2025-06-27 14:10:35 +03:00
ケイラ 09cc906981 chore: remove unnecessary redeclarations in for loops (part 2) (#18593) 2025-06-26 12:28:00 -06:00
Mathias Fredriksson eca6381314 feat(agent/agentcontainers): add more envs to readconfig for app URL building (#18603) 2025-06-26 09:33:58 +00:00
Danielle Maywood c4e4fe85f9 fix(agent): start devcontainers through agentcontainers package (#18471)
Fixes https://github.com/coder/internal/issues/706

Context for the implementation here
https://github.com/coder/internal/issues/706#issuecomment-2990490282

Synchronously starts dev containers defined in terraform with our
`DevcontainerCLI` abstraction, instead of piggybacking off of our
`agentscripts` package. This gives us more control over logs, instead of
being reliant on packages which may or may not exist in the
user-provided image.
2025-06-25 11:52:50 +01:00
Mathias Fredriksson 99d124e276 feat(agent): enable devcontainers by default (#18533) 2025-06-24 21:17:04 +03:00
Danielle Maywood 118bf98145 chore(agent): add workspace owner env var and log dev container app failures (#18433)
Listen to feedback that was missed in
https://github.com/coder/coder/pull/18346

- Adds `CODER_WORKSPACE_OWNER_NAME` into the agent environment.
- Logs warnings for when dev container app creation fails.
2025-06-19 09:37:48 +01:00
Mathias Fredriksson d6df1f23a9 fix(agent/agentcontainers): update sub agent client on reconnect (#18399)
Fixes coder/internal#697
2025-06-17 13:58:09 +00:00
Mathias Fredriksson ae0c8701bb feat(agent): disable devcontainers for sub agents (#18303)
Updates coder/internal#621
Refs #18245
2025-06-10 10:47:02 +00:00
Mathias Fredriksson fca99174ad feat(agent/agentcontainers): implement sub agent injection (#18245)
This change adds support for sub agent creation and injection into dev
containers.

Updates coder/internal#621
2025-06-10 12:37:54 +03:00
Mathias Fredriksson 04e4f2fac0 chore(agent): update agent proto client (#18242) 2025-06-05 16:58:18 +03:00
Bruno Quaresma d779126ee3 chore: rollback PR #18081 (#18104)
Rollback https://github.com/coder/coder/pull/18081
2025-05-29 13:12:13 -03:00
Danielle Maywood b712d0b23f feat(coderd/agentapi): implement sub agent api (#17823)
Closes https://github.com/coder/internal/issues/619

Implement the `coderd` side of the AgentAPI for the upcoming
dev-container agents work.

`agent/agenttest/client.go` is left unimplemented for a future PR
working to implement the agent side of this feature.
2025-05-29 12:15:47 +01:00
Bruno Quaresma 2ec7404197 chore: make owner_name and owner_username consistent (#18081)
We've been using owner_name inconsistently as username. So this PR fixes
it by making the attribute naming more consistent.
2025-05-28 17:25:32 -03:00
Mathias Fredriksson d6c14f3d8a feat(agent/agentcontainers): update containers periodically (#17972)
This change introduces a significant refactor to the agentcontainers API
and enables periodic updates of Docker containers rather than on-demand.
Consequently this change also allows us to move away from using a
locking channel and replace it with a mutex, which simplifies usage.

Additionally a previous oversight was fixed, and testing added, to clear
devcontainer running/dirty status when the container has been removed.

Updates coder/coder#16424
Updates coder/internal#621
2025-05-22 19:44:33 +03:00
Danielle Maywood 61f22a59ba feat(agent): add ParentId to agent manifest (#17888)
Closes https://github.com/coder/internal/issues/648

This change introduces a new `ParentId` field to the agent's manifest.
This will allow an agent to know if it is a child or not, as well as
knowing who the owner is.

This is part of the Dev Container Agents work
2025-05-19 16:09:56 +01:00
Danielle Maywood 83df55700b revert(agent): remove CODER_AGENT_IS_SUB_AGENT cli flag (#17875)
The RFC has changed, this information will be passed through the
manifest instead.
2025-05-16 11:04:21 +00:00
Sas Swart 425ee6fa55 feat: reinitialize agents when a prebuilt workspace is claimed (#17475)
This pull request allows coder workspace agents to be reinitialized when
a prebuilt workspace is claimed by a user. This facilitates the transfer
of ownership between the anonymous prebuilds system user and the new
owner of the workspace.

Only a single agent per prebuilt workspace is supported for now, but
plumbing has already been done to facilitate the seamless transition to
multi-agent support.

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
2025-05-14 14:15:36 +02:00
Danielle Maywood 7f056da088 feat: add hidden CODER_AGENT_IS_SUB_AGENT flag to coder agent (#17783)
Closes https://github.com/coder/internal/issues/620

Adds a new, hidden, flag `CODER_AGENT_IS_SUB_AGENT` to the `coder agent`
command.
2025-05-13 10:57:50 +01:00
Mathias Fredriksson 7af188bfc1 fix(agent): fix unexpanded devcontainer paths for agentcontainers (#17736)
Devcontainers were duplicated in the API because paths weren't
absolute, we now normalize them early on to keep it simple.

Updates #16424
2025-05-12 14:03:40 +03:00
Mathias Fredriksson 1fc74f629e refactor(agent): update agentcontainers api initialization (#17600)
There were too many ways to configure the agentcontainers API resulting
in inconsistent behavior or features not being enabled. This refactor
introduces a control flag for enabling or disabling the containers API.
When disabled, all implementations are no-op and explicit endpoint
behaviors are defined. When enabled, concrete implementations are used
by default but can be overridden by passing options.
2025-04-29 17:53:10 +03:00
Mathias Fredriksson 268a50c193 feat(agent/agentcontainers): add file watcher and dirty status (#17573)
Fixes coder/internal#479
Fixes coder/internal#480
2025-04-29 11:53:58 +03:00
Spike Curtis c1816e3674 fix(agent): fix deadlock if closed while starting listeners (#17329)
fixes #17328

Fixes a deadlock if we close the Agent in the middle of starting listeners on the tailnet.
2025-04-10 12:46:19 +04:00
Aaron Lehmann aa0a63a295 fix(agent): log correct error variable in createTailnet (#17283) 2025-04-07 16:32:52 +00:00
Aaron Lehmann e9863aba81 fix: log correct error on drpc connection close error (#17265) 2025-04-04 22:09:42 +03:00
Spike Curtis 42e5d71f59 fix: fix closeMutex unlock bug (#17259)
Fixes https://github.com/coder/internal/issues/550

Classic return before unlocking bug.
2025-04-04 14:29:56 +04:00
Spike Curtis f6bf6c6ec4 fix!: use names not IDs for agent SSH key seed (#17258)
Changes the SSH host key seeding to use the owner username, workspace name, and agent name. This prevents SSH from complaining about a mismatched host key if you use Coder Desktop to connect, and delete and recreate your workspace with the same name. Previously this would generate a different key because the workspace ID changed.

We also include the owner's username in anticipation of using Coder Desktop to access shared workspaces (or as a superuser) down the road, so that workspaces with the same name owned by different users will not have the same key.

This change is **BREAKING** in a limited sense that early access users of Coder Desktop will see their SSH clients complain about host keys changing the first time each workspace is rebuilt with this code. It can be resolved by clearing your `.ssh/known_hosts` file of the Coder workspaces you access this way.
2025-04-04 12:51:46 +04:00
Mathias Fredriksson b61f0ab958 fix(agent): ensure SSH server shutdown with process groups (#17227)
Fix hanging workspace shutdowns caused by orphaned SSH child processes.
Key changes:

- Create process groups for non-PTY SSH sessions
- Send SIGHUP to entire process group for proper termination
- Add 5-second timeout to prevent indefinite blocking

Fixes #17108
2025-04-03 16:01:43 +03:00
Mathias Fredriksson 7d4b3c8634 feat(agent): add devcontainer autostart support (#17076)
This change adds support for devcontainer autostart in workspaces. The
preconditions for utilizing this feature are:

1. The `coder_devcontainer` resource must be defined in Terraform
2. By the time the startup scripts have completed,
	- The `@devcontainers/cli` tool must be installed
	- The given workspace folder must contain a devcontainer configuration

Example Terraform:

```tf
resource "coder_devcontainer" "coder" {
  agent_id         = coder_agent.main.id
  workspace_folder = "/home/coder/coder"
  config_path      = ".devcontainer/devcontainer.json" # (optional)
}
```

Closes #16423
2025-03-27 12:31:30 +02:00
Danielle Maywood 1bbbae8d57 chore: migrate to github.com/coder/clistat (#17107)
Migrate from in-tree `clistat` package to
https://github.com/coder/clistat.
2025-03-26 10:36:53 +00:00