Commit Graph

30 Commits

Author SHA1 Message Date
Ethan 181e103201 fix: reuse shared tailnet for coderd-hosted MCP workspace tools (#24460)
## Problem

Coderd can expose an MCP server at `/api/experimental/mcp/http` (we have
this enabled on dogfood). Its workspace tools dialed agents through a
per-call client-side tailnet stack. Every tool call re-created a
WireGuard device, netstack, magicsock + UDP sockets, DERP connection,
coordinator websocket, and their goroutines — in a process that already
runs a long-lived shared tailnet. The duplicate stacks drove up resource
usage under load.

## Fix

Route this server's tool calls through the existing shared tailnet, so
none of those transports are reconstructed per call. Closing an
`AgentConn` now releases a tunnel reference instead of tearing down a
transport.

## Potential follow-up

`coder exp mcp server` still builds a fresh tailnet per call. It pays
per-call latency and causes coordinator/DERP churn. A shared CLI tailnet
is more involved — unlike coderd, the CLI has no existing shared tailnet
to reuse, so it would need a new long-lived client-side tailnet with
reconnect, sleep/wake, and idle-destination handling. There's less
motivation to optimize this, given the client-side MCP does not compete
for resources with coderd.

Closes CODAGT-199

> Generated by mux, but reviewed by a human
2026-04-21 11:37:10 +10:00
Spike Curtis 4c1a32cd7c feat: wire DERPTLSConfig through CLI, SDK, tailnet, VPN, agent, and health checks (#24435)
Wire DERPTLSConfig through the CLI, SDK, tailnet, VPN client, agent, and
health checks to allow custom TLS configuration for DERP connections.
The main use case is to be able to set a custom CA and also present
client certs (mTLS). See https://github.com/coder/tailscale/pull/105 for
related changes.

Adds three new global CLI flags:
- `--client-tls-ca-file` / `CODER_CLIENT_TLS_CA_FILE`
- `--client-tls-cert-file` / `CODER_CLIENT_TLS_CERT_FILE`
- `--client-tls-key-file` / `CODER_CLIENT_TLS_KEY_FILE`

Based on community PR #22695 by @ibdafna, with autogeneration issues
fixed (protobuf version mismatches in .pb.go files, golden file
regeneration, lint fixes).

> [!NOTE]
> This PR was authored by Coder Agents on behalf of a Coder team member.

<details>
<summary>Relationship to #22695</summary>

This is a clean reimplementation of the changes from #22695 on top of
current `main`, with the following differences:
- **Removed**: Accidental protobuf version changes in `.pb.go` files
(contributor had `protoc v6.33.4` vs project's `protoc v4.23.4`)
- **Added**: Properly regenerated golden files and docs via `make gen`
- **Fixed**: Lint issue (`var-declaration` revive warning on explicit
type in `createHTTPClient`)
- All meaningful code changes are identical to the original PR
</details>
2026-04-16 12:46:52 -04:00
Ethan 552f342a5b fix(codersdk): use header auth for non-browser websocket dials (#22461)
## Context
This commit is part of the fix for a downstream provider outage observed
during
`coderd_template` updates.

Observed downstream symptoms (terraform-provider-coderd):
- Template-version websocket log stream requests returned `401`:
  `GET /api/v2/templateversions/<id>/logs`.
- In older provider code (`waitForJob`), stream-init errors could
produce
`(nil, nil, err)` and then trigger a nil dereference when
`closer.Close()`
  was deferred before checking `err`.
- Net effect: template update path crashed instead of returning a
controlled
  provisioning error.

That provider panic is being hardened in the provider repo separately
(https://github.com/coder/terraform-provider-coderd/pull/308). This
commit addresses the upstream SDK auth mismatch that caused the
websocket `401`
side of the chain.

## Root cause

On deployments with host-prefixed cookie handling (dev.coder.com)
enabled
(`--host-prefix-cookie` / `EnableHostPrefix=true`), middleware rewrites
cookie
state to enforce prefixed auth cookies.

For non-browser websocket clients that still sent unprefixed
`coder_session_token` via cookie jars, this created an auth mismatch:
- cookie-based credential expected by the client path,
- but cookie normalization/stripping applied server-side,
- resulting in no usable token at auth extraction time.

## Fix in this commit

Apply the #22226 non-browser auth principle to remaining websocket
callsites in
`codersdk` by replacing cookie-jar session auth with header-token auth.

_Generated with mux but reviewed by a human_
2026-03-02 19:32:36 +11:00
Spike Curtis bddb808b25 chore: arrange imports in a standard way (#21452)
Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example:

```
import (
	"context"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"golang.org/x/xerrors"
	"gopkg.in/natefinch/lumberjack.v2"

	"cdr.dev/slog/v3"
	"github.com/coder/coder/v2/codersdk/agentsdk"
	"github.com/coder/serpent"
)
```

3 groups: standard library, 3rd partly libs, Coder libs.

This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.
2026-01-08 15:24:11 +04:00
Spike Curtis 49b34a716a fix: fix slog to always use array of Fields (#21426)
Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptible call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder).

It also updates dependencies that also use slog and were updated.

I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule.

Other dependencies, I pushed new tags.
2026-01-08 10:29:41 +04:00
Spike Curtis 192c81e8f9 chore: refactor codersdk to use SessionTokenProvider (#19565)
Refactors `codersdk.Client`'s use of session tokens to use a `SessionTokenProvider`, which abstracts the obtaining and storing of the session token.

The main motiviation is to unify Agent authentication an an upstack PR, which can use cloud instance identity via token exchange, rather than a fixed session token.

However, the abstraction could also allow functionality like obtaining the session token from other external sources like the OS credential manager, or an external secret/key management system like Vault.
2025-08-29 10:41:32 +02:00
Danielle Maywood 5e84d257b7 refactor: convert workspacesdk.AgentConn to an interface (#19392)
Fixes https://github.com/coder/internal/issues/907

We convert `workspacesdk.AgentConn` to an interface and generate a mock
for it. This allows writing `coderd` tests that rely on the agent's HTTP
api to not have to set up an entire tailnet networking stack.
2025-08-20 10:00:44 +01:00
Dean Sheather a1b87a67c6 fix: use client preferred URL for the default DERP (#18911)
The agentsdk currently does a remap of the DERP map to change the
EmbeddedRelay node's URL to match the agent's access URL.

This PR makes changes to the `workspacesdk` (used by clients like the
CLI) and `vpn` (used by Coder Desktop) to match this behavior.

This enables us the ability to try Coder clients in dogfood over a VPN
without changing the global access URL.
2025-07-17 20:17:44 +10:00
Spike Curtis 3b54254177 feat: add coder connect exists hidden subcommand (#17418)
Adds a new hidden subcommand `coder connect exists <hostname>` that checks if the name exists via Coder Connect. This will be used in SSH config to match only if Coder Connect is unavailable for the hostname in question, so that the SSH client will directly dial the workspace over an existing Coder Connect tunnel.

Also refactors the way we inject a test DNS resolver into the lookup functions so that we can test from outside the `workspacesdk` package.
2025-04-17 11:23:24 +04:00
Spike Curtis e5ce3824ca feat: add IsCoderConnectRunning to workspacesdk (#17361)
Adds `IsCoderConnectRunning()` to the workspacesdk. This will support the `coder` CLI being able to use CoderConnect when it's running.

part of #16828
2025-04-14 09:47:46 +04:00
Spike Curtis 2c573dc023 feat: vpn uses WorkspaceHostnameSuffix for DNS names (#17335)
Use the hostname suffix to set DNS names as programmed into the DNS service and returned by the vpn `Tunnel`.

part of: #16828
2025-04-11 13:24:20 +04:00
Spike Curtis 12dc086628 feat: return hostname suffix on AgentConnectionInfo (#17334)
Adds the Hostname Suffix to `AgentConnectionInfo` --- the VPN provider will use it to control the suffix for DNS hostnames.

part of: #16828
2025-04-11 13:09:51 +04:00
Jon Ayers 17ddee05e5 chore: update golang to 1.24.1 (#17035)
- Update go.mod to use Go 1.24.1
- Update GitHub Actions setup-go action to use Go 1.24.1
- Fix linting issues with golangci-lint by:
  - Updating to golangci-lint v1.57.1 (more compatible with Go 1.24.1)

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <claude@anthropic.com>
2025-03-26 01:56:39 -05:00
Cian Johnston 68624092a4 feat(agent/reconnectingpty): allow selecting backend type (#17011)
agent/reconnectingpty: allow specifying backend type
cli: exp rpty: automatically select backend based on command
2025-03-20 13:45:31 +00:00
Thomas Kosiewski d0e2060692 feat(agent): add second SSH listener on port 22 (#16627)
Fixes: https://github.com/coder/internal/issues/377

Added an additional SSH listener on port 22, so the agent now listens on both, port one and port 22.

---
Change-Id: Ifd986b260f8ac317e37d65111cd4e0bd1dc38af8
Signed-off-by: Thomas Kosiewski <tk@coder.com>
2025-03-03 04:47:42 +01:00
Cian Johnston 172e52317c feat(agent): wire up agentssh server to allow exec into container (#16638)
Builds on top of https://github.com/coder/coder/pull/16623/ and wires up
the ReconnectingPTY server. This does nothing to wire up the web
terminal yet but the added test demonstrates the functionality working.

Other changes:
* Refactors and moves the `SystemEnvInfo` interface to the
`agent/usershell` package to address follow-up from
https://github.com/coder/coder/pull/16623#discussion_r1967580249
* Marks `usershellinfo.Get` as deprecated. Consumers should use the
`EnvInfoer` interface instead.

---------

Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
Co-authored-by: Danny Kopping <danny@coder.com>
2025-02-26 09:03:27 +00:00
Spike Curtis 2c7f8ac65f chore: migrate to coder/websocket 1.8.12 (#15898)
Migrates us to `coder/websocket` v1.8.12 rather than `nhooyr/websocket` on an older version.

Works around https://github.com/coder/websocket/issues/504 by adding an explicit test for `xerrors.Is(err, io.EOF)` where we were previously getting `io.EOF` from the netConn.
2024-12-19 00:51:30 +04:00
Spike Curtis 747f7ce173 feat: add support for WorkspaceUpdates to WebsocketDialer (#15534)
closes #14730

Adds support for WorkspaceUpdates to the WebsocketDialer. This allows us to dial the new endpoint added in #14847 and connect it up to a `tailnet.Controllers` to connect to all agents over the tailnet.

I refactored the fakeWorkspaceUpdatesProvider to a mock and moved it to `tailnettest` so it could be more easily reused.  The Mock is a little more full-featured.
2024-11-18 10:54:11 +04:00
Spike Curtis 40802958e9 fix: use explicit api versions for agent and tailnet (#15508)
Bumps the Tailnet and Agent API version 2.3, and creates some extra controls and machinery around these versions.

What happened is that we accidentally shipped two new API features without bumping the version.  `ScriptCompleted` on the Agent API in Coder v2.16 and `RefreshResumeToken` on the Tailnet API in Coder v2.15.

Since we can't easily retroactively bump the versions, we'll roll these changes into API version 2.3 along with the new WorkspaceUpdates RPC, which hasn't been released yet.  That means there is some ambiguity in Coder v2.15-v2.17 about exactly what methods are supported on the Tailnet and Agent APIs.  This isn't great, but hasn't caused us major issues because 

1. RefreshResumeToken is considered optional, and clients just log and move on if the RPC isn't supported. 
2. Agents basically never get started talking to a Coderd that is older than they are, since the agent binary is normally downloaded from Coderd at workspace start.

Still it's good to get things squared away in terms of versions for SDK users and possible edge cases around client and server versions.

To mitigate against this thing happening again, this PR also:

1. adds a CODEOWNERS for the API proto packages, so I'll review changes
2. defines interface types for different API versions, and has the agent explicitly use a specific version.  That way, if you add a new method, and try to use it in the agent without thinking explicitly about versions, it won't compile.

With the protocol controllers stuff, we've sort of already abstracted the Tailnet API such that the interface type strategy won't work, but I'll work on getting the Controller to be version aware, such that it can check the API version it's getting against the controllers it has -- in a later PR.
2024-11-15 11:16:28 +04:00
Spike Curtis e5661c2748 feat: add support for multiple tunnel destinations in tailnet (#15409)
Closes #14729

Expands the Coordination controller used by the CLI client to allow multiple tunnel destinations (agents).  Our current client uses just one, but this unifies the logic so that when we add Coder VPN, 1 is just a special case of "many."
2024-11-08 13:32:07 +04:00
Spike Curtis 718722af1b chore: refactor tailnetAPIConnector to tailnet.Controller (#15361)
Refactors `workspacesdk.tailnetAPIConnector` as a `tailnet.Controller` to reuse all the reconnection and graceful disconnect logic.

chore re: #14729
2024-11-08 10:10:54 +04:00
Spike Curtis 2d061e698d chore: refactor tailnetAPIConnector to use dialer (#15347)
refactors `tailnetAPIConnector` to use the `Dialer` interface in `tailnet`, introduced lower in this stack of PRs. This will let us use the same Tailnet API handling code across different things that connect to the Tailnet API (CLI client, coderd, workspace proxies, and soon: Coder VPN).

chore re: #14729
2024-11-07 17:24:19 +04:00
Spike Curtis 7d9f5ab81d chore: add Coder service prefix to tailnet (#14943)
re: #14715

This PR introduces the Coder service prefix: `fd60:627a:a42b::/48` and refactors our existing code as calling the Tailscale service prefix explicitly (rather than implicitly).

Removes the unused `Addresses` agent option. All clients today assume they can compute the Agent's IP address based on its UUID, so an agent started with a custom address would break things.
2024-10-04 10:04:10 +04:00
Spike Curtis fb3523b37f chore: remove legacy AgentIP address (#14640)
Removes the support for the Agent's "legacy IP" which was a hardcoded IP address all agents used to use, before we introduced "single tailnet". Single tailnet went GA in 2.7.0.
2024-09-12 07:40:19 +04:00
Dean Sheather cf8be4eac5 feat: add resume support to coordinator connections (#14234) 2024-08-20 17:16:49 +10:00
Ethan a110d18275 chore: add DRPC tailnet & cli network telemetry (#13687) 2024-07-03 15:23:46 +10:00
Spike Curtis c94b5188bd fix: modify workspacesdk to ask for tailnet API 2.0 (#13684)
#13617 bumped the Agent/Tailnet API minor version because it adds telemetry features.  However, we don't actually use the protocol features yet, so it's a bit obnoxious for our CLI client to ask for the newest API version.

This is particularly true of the CLI client, since that's distributed separately, so if an end user installs the latest CLI client and their organization hasn't fully upgraded, then it will fail to connect.

Since we have a release coming up and the telemetry stuff won't make it, I think we should roll back to version 2.0 until we actually implement the telemetry stuff. That way the newest release (2.13) will work with Coder servers all the way back to 2.9.
2024-06-27 15:38:21 +04:00
Spike Curtis 5b59f2880f fix: fix workspacesdk to return error on API mismatch (#13683) 2024-06-27 15:02:43 +04:00
Spike Curtis 3de737fdc8 fix: start packet capture immediately on speedtest (#13128)
I initially made this change when hacking wgengine to also capture wireguard packets going into the magicsock, so that we could capture the initial wireguard handshake. 

I don't think we should ship that additional capture logic, but... it seems generally useful to capture packets from the get go on speedtest, so that you can see disco and pings before the TCP speedtest session starts.
2024-05-02 19:44:32 +04:00
Colin Adler 4d5a7b2d56 chore(codersdk): move all tailscale imports out of codersdk (#12735)
Currently, importing `codersdk` just to interact with the API requires
importing tailscale, which causes builds to fail unless manually using
our fork.
2024-03-26 12:44:31 -05:00