Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example:
```
import (
"context"
"time"
"github.com/prometheus/client_golang/prometheus"
"golang.org/x/xerrors"
"gopkg.in/natefinch/lumberjack.v2"
"cdr.dev/slog/v3"
"github.com/coder/coder/v2/codersdk/agentsdk"
"github.com/coder/serpent"
)
```
3 groups: standard library, 3rd partly libs, Coder libs.
This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.
Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptible call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder).
It also updates dependencies that also use slog and were updated.
I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule.
Other dependencies, I pushed new tags.
Fixes#15523
Uses latest https://github.com/coder/tailscale which includes https://github.com/coder/tailscale/pull/85 to stop selecting paths with small MTU for direct connections.
Also updates the tailnet integration test to reproduce the issue. The previous version had the 2 peers connected by a single veth, but this allows the OS to fragment the packet. In the new version, the 2 peers (and server) are all connected by a central router. The link between peer 1 and the router has an adjustable MTU. IPv6 does not allow packets to be fragmented by intermediate routers, so sending a too-large packet in this scenario forces the router to drop packets and reproduce the issue (without the tailscale changes).
Refactors tailnet integration test and adds UDP echo tests with different MTU related to #15523
I still haven't gotten to the bottom of what's causing the issue (the added test case I expected to fail actually succeeds), but these integration test improvements are generally useful.
also:
* consolidates networking setup with easy and hard NAT
* consolidates client setup
* makes Client2 act like an agent at the tailnet layer, so it will send ReadyForHandshake and speed up the tunnel establishment
* adds support for logging tunneled packets
* adds support for dumping outer (underlay) IP traffic
* adds support for adjusting veth MTU
* adds support for IPv6 in the outer (underlay) network topology
Adds support to our coordinator implementations to send Error updates before disconnecting clients.
I was recently debugging a connection issue where the client was getting repeatedly disconnected from the Coordinator, but since we never send any error information it was really hard without server logs.
This PR aims to correct that, by sending a CoordinateResponse with `Error` set in cases where we disconnect a client without them asking us to.
It also logs the error whenever we get one in the client controller.
- Update go.mod to use Go 1.24.1
- Update GitHub Actions setup-go action to use Go 1.24.1
- Fix linting issues with golangci-lint by:
- Updating to golangci-lint v1.57.1 (more compatible with Go 1.24.1)
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
Migrates us to `coder/websocket` v1.8.12 rather than `nhooyr/websocket` on an older version.
Works around https://github.com/coder/websocket/issues/504 by adding an explicit test for `xerrors.Is(err, io.EOF)` where we were previously getting `io.EOF` from the netConn.
Refactors our use of `slogtest` to instantiate a "standard logger" across most of our tests. This standard logger incorporates https://github.com/coder/slog/pull/217 to also ignore database query canceled errors by default, which are a source of low-severity flakes.
Any test that has set non-default `slogtest.Options` is left alone. In particular, `coderdtest` defaults to ignoring all errors. We might consider revisiting that decision now that we have better tools to target the really common flaky Error logs on shutdown.
Closes#14729
Expands the Coordination controller used by the CLI client to allow multiple tunnel destinations (agents). Our current client uses just one, but this unifies the logic so that when we add Coder VPN, 1 is just a special case of "many."
chore of #14729
Refactors the `ServerTailnet` to use `tailnet.Controller` so that we reuse logic around reconnection and handling control messages, instead of reimplementing. This unifies our "client" use of the tailscale API across CLI, coderd, and wsproxy.
Refactors the way clients of the Tailnet API (clients of the API, which include both workspace "agents" and "clients") interact with the API. Introduces the idea of abstract "controllers" for each of the RPCs in the API, and implements a Coordination controller by refactoring from `workspacesdk`.
chore re: #14729
Closes#14716Closes#14717
Adds a new user-scoped tailnet API endpoint (`api/v2/tailnet`) with a new RPC stream for receiving updates on workspaces owned by a specific user, as defined in #14716.
When a stream is started, the `WorkspaceUpdatesProvider` will begin listening on the user-scoped pubsub events implemented in #14964. When a relevant event type is seen (such as a workspace state transition), the provider will query the DB for all the workspaces (and agents) owned by the user. This gets compared against the result of the previous query to produce a set of workspace updates.
Workspace updates can be requested for any user ID, however only workspaces the authorised user is permitted to `ActionRead` will have their updates streamed.
Opening a tunnel to an agent requires that the user can perform `ActionSSH` against the workspace containing it.
re: #14715
This PR introduces the Coder service prefix: `fd60:627a:a42b::/48` and refactors our existing code as calling the Tailscale service prefix explicitly (rather than implicitly).
Removes the unused `Addresses` agent option. All clients today assume they can compute the Agent's IP address based on its UUID, so an agent started with a custom address would break things.
Fixes#12560
When gracefully disconnecting from the coordinator, we would send the Disconnect message and then close the dRPC stream. However, closing the dRPC stream can cause the server not to process the Disconnect message, since we use the stream context in a `select` while sending it to the coordinator.
This is a product bug uncovered by the flake, and probably results in us failing graceful disconnect some minority of the time.
Instead, the `remoteCoordination` (and `inMemoryCoordination` for consistency) should send the Disconnect message and then wait for the coordinator to hang up (on some graceful disconnect timer, in the form of a context).
Drops support for v1 of the tailnet API, which was the original coordination protocol where we only sent node updates, never marked them lost or disconnected.
v2 of the tailnet API went GA for CLI clients in Coder 2.8.0, so clients older than that would stop working.
* fix: avoid deleting peers on graceful close
- Fixes an issue where a coordinator deletes all
its peers on shutdown. This can cause disconnects
whenever a coderd is redeployed.
- `DERPForceWebSockets`: Test that DERP over WebSocket (as well as DERPForceWebSockets works). This does not test the actual DERP failure detection code and automatic fallback.
- `DERPFallbackWebSockets`: Test that falling back to DERP over WebSocket works.
Also:
- Rearranges some test code and refactors `TestTopology.StartServer` to be `TestTopology.ServerOptions` and take a struct instead of a function
Closes#13045
This adds the ability for `TunnelAuth` to also authorize incoming wireguard node IPs, preventing agents from reporting anything other than their static IP generated from the agent ID.