Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example:
```
import (
"context"
"time"
"github.com/prometheus/client_golang/prometheus"
"golang.org/x/xerrors"
"gopkg.in/natefinch/lumberjack.v2"
"cdr.dev/slog/v3"
"github.com/coder/coder/v2/codersdk/agentsdk"
"github.com/coder/serpent"
)
```
3 groups: standard library, 3rd partly libs, Coder libs.
This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.
Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptible call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder).
It also updates dependencies that also use slog and were updated.
I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule.
Other dependencies, I pushed new tags.
fixes: https://github.com/coder/internal/issues/1143
Both gVisor and the Go standard library implementations of `net.Conn` can under certain circumstances return `nil` for `RemoteAddr()` and `LocalAddr()` calls. If we call their methods, we segfault.
This PR fixes these calls and adds ruleguard rules.
Note that `slog.F("remote_addr", conn.RemoteAddr())` is fine because slog detects the `nil` before attempting to stringify the type.
Fixes https://github.com/coder/internal/issues/66
The problem is a race during test tear down where the node callback can fire after the destination tailnet.Conn has already closed, causing an error.
The fix I have employed is to remove the callback in a t.Cleanup() and also refactor some tests to ensure they close the tailnet.Conn in a Cleanup deeper in the stack.
- Adds `testutil.GoleakOptions` and consolidates existing options to
this location
- Pre-emptively adds required ignore for this Dependabot PR to pass CI
https://github.com/coder/coder/pull/16066
Fixes a test flake on TestTailnet_ForcesWebsockets like:
```
t.go:106: 2024-11-18 07:44:25.939 [debu] w2: dial tcp addr_port="[fd7a:115c:a1e0:46cc:bd8e:400d:1bc6:f6ac]:35565"
t.go:106: 2024-11-18 07:44:25.943 [debu] w1.net.netstack: netstack: could not connect to local server at 127.0.0.1:35565 (or [::1]:35565)%!(EXTRA *net.OpError=dial tcp [::1]:35565: connect: connection refused)
conn_test.go:146:
Error Trace: /Users/spike/repos/coder/tailnet/conn_test.go:146
Error: Received unexpected error:
connect tcp [fd7a:115c:a1e0:46cc:bd8e:400d:1bc6:f6ac]:35565: connection was refused
Test: TestTailnet/ForcesWebSockets
t.go:106: 2024-11-18 07:44:25.945 [info] w1: closing tailnet Conn
t.go:106: 2024-11-18 07:44:25.945 [debu] w1: closing configMaps configLoop
t.go:106: 2024-11-18 07:44:25.945 [debu] w1: closing nodeUpdater updateLoop
t.go:106: 2024-11-18 07:44:25.945 [debu] w1: closed netstack
conn_test.go:135:
Error Trace: /Users/spike/repos/coder/tailnet/conn_test.go:135
/Users/spike/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.22.8.darwin-arm64/src/runtime/asm_arm64.s:1222
Error: Received unexpected error:
connection closed:
github.com/coder/coder/v2/tailnet.init
<autogenerated>:1
Test: TestTailnet/ForcesWebSockets
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x0 pc=0x1039771dc]
goroutine 2224 [running]:
github.com/coder/coder/v2/tailnet_test.TestTailnet.func3.2()
/Users/spike/repos/coder/tailnet/conn_test.go:136 +0x7c
created by github.com/coder/coder/v2/tailnet_test.TestTailnet.func3 in goroutine 109
/Users/spike/repos/coder/tailnet/conn_test.go:133 +0x7dc
```
Test didn't synchronize listening on the port before dialing it.
It also has a nil pointer deference when the test fails, which causes a bunch of unrelated output. Also fixed.
Refactors our use of `slogtest` to instantiate a "standard logger" across most of our tests. This standard logger incorporates https://github.com/coder/slog/pull/217 to also ignore database query canceled errors by default, which are a source of low-severity flakes.
Any test that has set non-default `slogtest.Options` is left alone. In particular, `coderdtest` defaults to ignoring all errors. We might consider revisiting that decision now that we have better tools to target the really common flaky Error logs on shutdown.
re: #14715
This PR introduces the Coder service prefix: `fd60:627a:a42b::/48` and refactors our existing code as calling the Tailscale service prefix explicitly (rather than implicitly).
Removes the unused `Addresses` agent option. All clients today assume they can compute the Agent's IP address based on its UUID, so an agent started with a custom address would break things.
Use TSMP ping for reachability, but leave Disco ping for when we call Ping() since we often use that to determine whether we have a direct connection.
Also adds unit tests to make sure Ping() returns direct connection vs DERP correctly.
This one is huge, and I'm sorry.
The problem is that once I change `tailnet.Conn` to start doing v2 behavior, I kind of have to change it everywhere, including in CoderSDK (CLI), the agent, wsproxy, and ServerTailnet.
There is still a bit more cleanup to do, and I need to add code so that when we lose connection to the Coordinator, we mark all peers as LOST, but that will be in a separate PR since this is big enough!
* chore: add /v2 to import module path
go mod requires semantic versioning with versions greater than 1.x
This was a mechanical update by running:
```
go install github.com/marwan-at-work/mod/cmd/mod@latest
mod upgrade
```
Migrate generated files to import /v2
* Fix gen
* feat: automatically use websockets if DERP upgrade is unavailable
This might be our biggest hangup for deployments at the moment...
Load balancers by default do not support the DERP protocol, so many
of our prospects and customers run into failing workspace connections.
This automatically swaps to use WebSockets, and reports the reason to
coderd.
In a future contribution, a warning will appear by the agent if it was
forced to use WebSockets instead of DERP.
* Fix nil pointer type in Tailscale dep
* Fix requested changes
* fix: Improve tailnet connections by reducing timeouts
This awaits connection ping before running a dial. Before,
we were hitting the TCP retransmission and handshake timeouts,
which could intermittently add 1 or 5 seconds to a connection
being initialized.
* Update Tailscale
* fix: Add latency-check for DERP over HTTP(s)
This fixes scenarios where latency wasn't being reported if
a connection had UDP entirely blocked.
* Add inactivity ping
* Improve coordinator error reporting consistency
* fix: Add coder user to docker group on installation
This makes for a simpler setup, and reduces the likelihood
a user runs into a strange issue.
* Add wgnet
* Add ping
* Add listening
* Finish refactor to make this work
* Add interface for swapping
* Fix conncache with interface
* chore: update gvisor
* fix tailscale types
* linting
* more linting
* Add coordinator
* Add coordinator tests
* Fix coordination
* It compiles!
* Move all connection negotiation in-memory
* Migrate coordinator to use net.conn
* Add closed func
* Fix close listener func
* Make reconnecting PTY work
* Fix reconnecting PTY
* Update CI to Go 1.19
* Add CLI flags for DERP mapping
* Fix Tailnet test
* Rename ConnCoordinator to TailnetCoordinator
* Remove print statement from workspace agent test
* Refactor wsconncache to use tailnet
* Remove STUN from unit tests
* Add migrate back to dump
* chore: Upgrade to Go 1.19
This is required as part of #3505.
* Fix reconnecting PTY tests
* fix: update wireguard-go to fix devtunnel
* fix migration numbers
* linting
* Return early for status if endpoints are empty
* Update cli/server.go
Co-authored-by: Colin Adler <colin1adler@gmail.com>
* Update cli/server.go
Co-authored-by: Colin Adler <colin1adler@gmail.com>
* Fix frontend entites
* Fix agent bicopy
* Fix race condition for the last node
* Fix down migration
* Fix connection RBAC
* Fix migration numbers
* Fix forwarding TCP to a local port
* Implement ping for tailnet
* Rename to ForceHTTP
* Add external derpmapping
* Expose DERP region names to the API
* Add global option to enable Tailscale networking for web
* Mark DERP flags hidden while testing
* Update DERP map on reconnect
* Add close func to workspace agents
* Fix race condition in upstream dependency
* Fix feature columns race condition
Co-authored-by: Colin Adler <colin1adler@gmail.com>