Commit Graph

163 Commits

Author SHA1 Message Date
Spike Curtis e5268e4551 chore: spin clock library out to coder/quartz repo (#13777)
Code that was in `/clock` has been moved to github.com/coder/quartz.  This PR refactors our use of the clock library to point to the external Quartz repo.
2024-07-03 15:02:54 +04:00
Ethan a110d18275 chore: add DRPC tailnet & cli network telemetry (#13687) 2024-07-03 15:23:46 +10:00
Dean Sheather 6c94dd4f23 chore: add DRPC server implementation for network telemetry (#13675) 2024-07-02 01:50:52 +10:00
Colin Adler 3dec6ff32f chore: add protobuf types for tailnet telemetry (#13617) 2024-06-24 12:13:03 -05:00
Ethan 58bf0ec1c6 chore: add additional tailnet topology integration tests (#13549) 2024-06-12 16:02:34 +00:00
Spike Curtis 8326a3a675 chore: change mock clock to allow Advance() within timer/tick functions (#13500) 2024-06-10 15:27:24 +04:00
Spike Curtis 775fc3f5e9 chore: add Now, Since, AfterFunc to clock; use clock for configmaps test (#13476)
* chore: add Now, Since, AfterFunc to clock; use clock for configmaps test

* chore: update flake.nix vendor hash

Signed-off-by: Spike Curtis <spike@coder.com>

---------

Signed-off-by: Spike Curtis <spike@coder.com>
2024-06-05 16:00:02 +04:00
Dean Sheather 9299e9f6ba chore: hard NAT <-> easy NAT integration test (#13314) 2024-05-29 06:04:07 +10:00
Dean Sheather e5bb0a7a00 chore: add easy NAT integration tests part 2 (#13312) 2024-05-24 16:32:30 +10:00
Dean Sheather 273209432d chore: fix tailnet integration test flake (#13313) 2024-05-21 12:57:39 +10:00
Kayla Washburn-Love d8e0be6ee6 feat: add support for multiple banners (#13081) 2024-05-08 15:40:43 -06:00
Colin Adler 421c0d1242 chore: add nginx topology to tailnet tests (#13188) 2024-05-07 18:17:38 +10:00
Dean Sheather 72f2efe048 chore: implement easy NAT direct integration test (#13169) 2024-05-07 06:07:57 +00:00
Dean Sheather 5e8f97d8c3 chore: add DERP websocket integration tests (#13168)
- `DERPForceWebSockets`: Test that DERP over WebSocket (as well as DERPForceWebSockets works). This does not test the actual DERP failure detection code and automatic fallback.
- `DERPFallbackWebSockets`: Test that falling back to DERP over WebSocket works.

Also:
- Rearranges some test code and refactors `TestTopology.StartServer` to be `TestTopology.ServerOptions` and take a struct instead of a function

Closes #13045
2024-05-06 20:37:01 -07:00
Dean Sheather d956af0a3a chore: add EasyNATDERP tailnet integration test (#13138) 2024-05-06 15:36:54 +10:00
Dean Sheather ed0ca76b0b chore: do network integration tests in isolated net ns (#13117) 2024-05-03 05:42:13 +00:00
Spike Curtis 3de737fdc8 fix: start packet capture immediately on speedtest (#13128)
I initially made this change when hacking wgengine to also capture wireguard packets going into the magicsock, so that we could capture the initial wireguard handshake. 

I don't think we should ship that additional capture logic, but... it seems generally useful to capture packets from the get go on speedtest, so that you can see disco and pings before the TCP speedtest session starts.
2024-05-02 19:44:32 +04:00
Colin Adler 15157c1c40 chore: add network integration test suite scaffolding (#13072)
* chore: add network integration test suite scaffolding

* dean comments
2024-04-26 17:48:41 +00:00
Colin Adler 777dfbe965 feat(enterprise): add ready for handshake support to pgcoord (#12935) 2024-04-16 15:01:10 -05:00
Colin Adler e801e878ba feat: add agent acks to in-memory coordinator (#12786)
When an agent receives a node, it responds with an ACK which is relayed
to the client. After the client receives the ACK, it's allowed to begin
pinging.
2024-04-10 17:15:33 -05:00
Marcin Tojek b6359b0a89 fix: ignore gomock temporary files (#12924) 2024-04-10 08:48:56 +00:00
Colin Adler 4d5a7b2d56 chore(codersdk): move all tailscale imports out of codersdk (#12735)
Currently, importing `codersdk` just to interact with the API requires
importing tailscale, which causes builds to fail unless manually using
our fork.
2024-03-26 12:44:31 -05:00
Colin Adler 66154f937e fix(coderd): pass block endpoints into servertailnet (#12149) 2024-03-08 05:29:54 +00:00
Colin Adler e5d911462f fix(tailnet): enforce valid agent and client addresses (#12197)
This adds the ability for `TunnelAuth` to also authorize incoming wireguard node IPs, preventing agents from reporting anything other than their static IP generated from the agent ID.
2024-03-01 09:02:33 -06:00
Spike Curtis 4e7beee102 feat: show tailnet peer diagnostics after coder ping (#12314)
Beginnings of a solution to #12297 

Doesn't cover disco or definitively display whether we successfully connected to DERP, but shows some checklist diagnostics for connecting to an agent.

For this first PR, I just added it to `coder ping` to see how we like it, but could be incorporated into `coder ssh` _et al._ after a timeout.

```
$ coder ping dogfood2
p2p connection established in 147ms
pong from dogfood2 p2p via  95.217.xxx.yyy:42631  in 147ms
pong from dogfood2 p2p via  95.217.xxx.yyy:42631  in 140ms
pong from dogfood2 p2p via  95.217.xxx.yyy:42631  in 140ms
✔ preferred DERP region 999 (Council Bluffs, Iowa)
✔ sent local data to Coder networking coodinator
✔ received remote agent data from Coder networking coordinator
    preferred DERP 10013 (Europe Fly.io (Paris))
    endpoints: 95.217.xxx.yyy:42631, 95.217.xxx.yyy:37576, 172.17.0.1:37576, 172.20.0.10:37576
✔ Wireguard handshake 11s ago
```
2024-02-27 22:04:46 +04:00
Spike Curtis af3fdc68c3 chore: refactor agent routines that use the v2 API (#12223)
In anticipation of needing the `LogSender` to run on a context that doesn't get immediately canceled when you `Close()` the agent, I've undertaken a little refactor to manage the goroutines that get run against the Tailnet and Agent API connection.

This handles controlling two contexts, one that gets canceled right away at the start of graceful shutdown, and another that stays up to allow graceful shutdown to complete.
2024-02-23 11:04:23 +04:00
Dean Sheather 9861830e87 fix: never send local endpoints if disabled (#12138) 2024-02-20 15:51:25 +10:00
Cian Johnston a2cbb0f87f fix(enterprise/coderd): check provisionerd API version on connection (#12191) 2024-02-16 18:43:07 +00:00
Spike Curtis 2d0b9106c0 fix: change servertailnet to register the DERP dialer before setting DERP map (#12137)
I noticed a possible race where tailnet.Conn can try to dial the embedded region before we've set our custom dialer that send the DERP in-memory.  This closes that race and adds a test case for servertailnet with no STUN and an embedded relay
2024-02-15 10:51:12 +04:00
Spike Curtis e5ba586e30 fix: fix graceful disconnect in DialWorkspaceAgent (#11993)
I noticed in testing that the CLI wasn't correctly sending the disconnect message when it shuts down, and thus agents are seeing this as a "lost" peer, rather than a "disconnected" one. 

What was happening is that we just used a single context for everything from the netconn to the RPCs, and when the context was canceled we failed to send the disconnect message due to canceled context.

So, this PR splits things into two contexts, with a graceful one set to last up to 1 second longer than the main one.
2024-02-05 14:01:37 +04:00
Spike Curtis bb99cb7d2b chore: move FakeCoordinator to tailnettest (#11992)
Moves FakeCoordinator to tailnettest since it's reused in testing multiple packages in this stack of PRs.
2024-02-05 13:49:32 +04:00
Spike Curtis 646ac942b2 chore: rename FakeCoordinator for export (#11991)
Part of a stack that fixes graceful disconnect from the CLI to tailnet.  I reuse FakeCoordinator in a test for graceful disconnects.
2024-02-05 13:33:31 +04:00
Spike Curtis 520b12e1a2 fix: close MultiAgentConn when coordinator closes (#11941)
Fixes an issue where a MultiAgentConn isn't closed properly when the coordinator it is connected to is closed.

Since servertailnet checks whether the conn is closed before reinitializing, it is important that we check this, otherwise servertailnet can get stuck if the coordinator closes (e.g. when we switch from AGPL to PGCoordinator after decoding a license).
2024-01-31 00:38:19 +04:00
Spike Curtis d3983e4dba feat: add logging to client tailnet yamux (#11908)
Adds logging to yamux when used for tailnet client connections, e.g. CLI and wsproxy.  This could be useful for debugging connection issues with tailnet v2 API.
2024-01-30 09:58:59 +04:00
Spike Curtis 1e8a9c09fe chore: remove legacy wsconncache (#11816)
Fixes #8218

Removes `wsconncache` and related "is legacy?" functions and API calls that were used by it.

The only leftover is that Agents still use the legacy IP, so that back level clients or workspace proxies can dial them correctly.

We should eventually remove this: #11819
2024-01-30 07:56:36 +04:00
Colin Adler bc14e926d8 feat: add option to speedtest to dump a pcap of network traffic (#11848) 2024-01-29 09:57:31 -06:00
Spike Curtis 5cbb76b47a fix: stop spamming DERP map updates for equivalent maps (#11792)
Fixes 2 related issues:

1. wsconncache had incorrect logic to test whether to send DERPMap updates, sending if the maps were equivalent, instead of if they were _not equivalent_.
2. configmaps used a bugged check to test equality between DERPMaps, since it contains a map and the map entries are serialized in random order. Instead, we avoid comparing the protobufs and instead depend on the existing function that compares `tailcfg.DERPMap`. This also has the effect of reducing the number of times we convert to and from protobuf.
2024-01-24 16:27:15 +04:00
Spike Curtis 059e533544 feat: agent uses Tailnet v2 API for DERPMap updates (#11698)
Switches the Agent to use Tailnet v2 API to get DERPMap updates.

Subsequent PRs will do the same for the CLI (`codersdk`) and `wsproxy`.
2024-01-23 14:42:07 +04:00
Spike Curtis 3e0e7f8739 feat: check agent API version on connection (#11696)
fixes #10531

Adds a check for `version` on connection to the Agent API websocket endpoint.  This is primarily for future-proofing, so that up-level agents get a sensible error if they connect to a back-level Coderd.

It also refactors the location of the `CurrentVersion` variables, to be part of the `proto` packages, since the versions refer to the APIs defined therein.
2024-01-23 14:27:49 +04:00
Spike Curtis eb12fd7d92 feat: make ServerTailnet set peers lost when it reconnects to the coordinator (#11682)
Adds support to `ServerTailnet` to set all peers lost before attempting to reconnect to the coordinator. In practice, this only really affects `wsproxy` since coderd has a local connection to the coordinator that only goes down if we're shutting down or change licenses.
2024-01-23 13:17:56 +04:00
Spike Curtis 5388a1b6d7 fix: use TSMP ping for reachability, not latency (#11749)
Use TSMP ping for reachability, but leave Disco ping for when we call Ping() since we often use that to determine whether we have a direct connection.

Also adds unit tests to make sure Ping() returns direct connection vs DERP correctly.
2024-01-22 17:37:15 +04:00
Spike Curtis 7ffd99cfe2 fix: use DiscoPing (partially reverts #11306) (#11744) 2024-01-22 12:40:21 +00:00
Spike Curtis 3d85cdfa11 feat: set peers lost when disconnected from coordinator (#11681)
Adds support to Coordination to call SetAllPeersLost() when it is closed. This ensure that when we disconnect from a Coordinator, we set all peers lost.

This covers CoderSDK (CLI client) and Agent.  Next PR will cover MultiAgent (notably, `wsproxy`).
2024-01-22 15:26:20 +04:00
Spike Curtis b7b936547d feat: add setAllPeersLost to the configMaps subcomponent (#11665)
adds setAllPeersLost to the configMaps subcomponent of tailnet.Conn --- we'll call this when we disconnect from a coordinator so we'll eventually clean up peers if they disconnect while we are retrying the coordinator connection (or we don't succeed in reconnecting to the coordinator).
2024-01-22 12:12:15 +04:00
Spike Curtis f01cab9894 feat: use tailnet v2 API for coordination (#11638)
This one is huge, and I'm sorry.

The problem is that once I change `tailnet.Conn` to start doing v2 behavior, I kind of have to change it everywhere, including in CoderSDK (CLI), the agent, wsproxy, and ServerTailnet.

There is still a bit more cleanup to do, and I need to add code so that when we lose connection to the Coordinator, we mark all peers as LOST, but that will be in a separate PR since this is big enough!
2024-01-22 11:07:50 +04:00
Spike Curtis 8910ac715c feat: add tailnet v2 support to wsproxy coordinate endpoint (#11637)
wsproxy also needs to be updated to use tailnet v2 because the `tailnet.Conn` stores peers by ID, and the peerID was not being carried by the JSON protocol.  This adds a query param to the endpoint to conditionally switch to the new protocol.
2024-01-18 10:10:36 +04:00
Spike Curtis 07427e06f7 chore: add setBlockEndpoints to nodeUpdater (#11636)
nodeUpdater also needs block endpoints, so that it can stop sending nodes with endpoints.
2024-01-18 10:02:15 +04:00
Spike Curtis 5b4de667d6 chore: add setCallback to nodeUpdater (#11635)
we need to be able to (re-)set the node callback when we lose and regain connection to a coordinator over the network.
2024-01-18 09:51:09 +04:00
Spike Curtis e725f9d7d4 chore: stop passing addresses on configMaps constructor (#11634)
moving this out of the constructor so that setting this when creating a new `tailnet.Conn` triggers configuring the engine.
2024-01-18 09:43:28 +04:00
Spike Curtis a514df71ed chore: add setDERPMap to configMaps (#11590)
Add setDERPMap
2024-01-18 09:34:30 +04:00