Commit Graph

78 Commits

Author SHA1 Message Date
Spike Curtis 2c7f8ac65f chore: migrate to coder/websocket 1.8.12 (#15898)
Migrates us to `coder/websocket` v1.8.12 rather than `nhooyr/websocket` on an older version.

Works around https://github.com/coder/websocket/issues/504 by adding an explicit test for `xerrors.Is(err, io.EOF)` where we were previously getting `io.EOF` from the netConn.
2024-12-19 00:51:30 +04:00
Spike Curtis 747f7ce173 feat: add support for WorkspaceUpdates to WebsocketDialer (#15534)
closes #14730

Adds support for WorkspaceUpdates to the WebsocketDialer. This allows us to dial the new endpoint added in #14847 and connect it up to a `tailnet.Controllers` to connect to all agents over the tailnet.

I refactored the fakeWorkspaceUpdatesProvider to a mock and moved it to `tailnettest` so it could be more easily reused.  The Mock is a little more full-featured.
2024-11-18 10:54:11 +04:00
Spike Curtis 40802958e9 fix: use explicit api versions for agent and tailnet (#15508)
Bumps the Tailnet and Agent API version 2.3, and creates some extra controls and machinery around these versions.

What happened is that we accidentally shipped two new API features without bumping the version.  `ScriptCompleted` on the Agent API in Coder v2.16 and `RefreshResumeToken` on the Tailnet API in Coder v2.15.

Since we can't easily retroactively bump the versions, we'll roll these changes into API version 2.3 along with the new WorkspaceUpdates RPC, which hasn't been released yet.  That means there is some ambiguity in Coder v2.15-v2.17 about exactly what methods are supported on the Tailnet and Agent APIs.  This isn't great, but hasn't caused us major issues because 

1. RefreshResumeToken is considered optional, and clients just log and move on if the RPC isn't supported. 
2. Agents basically never get started talking to a Coderd that is older than they are, since the agent binary is normally downloaded from Coderd at workspace start.

Still it's good to get things squared away in terms of versions for SDK users and possible edge cases around client and server versions.

To mitigate against this thing happening again, this PR also:

1. adds a CODEOWNERS for the API proto packages, so I'll review changes
2. defines interface types for different API versions, and has the agent explicitly use a specific version.  That way, if you add a new method, and try to use it in the agent without thinking explicitly about versions, it won't compile.

With the protocol controllers stuff, we've sort of already abstracted the Tailnet API such that the interface type strategy won't work, but I'll work on getting the Controller to be version aware, such that it can check the API version it's getting against the controllers it has -- in a later PR.
2024-11-15 11:16:28 +04:00
Spike Curtis 8c00ebc6ee chore: refactor ServerTailnet to use tailnet.Controllers (#15408)
chore of #14729

Refactors the `ServerTailnet` to use `tailnet.Controller` so that we reuse logic around reconnection and handling control messages, instead of reimplementing.  This unifies our "client" use of the tailscale API across CLI, coderd, and wsproxy.
2024-11-08 13:18:56 +04:00
Jon Ayers cd890aa3a0 feat: enable key rotation (#15066)
This PR contains the remaining logic necessary to hook up key rotation
to the product.
2024-10-25 17:14:35 +01:00
Spike Curtis 32d5875fa4 fix: wait for server tailnet background routines to exit on Close (#15183)
fixes https://github.com/coder/internal/issues/114

We need to wait for ServerTailnet goroutines to finish when closing down, otherwise we can race with the shutdown of coderd & the coordinator, which causes errors.
2024-10-23 10:09:56 +04:00
Ethan 46cce333b1 fix: check unstaged files during ci lint (#15120) 2024-10-17 05:37:43 +00:00
Jon Ayers f537193682 chore: refactor keycache implementation to reduce duplication (#15100) 2024-10-16 20:01:45 +01:00
Jon Ayers 384873a114 feat: add wsproxy implementation for key fetching (#14917) 2024-10-14 20:04:10 +01:00
Jon Ayers 21b92ef893 feat: add cache abstraction for fetching signing keys (#14777)
- Adds the database implementation for fetching and caching keys
used for JWT signing. It's been merged into the `keyrotate` pkg and
renamed to `cryptokeys` since they're coupled concepts.
2024-10-01 11:04:51 -05:00
Steven Masley 2c8b264d78 chore: remove multi-organization and custom role experiment (#14862)
Closes https://github.com/coder/coder/issues/14704

---------

Co-authored-by: Kayla Washburn-Love <mckayla@hey.com>
2024-09-27 14:06:16 -05:00
Jon Ayers 3fdeaf7b24 feat: add endpoint for fetching workspace proxy keys (#14789) 2024-09-26 21:01:49 +01:00
Dean Sheather cf8be4eac5 feat: add resume support to coordinator connections (#14234) 2024-08-20 17:16:49 +10:00
Jon Ayers 203f48af56 fix: extend locking in wsproxy to avoid race (and fix flake) (#14167) 2024-08-05 14:30:44 -04:00
Kayla Washburn-Love bf4b7abf14 chore(coderd): allow creating workspaces without specifying an organization (#14048) 2024-07-30 10:44:02 -06:00
Steven Masley 7ea1a4c686 chore: protect organization endpoints with license (#14001)
* chore: move multi-org endpoints into enterprise directory

All multi-organization features are gated behind "premium" licenses. Enterprise licenses can no longer
access organization CRUD.
2024-07-25 16:07:53 -05:00
Dean Sheather 6c94dd4f23 chore: add DRPC server implementation for network telemetry (#13675) 2024-07-02 01:50:52 +10:00
Kyle Carberry 0793a4b35b feat: add cross-origin reporting for telemetry in the dashboard (#13612)
* feat: add cross-origin reporting for telemetry in the dashboard

* Respect the telemetry flag

* Fix embedded metadata

* Fix compilation error

* Fix linting
2024-06-20 15:19:45 -04:00
Spike Curtis 88eb6ce378 fix: fix flake in TestDERPEndToEnd (#13564) 2024-06-13 11:38:51 +04:00
Garrett Delfosse 5789ea5397 chore: move stat reporting into workspacestats package (#13386) 2024-05-29 11:49:08 -04:00
Colin Adler 4d5a7b2d56 chore(codersdk): move all tailscale imports out of codersdk (#12735)
Currently, importing `codersdk` just to interact with the API requires
importing tailscale, which causes builds to fail unless manually using
our fork.
2024-03-26 12:44:31 -05:00
Dean Sheather 2b773f9034 fix: allow proxy version mismatch (with warning) (#12433) 2024-03-20 18:24:18 +00:00
Dean Sheather cf50461ab4 fix: prevent single replica proxies from staying unhealthy (#12641)
In the peer healthcheck code, when an error pinging peers is detected we
write a "replicaErr" string with the error reason. However, if there are
no peer replicas to ping we returned early without setting the string to
empty. This would cause replicas that had peers (which were failing) and
then the peers left to permanently show an error until a new peer
appeared.

Also demotes DERP replica checking to a "warning" rather than an "error"
which should prevent the primary from removing the proxy from the region
map if DERP meshing is non-functional. This can happen without causing
problems if the peer is shutting down so we don't want to disrupt
everything if there isn't an issue.
2024-03-18 23:45:25 +10:00
Ammar Bandukwala 496232446d chore(cli): replace clibase with external coder/serpent (#12252) 2024-03-15 11:24:38 -05:00
Dean Sheather d2a5b31b2b feat: add derp mesh health checking in workspace proxies (#12222) 2024-03-08 16:31:40 +10:00
Colin Adler 6b0b87eb27 fix: add --block-direct-connections to wsproxies (#12182) 2024-03-07 23:45:59 -06:00
Colin Adler 66154f937e fix(coderd): pass block endpoints into servertailnet (#12149) 2024-03-08 05:29:54 +00:00
Dean Sheather 0016b0200b chore: add test for workspace proxy derp meshing (#12220)
- Reworks the proxy registration loop into a struct (so I can add a `RegisterNow` method)
- Changes the proxy registration loop interval to 15s (previously 30s)
- Adds test which tests bidirectional DERP meshing on all possible paths between 6 workspace proxy replicas

Related to https://github.com/coder/customers/issues/438
2024-03-04 23:40:15 -08:00
Colin Adler e5d911462f fix(tailnet): enforce valid agent and client addresses (#12197)
This adds the ability for `TunnelAuth` to also authorize incoming wireguard node IPs, preventing agents from reporting anything other than their static IP generated from the agent ID.
2024-03-01 09:02:33 -06:00
Dean Sheather ee7828a166 chore: fix wsproxy test flake (#12280)
* chore: fix wsproxy test flake

* fixup! chore: fix wsproxy test flake
2024-02-23 21:19:54 +10:00
Spike Curtis d3983e4dba feat: add logging to client tailnet yamux (#11908)
Adds logging to yamux when used for tailnet client connections, e.g. CLI and wsproxy.  This could be useful for debugging connection issues with tailnet v2 API.
2024-01-30 09:58:59 +04:00
Spike Curtis 1e8a9c09fe chore: remove legacy wsconncache (#11816)
Fixes #8218

Removes `wsconncache` and related "is legacy?" functions and API calls that were used by it.

The only leftover is that Agents still use the legacy IP, so that back level clients or workspace proxies can dial them correctly.

We should eventually remove this: #11819
2024-01-30 07:56:36 +04:00
Spike Curtis 3e0e7f8739 feat: check agent API version on connection (#11696)
fixes #10531

Adds a check for `version` on connection to the Agent API websocket endpoint.  This is primarily for future-proofing, so that up-level agents get a sensible error if they connect to a back-level Coderd.

It also refactors the location of the `CurrentVersion` variables, to be part of the `proto` packages, since the versions refer to the APIs defined therein.
2024-01-23 14:27:49 +04:00
Spike Curtis f01cab9894 feat: use tailnet v2 API for coordination (#11638)
This one is huge, and I'm sorry.

The problem is that once I change `tailnet.Conn` to start doing v2 behavior, I kind of have to change it everywhere, including in CoderSDK (CLI), the agent, wsproxy, and ServerTailnet.

There is still a bit more cleanup to do, and I need to add code so that when we lose connection to the Coordinator, we mark all peers as LOST, but that will be in a separate PR since this is big enough!
2024-01-22 11:07:50 +04:00
Kayla Washburn-Love 80eac73ed1 chore: remove useLocalStorage hook (#11712) 2024-01-19 16:04:19 -07:00
Steven Masley 6bb1a34a37 fix: allow ports in wildcard url configuration (#11657)
* fix: allow ports in wildcard url configuration

This just forwards the port to the ui that generates urls.
Our existing parsing + regex already supported ports for
subdomain app requests.
2024-01-18 09:44:05 -06:00
Steven Masley b246f08d84 chore: move app URL parsing to its own package (#11651)
* chore: move app url parsing to it's own package
2024-01-17 10:41:42 -06:00
Cian Johnston d583acad00 fix(coderd): workspaceapps: update last_used_at when workspace app reports stats (#11603)
- Adds a new query BatchUpdateLastUsedAt
- Adds calls to BatchUpdateLastUsedAt in app stats handler upon flush
- Passes a stats flush channel to apptest setup scaffolding and updates unit tests to assert modifications to LastUsedAt.
2024-01-16 14:06:39 +00:00
Steven Masley f5a9f5ca3d chore: handle errors in wsproxy server for cli using buildinfo (#11584)
Cli errors are pretty formatted. This handles nested pretty types. Before it found the first error it could understand and return that. Now it will print the full error stack with more information.

To prevent information loss, a "[Trace=...]" was added to capture some extra error context for debugging.
2024-01-11 16:55:34 -06:00
Colin Adler 4a0808259a fix: ensure wsproxy MultiAgent is closed when websocket dies (#11414)
The `SingleTailnet` behavior only checked to see if the `MultiAgent` was
closed, but the websocket error was not being propogated into the
`MultiAgent`, causing it to never be swapped for a new working one.

Fixes https://github.com/coder/coder/issues/11401

Before:
```
Coder Workspace Proxy v0.0.0-devel+85ff030 - Your Self-Hosted Remote Development Platform
Started HTTP listener at http://0.0.0.0:3001

View the Web UI: http://127.0.0.1:3001

==> Logs will stream in below (press ctrl+c to gracefully exit):
2024-01-04 20:11:56.376 [warn]  net.workspace-proxy.servertailnet: broadcast server node to agents ...
    error= write message:
               github.com/coder/coder/v2/enterprise/wsproxy/wsproxysdk.(*remoteMultiAgentHandler).writeJSON
                   /home/coder/coder/enterprise/wsproxy/wsproxysdk/wsproxysdk.go:524
             - failed to write msg: WebSocket closed: failed to read frame header: EOF
```

After:
```
Coder Workspace Proxy v0.0.0-devel+12f1878 - Your Self-Hosted Remote Development Platform
Started HTTP listener at http://0.0.0.0:3001

View the Web UI: http://127.0.0.1:3001

==> Logs will stream in below (press ctrl+c to gracefully exit):
2024-01-04 20:26:38.545 [warn]  net.workspace-proxy.servertailnet: multiagent closed, reinitializing
2024-01-04 20:26:38.546 [erro]  net.workspace-proxy.servertailnet: reinit multi agent ...
    error= dial coordinate websocket:
               github.com/coder/coder/v2/enterprise/wsproxy/wsproxysdk.(*Client).DialCoordinator
                   /home/coder/coder/enterprise/wsproxy/wsproxysdk/wsproxysdk.go:454
             - failed to WebSocket dial: failed to send handshake request: Get "http://127.0.0.1:3000/api/v2/workspaceproxies/me/coordinate": dial tcp 127.0.0.1:3000: connect: connection refused
2024-01-04 20:26:38.587 [erro]  net.workspace-proxy.servertailnet: reinit multi agent ...
    error= dial coordinate websocket:
               github.com/coder/coder/v2/enterprise/wsproxy/wsproxysdk.(*Client).DialCoordinator
                   /home/coder/coder/enterprise/wsproxy/wsproxysdk/wsproxysdk.go:454
             - failed to WebSocket dial: failed to send handshake request: Get "http://127.0.0.1:3000/api/v2/workspaceproxies/me/coordinate": dial tcp 127.0.0.1:3000: connect: connection refusedhandshake request: Get "http://127.0.0.1:3000/api/v2/workspaceproxies/me/coordinate": dial tcp 127.0.0.1:3000: connect: connection refused
2024-01-04 20:26:40.446 [info]  net.workspace-proxy.servertailnet: successfully reinitialized multiagent  agents=0  took=1.900892615s
```
2024-01-11 11:37:09 -06:00
Steven Masley fb29af664b fix: relax csrf to exclude path based apps (#11430)
* fix: relax csrf to exclude path based apps
* add unit test to verify path based apps are not CSRF blocked
2024-01-08 22:33:57 +00:00
Steven Masley dd05a6b13a chore: mockgen archived, moved to new location (#11415)
* chore: mockgen archived, moved to new location
2024-01-04 18:35:56 -06:00
Spike Curtis 48cd4c3a10 feat: promote single-tailnet out of experimental (#11366) 2024-01-04 09:27:36 +04:00
Cian Johnston 1ef96022b0 feat(coderd): add provisioner build version and api_version on serve (#11369)
* assert provisioner daemon version and api_version in unit tests
* add build info in HTTP header, extract codersdk.BuildVersionHeader
* add api_version to codersdk.ProvisionerDaemon
* testutil.MustString -> testutil.MustRandString
2024-01-03 09:01:57 +00:00
Steven Masley fbda21a9f2 feat: move moons experiment to ga (released) (#11285)
* feat: release moons experiment as ga
2023-12-19 14:40:22 -06:00
Spike Curtis b4ca1d6579 feat: include server agent API version in buildinfo (#11057)
First part of #10340 -- we need this version to compare with agents to tell if they are on a deprecated Agent API version
2023-12-08 12:50:25 +04:00
Cian Johnston 1e349f0d50 feat(cli): allow specifying name of provisioner daemon (#11077)
- Adds a --name argument to provisionerd start
- Plumbs through name to integrated and external provisioners
- Defaults to hostname if not specified for external, hostname-N for integrated
- Adds cliutil.Hostname
2023-12-07 16:59:13 +00:00
Kayla Washburn c194119689 chore: rename AwaitTemplateVersionJobCompleted and AwaitWorkspaceBuildJobCompleted (#10003) 2023-10-03 11:02:56 -06:00
Cian Johnston 93ef696b57 refactor(agent): add agenttest.New helper function (#9812)
* Adds agenttest.New() helper function
* Makes sure agent gets closed on test cleanup
* Makes sure you don't forget to set session token
* Sets the agent and client logger automatically
2023-09-26 12:05:19 +01:00
Colin Adler c900b5f8df feat: add single tailnet support to pgcoord (#9351) 2023-09-21 14:30:48 -05:00