Commit Graph

222 Commits

Author SHA1 Message Date
ケイラ f670bc31f5 chore: update testutil chan helpers (#17408) 2025-04-16 10:37:09 -06:00
Michael Suchacz 06d39151dc feat: extend request logs with auth & DB info (#17304)
Closes #16903
2025-04-15 13:27:23 +02:00
Danny Kopping 0b18e458f4 fix: reduce excessive logging when database is unreachable (#17363)
Fixes #17045

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
2025-04-15 10:55:30 +02:00
Spike Curtis 9e2af3e127 feat: add configurable DNS match domain for tailnet connections (#17336)
Use the hostname suffix to set the DNS match domain when creating a Tailnet as part of the vpn `Tunnel`.

part of: #16828
2025-04-11 15:00:48 +04:00
Spike Curtis 2c573dc023 feat: vpn uses WorkspaceHostnameSuffix for DNS names (#17335)
Use the hostname suffix to set DNS names as programmed into the DNS service and returned by the vpn `Tunnel`.

part of: #16828
2025-04-11 13:24:20 +04:00
Ethan 3c1cb5d05a chore: add generic DNS record for checking if Coder Connect is running (#17298)
Closes https://github.com/coder/internal/issues/466

```
$ dig -6 @fd60:627a:a42b::53 is.coder--connect--enabled--right--now.coder AAAA

; <<>> DiG 9.10.6 <<>> -6 @fd60:627a:a42b::53 is.coder--connect--enabled--right--now.coder AAAA
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62390
;; flags: qr aa rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;is.coder--connect--enabled--right--now.coder. IN AAAA

;; ANSWER SECTION:
is.coder--connect--enabled--right--now.coder. 2	IN AAAA	fd60:627a:a42b::53

;; Query time: 3 msec
;; SERVER: fd60:627a:a42b::53#53(fd60:627a:a42b::53)
;; WHEN: Wed Apr 09 16:59:18 AEST 2025
;; MSG SIZE  rcvd: 134
```

Hostname considerations:
- Workspace names, usernames, and agent names can't have double hyphens, so this name can't conflict with a real Coder Connect hostname.
- Components can't start or end with hyphens according to [RFC 952](https://www.rfc-editor.org/rfc/rfc952.html)
- DNS records can't have hyphens in the 3rd and 4th positions, as to not conflict with IDNs https://datatracker.ietf.org/doc/html/rfc5891
2025-04-11 13:59:25 +10:00
Jon Ayers 17ddee05e5 chore: update golang to 1.24.1 (#17035)
- Update go.mod to use Go 1.24.1
- Update GitHub Actions setup-go action to use Go 1.24.1
- Fix linting issues with golangci-lint by:
  - Updating to golangci-lint v1.57.1 (more compatible with Go 1.24.1)

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <claude@anthropic.com>
2025-03-26 01:56:39 -05:00
Mathias Fredriksson b79167293c chore(Makefile): update golden files as part of make gen (#17039)
Updating golden files is an unnecessary extra step in addition to gen
that is easily overlooked, leading to the developer noticing the issue
in CI leading to lost developer time waiting for tests to complete.
2025-03-21 13:04:30 +00:00
Eng Zer Jun 04c33968cf refactor: replace golang.org/x/exp/slices with slices (#16772)
The experimental functions in `golang.org/x/exp/slices` are now
available in the standard library since Go 1.21.

Reference: https://go.dev/doc/go1.21#slices

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2025-03-04 00:46:49 +11:00
Thomas Kosiewski d0e2060692 feat(agent): add second SSH listener on port 22 (#16627)
Fixes: https://github.com/coder/internal/issues/377

Added an additional SSH listener on port 22, so the agent now listens on both, port one and port 22.

---
Change-Id: Ifd986b260f8ac317e37d65111cd4e0bd1dc38af8
Signed-off-by: Thomas Kosiewski <tk@coder.com>
2025-03-03 04:47:42 +01:00
Hugo Dutka 44499315ed chore: reduce log volume on server startup (#16608)
Addresses https://github.com/coder/coder/issues/16231.

This PR reduces the volume of logs we print after server startup in
order to surface the web UI URL better.

Here are the logs after the changes a couple of seconds after starting
the server:

<img width="868" alt="Screenshot 2025-02-18 at 16 31 32"
src="https://github.com/user-attachments/assets/786dc4b8-7383-48c8-a5c3-a997c01ca915"
/>

The warning is due to running a development site-less build. It wouldn't
show in a release build.
2025-02-20 16:33:14 +01:00
Mathias Fredriksson b07b33ec9d feat: add agentapi endpoint to report connections for audit (#16507)
This change adds a new `ReportConnection` endpoint to the `agentapi`.

The protocol version was bumped previously, so it has been omitted here.

This allows the agent to report connection events, for example when the
user connects to the workspace via SSH or VS Code.

Updates #15139
2025-02-20 14:52:01 +02:00
Ethan 92870f0642 fix: force lowercase DNS hostnames for VPN (#16613)
Closes https://github.com/coder/coder-desktop-macos/issues/54


I've also double checked that agents with hyphens & underscores play nice once programmed, as do workspaces with hyphens:

```
$ ping6 main_agent-1.main-workspace.admin.coder
PING6(56=40+8+8 bytes) fd60:627a:a42b:4e91:88c0:da4a:df4f:b54e --> fd60:627a:a42b:46d4:8b55:e549:e498:e6f5
```
also fine in Firefox & Safari, though I'm a little surprised underscores work.
2025-02-20 13:02:45 +11:00
Vincent Vielle bc609d0056 feat: integrate agentAPI with resources monitoring logic (#16438)
As part of the new resources monitoring logic - more specifically for
OOM & OOD Notifications , we need to update the AgentAPI , and the
agents logic.

This PR aims to do it, and more specifically :  
We are updating the AgentAPI & TailnetAPI to version 24 to add two new
methods in the AgentAPI :
- One method to fetch the resources monitoring configuration
- One method to push the datapoints for the resources monitoring.

Also, this PR adds a new logic on the agent side, with a routine running
and ticking - fetching the resources usage each time , but also storing
it in a FIFO like queue.

Finally, this PR fixes a problem we had with RBAC logic on the resources
monitoring model, applying the same logic than we have for similar
entities.
2025-02-14 10:28:15 +01:00
Mathias Fredriksson c069563af1 test: fix use of t.Logf where t.Log would suffice (#16328) 2025-01-29 14:35:04 +00:00
Dean Sheather 28088165a1 chore: get TUN/DNS working on Windows for CoderVPN (#16310) 2025-01-29 08:09:36 +00:00
Thomas Kosiewski 1336925c9f feat(flake.nix): switch dogfood dev image to buildNixShellImage from dockerTools (#16223)
Replace Depot build action with Nix for Nix dogfood image builds

The dogfood Nix image is now built using Nix's native container tooling instead of Depot. This change:

- Adds Nix setup steps to the GitHub Actions workflow
- Removes the Dockerfile.nix in favor of a Nix-native container build
- Updates the flake.nix to support building Docker images
- Introduces a hash file to track Nix-related changes
- Updates the vendorHash for Go dependencies

Change-Id: I4e011fe3a19d9a1375fbfd5223c910e59d66a5d9
Signed-off-by: Thomas Kosiewski <tk@coder.com>
2025-01-28 16:38:37 +01:00
Cian Johnston 7b88776403 chore(testutil): add testutil.GoleakOptions (#16070)
- Adds `testutil.GoleakOptions` and consolidates existing options to
this location
- Pre-emptively adds required ignore for this Dependabot PR to pass CI
https://github.com/coder/coder/pull/16066
2025-01-08 15:38:37 +00:00
Spike Curtis 2c7f8ac65f chore: migrate to coder/websocket 1.8.12 (#15898)
Migrates us to `coder/websocket` v1.8.12 rather than `nhooyr/websocket` on an older version.

Works around https://github.com/coder/websocket/issues/504 by adding an explicit test for `xerrors.Is(err, io.EOF)` where we were previously getting `io.EOF` from the netConn.
2024-12-19 00:51:30 +04:00
Ethan ba48069325 chore: implement CoderVPN client & tunnel (#15612)
Addresses #14734.

This PR wires up `tunnel.go` to a `tailnet.Conn` via the new `/tailnet` endpoint, with all the necessary controllers such that a VPN connection can be started, stopped and inspected via the CoderVPN protocol.
2024-12-05 13:30:22 +11:00
Spike Curtis 029cd5d064 fix(tailnet): prevent redial after Coord graceful restart (#15586)
fixes: https://github.com/coder/internal/issues/217

> There are a couple problems:
>
> One is that we assert the RPCs succeed, but if the pipeDialer context is canceled at the end of the test, then these assertions happen after the test is officially complete, which panics and affects other tests.

This converts these to just return the error rather than assert.

> The other is that the retrier is slightly bugged: if the current retry delay is 0 AND the ctx is done, (e.g. after successfully connecting and then gracefully disconnecting), then retrier.Wait(c.ctx) is racy and could return either true or false.

Fixes the phantom redial by explicitly checking the context before dialing. Also, in the test, we assert that the controller is closed before completing the test.
2024-11-19 11:37:11 +04:00
Spike Curtis 85c3c4c025 feat(tailnet): add alias with username and short alias to DNS (#15585)
Adds DNS aliases of the form `<agent>.<workspace>.<username>.coder.` and
`<workspace>.coder.`
2024-11-19 11:23:17 +04:00
Spike Curtis 1aaaad998c fix: fix listening flake on TestTailnet_ForcesWebSockets (#15555)
Fixes a test flake on TestTailnet_ForcesWebsockets like:

```
    t.go:106: 2024-11-18 07:44:25.939 [debu]  w2: dial tcp  addr_port="[fd7a:115c:a1e0:46cc:bd8e:400d:1bc6:f6ac]:35565"
    t.go:106: 2024-11-18 07:44:25.943 [debu]  w1.net.netstack: netstack: could not connect to local server at 127.0.0.1:35565 (or [::1]:35565)%!(EXTRA *net.OpError=dial tcp [::1]:35565: connect: connection refused)
    conn_test.go:146:
        	Error Trace:	/Users/spike/repos/coder/tailnet/conn_test.go:146
        	Error:      	Received unexpected error:
        	            	connect tcp [fd7a:115c:a1e0:46cc:bd8e:400d:1bc6:f6ac]:35565: connection was refused
        	Test:       	TestTailnet/ForcesWebSockets
    t.go:106: 2024-11-18 07:44:25.945 [info]  w1: closing tailnet Conn
    t.go:106: 2024-11-18 07:44:25.945 [debu]  w1: closing configMaps configLoop
    t.go:106: 2024-11-18 07:44:25.945 [debu]  w1: closing nodeUpdater updateLoop
    t.go:106: 2024-11-18 07:44:25.945 [debu]  w1: closed netstack
    conn_test.go:135:
        	Error Trace:	/Users/spike/repos/coder/tailnet/conn_test.go:135
        	            				/Users/spike/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.22.8.darwin-arm64/src/runtime/asm_arm64.s:1222
        	Error:      	Received unexpected error:
        	            	connection closed:
        	            	    github.com/coder/coder/v2/tailnet.init
        	            	        <autogenerated>:1
        	Test:       	TestTailnet/ForcesWebSockets
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x0 pc=0x1039771dc]

goroutine 2224 [running]:
github.com/coder/coder/v2/tailnet_test.TestTailnet.func3.2()
	/Users/spike/repos/coder/tailnet/conn_test.go:136 +0x7c
created by github.com/coder/coder/v2/tailnet_test.TestTailnet.func3 in goroutine 109
	/Users/spike/repos/coder/tailnet/conn_test.go:133 +0x7dc
```

Test didn't synchronize listening on the port before dialing it. 

It also has a nil pointer deference when the test fails, which causes a bunch of unrelated output. Also fixed.
2024-11-18 16:05:16 +04:00
Spike Curtis 5861e516b9 chore: add standard test logger ignoring db canceled (#15556)
Refactors our use of `slogtest` to instantiate a "standard logger" across most of our tests.  This standard logger incorporates https://github.com/coder/slog/pull/217 to also ignore database query canceled errors by default, which are a source of low-severity flakes.

Any test that has set non-default `slogtest.Options` is left alone. In particular, `coderdtest` defaults to ignoring all errors. We might consider revisiting that decision now that we have better tools to target the really common flaky Error logs on shutdown.
2024-11-18 14:09:22 +04:00
Spike Curtis 747f7ce173 feat: add support for WorkspaceUpdates to WebsocketDialer (#15534)
closes #14730

Adds support for WorkspaceUpdates to the WebsocketDialer. This allows us to dial the new endpoint added in #14847 and connect it up to a `tailnet.Controllers` to connect to all agents over the tailnet.

I refactored the fakeWorkspaceUpdatesProvider to a mock and moved it to `tailnettest` so it could be more easily reused.  The Mock is a little more full-featured.
2024-11-18 10:54:11 +04:00
Spike Curtis 16992ee548 feat(tailnet): add workspace updates support to Controller (#15529)
re: #14730

Adds support in `tailnet.Controller` for WorkspaceUpdates.

Also checks configured controllers against the clients returned by the dialer, so that if we connect with a dialer that doesn't support an RPC (for instance the in-memory dialer for ServerTailnet doesn't support WorkspaceUpdates), we throw an error if there is a controller expecting it.
2024-11-18 10:41:19 +04:00
Spike Curtis 40802958e9 fix: use explicit api versions for agent and tailnet (#15508)
Bumps the Tailnet and Agent API version 2.3, and creates some extra controls and machinery around these versions.

What happened is that we accidentally shipped two new API features without bumping the version.  `ScriptCompleted` on the Agent API in Coder v2.16 and `RefreshResumeToken` on the Tailnet API in Coder v2.15.

Since we can't easily retroactively bump the versions, we'll roll these changes into API version 2.3 along with the new WorkspaceUpdates RPC, which hasn't been released yet.  That means there is some ambiguity in Coder v2.15-v2.17 about exactly what methods are supported on the Tailnet and Agent APIs.  This isn't great, but hasn't caused us major issues because 

1. RefreshResumeToken is considered optional, and clients just log and move on if the RPC isn't supported. 
2. Agents basically never get started talking to a Coderd that is older than they are, since the agent binary is normally downloaded from Coderd at workspace start.

Still it's good to get things squared away in terms of versions for SDK users and possible edge cases around client and server versions.

To mitigate against this thing happening again, this PR also:

1. adds a CODEOWNERS for the API proto packages, so I'll review changes
2. defines interface types for different API versions, and has the agent explicitly use a specific version.  That way, if you add a new method, and try to use it in the agent without thinking explicitly about versions, it won't compile.

With the protocol controllers stuff, we've sort of already abstracted the Tailnet API such that the interface type strategy won't work, but I'll work on getting the Controller to be version aware, such that it can check the API version it's getting against the controllers it has -- in a later PR.
2024-11-15 11:16:28 +04:00
Spike Curtis 916df4d411 feat: set DNS hostnames in workspace updates controller (#15507)
re: #14730

Adds support for the workspace updates protocol controller to also program DNS names for each agent.

Right now, we only program names like `myagent.myworkspace.me.coder` and `myworkspace.coder.` (if there is exactly one agent in the workspace).  We also want to support `myagent.myworkspace.username.coder.`, but for that we need to update WorkspaceUpdates RPC to also send the workspace owner's username, which will be in a separate PR.
2024-11-15 11:00:19 +04:00
Spike Curtis 08216aaad6 feat: add workspace updates controller (#15506)
re: #14730

Adds a protocol controller for WorkspaceUpdates RPC that takes all the agents we learn about over the RPC, and programs them into the Coordination controller, so that we set up tunnels to all the agents.

Handling DNS is in a PR up the stack, as is actually wiring it up to anything.
2024-11-14 16:16:04 +04:00
Ethan 5d853fcfd8 chore: support adding dns hosts to tailnet.Conn (#15419)
Relates to #14718.

The remaining changes (regarding the Tailscale DNS service) will need to
be made on `coder/tailscale`.
2024-11-08 09:37:56 +00:00
Spike Curtis e5661c2748 feat: add support for multiple tunnel destinations in tailnet (#15409)
Closes #14729

Expands the Coordination controller used by the CLI client to allow multiple tunnel destinations (agents).  Our current client uses just one, but this unifies the logic so that when we add Coder VPN, 1 is just a special case of "many."
2024-11-08 13:32:07 +04:00
Spike Curtis 8c00ebc6ee chore: refactor ServerTailnet to use tailnet.Controllers (#15408)
chore of #14729

Refactors the `ServerTailnet` to use `tailnet.Controller` so that we reuse logic around reconnection and handling control messages, instead of reimplementing.  This unifies our "client" use of the tailscale API across CLI, coderd, and wsproxy.
2024-11-08 13:18:56 +04:00
Spike Curtis 718722af1b chore: refactor tailnetAPIConnector to tailnet.Controller (#15361)
Refactors `workspacesdk.tailnetAPIConnector` as a `tailnet.Controller` to reuse all the reconnection and graceful disconnect logic.

chore re: #14729
2024-11-08 10:10:54 +04:00
Spike Curtis 2d061e698d chore: refactor tailnetAPIConnector to use dialer (#15347)
refactors `tailnetAPIConnector` to use the `Dialer` interface in `tailnet`, introduced lower in this stack of PRs. This will let us use the same Tailnet API handling code across different things that connect to the Tailnet API (CLI client, coderd, workspace proxies, and soon: Coder VPN).

chore re: #14729
2024-11-07 17:24:19 +04:00
Ethan 098728138f chore: add a tailscale router that uses the CoderVPN protocol (#15391)
Closes #14732.
2024-11-07 22:45:17 +11:00
Spike Curtis d7e86278c8 chore: add resume token controller (#15346)
Implements a controller for the Tailnet API resume token RPC, by refactoring from `workspacesdk`.

chore re: #14729
2024-11-07 11:32:20 +04:00
Spike Curtis 335e4ab6bf chore: refactor sending telemetry (#15345)
Implements a tailnet API Telemetry controller by refactoring from `workspacesdk`.

chore re: #14729
2024-11-06 20:23:23 +04:00
Spike Curtis 9126cd78a6 chore: refactor DERP setting loop (#15344)
Implements a Tailnet API DERP controller by refactoring from `workspacesdk`

chore re: #14729
2024-11-06 20:04:05 +04:00
Spike Curtis 886dcbec84 chore: refactor coordination (#15343)
Refactors the way clients of the Tailnet API (clients of the API, which include both workspace "agents" and "clients") interact with the API.  Introduces the idea of abstract "controllers" for each of the RPCs in the API, and implements a Coordination controller by refactoring from `workspacesdk`.

chore re: #14729
2024-11-05 13:50:10 +04:00
Ethan 871cc05e99 chore: add a dns.OSConfigurator implementation that uses the CoderVPN protocol (#15342)
Closes #14733.
2024-11-05 19:23:16 +11:00
Ethan b1298a3c1e feat: add WorkspaceUpdates tailnet RPC (#14847)
Closes #14716
Closes #14717

Adds a new user-scoped tailnet API endpoint (`api/v2/tailnet`) with a new RPC stream for receiving updates on workspaces owned by a specific user, as defined in #14716. 

When a stream is started, the `WorkspaceUpdatesProvider` will begin listening on the user-scoped pubsub events implemented in #14964. When a relevant event type is seen (such as a workspace state transition), the provider will query the DB for all the workspaces (and agents) owned by the user. This gets compared against the result of the previous query to produce a set of workspace updates. 

Workspace updates can be requested for any user ID, however only workspaces the authorised user is permitted to `ActionRead` will have their updates streamed.
Opening a tunnel to an agent requires that the user can perform `ActionSSH` against the workspace containing it.
2024-11-01 14:53:53 +11:00
Jon Ayers cd890aa3a0 feat: enable key rotation (#15066)
This PR contains the remaining logic necessary to hook up key rotation
to the product.
2024-10-25 17:14:35 +01:00
Spike Curtis 8785a51b09 feat: include Coder service prefix on agents (#14944)
fixes #14715

Configures agents to use an address both in the Tailscale service prefix and the new Coder service prefix. Also modifies the Coordinator auth to allow the new prefix.

Updates `coder/tailscale` to include https://github.com/coder/tailscale/pull/62 which fixes a bug around forwarding TCP connections to localhost.  This functionality is tested in the modifications to `TestAgent_Dial`.
2024-10-04 10:16:33 +04:00
Spike Curtis 7d9f5ab81d chore: add Coder service prefix to tailnet (#14943)
re: #14715

This PR introduces the Coder service prefix: `fd60:627a:a42b::/48` and refactors our existing code as calling the Tailscale service prefix explicitly (rather than implicitly).

Removes the unused `Addresses` agent option. All clients today assume they can compute the Agent's IP address based on its UUID, so an agent started with a custom address would break things.
2024-10-04 10:04:10 +04:00
Ethan b8944074c4 chore: improve coder server ux (#14761) 2024-09-24 13:16:36 +10:00
Spike Curtis 2df9a3e554 fix: fix tailnet remoteCoordination to wait for server (#14666)
Fixes #12560

When gracefully disconnecting from the coordinator, we would send the Disconnect message and then close the dRPC stream.  However, closing the dRPC stream can cause the server not to process the Disconnect message, since we use the stream context in a `select` while sending it to the coordinator.

This is a product bug uncovered by the flake, and probably results in us failing graceful disconnect some minority of the time.

Instead, the `remoteCoordination` (and `inMemoryCoordination` for consistency) should send the Disconnect message and then wait for the coordinator to hang up (on some graceful disconnect timer, in the form of a context).
2024-09-16 09:24:30 +04:00
Spike Curtis d6154c4310 chore: remove tailnet v1 API support (#14641)
Drops support for v1 of the tailnet API, which was the original coordination protocol where we only sent node updates, never marked them lost or disconnected.

v2 of the tailnet API went GA for CLI clients in Coder 2.8.0, so clients older than that would stop working.
2024-09-12 07:56:31 +04:00
Spike Curtis fb3523b37f chore: remove legacy AgentIP address (#14640)
Removes the support for the Agent's "legacy IP" which was a hardcoded IP address all agents used to use, before we introduced "single tailnet". Single tailnet went GA in 2.7.0.
2024-09-12 07:40:19 +04:00
Spike Curtis 5bd19f8ba3 fix: fix flake in TestWorkspaceAgentClientCoordinate_ResumeToken (#14642)
fixes #14365

I bet what's going on is that in `connectToCoordinatorAndFetchResumeToken()` we call `Coordinate()`, send a message on the `Coordinate` client and then close it in rapid succession. We don't wait around for a response from the coordinator, so dRPC is likely aborting the call `Coordinate()` in the backend because the stream is closed before it even gets a chance.

Instead of using the Coordinator to record the peer ID assigned on the API call, we can wrap the resume token provider, since we call that API _and_ wait for a response. This also affords the opportunity to directly assert we get called with the right token.
2024-09-11 16:32:47 +04:00
Spike Curtis 7b39f6b0d4 fix: improves coordination logging (#14556) 2024-09-04 15:10:43 +04:00