Closes https://github.com/coder/vscode-coder/issues/447
Closes https://github.com/coder/jetbrains-coder/issues/543
Closes https://github.com/coder/coder-jetbrains-toolbox/issues/21
This PR adds Coder Connect support to `coder ssh --stdio`.
When connecting to a workspace, if `--force-new-tunnel` is not passed, the CLI will first do a DNS lookup for `<agent>.<workspace>.<owner>.<hostname-suffix>`. If an IP address is returned, and it's within the Coder service prefix, the CLI will not create a new tailnet connection to the workspace, and instead dial the SSH server running on port 22 on the workspace directly over TCP.
This allows IDE extensions to use the Coder Connect tunnel, without requiring any modifications to the extensions themselves.
Additionally, `using_coder_connect` is added to the `sshNetworkStats` file, which the VS Code extension (and maybe Jetbrains?) will be able to read, and indicate to the user that they are using Coder Connect.
One advantage of this approach is that running `coder ssh --stdio` on an offline workspace with Coder Connect enabled will have the CLI wait for the workspace to build, the agent to connect (and optionally, for the startup scripts to finish), before finally connecting using the Coder Connect tunnel.
As a result, `coder ssh --stdio` has the overhead of looking up the workspace and agent, and checking if they are running. On my device, this meant `coder ssh --stdio <workspace>` was approximately a second slower than just connecting to the workspace directly using `ssh <workspace>.coder` (I would assume anyone serious about their Coder Connect usage would know to just do the latter anyway).
To ensure this doesn't come at a significant performance cost, I've also benchmarked this PR.
<details>
<summary>Benchmark</summary>
## Methodology
All tests were completed on `dev.coder.com`, where a Linux workspace running in AWS `us-west1` was created.
The machine running Coder Desktop (the 'client') was a Windows VM running in the same AWS region and VPC as the workspace.
To test the performance of specifically the SSH connection, a port was forwarded between the client and workspace using:
```
ssh -p 22 -L7001:localhost:7001 <host>
```
where `host` was either an alias for an SSH ProxyCommand that called `coder ssh`, or a Coder Connect hostname.
For latency, [`tcping`](https://www.elifulkerson.com/projects/tcping.php) was used against the forwarded port:
```
tcping -n 100 localhost 7001
```
For throughput, [`iperf3`](https://iperf.fr/iperf-download.php) was used:
```
iperf3 -c localhost -p 7001
```
where an `iperf3` server was running on the workspace on port 7001.
## Test Cases
### Testcase 1: `coder ssh` `ProxyCommand` that bicopies from Coder Connect
This case tests the implementation in this PR, such that we can write a config like:
```
Host codercliconnect
ProxyCommand /path/to/coder ssh --stdio workspace
```
With Coder Connect enabled, `ssh -p 22 -L7001:localhost:7001 codercliconnect` will use the Coder Connect tunnel. The results were as follows:
**Throughput, 10 tests, back to back:**
- Average throughput across all tests: 788.20 Mbits/sec
- Minimum average throughput: 731 Mbits/sec
- Maximum average throughput: 871 Mbits/sec
- Standard Deviation: 38.88 Mbits/sec
**Latency, 100 RTTs:**
- Average: 0.369ms
- Minimum: 0.290ms
- Maximum: 0.473ms
### Testcase 2: `ssh` dialing Coder Connect directly without a `ProxyCommand`
This is what we assume to be the 'best' way to use Coder Connect
**Throughput, 10 tests, back to back:**
- Average throughput across all tests: 789.50 Mbits/sec
- Minimum average throughput: 708 Mbits/sec
- Maximum average throughput: 839 Mbits/sec
- Standard Deviation: 39.98 Mbits/sec
**Latency, 100 RTTs:**
- Average: 0.369ms
- Minimum: 0.267ms
- Maximum: 0.440ms
### Testcase 3: `coder ssh` `ProxyCommand` that creates its own Tailnet connection in-process
This is what normally happens when you run `coder ssh`:
**Throughput, 10 tests, back to back:**
- Average throughput across all tests: 610.20 Mbits/sec
- Minimum average throughput: 569 Mbits/sec
- Maximum average throughput: 664 Mbits/sec
- Standard Deviation: 27.29 Mbits/sec
**Latency, 100 RTTs:**
- Average: 0.335ms
- Minimum: 0.262ms
- Maximum: 0.452ms
## Analysis
Performing a two-tailed, unpaired t-test against the throughput of testcases 1 and 2, we find a P value of `0.9450`. This suggests the difference between the data sets is not statistically significant. In other words, there is a 94.5% chance that the difference between the data sets is due to chance.
## Conclusion
From the t-test, and by comparison to the status quo (regular `coder ssh`, which uses gvisor, and is noticeably slower), I think it's safe to say any impact on throughput or latency by the `ProxyCommand` performing a bicopy against Coder Connect is negligible. Users are very much unlikely to run into performance issues as a result of using Coder Connect via `coder ssh`, as implemented in this PR.
Less scientifically, I ran these same tests on my home network with my Sydney workspace, and both throughput and latency were consistent across testcases 1 and 2.
</details>
* Refactors toolsdk.Tools to remove opaque `map[string]any` argument in
favour of typed args structs.
* Refactors toolsdk.Tools to remove opaque passing of dependencies via
`context.Context` in favour of a tool dependencies struct.
* Adds panic recovery and clean context middleware to all tools.
* Adds `GenericTool` implementation to allow keeping `toolsdk.All` with
uniform type signature while maintaining type information in handlers.
* Adds stricter checks to `patchWorkspaceAgentAppStatus` handler.
There were too many ways to configure the agentcontainers API resulting
in inconsistent behavior or features not being enabled. This refactor
introduces a control flag for enabling or disabling the containers API.
When disabled, all implementations are no-op and explicit endpoint
behaviors are defined. When enabled, concrete implementations are used
by default but can be overridden by passing options.
* Updates default Coder prompt.
* Skips the directions to report tasks if the pre-requisites are not
available (agent token and app slug).
* Adds the capability to override the default Coder prompt via
`CODER_MCP_CLAUDE_CODER_PROMPT`.
fixes#16828
With all the recent changes, I believe it is now safe to change the Call to Action for `config-ssh` to use the hostname suffix rather than prefix if it was set.
relates to #16828
Changes SSH config so that suffixes only match if Coder Connect is not running / available. This means that we will use the existing Coder Connect tunnel if it is available, rather than creating a new tunnel via `coder ssh --stdio`.
Adds a new hidden subcommand `coder connect exists <hostname>` that checks if the name exists via Coder Connect. This will be used in SSH config to match only if Coder Connect is unavailable for the hostname in question, so that the SSH client will directly dial the workspace over an existing Coder Connect tunnel.
Also refactors the way we inject a test DNS resolver into the lookup functions so that we can test from outside the `workspacesdk` package.
Fixes two issues with the MCP server:
- Ensures we have a non-null schema, as the following schema was making
claude-code unhappy:
```
"inputSchema": { "type": "object", "properties": null },
```
- Skip adding the coder_report_task tool if an agent client is not
available. Otherwise the agent may try to report tasks and get confused.
- Refactors existing `mcp` package to use `kylecarbs/aisdk-go` and moves
to `codersdk/toolsdk` package.
- Updates existing MCP server implementation to use `codersdk/toolsdk`
Co-authored-by: Kyle Carberry <kyle@coder.com>
Fixes https://github.com/coder/internal/issues/272
* Increases healthcheck timeout in tests. This seems to be the most
usual cause of test failures.
* Adds a non-nilness check before caching a healthcheck report.
* Modifies the HTTP response code to 503 (was 404) when no healthcheck
report is available. 503 seems to be a better indicator of the server
state in this case, whereas 404 could be misinterpreted as a typo in the
healthcheck URL.
Should hopefully fix https://github.com/coder/internal/issues/282
Instead of picking a random port for the prometheus server, listen on
`:0` and read the port from the CLI stdout.
Wires up `config-ssh` command to use a hostname suffix if configured.
part of: #16828
e.g. `coder config-ssh --hostname-suffix spiketest` gives:
```
# ------------START-CODER-----------
# This section is managed by coder. DO NOT EDIT.
#
# You should not hand-edit this section unless you are removing it, all
# changes will be lost when running "coder config-ssh".
#
# Last config-ssh options:
# :hostname-suffix=spiketest
#
Host coder.* *.spiketest
ConnectTimeout=0
StrictHostKeyChecking=no
UserKnownHostsFile=/dev/null
LogLevel ERROR
ProxyCommand /home/coder/repos/coder/build/coder_config_ssh --global-config /home/coder/.config/coderv2 ssh --stdio --ssh-host-prefix coder. --hostname-suffix spiketest %h
# ------------END-CODER------------
```
Adds `hostname-suffix` flag to `coder ssh` command for use in SSH Config ProxyCommands.
Also enforces that Coder server doesn't start the suffix with a dot.
part of: #16828
Fixes an issue where old ssh configs that use the
`owner--workspace--agent` format will fail to properly use the `coder
ssh` command since we migrated off the `coder vscodessh` command.
Adds `hostname-suffix` as a Config SSH option that we get from Coderd, and also accept via a CLI flag.
It doesn't actually do anything with this value --- that's for PRs up the stack, since we need the `coder ssh` command to be updated to understand the suffix first.
Adds deployment option `CODER_WORKSPACE_HOSTNAME_SUFFIX`. This will eventually replace `CODER_SSH_HOSTNAME_PREFIX`, but we will do this slowly and support both for `coder ssh` for some time.
Note that the name is changed to "workspace" hostname, since this suffix will also be used for Coder Connect on Coder Desktop, which is not limited to SSH.
Changes the SSH host key seeding to use the owner username, workspace name, and agent name. This prevents SSH from complaining about a mismatched host key if you use Coder Desktop to connect, and delete and recreate your workspace with the same name. Previously this would generate a different key because the workspace ID changed.
We also include the owner's username in anticipation of using Coder Desktop to access shared workspaces (or as a superuser) down the road, so that workspaces with the same name owned by different users will not have the same key.
This change is **BREAKING** in a limited sense that early access users of Coder Desktop will see their SSH clients complain about host keys changing the first time each workspace is rebuilt with this code. It can be resolved by clearing your `.ssh/known_hosts` file of the Coder workspaces you access this way.
Closes https://github.com/coder/internal/issues/551
We've noticed lots of flakes in `go test -race` tests that use the echo provisioner. I believe the root cause of this to be https://github.com/coder/coder/pull/17012/, where we started mutating the `echo.Responses`. This only caused issues as we previously shared `echo.Responses` across multiple test cases.
This PR is therefore the same as https://github.com/coder/coder/pull/17128, but I believe this is all the cases where an `echo.Responses` is shared between tests - including tests that haven't flaked (yet).
There's a flake reported in https://github.com/coder/internal/issues/549
that was caused by the built-in Postgres failing to start. However, the
test was written in a way that didn't log the actual error which caused
Postgres to fail. This PR improves error logging in the affected test so
that the next time the error happens, we know what it is.
Adds a `coder exp mcp` command which will start a local MCP server
listening on stdio with the following capabilities:
* Show logged in user (`coder whoami`)
* List workspaces (`coder list`)
* List templates (`coder templates list`)
* Start a workspace (`coder start`)
* Stop a workspace (`coder stop`)
* Fetch a single workspace (no direct CLI analogue)
* Execute a command inside a workspace (`coder exp rpty`)
* Report the status of a task (currently a no-op, pending task support)
This can be tested as follows:
```
# Start a local Coder server.
./scripts/develop.sh
# Start a workspace. Currently, creating workspaces is not supported.
./scripts/coder-dev.sh create -t docker --yes
# Add the MCP to your Claude config.
claude mcp add coder ./scripts/coder-dev.sh exp mcp
# Tell Claude to do something Coder-related. You may need to nudge it to use the tools.
claude 'start a docker workspace and tell me what version of python is installed'
```
This does ~95% of the backend work required to integrate the AI work.
Most left to integrate from the tasks branch is just frontend, which
will be a lot smaller I believe.
The real difference between this branch and that one is the abstraction
-- this now attaches statuses to apps, and returns the latest status
reported as part of a workspace.
This change enables us to have a similar UX to in the tasks branch, but
for agents other than Claude Code as well. Any app can report status
now.
* Improves tests for webpush notifications
* Sets subscriber correctly in web push payload (without this,
notifications do not work in Safari)
* NOTE: for now, I'm using the Coder Access URL. Some push messaging
service don't like it when you use a non-HTTPS URL, so dropping a warn
log about this.
* Adds a service worker and context for push notifications
* Adds a button beside "Inbox" to enable / disable push notifications
Notes:
* ✅ Tested in in Firefox and Safari, and Chrome.
* Adds `codersdk.ExperimentWebPush` (`web-push`)
* Adds a `coderd/webpush` package that allows sending native push
notifications via `github.com/SherClockHolmes/webpush-go`
* Adds database tables to store push notification subscriptions.
* Adds an API endpoint that allows users to subscribe/unsubscribe, and
send a test notification (404 without experiment, excluded from API docs)
* Adds server CLI command to regenerate VAPID keys (note: regenerating
the VAPID keypair requires deleting all existing subscriptions)
---------
Co-authored-by: Kyle Carberry <kyle@carberry.com>
- Update go.mod to use Go 1.24.1
- Update GitHub Actions setup-go action to use Go 1.24.1
- Fix linting issues with golangci-lint by:
- Updating to golangci-lint v1.57.1 (more compatible with Go 1.24.1)
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>