Commit Graph

19 Commits

Author SHA1 Message Date
Cian Johnston 93b000776f fix(cli): revert #21583 (#22000)
Relates to https://github.com/coder/internal/issues/1217

This reverts commit f799cba395.

@deansheather reported that this breaks ControlMaster.

Investigating alternative fixes to coder/internal#1217
2026-02-09 09:56:33 +00:00
Cian Johnston f799cba395 fix(cli): allow coder ssh --stdio to exit when parent process dies (#21583)
Relates to https://github.com/coder/internal/issues/1217

Adds a background goroutine in `--stdio` mode to check if the parent PID
is still alive and exit if it is no longer present.

🤖 Implemented using Mux + Claude Opus 4.5, reviewed and refactored by
me.
2026-01-21 14:14:51 +00:00
Spike Curtis bddb808b25 chore: arrange imports in a standard way (#21452)
Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example:

```
import (
	"context"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"golang.org/x/xerrors"
	"gopkg.in/natefinch/lumberjack.v2"

	"cdr.dev/slog/v3"
	"github.com/coder/coder/v2/codersdk/agentsdk"
	"github.com/coder/serpent"
)
```

3 groups: standard library, 3rd partly libs, Coder libs.

This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.
2026-01-08 15:24:11 +04:00
Spike Curtis 49b34a716a fix: fix slog to always use array of Fields (#21426)
Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptible call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder).

It also updates dependencies that also use slog and were updated.

I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule.

Other dependencies, I pushed new tags.
2026-01-08 10:29:41 +04:00
Spike Curtis 4fc0093388 fix: fix TestCloserStack_Timeout to wait for all asyncClosers (#19837)
fixes https://github.com/coder/internal/issues/966


TestCloserStack_Timeout creates `asyncCloser`s which allow control over the exact timing and order of their close method returning. They also, as a final backstop will throw an error if the test context ends before they are unblocked.

TestCloserStack_Timeout unblocks all `asyncCloser`s in a defer and then ends the test. This defer _unblocks_ the running close goroutines, but does not wait for them to finish. Since the test context is canceled as soon as the test completes, this creates a race condition where the close goroutines can trigger the context cancelled  arm of the `select` statement.

The fix is to both unblock and wait for all close goroutines to complete before ending the test and cancelling the context.
2025-09-16 21:53:50 +04:00
Mathias Fredriksson 6c4db7a2bc feat(cli): replace open vscode container with devcontainer subagent (#18765)
This change allows a devcontainer to be opened via the agent syntax,
`coder open vscode <workspace>.<agent>` and removes the `--container`
option to simplify the subcommand. Accessing the subagent will behave
similarly to how the `--container` option behaved.

Fixes coder/internal#748
2025-07-08 19:21:41 +03:00
Mathias Fredriksson bacdc28881 fix(cli)!: enforce selection for multiple agents rather than use randomness (#18427)
In the past we randomly selected workspace agent if there were multiple.
Unless both are running on the same machine with the same configuration,
this would be very confusing behavior for a user.

With the introduction of sub agents (devcontainer agents), we have now
made this an error state and require the specifying of agent when there
is more than one (either normal agent or sub agent).

This aligns with the behavior of e.g. Coder Desktop.

Fixes coder/internal#696
2025-06-19 22:13:35 +03:00
Bruno Quaresma d779126ee3 chore: rollback PR #18081 (#18104)
Rollback https://github.com/coder/coder/pull/18081
2025-05-29 13:12:13 -03:00
Bruno Quaresma 2ec7404197 chore: make owner_name and owner_username consistent (#18081)
We've been using owner_name inconsistently as username. So this PR fixes
it by making the attribute naming more consistent.
2025-05-28 17:25:32 -03:00
Spike Curtis 6c0bed0f53 chore: update to coder/quartz v0.2.0 (#18007)
Upgrade to coder/quartz v0.2.0 including fixing up a minor API breaking change.
2025-05-27 16:05:03 +04:00
Ethan 53ba3613b3 feat(cli): use coder connect in coder ssh --stdio, if available (#17572)
Closes https://github.com/coder/vscode-coder/issues/447
Closes https://github.com/coder/jetbrains-coder/issues/543
Closes https://github.com/coder/coder-jetbrains-toolbox/issues/21

This PR adds Coder Connect support to `coder ssh --stdio`. 

When connecting to a workspace, if `--force-new-tunnel` is not passed, the CLI will first do a DNS lookup for `<agent>.<workspace>.<owner>.<hostname-suffix>`. If an IP address is returned, and it's within the Coder service prefix, the CLI will not create a new tailnet connection to the workspace, and instead dial the SSH server running on port 22 on the workspace directly over TCP.

This allows IDE extensions to use the Coder Connect tunnel, without requiring any modifications to the extensions themselves. 

Additionally, `using_coder_connect` is added to the `sshNetworkStats` file, which the VS Code extension (and maybe Jetbrains?) will be able to read, and indicate to the user that they are using Coder Connect.

One advantage of this approach is that running `coder ssh --stdio` on an offline workspace with Coder Connect enabled will have the CLI wait for the workspace to build, the agent to connect (and optionally, for the startup scripts to finish), before finally connecting using the Coder Connect tunnel.

As a result, `coder ssh --stdio` has the overhead of looking up the workspace and agent, and checking if they are running. On my device, this meant `coder ssh --stdio <workspace>` was approximately a second slower than just connecting to the workspace directly using `ssh <workspace>.coder` (I would assume anyone serious about their Coder Connect usage would know to just do the latter anyway).
 
To ensure this doesn't come at a significant performance cost, I've also benchmarked this PR.

<details>
<summary>Benchmark</summary>

## Methodology
All tests were completed on `dev.coder.com`, where a Linux workspace running in AWS `us-west1` was created.
The machine running Coder Desktop (the 'client') was a Windows VM running in the same AWS region and VPC as the workspace.

To test the performance of specifically the SSH connection, a port was forwarded between the client and workspace using:
```
ssh -p 22 -L7001:localhost:7001 <host>
```
where `host` was either an alias for an SSH ProxyCommand that called `coder ssh`, or a Coder Connect hostname.

For latency, [`tcping`](https://www.elifulkerson.com/projects/tcping.php) was used against the forwarded port:
```
tcping -n 100 localhost 7001
```

For throughput, [`iperf3`](https://iperf.fr/iperf-download.php) was used:
```
iperf3 -c localhost -p 7001
```
where an `iperf3` server was running on the workspace on port 7001.

## Test Cases

### Testcase 1: `coder ssh` `ProxyCommand` that bicopies from Coder Connect
This case tests the implementation in this PR, such that we can write a config like:
```
Host codercliconnect
    ProxyCommand /path/to/coder ssh --stdio workspace
```
With Coder Connect enabled, `ssh -p 22 -L7001:localhost:7001 codercliconnect` will use the Coder Connect tunnel. The results were as follows:

**Throughput, 10 tests, back to back:**
- Average throughput across all tests: 788.20 Mbits/sec
- Minimum average throughput: 731 Mbits/sec
- Maximum average throughput: 871 Mbits/sec
- Standard Deviation: 38.88 Mbits/sec

**Latency, 100 RTTs:**
- Average: 0.369ms
- Minimum: 0.290ms
- Maximum: 0.473ms

### Testcase 2: `ssh` dialing Coder Connect directly without a `ProxyCommand`

This is what we assume to be the 'best' way to use Coder Connect

**Throughput, 10 tests, back to back:**
- Average throughput across all tests: 789.50 Mbits/sec
- Minimum average throughput: 708 Mbits/sec
- Maximum average throughput: 839 Mbits/sec
- Standard Deviation: 39.98 Mbits/sec

**Latency, 100 RTTs:**
- Average: 0.369ms
- Minimum: 0.267ms
- Maximum: 0.440ms

### Testcase 3:  `coder ssh` `ProxyCommand` that creates its own Tailnet connection in-process

This is what normally happens when you run `coder ssh`:

**Throughput, 10 tests, back to back:**
- Average throughput across all tests: 610.20 Mbits/sec
- Minimum average throughput: 569 Mbits/sec
- Maximum average throughput: 664 Mbits/sec
- Standard Deviation: 27.29 Mbits/sec

**Latency, 100 RTTs:**
- Average: 0.335ms
- Minimum: 0.262ms
- Maximum: 0.452ms

## Analysis

Performing a two-tailed, unpaired t-test against the throughput of testcases 1 and 2, we find a P value of `0.9450`. This suggests the difference between the data sets is not statistically significant. In other words, there is a 94.5% chance that the difference between the data sets is due to chance.

## Conclusion

From the t-test, and by comparison to the status quo (regular `coder ssh`, which uses gvisor, and is noticeably slower), I think it's safe to say any impact on throughput or latency by the `ProxyCommand` performing a bicopy against Coder Connect is negligible. Users are very much unlikely to run into performance issues as a result of using Coder Connect via `coder ssh`, as implemented in this PR.

Less scientifically, I ran these same tests on my home network with my Sydney workspace, and both throughput and latency were consistent across testcases 1 and 2.

</details>
2025-04-30 15:17:10 +10:00
ケイラ f670bc31f5 chore: update testutil chan helpers (#17408) 2025-04-16 10:37:09 -06:00
Spike Curtis 5861e516b9 chore: add standard test logger ignoring db canceled (#15556)
Refactors our use of `slogtest` to instantiate a "standard logger" across most of our tests.  This standard logger incorporates https://github.com/coder/slog/pull/217 to also ignore database query canceled errors by default, which are a source of low-severity flakes.

Any test that has set non-default `slogtest.Options` is left alone. In particular, `coderdtest` defaults to ignoring all errors. We might consider revisiting that decision now that we have better tools to target the really common flaky Error logs on shutdown.
2024-11-18 14:09:22 +04:00
Spike Curtis 6ff9a05832 fix: close SSH sessions bottom-up if top-down fails (#14678) 2024-09-17 14:46:49 +04:00
Spike Curtis b96f6b48a4 fix: ensure ssh cleanup happens on cmd error
I noticed in my logs that sometimes `coder ssh` doesn't gracefully disconnect from the coordinator.

The cause is the `closerStack` construct we use in that function.  It has two paths to start closing things down:

1. explicit `close()` which we do in `defer`
2. context cancellation, which happens if the cli function returns an error

sometimes the ssh remote command returns an error, and this triggers context cancellation of the `closerStack`.  That is fine in and of itself, but we still want the explicit `close()` to wait until everything is closed before returning, since that's where we do cleanup, including the graceful disconnect.  Prior to this fix the `close()` just immediately exits if another goroutine is closing the stack.  Here we add a wait until everything is done.
2024-03-07 17:26:49 +04:00
Spike Curtis 71f87d054f fix: accept legacy redirect HTTP environment variables (#10748)
> Can someone help me understand the differences between these env variables:
>
>    CODER_REDIRECT_TO_ACCESS_URL
>    CODER_TLS_REDIRECT_HTTP_TO_HTTPS
>    CODER_TLS_REDIRECT_HTTP

Oh man, what a mess. It looks like `CODER_TLS_REDIRECT_HTTP ` appears in our config docs. Maybe that was the initial name for the environment variable?

At some point, both the flag and the environment variable were `--tls-redirect-http-to-https` and `CODER_TLS_REDIRECT_HTTP_TO_HTTPS`.  `CODER_TLS_REDIRECT_HTTP` did nothing.

However, then we introduced `CODER_REDIRECT_TO_ACCESS_URL`, we put in some deprecation code that was maybe fat-fingered such that we accept the environment variable `CODER_TLS_REDIRECT_HTTP` but the flag `--tls-redirect-http-to-https`.  Our docs still refer to `CODER_TLS_REDIRECT_HTTP` at https://coder.com/docs/v2/latest/admin/configure#address

So, I think what we gotta do is still accept `CODER_TLS_REDIRECT_HTTP` since it was working and in an example doc, but also fix the deprecation code to accept `CODER_TLS_REDIRECT_HTTP_TO_HTTPS` environment variable.
2023-11-17 15:09:29 +04:00
Spike Curtis 3dd35e019b fix: close ssh sessions gracefully (#10732)
Re-enables TestSSH/RemoteForward_Unix_Signal and addresses the underlying race: we were not closing the remote forward on context expiry, only the session and connection.

However, there is still a more fundamental issue in that we don't have the ability to ensure that TCP sessions are properly terminated before tearing down the Tailnet conn.  This is due to the assumption in the sockets API, that the underlying IP interface is long 
lived compared with the TCP socket, and thus closing a socket returns immediately and does not wait for the TCP termination handshake --- that is handled async in the tcpip stack.  However, this assumption does not hold for us and tailnet, since on shutdown,
we also tear down the tailnet connection, and this can race with the TCP termination.

Closing the remote forward explicitly should prevent forward state from accumulating, since the Close() function waits for a reply from the remote SSH server.

I've also attempted to workaround the TCP/tailnet issue for `--stdio` by using `CloseWrite()` instead of `Close()`.  By closing the write side of the connection, half-close the TCP connection, and the server detects this and closes the other direction, which then
triggers our read loop to exit only after the server has had a chance to process the close.

TODO in a stacked PR is to implement this logic for `vscodessh` as well.
2023-11-17 12:43:20 +04:00
Kyle Carberry 22e781eced chore: add /v2 to import module path (#9072)
* chore: add /v2 to import module path

go mod requires semantic versioning with versions greater than 1.x

This was a mechanical update by running:
```
go install github.com/marwan-at-work/mod/cmd/mod@latest
mod upgrade
```

Migrate generated files to import /v2

* Fix gen
2023-08-18 18:55:43 +00:00
Marcin Tojek 641aacf793 feat: show banner when workspace is outdated (#4926)
* feat: show banner when workspace is outdated

* Address PR comments

* Fix: writer
2022-11-07 19:12:39 +01:00