coder

mirror of https://github.com/coder/coder.git synced 2026-06-03 04:58:23 +00:00

Author	SHA1	Message	Date
Jon Ayers	40dbc557bd	fix: ignore goroutine leak for protocol lookup on windows (#19131 )	2025-08-04 08:26:56 -04:00
Spike Curtis	abcf3df71a	chore: move InProcNet to testutil (#18563 ) Moves `InProcNet` to `testutil` so that it can be reused by X11 forwarding tests (see up stack PRs).	2025-06-27 14:42:22 +04:00
Mathias Fredriksson	b9ac16cb40	test(testutil): improve chan.go error visibility (#18406 )	2025-06-17 14:39:31 +00:00
Hugo Dutka	f825477a5c	fix: add timeouts to test telemetry snapshot (#17879 ) This PR ensures that waits on channels will time out according to the test context, rather than waiting indefinitely. This should alleviate the panic seen in https://github.com/coder/internal/issues/645 and, if the deadlock recurs, allow the test to be retried automatically in CI.	2025-05-22 13:51:24 +02:00
Spike Curtis	818d4d03f4	chore: ignore 'session shutdown' yamux error in tests (#17964 ) Fixes flake seen here: https://github.com/coder/coder/actions/runs/15154327939/job/42606133069?pr=17960 Error log dropped when the dRPC server is being shut down right as we are (re)dialing.	2025-05-21 11:29:25 +04:00
Ethan	53ba3613b3	feat(cli): use coder connect in `coder ssh --stdio`, if available (#17572 ) Closes https://github.com/coder/vscode-coder/issues/447 Closes https://github.com/coder/jetbrains-coder/issues/543 Closes https://github.com/coder/coder-jetbrains-toolbox/issues/21 This PR adds Coder Connect support to `coder ssh --stdio`. When connecting to a workspace, if `--force-new-tunnel` is not passed, the CLI will first do a DNS lookup for `<agent>.<workspace>.<owner>.<hostname-suffix>`. If an IP address is returned, and it's within the Coder service prefix, the CLI will not create a new tailnet connection to the workspace, and instead dial the SSH server running on port 22 on the workspace directly over TCP. This allows IDE extensions to use the Coder Connect tunnel, without requiring any modifications to the extensions themselves. Additionally, `using_coder_connect` is added to the `sshNetworkStats` file, which the VS Code extension (and maybe Jetbrains?) will be able to read, and indicate to the user that they are using Coder Connect. One advantage of this approach is that running `coder ssh --stdio` on an offline workspace with Coder Connect enabled will have the CLI wait for the workspace to build, the agent to connect (and optionally, for the startup scripts to finish), before finally connecting using the Coder Connect tunnel. As a result, `coder ssh --stdio` has the overhead of looking up the workspace and agent, and checking if they are running. On my device, this meant `coder ssh --stdio <workspace>` was approximately a second slower than just connecting to the workspace directly using `ssh <workspace>.coder` (I would assume anyone serious about their Coder Connect usage would know to just do the latter anyway). To ensure this doesn't come at a significant performance cost, I've also benchmarked this PR. 
<details> <summary>Benchmark</summary> ## Methodology All tests were completed on `dev.coder.com`, where a Linux workspace running in AWS `us-west1` was created. The machine running Coder Desktop (the 'client') was a Windows VM running in the same AWS region and VPC as the workspace. To test the performance of specifically the SSH connection, a port was forwarded between the client and workspace using: ``` ssh -p 22 -L7001:localhost:7001 <host> ``` where `host` was either an alias for an SSH ProxyCommand that called `coder ssh`, or a Coder Connect hostname. For latency, [`tcping`](https://www.elifulkerson.com/projects/tcping.php) was used against the forwarded port: ``` tcping -n 100 localhost 7001 ``` For throughput, [`iperf3`](https://iperf.fr/iperf-download.php) was used: ``` iperf3 -c localhost -p 7001 ``` where an `iperf3` server was running on the workspace on port 7001. ## Test Cases ### Testcase 1: `coder ssh` `ProxyCommand` that bicopies from Coder Connect This case tests the implementation in this PR, such that we can write a config like: ``` Host codercliconnect ProxyCommand /path/to/coder ssh --stdio workspace ``` With Coder Connect enabled, `ssh -p 22 -L7001:localhost:7001 codercliconnect` will use the Coder Connect tunnel. 
The results were as follows: Throughput, 10 tests, back to back: - Average throughput across all tests: 788.20 Mbits/sec - Minimum average throughput: 731 Mbits/sec - Maximum average throughput: 871 Mbits/sec - Standard Deviation: 38.88 Mbits/sec Latency, 100 RTTs: - Average: 0.369ms - Minimum: 0.290ms - Maximum: 0.473ms ### Testcase 2: `ssh` dialing Coder Connect directly without a `ProxyCommand` This is what we assume to be the 'best' way to use Coder Connect Throughput, 10 tests, back to back: - Average throughput across all tests: 789.50 Mbits/sec - Minimum average throughput: 708 Mbits/sec - Maximum average throughput: 839 Mbits/sec - Standard Deviation: 39.98 Mbits/sec Latency, 100 RTTs: - Average: 0.369ms - Minimum: 0.267ms - Maximum: 0.440ms ### Testcase 3: `coder ssh` `ProxyCommand` that creates its own Tailnet connection in-process This is what normally happens when you run `coder ssh`: Throughput, 10 tests, back to back: - Average throughput across all tests: 610.20 Mbits/sec - Minimum average throughput: 569 Mbits/sec - Maximum average throughput: 664 Mbits/sec - Standard Deviation: 27.29 Mbits/sec Latency, 100 RTTs: - Average: 0.335ms - Minimum: 0.262ms - Maximum: 0.452ms ## Analysis Performing a two-tailed, unpaired t-test against the throughput of testcases 1 and 2, we find a P value of `0.9450`. This suggests the difference between the data sets is not statistically significant. In other words, if there were no real difference between the two testcases, a difference at least this large would be expected in about 94.5% of repeated experiments. ## Conclusion From the t-test, and by comparison to the status quo (regular `coder ssh`, which uses gvisor, and is noticeably slower), I think it's safe to say any impact on throughput or latency by the `ProxyCommand` performing a bicopy against Coder Connect is negligible. Users are very unlikely to run into performance issues as a result of using Coder Connect via `coder ssh`, as implemented in this PR. 
Less scientifically, I ran these same tests on my home network with my Sydney workspace, and both throughput and latency were consistent across testcases 1 and 2. </details>	2025-04-30 15:17:10 +10:00
Hugo Dutka	b47d54d777	chore: cache terraform providers between CI test runs (#17373 ) Addresses https://github.com/coder/internal/issues/322. This PR starts caching Terraform providers used by `TestProvision` in `provisioner/terraform/provision_test.go`. The goal is to improve the reliability of this test by cutting down on the number of network calls to external services. It leverages GitHub Actions cache, which [on depot runners is persisted for 14 days by default](https://depot.dev/docs/github-actions/overview#cache-retention-policy). Other than the aforementioned `TestProvision`, I couldn't find any other tests which depend on external terraform providers.	2025-04-28 10:57:24 +02:00
ケイラ	f670bc31f5	chore: update testutil chan helpers (#17408 )	2025-04-16 10:37:09 -06:00
Cian Johnston	1e0051a9a2	feat(testutil): add GetRandomNameHyphenated (#17342 ) This started coming up more often for me, so time for a test helper! --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-04-10 19:08:38 +01:00
Cian Johnston	057cbd4d80	feat(cli): add `coder exp mcp` command (#17066 ) Adds a `coder exp mcp` command which will start a local MCP server listening on stdio with the following capabilities: * Show logged in user (`coder whoami`) * List workspaces (`coder list`) * List templates (`coder templates list`) * Start a workspace (`coder start`) * Stop a workspace (`coder stop`) * Fetch a single workspace (no direct CLI analogue) * Execute a command inside a workspace (`coder exp rpty`) * Report the status of a task (currently a no-op, pending task support) This can be tested as follows: ``` # Start a local Coder server. ./scripts/develop.sh # Start a workspace. Currently, creating workspaces is not supported. ./scripts/coder-dev.sh create -t docker --yes # Add the MCP to your Claude config. claude mcp add coder ./scripts/coder-dev.sh exp mcp # Tell Claude to do something Coder-related. You may need to nudge it to use the tools. claude 'start a docker workspace and tell me what version of python is installed' ```	2025-03-31 18:52:09 +01:00
Jon Ayers	17ddee05e5	chore: update golang to 1.24.1 (#17035 ) - Update go.mod to use Go 1.24.1 - Update GitHub Actions setup-go action to use Go 1.24.1 - Fix linting issues with golangci-lint by: - Updating to golangci-lint v1.57.1 (more compatible with Go 1.24.1) 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <claude@anthropic.com>	2025-03-26 01:56:39 -05:00
Mathias Fredriksson	de41bd6b95	feat: add support for workspace app audit (#16801 ) This change adds support for workspace app auditing. To avoid audit log spam, we introduce the concept of app audit sessions. An audit session is unique per workspace app, user, ip, user agent and http status code. The sessions are stored in a separate table from audit logs to allow use-case specific optimizations. Sessions are ephemeral and the table does not function as a log. The logic for auditing is placed in the DBTokenProvider for workspace apps so that wsproxies are included. This is the final change affecting the API for #15139. Updates #15139	2025-03-18 13:50:52 +02:00
Hugo Dutka	899836d47a	chore: reduce Windows PG tests flakiness (#16090 ) This PR: - Reduces test parallelism on Windows in CI - Unifies wait intervals on Windows with Linux and macOS. Previously we had custom intervals for Windows to reduce test flakiness on smaller CI workers, but we don't run tests on small CI workers anymore. Due to how our CI file is defined, forks run tests on small CI machines, but I'm not sure whether the different intervals actually help, or whether they were a heuristic that happened to fix issues on a particular day and was never reevaluated. I propose we make the change and if someone complains, revert it. In particular, reduced test parallelism seems to actually help: I was able to run Windows tests 5 times in a row without flakes. Not sure if that's going to fix the problem long term, but it seems worth trying.	2025-01-10 15:21:03 +01:00
Cian Johnston	7b88776403	chore(testutil): add testutil.GoleakOptions (#16070 ) - Adds `testutil.GoleakOptions` and consolidates existing options to this location - Pre-emptively adds required ignore for this Dependabot PR to pass CI https://github.com/coder/coder/pull/16066	2025-01-08 15:38:37 +00:00
Hugo Dutka	83c493e832	chore: fix more flaky tests on Windows with Postgres (#15629 ) Addresses the following flakes: - https://github.com/coder/internal/issues/222 - https://github.com/coder/internal/issues/223 - https://github.com/coder/internal/issues/224 - https://github.com/coder/internal/issues/225 - https://github.com/coder/internal/issues/226 - https://github.com/coder/internal/issues/227 - https://github.com/coder/internal/issues/228 - https://github.com/coder/internal/issues/229 - https://github.com/coder/internal/issues/230	2024-11-26 11:56:07 +01:00
Spike Curtis	5861e516b9	chore: add standard test logger ignoring db canceled (#15556 ) Refactors our use of `slogtest` to instantiate a "standard logger" across most of our tests. This standard logger incorporates https://github.com/coder/slog/pull/217 to also ignore database query canceled errors by default, which are a source of low-severity flakes. Any test that has set non-default `slogtest.Options` is left alone. In particular, `coderdtest` defaults to ignoring all errors. We might consider revisiting that decision now that we have better tools to target the really common flaky Error logs on shutdown.	2024-11-18 14:09:22 +04:00
Cian Johnston	4719d2406f	chore(testutil): extract testutil.CreateZip and testutil.CreateTar helpers (#15540 ) Extracts `testutil.CreateTar` and `testutil.CreateZip` test helpers.	2024-11-18 09:17:04 +00:00
Cian Johnston	30e6fbd35c	fix(coderd): ensure correct RBAC when enqueueing notifications (#15478 ) - Assert rbac in fake notifications enqueuer - Move fake notifications enqueuer to separate notificationstest package - Update dbauthz rbac policy to allow provisionerd and autostart to create and read notification messages - Update tests as required	2024-11-12 12:40:46 +00:00
Steven Masley	343f8ec9ab	chore: join owner, template, and org in new workspace view (#15116 ) Joins in fields like `username`, `avatar_url`, `organization_name`, `template_name` to `workspaces` via a view. The view must be maintained moving forward, but this prevents needing to add RBAC permissions to fetch related workspace fields.	2024-10-22 09:20:54 -05:00
Marcin Tojek	6de59371ea	feat: notifications: report failed workspace builds (#14571 )	2024-09-18 09:11:44 +02:00
Dean Sheather	cf8be4eac5	feat: add resume support to coordinator connections (#14234 )	2024-08-20 17:16:49 +10:00
Cian Johnston	49a2880abc	fix(testutil): ensure GetRandomName never returns strings greater tha… (#14153 )	2024-08-05 15:03:07 +01:00
Cian Johnston	37a859f071	chore(testutil): add testutil.GetRandomName that does not return duplicates (#14020 ) Fixes #13910 Adds testutil.GetRandomName, which replaces namesgenerator.GetRandomName but appends a monotonically increasing integer instead of a number between 1 and 10.	2024-07-26 09:44:34 +01:00
Bruno Quaresma	0d9615b4fd	feat(coderd): notify when workspace is marked as dormant (#13868 )	2024-07-24 13:38:21 -03:00
Marcin Tojek	91cbe679c0	chore: move `notiffake` to `testutil` (#13933 )	2024-07-18 13:36:02 +00:00
Danny Kopping	4671ebb330	feat: measure pubsub latencies and expose metrics (#13126 )	2024-05-10 12:31:49 +00:00
Cian Johnston	eba8cd7c07	chore: consolidate various randomPort() implementations (#12362 ) Consolidates our existing randomPort() implementations to package testutil	2024-02-29 12:51:44 +00:00
Colin Adler	c7f52b73bb	feat(coderd): add prometheus metrics to servertailnet (#11988 )	2024-02-05 23:57:18 -06:00
Spike Curtis	3d85cdfa11	feat: set peers lost when disconnected from coordinator (#11681 ) Adds support to Coordination to call SetAllPeersLost() when it is closed. This ensures that when we disconnect from a Coordinator, we set all peers lost. This covers CoderSDK (CLI client) and Agent. Next PR will cover MultiAgent (notably, `wsproxy`).	2024-01-22 15:26:20 +04:00
Steven Masley	50b78e3325	chore: instrument external oauth2 requests (#11519 ) * chore: instrument external oauth2 requests External requests made by oauth2 configs are now instrumented into prometheus metrics.	2024-01-10 09:13:30 -06:00
Cian Johnston	1ef96022b0	feat(coderd): add provisioner build version and api_version on serve (#11369 ) * assert provisioner daemon version and api_version in unit tests * add build info in HTTP header, extract codersdk.BuildVersionHeader * add api_version to codersdk.ProvisionerDaemon * testutil.MustString -> testutil.MustRandString	2024-01-03 09:01:57 +00:00
Mathias Fredriksson	198b56c137	fix(coderd): fix memory leak in `watchWorkspaceAgentMetadata` (#10685 ) Fixes #10550	2023-11-16 17:03:53 +02:00
Spike Curtis	f400d8a0c5	fix: handle SIGHUP from OpenSSH (#10638 ) Fixes an issue where remote forwards are not correctly torn down when using OpenSSH with `coder ssh --stdio`. OpenSSH sends a disconnect signal, but then also sends SIGHUP to `coder`. Previously, we just exited when we got SIGHUP, and this raced against properly disconnecting. Fixes https://github.com/coder/customers/issues/327	2023-11-13 15:14:42 +04:00
Spike Curtis	70e481e7a5	fix: use terminal emulator that keeps state in ReconnectingPTY tests (#9765 ) * Add more pty diagnostics for terminal parsing Signed-off-by: Spike Curtis <spike@coder.com> * print escaped strings Signed-off-by: Spike Curtis <spike@coder.com> * Only log on failure - heisenbug? Signed-off-by: Spike Curtis <spike@coder.com> * use the terminal across matches to keep cursor & contents state Signed-off-by: Spike Curtis <spike@coder.com> * Only log bytes if we're not expecting EOF Signed-off-by: Spike Curtis <spike@coder.com> --------- Signed-off-by: Spike Curtis <spike@coder.com>	2023-09-19 17:57:30 +00:00
Kyle Carberry	22e781eced	chore: add /v2 to import module path (#9072 ) * chore: add /v2 to import module path go mod requires semantic versioning with versions greater than 1.x This was a mechanical update by running: ``` go install github.com/marwan-at-work/mod/cmd/mod@latest mod upgrade ``` Migrate generated files to import /v2 * Fix gen	2023-08-18 18:55:43 +00:00
Asher	02ee724d9f	fix: do terminal emulation in reconnecting pty tests (#9114 ) It looks like it is possible for screen to use control sequences instead of literal newlines which fails the tests. This reuses the existing readUntil function used in other pty tests.	2023-08-16 13:02:03 -08:00
Mathias Fredriksson	58265881af	test(testutil): increase wait times to reduce flakes (#8576 )	2023-07-18 17:25:54 +03:00
Cian Johnston	7fcf319e01	fix(cli)!: protect client Logger and refactor cli scaletest tests (#8317 ) - (breaking) Protects Logger and LogBodies fields of codersdk.Client with its mutex. This addresses a data race in cli/scaletest. - Fillets the existing cli/createworkspaces unit test and moves the testing logic there into the tests under scaletest/createworkspaces. - Adds testutil.RaceEnabled bool const and conditionally skips previously-skipped tests under scaletest/ if the race detector is enabled. This is unfortunate and sad, but I would prefer to have these tests at least running without the race detector than not running at all. - Adds IgnoreErrors option to fake in-memory agent loggers; having the agents fail the test immediately when they encounter any sort of error isn't really helpful.	2023-07-06 09:43:39 +01:00
Ammar Bandukwala	465fe8658d	chore: skip timing-sensitive AgentMetadata test in the standard suite (#7237 ) * chore: skip timing-sensitive AgentMetadata test in the standard suite * Add test-timing target * fix windows? * Works on my Windows desktop? * Use tag system * fixup! Use tag system	2023-05-02 10:41:41 +00:00
Ammar Bandukwala	80bf042528	chore(coderd): remove timing check (#7144 )	2023-04-17 17:40:02 +00:00
Ammar Bandukwala	f36a4a0b07	chore: fix race check for AgentMetadata test (#7141 )	2023-04-14 20:02:44 +03:00
Ammar Bandukwala	24d8644c0b	chore: de-flake TestWorkspaceAgent_Metadata (round 2) (#7039 ) This time, we keep the timing / "racey" tests, but avoid running them in the harsher CI conditions.	2023-04-06 21:10:13 +00:00
Ammar Bandukwala	2bd6d2908e	feat: convert entire CLI to clibase (#6491 ) I'm sorry.	2023-03-23 17:42:20 -05:00
Kyle Carberry	df31636e72	feat: pass `access_token` to `coder_git_auth` resource (#6713 ) This allows template authors to leverage git auth to perform custom actions, like clone repositories.	2023-03-22 19:37:08 +00:00
Ammar Bandukwala	3b73321a6c	feat: refactor deployment config (#6347 )	2023-03-07 15:10:01 -06:00
Kyle Carberry	026b1cd2a4	chore: update to go 1.20 (#5968 ) Co-authored-by: Colin Adler <colin1adler@gmail.com>	2023-02-02 12:36:27 -06:00
Mathias Fredriksson	db7877012c	test: Fix flaky TestServer/Logging/{Multiple,Stackdriver} (#5727 ) * test: Fix flaky TestServer/Logging/Multiple * test: Fix flaky TestServer/Logging/Stackdriver * test: Add testutil.TempFile and testutil.CreateTemp, cleanup tests relying on temp file	2023-01-17 14:14:29 +02:00
Marcin Tojek	84872d970d	fix: loadtest/reconnectingpty tweak timeout (#5300 ) * flaky * fix: load test increase timeout * Remove flaky * Improvement * only Linux * WaitSuperLong * Fix * Try longer * Try: sleep 120	2022-12-06 14:40:38 +01:00
Mathias Fredriksson	90c34b74de	feat: Add connection_timeout and troubleshooting_url to agent (#4937 ) * feat: Add connection_timeout and troubleshooting_url to agent This commit adds the connection timeout and troubleshooting url fields to coder agents. If an initial connection cannot be established within connection timeout seconds, then the agent status will be marked as `"timeout"`. The troubleshooting URL will be present, if configured in the Terraform template, it can be presented to the user when the agent state is either `"timeout"` or `"disconnected"`. Fixes #4678	2022-11-09 17:27:05 +02:00
Kyle Carberry	2ba4a62a0d	feat: Add high availability for multiple replicas (#4555 ) * feat: HA tailnet coordinator * fixup! feat: HA tailnet coordinator * fixup! feat: HA tailnet coordinator * remove printlns * close all connections on coordinator * implement high availability feature * fixup! implement high availability feature * fixup! implement high availability feature * fixup! implement high availability feature * fixup! implement high availability feature * Add replicas * Add DERP meshing to arbitrary addresses * Move packages to highavailability folder * Move coordinator to high availability package * Add flags for HA * Rename to replicasync * Denest packages for replicas * Add test for multiple replicas * Fix coordination test * Add HA to the helm chart * Rename function pointer * Add warnings for HA * Add the ability to block endpoints * Add flag to disable P2P connections * Wow, I made the tests pass * Add replicas endpoint * Ensure close kills replica * Update sql * Add database latency to high availability * Pipe TLS to DERP mesh * Fix DERP mesh with TLS * Add tests for TLS * Fix replica sync TLS * Fix RootCA for replica meshing * Remove ID from replicasync * Fix getting certificates for meshing * Remove excessive locking * Fix linting * Store mesh key in the database * Fix replica key for tests * Fix types gen * Fix unlocking unlocked * Fix race in tests * Update enterprise/derpmesh/derpmesh.go Co-authored-by: Colin Adler <colin1adler@gmail.com> * Rename to syncReplicas * Reuse http client * Delete old replicas on a CRON * Fix race condition in connection tests * Fix linting * Fix nil type * Move pubsub to in-memory for twenty test * Add comment for configuration tweaking * Fix leak with transport * Fix close leak in derpmesh * Fix race when creating server * Remove handler update * Skip test on Windows * Fix DERP mesh test * Wrap HTTP handler replacement in mutex * Fix error message for relay * Fix API handler for normal tests * Fix speedtest * 
Fix replica resend * Fix derpmesh send * Ping async * Increase wait time of template version jobd * Fix race when closing replica sync * Add name to client * Log the derpmap being used * Don't connect if DERP is empty * Improve agent coordinator logging * Fix lock in coordinator * Fix relay addr * Fix race when updating durations * Fix client publish race * Run pubsub loop in a queue * Store agent nodes in order * Fix coordinator locking * Check for closed pipe Co-authored-by: Colin Adler <colin1adler@gmail.com>	2022-10-17 13:43:30 +00:00

55 Commits