Commit Graph

202 Commits

Author SHA1 Message Date
Jon Ayers 757634c720 fix: filter sub-agents from build duration metric (#22732) (#22919) 2026-03-10 14:11:01 -05:00
Cian Johnston 0d21365825 chore: fix failing agent tests with non-default shell (#21671)
* Updates agent tests to write `exit 0` to stdin before closing.
* Updates agent stats tests to detect required stats split out over multiple reports
2026-01-26 09:42:24 +00:00
Spike Curtis bddb808b25 chore: arrange imports in a standard way (#21452)
Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example:

```
import (
	"context"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"golang.org/x/xerrors"
	"gopkg.in/natefinch/lumberjack.v2"

	"cdr.dev/slog/v3"
	"github.com/coder/coder/v2/codersdk/agentsdk"
	"github.com/coder/serpent"
)
```

3 groups: standard library, 3rd partly libs, Coder libs.

This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.
2026-01-08 15:24:11 +04:00
Spike Curtis 49b34a716a fix: fix slog to always use array of Fields (#21426)
Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptible call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder).

It also updates dependencies that also use slog and were updated.

I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule.

Other dependencies, I pushed new tags.
2026-01-08 10:29:41 +04:00
Zach 174a6192fa refactor: consolidate darwin unix socket test helpers (#21283) 2025-12-16 09:11:54 -07:00
Mathias Fredriksson 6bea82bafc fix(agent/agentssh)!: use configured directory for SFTP connections (#21194)
BREAKING CHANGE: SFTP/SCP now respects the agent's configured directory.

If your workspace agent has a custom `dir` configured in Terraform, SFTP
and SCP connections will now land there instead of `$HOME`. Previously,
only SSH and rsync respected this setting, which caused confusing behavior
where `scp file.txt coder:.` and `rsync file.txt coder:.` would put files
in different places. If you have scripts that relied on SFTP/SCP always
using `$HOME` regardless of agent configuration, you may need to use
explicit paths instead.
2025-12-16 16:35:51 +02:00
Ethan 04d5ff88e4 test: bump TestAgent_SessionTTYShell timeout (#21155)
## Problem

The `TestAgent_SessionTTYShell` test was flaking on macOS CI runners
with:

```
match deadline exceeded: context deadline exceeded (wanted 1 bytes; got 0: "")
```

The test uses `WaitShort` (10s) for the context timeout when waiting for
shell prompt output via `Peek(ctx, 1)`. On slow macOS CI runners, the
shell startup can exceed this timeout due to resource contention.

This is evidenced in the failure logs, the SSH session was not reported
by the agent until the 10s timeout is nearly up - it took a while to
connect.

## Solution

Increase the timeout from `WaitShort` (10s) to `WaitMedium` (30s). This
matches the timeout used by `ExpectMatch` internally and gives the shell
more time to initialize on slow CI machines.

---

This PR was entirely generated by [mux](https://github.com/coder/mux)
but reviewed by a human.

Closes https://github.com/coder/internal/issues/1177
2025-12-09 00:48:47 +11:00
Ethan 33b42fca7a test: fix flake in TestAgent_Metrics_SSH (#20450)
Second flake for this test today 😮‍💨.

Flake seen here, though I couldn't replicate this locally, some CI
exclusive networking issue.

https://github.com/coder/coder/actions/runs/18770305895/job/53553517887?pr=20448
```
    agent_test.go:3619: 
        	Error Trace:	/home/runner/work/coder/coder/agent/agent_test.go:3619
        	Error:      	Received unexpected error:
        	            	expected 1, got 0.000000:
        	            	    github.com/coder/coder/v2/agent_test.TestAgent_Metrics_SSH.func7
        	            	        /home/runner/work/coder/coder/agent/agent_test.go:3557
        	Test:       	TestAgent_Metrics_SSH
        	Messages:   	check fn for coderd_agentstats_currently_reachable_peers failed
```
This value is incremented by a successful ping to the peer from the
agent, which is dependent on all the networking code, which I think is
definitely out of scope of this test for agent metrics. So, we'll just
assert that the metrics exist with the correct labels (`derp`, `p2p`)
2025-10-24 17:28:57 +11:00
Ethan 86ef3fb497 test: fix flake in TestAgent_Metrics_SSH (#20447)
Closes https://github.com/coder/internal/issues/921

The flake in the linked issue was caused by the startup script taking longer than 1 second in CI. The existing conditional, that the startup script duration was under a second, was incorrect; the correct conditional is that the metric exists with the `success` label set to `true`.
2025-10-24 14:06:25 +11:00
Spike Curtis 5807fe01e4 test: prevent TestAgent_ReconnectingPTY connection reporting check from interfering (#20210)
When we added support for connection tracking in the Workspace agent, we modified the ReconnectingPTY tests to add an initial connection that we immediately hang up and check that connections are logged.

In the case of `screen`-based pty handling, hanging up the initial connection can race with the initial attachment to the `screen` process, and cause that process to exit early. This leaves subsequent connections to the same session ID to fail.

In this PR we just use different pty session IDs so that the initial connections we do to verify logging don't interfere with the rest of the test.

_Arguably_ it's a bug in our Reconnecting PTY code that hanging up immediately can leave the system in a weird state, but we do eventually recover and error out, so I don't think it's worth trying to fix.
2025-10-08 16:23:46 +04:00
Zach 4d1003eace fix: remove initial global HTTP client usage (#20128)
This PR makes the initial steps at removing usage of the global Go HTTP
client, which was seen to have impacts on test flakiness in
https://github.com/coder/internal/issues/1020. The first commit removes
uses from tests, with the exception of one test that is tightly coupled
to the default client. The second commit makes easy/low-risk removals
from application code. This should have some impact to reduce test flakiness.
2025-10-02 11:43:13 -06:00
Spike Curtis 1354d84eb4 chore: refactor instance identity to be a SessionTokenProvider (#19566)
Refactors Agent instance identity to be a SessionTokenProvider.

Refactors the CLI to create Agent clients via a centralized function, rather than add-hoc via individual command handlers and their flags.

This allows commands besides `coder agent`, but which still use the agent identity, to support instance identity authentication.

Fixes #19111 by unifying all API requests to go thru the SessionTokenProvider for auth credentials.
2025-09-03 10:38:42 +04:00
Ethan 51d8a05301 test: disable direct connections for a deterministic reachable peers metric (#19458)
closes https://github.com/coder/internal/issues/921

Not sure what I was thinking when I wrote this test case, but it was
relying on the connection being p2p on every ping, which is technically
and evidently not always the case. Instead we'll require a DERP peer,
and block direct connections.
2025-08-21 11:46:56 +10:00
Danielle Maywood 5e84d257b7 refactor: convert workspacesdk.AgentConn to an interface (#19392)
Fixes https://github.com/coder/internal/issues/907

We convert `workspacesdk.AgentConn` to an interface and generate a mock
for it. This allows writing `coderd` tests that rely on the agent's HTTP
api to not have to set up an entire tailnet networking stack.
2025-08-20 10:00:44 +01:00
Dean Sheather c6c8b00b07 chore: require nolint for testutil.RunRetry (#19394) 2025-08-19 00:48:10 +10:00
Dean Sheather e2ba9e7d62 chore: retry TestAgent_Dial subtests (#19387)
Closes https://github.com/coder/internal/issues/595
2025-08-18 13:51:19 +00:00
Ethan d7bdb3cdef ci: add paralleltestctx to lint/go (#19369)
Closes https://github.com/coder/internal/issues/884

We're adding this as a `go run` in `lint/go` for now, since adding it to
golangci-lint ourselves involves recompiling golangci-lint and then
running that new binary. I'll look into proposing it being added to the
public golangci-lint linters.

Doesn't appear to cause the lint ci job to take any longer, which is
nice.
2025-08-15 16:16:18 +10:00
Danielle Maywood ddb5b87815 chore(agent/agentcontainers): test current prebuilds integration (#19074)
As it turns out, prebuilds + devcontainers appear to already work
together. This PR has created a test that simulates a prebuild claim
happening to `agentcontainers.API`, to see how we handle it.
2025-07-31 15:31:44 +01:00
Danielle Maywood 0118e75009 fix(agent): disable dev container integration inside sub agents (#18781)
It appears we accidentally broke this logic in a previous PR. This
should now correctly disable the agent api as we'd expect.
2025-07-08 11:05:30 +01:00
Mathias Fredriksson e03d13211c test(agent): fix TestAgent_DevcontainerRecreate (#18618) 2025-06-26 17:50:53 +00:00
Mathias Fredriksson 9fde8353ad test(agent/agentcontainers): add is a test ignore label to integration tests (#18570) 2025-06-25 11:20:14 +00:00
Mathias Fredriksson 99d124e276 feat(agent): enable devcontainers by default (#18533) 2025-06-24 21:17:04 +03:00
ケイラ fae30a00fd chore: remove unnecessary redeclarations in for loops (#18440) 2025-06-20 13:16:55 -06:00
Danielle Maywood 118bf98145 chore(agent): add workspace owner env var and log dev container app failures (#18433)
Listen to feedback that was missed in
https://github.com/coder/coder/pull/18346

- Adds `CODER_WORKSPACE_OWNER_NAME` into the agent environment.
- Logs warnings for when dev container app creation fails.
2025-06-19 09:37:48 +01:00
Mathias Fredriksson 7fa1ad8923 fix(agent/agentcontainers): reduce need to recreate sub agents (#18402) 2025-06-17 18:53:41 +03:00
Mathias Fredriksson ae0c8701bb feat(agent): disable devcontainers for sub agents (#18303)
Updates coder/internal#621
Refs #18245
2025-06-10 10:47:02 +00:00
Mathias Fredriksson fca99174ad feat(agent/agentcontainers): implement sub agent injection (#18245)
This change adds support for sub agent creation and injection into dev
containers.

Updates coder/internal#621
2025-06-10 12:37:54 +03:00
Mathias Fredriksson a18eb9d08f feat(site): allow recreating devcontainers and showing dirty status (#18049)
This change allows showing the devcontainer dirty status in the UI as
well as a recreate button to update the devcontainer.

Closes #16424
2025-05-27 19:42:24 +03:00
Mathias Fredriksson 0731304905 feat(agent/agentcontainers): recreate devcontainers concurrently (#18042)
This change introduces a refactor of the devcontainers recreation logic
which is now handled asynchronously rather than being request scoped.
The response was consequently changed from "No Content" to "Accepted" to
reflect this.

A new `Status` field was introduced to the devcontainer struct which
replaces `Running` (bool). This reflects that the devcontainer can now
be in various states (starting, running, stopped or errored).

The status field also protects against multiple concurrent recrations,
as long as they are initiated via the API.

Updates #16424
2025-05-26 18:30:52 +03:00
Spike Curtis 90e93a2399 chore: fix agent tests on Windows 11 (#17631)
Fixes a couple agent tests so that they work correctly on Windows.

`HOME` is not a standard Windows environment variable, and we don't have any specific Code in Coder to set it on SSH, so I've removed the test case. Amazingly/bizarrely the Windows test runners set this variable, but this is not standard Windows behavior so we shouldn't be including it in our tests.

Also the command `true` is not valid on a default Windows install.

```
true: The term 'true' is not recognized as a name of a cmdlet, function, script file, or executable program.
Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
```

I'm not really sure how the CI runners are allowing this test to pass, but again, it's not standard so we shouldn't be doing it.
2025-05-16 07:50:29 +04:00
Mathias Fredriksson 3de0003e4b feat(agent): send devcontainer CLI logs during recreate (#17845)
We need a way to surface what's happening to the user, since autostart
logs here, it's natural we do so during re-create as well.

Updates #16424
2025-05-15 16:06:56 +03:00
Mathias Fredriksson 7af188bfc1 fix(agent): fix unexpanded devcontainer paths for agentcontainers (#17736)
Devcontainers were duplicated in the API because paths weren't
absolute, we now normalize them early on to keep it simple.

Updates #16424
2025-05-12 14:03:40 +03:00
ケイラ f670bc31f5 chore: update testutil chan helpers (#17408) 2025-04-16 10:37:09 -06:00
Spike Curtis 73f5af82ad test: fix TestAgent_Lifecycle/ShutdownScriptOnce to wait for stats (#17387)
fixes: https://github.com/coder/internal/issues/576

TestAgent_Lifecycle/ShutdownScriptOnce hits error logs which cause test
failures. These logs are legit errors and have to do with shutting down
the agent before it has fully come up.

This PR changes the test to wait for the agent to send stats (a good
indicator that it's fully up, and beyond the errors that have triggered
test failures in past) before closing it.
2025-04-14 12:20:50 +00:00
Spike Curtis c1816e3674 fix(agent): fix deadlock if closed while starting listeners (#17329)
fixes #17328

Fixes a deadlock if we close the Agent in the middle of starting listeners on the tailnet.
2025-04-10 12:46:19 +04:00
Spike Curtis b1f59aafc1 fix: stop checking gauges unrelated to TestAgent_Stats_Magic (#17290)
Fixes https://github.com/coder/internal/issues/564

The test is asserting too much, including stats guages that are not directly related to the thing we are trying to test: ConnectionCount, RxBytes, and TxBytes.  I think the author assumed that these are counts that only go up, but they are guages and eventually zero back out, so there are race condtions where not all of them are non-zero at the same time.
2025-04-08 17:01:21 +04:00
Mathias Fredriksson 7d4b3c8634 feat(agent): add devcontainer autostart support (#17076)
This change adds support for devcontainer autostart in workspaces. The
preconditions for utilizing this feature are:

1. The `coder_devcontainer` resource must be defined in Terraform
2. By the time the startup scripts have completed,
	- The `@devcontainers/cli` tool must be installed
	- The given workspace folder must contain a devcontainer configuration

Example Terraform:

```tf
resource "coder_devcontainer" "coder" {
  agent_id         = coder_agent.main.id
  workspace_folder = "/home/coder/coder"
  config_path      = ".devcontainer/devcontainer.json" # (optional)
}
```

Closes #16423
2025-03-27 12:31:30 +02:00
Mathias Fredriksson 3005cb4594 feat(agent): set additional login vars, LOGNAME and SHELL (#16874)
This change stes additional env vars. This is useful for programs that
assume their presence (for instance, Zed remote relies on SHELL).

See `man login`.
2025-03-11 10:18:57 +00:00
Mathias Fredriksson dfcd93b26e feat: enable agent connection reports by default, remove flag (#16778)
This change enables agent connection reports by default and removes the
experimental flag for enabling them.

Updates #15139
2025-03-03 18:37:28 +02:00
Eng Zer Jun 04c33968cf refactor: replace golang.org/x/exp/slices with slices (#16772)
The experimental functions in `golang.org/x/exp/slices` are now
available in the standard library since Go 1.21.

Reference: https://go.dev/doc/go1.21#slices

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2025-03-04 00:46:49 +11:00
Thomas Kosiewski d0e2060692 feat(agent): add second SSH listener on port 22 (#16627)
Fixes: https://github.com/coder/internal/issues/377

Added an additional SSH listener on port 22, so the agent now listens on both, port one and port 22.

---
Change-Id: Ifd986b260f8ac317e37d65111cd4e0bd1dc38af8
Signed-off-by: Thomas Kosiewski <tk@coder.com>
2025-03-03 04:47:42 +01:00
Cian Johnston ec44f06f5c feat(cli): allow SSH command to connect to running container (#16726)
Fixes https://github.com/coder/coder/issues/16709 and
https://github.com/coder/coder/issues/16420

Adds the capability to`coder ssh` into a running container if `CODER_AGENT_DEVCONTAINERS_ENABLE=true`.

Notes:
* SFTP is currently not supported
* Haven't tested X11 container forwarding
* Haven't tested agent forwarding
2025-02-28 09:38:45 +00:00
Mathias Fredriksson 4ba5a8a2ba feat(agent): add connection reporting for SSH and reconnecting PTY (#16652)
Updates #15139
2025-02-27 10:45:45 +00:00
Cian Johnston 172e52317c feat(agent): wire up agentssh server to allow exec into container (#16638)
Builds on top of https://github.com/coder/coder/pull/16623/ and wires up
the ReconnectingPTY server. This does nothing to wire up the web
terminal yet but the added test demonstrates the functionality working.

Other changes:
* Refactors and moves the `SystemEnvInfo` interface to the
`agent/usershell` package to address follow-up from
https://github.com/coder/coder/pull/16623#discussion_r1967580249
* Marks `usershellinfo.Get` as deprecated. Consumers should use the
`EnvInfoer` interface instead.

---------

Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
Co-authored-by: Danny Kopping <danny@coder.com>
2025-02-26 09:03:27 +00:00
Mathias Fredriksson 9f5ad23644 refactor(agent/agentssh): move parsing of magic session and create type (#16630)
This change refactors the parsing of MagicSessionEnvs in the agentssh
package and moves the logic to an earlier stage. Also intoduces enums
for MagicSessionType.

Refs #15139
2025-02-19 22:18:31 +02:00
Mathias Fredriksson 9520da338e fix: conform to stricter printf usage in Go 1.24 (#16330) 2025-01-29 18:06:22 +02:00
Mathias Fredriksson c069563af1 test: fix use of t.Logf where t.Log would suffice (#16328) 2025-01-29 14:35:04 +00:00
Cian Johnston 7b88776403 chore(testutil): add testutil.GoleakOptions (#16070)
- Adds `testutil.GoleakOptions` and consolidates existing options to
this location
- Pre-emptively adds required ignore for this Dependabot PR to pass CI
https://github.com/coder/coder/pull/16066
2025-01-08 15:38:37 +00:00
Jon Ayers 1f238fed59 feat: integrate new agentexec pkg (#15609)
- Integrates the `agentexec` pkg into the agent and removes the
legacy system of iterating over the process tree. It adds some linting
rules to hopefully catch future improper uses of `exec.Command` in the package.
2024-11-27 20:12:15 +02:00
Spike Curtis 5861e516b9 chore: add standard test logger ignoring db canceled (#15556)
Refactors our use of `slogtest` to instantiate a "standard logger" across most of our tests.  This standard logger incorporates https://github.com/coder/slog/pull/217 to also ignore database query canceled errors by default, which are a source of low-severity flakes.

Any test that has set non-default `slogtest.Options` is left alone. In particular, `coderdtest` defaults to ignoring all errors. We might consider revisiting that decision now that we have better tools to target the really common flaky Error logs on shutdown.
2024-11-18 14:09:22 +04:00