Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example:
```
import (
"context"
"time"
"github.com/prometheus/client_golang/prometheus"
"golang.org/x/xerrors"
"gopkg.in/natefinch/lumberjack.v2"
"cdr.dev/slog/v3"
"github.com/coder/coder/v2/codersdk/agentsdk"
"github.com/coder/serpent"
)
```
3 groups: standard library, 3rd partly libs, Coder libs.
This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.
Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptible call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder).
It also updates dependencies that also use slog and were updated.
I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule.
Other dependencies, I pushed new tags.
BREAKING CHANGE: SFTP/SCP now respects the agent's configured directory.
If your workspace agent has a custom `dir` configured in Terraform, SFTP
and SCP connections will now land there instead of `$HOME`. Previously,
only SSH and rsync respected this setting, which caused confusing behavior
where `scp file.txt coder:.` and `rsync file.txt coder:.` would put files
in different places. If you have scripts that relied on SFTP/SCP always
using `$HOME` regardless of agent configuration, you may need to use
explicit paths instead.
## Problem
The `TestAgent_SessionTTYShell` test was flaking on macOS CI runners
with:
```
match deadline exceeded: context deadline exceeded (wanted 1 bytes; got 0: "")
```
The test uses `WaitShort` (10s) for the context timeout when waiting for
shell prompt output via `Peek(ctx, 1)`. On slow macOS CI runners, the
shell startup can exceed this timeout due to resource contention.
This is evidenced in the failure logs, the SSH session was not reported
by the agent until the 10s timeout is nearly up - it took a while to
connect.
## Solution
Increase the timeout from `WaitShort` (10s) to `WaitMedium` (30s). This
matches the timeout used by `ExpectMatch` internally and gives the shell
more time to initialize on slow CI machines.
---
This PR was entirely generated by [mux](https://github.com/coder/mux)
but reviewed by a human.
Closes https://github.com/coder/internal/issues/1177
Second flake for this test today 😮💨.
Flake seen here, though I couldn't replicate this locally, some CI
exclusive networking issue.
https://github.com/coder/coder/actions/runs/18770305895/job/53553517887?pr=20448
```
agent_test.go:3619:
Error Trace: /home/runner/work/coder/coder/agent/agent_test.go:3619
Error: Received unexpected error:
expected 1, got 0.000000:
github.com/coder/coder/v2/agent_test.TestAgent_Metrics_SSH.func7
/home/runner/work/coder/coder/agent/agent_test.go:3557
Test: TestAgent_Metrics_SSH
Messages: check fn for coderd_agentstats_currently_reachable_peers failed
```
This value is incremented by a successful ping to the peer from the
agent, which is dependent on all the networking code, which I think is
definitely out of scope of this test for agent metrics. So, we'll just
assert that the metrics exist with the correct labels (`derp`, `p2p`)
Closes https://github.com/coder/internal/issues/921
The flake in the linked issue was caused by the startup script taking longer than 1 second in CI. The existing conditional, that the startup script duration was under a second, was incorrect; the correct conditional is that the metric exists with the `success` label set to `true`.
When we added support for connection tracking in the Workspace agent, we modified the ReconnectingPTY tests to add an initial connection that we immediately hang up and check that connections are logged.
In the case of `screen`-based pty handling, hanging up the initial connection can race with the initial attachment to the `screen` process, and cause that process to exit early. This leaves subsequent connections to the same session ID to fail.
In this PR we just use different pty session IDs so that the initial connections we do to verify logging don't interfere with the rest of the test.
_Arguably_ it's a bug in our Reconnecting PTY code that hanging up immediately can leave the system in a weird state, but we do eventually recover and error out, so I don't think it's worth trying to fix.
This PR makes the initial steps at removing usage of the global Go HTTP
client, which was seen to have impacts on test flakiness in
https://github.com/coder/internal/issues/1020. The first commit removes
uses from tests, with the exception of one test that is tightly coupled
to the default client. The second commit makes easy/low-risk removals
from application code. This should have some impact to reduce test flakiness.
Refactors Agent instance identity to be a SessionTokenProvider.
Refactors the CLI to create Agent clients via a centralized function, rather than add-hoc via individual command handlers and their flags.
This allows commands besides `coder agent`, but which still use the agent identity, to support instance identity authentication.
Fixes#19111 by unifying all API requests to go thru the SessionTokenProvider for auth credentials.
closes https://github.com/coder/internal/issues/921
Not sure what I was thinking when I wrote this test case, but it was
relying on the connection being p2p on every ping, which is technically
and evidently not always the case. Instead we'll require a DERP peer,
and block direct connections.
Fixes https://github.com/coder/internal/issues/907
We convert `workspacesdk.AgentConn` to an interface and generate a mock
for it. This allows writing `coderd` tests that rely on the agent's HTTP
api to not have to set up an entire tailnet networking stack.
Closes https://github.com/coder/internal/issues/884
We're adding this as a `go run` in `lint/go` for now, since adding it to
golangci-lint ourselves involves recompiling golangci-lint and then
running that new binary. I'll look into proposing it being added to the
public golangci-lint linters.
Doesn't appear to cause the lint ci job to take any longer, which is
nice.
As it turns out, prebuilds + devcontainers appear to already work
together. This PR has created a test that simulates a prebuild claim
happening to `agentcontainers.API`, to see how we handle it.
Listen to feedback that was missed in
https://github.com/coder/coder/pull/18346
- Adds `CODER_WORKSPACE_OWNER_NAME` into the agent environment.
- Logs warnings for when dev container app creation fails.
This change introduces a refactor of the devcontainers recreation logic
which is now handled asynchronously rather than being request scoped.
The response was consequently changed from "No Content" to "Accepted" to
reflect this.
A new `Status` field was introduced to the devcontainer struct which
replaces `Running` (bool). This reflects that the devcontainer can now
be in various states (starting, running, stopped or errored).
The status field also protects against multiple concurrent recrations,
as long as they are initiated via the API.
Updates #16424
Fixes a couple agent tests so that they work correctly on Windows.
`HOME` is not a standard Windows environment variable, and we don't have any specific Code in Coder to set it on SSH, so I've removed the test case. Amazingly/bizarrely the Windows test runners set this variable, but this is not standard Windows behavior so we shouldn't be including it in our tests.
Also the command `true` is not valid on a default Windows install.
```
true: The term 'true' is not recognized as a name of a cmdlet, function, script file, or executable program.
Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
```
I'm not really sure how the CI runners are allowing this test to pass, but again, it's not standard so we shouldn't be doing it.
fixes: https://github.com/coder/internal/issues/576
TestAgent_Lifecycle/ShutdownScriptOnce hits error logs which cause test
failures. These logs are legit errors and have to do with shutting down
the agent before it has fully come up.
This PR changes the test to wait for the agent to send stats (a good
indicator that it's fully up, and beyond the errors that have triggered
test failures in past) before closing it.
Fixes https://github.com/coder/internal/issues/564
The test is asserting too much, including stats guages that are not directly related to the thing we are trying to test: ConnectionCount, RxBytes, and TxBytes. I think the author assumed that these are counts that only go up, but they are guages and eventually zero back out, so there are race condtions where not all of them are non-zero at the same time.
This change adds support for devcontainer autostart in workspaces. The
preconditions for utilizing this feature are:
1. The `coder_devcontainer` resource must be defined in Terraform
2. By the time the startup scripts have completed,
- The `@devcontainers/cli` tool must be installed
- The given workspace folder must contain a devcontainer configuration
Example Terraform:
```tf
resource "coder_devcontainer" "coder" {
agent_id = coder_agent.main.id
workspace_folder = "/home/coder/coder"
config_path = ".devcontainer/devcontainer.json" # (optional)
}
```
Closes#16423
This change stes additional env vars. This is useful for programs that
assume their presence (for instance, Zed remote relies on SHELL).
See `man login`.
The experimental functions in `golang.org/x/exp/slices` are now
available in the standard library since Go 1.21.
Reference: https://go.dev/doc/go1.21#slices
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
Fixes: https://github.com/coder/internal/issues/377
Added an additional SSH listener on port 22, so the agent now listens on both, port one and port 22.
---
Change-Id: Ifd986b260f8ac317e37d65111cd4e0bd1dc38af8
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Builds on top of https://github.com/coder/coder/pull/16623/ and wires up
the ReconnectingPTY server. This does nothing to wire up the web
terminal yet but the added test demonstrates the functionality working.
Other changes:
* Refactors and moves the `SystemEnvInfo` interface to the
`agent/usershell` package to address follow-up from
https://github.com/coder/coder/pull/16623#discussion_r1967580249
* Marks `usershellinfo.Get` as deprecated. Consumers should use the
`EnvInfoer` interface instead.
---------
Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
Co-authored-by: Danny Kopping <danny@coder.com>
This change refactors the parsing of MagicSessionEnvs in the agentssh
package and moves the logic to an earlier stage. Also intoduces enums
for MagicSessionType.
Refs #15139
- Adds `testutil.GoleakOptions` and consolidates existing options to
this location
- Pre-emptively adds required ignore for this Dependabot PR to pass CI
https://github.com/coder/coder/pull/16066
- Integrates the `agentexec` pkg into the agent and removes the
legacy system of iterating over the process tree. It adds some linting
rules to hopefully catch future improper uses of `exec.Command` in the package.
Refactors our use of `slogtest` to instantiate a "standard logger" across most of our tests. This standard logger incorporates https://github.com/coder/slog/pull/217 to also ignore database query canceled errors by default, which are a source of low-severity flakes.
Any test that has set non-default `slogtest.Options` is left alone. In particular, `coderdtest` defaults to ignoring all errors. We might consider revisiting that decision now that we have better tools to target the really common flaky Error logs on shutdown.