Commit Graph

568 Commits

Author SHA1 Message Date
Mathias Fredriksson 98e2ec4417 feat: show devcontainer dirty status and allow recreate (#17880)
Updates #16424
2025-05-19 12:56:10 +03:00
Danielle Maywood 83df55700b revert(agent): remove CODER_AGENT_IS_SUB_AGENT cli flag (#17875)
The RFC has changed, this information will be passed through the
manifest instead.
2025-05-16 11:04:21 +00:00
Spike Curtis 90e93a2399 chore: fix agent tests on Windows 11 (#17631)
Fixes a couple agent tests so that they work correctly on Windows.

`HOME` is not a standard Windows environment variable, and we don't have any specific Code in Coder to set it on SSH, so I've removed the test case. Amazingly/bizarrely the Windows test runners set this variable, but this is not standard Windows behavior so we shouldn't be including it in our tests.

Also the command `true` is not valid on a default Windows install.

```
true: The term 'true' is not recognized as a name of a cmdlet, function, script file, or executable program.
Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
```

I'm not really sure how the CI runners are allowing this test to pass, but again, it's not standard so we shouldn't be doing it.
2025-05-16 07:50:29 +04:00
Mathias Fredriksson 3de0003e4b feat(agent): send devcontainer CLI logs during recreate (#17845)
We need a way to surface what's happening to the user, since autostart
logs here, it's natural we do so during re-create as well.

Updates #16424
2025-05-15 16:06:56 +03:00
Mathias Fredriksson 522c178271 fix(agent/agentcontainers): always use /bin/sh for devcontainer autostart (#17847)
This fixes startup issues when the user shell is set to Fish.

Refs: #17845
2025-05-15 12:49:52 +03:00
Mathias Fredriksson eb6412a69b refactor(agent/agentcontainers): update routes and locking in container api (#17768)
This refactor updates the devcontainer routes and in-api locking for
better clarity.

Updates #16424
2025-05-15 11:29:26 +03:00
Sas Swart 425ee6fa55 feat: reinitialize agents when a prebuilt workspace is claimed (#17475)
This pull request allows coder workspace agents to be reinitialized when
a prebuilt workspace is claimed by a user. This facilitates the transfer
of ownership between the anonymous prebuilds system user and the new
owner of the workspace.

Only a single agent per prebuilt workspace is supported for now, but
plumbing has already been done to facilitate the seamless transition to
multi-agent support.

---------

Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>
2025-05-14 14:15:36 +02:00
Steven Masley 64807e1d61 chore: apply the 4mb max limit on drpc protocol message size (#17771)
Respect the 4mb max limit on proto messages
2025-05-13 11:24:51 -05:00
Danielle Maywood 7f056da088 feat: add hidden CODER_AGENT_IS_SUB_AGENT flag to coder agent (#17783)
Closes https://github.com/coder/internal/issues/620

Adds a new, hidden, flag `CODER_AGENT_IS_SUB_AGENT` to the `coder agent`
command.
2025-05-13 10:57:50 +01:00
Steven Masley 37832413ba chore: resolve internal drpc package conflict (#17770)
Our internal drpc package name conflicts with the external one in usage. 
`drpc.*` == external
`drpcsdk.*` == internal
2025-05-12 10:31:38 -05:00
Mathias Fredriksson 7af188bfc1 fix(agent): fix unexpanded devcontainer paths for agentcontainers (#17736)
Devcontainers were duplicated in the API because paths weren't
absolute, we now normalize them early on to keep it simple.

Updates #16424
2025-05-12 14:03:40 +03:00
Mathias Fredriksson ebad5c3ed0 test(agent): fix channel timeout in TestNewServer_CloseActiveConnections (#17690)
This fixes a test issue where we were waiting on a channel indefinitely
and the test timed out instead of failing due to earlier error.

Updates coder/internal#558
2025-05-06 11:20:28 +00:00
Spike Curtis ef00ae54f4 fix: fix data race in agentscripts.Runner (#17630)
Fixes https://github.com/coder/internal/issues/604

Fixes a data race in `agentscripts.Runner` where a concurrent `Execute()` call races with `Init()`. We hit this race during shut down, which is not synchronized against starting up.

In this PR I've chosen to add synchronization to the `Runner` rather than try to synchronize the calls in the agent. When we close down the agent, it's OK to just throw an error if we were never initialized with a startup script---we don't want to wait for it since that requires an active connection to the control plane.
2025-05-01 14:25:02 +04:00
Mathias Fredriksson 1fc74f629e refactor(agent): update agentcontainers api initialization (#17600)
There were too many ways to configure the agentcontainers API resulting
in inconsistent behavior or features not being enabled. This refactor
introduces a control flag for enabling or disabling the containers API.
When disabled, all implementations are no-op and explicit endpoint
behaviors are defined. When enabled, concrete implementations are used
by default but can be overridden by passing options.
2025-04-29 17:53:10 +03:00
Mathias Fredriksson 268a50c193 feat(agent/agentcontainers): add file watcher and dirty status (#17573)
Fixes coder/internal#479
Fixes coder/internal#480
2025-04-29 11:53:58 +03:00
Dean Sheather b15d060410 fix(agent): return listed drives on failure on windows (#17505)
The behavior of the partitions listing function from gopsutil is that it
will return all partitions that didn't fail to be read, but will return
something similar to a multierror.

Errors are now ignored unless there are no drives returned.
2025-04-23 03:31:43 +10:00
Cian Johnston 6a79965948 fix(agent/agentcontainers): handle race between docker ps and docker inspect (#17447)
Fixes https://github.com/coder/internal/issues/586#event-17291038671
2025-04-17 13:50:51 +01:00
ケイラ f670bc31f5 chore: update testutil chan helpers (#17408) 2025-04-16 10:37:09 -06:00
Aaron Lehmann 8cc743a812 chore: clarify error variable name in doAttach (#17284) 2025-04-16 14:44:33 +05:00
Mathias Fredriksson 00b5f56734 feat(agent/agentcontainers): add devcontainers list endpoint (#17389)
This change allows listing both predefined and runtime-detected
devcontainers, as well as showing whether or not the devcontainer is
running and which container represents it.

Fixes coder/internal#478
2025-04-15 17:53:37 +03:00
Spike Curtis 73f5af82ad test: fix TestAgent_Lifecycle/ShutdownScriptOnce to wait for stats (#17387)
fixes: https://github.com/coder/internal/issues/576

TestAgent_Lifecycle/ShutdownScriptOnce hits error logs which cause test
failures. These logs are legit errors and have to do with shutting down
the agent before it has fully come up.

This PR changes the test to wait for the agent to send stats (a good
indicator that it's fully up, and beyond the errors that have triggered
test failures in past) before closing it.
2025-04-14 12:20:50 +00:00
Mathias Fredriksson 60fbe675ed refactor(agent/agentcontainers): implement API service (#17340)
This refactor improves separation of API and containers with minimal
changes to logic.

Highlights:

- Routes are now defined in `agentcontainers` package
- Handler renamed to API
- API lazy init was moved into NewAPI
- Tests that don't need to be internal were made external
2025-04-11 11:41:13 +03:00
Mathias Fredriksson 25fb34cabe feat(agent): implement recreate for devcontainers (#17308)
This change implements an interface for running `@devcontainers/cli up`
and an API endpoint on the agent for triggering recreate for a running
devcontainer.

A couple of limitations:

1. Synchronous HTTP request, meaning the browser might choose to time it
out before it's done => no result/error (and devcontainer cli command
probably gets killed via ctx cancel).
2. Logs are only written to agent logs via slog, not as a "script" in
the UI.

Both 1 and 2 will be improved in future refactors.

Fixes coder/internal#481
Fixes coder/internal#482
2025-04-10 16:16:16 +03:00
Mathias Fredriksson 33b9487899 fix(agent/agentcontainers/dcspec): generate unmarshalers and add tests (#17330)
This change allows proper unmarshaling of `devcontainer.json` and will
no longer break EnvInfoer or the Web Terminal.

Fixes coder/internal#556
2025-04-10 12:10:58 +00:00
Spike Curtis c1816e3674 fix(agent): fix deadlock if closed while starting listeners (#17329)
fixes #17328

Fixes a deadlock if we close the Agent in the middle of starting listeners on the tailnet.
2025-04-10 12:46:19 +04:00
Mathias Fredriksson 98c05b3568 test(agent/agentssh): fix macos signal flake during close (#17313)
Fixes coder/internal#558
2025-04-09 20:28:32 +03:00
Cian Johnston 43b1a034b1 chore(agent/agentscripts): disable TestTimeout on macOS (#17300) 2025-04-09 08:23:43 +00:00
Cian Johnston 44ddc9f654 chore(agent/agentscripts): increase timeout in TestTimeout (#17293)
Fixes https://github.com/coder/internal/issues/329

This was due to a race between the process starting and the timeout of
the agent startup script executor. I'm taking the 'lazy' route here and
increasing the timeout to 100ms. This does technically mean that this
makes the test 100 times longer to execute. However, if it takes more
than 100ms to run a `sleep infinity` command on our test runner, I think
we have other issues.
2025-04-08 14:35:13 +00:00
Spike Curtis b1f59aafc1 fix: stop checking gauges unrelated to TestAgent_Stats_Magic (#17290)
Fixes https://github.com/coder/internal/issues/564

The test is asserting too much, including stats guages that are not directly related to the thing we are trying to test: ConnectionCount, RxBytes, and TxBytes.  I think the author assumed that these are counts that only go up, but they are guages and eventually zero back out, so there are race condtions where not all of them are non-zero at the same time.
2025-04-08 17:01:21 +04:00
Aaron Lehmann aa0a63a295 fix(agent): log correct error variable in createTailnet (#17283) 2025-04-07 16:32:52 +00:00
Mathias Fredriksson 074ec2887d test(agent/agentssh): fix test race and improve Windows compat (#17271)
Fixes coder/internal#558
2025-04-07 11:32:37 +03:00
Aaron Lehmann e9863aba81 fix: log correct error on drpc connection close error (#17265) 2025-04-04 22:09:42 +03:00
Spike Curtis 42e5d71f59 fix: fix closeMutex unlock bug (#17259)
Fixes https://github.com/coder/internal/issues/550

Classic return before unlocking bug.
2025-04-04 14:29:56 +04:00
Spike Curtis f6bf6c6ec4 fix!: use names not IDs for agent SSH key seed (#17258)
Changes the SSH host key seeding to use the owner username, workspace name, and agent name. This prevents SSH from complaining about a mismatched host key if you use Coder Desktop to connect, and delete and recreate your workspace with the same name. Previously this would generate a different key because the workspace ID changed.

We also include the owner's username in anticipation of using Coder Desktop to access shared workspaces (or as a superuser) down the road, so that workspaces with the same name owned by different users will not have the same key.

This change is **BREAKING** in a limited sense that early access users of Coder Desktop will see their SSH clients complain about host keys changing the first time each workspace is rebuilt with this code. It can be resolved by clearing your `.ssh/known_hosts` file of the Coder workspaces you access this way.
2025-04-04 12:51:46 +04:00
Mathias Fredriksson b61f0ab958 fix(agent): ensure SSH server shutdown with process groups (#17227)
Fix hanging workspace shutdowns caused by orphaned SSH child processes.
Key changes:

- Create process groups for non-PTY SSH sessions
- Send SIGHUP to entire process group for proper termination
- Add 5-second timeout to prevent indefinite blocking

Fixes #17108
2025-04-03 16:01:43 +03:00
Ethan 998724de91 chore: sort agent /list-directory output (#17218)
This sorts the `contents` list alphabetically, but with directories before everything else.
This is purely for UX on the Coder Desktop side, where the user only really cares about directories, and files are just for providing context in the file picker.
2025-04-03 12:31:46 +11:00
Jon Ayers eded0ed4b6 chore: fix false positives in CodeQL (#17138)
Clears up some false positives being surfaced by CodeQL
2025-03-27 16:06:58 -05:00
Mathias Fredriksson 7d4b3c8634 feat(agent): add devcontainer autostart support (#17076)
This change adds support for devcontainer autostart in workspaces. The
preconditions for utilizing this feature are:

1. The `coder_devcontainer` resource must be defined in Terraform
2. By the time the startup scripts have completed,
	- The `@devcontainers/cli` tool must be installed
	- The given workspace folder must contain a devcontainer configuration

Example Terraform:

```tf
resource "coder_devcontainer" "coder" {
  agent_id         = coder_agent.main.id
  workspace_folder = "/home/coder/coder"
  config_path      = ".devcontainer/devcontainer.json" # (optional)
}
```

Closes #16423
2025-03-27 12:31:30 +02:00
Danielle Maywood 1bbbae8d57 chore: migrate to github.com/coder/clistat (#17107)
Migrate from in-tree `clistat` package to
https://github.com/coder/clistat.
2025-03-26 10:36:53 +00:00
Jon Ayers 17ddee05e5 chore: update golang to 1.24.1 (#17035)
- Update go.mod to use Go 1.24.1
- Update GitHub Actions setup-go action to use Go 1.24.1
- Fix linting issues with golangci-lint by:
  - Updating to golangci-lint v1.57.1 (more compatible with Go 1.24.1)

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <claude@anthropic.com>
2025-03-26 01:56:39 -05:00
Mathias Fredriksson 5c8cac9fb7 feat: add name to workspace agent devcontainers (#17089)
In the presence of multiple devcontainers, it would be nice to
differentiate them by name. This change inherits the resource name from
terraform.

Refs #17076
2025-03-25 12:59:20 +00:00
Mathias Fredriksson 6bf22f8dc6 fix(Makefile): fix dcspec gen dependencies and hide error output (#17043) 2025-03-24 11:41:45 +00:00
Danielle Maywood 765e7058e8 chore: use container memory if containerised for oom notifications (#17062)
Currently we query only the underlying host's memory usage for our
memory resource monitor. This PR changes that to check if the workspace
is in a container, and if so it queries the container's memory usage,
falling back to the host's memory usage if not.
2025-03-24 11:14:21 +00:00
Mathias Fredriksson 69ba27e347 feat: allow specifying devcontainer on agent in terraform (#16997)
This change allows specifying devcontainers in terraform and plumbs it
through to the agent via agent manifest.

This will be used for autostarting devcontainers in a workspace.

Depends on coder/terraform-provider-coder#368
Updates #16423
2025-03-20 19:09:39 +02:00
Cian Johnston 68624092a4 feat(agent/reconnectingpty): allow selecting backend type (#17011)
agent/reconnectingpty: allow specifying backend type
cli: exp rpty: automatically select backend based on command
2025-03-20 13:45:31 +00:00
Mathias Fredriksson 3ac844ad3d chore(codersdk): rename WorkspaceAgent(Dev)container structs (#16996)
This is to free up the devcontainer name space for more targeted
structs.

Updates #16423
2025-03-19 10:16:14 +00:00
Cian Johnston 75b27e8f19 fix(agent/agentcontainers): improve testing of convertDockerInspect, return correct host port (#16887)
* Improves separation of concerns between `runDockerInspect` and
`convertDockerInspect`: `runDockerInspect` now just runs the command and
returns the output, while `convertDockerInspect` now does all of the
conversion and parsing logic.
* Improves testing of `convertDockerInspect` using real test fixtures.
* Fixes issue where the container port is returned instead of the host
port.
* Updates UI to link to correct host port. Container port is still
displayed in the button text, but the HostIP:HostPort is shown in a
popover.
* Adds stories for workspace agent UI
2025-03-18 14:37:45 +00:00
Cian Johnston 13a3ddd964 fix(agent/agentcontainers): generate devcontainer metadata from schema (#16881)
Adds new dcspec package containing automatically generated devcontainer schema (using glideapps/quicktype).
2025-03-18 13:00:21 +00:00
Mathias Fredriksson df92df4565 fix(agent): filter out GOTRACEBACK=none (#16924)
With the switch to Go 1.24.1, our dogfood workspaces started setting
`GOTRACEBACK=none` in the environment, resulting in missing stacktraces
for users.

This is due to the capability changes we do when
`USE_CAP_NET_ADMIN=true`.

https://github.com/coder/coder/blob/564b387262e5b768c503e5317242d9ab576395d6/provisionersdk/scripts/bootstrap_linux.sh#L60-L76

This most likely triggers a change in securitybits which sets
`_AT_SECURE` for the process.

https://github.com/golang/go/blob/a1ddbdd3ef8b739aab53f20d6ed0a61c3474cf12/src/runtime/os_linux.go#L297-L327

Which in turn triggers secure mode:

https://github.com/golang/go/blob/a1ddbdd3ef8b739aab53f20d6ed0a61c3474cf12/src/runtime/security_unix.go

This should not affect workspaces as template authors can still set the
environment on the agent resource.

See https://pkg.go.dev/runtime#hdr-Security
2025-03-17 11:10:14 +02:00
Mathias Fredriksson 3005cb4594 feat(agent): set additional login vars, LOGNAME and SHELL (#16874)
This change stes additional env vars. This is useful for programs that
assume their presence (for instance, Zed remote relies on SHELL).

See `man login`.
2025-03-11 10:18:57 +00:00