Commit Graph

29 Commits

Author SHA1 Message Date
Ethan c650aabbef chore: standardize on *_internal_test.go for white-box tests (#25601)
My agent added `//nolint:testpackage` to a test file on one of my PRs.
Again. This PR cleans it up across the entire repo and updates the
in-repo conventions so future agents stop doing it.

The repo already has a precedent for white-box tests that need to touch
unexported symbols: `*_internal_test.go` (145+ existing files). The
`testpackage` linter's default `skip-regexp` exempts that filename
suffix, so the `//nolint:testpackage` directive is unnecessary in every
case where someone reached for it. This PR renames 51 such files to
`*_internal_test.go` via `git mv` so blame and history follow, and
strips the dead directive from 2 files that were already correctly named
(`coderd/oauth2provider/authorize_internal_test.go`,
`coderd/x/chatd/advisor_internal_test.go`).

`.claude/docs/TESTING.md` now documents the rule explicitly under *Test
Package Naming*, which is imported into the root `AGENTS.md` via
`@.claude/docs/TESTING.md`. The rule: prefer `package foo_test`; if you
need internal access, rename the file to `*_internal_test.go` rather
than adding a nolint directive.
2026-05-22 20:24:38 +10:00
Michael Suchacz cd54861e4f fix(agent): set utf8 locale for tmux terminals (#25530)
> Mux is updating this PR on behalf of Mike.

## Summary
- Set a UTF-8 `LC_CTYPE` fallback for reconnecting PTYs when no
effective UTF-8 locale is present.
- Preserve non-empty `LC_ALL` so explicit user locale choices still win.
- Add tmux glyph regression coverage for reconnecting PTYs, plus unit
coverage for the env helper.
- Stabilize the tmux regression by keeping the pane alive until the
glyph output is observed.
- Keep the env helper unit test expectations OS-aware for Windows and
cover unhyphenated UTF8 locales.

## Validation
- `go test ./agent/reconnectingpty -run TestWithTerminalEnv -count=1`
- `go test ./agent -run '^TestAgent_ReconnectingPTY$/Buffered$'
-count=1`
- `go test ./agent -run '^TestAgent_ReconnectingPTY$' -count=1`
- `make lint`
- `git commit` pre-commit hook
- `git push` pre-push hook
2026-05-20 17:12:23 +02:00
Atif Ali 7ffeac711c fix: correct web terminal glyph rendering and tmux display (#25059)
The web terminal was rendering Claude Code and Codex incorrectly because
xterm's custom glyph renderer draws block and quadrant characters with
its own geometry. The reconnecting PTY screen backend also exposed
`screen.xterm-256color` to the user's shell, which made tmux rendering
issues harder to reason about.

This PR:

* Disables xterm custom glyph rendering so the selected terminal font
draws block and quadrant glyphs.
* Adds a tiny Powerline-only terminal symbol fallback font so common
prompt separators still render when custom glyphs are disabled.
* Configures the screen backend to keep the inner shell `TERM` aligned
with the browser terminal emulator, including background color erase
behavior.
* Tightens reconnecting PTY tests around prompt synchronization and
`TERM` assertions.

<!-- linear:table-colwidths:200,200 -->
| Before | After |
| -- | -- |
| <img
src="https://uploads.linear.app/e62091d9-44f5-421c-8e5c-df481fc99003/3c45efce-9d7e-43b4-b24f-88d4d23d294a/ba68155e-949e-4961-b0b2-124757cb07bb?signature=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJwYXRoIjoiL2U2MjA5MWQ5LTQ0ZjUtNDIxYy04ZTVjLWRmNDgxZmM5OTAwMy8zYzQ1ZWZjZS05ZDdlLTQzYjQtYjI0Zi04OGQ0ZDIzZDI5NGEvYmE2ODE1NWUtOTQ5ZS00OTYxLWIwYjItMTI0NzU3Y2IwN2JiIiwiaWF0IjoxNzc4MTgxNjUwLCJleHAiOjE4MDk3NTIyMTB9.45f1ZzBpWOF5OCJV0xHfICdpyRQ1UoGMbJjLYPqeAkg
" alt="Before: Claude Code logo rendering is distorted in the web
terminal outside and inside tmux" width="640" /> | <img
src="https://uploads.linear.app/e62091d9-44f5-421c-8e5c-df481fc99003/26b0a109-5e21-4000-b1b5-ddac87c409d4/46a301c2-a815-419a-92d2-c51cecdefe40?signature=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJwYXRoIjoiL2U2MjA5MWQ5LTQ0ZjUtNDIxYy04ZTVjLWRmNDgxZmM5OTAwMy8yNmIwYTEwOS01ZTIxLTQwMDAtYjFiNS1kZGFjODdjNDA5ZDQvNDZhMzAxYzItYTgxNS00MTlhLTkyZDItYzUxY2VjZGVmZTQwIiwiaWF0IjoxNzc4MTgxNjUwLCJleHAiOjE4MDk3NTIyMTB9.SQVwUbtaf2OrpjRJPkRH3uc0nPqad0bNBVvcRyuR6NQ
" alt="After: Claude Code logo renders correctly in the web terminal
outside and inside tmux" width="640" /> |

## Validation

* `go test ./agent -run '^TestAgent_ReconnectingPTY$' -count=1`
* `pnpm --dir site test -- src/theme/constants.test.ts`
* `pnpm --dir site lint:types`
* `pnpm --dir site check`
* `pnpm --dir site build`
* `git commit` pre-commit hook passed
* `git push` pre-push hook ran and printed the repo CI monitoring hint

> Mux worked on this PR on Mike's behalf.

---------

Co-authored-by: Michael Suchacz <203725896+ibetitsmike@users.noreply.github.com>
2026-05-20 13:57:50 +02:00
Steven Masley 1afc6d4fd0 feat: structured disconnect attribution for agent logs (#25191)
Implements
[PLAT-60](https://linear.app/codercom/issue/PLAT-60/enhance-disconnect-logs-with-structured-reason-attribution):
adds structured disconnect attribution to disconnect logs throughout the
agent and tailnet packages.

Every disconnect log site now carries structured slog fields. All
existing logs remain; existing messages are preserved with the fields
added alongside.

New fields on disconnect log lines:

- `connect_type` — which layer disconnected: `server_to_agent`,
`agent_to_client`, or `client_to_server`
- `disconnect_reason` — categorical reason: `graceful`, `network_error`,
`server_shutdown`, etc.
- `disconnect_expected` — whether the disconnect is normal operation
(`true`) or should be investigated (`false`)
- `disconnect_initiator` — who started it: `client`, `agent`, `server`,
or `network` (control-plane sites only)
- `disconnect_detail` — free-form supplemental info (where useful)

## What's covered

**Control plane (`server_to_agent`):** coordination RPC, DERP map
subscriber, agent runLoop, agent Close, `BasicCoordination.Close`,
`Controller.run`.

**Data plane (`agent_to_client`):** SSH sessions, reconnecting PTY,
JetBrains port-forwarding.

<details>
<summary>Control-plane sites</summary>

| Site | Reason | Initiator |
|---|---|---|
| `agent/agent.go` `runLoop` EOF | `network_error` | `network` |
| `agent/agent.go` `runCoordinator` deferred exit | `server_shutdown` /
`graceful` / `network_error` | `agent` / `server` / `network` |
| `agent/agent.go` `runDERPMapSubscriber` deferred exit | same (shared
`classifyCoordinatorRPCExit`) | same |
| `agent/agent.go` `Close` shutdown timeout | `server_shutdown` + detail
| `agent` |
| `agent/agent.go` `Close` clean coord disconnect | `server_shutdown` |
`agent` |
| `tailnet/controllers.go` `BasicCoordination.Close` | `graceful` or
`network_error` | `c.initiator` |
| `tailnet/controllers.go` `Controller.run` `net.ErrClosed` |
`network_error` | `network` |

</details>

<details>
<summary>Data-plane sites</summary>

| Site | Reason | Notes |
|---|---|---|
| `agent/agentssh/agentssh.go` SSH session closed | free-form
(`graceful`, `process exited with error status: N`, etc.) | Also sets
`closeCause("normal exit")` for clean exits so coderd's
`connection_log.DisconnectReason` is no longer empty |
| `agent/reconnectingpty/server.go` PTY closed | `server_shutdown`,
error string, or `graceful` | |
| `agent/agentssh/jetbrainstrack.go` channel closed | `normal close` or
error string | Previously passed empty reason |

</details>

<details>
<summary>Bug fix</summary>

The deferred `disconnected from coordination RPC` log no longer fires
when the initial `Coordinate()` RPC call fails before any connection is
established.

</details>

Refs PLAT-60.

---

_This PR was prepared by Coder Agents on behalf of @Emyrk._
**Manually QA'd a lot of common disconnects**

---------

Co-authored-by: Coder Agents <noreply@coder.com>
2026-05-19 09:47:03 -05:00
Spike Curtis bddb808b25 chore: arrange imports in a standard way (#21452)
Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example:

```
import (
	"context"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"golang.org/x/xerrors"
	"gopkg.in/natefinch/lumberjack.v2"

	"cdr.dev/slog/v3"
	"github.com/coder/coder/v2/codersdk/agentsdk"
	"github.com/coder/serpent"
)
```

3 groups: standard library, 3rd partly libs, Coder libs.

This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.
2026-01-08 15:24:11 +04:00
Spike Curtis 49b34a716a fix: fix slog to always use array of Fields (#21426)
Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptible call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder).

It also updates dependencies that also use slog and were updated.

I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule.

Other dependencies, I pushed new tags.
2026-01-08 10:29:41 +04:00
Spike Curtis 40df21ed62 fix: fixes use of possibly nil RemoteAddr() and LocalAddr() return values (#21076)
fixes: https://github.com/coder/internal/issues/1143

Both gVisor and the Go standard library implementations of `net.Conn` can under certain circumstances return `nil` for `RemoteAddr()` and `LocalAddr()` calls. If we call their methods, we segfault.

This PR fixes these calls and adds ruleguard rules.

Note that `slog.F("remote_addr", conn.RemoteAddr())` is fine because slog detects the `nil` before attempting to stringify the type.
2025-12-03 15:06:00 +04:00
Spike Curtis 5807fe01e4 test: prevent TestAgent_ReconnectingPTY connection reporting check from interfering (#20210)
When we added support for connection tracking in the Workspace agent, we modified the ReconnectingPTY tests to add an initial connection that we immediately hang up and check that connections are logged.

In the case of `screen`-based pty handling, hanging up the initial connection can race with the initial attachment to the `screen` process, and cause that process to exit early. This leaves subsequent connections to the same session ID to fail.

In this PR we just use different pty session IDs so that the initial connections we do to verify logging don't interfere with the rest of the test.

_Arguably_ it's a bug in our Reconnecting PTY code that hanging up immediately can leave the system in a weird state, but we do eventually recover and error out, so I don't think it's worth trying to fix.
2025-10-08 16:23:46 +04:00
Mathias Fredriksson 99d124e276 feat(agent): enable devcontainers by default (#18533) 2025-06-24 21:17:04 +03:00
Aaron Lehmann 8cc743a812 chore: clarify error variable name in doAttach (#17284) 2025-04-16 14:44:33 +05:00
Jon Ayers 17ddee05e5 chore: update golang to 1.24.1 (#17035)
- Update go.mod to use Go 1.24.1
- Update GitHub Actions setup-go action to use Go 1.24.1
- Fix linting issues with golangci-lint by:
  - Updating to golangci-lint v1.57.1 (more compatible with Go 1.24.1)

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <claude@anthropic.com>
2025-03-26 01:56:39 -05:00
Cian Johnston 68624092a4 feat(agent/reconnectingpty): allow selecting backend type (#17011)
agent/reconnectingpty: allow specifying backend type
cli: exp rpty: automatically select backend based on command
2025-03-20 13:45:31 +00:00
Eng Zer Jun 04c33968cf refactor: replace golang.org/x/exp/slices with slices (#16772)
The experimental functions in `golang.org/x/exp/slices` are now
available in the standard library since Go 1.21.

Reference: https://go.dev/doc/go1.21#slices

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2025-03-04 00:46:49 +11:00
Cian Johnston ec44f06f5c feat(cli): allow SSH command to connect to running container (#16726)
Fixes https://github.com/coder/coder/issues/16709 and
https://github.com/coder/coder/issues/16420

Adds the capability to`coder ssh` into a running container if `CODER_AGENT_DEVCONTAINERS_ENABLE=true`.

Notes:
* SFTP is currently not supported
* Haven't tested X11 container forwarding
* Haven't tested agent forwarding
2025-02-28 09:38:45 +00:00
Mathias Fredriksson 4ba5a8a2ba feat(agent): add connection reporting for SSH and reconnecting PTY (#16652)
Updates #15139
2025-02-27 10:45:45 +00:00
Cian Johnston 172e52317c feat(agent): wire up agentssh server to allow exec into container (#16638)
Builds on top of https://github.com/coder/coder/pull/16623/ and wires up
the ReconnectingPTY server. This does nothing to wire up the web
terminal yet but the added test demonstrates the functionality working.

Other changes:
* Refactors and moves the `SystemEnvInfo` interface to the
`agent/usershell` package to address follow-up from
https://github.com/coder/coder/pull/16623#discussion_r1967580249
* Marks `usershellinfo.Get` as deprecated. Consumers should use the
`EnvInfoer` interface instead.

---------

Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>
Co-authored-by: Danny Kopping <danny@coder.com>
2025-02-26 09:03:27 +00:00
Cian Johnston 4edd77bc82 chore(agent/agentssh): extract CreateCommandDeps (#16603)
Extracts environment-level dependencies of
`agentssh.Server.CreateCommand()` to an interface to allow alternative
implementations to be passed in.
2025-02-19 09:03:59 +00:00
Jon Ayers ce573b9faa fix: add agent exec abstraction (#15717) 2024-12-04 23:30:25 +02:00
Jon Ayers 1f238fed59 feat: integrate new agentexec pkg (#15609)
- Integrates the `agentexec` pkg into the agent and removes the
legacy system of iterating over the process tree. It adds some linting
rules to hopefully catch future improper uses of `exec.Command` in the package.
2024-11-27 20:12:15 +02:00
Spike Curtis 103824f726 fix: fix panic while tearing down reconnecting PTY (#15615)
fixes https://github.com/coder/internal/issues/221

Fixes an issue where two goroutines were sharing the `err` variable, leading to a data race where we'd fail to process the error and then nil-pointer panic.

I ended up refactoring reconnecting PTY stuff into the `reconnectingpty` package, instead of having it on the agent.  That `createTailnet` routine had waaay too many deeply nested goroutines, which is I'm sure a big contributor to the bug appearing in the first place.
2024-11-22 09:46:25 +04:00
Cian Johnston 12a9d6336b fix(agent): start rpty lifecycle after all reads/writes (#15535)
Fixes https://github.com/coder/internal/issues/214

#15475 missed that we also write to `rpty` after starting
`rpty.lifecycle()`.
This PR moves the function call right at the end. Hopefully this should
address the data races before we go resorting to mutexes.
2024-11-15 14:48:17 +00:00
Cian Johnston b6e7498cb8 fix(agent/reconnectingpty): generate rpty id before starting lifecycle (#15475)
Fixes https://github.com/coder/coder/issues/12687

There was a race condition where we would start the rpty lifecycle
before generating the ID, leading to a data race where we would try to
concurrently read and write the struct field.
2024-11-11 15:02:55 +00:00
Colin Adler 4d5a7b2d56 chore(codersdk): move all tailscale imports out of codersdk (#12735)
Currently, importing `codersdk` just to interact with the API requires
importing tailscale, which causes builds to fail unless manually using
our fork.
2024-03-26 12:44:31 -05:00
Mathias Fredriksson 0442ee5fa8 fix(agent/reconnectingpty): fix screen startup speed by disabling messages (#12190) 2024-02-16 22:37:02 +02:00
Asher 4af8446f48 fix: initialize terminal with correct size (#10369)
* Fit once during creation

This does not fix any bugs (that I know of) but we only need to fit once
when the terminal is created, not every time we reconnect.  Granted,
currently we do not support reconnecting without refreshing anyway so it
does not really matter, but this just seems more correct.

Plus now we will not have to pass the fit addon around.

* Pass size when connecting web socket URL

I think this will solve an issue where screen does does not correctly
handle an immediate resize.  It seems to ignore the resize, but even if
you send it again nothing changes, seemingly thinking it is already at
that size?

* Use new struct for decoding reconnecting pty requests

Decoding a JSON message does not touch omitted (or null) fields so once
a message with a resize comes in, every single message from that point
will cause a resize.

I am not sure if this is an actual problem in practice but at the very
least it seems unintentional.
2023-10-23 23:42:39 -04:00
Asher a9077812e2 fix: use UTF-8 encoding with screen (#10190)
This will make characters like ❯ and ⇣ work, for example.
2023-10-11 13:25:04 -08:00
Kyle Carberry 22e781eced chore: add /v2 to import module path (#9072)
* chore: add /v2 to import module path

go mod requires semantic versioning with versions greater than 1.x

This was a mechanical update by running:
```
go install github.com/marwan-at-work/mod/cmd/mod@latest
mod upgrade
```

Migrate generated files to import /v2

* Fix gen
2023-08-18 18:55:43 +00:00
Asher a08f7b8fb9 fix: catch missing output with reconnecting PTY (#9094)
I forgot that waiting on the cond releases the lock so it was possible
to get pty output after writing the buffer but before adding the pty to
the map.  To fix, add the pty to the map while under the same lock where
we read from the buffer.

The rest does not need to be behind the lock so I moved it out of
doAttach, and that also means we no longer need
waitForStateOrContextLocked.

Also, this can hit a logger error saying the attach failed which fails
the tests however it is not that the attach failed, just that the
process already ran and exited, so when the process exits do not
set an error, instead for now assume this is an expected close.
2023-08-14 15:54:23 -08:00
Asher b993cab49a fix: use screen for reconnecting terminal sessions on Linux if available (#8640)
* Add screen backend for reconnecting ptys

The screen portion is a port from wsep.  There is an interface that lets
you choose between screen and the previous method.  By default it will
choose screen if it is installed but this can be overidden (mostly for
tests).

The tests use a scanner instead of a reader now because the reader will
loop infinitely at the end of a stream.

Replace /bin/bash with bash since bash is not always in /bin.

* Remove connection_id from reconnecting PTY logger

This serves multiple connections so it makes no sense to scope it to a
single connection.

Also lets us use "connection_id" when logging write errors instead of
"other_conn_id".

* Use PATH to test buffered reconnecting pty
2023-08-14 11:19:13 -08:00