follows on from #21940.
The API endpoints existed for this already, so this PR just adds CLI functionality which uses those API endpoints.
Generated with the help of Mux
## Summary
Fixes flaky `TestServer/BuiltinPostgres` test caused by port conflicts
in CI.
## Fix
Increase retry attempts from 3 to 10 for better odds when port conflicts
occur.
Fixes https://github.com/coder/internal/issues/1017
Adds additional logs for determining what signal the agent receives
prior to shut down. Also helps distinguish whether the signal originated
at the agent or reaper.
Context was created before expensive setup operations (building
workspaces, starting agents), leaving insufficient time for the actual
command execution. Split into setupCtx for setup and a fresh ctx for
the command to ensure both get the full timeout.
The API endpoints existed for this already, so this PR just adds CLI
functionality which uses those API endpoints.
closes#21891
Generated with the help of Mux
* Adds support for parameter `format=text` in the following API routes:
* `/api/v2/workspaceagents/:id/logs`
* `/api/v2/workspacebuilds/:id/logs`
* `/api/v2/templateversions/:id/logs`
* `/api/v2/templateversions/:id/dry-run/:id/logs`
* Adds links to view raw logs on the following pages:
* Workspace build page
* Template editor page
* Template version page
* Refactors existing log formatting in `cli/logs.go` to live in `codersdk`.
🤖 Generated with Claude Opus 4.5, reviewed by me.
---------
Co-authored-by: Claude <noreply@anthropic.com>
These tests use dbfake to set up database state directly and don't
need a provisioner daemon. Removing it fixes a flaky failure on
Windows where the provisioner daemon acquired a job that dbfake had
already "completed", causing the task status to be "error" instead
of "paused".
Fixescoder/internal#1322
Refs coder/internal#1323
Apply optimizations:
* https://github.com/openai/openai-go/pull/602
* https://github.com/coder/aibridge/pull/160
These reduce CPU time and allocation count for OpenAI `chat/completions`
and `responses` APIs, making the use of OpenAI chat models through AI
Bridge more performant.
In order to test these changes, we add scaletesting support for the
responses API.
## Description
Mark `--ssh-hostname-prefix` flag and `CODER_SSH_HOSTNAME_PREFIX` env
variable as deprecated, recommending users to use
`--workspace-hostname-suffix` / `CODER_WORKSPACE_HOSTNAME_SUFFIX`
instead for consistency with Coder Desktop.
The deprecated option is now hidden from help output and docs but
remains functional for backward compatibility. When used, it will show a
deprecation warning pointing to the recommended alternative.
## Changes
- Added `UseInstead` pointing to `workspace-hostname-suffix` option
(triggers deprecation warning)
- Set `Hidden: true` to hide from CLI help and documentation
- Updated description to mention deprecation
- Regenerated docs and help files via `make gen`
Closes#18156
---
_Originally requested by @matifali in
https://github.com/coder/coder/pull/18085#discussion_r2115594447_
The test was creating two template versions without explicit names,
relying on `namesgenerator.NameDigitWith()` which can produce
collisions. When both versions got the same random name, the test failed
with a 409 Conflict error.
Fix by giving each version an explicit name (`v1`, `v2`).
Closes https://github.com/coder/internal/issues/1309
---
*Generated by [mux](https://mux.coder.com)*
Adds a new subcommand to print the current session token for use in
scripts and automation, similar to `gh auth token`.
## Usage
```bash
CODER_SESSION_TOKEN=$(coder login token)
```
Fixes#21515
Fixes: https://github.com/coder/internal/issues/560
"Select" CLI UI component should ignore "space" when `+Add custom value`
is highlighted. Otherwise it interprets that as a potential option...
and panics.
The reaper (PID 1) now returns the child's exit code instead of always
exiting 0. Signal termination uses the standard Unix convention of 128 +
signal number.
fixes#21661
The test occasionally times out at 15s on Windows CI runners.
Investigation of CI logs shows the HTTP request to the agent's
gitsshkey endpoint never appears in server logs, suggesting it
hangs before the request completes (possibly in connection setup,
middleware, or database queries). Increase to 60s to reduce flake
rate.
Fixescoder/internal#770
This undeprecates the `allow-workspace-renames` flag. IIUC, the 'danger'
with using this flag is that the workspace name might have been used in
the definition of some other terraform resources within template code,
so a rename could cause problems such as with persistent disks.
for https://github.com/coder/coder/issues/21628
---------
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Source code changes:
- Added a wrapper for the boundary subcommand that checks feature
entitlement before executing the underlying command.
- Added a helper that returns the Boundary version using the
runtime/debug package, which reads this information from the go.mod
file.
- Added FeatureBoundary to the corresponding enum.
- Move boundary command from AGPL to enterprise.
`NOTE`: From now on, the Boundary version will be specified in go.mod
instead of being defined in AI modules.
Relates to https://github.com/coder/internal/issues/1289
I was able to reproduce the issue locally -- it appears to sometimes
just take 25 seconds to get all of the test dependencies stood up:
```
t.go:111: 2026-01-22 16:39:15.388 [debu] pubsub: pubsub dialing postgres network=tcp address=127.0.0.1:5432 timeout_ms=0
...
t.go:111: 2026-01-22 16:39:38.789 [info] agent.net.tailnet.tcp: accepted connection src=[fd7a:115c:a1e0:44b1:8901:8f09:e605:d019]:55406 dst=[fd7a:115c:a1e0:4cfd:a892:e4e2:8cad:8534]:1
...
ssh_test.go:1208:
Error Trace: /Users/cian/src/coder/coder/testutil/chan.go:74
/Users/cian/src/coder/coder/cli/ssh_test.go:1208
Error: SoftTryReceive: context expired
Test: TestSSH/StdioExitOnParentDeath
ssh_test.go:1212:
```
Hopefully bumping the timeout should fix it.
The removal of that permission from the role broke valid use cases (e.g.
a site owner user creating a workspace owned by a system account and
then trying to share it with another user).
The bulk of the PR is made up of the rollbacks of the previously
introduced test updates necessitated by the removal.
Related to: https://github.com/coder/internal/issues/1285
Relates to https://github.com/coder/internal/issues/1217
Adds a background goroutine in `--stdio` mode to check if the parent PID
is still alive and exit if it is no longer present.
🤖 Implemented using Mux + Claude Opus 4.5, reviewed and refactored by
me.
## Summary
Add circuit breaker support for AI Bridge to protect against cascading
failures from upstream AI provider rate limits (HTTP 429, 503, and
Anthropic's 529 overloaded responses).
## Changes
- Add 5 new CLI options for circuit breaker configuration:
- `--aibridge-circuit-breaker-enabled` (default: false)
- `--aibridge-circuit-breaker-failure-threshold` (default: 5)
- `--aibridge-circuit-breaker-interval` (default: 10s)
- `--aibridge-circuit-breaker-timeout` (default: 30s)
- `--aibridge-circuit-breaker-max-requests` (default: 3)
- Update aibridge dependency to include circuit breaker support
- Add tests for pool creation with circuit breaker providers
## Notes
- Circuit breaker is **disabled by default** for backward compatibility
- When enabled, applies to both OpenAI and Anthropic providers
- Uses sony/gobreaker internally via the aibridge library
## Testing
```
make test RUN=TestPoolWithCircuitBreakerProviders
```
- Adds pprof collection support now that we have the listeners
automatically starting (requires Coder server 2.28.0+, includes a
version check). Collects heap, allocs, profile (30s), block, mutex,
goroutine, threadcreate, trace (30s), cmdline, symbol. Performs capture
for 30 seconds and emits a log line stating as such. Enable capture by
supplying the `--pprof` flag or `CODER_SUPPORT_BUNDLE_PPROF` env var.
Collection of pprof data from both coderd and the Coder agent occurs.
- Adds collection of Prometheus metrics, also requires 2.28.0+
- Adds the ability to include a template in the bundle independently of
supplying the details of a running workspace by supplying the
`--template` flag or `CODER_SUPPORT_BUNDLE_TEMPLATE` env var
- Captures a list of workspaces the user has access to. Defaults to a
max of 10, configurable via `--workspaces-total-cap` /
`CODER_SUPPORT_BUNDLE_WORKSPACES_TOTAL_CAP`
- Collects additional stats from the coderd deployment (aggregated
workspace/session metrics), as well as entitlements via license and
dismissed health checks.
created with help from mux
## Description
Adds upstream proxy support for AI Bridge Proxy passthrough requests.
This allows aiproxy to forward non-allowlisted requests through an
upstream proxy. Currently, the only supported configuration is when
aiproxy is the first proxy in the chain (client → aiproxy → upstream
proxy).
## Changes
* Add `--aibridge-proxy-upstream` option to configure an upstream
HTTP/HTTPS proxy URL for passthrough requests
* Add `--aibridge-proxy-upstream-ca` option to trust custom CA
certificates for HTTPS upstream proxies
* Passthrough requests (non-allowlisted domains) are forwarded through
the upstream proxy
* MITM'd requests (allowlisted domains) continue to go directly to
aibridge, not through the upstream proxy
* Add tests for upstream proxy configuration and request routing
Closes: https://github.com/coder/internal/issues/1204
This PR improves the usability of `coder show`:
- Adds a header with workspace owner/name, latest build status and time
since, and template name / version name.
- Updates `namedWorkspace` to allow looking up by UUID
- Also improves associated `TestShow` to respect context deadlines.
Adds a per-organization setting to disable workspace sharing. When enabled,
all existing workspace ACLs in the organization are cleared and the workspace
ACL mutation API endpoints return `403 Forbidden`.
This complements the existing site-wide `--disable-workspace-sharing` flag by
providing more granular control at the organization level.
Closes https://github.com/coder/internal/issues/1073 (part 2)
---------
Co-authored-by: Steven Masley <Emyrk@users.noreply.github.com>
## Description
Implements selective MITM (Man-in-the-Middle) in `aibridgeproxyd` so
that only requests to allowlisted domains are intercepted and decrypted.
Requests to all other domains are tunneled directly without decryption.
## Changes
* New config option: `CODER_AIBRIDGE_PROXY_DOMAIN_ALLOWLIST` (default:
`api.anthropic.com`,`api.openai.com`)
* Selective MITM: Uses `goproxy.ReqHostIs()` to only intercept `CONNECT`
requests to allowlisted hosts
* Certificate caching: Now only generates/caches certificates for
allowlisted domains
* Validation: Startup fails if domain allowlist is empty or contains
invalid entries
Closes: https://github.com/coder/internal/issues/1182
Closes https://github.com/coder/coder/issues/21360
A few considerations/notes:
- I've kept the number of conns to 10 in all other places, except coderd
- which uses the config value
- I opted to also make idle conns configurable; the greater the delta
between max open and max idle, the more connection churn
- Postgres maintains a [_process_ per
connection](https://www.postgresql.org/docs/current/connect-estab.html),
contrary to what the comment said previously
- Operators should be able to tune this, since process churn can
negatively affect OS scheduling
- I've set the value to `"auto"` by default so it's not another knob one
_has to_ twiddle, and sets max idle = max conns / 3
---------
Signed-off-by: Danny Kopping <danny@coder.com>
Adds the following information to CLI User-Agent headers to aid
deployment administrators in troubleshooting where requests are coming
from.
Before: `Go-http-client/1.1`
After: `coder-cli/v2.34.5 (linux/amd64; coder whoami)`
🤖 These changes were generated by Claude Sonnet 4.5 but reviewed and
edited manually by me.
## Summary
Adds a `--no-build` flag to `coder state push` that updates the
Terraform state directly without triggering a workspace build.
## Use Case
This enables state-only migrations, such as migrating Kubernetes
resources from deprecated types (e.g., `kubernetes_config_map`) to
versioned types (e.g., `kubernetes_config_map_v1`):
```bash
coder state pull my-workspace > state.json
terraform init
terraform state rm -state=state.json kubernetes_config_map.example
terraform import -state=state.json kubernetes_config_map_v1.example default/example
coder state push --no-build my-workspace state.json
```
## Changes
- Add `PUT /api/v2/workspacebuilds/{id}/state` endpoint to update state
without triggering a build
- Add `UpdateWorkspaceBuildState` SDK method
- Add `--no-build`/`-n` flag to `coder state push`
- Add confirmation prompt (can be skipped with `--yes`/`-y`) since this
is a potentially dangerous operation
- Add test for `--no-build` functionality
Fixes#21336
**This is just the protobuf changes for the PR https://github.com/coder/coder/pull/21398**
Moved `UploadFileRequest` from `provisionerd.proto` -> `provisioner.proto`.
Renamed to `FileUpload` because it is now bi-directional.
This **is backwards compatible**. I tested it to confirm the payloads are identical. Types were just renamed and moved around.
```golang
func TestTypeUpgrade(t *testing.T) {
t.Parallel()
x := &proto2.UploadFileRequest{
Type: &proto2.UploadFileRequest_ChunkPiece{
ChunkPiece: &proto.ChunkPiece{
Data: []byte("Hello World!"),
FullDataHash: []byte("Foobar"),
PieceIndex: 42,
},
},
}
data, err := protobuf.Marshal(x)
require.NoError(t, err)
// Exactly the same output
// EhgKDEhlbGxvIFdvcmxkIRIGRm9vYmFyGCo= on `main`
// EhgKDEhlbGxvIFdvcmxkIRIGRm9vYmFyGCo= on this branch
fmt.Println(base64.StdEncoding.EncodeToString(data))
}
```
# What this does
This allows provisioner daemons to download files from `coderd`'s `files` table. This is used to send over cached module files and prevent the need of downloading these modules on each workspace build.
Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example:
```
import (
"context"
"time"
"github.com/prometheus/client_golang/prometheus"
"golang.org/x/xerrors"
"gopkg.in/natefinch/lumberjack.v2"
"cdr.dev/slog/v3"
"github.com/coder/coder/v2/codersdk/agentsdk"
"github.com/coder/serpent"
)
```
3 groups: standard library, 3rd partly libs, Coder libs.
This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.
This PR adds a command to view the provisioner and agent logs for a
given workspace.
Note: I did investigate using the existing `cliui` methods to tail the
logs but they are tailored to a very specific use-case.
Other changes:
- Adds `Agents` to `dbfake.WorkspaceResponse`
- Adds methods to generate provisioner and agent logs in `dbgen`
---------
Co-authored-by: Steven Masley <Emyrk@users.noreply.github.com>
Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptible call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder).
It also updates dependencies that also use slog and were updated.
I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule.
Other dependencies, I pushed new tags.
Fixes https://github.com/coder/internal/issues/272
This test periodically fails due to the healthcheck timing out.
The problem is compounded due to the fact that we stand up a new
coderdtest instance for each test.
This PR does the following:
* Updates the subtests to share a single `coderdtest` instance.
* Hits the `/debug/health` endpoint before completing the setup phase so
that the result is cached.
This will not completely remove the issue, as the healthcheck could
still fail due to test-infrastructure-related issues. In this case we
may decide to add a retry in this 'seed' function.
Because this affects more than just the template insights
page (specifically it also affects the deployment stats endpoint which
is shown on bottom bar and Prometheus), the group is being renamed
generically to just "stats collection". In the future if we need to
affect the other stats we can put those options here.
Then, because this change only affects a portion of stats, specifically
usage stats like connection and application time, bytes sent, etc, add a
new sub-group called "usage stats".
Then finally add back the "enable" flag. This also gives us a place to
one day place an "anonymize" flag if we need to go that route.