Continuation of https://github.com/coder/coder/pull/23067
Add filtering to the paginated org member endpoint (pretty much the same
as what I did in the previous PR with group members, except there I also
had to add pagination since it was missing).
## Description
Blocks `CONNECT` tunnels to private and reserved IP ranges in
aibridgeproxyd, preventing the proxy from being used to reach internal
networks.
The Coder access URL is always exempt (hostname+port match) so the proxy
can reach its own deployment. It is possible to exempt additional ranges
via `CODER_AIBRIDGE_PROXY_ALLOWED_PRIVATE_CIDRS`.
DNS rebinding is handled differently per path:
* Direct (no upstream proxy): Validates the resolved IP right before the
TCP dial, so there is no window between check and connect.
* Upstream proxy: Resolves and checks before forwarding to the upstream
dialer. A small rebinding window exists since the upstream proxy
re-resolves independently.
## Changes
* Add blocked IP denylist covering private, reserved, and
special-purpose ranges
* Add `AllowedPrivateCIDRs` option with CLI flag and env var
* Wire IP checks into `proxy.ConnectDial` for both upstream and direct
paths
* Add tests for blocked/allowed cases across direct dial, upstream
proxy, CIDR exemptions, and CoderAccessURL exemption
Notes: documentation will be handled in a follow-up PR.
Closes: https://github.com/coder/security/issues/124
- Replace real healthcheck with mock `HealthcheckFunc` that returns a
canned report instantly
- Remove healthcheck cache-seeding goroutine/channel workaround
- Remove `HealthcheckTimeout: testutil.WaitSuperLong` (no longer needed)
- Reduce `setupCtx` from `WaitSuperLong` (60s) to `WaitLong` (25s)
The DERP healthcheck performs real network operations (portmapper
gateway probing, STUN) that hang for 60s+ on macOS CI runners. Since
`TestSupportBundle` validates bundle generation, not healthcheck
correctness, a canned report eliminates this entire class of flake.
Fixes coder/internal#272
> 🤖 This PR was created with the help of Coder Agents, and was reviewed
by my human. 🧑‍💻
Eliminates the timing flake in
`TestInterruptAutoPromotionIgnoresLaterUsageLimitIncrease` by making the
chatd worker loop clock-controllable.
## Changes
**`coderd/chatd/chatd.go`**
- Replace `time.NewTicker` calls in `Server.start()` with
`p.clock.NewTicker` using named quartz tags `("chatd", "acquire")` and
`("chatd", "stale-recovery")`.
**`coderd/chatd/chatd_test.go`**
- Inject `quartz.NewMock(t)` into the test via `newActiveTestServer`
config override.
- Trap the acquire ticker so the test controls exactly when pending
chats are reacquired.
- Rewrite the test flow as explicit clock-advance steps instead of
wall-clock polling.
**`AGENTS.md`**
- Document the PR title scope rule (scope must be a real path containing
all changed files).
## Validation
- `go test ./coderd/chatd -run
TestInterruptAutoPromotionIgnoresLaterUsageLimitIncrease -count=100` ✅
- `go test ./coderd/chatd` ✅
- `make lint` ✅
- Adds a new API endpoint `GET /api/v2/users/oidc-claims` that returns
only the **merged claims** (not the separate id_token/userinfo
breakdown). Scoped exclusively to the authenticated user's own identity
— no user parameter, so users cannot view each other's claims.
- Adds a new CLI command, `coder users oidc-claims`, that hits the
above endpoint.
- The existing owner-only debug endpoint is preserved unchanged for
admins who need the full claim breakdown.
> 🤖 This PR was created with the help of Coder Agents, and will be
reviewed by my human. 🧑‍💻
## Description
Implements the server-side merge logic for the `merge_strategy`
attribute added to `coder_env` in [terraform-provider-coder
v2.15.0](https://github.com/coder/terraform-provider-coder/pull/489).
This allows template authors to control how duplicate environment
variable names are combined across multiple `coder_env` resources.
Relates to https://github.com/coder/coder/issues/21885
## Supported strategies
| Strategy | Behavior |
|----------|----------|
| `replace` (default) | Last value wins — backward compatible |
| `append` | Joins values with `:` separator (e.g. PATH additions) |
| `prepend` | Prepends value with `:` separator |
| `error` | Fails the build if the variable is already defined |
## Example
```hcl
resource "coder_env" "path_tools" {
agent_id = coder_agent.dev.id
name = "PATH"
value = "/home/coder/tools/bin"
merge_strategy = "append"
}
```
## Changes
- **Proto**: Added `merge_strategy` field to `Env` message in
`provisioner.proto`
- **State reader**: Updated `agentEnvAttributes` struct and proto
construction in `resources.go`
- **Merge logic**: Added `mergeExtraEnvs()` function in
`provisionerdserver.go` with strategy-aware merging for both agent envs
and devcontainer subagent envs
- **Tests**: 15 unit tests covering all strategies, edge cases (empty
values, mixed strategies, multiple appends)
- **Dependency**: Bumped `terraform-provider-coder` v2.14.0 → v2.15.0
- **Fixtures**: Updated `duplicate-env-keys` test fixtures and golden
files
## Ordering
When multiple resources `append` or `prepend` to the same key, they are
processed in alphabetical order by Terraform resource address (per the
determinism fix in #22706).
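
The strategies and ordering above can be sketched roughly as follows. This is not the `mergeExtraEnvs()` implementation from this PR: the `envResource` type and `mergeEnv` function are hypothetical, and skipping the `:` separator when the existing value is empty is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// envResource is a hypothetical stand-in for one coder_env resource.
type envResource struct {
	Address       string // Terraform resource address, used for ordering
	Name          string
	Value         string
	MergeStrategy string // "", "replace", "append", "prepend", or "error"
}

// mergeEnv combines resources in alphabetical order by address.
func mergeEnv(resources []envResource) (map[string]string, error) {
	sorted := append([]envResource(nil), resources...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Address < sorted[j].Address
	})

	out := make(map[string]string)
	for _, r := range sorted {
		prev, exists := out[r.Name]
		switch r.MergeStrategy {
		case "", "replace":
			out[r.Name] = r.Value // last value wins
		case "append":
			// Assumption: no separator is added to an empty or unset value.
			if exists && prev != "" {
				out[r.Name] = prev + ":" + r.Value
			} else {
				out[r.Name] = r.Value
			}
		case "prepend":
			if exists && prev != "" {
				out[r.Name] = r.Value + ":" + prev
			} else {
				out[r.Name] = r.Value
			}
		case "error":
			if exists {
				return nil, fmt.Errorf("env %q is already defined (%s)", r.Name, r.Address)
			}
			out[r.Name] = r.Value
		default:
			return nil, fmt.Errorf("unknown merge strategy %q", r.MergeStrategy)
		}
	}
	return out, nil
}

func main() {
	env, err := mergeEnv([]envResource{
		{Address: "coder_env.b", Name: "PATH", Value: "/b", MergeStrategy: "append"},
		{Address: "coder_env.a", Name: "PATH", Value: "/a"},
		{Address: "coder_env.c", Name: "PATH", Value: "/c", MergeStrategy: "prepend"},
	})
	fmt.Println(env["PATH"], err)
}
```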
## Summary
- add a hidden deployment config option for chat acquire batch size
(`CODER_CHAT_ACQUIRE_BATCH_SIZE` / `chat.acquireBatchSize`)
- thread the configured value into chatd startup while preserving the
existing default of `10`
- clamp the deployment value to the `int32` range before passing it into
chatd
- regenerate the API/docs/types/testdata artifacts for the new config
field
## Why
`chatd` currently acquires pending chats in batches of `10` via a
compile-time default. This change makes that batch size
operator-configurable from deployment config, so we can tune acquisition
behavior without another code change.
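
The clamping step mentioned in the summary might look something like the sketch below. The helper name `clampToInt32` is an assumption and is not taken from the PR.

```go
package main

import (
	"fmt"
	"math"
)

// clampToInt32 is a hypothetical helper that narrows a 64-bit config value
// to the int32 range instead of letting the conversion wrap around.
func clampToInt32(v int64) int32 {
	if v > math.MaxInt32 {
		return math.MaxInt32
	}
	if v < math.MinInt32 {
		return math.MinInt32
	}
	return int32(v)
}

func main() {
	for _, v := range []int64{10, 1 << 40, -(1 << 40)} {
		fmt.Printf("%d -> %d\n", v, clampToInt32(v))
	}
}
```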
Fixes AIGOV-141
The `coder support bundle` command previously required admin permissions
(`Read DeploymentConfig`) and would abort entirely for non-admin
`member` users with:
```
failed authorization check: cannot Read DeploymentValues
```
This change makes the command **degrade gracefully** instead of failing
outright.
<details>
<summary>
Changes
</summary>
### `support/support.go`
- **`Run()`**: The authorization check for `Read DeploymentValues` is
now a soft warning instead of a hard gate. Unauthenticated users (401)
still fail, but authenticated users with insufficient permissions
proceed with reduced data.
- **`DeploymentInfo()`**: `DeploymentConfig` and `DebugHealth` fetches
now handle 403/401 responses gracefully, matching the existing pattern
used by `DeploymentStats`, `Entitlements`, and `HealthSettings`.
- **`NetworkInfo()`**: Coordinator debug and tailnet debug fetches now
check response status codes for 403/401 before reading the body.
### `cli/support.go`
- **`summarizeBundle()`**: No longer returns early when `Config` or
`HealthReport` is nil. Instead prints warnings and continues summarizing
available data (e.g., netcheck).
### Tests
- `MissingPrivilege` → `MemberNoWorkspace`: Asserts member users can
generate a bundle successfully with degraded admin-only data.
- `NoPrivilege` → `MemberCanGenerateBundle`: Asserts the CLI produces a
valid zip bundle for member users.
- All existing tests continue to pass (`NoAuth`, `OK`, `OK_NoWorkspace`,
`DontPanic`, etc.).
## Behavior matrix
| User type | Before | After |
|---|---|---|
| **Admin** | Full bundle | Full bundle (no change) |
| **Member** | Hard error | Bundle with degraded admin-only data |
| **Unauthenticated** | Hard error | Hard error (no change) |
Related to PRODUCT-182
## Summary
- add shared MCP annotation metadata to toolsdk tools
- emit MCP tool annotations from both coderd and CLI MCP servers
- cover annotation serialization in toolsdk, coderd MCP e2e, and CLI MCP
tests
## Why
- Coder already exposed MCP tools, but it did not populate MCP tool
annotation hints (`readOnlyHint`, `destructiveHint`, `idempotentHint`,
`openWorldHint`).
- Hosts such as Claude Desktop use those hints to classify and group
tools, so without them Coder tools can get lumped together.
- This change adds a shared annotation source in `toolsdk` and has both
MCP servers emit those hints through `mcp.Tool.Annotations`, avoiding
drift between local and remote MCP implementations.
## Testing
- Tested locally on Claude Desktop and the tools are categorized
correctly.
<table>
<tr>
<td> Before
<td> After
<tr>
<td> <img width="613" height="183" alt="image"
src="https://github.com/user-attachments/assets/29d2e3fb-53bc-4ea7-bdb3-f10df4ef996b"
/>
<td> <img width="600" height="457" alt="image"
src="https://github.com/user-attachments/assets/cc384036-c9a7-4db9-9400-43ad51920ff5"
/>
</table>
Note: Done using Coder Agents, reviewed and tested by human locally
Introduce a three-way workspace sharing setting (none, everyone,
service_accounts) replacing the boolean workspace_sharing_disabled.
In service_accounts mode, only service account-owned workspaces can be
shared while regular members' share permissions are removed. Adds a
new organization-service-account system role with per-org permissions
reconciled alongside the existing organization-member system role.
Related to:
https://linear.app/codercom/issue/PLAT-28/feat-service-accounts-sharing-mode-and-rbac-role
---------
Co-authored-by: Steven Masley <Emyrk@users.noreply.github.com>
Co-authored-by: Kayla はな <mckayla@hey.com>
Adds a `--no-wait` flag (CODER_CREATE_NO_WAIT) to the create command,
matching the existing pattern in `coder start`. When set, the `coder
create` command returns immediately after the workspace creation API
call succeeds instead of streaming build logs until completion.
This enables fire-and-forget workspace creation in CI/automation
contexts (e.g., GitHub Actions), where waiting for the build to finish
is unnecessary. Combined with other existing flags, users can create a
workspace with no interactivity, assuming the user is already
authenticated.
This PR adds a `WatchAllWorkspaces` function with `watch-all-workspaces`
endpoint, which can be used to listen on a single global pubsub channel
for _all_ workspace build updates, and makes use of it in the autostart
scaletest.
This negates the need to use a workspace watch pubsub channel _per_
workspace, which has auth overhead associated with each call. This is
especially relevant in situations such as the autostart scaletest, where
we need to start/stop a set of workspaces before we can configure their
autostart config. The overhead associated with all the watch requests
skews the scaletest results and makes it harder to reason about the
performance of the autostart feature itself.
The autostart scaletest also no longer generates its own metrics nor
does it wait for all the workspaces to actually start via autostart. We
should update the scaletest dashboard after both PRs are merged to
measure autostart performance via the new metrics.
The new function/endpoint and its usage in the autostart scaletest are
gated behind an experiment feature flag. We should discuss whether to
enable the endpoint in prod by default; if so, we can remove the
experiment.
---------
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Callum Styan <callum@coder.com>
_Disclaimer: implemented by a Coder Agent using Claude Opus 4.6._
Marks the injected MCP approach in AI Bridge as deprecated across the
codebase.
## Changes
- **`codersdk/deployment.go`**: Deprecated `ExternalAuthConfig.MCPURL`,
`.MCPToolAllowRegex`, `.MCPToolDenyRegex` fields; deprecated and hid the
`--aibridge-inject-coder-mcp-tools` server flag; deprecated
`AIBridgeConfig.InjectCoderMCPTools`.
- **`coderd/externalauth/externalauth.go`**: Deprecated `Config.MCPURL`,
`.MCPToolAllowRegex`, `.MCPToolDenyRegex`.
- **`enterprise/aibridgedserver/aibridgedserver.go`**: Added runtime
deprecation warning when `CODER_AIBRIDGE_INJECT_CODER_MCP_TOOLS` is
enabled; deprecated `getCoderMCPServerConfig`.
- **`enterprise/aibridged/mcp.go`**: Deprecated `MCPProxyBuilder`
interface and `MCPProxyFactory` struct.
- **`docs/ai-coder/ai-bridge/mcp.md`**: Added deprecation warning
banner.
OverrideVSCodeConfigs previously unconditionally set
`git.useIntegratedAskPass` and `github.gitAuthentication` to false,
clobbering any values provided by template authors via module settings
(e.g. the vscode-web module's settings block). This change only sets
these keys when they are not already present, so template-provided
values are preserved.
Registry PR [#758](https://github.com/coder/registry/pull/758) fixed the
module side (run.sh merges template-author settings into the existing
settings.json instead of overwriting the file). But the agent still
unconditionally stamped false onto both keys before the script ran, so
the merge base always contained the agent's values and template authors
couldn't set them to anything else. This change fixes the agent side by
only writing defaults when the keys are absent.
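
A minimal sketch of the "write defaults only when absent" behavior, assuming the settings are held in a generic map. The `applyDefaults` helper is hypothetical and does not mirror the actual `OverrideVSCodeConfigs` signature.

```go
package main

import "fmt"

// applyDefaults is a hypothetical helper: it copies each default into
// settings only when the key is not already present, so values supplied
// by the template author are never overwritten.
func applyDefaults(settings, defaults map[string]any) {
	for k, v := range defaults {
		if _, ok := settings[k]; !ok {
			settings[k] = v
		}
	}
}

func main() {
	// The template author explicitly enabled the integrated askpass.
	settings := map[string]any{"git.useIntegratedAskPass": true}
	applyDefaults(settings, map[string]any{
		"git.useIntegratedAskPass": false,
		"github.gitAuthentication": false,
	})
	fmt.Println(settings)
}
```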
WaitBuffer is a thread-safe io.Writer that supports blocking until
accumulated output matches a substring or custom predicate. It
replaces ad-hoc safeBuffer/syncWriter types and time.Sleep-based
poll loops in tests with signal-driven waits.
- WaitFor/WaitForNth/WaitForCond for blocking on output
- Replace custom buffer types in cli/sync_test.go and
provisionersdk/agent_test.go
- Convert time.Sleep poll loops to require.Eventually/require.Never
in cli/ssh_test.go, coderd/activitybump_test.go,
coderd/workspaceagentsrpc_test.go, workspaceproxy_test.go, and
scaletest tests
Removes `t.Parallel()` from `TestKeyring` and
`TestWindowsKeyring_WriteReadDelete`. The OS keyring is a shared system
resource that's flaky under concurrent access, especially Windows
Credential Manager in CI.
Fixes coder/internal#1370
- Adds `_API_BASE_URL` to `CODER_EXTERNAL_AUTH_CONFIG_`
- Extracts and refactors existing GitHub PR sync logic to new packages
`coderd/gitsync` and `coderd/externalauth/gitprovider`
- Associated wiring and tests
Created using Opus 4.6
The `start_with_dependencies` golden test was flaky on Windows CI. It
used `time.Sleep(100ms)` in a goroutine hoping the `sync start` command
would have time to call `SyncReady`, find the dependency unsatisfied,
and print the "Waiting..." message before the goroutine completed the
dependency.
On slower Windows runners, the sleep could finish and complete the
dependency before the command's first `SyncReady` call, so `ready` was
already `true` and the "Waiting..." message was never printed, causing
the golden file mismatch.
This replaces the `time.Sleep` with a `syncWriter` that wraps
`bytes.Buffer` with a mutex and a channel. The channel closes when the
written output contains the expected signal string ("Waiting"). The
goroutine blocks on this channel instead of sleeping, so it only
completes the dependency after the command has confirmed it is in the
waiting state.
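
The writer described above could be sketched along these lines. This is an illustration under assumptions, not the test helper from this PR: the type is named `signalWriter` here, and its method names are invented.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"sync"
)

// signalWriter is a hypothetical io.Writer that guards a buffer with a mutex
// and closes a channel once the accumulated output contains signal.
type signalWriter struct {
	mu     sync.Mutex
	buf    bytes.Buffer
	signal string
	seen   chan struct{}
	once   sync.Once
}

func newSignalWriter(signal string) *signalWriter {
	return &signalWriter{signal: signal, seen: make(chan struct{})}
}

func (w *signalWriter) Write(p []byte) (int, error) {
	w.mu.Lock()
	defer w.mu.Unlock()
	n, err := w.buf.Write(p)
	// Check the whole buffer so a signal split across writes is still found.
	if strings.Contains(w.buf.String(), w.signal) {
		w.once.Do(func() { close(w.seen) })
	}
	return n, err
}

// Seen returns a channel that is closed after the signal has been written.
func (w *signalWriter) Seen() <-chan struct{} { return w.seen }

func (w *signalWriter) String() string {
	w.mu.Lock()
	defer w.mu.Unlock()
	return w.buf.String()
}

func main() {
	w := newSignalWriter("Waiting")
	done := make(chan struct{})
	go func() {
		defer close(done)
		<-w.Seen() // block until the command reports it is waiting
		fmt.Println("signal observed, completing dependency")
	}()
	fmt.Fprint(w, "Wai")
	fmt.Fprint(w, "ting for dependencies...\n")
	<-done
}
```

Closing the channel through `sync.Once` keeps later writes that also contain the signal from closing it a second time.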
Fixes https://github.com/coder/internal/issues/1376
_Disclaimer: implemented with Opus 4.6 and Coder Agents._
Follow-up to #22879.
## Problem
The `CODER_SESSION_TOKEN` guard added in #22879 blocks `coder login`
unconditionally when the env var is set. This conflicts with
`--use-token-as-session`, which intentionally uses the provided token
(including from the env var) directly as the session token.
## Fix
Add `&& !useTokenForSession` to the check so that `coder login
--use-token-as-session` still works when `CODER_SESSION_TOKEN` is set.
## Testing
Added `TestLogin/SessionTokenEnvVarWithUseTokenAsSession` — sets the env
var with a valid token and passes `--use-token-as-session`, verifying
login succeeds.
---------
Signed-off-by: Danny Kopping <danny@coder.com>
The `TestGitSSH/Local_SSH_Keys` test was flaking on Windows CI with a
context deadline exceeded error when calling `client.GitSSHKey(ctx)`.
Two issues contributed to the flake:
1. `prepareTestGitSSH` called `coderdtest.AwaitWorkspaceAgents` without
passing the caller's context. This created a separate internal 25s
timeout, wasting time budget independently of the setup context.
Changed to use `NewWorkspaceAgentWaiter(...).WithContext(ctx).Wait()`
so the agent wait shares the caller's timeout.
2. The `Local SSH Keys` subtest used `WaitLong` (25s) for its setup
context, but this subtest does more work than `Dial` (runs the
command twice). Bumped to `WaitSuperLong` (60s) to give slow
Windows CI runners enough time.
Fixes coder/internal#770
Handle errors that were previously assigned to blank identifiers in the
`cli/` package.
- ssh.go: Log ExistsViaCoderConnect DNS lookup error at debug level
instead of silently discarding it. Fallthrough behavior preserved.
- exp_scaletest_llmmock.go: Log srv.Stop() error via the existing
logger instead of discarding it.
_Disclaimer: created with Opus 4.6 and Coder Agents._
## Problem
When `CODER_SESSION_TOKEN` is set as an environment variable with an
invalid value, `coder login` fails with a confusing error:
```
error: Trace=[create api key: ]
You are signed out or your session has expired. Please sign in again to continue.
Suggestion: Try logging in using 'coder login'.
```
The suggestion to run `coder login` is what the user just did, making it
circular and unhelpful.
## Root cause
The `--token` flag is mapped to `CODER_SESSION_TOKEN` via serpent. When
the env var is set, `coder login` picks it up as the session token and
tries to use it to create a new API key, which fails because the token
is invalid. Even if login were to succeed and write a new token to disk,
subsequent commands would still use the env var (which takes precedence
over the on-disk token), so the user would remain stuck.
## Fix
Before attempting login, check if `CODER_SESSION_TOKEN` is set in the
environment. If so, return a clear error telling the user to unset it:
```
the environment variable CODER_SESSION_TOKEN is set, which takes precedence
over the session token stored on disk. Please unset it and try again.
unset CODER_SESSION_TOKEN
```
## Testing
Added `TestLogin/SessionTokenEnvVar` that verifies the error is returned
when the env var is set.
Previously `coder login token` didn't load the server URL from config,
so it always required `--url` or `CODER_URL` when the keyring was used to
store the session token. The command only printed the token when the
user was already logged in to a deployment and file storage was used for
the session token (keyring is the default on Windows/macOS). It also
printed an incorrect token when `--url` was specified and the session
token stored on disk belonged to a different deployment.
This change fixes all of these issues, and also errors out when using
session token file storage with a `--url` argument that doesn't match
the stored config URL, since the file only stores one token and would
silently return the wrong one.
See https://github.com/coder/coder/issues/22733 for a table of the
before/after behaviors.
The `--parameter-default` value is now used to pre-select the default option for a coder parameter
with option blocks when prompting interactively in CLI.
Related to: https://github.com/coder/coder/issues/22078
## Problem
When `coder ssh --stdio` checks for Coder Connect availability, it
constructs a hostname like `agent.workspace.owner.coder` and performs a
DNS AAAA lookup via `ExistsViaCoderConnect`. Without a trailing dot,
this hostname is not a fully-qualified domain name (FQDN), so the system
DNS resolver appends each configured search domain before querying.
Go's pure-Go DNS resolver (used when `CGO_ENABLED=0`, which is the
default for CLI builds) does **not** stop after getting NXDOMAIN on the
first name. It tries all names in the search list sequentially:
1. `agent.workspace.owner.coder.` → NXDOMAIN (fast)
2. `agent.workspace.owner.coder.corp.example.com.` → timeout
3. `agent.workspace.owner.coder.internal.company.com.` → timeout
On corporate networks where the search-domain-expanded queries hit DNS
infrastructure that drops rather than responds (common for nonsensical
hostnames with deep subdomain chains), each expanded query hits the full
DNS timeout (default 5s × 2 attempts = 10s per name). With 2-3 search
domains, this compounds to 20-30+ seconds of blocking.
## Fix
Adding a trailing dot marks the hostname as an FQDN. Go's `nameList()`
in `src/net/dnsclient_unix.go` returns a single-entry list for rooted
names, completely bypassing search domain expansion.
This is consistent with how `IsCoderConnectRunning` already handles its
DNS check — `tailnet.IsCoderConnectEnabledFmtString` includes a trailing
dot for exactly this reason.
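
A minimal sketch of the idea, assuming a small helper that roots a hostname before lookup. The name `rooted` is hypothetical and not part of the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// rooted is a hypothetical helper that appends a trailing dot so the
// resolver treats the name as fully qualified and skips search domains.
func rooted(host string) string {
	if host == "" || strings.HasSuffix(host, ".") {
		return host
	}
	return host + "."
}

func main() {
	fmt.Println(rooted("main.workstation.kevin.coder"))
	fmt.Println(rooted("main.workstation.kevin.coder."))
}
```

The rooted name would then be passed to the lookup call, for example `net.DefaultResolver.LookupIP(ctx, "ip6", rooted(host))`.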
## Verification
Tested with a fake DNS server that responds with NXDOMAIN for `.coder`
queries but drops search-domain-expanded queries:
| Hostname | Time | Queries sent |
|---|---|---|
| `main.workstation.kevin.coder` (no trailing dot) | **~15s** | 4 (as-is
+ 3 search domains) |
| `main.workstation.kevin.coder.` (trailing dot) | **<1ms** | 1 (FQDN
only) |
Closes https://github.com/coder/coder/issues/22581
_Generated by [mux](https://github.com/coder/mux) but reviewed by a
human_
## Description
Adds optional TLS support for the AI Bridge Proxy listener. When TLS cert and key files are provided, the proxy serves over HTTPS instead of plain HTTP.
## Changes
* New configuration options to enable TLS on the proxy listener
* Wraps the TCP listener in `tls.NewListener` when configured
* Tests for validation errors, invalid files, and full integration (tunneled + MITM) through a TLS listener
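
A rough sketch of the listener wrapping, under assumptions: the function name `maybeWrapTLS`, the validation message, and the minimum TLS version are illustrative and not taken from this PR.

```go
package main

import (
	"crypto/tls"
	"errors"
	"fmt"
	"net"
)

// maybeWrapTLS is a hypothetical helper. With no cert/key configured it
// returns the listener unchanged (plain HTTP); with both configured it wraps
// the listener so accepted connections are served over TLS.
func maybeWrapTLS(ln net.Listener, certFile, keyFile string) (net.Listener, error) {
	if certFile == "" && keyFile == "" {
		return ln, nil
	}
	if certFile == "" || keyFile == "" {
		return nil, errors.New("TLS cert file and key file must be provided together")
	}
	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return nil, fmt.Errorf("load TLS key pair: %w", err)
	}
	cfg := &tls.Config{
		Certificates: []tls.Certificate{cert},
		MinVersion:   tls.VersionTLS12, // assumption, not from the PR
	}
	return tls.NewListener(ln, cfg), nil
}

func main() {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	defer ln.Close()

	wrapped, err := maybeWrapTLS(ln, "", "")
	if err != nil {
		fmt.Println("wrap failed:", err)
		return
	}
	fmt.Println("plain listener returned unchanged:", wrapped == ln)
}
```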
Note: Documentation for TLS listener setup and client configuration will be handled in a follow-up PR.
Related to: https://github.com/coder/internal/issues/1335
## Description
Renames internal fields, variables, and comments related to the proxy's certificate/key configuration to explicitly reference their MITM CA purpose.
The AI Bridge Proxy uses a CA certificate to sign dynamically generated leaf certificates during MITM interception of HTTPS traffic from AI clients. With the upcoming introduction of TLS listener certificates (for serving the proxy itself over HTTPS, implemented upstack https://github.com/coder/coder/pull/22411), the previous generic naming would become ambiguous. This refactor makes it clear which certificate is which.
No user-facing flags, environment variables, YAML keys, or JSON fields were changed; this is purely an internal rename to avoid confusion going forward.
Related to https://github.com/coder/internal/issues/1335
## Summary
Fixes cross-replica chat relay failing with:
```
failed to open initial relay for chat stream
error= dial relay stream: - failed to WebSocket dial: expected handshake response status code 101 but got 200
failed to open relay for message parts
error= dial relay stream: - failed to WebSocket dial: expected handshake response status code 101 but got 200
```
Subscribers see accurate `status=running` (delivered via pubsub) but
miss all in-progress `message_part` events (delivered only via the relay
WebSocket that never connects).
## Root cause
`redirectToAccessURL` in `cli/server.go` redirects any request whose
`Host` header doesn't match the access URL. The enterprise chat relay
dials another replica directly via its DERP relay address (e.g.
`http://10.0.0.2:8080`), so the `Host` header is the pod IP — not the
access URL.
This triggers a **307 redirect** to the access URL. The WebSocket
library follows the redirect, but the second request is a plain GET —
`Connection: Upgrade` and `Upgrade: websocket` headers are **not carried
over** by HTTP redirect semantics. The load-balanced access URL routes
the plain GET to any replica, which serves the SPA catch-all handler and
returns **HTTP 200 with `index.html`**.
The WebSocket library then fails: `expected handshake response status
code 101 but got 200`.
DERP mesh already has an exemption for this exact scenario
(`isDERPPath`). Chat relay was added later and didn't get one.
## Fix
Bypass `redirectToAccessURL` for requests that carry the
`X-Coder-Relay-Source-Replica` header, which the enterprise relay
already sets on every request (`enterprise/coderd/chatd/chatd.go:573`).
## Sequence diagram
**Before (broken):**
```
Replica A (subscriber) Replica B (worker) Load Balancer
| | |
|--- WS dial pod-ip:8080 ----->| |
| |-- 307 redirect to LB --->|
| | |
|<----------- plain GET (no Upgrade headers) ------------->|
| | |-- routes to any replica
|<----------- 200 index.html -------------------------------|
| |
X 'expected 101 but got 200' |
```
**After (fixed):**
```
Replica A (subscriber) Replica B (worker)
| |
|--- WS dial pod-ip:8080 ----->|
| (X-Coder-Relay-Source- |
| Replica header set) |
| |-- bypass redirect
|<--------- 101 Upgrade ------|
|<==== message_part events ====|
```
relates to #21335
Modifies our local MCP server used in Tasks to push task status updates over the agentsocket, rather than directly dialing Coderd. This will significantly reduce pressure on the database at scale because we can avoid expensive authentication of the agent API key.
Disclosure: I used AI to generate a lot of this PR, but hand-reviewed and tweaked it.
relates to #21335
Enables the agent socket by default and updates docs to strike references to having to enable it.
The PRs in this stack change the MCP server that Tasks use to update their status to rely on the agent socket, rather than directly dialing Coderd with the agent token.
Default disable was a reasonable default when it was only used for the experimental script ordering features, but now that we want to use it for Tasks, it should be default on.
Replace manual experiment checks in web-push handlers with the
`RequireExperimentWithDevBypass` middleware on the route group, matching
the pattern used by OAuth2, Agents, and MCP experiments.
## Changes
- **`coderd/coderd.go`**: Add `RequireExperimentWithDevBypass`
middleware to `/webpush` route group
- **`coderd/webpush.go`**: Remove inline
`api.Experiments.Enabled(codersdk.ExperimentWebPush)` checks from all
three handlers
- **`cli/server.go`**: Gate webpush dispatcher initialization with
`buildinfo.IsDev()` fallback so dev builds always init the real
dispatcher
- **`coderd/webpush_test.go`**: Remove experiment enablement from tests
(dev bypass handles it)
Net effect: 26 lines removed, 5 added.
Created using whatchamacallits (Opus 4.6 Max)
## Problem
When the git askpass flow triggered diff status refreshes, it updated
**every chat** connected to the workspace. This was wasteful and could
cause confusing status updates on unrelated chats.
## Solution
Thread the chat ID through the entire git askpass flow so only the chat
that initiated the git operation gets updated:
1. **`coderd/chatd/chattool/execute.go`** — Sets `CODER_CHAT_ID` env var
on spawned processes (alongside the existing `CODER_CHAT_AGENT`)
2. **`cli/gitaskpass.go`** — Reads `CODER_CHAT_ID` from the environment
and sends it as a `chat_id` query parameter in the `ExternalAuthRequest`
3. **`codersdk/agentsdk/agentsdk.go`** — Adds `ChatID` field to
`ExternalAuthRequest` and encodes it as a query param
4. **`coderd/workspaceagents.go`** — Parses `chat_id` query param and
passes it through to `storeChatGitRef` and
`triggerWorkspaceChatDiffStatusRefresh`
5. **`coderd/chats.go`** — `storeChatGitRef` and
`refreshWorkspaceChatDiffStatuses` now scope updates to just the
initiating chat when a chat ID is provided, falling back to
all-workspace-chats behavior for backwards compatibility (non-chat git
operations)
Fixes three bugs that caused `coder update` to always re-prompt for
multi-select (`list(string)`) parameters instead of reusing previous
build values:
1. **`isValidTemplateParameterOption` failed for multi-select values**
(`cli/parameterresolver.go`): It compared the entire JSON array string
(e.g. `["vim","emacs"]`) against individual option values, which never
matched. Now parses the JSON array and validates each element
separately.
2. **`RichParameter` ignored previous build value for multi-select**
(`cli/cliui/parameter.go`): The `list(string)` branch always used the
template's default value instead of the `defaultValue` argument (which
carries the previous build's value). Now uses `defaultValue` when
available, falling back to the template default.
3. **Pre-existing crash when `list(string)` has no default value**
(`cli/cliui/parameter.go`): `json.Unmarshal` on an empty string caused
`unexpected end of JSON input`. Now skips unmarshaling when the default
source is empty.
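
The first fix can be sketched as follows. This is a minimal illustration, assuming a multi-select value is a JSON array of strings; the function name `validMultiSelect` is hypothetical and differs from `isValidTemplateParameterOption`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// validMultiSelect is a hypothetical helper that parses a list(string) value
// as a JSON array and checks every element against the allowed options,
// rather than comparing the whole array string to a single option.
func validMultiSelect(value string, options []string) bool {
	var selected []string
	if err := json.Unmarshal([]byte(value), &selected); err != nil {
		return false // not a JSON array of strings (including "")
	}
	allowed := make(map[string]struct{}, len(options))
	for _, o := range options {
		allowed[o] = struct{}{}
	}
	for _, s := range selected {
		if _, ok := allowed[s]; !ok {
			return false
		}
	}
	return true
}

func main() {
	options := []string{"vim", "emacs", "nano"}
	fmt.Println(validMultiSelect(`["vim","emacs"]`, options))
	fmt.Println(validMultiSelect(`["vim","ed"]`, options))
}
```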
Fixes #19956
Fixes #22030
## Problem
When a template has `require_active_version = true` and a workspace is
outdated, the web UI always shows "Update and start" as the **only**
button (for all users including admins), but `coder start` starts with
the old version. For admins, this silently succeeds on the stale
version. For non-admins, it goes through a clunky 403→retry path. This
also affects the VS Code extension, which calls `coder start --yes`
under the hood.
## Root Cause
`buildWorkspaceStartRequest()` in `cli/start.go` checks
`workspace.AutomaticUpdates == "always"` but ignores
`workspace.TemplateRequireActiveVersion`. The server-side autostart
already ORs both settings together:
```go
// coderd/autobuild/lifecycle_executor.go
func useActiveVersion(opts, ws) bool {
return opts.RequireActiveVersion || ws.AutomaticUpdates == "always"
}
```
The CLI was missing the `RequireActiveVersion` check.
## Fix
Add `workspace.TemplateRequireActiveVersion` to the existing OR
condition:
```go
// Before:
if workspace.AutomaticUpdates == codersdk.AutomaticUpdatesAlways || action == WorkspaceUpdate {
// After:
if workspace.AutomaticUpdates == codersdk.AutomaticUpdatesAlways || workspace.TemplateRequireActiveVersion || action == WorkspaceUpdate {
```
Now `coder start` and `coder restart` proactively use the active
template version when `require_active_version` is set, matching the web
UI and server autostart behavior. The 403→retry fallback remains as a
safety net but is no longer the primary path for any user.
## Testing
Updated `enterprise/cli/start_test.go` — all user types (owner, template
admin, ACL admin, group ACL admin, member) now expect the active version
when `require_active_version` is set, and verify the 403→retry message
does NOT appear.
When AgentAPI is configured, `WithTaskReporter` unconditionally
overrides all self-reported states to `working`. The intent was to
distrust the agent's `idle` and rely on the screen watcher, but the
override also blocks `failure` and `complete`, which only the agent can
produce (the screen watcher only knows `running`/`stable`). Tasks get
stuck as `working` or `null` forever.
Now only `idle` is overridden to `working`; `failure`, `complete`, and
`working` pass through as-is.
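
In code terms, the new rule amounts to something like the sketch below. The function name `effectiveState` and the plain string states are assumptions for illustration.

```go
package main

import "fmt"

// effectiveState is a hypothetical helper showing the new rule: when
// AgentAPI is configured, only a self-reported "idle" is replaced by
// "working"; every other state is passed through unchanged.
func effectiveState(reported string, agentAPIConfigured bool) string {
	if agentAPIConfigured && reported == "idle" {
		return "working"
	}
	return reported
}

func main() {
	for _, s := range []string{"idle", "working", "failure", "complete"} {
		fmt.Printf("%s -> %s\n", s, effectiveState(s, true))
	}
}
```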
Also:
- Remove misplaced unconditional `"Failed to watch screen events"` log
that fired on every startup
- Add SSE reconnection with exponential backoff (1s-30s) in
`startWatcher` so it recovers from dropped connections instead of dying
silently
- Add `complete` to the `coder_report_task` tool enum, which the
`coder/claude-code` registry module already instructs agents to use but
was missing from the schema
Refs coder/internal#1350
## Summary
Moves expired token filtering from client-side to server-side by adding
an `include_expired` parameter to the `GetAPIKeysByLoginType` and
`GetAPIKeysByUserID` database queries. This is more efficient for large
deployments with many expired/short-lived tokens.
## Changes
- Add `include_expired` parameter to SQL queries using `OR`
short-circuit
- Add `include_expired` query parameter to `GET
/users/{user}/keys/tokens`
- Add `IncludeExpired` field to `codersdk.TokensFilter`
- Remove client-side filtering from CLI `tokens list` command
- Add `TestTokensFilterExpired` test
Fixes coder/internal#1357
## Problem
When a template adds a new immutable parameter, `coder update
--parameter param=value` fails with:
```
error: start workspace: parameter "machine_type" is immutable and cannot be updated
```
The interactive prompt handles this correctly (allows setting first-time
immutable params), but the CLI `--parameter` flag path does not.
## Root Cause
In `cli/parameterresolver.go`, `verifyConstraints()` runs before the
interactive prompt and unconditionally rejects any immutable parameter
during updates. It doesn't distinguish between **new** immutable
parameters (first-time use, should be allowed) and **existing** ones
(already set, should be blocked from changing).
## Fix
Added an `isFirstTimeUse` check to the immutable parameter constraint,
matching the logic already used by the interactive prompt path (line
323). New immutable parameters can now be set via `--parameter`, while
existing immutable parameters are still blocked from being changed.
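
The constraint can be pictured with the sketch below. It is an illustration only: `checkImmutable` and the map of previous build values are hypothetical, and "first-time use" is assumed to mean the parameter has no value from a previous build.

```go
package main

import "fmt"

// checkImmutable is a hypothetical helper. An immutable parameter may be set
// on first use (no value in the previous build) but not changed afterwards.
func checkImmutable(name string, immutable bool, previous map[string]string) error {
	_, alreadySet := previous[name]
	if immutable && alreadySet {
		return fmt.Errorf("parameter %q is immutable and cannot be updated", name)
	}
	return nil
}

func main() {
	previous := map[string]string{"region": "us-east"}
	fmt.Println(checkImmutable("machine_type", true, previous)) // new parameter
	fmt.Println(checkImmutable("region", true, previous))       // existing parameter
}
```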
## Testing
Added `TestUpdateValidateRichParameters/NewImmutableParameterViaFlag`
which:
1. Creates a workspace with a mutable parameter
2. Updates the template to add a new immutable parameter
3. Runs `coder update --parameter immutable_param=value`
4. Verifies the update succeeds and the parameter is set correctly
Fixes #22164