## Summary
Adds `--ai-gateway-allow-byok` deployment option to control whether
users can use Bring Your Own Key (BYOK) mode with AI Gateway.
When disabled (`--ai-gateway-allow-byok=false`), BYOK requests are
rejected with a 403 and a message directing the admin to enable the
flag. Centralized key authentication works regardless of this setting.
Defaults to `true` (BYOK allowed).
---------
Co-authored-by: Danny Kopping <danny@coder.com>
_Disclaimer: produced by Claude Opus 4.6_
Adds a `coder_build_info` metric which allows operators to see which
versions of Coder are currently running.
---------
Signed-off-by: Danny Kopping <danny@coder.com>
_Disclaimer: produced mostly by Claude Opus 4.6 following detailed
planning._
## Summary
- Support multiple instances of the same AI Bridge provider type via
indexed env vars (`CODER_AIBRIDGE_PROVIDER_<N>_<KEY>`), following the
`CODER_EXTERNAL_AUTH_<N>_<KEY>` pattern
- Existing single-provider env vars (`CODER_AIBRIDGE_OPENAI_KEY`, etc.)
continue to work unchanged
- Setting both a legacy env var and an indexed provider with the same
name errors at startup to prevent silent misconfiguration
- Mark legacy provider fields (`OpenAI`, `Anthropic`, `Bedrock`) as
deprecated in `AIBridgeConfig` in favor of `Providers`
## Example
```sh
CODER_AIBRIDGE_PROVIDER_0_TYPE=anthropic
CODER_AIBRIDGE_PROVIDER_0_NAME=anthropic-corp
CODER_AIBRIDGE_PROVIDER_0_KEY=sk-ant-corp-xxx
CODER_AIBRIDGE_PROVIDER_0_BASE_URL=https://llm-proxy.internal.example.com/anthropic
CODER_AIBRIDGE_PROVIDER_1_TYPE=anthropic
CODER_AIBRIDGE_PROVIDER_1_NAME=anthropic-direct
CODER_AIBRIDGE_PROVIDER_1_KEY=sk-ant-direct-yyy
```
Each instance is routed by name:
- /api/v2/aibridge/**anthropic-corp**/v1/messages
- /api/v2/aibridge/**anthropic-direct**/v1/messages
Closes
[AIGOV-157](https://linear.app/codercom/issue/AIGOV-157/spike-to-understand-if-there-is-a-simple-way-to-handle-multi-api-key)
---------
Signed-off-by: Danny Kopping <danny@coder.com>
* Removes experiment `web-push`.
* Falls back to NoopWebpusher in case of error
* Checks browser capability in FE
* Adds note to agents getting-started docs regarding webpush without TLS
> 🤖
Closes#16332
Previously `coder provisioner jobs list` showed no indication of what a workspace
build job was doing (i.e., start, stop, or delete). This adds
`workspace_build_transition` to the provisioner job metadata, exposed in
both the REST API and CLI. Template and workspace name columns were also
added, both available via `-c`.
```
$ coder provisioner jobs list -c id,type,status,"workspace build transition"
ID TYPE STATUS WORKSPACE BUILD TRANSITION
95f35545-a59f-4900-813d-80b8c8fd7a33 template_version_import succeeded
0a903bbe-cef5-4e72-9e62-f7e7b4dfbb7a workspace_build succeeded start
```
Reorder error checks in isRetryableError so IsConnectionError is evaluated before context.DeadlineExceeded. Dial timeouts (*net.OpError wrapping DeadlineExceeded) were incorrectly treated as non-retryable, causing Coder Connect to fail immediately on broken tunnels with valid DNS despite existing retry logic.
Fixes#24201
Adds `coder exp chat context add` and `coder exp chat context clear`
commands that run inside a workspace to manage chat context files via
the agent token.
`add` reads instruction and skill files from a directory (defaulting to
cwd) and inserts them as context-file messages into an active chat.
Multiple calls are additive — `instructionFromContextFiles` already
accumulates all context-file parts across messages.
`clear` soft-deletes all context-file messages, causing
`contextFileAgentID()` to return `!found` on the next turn, which
triggers `needsInstructionPersist=true` and re-fetches defaults from the
agent.
Both commands auto-detect the target chat via `CODER_CHAT_ID` (already
set by `agentproc` on chat-spawned processes), or fall back to
single-active-chat resolution for the agent. The `--chat` flag overrides
both.
Also adds sub-agent context inheritance: `createChildSubagentChat` now
copies parent context-file messages to child chats at spawn time, so
delegated sub-agents share the same instruction context without
independently re-fetching from the workspace agent.
<details><summary>Implementation details</summary>
**New files:**
- `cli/exp_chat.go` — CLI command tree under `coder exp chat context`
**Modified files:**
- `agent/agentcontextconfig/api.go` — `ConfigFromDir()` reads context
from an arbitrary directory without env vars
- `codersdk/agentsdk/agentsdk.go` — `AddChatContext`/`ClearChatContext`
SDK methods
- `coderd/workspaceagents.go` — POST/DELETE handlers on
`/workspaceagents/me/chat-context`
- `coderd/coderd.go` — Route registration
- `coderd/database/queries/chats.sql` — `GetActiveChatsByAgentID`,
`SoftDeleteContextFileMessages`
- `coderd/database/dbauthz/dbauthz.go` — RBAC implementations for new
queries
- `coderd/x/chatd/subagent.go` — `copyParentContextFiles` for sub-agent
inheritance
- `cli/root.go` — Register `chatCommand()` in `AGPLExperimental()`
**Auth pattern:** Uses `AgentAuth` (same as `coder external-auth`) —
agent token via `CODER_AGENT_TOKEN` + `CODER_AGENT_URL` env vars.
</details>
> 🤖 Generated by Coder Agents
---------
Co-authored-by: Michael Suchacz <203725896+ibetitsmike@users.noreply.github.com>
The agent SSH server unconditionally allows all four SSH forwarding
paths (TCP local, TCP reverse, Unix local, Unix reverse). This is a
sandbox escape vector when workspaces are used for AI agent containment
— a reverse tunnel lets anything inside the workspace reach the user's
local machine, bypassing network isolation.
This adds two new agent CLI flags / environment variables:
- `--block-reverse-port-forwarding` /
`CODER_AGENT_BLOCK_REVERSE_PORT_FORWARDING` — blocks both TCP (`ssh -R`)
and Unix socket reverse forwarding
- `--block-local-port-forwarding` /
`CODER_AGENT_BLOCK_LOCAL_PORT_FORWARDING` — blocks both TCP (`ssh -L`)
and Unix socket local forwarding
Template admins can set these via the `env` block on the container/VM
resource that runs the agent (e.g. `docker_container`,
`kubernetes_pod`), or via `coder_env` resources tied to the agent.
Fixes https://github.com/coder/coder/issues/22275
<details>
<summary>Implementation notes</summary>
Follows the existing `BlockFileTransfer` pattern:
1. `agent/agentssh/agentssh.go` — New `BlockReversePortForwarding` and
`BlockLocalPortForwarding` fields on `Config`. TCP callbacks check these
before allowing forwarding. The `direct-streamlocal@openssh.com` channel
handler is wrapped to reject Unix local forwards.
2. `agent/agentssh/forward.go` — `forwardedUnixHandler` gains a
`blockReversePortForwarding` field to reject
`streamlocal-forward@openssh.com` requests.
3. `agent/agent.go` — New fields on `Options` and `agent` struct,
plumbed to SSH config.
4. `cli/agent.go` — New serpent flags with env vars.
5. Tests cover all four blocked paths: TCP local, TCP reverse, Unix
local, Unix reverse.
</details>
> 🤖 Generated by Coder Agents
The default `net.Dialer` in the Coder Connect path had no timeout,
falling back to the OS TCP timeout when the tunnel was broken but DNS
still resolved. Add a 5s dial timeout and 30s TCP keepalive.
Fixes#24006
When `coder ssh` connects to a workspace after laptop wake, DNS or the
control plane may be briefly unavailable. Previously this caused an
immediate failure, which VS Code Remote SSH classified as permanent
("Reload Window").
Wrap each network step (workspace resolution, template version fetch,
agent connection info, Coder Connect dial, tailnet dial) with
`retryWithInterval` so transient errors (DNS, connection refused, 5xx)
are retried individually. Non-retryable errors (auth, 404) and context
cancellation stop immediately. Data transfer is never retried.
## Problem
When a prebuilt workspace is claimed, the agent reinitializes via a
single fire-and-forget pubsub event over SSE. If the agent's SSE
connection is interrupted at claim time, the event is permanently lost —
the workspace is stuck with no self-healing path.
Additionally, regular (non-prebuild) workspaces had no way to opt out of
the `/reinit` polling loop — agents would reconnect indefinitely to an
endpoint that would never send them anything useful.
## Root Cause
`workspaceAgentReinit` fetches the workspace (with its current
`owner_id`) via `GetWorkspaceByAgentID`, but never checked whether a
claim already happened. It only subscribed to pubsub for future events.
The database already has durable claim state (`owner_id` changes from
`PrebuildsSystemUserID` to the real user), but no layer ever consulted
it on reconnection.
## Solution
### Server-side durable check with first-build-initiator gating
**TOCTOU-safe ordering**: Subscribe to pubsub claim events *before* any
durable checks, so a claim that fires during the check is buffered in
the channel rather than lost.
**First-build-initiator gating**: When `!workspace.IsPrebuild()` (owner
is no longer the system user), look up the first build's `InitiatorID`.
The prebuild reconciler always uses `PrebuildsSystemUserID` as the
initiator. This distinguishes claimed prebuilds from regular workspaces
without any SQL schema changes.
- **Regular workspace** (first build initiator ≠ system user) → **409
Conflict**, agent stops reconnecting
- **Claimed prebuild, build completed** → pre-seed channel with reinit
event and close it, transmitter delivers one-shot then exits
- **Claimed prebuild, build in-progress** → fall through to pubsub
subscription, agent waits for completion event
- **Unclaimed prebuild** → pubsub subscription (existing happy path)
### Declarative reinit events (defense-in-depth)
- Added `UserID` field to `ReinitializationEvent` with JSON tags
- Switched pubsub serialization from raw string to JSON (with
backward-compat fallback for rolling upgrades)
- Populated `UserID` at both the publish site and the durable check
### Agent SDK: 409 handling
`WaitForReinitLoop` detects 409 Conflict from the server and closes the
`reinitEvents` channel, cleanly exiting the retry goroutine.
### Agent CLI: fixed two bugs + added reinitCtx
- **Closed channel (`!ok`)**: now blocks on `<-ctx.Done()` instead of
`continue`, keeping the current agent running. Previously this would
leak agents by skipping `agnt.Close()` and re-entering the loop.
- **Duplicate owner reinit**: cancels `reinitCtx` (stops the reinit
goroutine), then blocks on `<-ctx.Done()`. Previously `continue` would
skip cleanup and create a new agent on the next loop iteration.
- **`reinitCtx`**: a cancellable child of `ctx` passed to
`WaitForReinitLoop`, allowing the agent to stop the reinit HTTP polling
after reinit completes.
### Agent-side idempotency
Tracks `lastOwnerID` in the agent reinit loop — duplicate events for the
same owner are skipped.
## Testing
- **"unclaimed prebuild receives reinit via pubsub"**: prebuild owned by
system user, pubsub event triggers reinit
- **"claimed prebuild receives one-shot reinit on reconnect"**: first
build by system user, owner changed, build completed → immediate reinit
(no pubsub needed)
- **"claimed prebuild waits during in-progress claim build"**: claimed
but build still running → no reinit until build completes
- **"regular workspace gets 409"**: first build by real user → 409
Conflict, agent stops polling
- Updated claim publisher/listener tests: verify `UserID` survives JSON
round-trip + backward compat with raw string payloads
- Updated SSE round-trip test: verify `UserID` survives transmit →
receive cycle
Fixes#22359
## Rolling upgrade note
During a rolling deploy where old coderd instances coexist with new
ones, the pubsub `ReinitializationEvent` has a new `workspace_id` field
(JSON key `workspace_id`). Old publishers send a raw reason string
instead of JSON; the new listener gracefully falls back by treating the
entire payload as the reason and filling in `WorkspaceID` from context.
The only visible effect during the upgrade window is that `WorkspaceID`
may be the zero UUID in agent-side logs — this is cosmetic and resolves
once all instances are updated.
Previously the command required exactly two arguments, forcing users to
run it multiple times to declare multiple dependencies for a single
unit.
This accepts variadic depends-on arguments so all dependencies can be
declared in one call:
```
coder exp sync want my-unit dep-1 dep-2 dep-3
```
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Marcin Tojek <mtojek@users.noreply.github.com>
## Description
Adds support for multiple Copilot provider instances to route requests to different Copilot upstreams (individual, business, enterprise). Each instance has its own name and base URL, enabling per-upstream metrics, logs, circuit breakers, API dump, and routing.
## Changes
* Add Copilot business and enterprise provider names and host constants
* Register three Copilot provider instances in aibridged (default, business, enterprise)
* Update `defaultAIBridgeProvider` in `aibridgeproxy` to route new Copilot hosts to their corresponding providers
## Related
* Depends on: https://github.com/coder/aibridge/pull/240
* Closes: https://github.com/coder/aibridge/issues/152
Note: documentation changes will be added in a follow-up PR.
_Disclaimer: initially produced by Claude Opus 4.6, heavily modified and reviewed by @ssncferreira ._
## Changes
- **Commit 1**: Remove 17 unnecessary `//nolint` directives:
- `//nolint:varnamelen` — linter not active
- `//nolint:unused` on exported `SlimUnsupported`
- `//nolint:govet` in `coderd/httpmw/csrf` — no longer fires
- `//nolint:revive` on functions refactored since the nolint was added
- `//nolint:paralleltest` citing Go 1.22 loop variable capture
(obsolete)
- Bare `//nolint` narrowed to specific `//nolint:gocritic` with
justification
- **Commit 2**: Fix root causes behind 5 dangerous nolint suppressions:
- Add `MinVersion: tls.VersionTLS12` to TLS client config (removes
`gosec` G402)
- Delete trivial unexported wrappers `apiKey()`/`normalizeProvider()` in
chatprovider (removes `revive` confusing-naming)
- Add doc comments to `StartWithAssert` and `Router` (removes `revive`
exported)
- Rename unused parameters to `_` in integration test helpers
> 🤖 This PR was created using Coder Agents and reviewed by me.
## Summary
Adds an entitlement-gated **AI add-on** column to both the **Users**
table and the **Organization Members** table. When
`ai_governance_user_limit` is entitled, each row shows whether the user
is consuming an AI seat.
## Background
The AI governance add-on tracks which users are consuming AI seats.
Admins need visibility into per-user seat consumption directly from the
user management tables. This change surfaces that information through
both the site-wide Users table and the per-organization Members table,
gated behind the `ai_governance_user_limit` entitlement so the column
only appears when the feature is licensed.
## Implementation
### Backend
- **New SQL query** `GetUserAISeatStates`
(`coderd/database/queries/aiseatstate.sql`) — returns user IDs consuming
an AI seat, derived from:
- Users with entries in `aibridge_interceptions` (AI Bridge usage)
- Users who own workspaces with `has_ai_task = true` builds (AI Tasks
usage)
- **SDK types** — added `has_ai_seat: boolean` to `codersdk.User` and
`codersdk.OrganizationMemberWithUserData`
- **Handler wiring** — both the Users list endpoint (`coderd/users.go`)
and all Members endpoints (`coderd/members.go`) query AI seat state per
page of user IDs and populate the response field
- **dbauthz** — per-user `ActionRead` checks on `ResourceUserObject`
### Frontend
- **Shared `AISeatCell` component**
(`site/src/modules/users/AISeatCell.tsx`) — green `CircleCheck` for
consuming, gray `X` for non-consuming
- **`TableColumnHelpTooltip`** — extended with `ai_addon` variant with
tooltip: *"Users with access to AI features like AI Bridge, Boundary, or
Tasks who are actively consuming a seat."*
- **Column visibility** gated behind
`useFeatureVisibility().ai_governance_user_limit`
## Validation
- Backend: dbauthz full method suite (`TestMethodTestSuite`) passes
including new `GetUserAISeatStates` test
- Backend: `TestGetUsers`, `TestUsersFilter`, CLI golden file tests pass
- Frontend: 7/7 tests pass across `UsersPage.test.tsx` and
`OrganizationMembersPage.test.tsx` (column visibility gating both
directions)
- `go build ./coderd/...` compiles clean
- `pnpm --dir site run lint:types` passes
- `make gen` clean
## Risks
- **Pagination performance**: The AI seat query is scoped to the current
page's user IDs (not a full table scan), keeping it efficient for
paginated views.
- **Semantic scope**: The workspace-side AI seat derivation uses "any
build with `has_ai_task = true`" rather than "latest build only". If the
product intent is latest-build-only, this can be tightened in a
follow-up.
---
_Generated with `mux` • Model: `anthropic:claude-opus-4-6` • Thinking:
`xhigh` • Cost: `$27.25`_
<!-- mux-attribution: model=anthropic:claude-opus-4-6 thinking=xhigh
costs=27.25 -->
Consolidates coderdtest invocations in 7 tests to reduce 23 instances to 7 across:
- `TestGetUser` (3 → 1) — read-only user lookups
- `TestUserTerminalFont` (3 → 1) — each creates own user via
CreateAnotherUser
- `TestUserTaskNotificationAlertDismissed` (3 → 1) — each creates own
user
- `TestUserLogin` (3 → 1) — each creates/deletes own user
- `TestExpMcpConfigureClaudeCode` (5 → 1) — writes to isolated temp dirs
- `TestOAuth2RegistrationTokenSecurity` (3 → 1) — independent
registrations
- `TestOAuth2SpecificErrorScenarios` (3 → 1) — independent error
scenarios
> 🤖 This PR was created with the help of Coder Agents, and has been
reviewed by my human. 🧑💻
The slices package provides type-safe generic replacements for the
old typed sort convenience functions. The codebase already uses
slices.Sort in 43 call sites; this finishes the migration for the
remaining 29.
- sort.Strings(x) -> slices.Sort(x)
- sort.Float64s(x) -> slices.Sort(x)
- sort.StringsAreSorted(x) -> slices.IsSorted(x)
Continuation of https://github.com/coder/coder/pull/23067
Add filtering to the paginated org member endpoint (pretty much the same
as what I did in the previous PR with group members, except there I also
had to add pagination since it was missing).
## Description
Blocks `CONNECT` tunnels to private and reserved IP ranges in
aibridgeproxyd, preventing the proxy from being used to reach internal
networks.
The Coder access URL is always exempt (hostname+port match) so the proxy
can reach its own deployment. It is possible to exempt additional ranges
via `CODER_AIBRIDGE_PROXY_ALLOWED_PRIVATE_CIDRS`.
DNS rebinding is handled differently per path:
* Direct (no upstream proxy): validate the resolved IP right before the
TCP dial, no window between check and connect.
* Upstream proxy: Resolves and checks before forwarding to the upstream
dialer. A small rebinding window exists since the upstream proxy
re-resolves independently.
## Changes
* Add blocked IP denylist covering private, reserved, and
special-purpose ranges
* Add `AllowedPrivateCIDRs` option with CLI flag and env var
* Wire IP checks into `proxy.ConnectDial` for both upstream and direct
paths
* Add tests for blocked/allowed cases across direct dial, upstream
proxy, CIDR exemptions, and CoderAccessURL exemption
Notes: documentation will be handled in a follow-up PR.
Closes: https://github.com/coder/security/issues/124
- Replace real healthcheck with mock `HealthcheckFunc` that returns a
canned report instantly
- Remove healthcheck cache-seeding goroutine/channel workaround
- Remove `HealthcheckTimeout: testutil.WaitSuperLong` (no longer needed)
- Reduce `setupCtx` from `WaitSuperLong` (60s) to `WaitLong` (25s)
The DERP healthcheck performs real network operations (portmapper
gateway probing, STUN) that hang for 60s+ on macOS CI runners. Since
`TestSupportBundle` validates bundle generation, not healthcheck
correctness, a canned report eliminates this entire class of flake.
Fixescoder/internal#272
> 🤖 This PR was created with the help of Coder Agents, and was reviewed
by my human. 🧑💻
Eliminates the timing flake in
`TestInterruptAutoPromotionIgnoresLaterUsageLimitIncrease` by making the
chatd worker loop clock-controllable.
## Changes
**`coderd/chatd/chatd.go`**
- Replace `time.NewTicker` calls in `Server.start()` with
`p.clock.NewTicker` using named quartz tags `("chatd", "acquire")` and
`("chatd", "stale-recovery")`.
**`coderd/chatd/chatd_test.go`**
- Inject `quartz.NewMock(t)` into the test via `newActiveTestServer`
config override.
- Trap the acquire ticker so the test controls exactly when pending
chats are reacquired.
- Rewrite the test flow as explicit clock-advance steps instead of
wall-clock polling.
**`AGENTS.md`**
- Document the PR title scope rule (scope must be a real path containing
all changed files).
## Validation
- `go test ./coderd/chatd -run
TestInterruptAutoPromotionIgnoresLaterUsageLimitIncrease -count=100` ✅
- `go test ./coderd/chatd` ✅
- `make lint` ✅
- Adds a new API endpoint `GET /api/v2/users/oidc-claims` that returns
only the **merged claims** (not the separate id_token/userinfo
breakdown). Scoped exclusively to the authenticated user's own identity
— no user parameter, so users cannot view each other's claims.
- Adds a new CLI command:** `coder users oidc-claims` that hits the
above endpoint.
- The existing owner-only debug endpoint is preserved unchanged for
admins who need the full claim breakdown.
> 🤖 This PR was created with the help of Coder Agents, and will be
reviewed by my human. 🧑💻
## Description
Implements the server-side merge logic for the `merge_strategy`
attribute added to `coder_env` in [terraform-provider-coder
v2.15.0](https://github.com/coder/terraform-provider-coder/pull/489).
This allows template authors to control how duplicate environment
variable names are combined across multiple `coder_env` resources.
Relates to https://github.com/coder/coder/issues/21885
## Supported strategies
| Strategy | Behavior |
|----------|----------|
| `replace` (default) | Last value wins — backward compatible |
| `append` | Joins values with `:` separator (e.g. PATH additions) |
| `prepend` | Prepends value with `:` separator |
| `error` | Fails the build if the variable is already defined |
## Example
```hcl
resource "coder_env" "path_tools" {
agent_id = coder_agent.dev.id
name = "PATH"
value = "/home/coder/tools/bin"
merge_strategy = "append"
}
```
## Changes
- **Proto**: Added `merge_strategy` field to `Env` message in
`provisioner.proto`
- **State reader**: Updated `agentEnvAttributes` struct and proto
construction in `resources.go`
- **Merge logic**: Added `mergeExtraEnvs()` function in
`provisionerdserver.go` with strategy-aware merging for both agent envs
and devcontainer subagent envs
- **Tests**: 15 unit tests covering all strategies, edge cases (empty
values, mixed strategies, multiple appends)
- **Dependency**: Bumped `terraform-provider-coder` v2.14.0 → v2.15.0
- **Fixtures**: Updated `duplicate-env-keys` test fixtures and golden
files
## Ordering
When multiple resources `append` or `prepend` to the same key, they are
processed in alphabetical order by Terraform resource address (per the
determinism fix in #22706).
## Summary
- add a hidden deployment config option for chat acquire batch size
(`CODER_CHAT_ACQUIRE_BATCH_SIZE` / `chat.acquireBatchSize`)
- thread the configured value into chatd startup while preserving the
existing default of `10`
- clamp the deployment value to the `int32` range before passing it into
chatd
- regenerate the API/docs/types/testdata artifacts for the new config
field
## Why
`chatd` currently acquires pending chats in batches of `10` via a
compile-time default. This change makes that batch size
operator-configurable from deployment config, so we can tune acquisition
behavior without another code change.
Fixes AIGOV-141
The `coder support bundle` command previously required admin permissions
(`Read DeploymentConfig`) and would abort entirely for non-admin
`member` users with:
```
failed authorization check: cannot Read DeploymentValues
```
This change makes the command **degrade gracefully** instead of failing
outright.
<details>
<summary>
Changes
</summary>
### `support/support.go`
- **`Run()`**: The authorization check for `Read DeploymentValues` is
now a soft warning instead of a hard gate. Unauthenticated users (401)
still fail, but authenticated users with insufficient permissions
proceed with reduced data.
- **`DeploymentInfo()`**: `DeploymentConfig` and `DebugHealth` fetches
now handle 403/401 responses gracefully, matching the existing pattern
used by `DeploymentStats`, `Entitlements`, and `HealthSettings`.
- **`NetworkInfo()`**: Coordinator debug and tailnet debug fetches now
check response status codes for 403/401 before reading the body.
### `cli/support.go`
- **`summarizeBundle()`**: No longer returns early when `Config` or
`HealthReport` is nil. Instead prints warnings and continues summarizing
available data (e.g., netcheck).
### Tests
- `MissingPrivilege` → `MemberNoWorkspace`: Asserts member users can
generate a bundle successfully with degraded admin-only data.
- `NoPrivilege` → `MemberCanGenerateBundle`: Asserts the CLI produces a
valid zip bundle for member users.
- All existing tests continue to pass (`NoAuth`, `OK`, `OK_NoWorkspace`,
`DontPanic`, etc.).
## Behavior matrix
| User type | Before | After |
|---|---|---|
| **Admin** | Full bundle | Full bundle (no change) |
| **Member** | Hard error | Bundle with degraded admin-only data |
| **Unauthenticated** | Hard error | Hard error (no change) |
Related to PRODUCT-182
## Summary
- add shared MCP annotation metadata to toolsdk tools
- emit MCP tool annotations from both coderd and CLI MCP servers
- cover annotation serialization in toolsdk, coderd MCP e2e, and CLI MCP
tests
## Why
- Coder already exposed MCP tools, but it did not populate MCP tool
annotation hints (`readOnlyHint`, `destructiveHint`, `idempotentHint`,
`openWorldHint`).
- Hosts such as Claude Desktop use those hints to classify and group
tools, so without them Coder tools can get lumped together.
- This change adds a shared annotation source in `toolsdk` and has both
MCP servers emit those hints through `mcp.Tool.Annotations`, avoiding
drift between local and remote MCP implementations.
## Testing
- Tested locally on Cladue Desktop and the tools are categorized
correctly.
<table>
<tr>
<td> Before
<td> After
<tr>
<td> <img width="613" height="183" alt="image"
src="https://github.com/user-attachments/assets/29d2e3fb-53bc-4ea7-bdb3-f10df4ef996b"
/>
<td> <img width="600" height="457" alt="image"
src="https://github.com/user-attachments/assets/cc384036-c9a7-4db9-9400-43ad51920ff5"
/>
</table>
Note: Done using Coder Agents, reviewed and tested by human locally
Introduce a three-way workspace sharing setting (none, everyone,
service_accounts) replacing the boolean workspace_sharing_disabled.
In service_accounts mode, only service account-owned workspaces can be
shared while regular members' share permissions are removed. Adds a
new organization-service-account system role with per-org permissions
reconciled alongside the existing organization-member system role.
Related to:
https://linear.app/codercom/issue/PLAT-28/feat-service-accounts-sharing-mode-and-rbac-role
---------
Co-authored-by: Steven Masley <Emyrk@users.noreply.github.com>
Co-authored-by: Kayla はな <mckayla@hey.com>
Adds a `--no-wait` flag (CODER_CREATE_NO_WAIT) to the create command,
matching the existing pattern in `coder start`. When set, the `coder
create` command returns immediately after the workspace creation API
call succeeds instead of streaming build logs until completion.
This enables fire-and-forget workspace creation in CI/automation
contexts (e.g., GitHub Actions), where waiting for the build to finish
is unnecessary. Combined with other existing flags, users can create a
workspace with no interactivity, assuming the user is already
authenticated.
This PR adds a `WatchAllWorkspaces` function with `watch-all-workspaces`
endpoint, which can be used to listen on a single global pubsub channel
for _all_ workspace build updates, and makes use of it in the autostart
scaletest.
This negates the need to use a workspace watch pubsub channel _per_
workspace, which has auth overhead associated with each call. This is
especially relevant in situations such as the autostart scaletest, where
we need to start/stop a set of workspaces before we can configure their
autostart config. The overhead associated with all the watch requests
skews the scaletest results and makes it harder to reason about the
performance of the autostart feature itself.
The autostart scaletest also no longer generates its own metrics nor
does it wait for all the workspaces to actually start via autostart. We
should update the scaletest dashboard after both PRs are merged to
measure autostart performance via the new metrics.
The new function/endpoint and its usage in the autostart scaletest are
gated behind an experiment feature flag, this is something we should
discuss whether we want to enable the endpoint in prod by default or
not. If so, we can remove the experiment.
---------
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Callum Styan <callum@coder.com>
_Disclaimer: implemented by a Coder Agent using Claude Opus 4.6._
Marks the injected MCP approach in AI Bridge as deprecated across the
codebase.
## Changes
- **`codersdk/deployment.go`**: Deprecated `ExternalAuthConfig.MCPURL`,
`.MCPToolAllowRegex`, `.MCPToolDenyRegex` fields; deprecated and hid the
`--aibridge-inject-coder-mcp-tools` server flag; deprecated
`AIBridgeConfig.InjectCoderMCPTools`.
- **`coderd/externalauth/externalauth.go`**: Deprecated `Config.MCPURL`,
`.MCPToolAllowRegex`, `.MCPToolDenyRegex`.
- **`enterprise/aibridgedserver/aibridgedserver.go`**: Added runtime
deprecation warning when `CODER_AIBRIDGE_INJECT_CODER_MCP_TOOLS` is
enabled; deprecated `getCoderMCPServerConfig`.
- **`enterprise/aibridged/mcp.go`**: Deprecated `MCPProxyBuilder`
interface and `MCPProxyFactory` struct.
- **`docs/ai-coder/ai-bridge/mcp.md`**: Added deprecation warning
banner.
OverrideVSCodeConfigs previously unconditionally set
`git.useIntegratedAskPass` and `github.gitAuthentication` to false,
clobbering any values provided by template authors via module settings
(e.g. the vscode-web module's settings block). This change only set
these keys when they are not already present, so template-provided
values are preserved.
Registry PR [#758](https://github.com/coder/registry/pull/758) fixed the
module side (run.sh merges template-author settings into the existing
settings.json instead of overwriting the file). But the agent still
unconditionally stamped false onto both keys before the script ran, so
the merge base always contained the agent's values and template authors
couldn't set them to anything else. This change fixes the agent side by
only writing defaults when the keys are absent.
WaitBuffer is a thread-safe io.Writer that supports blocking until
accumulated output matches a substring or custom predicate. It
replaces ad-hoc safeBuffer/syncWriter types and time.Sleep-based
poll loops in tests with signal-driven waits.
- WaitFor/WaitForNth/WaitForCond for blocking on output
- Replace custom buffer types in cli/sync_test.go and
provisionersdk/agent_test.go
- Convert time.Sleep poll loops to require.Eventually/require.Never
in cli/ssh_test.go, coderd/activitybump_test.go,
coderd/workspaceagentsrpc_test.go, workspaceproxy_test.go, and
scaletest tests
Removes `t.Parallel()` from `TestKeyring` and
`TestWindowsKeyring_WriteReadDelete`. The OS keyring is a shared system
resource that's flaky under concurrent access, especially Windows
Credential Manager in CI.
Fixescoder/internal#1370
- Adds `_API_BASE_URL` to `CODER_EXTERNAL_AUTH_CONFIG_`
- Extracts and refactors existing GitHub PR sync logic to new packages
`coderd/gitsync` and `coderd/externalauth/gitprovider`
- Associated wiring and tests
Created using Opus 4.6
The `start_with_dependencies` golden test was flaky on Windows CI. It
used `time.Sleep(100ms)` in a goroutine hoping the `sync start` command
would have time to call `SyncReady`, find the dependency unsatisfied,
and print the "Waiting..." message before the goroutine completed the
dependency.
On slower Windows runners, the sleep could finish and complete the
dependency before the command's first `SyncReady` call, so `ready` was
already `true` and the "Waiting..." message was never printed, causing
the golden file mismatch.
This replaces the `time.Sleep` with a `syncWriter` that wraps
`bytes.Buffer` with a mutex and a channel. The channel closes when the
written output contains the expected signal string ("Waiting"). The
goroutine blocks on this channel instead of sleeping, so it only
completes the dependency after the command has confirmed it is in the
waiting state.
Fixes https://github.com/coder/internal/issues/1376
_Disclaimer: implemented with Opus 4.6 and Coder Agents._
Follow-up to #22879.
## Problem
The `CODER_SESSION_TOKEN` guard added in #22879 blocks `coder login`
unconditionally when the env var is set. This conflicts with
`--use-token-as-session`, which intentionally uses the provided token
(including from the env var) directly as the session token.
## Fix
Add `&& !useTokenForSession` to the check so that `coder login
--use-token-as-session` still works when `CODER_SESSION_TOKEN` is set.
## Testing
Added `TestLogin/SessionTokenEnvVarWithUseTokenAsSession` — sets the env
var with a valid token and passes `--use-token-as-session`, verifying
login succeeds.
---------
Signed-off-by: Danny Kopping <danny@coder.com>
The `TestGitSSH/Local_SSH_Keys` test was flaking on Windows CI with a
context deadline exceeded error when calling `client.GitSSHKey(ctx)`.
Two issues contributed to the flake:
1. `prepareTestGitSSH` called `coderdtest.AwaitWorkspaceAgents` without
passing the caller's context. This created a separate internal 25s
timeout, wasting time budget independently of the setup context.
Changed to use `NewWorkspaceAgentWaiter(...).WithContext(ctx).Wait()`
so the agent wait shares the caller's timeout.
2. The `Local SSH Keys` subtest used `WaitLong` (25s) for its setup
context, but this subtest does more work than `Dial` (runs the
command twice). Bumped to `WaitSuperLong` (60s) to give slow
Windows CI runners enough time.
Fixescoder/internal#770