Wire DERPTLSConfig through the CLI, SDK, tailnet, VPN client, agent, and
health checks to allow custom TLS configuration for DERP connections.
The main use case is to be able to set a custom CA and also present
client certs (mTLS). See https://github.com/coder/tailscale/pull/105 for
related changes.
Adds three new global CLI flags:
- `--client-tls-ca-file` / `CODER_CLIENT_TLS_CA_FILE`
- `--client-tls-cert-file` / `CODER_CLIENT_TLS_CERT_FILE`
- `--client-tls-key-file` / `CODER_CLIENT_TLS_KEY_FILE`
Based on community PR #22695 by @ibdafna, with autogeneration issues
fixed (protobuf version mismatches in .pb.go files, golden file
regeneration, lint fixes).
> [!NOTE]
> This PR was authored by Coder Agents on behalf of a Coder team member.
<details>
<summary>Relationship to #22695</summary>
This is a clean reimplementation of the changes from #22695 on top of
current `main`, with the following differences:
- **Removed**: Accidental protobuf version changes in `.pb.go` files
(contributor had `protoc v6.33.4` vs project's `protoc v4.23.4`)
- **Added**: Properly regenerated golden files and docs via `make gen`
- **Fixed**: Lint issue (`var-declaration` revive warning on explicit
type in `createHTTPClient`)
- All meaningful code changes are identical to the original PR
</details>
Adds a coder secret command group for managing user secrets from the
CLI, with create, update, list, and delete subcommands backed by the
existing user secret API.
This branch adds CLI test coverage and refreshes the generated help
output and CLI reference docs for the new command group.
Relates to https://linear.app/codercom/issue/CODAGT-103
- Add `keywords={[option.label]}` to `CommandItem` in
`MultiSelectCombobox` so cmdk's default filter matches against the
visible label text, not just the value (UUID)
- Extend `OpenCombobox` story with type-to-filter assertions
- Add "search filters by display name" step to `TemplateAllowlist` story
> 🤖
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
> Mux working on behalf of Mike.
## Summary
- add an enabled chat model config lookup by ID for internal callers
- keep `spawn_agent` unchanged while threading an internal model
override through child subagent chat creation
- extend chatd coverage for inherited bindings, plan mode, and internal
override behavior
## Validation
- `go test ./coderd/x/chatd ./coderd/database/dbauthz`
- `make lint`
Two `InsertChatParams` blocks in `startworkspace_test.go` were missing
the `ClientType` field. Since the `chat_client_type` enum column is `NOT
NULL`, Postgres rejects the Go zero value (`""`), causing
`TestStartWorkspace` subtests `StoppedWorkspaceReportsAutoUpdate` and
`ManualUpdateRequired` to fail with:
```
pq: invalid input value for enum chat_client_type: ""
```
Closes https://github.com/coder/internal/issues/1471
## Problem
When a template has `require_active_version` enabled and the chat agent
tries to start a workspace that is stopped on an older template version,
the agent gets stuck in an infinite loop: `start_workspace` fails with a
403 (the old version is not the active version and the user lacks
`ActionUpdate` on the template), then `create_workspace` sees the
existing stopped workspace and tells the agent to use `start_workspace`,
repeat forever.
The root cause is that `chatStartWorkspace()` passes the start build
request through without setting `TemplateVersionID`, so `wsbuilder`
defaults to the previous build's template version — which RBAC rejects
when `RequireActiveVersion` is true.
## Fix
In `chatStartWorkspace()` (`coderd/exp_chats.go`), when the template's
access control has `RequireActiveVersion` enabled, explicitly set
`req.TemplateVersionID` to `template.ActiveVersionID` before calling
`postWorkspaceBuildsInternal()`. This mirrors how the autobuild executor
handles the same scenario (`coderd/autobuild/lifecycle_executor.go`).
If the new active version introduces required parameters that cannot be
resolved automatically (no defaults, no previous values), the build
fails at parameter validation before a provisioner job is created. In
that case, a clear error message tells the user to update and start the
workspace from the UI instead of surfacing a raw internal error.
On successful auto-update, the tool response includes
`updated_to_active_version`, `update_reason`, and a human-readable
`message` so the model can explain to the user what happened.
<img width="782" height="122" alt="image"
src="https://github.com/user-attachments/assets/289430d6-066e-41cf-bc97-cd013dcf717d"
/>
### Changes
- **`coderd/exp_chats.go`**: `chatStartWorkspace()` loads the template,
checks `RequireActiveVersion` via `AccessControlStore`, and pins the
build to the active version when required. New
`isChatStartWorkspaceManualUpdateRequiredError()` classifies parameter
validation failures from both the dynamic parameters path
(`DiagnosticError`) and the classic path (`ErrParameterValidation`
sentinel).
- **`coderd/wsbuilder/wsbuilder.go`**: New `ErrParameterValidation`
sentinel error, wrapped into the classic parameter validation
`BuildError` so callers can use `errors.Is` instead of string matching.
- **`coderd/x/chatd/chattool/startworkspace.go`**:
`waitForAgentAndRespond` now returns `map[string]any` instead of
`fantasy.ToolResponse`, letting the caller annotate the result (e.g.
auto-update metadata) before converting. Error handling for `StartFn`
checks for `httperror.Responder` errors to surface clean messages for
the manual-update case.
- **`coderd/x/chatd/chattool/startworkspace_test.go`**: Two new tests —
`StoppedWorkspaceReportsAutoUpdate` (verifies auto-update fields in
response) and `ManualUpdateRequired` (verifies clean error message
without internal wrapping).
### Follow-up
The manual-update error message could include a direct link to the
workspace settings page, but the chattool layer does not currently have
access to the deployment's access URL. Plumbing it through is
straightforward but out of scope for this fix.
Closes CODAGT-192
Replace the old `InTx` ruleguard rule in `scripts/rules.go` with a
custom in-tree `go/analysis` analyzer under `scripts/intxcheck/`. The
new analyzer catches the same direct and pass-through misuse classes as
before, plus two new classes the pattern-matcher couldn't reach:
- **Indirect same-package helper misuse** — flags `p.someHelper(ctx)`
inside `InTx` when the helper body uses the outer store (the PR #24369
bug class).
- **Nested dangerous closures** — descends into `go func() { ... }()`,
`defer func() { ... }()`, and immediately-invoked function literals.
The analyzer uses semantic `types.Object` identity instead of raw
expression string comparison, which avoids false positives from
closure-local shadowing and catches simple aliases like `outer := s.db`
and `alias := s`.
This PR also fixes three real outer-store-inside-transaction bugs the
new analyzer surfaced:
- `coderd/wsbuilder/wsbuilder.go`: `FindMatchingPresetID` and
`getWorkspaceTask` now use the inner transaction store instead of
`b.store`.
- `enterprise/dbcrypt/dbcrypt.go`: `ensureEncrypted` now calls
`s.InsertDBCryptKey` (the tx-wrapped store) instead of
`db.InsertDBCryptKey`. The `dbCrypt.InTx` method wraps the raw tx in a
new `*dbCrypt`, so `s.InsertDBCryptKey` still dispatches through the
encryption layer.
Two call sites need `// intxcheck:ignore` suppressions. Both are one-off
patterns that only look like misuse because the analyzer doesn't track
assignments — proving them safe would require full dataflow analysis,
which is well beyond what a targeted lint like this should attempt:
- `coderd/database/dbfake/dbfake.go` — `b.db` is reassigned to `tx` on
the preceding line, so `b.doInTX()` actually uses the transaction. The
analyzer sees the original `b.db` identity and flags it.
- `coderd/database/db_test.go` — test intentionally passes the outer
store to `require.Equal` to assert that nested `InTx` returns the same
handle.
Suppressions use `// intxcheck:ignore` instead of `//nolint:intxcheck`
because `intxcheck` runs as a standalone `go/analysis` tool outside
golangci-lint. golangci-lint's `nolintlint` checker flags `//nolint`
directives for linters it doesn't control, so we use a custom comment
prefix to avoid that conflict.
Add a `chat_client_type` enum (`ui` | `api`) and `client_type` column to
the `chats` table. The column defaults to `api` for new rows so API
callers don't need to set it explicitly. Existing rows are backfilled to
`ui`.
The field flows through `CreateChatRequest`, `chatd.CreateOptions`,
`InsertChat`, and is returned in the `Chat` response via `db2sdk`.
<details>
<summary>Implementation notes (Coder Agents generated)</summary>
### Changes
**Database migration (000469)**
- New enum `chat_client_type` with values `ui`, `api`.
- New `client_type` column, `NOT NULL DEFAULT 'api'`.
- Backfill: `UPDATE chats SET client_type = 'ui'`.
**SQL query** — `InsertChat` now includes `client_type`.
**SDK** — `ChatClientType` type added; `ClientType` field added to both
`CreateChatRequest` (optional, defaults server-side to `api`) and `Chat`
response.
**Handler** — `postChats` maps the request field (defaulting to `api`)
and passes it through `chatd.CreateOptions`.
**Sub-agent** — Child chats inherit their parent's `client_type`.
**db2sdk** — Maps the database value to the SDK type.
### Decision log
- Default is `api` (not `ui`) so existing API integrations get the
correct value without code changes.
- Backfill sets existing rows to `ui` per requirement.
- Child chats inherit `client_type` from parent rather than defaulting.
</details>
Relates to https://github.com/coder/internal/issues/1455
From that issue:
> Going to skip this test until the underlying race in chatd is fixed.
https://github.com/coder/coder/pull/24279 was a band-aid fix that I no
longer think is valuable pursuing short term. Hugo is working on a RFC
for a redesign of the system to prevent the class of race condition into
the future.
Dependabot's npm updater now ships Node.js v24.14.1 (Active LTS
"Krypton"). The `engines.node` field in `site/package.json` and
`offlinedocs/package.json` restricted to `>=18.0.0 <23.0.0`, causing
`ERR_PNPM_UNSUPPORTED_ENGINE` failures when Dependabot tried to update
packages (e.g. the `axios` security update).
Widens the upper bound to `<25.0.0` so Node.js 24.x is accepted. The
project itself continues to use Node 22 via `flake.nix`.
Reference:
https://github.com/coder/coder/actions/runs/24482279340/job/71549366110
> [!NOTE]
> This PR was authored by Coder Agents.
Agent log tabs could spill over the Copy/Download area, and the overflow
kebab sometimes never wired up because ResizeObserver ran with no node
or while inactive. This ties the hook to when the strip is real and
clips the tab column.
- `AgentRow`: `enabled` for `useKebabMenu` is `hasStartupFeatures &&
hasAnyLogs && showLogs`; `hasAnyLogs` is computed before the hook; tab
list wrapper gets `overflow-hidden`.
- `useKebabMenu`: attach `ResizeObserver` in `useLayoutEffect`; bail if
missing container or `!enabled || !isActive`; deps include `enabled` and
`isActive`.
- `TabsList` (`overflowKebabMenu`): add `min-w-0 w-full max-w-full` with
`flex-nowrap` so the list stays inside the flex column.
This pull-request ensures that we don't rely on MUI for something that
doesn't need to be MUI-specific (hurrah!)
The issues with the accessibility persist however.
> This PR was authored by Mux on behalf of Mike.
## Summary
Adds support for multiple peer root workspace agents sharing the same
`auth_instance_id`, so AWS, Azure, and GCP instance-identity auth can
issue the correct session token for a selected agent instead of assuming
a
single root agent per instance.
## Problem
When a Terraform template attaches two or more `coder_agent` resources
(with `auth = "aws-instance-identity"`) to a single compute instance,
every agent shares the same cloud instance ID. The existing singular
lookup picks whichever agent was created most recently, silently
ignoring
the others.
## Solution
Introduce an optional pre-auth agent selector (`CODER_AGENT_NAME`) and
make the server-side lookup ambiguity-aware.
**Database layer:**
- `GetWorkspaceAgentsByInstanceID` (`:many`): returns all matching root
agents for an instance ID.
- `GetWorkspaceAgentByInstanceIDAndName` (`:one`): returns the named
root
agent for disambiguation.
**SDK and CLI:**
- `agent_name` field added to AWS, Azure, and GCP request structs
(`omitempty` for backward compatibility).
- `CODER_AGENT_NAME` env var and `--agent-name` flag wired into the
agent
bootstrap before instance-identity auth runs.
**Server handler (`handleAuthInstanceID`):**
- When `agent_name` is present: direct lookup by (instance ID, name).
- When absent: legacy lookup, then resource-scoped ambiguity check.
Returns 409 with available agent names if multiple root agents match.
- Whitespace-only names are trimmed and treated as unspecified.
- Sub-agents remain excluded (`parent_id IS NULL` filter).
**Verification template:**
- `examples/templates/aws-multi-agent/` provisions one EC2 instance with
two agents (`main` and `dev`), both using instance-identity auth with
`CODER_AGENT_NAME` set in the cloud-init user data.
## Backward compatibility
Existing single-agent deployments work unchanged. The `agent_name` field
is optional with `omitempty`, and the unnamed path preserves today's
behavior when only one root agent matches.
> This PR was authored by Mux on behalf of Mike.
## Summary
- add persistent plan mode for chats and the chat-specific plan file
flow
- add structured planning tools such as `ask_user_question` and
`propose_plan`
- keep `write_file` and `edit_files` constrained to the chat-specific
plan file during plan turns
- allow shell exploration in plan mode, including subagents, via
`execute` and `process_output`
- block implementation-oriented, provider-native, MCP, dynamic, and
computer-use tools during plan turns
- update the chat UI, tests, and docs for the new planning flow
The `coderd_chatd_message_count` histogram's current max bucket of 128
is being hit in production. This increases the exponential bucket count
from 8 to 11, extending coverage from `1..128` to `1..1024`.
Before: `1, 2, 4, 8, 16, 32, 64, 128`
After: `1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024`
Co-authored-by: blink-so[bot] <211532188+blink-so[bot]@users.noreply.github.com>
Context files (AGENTS.md) and skills were only fetched from the
workspace on the first turn or when the agent changed. On subsequent
turns, stale content from persisted messages was used. This meant that
if AGENTS.md or skills were modified on the workspace between turns, the
agent wouldn't see the changes until the user created a new chat.
## Changes
- Extract `fetchWorkspaceContext` from `persistInstructionFiles` to
allow fetching workspace context without persisting
- On subsequent turns, re-fetch fresh context from the workspace instead
of reading stale persisted content; falls back to persisted messages if
the workspace dial fails
- Update `ReloadMessages` callback to re-derive instruction and skills
from reloaded database messages after compaction, instead of using
captured closure variables
- Add `formatSystemInstructionsFromParts` helper to build system
instructions directly from agent parts without requiring separate
OS/directory params
- Add tests for the new helper
<details><summary>Implementation Notes</summary>
### Root cause
In `runChat`, the `else if hasContextFiles` branch (subsequent turns)
called `instructionFromContextFiles(messages)` which read stale content
from persisted DB messages. The `ReloadMessages` callback
(post-compaction) also used captured `instruction`/`skills` closure
variables from the start of the turn, never re-deriving them.
### Approach
1. **Extract `fetchWorkspaceContext`** — Pure refactor of the fetch-only
part of `persistInstructionFiles` (agent connection, context config
retrieval, content sanitization, metadata stamping). Returns parts +
skills without persisting.
2. **Subsequent turns**: Instead of reading from persisted messages,
launch a `g2` goroutine that calls `fetchWorkspaceContext` to get fresh
context from the workspace. Falls back gracefully to persisted messages
if the workspace is unreachable.
3. **ReloadMessages**: Re-derive `instruction` from
`instructionFromContextFiles(reloadedMsgs)` and `skills` from
`skillsFromParts(reloadedMsgs)` using the freshly loaded messages, with
fallback to captured values if the reloaded messages don't contain
context (e.g. compacted away).
</details>
> 🤖 Generated by Coder Agents
## Summary
Adds `--ai-gateway-allow-byok` deployment option to control whether
users can use Bring Your Own Key (BYOK) mode with AI Gateway.
When disabled (`--ai-gateway-allow-byok=false`), BYOK requests are
rejected with a 403 and a message directing the admin to enable the
flag. Centralized key authentication works regardless of this setting.
Defaults to `true` (BYOK allowed).
---------
Co-authored-by: Danny Kopping <danny@coder.com>
Removes the claim that users can fork a chat to explore a different
direction — this is not a supported feature and the reference is
misleading.
---
*PR generated with Coder Agents*
## Summary
Fixes the "See all templates" link overlapping template items in the New
Workspace dropdown.
## Root cause
Two compounding issues:
1. **`OverflowY` className was being overwritten, not merged.** The
component spread `...attrs` (which included the caller's `className`)
onto the div, silently replacing its own base classes
(`overflow-y-auto`, `shrink`, `w-full`). This meant the template list
never scrolled independently.
2. **`PopoverContent` has `overflow-y-auto` in its base styles.** With
the inner `OverflowY` not scrolling, the *entire popover* became the
scroll container. The "See all templates" footer was part of that
scrollable flow and overlapped template rows as the user scrolled.
## Fix
- **`OverflowY`**: Destructure `className` explicitly and merge it with
the base classes using `cn()` so `overflow-y-auto` and `shrink` are
always preserved.
- **`PopoverContent`**: Add `overflow-hidden flex flex-col` to make it a
non-scrolling flex container. Only the `OverflowY` child scrolls.
- **`OverflowY` usage**: Add `min-h-0` so the flex child can shrink
below its content size when the popover's available height is
constrained.
## Screenshot
<img width="1460" height="1072" alt="image"
src="https://github.com/user-attachments/assets/9b519f2d-9806-44ca-a354-12248de36952"
/>
> 🤖 Generated with [Coder Agents](https://coder.com/agents)
## Problem
`resolveDeploymentSystemPrompt` was called inside `InTx` closures in
both `CreateChat` (`coderd/x/chatd/chatd.go`) and
`createChildSubagentChatWithOptions` (`coderd/x/chatd/subagent.go`).
That method uses `p.db` (the root store) internally to call
`GetChatSystemPromptConfig`, which requires a second DB pool checkout
while the transaction already holds one connection.
Under concurrent chat creation load (e.g., the chat scaletest at 4800
chats), this causes pool starvation: every in-flight create holds one
connection and blocks waiting for another, leading to `idle in
transaction` pileups and cascading timeouts across the entire coderd DB
pool — including unrelated background work like prebuild metrics and the
chat acquire loop.
## Fix
Move the `resolveDeploymentSystemPrompt` call before `p.db.InTx(...)` in
both call sites. The system prompt config is a read-only
deployment-level setting that does not need transactional consistency
with the chat insert, so fetching it before the transaction is both safe
and preferable (it also shortens transaction lifetime).
## Backporting
The `CreateChat` instance of this bug is also present on `release/2.32`
(`coderd/x/chatd/chatd.go` line 907). The `subagent.go` instance is not
— the child-subagent-chat creation path with its own `InTx` was added
after the branch cut.
This should be backported, but because this is only in the chat creation
path, and that's not typically hit with a great deal of concurrency in
the real world, I don't think an urgent patch for 2.32 is necessary.
## Lint gap
The existing `InTx` ruleguard rule in `scripts/rules.go` catches direct
outer-store usage (`p.db.GetFoo()`) and passing the outer store as a
function argument inside `InTx` closures, but it explicitly cannot catch
indirect access through receiver methods like
`p.resolveDeploymentSystemPrompt()` — the rule documents this blind spot
at line 273. Catching this class of bug would require interprocedural
analysis (following the callee's body to see if it touches `p.db`),
which is beyond what ruleguard's AST pattern matching can express. We're
considering a lightweight custom `go/analysis` analyzer (similar to
`paralleltestctx`) that does 1-level same-package callee inspection to
detect this pattern. In the meantime, this PR adds guidance to
`AGENTS.md` so AI reviewers can flag the pattern during code review.
This change reuses the authenticated subject's existing organization
membership information during chat creation instead of issuing an
`OrganizationMembers` query.
The current query is still correct, so this is not required for
correctness. However, `workspaceapps` already answers the same question
more cheaply from the request's RBAC subject. This extracts that logic
into `rbac.Subject.HasOrganizationMembership` and reuses it in both
places, removing an extra database lookup from chat creation without
changing the authorization behavior.
I'm currently debugging a Coder agents scaletest regression where a run
on April 2, 2026 with 4800 concurrent chat creations passed, while the
same run on April 15, 2026 does not. We could stagger chat creation to
reduce the burst, but I'd rather understand why this bottleneck appeared
in the first place so we can keep making small hot-path improvements
like this one instead of only smoothing over the symptom.
## Summary
Follows up on https://github.com/coder/coder/pull/24032
Adds a BYOK compatibility table to the AI Gateway client configuration
page, showing which clients support personal API keys and provider
subscriptions through AI Gateway.
We can simplify by merging related columns:
- Personal API Key (OpenAI) and Personal API Key (Anthropic) → Personal
API Key
- ChatGPT Subscription and Claude Subscription → Subscription (Claude
Pro/Max, ChatGPT Plus/Pro)
`NOTE`: This is displayed immediately after the existing Compatibility
table.
<img width="864" height="474" alt="image"
src="https://github.com/user-attachments/assets/644c5a7c-a9fe-454c-9112-3e3db268afc8"
/>
OSV-Scanner currently causes the scheduled security workflow to fail
whenever it reports findings, which also triggers the Slack failure
notification.
Treat exit code 1 as a successful scan with uploaded SARIF results, and
only fail the job when OSV-Scanner itself errors.
_Disclaimer: produced by Claude Opus 4.6_
Adds a `coder_build_info` metric which allows operators to see which
versions of Coder are currently running.
---------
Signed-off-by: Danny Kopping <danny@coder.com>