Adds `coder exp scaletest chat`, a harness for creating Coder Agents
chat load.
Start the mock LLM separately, prepare the scaletest workspaces you want
to target, then run the chat scaletest against the existing
`scaletest-*` fleet selected by the shared workspace targeting flags:
```sh
coder exp scaletest llm-mock --address 127.0.0.1:18080
coder exp scaletest chat --llm-mock-url http://127.0.0.1:18080/v1 --chats-per-workspace 10 --turns 1
coder exp scaletest chat --llm-mock-url http://127.0.0.1:18080/v1 --template docker --target-workspaces 0:10 --chats-per-workspace 1 --turns 10 --turn-start-delay 30s
```
This is the same pattern used by the `workspace-traffic` load generator.
Keeping the fake LLM as a separate process is intentional so it can be
scaled independently from the Coder deployment, which will likely be
necessary as we scale up and up.
This PR is the starting point: it provides the command, mock
provider/model bootstrap, existing workspace selection, chat streaming,
follow-up turns, metrics, and cleanup. Follow-up PRs will add multi-step
turns via tool calls. I'm still a bit iffy on the mechanism I have for
that. It'll likely involve having the runner send some magic strings
that the mock will recognise.
Relates to CODAGT-307
Relates to GRU-48
Relates to https://github.com/coder/scaletest/issues/124
Generated by Mux, but reviewed by a human
This fixes the flaky `TestSubscribeAfterMessageID` by seeding its chat
and messages directly, so the test no longer creates pending work that a
chat worker can pick up. The assertion now covers only the
`afterMessageID` subscription behavior, independent of chat processing
lifecycle timing.
Closes DEVEX-326
Closes https://github.com/coder/internal/issues/1489
When removing the `/` personal skill trigger, the popover content stayed
mounted during its close transition and briefly rendered the empty
skills state at the viewport origin.
This keeps the menu content mounted for stable Radix positioning,
preserves the last open menu state during the close transition, and adds
a Storybook regression for the backspace path.
> Mux is creating this PR on behalf of Mike.
## Description
`Provider.InjectAuthHeader` is no longer needed. With the addition of `KeyFailoverConfig` in #24920, authentication is now applied per-attempt by `KeyFailoverTransport` on passthrough routes. This PR removes the dead method from the `Provider` interface, all implementations (`Anthropic`, `OpenAI`, `Copilot`), and the test mock.
The orphaned `InjectAuthHeader` unit tests are replaced with `Test{Anthropic,OpenAI,Copilot}_KeyFailoverConfig`. `TestPassthrough_KeyFailover` is also extended to cover Copilot in the BYOK scenario.
Related to: https://linear.app/codercom/issue/AIGOV-334/aibridge-follow-ups-from-key-failover-prs
> [!NOTE]
> Initially generated by Claude Opus 4.7, modified and reviewed by @ssncferreira
## Description
Cleans up how key pool errors are represented and how they get turned into HTTP responses. Consolidates two error types into a single type with a kind tag, and gives the response helpers in both providers consistent names.
## Changes
- Replaced the keypool sentinel and transient error struct with one error type that carries a kind and a retry-after duration.
- Updated `KeyFailoverConfig.BuildKeyPoolResponse` to take the typed key pool error, so each provider can shape the exhaustion response in its own format.
- Removed the per-provider `MarkKey` callback from `KeyFailoverConfig` since providers can rely on the shared `MarkKeyOnStatus` helper.
- Renamed the response-error helpers so OpenAI and Anthropic use the same naming.
Related to: https://linear.app/codercom/issue/AIGOV-334/aibridge-follow-ups-from-key-failover-prs
> [!NOTE]
> Initially generated by Claude Opus 4.7, modified and reviewed by @ssncferreira
Documents the known race in `EventStream.IsStreaming()` and the
resulting flake in
`TestStreamingInterception_AgenticLoopFailover/agentic_all_keys_fail`,
accepted rather than fixed since the inner agentic loop is on track to
be removed as part of the reverse proxy migration in coder/aibridge#223.
Full reasoning in coder/internal#1524.
waitForTaskIdle used time.NewTicker(5s) which delays the first poll
by 5 seconds. Debugger tracing proved the failure mechanism: on slow
CI (Windows), the first poll at 5s sees "working" (idle patch has not
landed due to goroutine scheduling), needs poll #2 at 10s, but the
25s context expires before it fires.
Two changes:
1. Use r.clock.NewTicker (quartz) with time.Nanosecond initial
interval and Reset(5s) for immediate first poll. Tests inject a
mock clock via clitest.NewWithClock for deterministic control.
2. Rewrite WaitsForWorkingAppState test with quartz traps
(NewTicker + TickerReset) for deterministic synchronization
instead of racing goroutines. Fix PausedDuringWaitForReady
sync point.
Closes DEVEX-381
Adds a `test_image` job that runs `make gen`, `make fmt`, `make lint`, and `make build` inside the
newly built image via `docker run`. This helps detect breaking changes before merge.
> [!NOTE]
> Generated with [Coder Agents](https://coder.com/agents)
ContextPartsFromDir scans ~/.coder/skills via DefaultSkillsDir.
On machines with real skills installed, these leaked into test
results. Set HOME/USERPROFILE to temp dirs on the parent test
so subtests run in a clean environment.
TestPromoteQueuedWhileRunningRespectsMessageOrder was flaky because
it read queue state from the database immediately after PromoteQueued
returned. The active server worker drains queued messages concurrently,
so the DB read races the auto-promote pipeline (TOCTOU).
Instead of asserting intermediate queue state, wait for all three
promoted messages to appear in chat history and verify their relative
order (B before A before C). This asserts the same invariant (promote
reorders B to the front) without reading during the race window.
Closes CODAGT-384
handleProcessOutput read proc.output() then proc.info() using
separate locks. Between the two reads the exit goroutine could
finish I/O and set running=false, pairing stale output with final
status. On Windows CI this caused OutputExceedsBuffer to flake
when the buffer snapshot caught mid-write data (OmittedBytes=0)
but info reported the process as exited.
Swap the read order so info is read first. The exit goroutine
completes cmd.Wait (draining all pipe data) before setting
running=false, so seeing Running=false guarantees the subsequent
output read reflects the final buffer state.
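The ordering argument can be sketched as follows. The `proc` type, its fields, and `snapshot` are hypothetical and only model the two separate locks; the sketch assumes, as described above, that the exit path finishes all output writes before clearing `running`:

```go
package main

import "sync"

// proc is a hypothetical model of a process with separately locked
// status and output, mirroring the two reads in handleProcessOutput.
type proc struct {
	infoMu  sync.Mutex
	running bool

	outMu sync.Mutex
	buf   []byte
}

// write appends output; the exit path calls it before finish.
func (p *proc) write(b []byte) {
	p.outMu.Lock()
	defer p.outMu.Unlock()
	p.buf = append(p.buf, b...)
}

// finish marks the process as exited. It must only run after the last write.
func (p *proc) finish() {
	p.infoMu.Lock()
	defer p.infoMu.Unlock()
	p.running = false
}

// snapshot reads status first, then output. If it observes running=false,
// every write has already happened, so the output read is final. Reading
// output first could pair a mid-write buffer with the final status.
func (p *proc) snapshot() (running bool, output string) {
	p.infoMu.Lock()
	running = p.running
	p.infoMu.Unlock()

	p.outMu.Lock()
	output = string(p.buf)
	p.outMu.Unlock()
	return running, output
}
```

The opposite mismatch (Running=true paired with newer output) is still possible, but it is the benign case: the caller simply polls again.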
Closes CODAGT-399
The root cause of the TestPromoteQueuedWhileRequiresActionMixedTools
flake (CODAGT-425) was the subscriber out-of-order durable message
delivery bug, fixed by PR #25433 (ec1e861). All five CI failures
predate that fix. Zero failures since.
This change hardens the subscriber event-drain pattern in both
PromoteQueued requires_action tests: wrap the channel select in a
for-loop so interleaved non-target events (status, queue_update,
message_parts) are consumed in the same Eventually tick instead of
each burning a 25ms interval. This is defense-in-depth for slow CI
runners, not a standalone bug fix.
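A rough sketch of the drain pattern, assuming a buffered event channel. The `event` type and `drainFor` helper are hypothetical; the real tests select on the subscriber channel inside an Eventually callback:

```go
package main

// event is a hypothetical stand-in for a subscriber event.
type event struct {
	Kind string
}

// drainFor consumes every event currently buffered on ch and reports
// whether one of them has the wanted kind. It never blocks: when the
// channel is empty, the default case returns so the caller can retry on
// its next polling tick.
func drainFor(ch <-chan event, want string) bool {
	for {
		select {
		case ev, ok := <-ch:
			if !ok {
				return false // channel closed without the target event
			}
			if ev.Kind == want {
				return true
			}
			// Non-target event (status, queue_update, message_parts):
			// keep draining within the same tick.
		default:
			return false
		}
	}
}
```

Without the loop, each interleaved non-target event would cost one full polling interval before the target event is seen.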
Closes coder/internal#1523
Closes CODAGT-425
Adds `--env-file` to `scripts/develop.sh` to allow reading environment
from a given file. This makes it easier to configure things like external
auth providers, access URLs, and other dev-time settings without
exporting a wall of environment variables in every shell session.
> Generated with [Coder Agents](https://coder.com/agents)
Generic agent chat tool cards now render an `Input` section before the
existing output viewer, so MCP and workspace MCP tools expose the
arguments sent to the tool. Empty inputs stay hidden, model-intent
wrappers are stripped before display, and the formatted input is the
single source of truth for whether an input block renders.
Refs
https://linear.app/codercom/issue/CODAGT-260/show-mcp-tool-inputs-in-agent-chats
> Mux worked on this on Mike's behalf.
RFC: [Bridge ↔ Boundaries Correlation
RFC](https://www.notion.so/coderhq/Gateway-and-Firewall-Correlation-RFC-31ad579be592803aa8b3d48348ccdde9)
Add up/down migrations and matching sqlc queries for persisting Boundary
audit events, as specified in the Bridge/Boundaries Correlation RFC.
**Tables:**
- `boundary_sessions`: session metadata with `workspace_agent_id` FK,
`confined_process_name`, and timestamps (`started_at`, `updated_at`). ID
is externally supplied by the Boundary process (no DB-side default).
Created lazily when the first log for a session arrives.
- `boundary_logs`: individual audit events with `session_id` FK,
`sequence_number` (INT, primary ordering key), protocol/method/detail
fields, and `matched_rule` (nullable; non-NULL implies allowed).
**Indexes (per RFC):**
- `(session_id, sequence_number)` for the ordering query path
- `(captured_at)` for the retention purge path
**Queries:**
- `InsertBoundarySession` / `GetBoundarySessionByID`
- `InsertBoundaryLog` / `GetBoundaryLogByID`
- `ListBoundaryLogsBySessionID` with nullable `seq_after`/`seq_before`
exclusive bounds for fetching events between two known interception
sequence numbers
- `DeleteOldBoundaryLogs` with row limit to avoid long-running
transactions
**Also includes:** dbgen helpers (`BoundarySession`, `BoundaryLog`),
dbauthz implementations (reads gated on `ResourceAuditLog`, deletes on
`ResourceSystem`), and all generated wrappers (dbmock, dbmetrics).
No callers yet. A follow-up PR will add the dedicated `boundary_log`
RBAC resource type.
> Generated by Coder Agents
Allows an `api_key_id` to be passed from a trusted in-memory transport
(currently: `chatd`) to `aibridged` for use in authenticating LLM
requests.
This value can _only_ be passed via context, and all users of the
in-memory transport _must_ provide it.
It can be used in conjunction with BYOK headers.
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes #24183
## Changes
Drops `mx-auto` so README content left-aligns with the header. Bumps
padding from 24px to 32px and widens `max-w` from 800px to 860px for
breathing room.
Applied to both:
- `TemplateDocsPage.tsx`
- `StarterTemplatePageView.tsx`
> Generated with [Coder Agents](https://coder.com/agents)
Skips `TestSignalWakeSendMessage`, which flakes because the current
chatd control notification flow can deliver stale status notifications
after a new processing run starts.
This mirrors the existing CODAGT-353 skips for the same
stale-notification class and leaves the deterministic fix to that
notification-flow refactor.
Refs
https://linear.app/codercom/issue/ENG-2727/flake-testsignalwakesendmessage
> Generated by Coder Agents on behalf of @ibetitsmike.
Replace the brief runtime-behavior paragraph with a dedicated section
covering when env and file secrets appear in a workspace, what the
running workspace sees, and how create/update/delete propagate. Call out
that Coder never explicitly removes secret files it has written, so deleting a
secret or changing its file path may leave the previous file on disk.
Co-authored-by: Coder Agents <noreply@coder.com>
## Summary
- Compute mobile dropdown bottom offsets in layout-viewport coordinates,
matching the fixed Radix popover wrapper.
- Use `visualViewport.offsetTop` to clamp the above-composer popup
height when iOS WebKit pans the visual viewport for the soft keyboard.
- Align mobile dropdown width/left to the chat composer and add a
Storybook regression for shifted visual viewports.
## Testing
- `cd site && pnpm tsc --noEmit -p .`
- `cd site && pnpm test:storybook
src/pages/AgentsPage/components/ChatMessageInput/ChatMessageInput.stories.tsx`
- `cd site && pnpm lint`
## Manual mobile verification
Start dev mode with `./scripts/develop.sh`, open the forwarded port 8080
URL on a real iPhone in Safari and Chrome, focus the Agents chat input,
type `/`, and verify the personal skills popup appears directly above
the composer, stays within the visible viewport while the keyboard is
open, and scrolls internally for long lists.
Generated by Coder Agents.
GPT-class chat turns could eagerly create workspaces or repeat setup
such as cloning an existing repo because the system prompt framed setup
work as the default path.
This updates chatd prompt guidance and the `create_workspace` tool
description so agents reuse existing chat and workspace context, treat
injected workspace context as already read, avoid recloning present
repositories, and create or start workspaces only when workspace-backed
work is required. Delegated chats now report workspace needs to the
parent instead of trying to create one.
> Mux opened this PR on behalf of Mike.
We decided to remove secret requirements and go a different direction
for secrets in Coder (see PLAT-243). As a result, we removed the code in
terraform-provider-coder and coder/preview to handle this resource. This
PR pulls in said updated versions.
Generated with assistance by Coder Agents.
On mobile, typing `/` in the chat input could leave the personal-skills
popup partially clipped above the visible viewport. With the soft
keyboard open, Radix's collision detection flipped the caret-anchored
popup above the caret, and the resulting position pushed the top of the
list off-screen.
Add a `.mobile-full-width-dropdown-above-composer` CSS variant in
`site/src/index.css`, driven by a new
`--mobile-dropdown-above-composer-bottom` custom property set from the
existing composer geometry effect in `AgentChatInput.tsx`. The variant
pins the Radix popper wrapper to sit just above the chat input with the
same horizontal padding (`calc(100vw - 2rem)`), and caps `max-height` to
the space between the viewport top and the composer top so the inner
`CommandList` scrolls when the skill list overflows.
Apply the new classes to `PersonalSkillsTriggerMenu`'s `PopoverContent`.
Desktop behavior is unchanged: the new selectors only apply below the
`md` breakpoint, and the caret-anchored `PopoverAnchor` still drives
Radix positioning everywhere else.
Two new Storybook stories cover the mobile geometry:
`MobileAboveChatInput` asserts the popup stays inside the visible
viewport, and `MobileLongListScrolls` asserts the popup is scrollable
when the skill list is taller than the available space.
<details>
<summary>Implementation plan</summary>
The plan file lives at
`/home/coder/.coder/plans/PLAN-28f5e6ed-97dd-4375-a338-60fded8ef8b0.md`
in the agent workspace and was followed end-to-end without scope drift.
Key decisions:
- Did not reuse the existing `.mobile-full-width-dropdown-bottom`
because its formula (`window.innerHeight - composer.bottom`) aligns the
popup's bottom edge with the composer's bottom edge, which overlaps the
composer rather than sitting above it.
- Did not change the existing class's behavior because other dropdowns
(Plus menu, ContextUsageIndicator, ModelSelector, WorkspacePill,
CompactOrgSelector) rely on the current geometry. If the project decides
the overlap pattern is also a bug, those callsites can migrate to the
new variant in a separate change.
- Kept the caret-pinned `PopoverAnchor` span in
`PersonalSkillsTriggerMenu` because it still drives desktop positioning,
and on mobile the CSS overrides the wrapper position entirely (same
pattern as the existing `mobile-full-width-dropdown-bottom` usage).
- Left `CommandList`'s `max-h-72` in place so desktop still caps the
popup at ~18rem; on mobile the wrapper's CSS-driven `max-height` is the
binding constraint.
</details>
Generated by Coder Agents on behalf of @jaaydenh.
---------
Co-authored-by: Coder Agents <noreply@coder.com>
Relates to CODAGT-432
Adds three new search filters to the chat list endpoint (`GET
/api/experimental/chats/`):
- `pr:<number>` - exact PR number match
- `repo:<owner/repo>` - substring match against git remote origin or URL
- `pr_title:<text>` - case-insensitive PR title substring match
Includes SQL filter clauses (EXISTS against `chat_diff_statuses`),
parser with validation, handler wiring, unit tests, swagger annotation
update, and a new search syntax documentation page.
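For illustration, a simplified parser for the three filters might look like the sketch below. `chatFilter` and `parseChatQuery` are hypothetical names, and the whitespace-only tokenization (no quoted values) is an assumption, not the behavior of the real parser:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// chatFilter is a hypothetical holder for the parsed search terms.
type chatFilter struct {
	PR      int    // exact PR number, 0 when unset
	Repo    string // substring matched against the git remote origin or URL
	PRTitle string // substring matched case-insensitively against the PR title
}

// parseChatQuery splits q on whitespace and validates each key:value term.
func parseChatQuery(q string) (chatFilter, error) {
	var f chatFilter
	for _, term := range strings.Fields(q) {
		key, val, ok := strings.Cut(term, ":")
		if !ok || val == "" {
			return chatFilter{}, fmt.Errorf("invalid search term %q", term)
		}
		switch key {
		case "pr":
			n, err := strconv.Atoi(val)
			if err != nil || n <= 0 {
				return chatFilter{}, fmt.Errorf("pr must be a positive number, got %q", val)
			}
			f.PR = n
		case "repo":
			f.Repo = val
		case "pr_title":
			f.PRTitle = val
		default:
			return chatFilter{}, fmt.Errorf("unknown search key %q", key)
		}
	}
	return f, nil
}
```

A query such as `pr:42 repo:coder/coder` would then populate `PR` and `Repo` and leave `PRTitle` empty.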
> 🤖 Generated with [Coder Agents](https://coder.com/agents)
When the personal skills menu is open and the user clicks outside (e.g.
the send button), the Popover closes via `onOpenChange` but the
`SkillsTriggerPlugin`'s `dismissedTriggerRef` is not set. The next
Lexical update listener call detects the trigger again and briefly
reopens the menu, causing a visible flash.
Addresses this symptom:
https://github.com/user-attachments/assets/0c1442a2-df75-442b-bcf8-4b028dc647b0
Fix by recording the current trigger position in `dismissedTriggerRef`
when the `open` prop transitions from `true` to `false`. This mirrors
what the Escape key handler already does and prevents `refreshTrigger`
from immediately re-opening the menu at the same position.
<details><summary>Implementation details</summary>
- Added a `useLayoutEffect` in `SkillsTriggerPlugin` that tracks `open`
prop transitions via a `prevOpenRef`. When `open` goes from `true` to
`false`, it snapshots the current trigger position into
`dismissedTriggerRef`, matching the pattern the Escape handler uses
(lines 225-227).
- Added `OutsideClickDismissesTriggerOnRefocus` Storybook regression
story that verifies the menu stays closed when clicking back into the
editor after an outside-click dismissal.
</details>
---
*PR generated with Coder Agents*
Replace redundant matching Tailwind width and height utilities in
AgentsPage with the `size-*` shorthand. This addresses the AgentsPage
`react-doctor/design-no-redundant-size-axes` findings without changing
rendered dimensions.
Fixes: ENG-2719
Fixes the flake in
`TestSendMessageWithModelOverrideUpdatesLastModelConfigID` (and the same
pattern in `TestSubsequentSendWithoutOverrideUsesPersistedModel`).
> Generated with [Coder Agents](https://coder.com/agents)
AI Gateway is now enabled by default, so if the AI Gateway Proxy is also enabled, the server may start without any configured providers. Previously this blocked startup, which is unacceptable.
In an upstack PR we will handle reloading the providers at runtime, so the server needs to be able to start up even if it can't handle any proxy requests to AI Gateway.
This change was necessitated because if there are providers configured in the environment they need to be seeded _before_ the proxy starts.
Fixes CODAGT-311.
Users receive too many auto-archive notification emails because the
dbpurge loop runs every 10 minutes and archives chats on each tick using
timestamp-precise cutoffs, causing chats to trickle past the threshold
continuously.
Switch archive eligibility from timestamp arithmetic to date arithmetic
(UTC day boundaries). All chats whose last activity falls on the same
UTC date are now archived together on the first tick after midnight UTC,
reducing notification emails to ~at most~ probably one per day.
(Exception: if we hit the auto-archive limit)
- SQL compares `(last_activity AT TIME ZONE 'UTC')::date` against cutoff
date
- Go truncates current time to start-of-day before subtracting archive
days
- Tests verify date boundary semantics including late-activity and batch
edge cases
- Docs updated to describe UTC day boundary behavior and at-most-daily
notification cadence
> [!NOTE]
> Generated by Coder Agents
Normalize program names in shellparse.Parse to their basename.
Does not rely on filepath.Base because the server may run on either
Linux or Windows where the behavior would differ.
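One OS-independent way to do this is to treat both separators explicitly. The sketch below is an assumption about the approach, with a hypothetical `programBase` helper, not the code in shellparse:

```go
package main

import "strings"

// programBase returns the part of name after the last '/' or '\'.
// Unlike filepath.Base, the result does not depend on the OS the
// server was compiled for. Treating '\' as a separator on every
// platform is an assumption of this sketch.
func programBase(name string) string {
	if i := strings.LastIndexAny(name, `/\`); i >= 0 {
		return name[i+1:]
	}
	return name
}
```

With this helper, `/usr/bin/git` and `C:\tools\git.exe` normalize to `git` and `git.exe` on any host.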
Closes CODAGT-470
### TL;DR
Introduces an in-process `TransportFactory` for aibridge so that chatd (coder-agent LLM traffic) can route requests through the aibridged handler without crossing the HTTP route or requiring a license entitlement check.
### What changed?
- Added a new `coderd/aibridge` package with a `TransportFactory` interface and a `Source` type for tagging the call site on request contexts. `SourceAgents` is defined as the constant for coder-agent traffic.
- Implemented `NewTransportFactory` in `coderd/aibridged/transport.go`, which returns an `http.RoundTripper` that dispatches requests to the aibridged handler in-process. The response body is streamed through an `io.Pipe` so SSE/NDJSON/chunked responses propagate token-by-token. Handler panics are recovered and surfaced as 500 responses, and context cancellation closes the pipe with the appropriate error.
- `RegisterInMemoryAIBridgedHTTPHandler` now also constructs a `TransportFactory` from the registered handler and stores it on `API.AIBridgeTransportFactory` (an `atomic.Pointer`), making it available to chatd without going through the license-gated HTTP route.
- Added `API.AIBridgeTransportFactory` as a public `atomic.Pointer[aibridge.TransportFactory]` field on `coderd.API`.
### How to test?
- `coderd/aibridged/transport_test.go` covers: transport creation, nil-handler errors, source attachment to context, header/status passthrough, streaming (SSE-style chunked writes visible before handler completion), context cancellation closing the body with an error, concurrent requests, handler panics producing 500s, and handlers that return without writing.
- `coderd/aibridge_test.go` verifies that `AIBridgeTransportFactory` starts as nil on AGPL coderd, can be stored and loaded atomically, and that the stored factory correctly dispatches requests through the stub handler.
### Why make this change?
Chatd needs to send LLM requests through aibridge in-process rather than via the external HTTP route, which is license-gated. The `TransportFactory` abstraction provides a clean seam: the entitlement check remains on the HTTP route for external callers, while in-process coder-agent traffic bypasses it through the factory. The `Source` type allows downstream handlers and logs to attribute traffic without gating behavior on the caller identity.
My agent added `//nolint:testpackage` to a test file on one of my PRs.
Again. This PR cleans it up across the entire repo and updates the
in-repo conventions so future agents stop doing it.
The repo already has a precedent for white-box tests that need to touch
unexported symbols: `*_internal_test.go` (145+ existing files). The
`testpackage` linter's default `skip-regexp` exempts that filename
suffix, so the `//nolint:testpackage` directive is unnecessary in every
case where someone reached for it. This PR renames 51 such files to
`*_internal_test.go` via `git mv` so blame and history follow, and
strips the dead directive from 2 files that were already correctly named
(`coderd/oauth2provider/authorize_internal_test.go`,
`coderd/x/chatd/advisor_internal_test.go`).
`.claude/docs/TESTING.md` now documents the rule explicitly under *Test
Package Naming*, which is imported into the root `AGENTS.md` via
`@.claude/docs/TESTING.md`. The rule: prefer `package foo_test`; if you
need internal access, rename the file to `*_internal_test.go` rather
than adding a nolint directive.
`TestWatchAgentContainers/CoderdWebSocketCanHandleClientClosing` spent
about 15 seconds waiting for the real websocket heartbeat ticker to
detect that the client closed.
Add a clock-aware `HeartbeatClose` wrapper and pass `api.Clock` through
the containers watch handler so the test can drive the heartbeat
deterministically with `quartz.Mock`. The test still verifies the same
client-close teardown path, but it advances the heartbeat tick instead
of waiting for wall-clock time.
Refs #25557
Discovered as part of the work on CODAGT-381.
In order to allow Coder Agents to use AI Gateway in OSS, we need to rehome the `aibridged`-related code into the AGPL path.
The HTTP API is only registered under enterprise so will still require the AI Governance Add-on to be present in order to use it, whereas Coder Agents uses an in-memory pipe to the same handlers.
`CODER_AI_GATEWAY_ENABLED` / `CODER_AIBRIDGE_ENABLED` now defaults to `true` because Coder Agents will use it.
If you previously had this value disabled explicitly, that value will persist.
_Disclaimer: implemented by a Coder Agent using Claude Opus 4.7_
Part of the implementation of [RFC: Common AI Provider Configs](https://www.notion.so/coderhq/RFC-Common-AI-Provider-Configs-34bd579be59280ed958feffb82024797) (AIGOV-201).
## Note
This change can cause a previously working installation to fail to start should a conflict exist between the providers configured in the environment & those now migrated to the database.
I'll raise a PR upstack to document this process and workarounds should a startup fail.
## What this PR does
Reconciles environment-derived AI provider configuration with the `ai_providers` table at server startup. The seed runs **before** the aibridged daemon is initialized, so the runtime always reads providers from the database; the legacy `CODER_AIBRIDGE_*` environment variables become a one-shot migration source.
### Behavior
- Concurrent server starts are serialized through a Postgres advisory lock (`LockIDAIProvidersEnvSeed`).
- Missing rows are inserted with an audit entry attributed to the system actor.
- Existing rows whose canonical hash matches the env-derived hash are left alone (the common no-op restart path).
- Existing rows whose canonical hash does **not** match cause server startup to fail with a descriptive error so the operator can explicitly resolve the conflict in either env or DB.
- Soft-deleted rows are NOT resurrected from env; an explicit operator deletion is sticky across restarts.
- Indexed providers whose name conflicts with a legacy env var fail startup with a clear remediation message.
- Unknown provider types (e.g. `copilot`, until the DB enum is widened) are skipped with a log entry rather than failing startup.
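
The per-provider decision described above can be summarized in a small sketch. The types and names are hypothetical, and the sketch ignores the advisory lock, auditing, and the legacy-name conflict check:

```go
package main

// seedAction is a hypothetical label for what the seed does with one provider.
type seedAction string

const (
	seedInsert   seedAction = "insert"   // no row yet: insert with an audit entry
	seedNoop     seedAction = "noop"     // hashes match: common restart path
	seedConflict seedAction = "conflict" // hashes differ: fail startup
	seedSkip     seedAction = "skip"     // soft-deleted: deletion is sticky
)

// existingRow is a hypothetical view of an ai_providers row.
type existingRow struct {
	Hash    string // canonical hash computed from the row
	Deleted bool   // soft-delete marker
}

// decideSeed maps an existing row (nil when missing) and the env-derived
// hash to the action taken at startup.
func decideSeed(row *existingRow, envHash string) seedAction {
	switch {
	case row == nil:
		return seedInsert
	case row.Deleted:
		return seedSkip
	case row.Hash == envHash:
		return seedNoop
	default:
		return seedConflict
	}
}
```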
### Canonical hashing
The `canonicalAIProvider` shape captures exactly the fields that determine runtime behavior — `type`, `base_url`, and the Bedrock subset of settings (access key, access key secret, region, model, small fast model) — and is hashed with SHA-256. The hash is **computed on demand from the row + env**, never persisted, so the database does not need a new column for it. API keys live in the separate `ai_provider_keys` table and are intentionally excluded from the hash so operators can rotate keys via the API without forcing a server restart.
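
As a rough illustration, hashing such a shape could look like the sketch below. The struct layout, field names, and the choice of JSON as the serialization are assumptions; only the hashed field set and SHA-256 come from the description above:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
)

// bedrockSettings holds the Bedrock subset that affects runtime behavior.
type bedrockSettings struct {
	AccessKey       string `json:"access_key"`
	AccessKeySecret string `json:"access_key_secret"`
	Region          string `json:"region"`
	Model           string `json:"model"`
	SmallFastModel  string `json:"small_fast_model"`
}

// canonicalProvider is a hypothetical version of the canonical shape.
// API keys are deliberately absent so key rotation does not change the hash.
type canonicalProvider struct {
	Type    string           `json:"type"`
	BaseURL string           `json:"base_url"`
	Bedrock *bedrockSettings `json:"bedrock,omitempty"`
}

// hash serializes the shape and returns its hex-encoded SHA-256 digest.
// encoding/json emits struct fields in declaration order, so equal values
// always produce identical bytes and therefore identical hashes.
func (c canonicalProvider) hash() (string, error) {
	raw, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), nil
}
```

Computing the same hash from the database row and from the environment, then comparing the two strings, is enough to distinguish the no-op restart path from a conflict.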
<details>
<summary>Decision log</summary>
- The hash is intentionally not persisted in the database. The RFC discussed this trade-off; computing on demand keeps the schema minimal and lets the canonical shape evolve without a migration.
- The lock uses an `iota` slot in `coderd/database/lock.go` rather than `GenLockID` so it's stable, easy to audit, and matches the convention used for every other startup lock.
- A bearer-token Anthropic provider whose env vars also set Bedrock metadata but no AWS credentials does NOT store the Bedrock fields. Without credentials the discriminated settings would misrepresent the row as Bedrock auth.
- We deliberately do NOT publish to the `ai_providers_changed` pubsub channel from the seed because the seed completes before any subscriber is started; the follow-up PR introduces that channel.
</details>