Coder runs all migrations in a single transaction (`pgTxnDriver`).
Postgres forbids using an enum value added by `ALTER TYPE ... ADD VALUE`
within the same transaction that added it. Migration `000499` widened
`ai_provider_type` with `ADD VALUE`, and `000504` casts existing
`chat_providers` rows to that enum in the same transaction. On
deployments with a legacy provider using one of the new values (for
example `openai-compat`), the batch failed with `unsafe use of new
value` and the server could not start.
Recreate the type (create a new enum, alter the column, drop and rename)
instead of using `ADD VALUE`, matching the existing precedent in
`000144_user_status_dormant`. A freshly created enum's values are usable
immediately in the same transaction, so the cast in `000504` succeeds.
The resulting schema is identical, so `make gen` produces no `dump.sql`
diff and databases that already applied these migrations see no drift.
Added a regression test that seeds an `openai-compat` provider and
applies `000499` through `000504` in a single transaction, reproducing
the production path. The per-step `Stepper` used by the other migration
tests commits each migration separately and cannot surface this class of
bug.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Signed-off-by: Danny Kopping <danny@coder.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implements https://linear.app/codercom/issue/AIGOV-285
Follow the structure established in
https://github.com/coder/coder/pull/25203
## Summary
Adds the `user_ai_budget_overrides` table and CRUD API at
`/api/v2/users/{user}/ai/budget`. An override sets a custom per-user
spend cap that supersedes group-budget resolution, attributing spend to
a specific group.
## Schema
```sql
CREATE TABLE user_ai_budget_overrides (
user_id UUID PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE,
group_id UUID NOT NULL REFERENCES groups(id) ON DELETE CASCADE,
spend_limit_micros BIGINT NOT NULL CHECK (spend_limit_micros >= 0),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```
## Membership lifecycle
The membership invariant — a user must be a member of the attributed
group, including when that group is "Everyone" — would naturally be
expressed as a composite FK on `(user_id, group_id) →
group_members_expanded(user_id, group_id)`. PostgreSQL doesn't allow
foreign keys to reference views, so enforcement is split across two
mechanisms:
- **Write-time check.** A CHECK constraint on the table
(`user_ai_budget_overrides_must_be_group_member`) calls a `STABLE`
function `is_group_member(user_id, group_id)` that queries
`group_members_expanded`. The view surfaces both regular group
memberships and the implicit "Everyone" group memberships from
`organization_members`. Any INSERT or UPDATE that violates the predicate
is rejected with a Postgres `check_violation`, which the handler maps to
a 400. `is_group_member` is defined as a general predicate, reusable by
any future table that needs the same check.
- **Cascade on removal.** Two `BEFORE DELETE` triggers handle membership
loss:
- `trigger_delete_user_ai_budget_overrides_on_group_member_delete` on
`group_members` — covers regular group removals (admin action, OIDC
sync).
- `trigger_delete_user_ai_budget_overrides_on_org_member_delete` on
`organization_members` — covers the "Everyone" group, whose membership
lives in `organization_members`.
The single-column FKs on `users(id)` and `groups(id)` remain to cascade
on user or group deletion (those paths don't pass through
`group_members`).
## Authorization
The dbauthz layer gates each operation against the `User` and (for
writes) `Group` resources:
| Operation | User resource | Group resource |
|-----------|----------------|----------------|
| `GET` | `ActionRead` | — |
| `PUT` | `ActionUpdate` | `ActionUpdate` |
| `DELETE` | `ActionUpdate` | `ActionUpdate` |
For `DELETE`, the dbauthz layer fetches the existing override first to
learn the attributed `group_id`, then runs both checks.
### Role matrix
| Role | GET | PUT | DELETE |
|--------------|-----|-----|--------|
| Owner | ✅ | ✅ | ✅ |
| UserAdmin | ✅ | ✅ | ✅ |
| OrgAdmin | ✅ | ❌ | ❌ |
| OrgUserAdmin | ✅ | ❌ | ❌ |
Internal discussion:
https://codercom.slack.com/archives/C096PFVBZKN/p1779392747885359
## Audit logs
Audit logs will be addressed in a follow-up PR.
## Problem
Centralized requests recorded *the first available key from the pool at
`CreateInterceptor` time* as `credential_hint`, so the interception
could be persisted in the database with a hint that didn't match the key
that actually served the request. The fix consists in storing, at
end-of-interception, the hint of the key that succeeded, or the last
attempted key if all keys are unavailable.
## Changes
- Add `Key.Hint()` and update `credential_hint` on every failover
attempt so it reflects the actually-used key.
- Stop pre-populating `credential_hint` at `CreateInterceptor`.
Centralized starts empty and is updated by the key failover loop.
- Persist the final hint via `RecordInterceptionEnded`; SQL updates
`credential_hint` only when `credential_kind = 'centralized'` so BYOK
keeps its start-time value.
- Log the actually-used hint on interception end/failure; start log uses
a `<keypool-pending>` placeholder for centralized.
> [!NOTE]
> Initially generated by Claude Opus 4.7, modified and reviewed by
@ssncferreira
RFC: [Bridge ↔ Boundaries Correlation
RFC](https://www.notion.so/coderhq/Gateway-and-Firewall-Correlation-RFC-31ad579be592803aa8b3d48348ccdde9)
Register a dedicated `boundary_log` RBAC resource type with `create`,
`read`, and `delete` actions, replacing the placeholder
`rbac.ResourceAuditLog` and `rbac.ResourceSystem` references previously
used in the dbauthz layer.
Create is granted at user-level so workspace agents can only write logs
owned by their workspace owner, preventing cross-workspace log
fabrication. Delete is restricted to `DBPurge` only; no human role
(including owner) can delete boundary logs.
| Subject | Create (own) | Create (other) | Read (all) | Delete |
|---|---|---|---|---|
| Workspace agent | yes | no | no | no |
| Owner (site admin) | yes (via member) | no | yes | no |
| Auditor | no | no | yes | no |
| DBPurge | no | no | no | yes |
### Changes
- **RBAC policy & resource definition**: add `boundary_log` to
`policy.go` and generate `ResourceBoundaryLog` object, scope constants,
and codersdk/TypeScript types.
- **dbauthz authorization**: replace all
`ResourceAuditLog`/`ResourceSystem` placeholders with
`ResourceBoundaryLog`. `InsertBoundaryLog` and `InsertBoundarySession`
derive the workspace owner from the agent and authorize with
`.WithOwner()` for user-scoped create.
- **Role assignments:**
- **Owner (site):** read only. Excluded from `allPermsExcept` wildcard;
create is inherited from member at user-level.
- **Member (user-level):** create. User-scoped so agents can only write
logs they own.
- **Auditor (site):** read.
- `boundary_log` is excluded from org-admin, org-member, and
org-service-account `allPermsExcept` calls for consistency with
`ResourceBoundaryUsage`.
- **System subjects:**
- **DB Purge** (`SubjectTypeDBPurge`): delete. The only subject that can
remove boundary logs.
- **Workspace agent scope**: `ResourceBoundaryLog` with wildcard ID in
the agent scope allow-list (necessary for creation since no pre-existing
ID exists). User-level role scoping prevents deployment-wide access.
- **DB migration** (`000510_boundary_log_scopes`): add `boundary_log:*`,
`boundary_log:create`, `boundary_log:delete`, `boundary_log:read` enum
values to `api_key_scope`.
- **Test coverage**: `BoundaryLogCreate` (user-scoped, only matching
owner succeeds), `BoundaryLogDelete` (all human roles denied),
`BoundaryLogRead` (owner + auditor). dbauthz mock tests set up workspace
agent lookups for owner derivation.
- **Generated docs**: update OpenAPI specs, API reference docs, and
frontend type definitions.
---------
Co-authored-by: Muhammad Danish <mdanishkhdev@gmail.com>
Co-authored-by: Coder Agents <coder-agents-review[bot]@users.noreply.github.com>
Adds a database migration that reconciles existing stale chat MCP server
IDs, then installs a `BEFORE DELETE` trigger on `mcp_server_configs` to
remove the deleted ID from `chats.mcp_server_ids`. This keeps chat
continuation from failing with `400 One or more MCP server IDs are
invalid` after an MCP server config is deleted.
This matches the existing repo precedent in
`coderd/database/migrations/000241_delete_user_roles.up.sql`, where
deleting a custom role cleans `organization_members.roles`, a similarly
structured array of references that cannot be protected by a normal
foreign key.
Closes CODAGT-505
<!--
If you have used AI to produce some or all of this PR, please ensure you have read our [AI Contribution guidelines](https://coder.com/docs/about/contributing/AI_CONTRIBUTING) before submitting.
-->
Makes `MsgQueue` exported, so it can be used in pubsub implementations outside PGPubsub.
Add a Postgres trigger and matching codersdk constants that cap each
user's secrets in four dimensions: count (50), total stored value bytes
(200 KiB), env-injected stored value bytes (24 KiB), and env name length
(256 bytes). Without these caps a user could overflow the 4 MiB DRPC
agent manifest, the ~32 KiB Windows process env
block, or Linux/macOS ARG_MAX at workspace start. The trigger is the
source of truth on aggregates; the handler maps its check_violation
error into a 400 that names the per-user budget in stored
(post-encryption) bytes. A handler test exercises off-by-one at each cap
across POST and PATCH, plus per-user budget isolation.
Generated with help from Coder Agents.
## Summary
Routes chatd model calls backed by concrete AI Provider rows through the
in-process aibridge transport by default, with deployment options to use
direct provider routing when AI Gateway is disabled or chat AI Gateway
routing is disabled.
- Splits model routing into common, direct provider, and AI Gateway
paths behind a single deployment-mode entry point.
- Builds chatd models through explicit request, route, and options data.
Active API key attribution is passed explicitly instead of being hidden
inside generic model construction.
- For AI Gateway BYOK routes, resolves the user's provider key in chatd,
forwards it through provider-specific auth headers, and sets
`X-Coder-AI-Governance-Token` to the `delegated` marker so aibridge
preserves those headers while still stripping Coder-specific metadata.
- Keeps central provider credentials and deployment fallback credentials
out of forwarded provider auth headers, so AI Gateway central policy
remains authoritative.
- Redacts delegated provider auth from default string formatting to
avoid accidental plaintext logging of user BYOK credentials.
- Covers selected chat models, advisor overrides, title and quickgen
paths, subagent overrides, computer use model selection, and an
integration-style chat turn through the aibridge transport path.
- Persists initiating API key IDs on chat and queued user messages,
including subagent child messages, and fails closed for AI
Gateway-routed model builds without an active key.
- Removes unused `api_key_id` indexes while keeping the persistence
columns and foreign keys.
- Keeps the deployment option available through config and env parsing,
but hides it from CLI help and generated docs.
- Stabilizes the subagent poll fallback test so background CreateChat
processing cannot win the state transition under slower CI environments.
## Tests
- `go test ./coderd/x/chatd -run
'TestAIGatewayProviderAuthForUser|TestAIGatewayProviderAuthRedactsFormatting|TestResolveModelRouteForConfigAIGatewayProviderAuth|TestAIGatewayModelForwardsProviderAuth|TestProcessChat_AIGatewayRoutingUsesDelegatedAPIKey|TestAwaitSubagentCompletion'
-count=1`
- `go test ./coderd/aibridged -run
'TestServeHTTP_DelegatedAPIKey|TestServeHTTP_StripCoderToken' -count=1`
- `git diff --check HEAD~1..HEAD`
- `make lint`
> Mux working on behalf of Mike.
Replace the env-based `BuildProviders` with a DB-backed loader. The database is now the single source of truth for runtime provider configuration; env config arrives via `SeedAIProvidersFromEnv` (run at boot) and `BuildProviders` reads it back as `aibridge.Provider` instances. `cli/server.go` and `enterprise/cli/server.go` both call the same path, so aibridged and aibridgeproxyd see the same provider set.
Per-provider `DumpDir` is replaced by a top-level `CODER_AI_GATEWAY_DUMP_DIR` base; each provider's effective dump path is `<base>/<provider name>`.
RFC: [Bridge ↔ Boundaries Correlation
RFC](https://www.notion.so/coderhq/Gateway-and-Firewall-Correlation-RFC-31ad579be592803aa8b3d48348ccdde9)
Add up/down migrations and matching sqlc queries for persisting Boundary
audit events, as specified in the Bridge/Boundaries Correlation RFC.
**Tables:**
- `boundary_sessions`: session metadata with `workspace_agent_id` FK,
`confined_process_name`, and timestamps (`started_at`, `updated_at`). ID
is externally supplied by the Boundary process (no DB-side default).
Created lazily when the first log for a session arrives.
- `boundary_logs`: individual audit events with `session_id` FK,
`sequence_number` (INT, primary ordering key), protocol/method/detail
fields, and `matched_rule` (nullable; non-NULL implies allowed).
**Indexes (per RFC):**
- `(session_id, sequence_number)` for the ordering query path
- `(captured_at)` for the retention purge path
**Queries:**
- `InsertBoundarySession` / `GetBoundarySessionByID`
- `InsertBoundaryLog` / `GetBoundaryLogByID`
- `ListBoundaryLogsBySessionID` with nullable `seq_after`/`seq_before`
exclusive bounds for fetching events between two known interception
sequence numbers
- `DeleteOldBoundaryLogs` with row limit to avoid long-running
transactions
**Also includes:** dbgen helpers (`BoundarySession`, `BoundaryLog`),
dbauthz implementations (reads gated on `ResourceAuditLog`, deletes on
`ResourceSystem`), and all generated wrappers (dbmock, dbmetrics).
No callers yet. A follow-up PR will add the dedicated `boundary_log`
RBAC resource type.
> Generated by Coder Agents
Relates to CODAGT-432
Adds three new search filters to the chat list endpoint (`GET
/api/experimental/chats/`):
- `pr:<number>` - exact PR number match
- `repo:<owner/repo>` - substring match against git remote origin or URL
- `pr_title:<text>` - case-insensitive PR title substring match
Includes SQL filter clauses (EXISTS against `chat_diff_statuses`),
parser with validation, handler wiring, unit tests, swagger annotation
update, and a new search syntax documentation page.
> 🤖 Generated with [Coder Agents](https://coder.com/agents)
Fixes CODAGT-311.
Users receive too many auto-archive notification emails because the
dbpurge loop runs every 10 minutes and archives chats on each tick using
timestamp-precise cutoffs, causing chats to trickle past the threshold
continuously.
Switch archive eligibility from timestamp arithmetic to date arithmetic
(UTC day boundaries). All chats whose last activity falls on the same
UTC date are now archived together on the first tick after midnight UTC,
reducing notification emails to ~at most~ probably one per day.
(Exception: if we hit the auto-archive limit)
- SQL compares `(last_activity AT TIME ZONE 'UTC')::date` against cutoff
date
- Go truncates current time to start-of-day before subtracting archive
days
- Tests verify date boundary semantics including late-activity and batch
edge cases
- Docs updated to describe UTC day boundary behavior and at-most-daily
notification cadence
> [!NOTE]
> Generated by Coder Agents
_Disclaimer: implemented by a Coder Agent using Claude Opus 4.7_
Part of the implementation of [RFC: Common AI Provider Configs](https://www.notion.so/coderhq/RFC-Common-AI-Provider-Configs-34bd579be59280ed958feffb82024797) (AIGOV-201).
## Note
This change can cause a previously working installation to fail to start should a conflict exist between the providers configured in the environment & those now migrated to the database.
I'll raise a PR upstack to document this process and workarounds should a startup fail.
## What this PR does
Reconciles environment-derived AI provider configuration with the `ai_providers` table at server startup. The seed runs **before** the aibridged daemon is initialized, so the runtime always reads providers from the database; the legacy `CODER_AIBRIDGE_*` environment variables become a one-shot migration source.
### Behavior
- Concurrent server starts are serialized through a Postgres advisory lock (`LockIDAIProvidersEnvSeed`).
- Missing rows are inserted with an audit entry attributed to the system actor.
- Existing rows whose canonical hash matches the env-derived hash are left alone (the common no-op restart path).
- Existing rows whose canonical hash does **not** match cause server startup to fail with a descriptive error so the operator can explicitly resolve the conflict in either env or DB.
- Soft-deleted rows are NOT resurrected from env; an explicit operator deletion is sticky across restarts.
- Indexed providers whose name conflicts with a legacy env var fail startup with a clear remediation message.
- Unknown provider types (e.g. `copilot`, until the DB enum is widened) are skipped with a log entry rather than failing startup.
### Canonical hashing
The `canonicalAIProvider` shape captures exactly the fields that determine runtime behavior — `type`, `base_url`, and the Bedrock subset of settings (access key, access key secret, region, model, small fast model) — and is hashed with SHA-256. The hash is **computed on demand from the row + env**, never persisted, so the database does not need a new column for it. API keys live in the separate `ai_provider_keys` table and are intentionally excluded from the hash so operators can rotate keys via the API without forcing a server restart.
<details>
<summary>Decision log</summary>
- The hash is intentionally not persisted in the database. The RFC discussed this trade-off; computing on demand keeps the schema minimal and lets the canonical shape evolve without a migration.
- The lock uses an `iota` slot in `coderd/database/lock.go` rather than `GenLockID` so it's stable, easy to audit, and matches the convention used for every other startup lock.
- A bearer-token Anthropic provider whose env vars also set Bedrock metadata but no AWS credentials does NOT store the Bedrock fields. Without credentials the discriminated settings would misrepresent the row as Bedrock auth.
- We deliberately do NOT publish to the `ai_providers_changed` pubsub channel from the seed because the seed completes before any subscriber is started; the follow-up PR introduces that channel.
</details>
Removes the coder_secret Terraform integration: the data.coder_secret
consumption path through provisionerdserver → provisioner.proto →
provisioner/terraform, the dynamic-parameter secret-requirement
validation, and the workspace-update / resolve-autostart surfaces that
depended on it. This is being done due to a product/feature direction
change (see PLAT-243). User-secret CRUD (DB, REST, CLI, UI, telemetry, audit)
and the agent-manifest secret-injection path are untouched.
The provisionerd API is bumped from v1.17 to v1.18 rather than rolled
back: v1.17 shipped in v2.33.x, so user_secrets field numbers are
reserved and the changelog documents both versions.
Generated with assistance from Coder Agents.
Fixes https://linear.app/codercom/issue/CODAGT-432
Adds structured search/filter capabilities to the `GET
/api/experimental/chats/` endpoint via the `q` query parameter. All
filters use explicit `key:value` syntax; bare terms are rejected to
reserve them for potential future full-text search.
> Generated by Coder Agents
Co-authored-by: Danielle Maywood <danielle@themaywoods.com>
Co-authored-by: Jaayden Halko <jaayden.halko@gmail.com>
> Mux updated this PR on behalf of Mike.
## Stack Context
This PR is the storage, permissions, API, and SDK layer for experimental
personal skills. #25362 has landed on `main`, so this branch is
restacked directly on `main`.
Stack order:
1. #25363 storage, permissions, API, and SDK
2. #25365 API test coverage
3. #25366 chattool and chatd integration
4. #25066 settings UI and docs
5. #25386 personal skills slash menu
## What?
Adds the `user_skills` database table, generated queries, RBAC resources
and scopes, audit resource handling, experimental user-scoped CRUD
endpoints, SDK types, and generated API/site types.
Follow-up review and restack fixes:
- Enforce a bounded personal skill description in parser and database
constraints.
- Return `403 Forbidden` for unauthorized create and update attempts.
- Return explicit conflict responses when soft-deleted users are
targeted.
- Keep user admins out of personal skills, while site owners can read
and delete but not create or update.
- Document trigger-raised constraint names and keep schema constants
covered by tests.
- Reuse `UserSkillMetadata` in the full `UserSkill` SDK response type.
- Generate user skill IDs in Go instead of relying on a database
default.
- Rebase on latest `main` and renumber the user skills migration to
`000502_user_skills`.
## Why?
Personal skills need durable user-owned storage with owner
authorization, limited site-owner moderation, and a hidden API surface
before chatd can consume them.
## Validation
- `make gen`
- `go test ./coderd/database -run '^TestUserSkillSchemaConstants$'
-count=1`
- `go test ./coderd/database/dbauthz -run
'^TestMethodTestSuite/TestUserSkills$' -count=1`
- `go test ./coderd -run '^TestPatchUserSkill$' -count=1`
- `go test ./codersdk ./coderd/database/db2sdk`
- `make lint`
- pre-commit hook on `97fd58108d`
Generating `coderd/database/dump.sql` previously required a
Docker-compatible socket via `ory/dockertest`. Contributors using
runtimes that don't expose one (e.g. Apple's `container` CLI) hit a
panic during `make gen`:
```
build: panic: open containerized database failed: open container: could not start resource: dial unix /var/run/docker.sock: connect: no such file or directory
```
Fall back to `fergusstrange/embedded-postgres` (already a direct module
dep, used by `scripts/develop/dbrecovery.go`) when
`dbtestutil.OpenContainerized` fails. The server's timezone is forced to
UTC so `timestamptz` DEFAULT expressions canonicalize identically to the
Docker-based path; otherwise the host's local TZ leaks into the dump as
values like `'0001-12-31 23:06:32+00 BC'`.
`PGDumpSchemaOnly` still needs `pg_dump` v13.x on PATH (the
embedded-postgres archive ships only `initdb`/`postgres`/`pg_ctl`). When
neither `pg_dump` nor `docker` is available, the existing error is
supplemented with install hints for `mise`, `brew`, and `apt`.
CI keeps using the Docker path unchanged; the fallback is local-dev-only
and produces a byte-identical `dump.sql`.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes
https://linear.app/codercom/issue/AIGOV-284/add-group-budgets-table-and-crud-api
## Summary
Adds the `group_ai_budgets` table and the following endpoints:
- `GET /api/v2/groups/{group}/ai/budget`
- `PUT /api/v2/groups/{group}/ai/budget`
- `DELETE /api/v2/groups/{group}/ai/budget`
Each group may have at most one budget row. If no row exists, no budget
is enforced.
### Feature gate
Added `RequireFeatureMW(FeatureAIBridge)` on the `/ai/budget` sub-route.
## RBAC
Authorization reuses `rbac.ResourceGroup` with the existing
`.InOrganization(...).WithID(...)` scoping model.
The `dbauthz` wrappers load the parent `groups` row and authorize
against it.
No new resource type is introduced. As a result, anyone with
`group:update` permissions (Owner, OrgAdmin, or UserAdmin within the
organization) can manage AI budgets for that group.
## Read access for group members
`database.Group.RBACObject()` grants `policy.ActionRead` to all members
of the group through the group ACL:
```go
func (g Group) RBACObject() rbac.Object {
return rbac.ResourceGroup.WithID(g.ID).
InOrg(g.OrganizationID).
// Group members can read the group.
WithGroupACL(map[string][]policy.Action{
g.ID.String(): {
policy.ActionRead,
},
})
}
```
Because the `GET` endpoint authorizes against the same loaded `Group`
object, any group member can call:
```text
GET /api/v2/groups/{group}/ai/budget
```
`PUT` and `DELETE` remain admin-only. The group ACL grants only
`ActionRead`, so write operations continue to require role-based
`group:update` permissions.
## Alternative considered
A dedicated `rbac.ResourceGroupAiBudget` resource would allow budget
management to be separated from general group administration.
We decided not to add that complexity for now.