Commit Graph

3962 Commits

Author SHA1 Message Date
Michael Suchacz 8d554f1330 fix: resolve chat goal rebase issues 2026-06-02 11:03:17 +00:00
Michael Suchacz dd6855a9be fix(coderd): restrict chat goal updates to owners 2026-06-02 10:26:46 +00:00
Michael Suchacz fb6a8c70d8 fix(coderd/x/chatd/chattool): clarify complete goal ID schema 2026-06-02 10:26:46 +00:00
Michael Suchacz 1e5f6feeba fix: renumber chat goal migration after rebase 2026-06-02 10:26:45 +00:00
Michael Suchacz b5b905694d refactor(coderd/x/chatd/chattool): simplify goal ID validation 2026-06-02 10:26:44 +00:00
Michael Suchacz 30929fe99c fix(coderd/x/chatd): record user goal completions explicitly 2026-06-02 10:26:42 +00:00
Michael Suchacz a848e12c74 fix(coderd/database): use monotonic chat goal ordering 2026-06-02 10:26:40 +00:00
Michael Suchacz 4063c205f2 fix(coderd/x/chatd/chattool): parse complete goal IDs from strings 2026-06-02 10:26:39 +00:00
Michael Suchacz 8eaa0cd616 test(coderd): expect chat goal marker queries 2026-06-02 10:26:38 +00:00
Michael Suchacz 6813ccd368 fix: renumber chat goal migration after rebase 2026-06-02 10:26:37 +00:00
Michael Suchacz c56016a77a feat: persist goal-origin chat message marker 2026-06-02 10:26:32 +00:00
Michael Suchacz f9132d8c9c fix(coderd/database): renumber chat goal migration 2026-06-02 10:26:30 +00:00
Michael Suchacz 97a4beb8bd test: update completed goal tool expectation 2026-06-02 10:26:29 +00:00
Michael Suchacz 4dbba926a8 fix: keep completed chat goals visible 2026-06-02 10:26:28 +00:00
Michael Suchacz dcb58f9a47 fix: address coder agents review feedback 2026-06-02 10:26:27 +00:00
Michael Suchacz df8b1e1b0a fix: harden chat goal lifecycle 2026-06-02 10:26:26 +00:00
Michael Suchacz 7c786f4197 fix: address chat goal review feedback 2026-06-02 10:23:45 +00:00
Michael Suchacz 402aa60b61 fix(coderd): share chat goal conversion 2026-06-02 10:23:44 +00:00
Michael Suchacz 19e4f5883e fix: handle chat goal completion and cache family updates 2026-06-02 10:23:02 +00:00
Michael Suchacz 44193ce354 chore(coderd/x/chatd): deslop chat goals 2026-06-02 10:22:58 +00:00
Michael Suchacz 5477790f5c feat(coderd): implement chat goals 2026-06-02 10:22:57 +00:00
Michael Suchacz 53411e8b0d feat: add chat goal persistence foundation 2026-06-02 10:22:56 +00:00
Michael Suchacz dd22086734 fix(coderd/x/chatd): preserve chat API key after compaction (#25930)
> Mux updated this PR on behalf of Mike.

AI Gateway chat retries after context compaction could lose active turn
API key routing metadata because the prompt query keeps the compressed
model-only summary but omits the original visible user turn.

Persist the active API key ID onto compaction summaries explicitly.
Model construction now uses one active-turn lookup helper for visible
user turns and compressed summary boundaries, so prompt model
construction can recover the key when no later visible user turn exists.
Added unit and DB-backed coverage for the compacted prompt path.
2026-06-02 12:19:06 +02:00
Paweł Banaszewski f22d4e2cbb feat: add ai_gateway_keys table and related RBAC (#25563)
Adds table to store keys that AI Gateway standalone replicas will use
to authenticate into Coderd.
Also adds RBAC and audit boilerplate.
2026-06-02 09:28:43 +02:00
Ethan d0fa9ff986 fix(coderd/x/chatd/chattool): retry workspace name conflicts (#25668)
Retry Coder Agents workspace creation once with a generated random
suffix when the requested workspace name already exists. This preserves
structured errors for other conflicts and avoids surfacing avoidable
name collisions.

Closes CODAGT-386
2026-06-01 13:31:25 +00:00
Danny Kopping 85f56e4944 fix: recreate ai_provider_type instead of ADD VALUE (#25895)
Coder runs all migrations in a single transaction (`pgTxnDriver`).
Postgres forbids using an enum value added by `ALTER TYPE ... ADD VALUE`
within the same transaction that added it. Migration `000499` widened
`ai_provider_type` with `ADD VALUE`, and `000504` casts existing
`chat_providers` rows to that enum in the same transaction. On
deployments with a legacy provider using one of the new values (for
example `openai-compat`), the batch failed with `unsafe use of new
value` and the server could not start.

Recreate the type (create a new enum, alter the column, drop and rename)
instead of using `ADD VALUE`, matching the existing precedent in
`000144_user_status_dormant`. A freshly created enum's values are usable
immediately in the same transaction, so the cast in `000504` succeeds.
The resulting schema is identical, so `make gen` produces no `dump.sql`
diff and databases that already applied these migrations see no drift.

Added a regression test that seeds an `openai-compat` provider and
applies `000499` through `000504` in a single transaction, reproducing
the production path. The per-step `Stepper` used by the other migration
tests commits each migration separately and cannot surface this class of
bug.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Signed-off-by: Danny Kopping <danny@coder.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 13:30:45 +00:00
Danny Kopping a85462bd49 feat: support adding GitHub Copilot AI provider via UI (#25888)
Copilot is the only AI provider type that could not be added through the `/ai/settings` UI. The aibridge runtime and the env-var seeding path already supported it, but the runtime CRUD API rejected `type=copilot` and the UI omitted it entirely. The root cause is that Copilot's auth model (a per-request GitHub OAuth token, with no pre-shared key) does not fit the credential-centric add-provider flow that every other provider uses.

## Backend

Allow `type=copilot` in `CreateAIProviderRequest.Validate()`, and reject `api_keys` for Copilot on both create (validation) and update (handler sentinel), mirroring the existing Bedrock guards. Copilot carries no stored credential.

## Frontend

Add Copilot to the provider type picker (with the `github-copilot.svg` icon) and give the form a credential-free branch: name, display name, and a free-text endpoint defaulting to `https://api.business.githubcopilot.com`, with copy explaining that authentication happens via the user's GitHub token at request time. Copilot maps to the distinct `copilot` wire type rather than collapsing to `openai`, and the edit flow recovers it correctly.

The endpoint stays required with a business-tier default; users on the individual or enterprise endpoints edit the field.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-06-01 15:26:37 +02:00
Mathias Fredriksson 82752844bc fix: isolate MCP HTTP transports from DefaultTransport in tests (#25821)
Use testing.Testing() inside createTransport to automatically
clone http.DefaultTransport when running in tests. In production,
DefaultTransport is used as-is (efficient connection pooling).

This fixes the CloseIdleConnections flake class: httptest.Server.Close()
calls http.DefaultTransport.CloseIdleConnections(), which disrupts
any MCP client sharing that transport. The testing.Testing() check
means every MCP transport created during tests gets isolation
automatically, with no caller changes needed.

Closes coder/internal#1016
Closes PLAT-291
2026-06-01 16:17:29 +03:00
Mathias Fredriksson 8b7e040105 fix(coderd/x/chatd/chatloop): discourage doctrine in compaction summaries (#25850)
Two additions to the compaction summary prompt:

1. Error specificity: the "errors encountered" bullet now instructs the
   model to keep error notes specific (name the file, the error, the
   fix) and not generalize from a specific failure to a blanket
   tool-avoidance rule. This addresses the doctrine crystallization
   pattern where a single tool failure gets promoted to a standing
   "avoid tool X" rule that persists across compactions and model swaps.

2. Reproducibility: a new closing sentence instructs the model to
   reference reproducible content by path, command, or URL rather than
   inlining it. Content without a stable reproducer is still preserved
   inline with a brief summary. This targets summary bloat from
   inlined code blocks (worst case: 34k chars, 76 code blocks
   reproducing repo content verbatim).

Refs CODAGT-331
2026-06-01 12:42:09 +03:00
dylanhuff-at-coder 0401ed3af5 fix(coderd/notifications): serialize pending updates gauge writes (#25495)
Fixes a race where concurrent notification dispatch goroutines could
overwrite `coderd_notifications_pending_updates` with an older
buffer-length snapshot. Pending update snapshots now serialize count
evaluation with the gauge write, and inhibited dispatch results refresh
the metric when buffered.
2026-05-29 11:02:13 -07:00
Jon Ayers 5cdc9e28a9 feat: add nats cluster peer support (#25632) 2026-05-29 11:35:59 -05:00
Mathias Fredriksson 98d5e7948d fix(coderd/autobuild): handle concurrent build number race in lifecycle executor (#25824)
The lifecycle executor did not handle unique-violation errors from
InsertWorkspaceBuild. When a concurrent actor (API handler, another
lifecycle executor, or prebuilds reconciler) inserts a workspace build
with the same build number, PostgreSQL returns a unique constraint
violation on workspace_builds_workspace_id_build_number_key. The
lifecycle executor treated this as a hard error, logging it and storing
it in stats.Errors.

The per-workspace advisory lock (pg_try_advisory_xact_lock) prevents
two lifecycle executors from racing, but does not protect against
races with the CreateWorkspaceBuild API handler or the prebuilds
reconciler, which use different (or no) locking.

Catch the specific unique-violation error after InTx returns (where
the transaction is already rolled back) and clear it. The concurrent
actor's build takes effect; the lifecycle executor treats the
workspace as a no-op for this tick.

Closes coder/internal#455
Closes PLAT-290
2026-05-29 17:12:31 +03:00
Yevhenii Shcherbina 1a91d31793 feat: add user AI budget override endpoints (#25439)
Implements https://linear.app/codercom/issue/AIGOV-285
Follow the structure established in
https://github.com/coder/coder/pull/25203

## Summary

Adds the `user_ai_budget_overrides` table and CRUD API at
`/api/v2/users/{user}/ai/budget`. An override sets a custom per-user
spend cap that supersedes group-budget resolution, attributing spend to
a specific group.

## Schema

```sql
CREATE TABLE user_ai_budget_overrides (
    user_id            UUID        PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE,
    group_id           UUID        NOT NULL REFERENCES groups(id) ON DELETE CASCADE,
    spend_limit_micros BIGINT      NOT NULL CHECK (spend_limit_micros >= 0),
    created_at         TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at         TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```

## Membership lifecycle

The membership invariant — a user must be a member of the attributed
group, including when that group is "Everyone" — would naturally be
expressed as a composite FK on `(user_id, group_id) →
group_members_expanded(user_id, group_id)`. PostgreSQL doesn't allow
foreign keys to reference views, so enforcement is split across two
mechanisms:

- **Write-time check.** A CHECK constraint on the table
(`user_ai_budget_overrides_must_be_group_member`) calls a `STABLE`
function `is_group_member(user_id, group_id)` that queries
`group_members_expanded`. The view surfaces both regular group
memberships and the implicit "Everyone" group memberships from
`organization_members`. Any INSERT or UPDATE that violates the predicate
is rejected with a Postgres `check_violation`, which the handler maps to
a 400. `is_group_member` is defined as a general predicate, reusable by
any future table that needs the same check.

- **Cascade on removal.** Two `BEFORE DELETE` triggers handle membership
loss:
- `trigger_delete_user_ai_budget_overrides_on_group_member_delete` on
`group_members` — covers regular group removals (admin action, OIDC
sync).
- `trigger_delete_user_ai_budget_overrides_on_org_member_delete` on
`organization_members` — covers the "Everyone" group, whose membership
lives in `organization_members`.

The single-column FKs on `users(id)` and `groups(id)` remain to cascade
on user or group deletion (those paths don't pass through
`group_members`).

## Authorization

The dbauthz layer gates each operation against the `User` and (for
writes) `Group` resources:

| Operation | User resource  | Group resource |
|-----------|----------------|----------------|
| `GET`     | `ActionRead`   | —              |
| `PUT`     | `ActionUpdate` | `ActionUpdate` |
| `DELETE`  | `ActionUpdate` | `ActionUpdate` |

For `DELETE`, the dbauthz layer fetches the existing override first to
learn the attributed `group_id`, then runs both checks.

### Role matrix

| Role         | GET | PUT | DELETE |
|--------------|-----|-----|--------|
| Owner        |    |    |       |
| UserAdmin    |    |    |       |
| OrgAdmin     |    |    |       |
| OrgUserAdmin |    |    |       |

Internal discussion:
https://codercom.slack.com/archives/C096PFVBZKN/p1779392747885359

## Audit logs
Audit logs will be addressed in a follow-up PR.
2026-05-29 10:08:25 -04:00
Danny Kopping 110210d7c9 fix(coderd): block ai provider env key drift (#25849)
Previously, `SeedAIProvidersFromEnv` only hashed provider-level fields,
so env var key changes were silently ignored once a provider already
existed in the database.

Include bearer keys and Bedrock credentials in the canonical drift hash,
and cover multi-key, multi-provider cases so restarts now fail loudly
when the configured credentials no longer match what is stored.

When changing a key, you'll now see this in the server startup logs:

```
2026-05-29 12:29:02.674 [info]  api: Encountered an error running "coder server", see "coder server --help" for more information
2026-05-29 12:29:02.674 [info]  api: error: create coder API:
2026-05-29 12:29:02.674 [info]  api: github.com/coder/coder/v2/cli.(*RootCmd).Server.func2
2026-05-29 12:29:02.674 [info]  api: /home/coder/coder/cli/server.go:1015
2026-05-29 12:29:02.674 [info]  api: - seed ai providers from env:
2026-05-29 12:29:02.674 [info]  api: github.com/coder/coder/v2/enterprise/cli.(*RootCmd).Server.func1
2026-05-29 12:29:02.674 [info]  api: /home/coder/coder/enterprise/cli/server.go:187
2026-05-29 12:29:02.674 [info]  api: - execute transaction:
2026-05-29 12:29:02.674 [info]  api: github.com/coder/coder/v2/coderd/database.(*sqlQuerier).runTx
2026-05-29 12:29:02.674 [info]  api: /home/coder/coder/coderd/database/db.go:212
---> 2026-05-29 12:29:02.674 [info]  api: - AI provider "vercel" already exists in the database and differs from the current environment configuration; update the provider through the API or remove the CODER_AIBRIDGE_* env vars to stop seeding it:
2026-05-29 12:29:02.674 [info]  api: github.com/coder/coder/v2/coderd.SeedAIProvidersFromEnv.func1
2026-05-29 12:29:02.674 [info]  api: /home/coder/coder/coderd/ai_providers_migrate.go:139
2026-05-29 12:29:02.674 [info]  api: slogjson: failed to write entry: io: read/write on closed pipe
2026-05-29 12:29:02.700 [info]  dlv: Stop reason: exited
2026-05-29 12:29:02.825 [info]  site:  ELIFECYCLE  Command failed.
error: running command "develop": server did not become ready in 1m0s:
    main.waitForHealthy
        /home/coder/coder/scripts/develop/main.go:877
  - context canceled
```

_This PR was generated with Coder Agents._
2026-05-29 13:14:55 +00:00
Cian Johnston d0a51da0a9 feat: classify provider_disabled 503 as non-retryable (#25800)
Builds on top of https://github.com/coder/coder/pull/25794

Adds a new `provider_disabled` error classification in `chatd` with the
corresponding plumbing to classify it as non-retryable. Also adds a
story for how this particular error kind is displayed in the UI.
2026-05-29 13:14:04 +01:00
Susana Ferreira 7b903cad73 fix: track credential hint across key failover attempts in aibridge (#25735)
## Problem

Centralized requests recorded *the first available key from the pool at
`CreateInterceptor` time* as `credential_hint`, so the interception
could be persisted in the database with a hint that didn't match the key
that actually served the request. The fix consists in storing, at
end-of-interception, the hint of the key that succeeded, or the last
attempted key if all keys are unavailable.

## Changes

- Add `Key.Hint()` and update `credential_hint` on every failover
attempt so it reflects the actually-used key.
- Stop pre-populating `credential_hint` at `CreateInterceptor`.
Centralized starts empty and is updated by the key failover loop.
- Persist the final hint via `RecordInterceptionEnded`; SQL updates
`credential_hint` only when `credential_kind = 'centralized'` so BYOK
keeps its start-time value.
- Log the actually-used hint on interception end/failure; start log uses
a `<keypool-pending>` placeholder for centralized.

> [!NOTE]
> Initially generated by Claude Opus 4.7, modified and reviewed by
@ssncferreira
2026-05-29 12:01:37 +01:00
Sas Swart a586b7e5e0 feat: add boundary_log rbac resource (#24810)
RFC: [Bridge ↔ Boundaries Correlation
RFC](https://www.notion.so/coderhq/Gateway-and-Firewall-Correlation-RFC-31ad579be592803aa8b3d48348ccdde9)

Register a dedicated `boundary_log` RBAC resource type with `create`,
`read`, and `delete` actions, replacing the placeholder
`rbac.ResourceAuditLog` and `rbac.ResourceSystem` references previously
used in the dbauthz layer.

Create is granted at user-level so workspace agents can only write logs
owned by their workspace owner, preventing cross-workspace log
fabrication. Delete is restricted to `DBPurge` only; no human role
(including owner) can delete boundary logs.

| Subject | Create (own) | Create (other) | Read (all) | Delete |
|---|---|---|---|---|
| Workspace agent | yes | no | no | no |
| Owner (site admin) | yes (via member) | no | yes | no |
| Auditor | no | no | yes | no |
| DBPurge | no | no | no | yes |

### Changes

- **RBAC policy & resource definition**: add `boundary_log` to
`policy.go` and generate `ResourceBoundaryLog` object, scope constants,
and codersdk/TypeScript types.
- **dbauthz authorization**: replace all
`ResourceAuditLog`/`ResourceSystem` placeholders with
`ResourceBoundaryLog`. `InsertBoundaryLog` and `InsertBoundarySession`
derive the workspace owner from the agent and authorize with
`.WithOwner()` for user-scoped create.
- **Role assignments:**
- **Owner (site):** read only. Excluded from `allPermsExcept` wildcard;
create is inherited from member at user-level.
- **Member (user-level):** create. User-scoped so agents can only write
logs they own.
  - **Auditor (site):** read.
- `boundary_log` is excluded from org-admin, org-member, and
org-service-account `allPermsExcept` calls for consistency with
`ResourceBoundaryUsage`.
- **System subjects:**
- **DB Purge** (`SubjectTypeDBPurge`): delete. The only subject that can
remove boundary logs.
- **Workspace agent scope**: `ResourceBoundaryLog` with wildcard ID in
the agent scope allow-list (necessary for creation since no pre-existing
ID exists). User-level role scoping prevents deployment-wide access.
- **DB migration** (`000510_boundary_log_scopes`): add `boundary_log:*`,
`boundary_log:create`, `boundary_log:delete`, `boundary_log:read` enum
values to `api_key_scope`.
- **Test coverage**: `BoundaryLogCreate` (user-scoped, only matching
owner succeeds), `BoundaryLogDelete` (all human roles denied),
`BoundaryLogRead` (owner + auditor). dbauthz mock tests set up workspace
agent lookups for owner derivation.
- **Generated docs**: update OpenAPI specs, API reference docs, and
frontend type definitions.

---------

Co-authored-by: Muhammad Danish <mdanishkhdev@gmail.com>
Co-authored-by: Coder Agents <coder-agents-review[bot]@users.noreply.github.com>
2026-05-29 12:50:39 +02:00
Danny Kopping 5b10268827 feat: serve 503 sentinel for disabled providers (#25794)
_Disclosure: created with Coder Agents._

When providers are disabled, we should serve a sentinel error so the
requesting client (Claude Code, Coder Agents, etc) is informed. Coder
Agents can also conditionalize its display to show a helpful error
message.

---------

Signed-off-by: Danny Kopping <danny@coder.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 10:24:16 +02:00
Ethan eb2c2799ca fix: strip deleted MCP IDs from chats on delete (#25763)
Adds a database migration that reconciles existing stale chat MCP server
IDs, then installs a `BEFORE DELETE` trigger on `mcp_server_configs` to
remove the deleted ID from `chats.mcp_server_ids`. This keeps chat
continuation from failing with `400 One or more MCP server IDs are
invalid` after an MCP server config is deleted.

This matches the existing repo precedent in
`coderd/database/migrations/000241_delete_user_roles.up.sql`, where
deleting a custom role cleans `organization_members.roles`, a similarly
structured array of references that cannot be protected by a normal
foreign key.

Closes CODAGT-505
2026-05-29 16:49:25 +10:00
Jon Ayers bb11946bd4 fix: require update permission to recreate devcontainers (#25812)
- The httpmw upstream from this endpoint only checks for read perms to the
workspace agent. Recreating a dev container should require `update`
perms since it mutates state. This also matches the behavior of the
`DELETE` endpoint
2026-05-28 15:34:36 -05:00
Cian Johnston 7ea0eff94e fix: improve chat audit log descriptions and diff rendering (#25728)
Chat ACL audit diffs rendered as `[object Object]` because the diff
viewer called `.toString()` on object values. Common chat operations
(archive, share) showed generic "updated chat" descriptions instead of
semantic ones.

Add `chatAuditLogDescription` to derive semantic descriptions from the
audit diff for successful chat writes: "archived/unarchived chat" for
archive toggles, "updated sharing for chat" for ACL-only changes.
Extract diff value formatting into `formatAuditDiffValue`, which renders
object values as deterministic compact JSON with sorted keys, fixing the
`[object Object]` rendering for chat ACLs and any other object-valued
fields. The previous `determineIdPSyncMappingDiff` workaround for IdP
sync mappings was removed because the generic formatting handles it.

Closes CODAGT-513

> Generated by Coder Agents on behalf of @johnstcn
2026-05-28 18:37:57 +01:00
Danielle Maywood 0d1340a430 fix: collapse agent command output by default (#25748) 2026-05-28 16:54:52 +01:00
Steven Masley 4591212482 feat: implement SCIM handler for SCIM 2.0 compliance (#25572)
Rewrites the SCIM 2.0 user provisioning handler to be RFC 7644
compliant. Verified against an external IdP Okta.

Behavior is OPT IN
2026-05-28 10:00:37 -05:00
Cian Johnston 6df1536256 fix: add missing_key error kind for missing chat api_key_id (#25783)
Refs CODAGT-486

- `codersdk/chats.go`: New `ChatErrorKindMissingKey` constant and
`AllChatErrorKinds` entry
- `coderd/x/chatd/chaterror/message.go`: `terminalMessage` and
`retryMessage` cases
- `coderd/x/chatd/model_routing_aibridge.go`: Pre-classify error with
`WithClassification`
- `coderd/x/chatd/model_routing_internal_test.go`: Classification
assertion on production path (CRF-2)
- `chatStatusHelpers.ts`: Frontend title "Chat interrupted"
- `LiveStreamTail.stories.tsx`: Storybook story with `detail` assertion
- `docs/ai-coder/ai-gateway/clients/coder-agents.md`: Troubleshooting
entry
- Tests: classification round-trip, terminal message, metrics kind
enumeration

> Generated with [Coder Agents](https://coder.com/agents) on behalf of
@johnstcn
2026-05-28 15:50:52 +01:00
Danny Kopping 12520ee964 feat: add ai provider status and reload freshness metrics (#25770)
Add metrics for `aibridged` and `aibridgeproxyd`'s provider statuses. AI providers can be modified, and possibly misconfigured, at runtime. These metrics help operators understand the state of these provider definitions in case unexpected behaviour is observed.
2026-05-28 14:57:33 +02:00
Ethan 7e2f7198dd fix(coderd/x/chatd/chatloop): use stream silence timeout (#25782)
Replaces the 60 second first-token timeout in the chat loop with a 10
minute stream-silence timeout.

Previously, the guard bounded only the gap before the first stream part.
Once any part arrived the attempt could hang indefinitely if the
provider stopped streaming without closing the connection, and even
normal long-running responses could be killed after 60 seconds if the
provider was slow to emit the first token.

The guard now arms when a model attempt opens its stream, resets on
every received stream part, and fires after 10 minutes of complete
silence. The existing retry path still handles the timeout, and the
public `startup_timeout` error kind is preserved to avoid API and
frontend churn.

10 minutes matches the default request timeout used by the Anthropic and
OpenAI Python SDKs.


Closes CODAGT-493
2026-05-28 21:02:40 +10:00
Michael Suchacz f529577bee fix(coderd/x/chatd): harden openai-compatible chat calls (#25737)
OpenAI-compatible chat paths hit two provider compatibility issues. Some
compatible endpoints reject a named `tool_choice` when there is only one
tool, and Gemini's OpenAI-compatible endpoint requires thought
signatures on current-turn tool calls.

Centralize OpenAI-compatible request patches in the chat provider:
rewrite single named tool choices to `"required"`, and add the
documented dummy Google thought signature to the first tool call in each
current-turn tool step for Gemini routes. Vercel OpenAI-compatible
requests are left unchanged for the thought-signature patch.

> Mux created this PR on behalf of Mike.
2026-05-28 10:27:32 +02:00
Garrett Delfosse a2e1ddb56f fix: validate FileSize in NewDataBuilder to prevent OOM DoS (#25710)
`NewDataBuilder` allocated `make([]byte, 0, req.FileSize)` using the
client-supplied `int64` with no upper-bound check. The DRPC 4 MiB wire
cap limits message size but not the integer value, so a crafted message
with `FileSize = 1<<40` forces a 1 TiB allocation, triggering an
unrecoverable `runtime.throw` that kills the entire `coderd` process.

Add a `MaxFileSize` constant (100 MiB, matching `HTTPFileMaxBytes` in
`coderd/files.go`) and reject negative or oversized `FileSize`, plus
negative or excessive `Chunks`, before the allocation.
`BytesToDataUpload` also returns an error for oversized data to preserve
the encode/decode round-trip contract. Fix a pre-existing reversed
subtraction in the `Add()` overflow error message.

Closes https://linear.app/codercom/issue/PLAT-231

<details>
<summary>Implementation details</summary>

- `provisionersdk/proto/dataupload.go`: New exported `MaxFileSize`
constant; validation in `NewDataBuilder` and `BytesToDataUpload`. Fixed
reversed subtraction in `Add()` error.
- `provisionersdk/proto/dataupload_test.go`: New
`TestNewDataBuilderValidation` with 7 subtests.
- Updated all 5 callers of `BytesToDataUpload` for new error return.
- Audited all `make([]byte, ...)` in provisioner paths; no other
client-supplied sizes.

</details>

> Generated by Coder Agents on behalf of @f0ssel
2026-05-27 14:30:11 -04:00
Jon Ayers f6f284ea51 feat: add initial NATS implementation (#25602) 2026-05-27 12:57:20 -05:00
Cian Johnston b278be7361 fix(coderd): enforce api_key_id on user messages at type level (#25729)
- Empty string is valid for `apiKeyID` in paths that genuinely lack a
caller key (e.g. agent-initiated context injection in
`workspaceAgentAddChatContext`). AI Gateway fail-closed check remains
the runtime safety net.
- Context injection paths (`persistInstructionFiles`, compaction) read
the key from `aibridge.DelegatedAPIKeyIDFromContext(ctx)`, set upstream
by `contextWithActiveTurnAPIKeyID`.
- Subagent context copy branches on `copiedRole ==
database.ChatMessageRoleUser` to choose the right append function.

> Generated by Coder Agents
2026-05-27 17:00:23 +01:00