coder

mirror of https://github.com/coder/coder.git synced 2026-06-06 22:48:19 +00:00

Author	SHA1	Message	Date
Danny Kopping	282ab7de34	refactor: load AI providers from the database at startup (#25672 ) Replace the env-based `BuildProviders` with a DB-backed loader. The database is now the single source of truth for runtime provider configuration; env config arrives via `SeedAIProvidersFromEnv` (run at boot) and `BuildProviders` reads it back as `aibridge.Provider` instances. `cli/server.go` and `enterprise/cli/server.go` both call the same path, so aibridged and aibridgeproxyd see the same provider set. Per-provider `DumpDir` is replaced by a top-level `CODER_AI_GATEWAY_DUMP_DIR` base; each provider's effective dump path is `<base>/<provider name>`.	2026-05-26 15:57:01 +02:00
Danny Kopping	c801dcbbc8	fix: strip route prefix when passing request to aibridged handler (#25671 ) We weren't stripping the API base (`/api/v2/aibridge`), leading to 404s when using the in-memory transport. Signed-off-by: Danny Kopping <danny@coder.com>	2026-05-26 08:04:26 +00:00
Danny Kopping	4ddda3a9db	feat: filter interceptions and sessions by provider name (#25640 ) Allows filtering sessions & interceptions by provider name, and adds a test to vaidate that provider name is immutable (at least until #25606 lands).	2026-05-25 16:31:48 +02:00
Danny Kopping	ef6ee2af68	chore: tolerate empty providers at startup and log env seeds (#25605 ) Since AI Gateway is now enabled by default, and if the AI Gateway Proxy is enabled too it's possible the server can start without any configured providers. This would previously block startup, which is unacceptable. In an upstack PR we will handle reloading the providers at runtime, so the server needs to be able to start up even if it can't handle any proxy requests to AI Gateway. This change was necessitated because if there are providers configured in the environment they need to be seeded _before_ the proxy starts.	2026-05-22 12:45:14 +02:00
Michael Suchacz	ca1f6b19a2	feat: remove legacy chat provider tables (#25416 )	2026-05-22 09:50:01 +02:00
Danny Kopping	ddec110b0e	refactor: move aibridged out of enterprise to AGPL (#25570 ) In order to allow Coder Agents to use AI Gateway in OSS, we need to rehome the `aibridged`\-related code into the AGPL path. The HTTP API is only registered under enterprise so will still require the AI Governance Add-on to be present in order to use it, whereas Coder Agents uses an in-memory pipe to the same handlers.	2026-05-22 09:11:37 +02:00
Danny Kopping	c50b0e84b9	feat!: default `CODER_AI_GATEWAY_ENABLED` to true (#25575 ) `CODER_AI_GATEWAY_ENABLED` / `CODER_AIBRIDGE_ENABLED` is now being defaulted to `true` now that it will be used by Coder Agents. If you previously had this value disabled explicitly, that value will persist.	2026-05-22 08:57:36 +02:00
Danny Kopping	9341efec9f	feat!: seed ai_providers from env on server startup (#24895 ) _Disclaimer: implemented by a Coder Agent using Claude Opus 4.7_ Part of the implementation of [RFC: Common AI Provider Configs](https://www.notion.so/coderhq/RFC-Common-AI-Provider-Configs-34bd579be59280ed958feffb82024797) (AIGOV-201). ## Note This change can cause a previously working installation to fail to start should a conflict exist between the providers configured in the environment & those now migrated to the database. I'll raise a PR upstack to document this process and workarounds should a startup fail. ## What this PR does Reconciles environment-derived AI provider configuration with the `ai_providers` table at server startup. The seed runs before the aibridged daemon is initialized, so the runtime always reads providers from the database; the legacy `CODER_AIBRIDGE_` environment variables become a one-shot migration source. ### Behavior - Concurrent server starts are serialized through a Postgres advisory lock (`LockIDAIProvidersEnvSeed`). - Missing rows are inserted with an audit entry attributed to the system actor. - Existing rows whose canonical hash matches the env-derived hash are left alone (the common no-op restart path). - Existing rows whose canonical hash does not* match cause server startup to fail with a descriptive error so the operator can explicitly resolve the conflict in either env or DB. - Soft-deleted rows are NOT resurrected from env; an explicit operator deletion is sticky across restarts. - Indexed providers whose name conflicts with a legacy env var fail startup with a clear remediation message. - Unknown provider types (e.g. `copilot`, until the DB enum is widened) are skipped with a log entry rather than failing startup. ### Canonical hashing The `canonicalAIProvider` shape captures exactly the fields that determine runtime behavior — `type`, `base_url`, and the Bedrock subset of settings (access key, access key secret, region, model, small fast model) — and is hashed with SHA-256. The hash is computed on demand from the row + env, never persisted, so the database does not need a new column for it. API keys live in the separate `ai_provider_keys` table and are intentionally excluded from the hash so operators can rotate keys via the API without forcing a server restart. <details> <summary>Decision log</summary> - The hash is intentionally not persisted in the database. The RFC discussed this trade-off; computing on demand keeps the schema minimal and lets the canonical shape evolve without a migration. - The lock uses an `iota` slot in `coderd/database/lock.go` rather than `GenLockID` so it's stable, easy to audit, and matches the convention used for every other startup lock. - A bearer-token Anthropic provider whose env vars also set Bedrock metadata but no AWS credentials does NOT store the Bedrock fields. Without credentials the discriminated settings would misrepresent the row as Bedrock auth. - We deliberately do NOT publish to the `ai_providers_changed` pubsub channel from the seed because the seed completes before any subscriber is started; the follow-up PR introduces that channel. </details>	2026-05-22 08:37:27 +02:00
Michael Suchacz	06526a5822	feat: use AI provider chat APIs (#25415 )	2026-05-22 07:53:23 +02:00
Michael Suchacz	40878eeba4	feat: add AI provider schema expansion (#25412 )	2026-05-22 02:16:01 +02:00
Jon Ayers	269bd0cb8d	fix: skip no-op peer updates in pgcoord binder (#24226 )	2026-05-21 17:59:12 -05:00
Paweł Banaszewski	46e93e6325	chore: add ai_gateway options that alias aibridge options (#25061 ) Adds options matching new AI Gateway naming. New options are added as alias for old options. Old options are still working. Old options have deprecated message. No conflict detection was added. Updated documentation so it mentions only new options. Added note about old options still working. > Various AI tools where used to create this PR	2026-05-21 11:14:11 +02:00
Steven Masley	9b6eadab77	fix: drop N+1 db query on template ACL available (#25465 ) Fixes [PLAT-149](https://linear.app/codercom/issue/PLAT-149/template-permissions-search-is-extremely-slow-with-many-groups). `/acl/available` ran a db query per group. A deployment with >5,000 groups made this route extremely slow.	2026-05-20 22:40:50 +00:00
Spike Curtis	8dc4d76890	chore: add agent-connection-watch for workspaces (#24507 ) <!-- If you have used AI to produce some or all of this PR, please ensure you have read our [AI Contribution guidelines](https://coder.com/docs/about/contributing/AI_CONTRIBUTING) before submitting. --> relates to GRU-18 Adds basic implementation for Workspace Agent Connection Watch and tests. Missing are handling of logs.	2026-05-20 13:09:11 -04:00
Danny Kopping	44b1edd4da	fix: unify key-ops audit shape and surface per-key detail (#25534 ) Adding missed commit from https://github.com/coder/coder/pull/25484 This formats the audit logs correctly ![image.png](https://app.graphite.com/user-attachments/assets/598d018b-cdf5-4a2c-8321-24ba2c650a1a.png) <!-- If you have used AI to produce some or all of this PR, please ensure you have read our [AI Contribution guidelines](https://coder.com/docs/about/contributing/AI_CONTRIBUTING) before submitting. -->	2026-05-20 17:33:26 +02:00
Michael Suchacz	e105e3af45	test: cover personal skill API (#25365 ) > Mux updated this PR on behalf of Mike. ## Stack Context This PR is the API test coverage slice in the experimental personal skills stack. The storage, schema, permissions, API, and SDK implementation merged in #25363. Stack order: 1. #25362 personal skill resolver 2. #25363 storage, permissions, API, and SDK 3. #25365 API test coverage 4. #25366 chattool and chatd integration 5. #25066 settings UI and docs 6. #25386 personal skills slash menu ## What? Adds API and audit tests for personal skill CRUD, validation failures, limits, authorization, soft-delete cleanup, and audit content tracking. This PR is now test-only. It does not include migrations, generated database code, or API implementation changes. ## Why? The feature touches storage, permissions, and audit behavior. These tests make the server behavior reviewable and protected without re-reviewing the implementation that already merged in #25363. ## Validation - `go test ./coderd -run '^(TestUserSkill\|TestPatchUserSkill)' -count=1` - `go test ./enterprise/coderd -run '^TestUserSkillAuditDiffTracksContent$' -count=1` - pre-commit hook via `gt modify --no-edit`	2026-05-20 11:27:09 +02:00
Danny Kopping	dd3223451b	feat: add AI providers HTTP CRUD handlers (#24894 )	2026-05-20 10:21:36 +02:00
Michael Suchacz	5a8d0016a5	feat: add personal skill storage, API, and SDK (#25363 ) > Mux updated this PR on behalf of Mike. ## Stack Context This PR is the storage, permissions, API, and SDK layer for experimental personal skills. #25362 has landed on `main`, so this branch is restacked directly on `main`. Stack order: 1. #25363 storage, permissions, API, and SDK 2. #25365 API test coverage 3. #25366 chattool and chatd integration 4. #25066 settings UI and docs 5. #25386 personal skills slash menu ## What? Adds the `user_skills` database table, generated queries, RBAC resources and scopes, audit resource handling, experimental user-scoped CRUD endpoints, SDK types, and generated API/site types. Follow-up review and restack fixes: - Enforce a bounded personal skill description in parser and database constraints. - Return `403 Forbidden` for unauthorized create and update attempts. - Return explicit conflict responses when soft-deleted users are targeted. - Keep user admins out of personal skills, while site owners can read and delete but not create or update. - Document trigger-raised constraint names and keep schema constants covered by tests. - Reuse `UserSkillMetadata` in the full `UserSkill` SDK response type. - Generate user skill IDs in Go instead of relying on a database default. - Rebase on latest `main` and renumber the user skills migration to `000502_user_skills`. ## Why? Personal skills need durable user-owned storage with owner authorization, limited site-owner moderation, and a hidden API surface before chatd can consume them. ## Validation - `make gen` - `go test ./coderd/database -run '^TestUserSkillSchemaConstants$' -count=1` - `go test ./coderd/database/dbauthz -run '^TestMethodTestSuite/TestUserSkills$' -count=1` - `go test ./coderd -run '^TestPatchUserSkill$' -count=1` - `go test ./codersdk ./coderd/database/db2sdk` - `make lint` - pre-commit hook on `97fd58108d`	2026-05-20 00:09:09 +02:00
Steven Masley	51b531f5b3	chore: 'go generate' mockgen to use `go tool` wrapper (#25490 ) Calling `mockgen` relies on the executable in the `$PATH`. Using `go tool` uses the one defined in `go.mod`	2026-05-19 14:53:13 +00:00
Danielle Maywood	170a6e1fe9	feat: add chat sharing foundation (#25041 )	2026-05-18 22:32:05 +01:00
Yevhenii Shcherbina	2732378da2	feat: audit group AI budget mutations (#25374 ) Relates to https://linear.app/codercom/issue/AIGOV-284/add-group-budgets-table-and-crud-api Adds audit-log support for `group_ai_budget` mutations. Without it, an admin could silently lower a spend limit from `$500` to `$50` or delete a budget entirely, with no record of who performed the action. Both write (`create-or-update`) and delete actions now produce audit log entries, including before/after diffs for `spend_limit_micros`. Depends on #25203. ## Old Version <img width="1340" height="456" alt="image" src="https://github.com/user-attachments/assets/e9ff52fb-a905-4aef-a4ee-7cdc58e68b75" /> ## New Version (see https://github.com/coder/coder/pull/25374/changes/9d22833de87cc106c24142c1d471a3f71872bf67) <img width="1347" height="496" alt="image" src="https://github.com/user-attachments/assets/1b9bbfa1-f86d-48e3-a0b1-266eb76f851f" />	2026-05-18 15:17:20 -04:00
Marcin Tojek	38772bdb7c	refactor: remove cache tokens from ExtraTokenTypes (#25118 ) Fixes: https://github.com/coder/aibridge/issues/243 > Generated with [Coder Agents](https://coder.com/agents)	2026-05-18 11:30:19 +02:00
Danny Kopping	b7a282d544	fix(enterprise/coderd/prebuilds): downgrade reconciliation stats log to debug (#25340 ) Disclaimer: implemented by a Coder Agent using Claude Opus 4.6 The `reconciliation stats` log line runs on every reconciliation tick (every 5 minutes by default), even when there are no presets to reconcile. In a steady-state installation without prebuilds activity, this is the only log line that persistently shows up at info level. Demote it to `debug` so the steady-state log output stays quiet. Before: ``` 2026-05-14 15:01:25.085 [info] coderd.prebuilds: reconciliation stats elapsed=1.649153ms presets_total=0 presets_reconciled=0 ``` After: same line is emitted at `debug` level and is suppressed at the default info log level.	2026-05-18 10:47:41 +02:00
Callum Styan	191dd230ae	feat: add agentfake scaletest subcommand (#25072 ) This PR builds on top of https://github.com/coder/coder/pull/25070 to add a way of running the larger "fake agent" manager via the existing CLI, pulling in the URL/credentials already set. With this, we can run a pod per scaletest region to act as all the workspaces in that region. This is in a new subcommand `scaletest agentfake` currently. --------- Signed-off-by: Callum Styan <callumstyan@gmail.com>	2026-05-15 14:36:54 -07:00
Danny Kopping	985ae9ecd5	feat(enterprise/dbcrypt): encrypt ai_providers and ai_provider_keys at rest (#25326 )	2026-05-15 15:42:02 +02:00
Ethan	a59b951565	test: skip stale notification chatd flakes (#25376 ) These chatd tests are flaking for the same stale control-notification race tracked by CODAGT-353, so this change skips the newly reflaking advisor-chain and `TestPatchChatMessage/ChangesModel` tests and rewrites the older `TODO(hugodutka)` skips to point at the same root cause. This keeps the known flakes documented consistently until the chatd notification-flow refactor lands. Closes CODAGT-427 Closes https://github.com/coder/internal/issues/1510	2026-05-15 17:36:48 +10:00
Callum Styan	81212470fd	feat: implement basic (MVP version) of a fake agent + manager (#25070 ) This PR introduces a "fake agent" + manager, which can be used during scaletests to run a single executable that acts as many workspace agents. The goals of these are to provide a much lighter weight implementation of a workspace in terms of resource cost and startup time when executing scaletests. --------- Signed-off-by: Callum Styan <callumstyan@gmail.com> Co-authored-by: Mux <noreply@coder.com>	2026-05-14 14:46:36 -07:00
Yevhenii Shcherbina	238968cfa0	feat: add per-group AI budget table and endpoints (#25203 ) Closes https://linear.app/codercom/issue/AIGOV-284/add-group-budgets-table-and-crud-api ## Summary Adds the `group_ai_budgets` table and the following endpoints: - `GET /api/v2/groups/{group}/ai/budget` - `PUT /api/v2/groups/{group}/ai/budget` - `DELETE /api/v2/groups/{group}/ai/budget` Each group may have at most one budget row. If no row exists, no budget is enforced. ### Feature gate Added `RequireFeatureMW(FeatureAIBridge)` on the `/ai/budget` sub-route. ## RBAC Authorization reuses `rbac.ResourceGroup` with the existing `.InOrganization(...).WithID(...)` scoping model. The `dbauthz` wrappers load the parent `groups` row and authorize against it. No new resource type is introduced. As a result, anyone with `group:update` permissions (Owner, OrgAdmin, or UserAdmin within the organization) can manage AI budgets for that group. ## Read access for group members `database.Group.RBACObject()` grants `policy.ActionRead` to all members of the group through the group ACL: ```go func (g Group) RBACObject() rbac.Object { return rbac.ResourceGroup.WithID(g.ID). InOrg(g.OrganizationID). // Group members can read the group. WithGroupACL(map[string][]policy.Action{ g.ID.String(): { policy.ActionRead, }, }) } ``` Because the `GET` endpoint authorizes against the same loaded `Group` object, any group member can call: ```text GET /api/v2/groups/{group}/ai/budget ``` `PUT` and `DELETE` remain admin-only. The group ACL grants only `ActionRead`, so write operations continue to require role-based `group:update` permissions. ## Alternative considered A dedicated `rbac.ResourceGroupAiBudget` resource would allow budget management to be separated from general group administration. We decided not to add that complexity for now.	2026-05-14 15:54:37 -04:00
Danielle Maywood	9ddfafe2b1	feat: add chat ACL database foundation (#25080 )	2026-05-14 17:18:50 +01:00
Danny Kopping	841b777ccd	feat: add ai_providers table, queries, dbauthz, audit, RBAC (#24892 )	2026-05-14 16:10:46 +02:00
Yevhenii Shcherbina	b5e1ea33d8	feat: add AI budget policy and period deployment config (#25122 ) Closes https://linear.app/codercom/issue/AIGOV-283/add-deployment-config-for-ai-budget-policy-and-period Adds `CODER_AI_BUDGET_POLICY` and `CODER_AI_BUDGET_PERIOD` deployment options for AI Governance cost controls.	2026-05-12 10:48:36 -04:00
Kyle Carberry	5a5cd79c4c	fix: drop buffered chat parts after their durable message commits (#25164 )	2026-05-12 00:30:38 -04:00
Kyle Carberry	0ed57ee343	fix(coderd/x/chatd): checkpoint buffered message_parts to avoid stale replay (#25145 )	2026-05-11 17:27:03 -04:00
Steven Masley	19573e8aee	feat!: patchTemplateMeta to use optional fields (#24984 ) Closes https://github.com/coder/coder/issues/13112 Breaking Change: Removed status code `StatusNotModified` when no diffs occur in a patch. Now the patch is always applied and a template is always returned.	2026-05-11 12:43:52 -05:00
Zach	b221632615	fix: wipe user secrets when user is soft-deleted (#24985 ) Extend the delete_deleted_user_resources() trigger so that secrets belonging to a soft-deleted user are removed in the same transaction as the existing api_keys and user_links cleanup. user_secrets.user_id has ON DELETE CASCADE, but Coder soft-deletes users by flipping users.deleted rather than removing the row, so the foreign key cascade never fires and secrets would otherwise survive deletion. Assisted by Coder Agents.	2026-05-11 09:07:30 -06:00
Zach	81e2be69e9	test: use typed atomics in test files (#25071 ) Use typed atomics (atomic.Int64, atomic.Int32, etc.) in test files to prevent mixing atomic and non-atomic access on the same value, guarantee 64-bit alignment on 32-bit platforms, and provide a cleaner API.	2026-05-11 08:41:17 -06:00
Jeremy Ruppel	a1dbd758bc	feat: add template builder deployment config and telemetry types (#25082 )	2026-05-11 09:48:55 -04:00
Thomas Kosiewski	4a6756a3e8	fix: isolate test HTTP clients (#25038 )	2026-05-11 11:03:38 +02:00
Marcin Tojek	febabfb8b2	feat: add request/response dump support to aibridgeproxyd (#24837 ) Closes https://github.com/coder/coder/issues/24335	2026-05-11 10:59:26 +02:00
Ethan	b6dbc5614c	fix(coderd/x/chatd): handle truncated provider streams (#25074 ) coder/fantasy now fails closed when Anthropic or OpenAI Responses streams close before their provider terminal events instead of yielding a successful finish. This bumps the fantasy replacement to coder/fantasy#33 and teaches chat error classification to treat those failures as retryable timeout errors with explicit stream-closed messages. <img width="875" height="311" alt="image" src="https://github.com/user-attachments/assets/69c6f7b5-c885-46d2-a88b-b7a2b111bd55" />	2026-05-08 15:52:42 +10:00
Susana Ferreira	b6dacb4a3c	feat: add automatic key failover for AI Bridge OpenAI (#24847 ) ## Description Adds automatic key failover for centralized OpenAI provider, covering both chat completions and responses APIs. Same shape as the Anthropic PR: each upstream call walks the configured key pool, keys are marked temporary on 429 (with cooldown from `Retry-After`) and permanent on 401/403. Each agentic-loop iteration gets its own fresh walker so a tool-call continuation can fail over independently of the initial request. BYOK is unchanged: BYOK requests run as a single attempt with no failover. ## Changes - `config.OpenAI` carries a `KeyPool`. `Key` remains for BYOK Authorization Bearer set per interception. - Chat completions blocking interceptor: walks the pool via `newChatCompletionWithKeyFailover`, marks keys on key-specific failures, returns on first success or non-failover error. - Chat completions streaming interceptor: per-iteration walker. Pre-stream failures fail over to the next key; mid-stream errors are relayed as SSE events. - Responses blocking interceptor: extracts `newResponseWithKeyFailover` parallel to chatcompletions. - Responses streaming interceptor: per-iteration walker, retains the existing buffer-then-forward design. ## Related Issues Related to: https://github.com/coder/internal/issues/1446 Related to: https://linear.app/codercom/issue/AIGOV-197/aibridge-automatic-key-failover-for-bridged-and-passthrough-routes ## Follow-up PRs - Bedrock multi-key support. - Refactor provider vs interceptor config separation. - Record the actually-used key in the interception credential hint after failover. > [!NOTE] > Initially generated by Claude Opus 4.7, modified and reviewed by @ssncferreira	2026-05-07 15:35:46 +01:00
Susana Ferreira	f1155ac4d7	feat: add automatic key failover for AI Bridge Anthropic (#24836 ) ## Description Adds automatic key failover for centralized Anthropic provider. When a key pool is configured, each upstream call walks the pool and tries keys in order until one succeeds or the pool is exhausted. Keys are marked temporary on 429 (with cooldown from `Retry-After`) and permanent on 401/403. Errors that aren't key-specific don't trigger failover. Each agentic-loop iteration gets its own fresh walker, so a tool-call continuation can fail over independently of the initial request. BYOK is unchanged: BYOK requests run as a single attempt with no failover. ## Changes - `config.Anthropic` carries a `KeyPool`. `Key` remains for BYOK X-Api-Key set per interception. - Blocking interceptor: walks the pool, marks keys on key-specific failures, returns on first success or non-failover error. - Streaming interceptor: per-iteration walker. Pre-stream failures fail over to the next key; mid-stream errors are relayed as SSE events. - New `keypool` error types: `TransientExhaustionError` (carries soonest cooldown) and `ErrPermanentExhaustion`. Replace the prior `ErrAllKeysExhausted`. - Error responses now consistently include the outer `"type": "error"` field. ## Related Issues Related to: https://github.com/coder/internal/issues/1446 Related to: https://linear.app/codercom/issue/AIGOV-197/aibridge-automatic-key-failover-for-bridged-and-passthrough-routes ## Follow-up PRs - Bedrock multi-key support. - Refactor provider vs interceptor config separation. - Record the actually-used key in the interception credential hint after failover. > [!NOTE] > Initially generated by Claude Opus 4.7, modified and reviewed by @ssncferreira	2026-05-07 14:57:44 +01:00
Jon Ayers	2cab1b41ad	fix: increase MaxMessageSize to 16 MiB (#24599 )	2026-05-06 10:12:56 -05:00
Michael Suchacz	0bfb9f6f13	feat: show agent turn summary in agents sidebar (#24942 ) Persists the agent-generated turn-end summary on `chats` and shows it as the Agents sidebar subtitle when present, falling back to the model name. Errors still take precedence. > Mux is acting on Mike's behalf. ## What changes Storage. New nullable `last_turn_summary` column on `chats` (migration `000486`). New `UpdateChatLastTurnSummary` query normalizes blank/whitespace input to `NULL`, preserves `updated_at` (so the chat does not jump to the top of the sidebar on summary writes), and uses an `expected_updated_at` stale-write guard so an older async summary cannot overwrite a newer turn. Backend. `coderd/x/chatd/chatd.go` decouples summary generation from webpush. Generated summaries persist for completed parent turns even when webpush is unconfigured or has no subscriptions. The same generated text is reused as the webpush body when webpush is configured, so the summary model is not called twice. Generic fallback push text is no longer persisted; it clears any stale summary instead. Error/interrupt/pending-action terminal paths clear `last_turn_summary` for the latest turn. Frontend. `AgentsSidebar.tsx` subtitle priority is now `errorReason \|\| lastTurnSummary \|\| modelName`, normalized via the existing `asNonEmptyString` helper from `blockUtils.ts`. ## Tests - `TestUpdateChatLastTurnSummary` (database): success, whitespace-to-NULL, stale guard rejects, `updated_at` preserved. - `TestUpdateLastTurnSummaryRejectsStaleWrites` (chatd internal): direct stale-`expected_updated_at` test. - `TestSuccessfulChatPersistsTurnSummaryWithoutWebPush`: persistence works without webpush subscriptions. - `TestSuccessfulChatSendsWebPushWithSummary`: same generated text drives both DB and push body. - `TestSuccessfulChatSendsWebPushFallbackWithoutSummaryForEmptyAssistantText`: fallback text is not persisted. - `TestErroredChatClearsLastTurnSummaryAndSendsWebPush`: error path clears the field. - `TestInterruptChatDoesNotSendWebPushNotification`: interrupt path clears the field, no push fires. - `AgentsSidebar.test.tsx`: subtitle priority for summary-present, error-wins, no-summary fallback, whitespace fallback. - `AgentsSidebar.stories.tsx`: `ChatWithTurnSummary` and `ChatWithTurnSummaryAndError`. ## Notes - No backfill. Existing chats keep showing the model name until their next turn completes. - Parent chats only in this iteration; the field is rendered on any `Chat` if a future change extends generation to children. - Decoupling generation from webpush adds quickgen model calls for completed parent turns that previously skipped generation when no subscriptions existed. Existing parent-only, assistant-text-present, `PushSummaryModel` configured, and bounded-timeout gates keep this behavior bounded.	2026-05-06 16:43:35 +02:00
Ethan	e5c7fdff86	fix(coderd/x/chatd): refresh chat status and bound subscriber reads on Subscribe (#24095 ) Tightens the chat stream subscription path on a few related axes. None of these changes touch the steady-state event flow; they all concern the subscribe handshake. ## Motivation `Server.Subscribe` carries three responsibilities that were entangled: 1. Authorize the caller against the chat row. 2. Arm local + pubsub subscriptions before any DB reads (subscribe-first-then-query). 3. Build the initial snapshot from a fresh chat row, message history, and queue. When all three live in one function and share the request context, a few unfortunate behaviors fall out: - The HTTP handler's middleware already loaded and authorized the chat row, but `Subscribe(chatID)` discarded it and re-fetched on every WebSocket connection. - The chat row used to populate the initial `status` event was loaded before the pubsub subscription was armed, so a status transition that happened in that window was silently lost. - Control-path DB reads inherited whatever context the caller passed in. A caller without a deadline could wedge a subscriber goroutine indefinitely on a stalled DB. - A transient failure of the chat re-read collapsed the entire subscription instead of degrading gracefully. ## What changes Split the auth boundary out into the type signature. A new `SubscribeAuthorized(ctx, chat, ...)` takes the already-authorized row directly. The HTTP handler in `coderd/exp_chats.go` calls it with the chat row from `httpmw.ChatParam`, eliminating the redundant `GetChatByID`. `Subscribe(chatID)` is preserved as a thin wrapper for callers that don't have a chat row in hand (tests, internal callers); it does the auth lookup and delegates. Re-read the chat after arming subscriptions. Inside `SubscribeAuthorized`, after the local stream and pubsub subscriptions are active, we reload the chat row to populate the initial `status` event and any enterprise relay setup. Combined with the existing subscribe-first-then-query pattern, this closes the gap where a status transition between the middleware's load and the subscription arming would not appear in either the initial snapshot or a live notification. Fall back to the middleware row on refresh failure. If the post-subscription refresh fails (transient DB blip, brief pool exhaustion), we log a warning and reuse the row that proved authorization in the first place. Messages, queue, and pubsub are all independent of this row, so the stream still works; the initial `status` is just slightly stale and self-corrects via the next pubsub event. Bound subscriber control-path DB reads. A new `streamSubscriberControlFetchContext` helper applies a 5-second fallback timeout only when the caller has no deadline of their own. Used at the chat refresh, the initial queue load, and the queue-update goroutine following pubsub notifications. HTTP-driven callers pass through unchanged; background callers can no longer hang forever on a stalled DB and leak subscriber goroutines, pubsub subscriptions, and `chatStreams` entries.	2026-05-06 14:29:53 +10:00
Ethan	46a60e6d5d	refactor: move chat error kinds into codersdk (#24955 ) Moves the chat error kind taxonomy from `coderd/x/chatd/chaterror` into `codersdk.ChatErrorKind` and types `ChatError.Kind` / `ChatStreamRetry.Kind` so generated TypeScript exposes an SDK-owned union, including `usage_limit`. Backend chat classification now references the SDK constants directly while preserving the existing JSON string values. Keeps chat usage-limit admission failures on their existing 409 response shape. The frontend maps structured usage-limit responses to the SDK-owned `usage_limit` kind, uses generated `TypesGen.ChatErrorKind` directly, and removes the local string union and alias.	2026-05-06 11:57:48 +10:00
Zach	f4197d676c	refactor: remove unused tailnet connIO stats fields (#24911 ) Drop start, lastWrite, and overwrites fields on connIO along with the Stats() and Overwrites() methods. They have had no readers since `52901e121` which rewrote the PG coordinator's debug page to query the database directly.	2026-05-05 08:46:53 -06:00
Ethan	4751416b29	fix!: persist structured chat errors (#24919 ) Breaking change for changelog: > `codersdk.Chat.last_error` now returns a structured `ChatError` object (`{message, kind, provider, retryable, status_code, detail}`) instead of a plain string. The chats API is experimental (`/api/experimental/chats`), so this ships without a deprecation cycle; consumers reading `chat.last_error` as a string must update to read `chat.last_error.message`. SDK/generated TypeScript terminal error payloads now use the single `ChatError` type; the live stream error payload type is renamed from `ChatStreamError` to `ChatError`. Persisted chat errors now carry the same provider-specific detail (kind, provider, retryable, HTTP status, optional detail) as the live stream, so refreshing a failed chat rehydrates with the full structured error instead of a one-line headline. Existing rows are migrated in place: legacy text errors are wrapped into `{message, kind: "generic"}` so already-errored chats still render, and rows with `last_error IS NULL` stay NULL. Internally, persisted fallback decoding now reuses the existing `chaterror.KindGeneric` constant, with no JSON value change. Closes CODAGT-239	2026-05-05 12:56:06 +10:00
Atif Ali	fad69df710	fix: correct SCIM Swagger try it out URLs (#24779 )	2026-05-05 02:54:03 +05:00
Jon Ayers	6b9637d85a	feat: replace pgcoordinator pg_notify triggers with app-level Publish() (#24717 )	2026-05-01 15:00:08 -05:00

1 2 3 4 5 ...

1358 Commits