coder

mirror of https://github.com/coder/coder.git synced 2026-06-04 05:28:20 +00:00

Author	SHA1	Message	Date
Cian Johnston	3f6b40a833	fix: reap idle chatd stream states on a timer (#24476) * Adds `streamJanitorLoop` to clean up stale streams every 30s * Zeroes dropped slots so they become eligible for garbage collection * Adds regression tests in `coderd/x/chatd` and `enterprise/coderd/x/chatd` > 🤖	2026-04-17 19:22:00 +01:00
Dean Sheather	3452ab3166	chore: add client_type field to chats and telemetry (#24342) Add a `chat_client_type` enum (`ui` or `api`) and a `client_type` column to the `chats` table. The column defaults to `api` for new rows so API callers don't need to set it explicitly. Existing rows are backfilled to `ui`. The field flows through `CreateChatRequest`, `chatd.CreateOptions`, and `InsertChat`, and is returned in the `Chat` response via `db2sdk`. Implementation notes (generated by Coder Agents): ### Changes Database migration (000469): new enum `chat_client_type` with values `ui` and `api`; new `client_type` column, `NOT NULL DEFAULT 'api'`; backfill with `UPDATE chats SET client_type = 'ui'`. SQL query: `InsertChat` now includes `client_type`. SDK: `ChatClientType` type added; `ClientType` field added to both `CreateChatRequest` (optional, defaults server-side to `api`) and the `Chat` response. Handler: `postChats` maps the request field (defaulting to `api`) and passes it through `chatd.CreateOptions`. Sub-agent: child chats inherit their parent's `client_type`. db2sdk: maps the database value to the SDK type. ### Decision log - Default is `api` (not `ui`) so existing API integrations get the correct value without code changes. - Backfill sets existing rows to `ui` per requirement. - Child chats inherit `client_type` from their parent rather than using the default.	2026-04-16 23:57:05 +10:00
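A hedged sketch of the defaulting and inheritance rules from the decision log. `resolveClientType` is a hypothetical helper, not the actual `postChats` or sub-agent code; it assumes only that an unset request value arrives as the empty string.

```go
package main

import "fmt"

// ChatClientType models the chat_client_type enum described above.
type ChatClientType string

const (
	ChatClientTypeUI  ChatClientType = "ui"
	ChatClientTypeAPI ChatClientType = "api"
)

// resolveClientType is a hypothetical helper: a child chat inherits its
// parent's type, and an unset request value falls back to the api default.
func resolveClientType(requested ChatClientType, parent *ChatClientType) ChatClientType {
	if parent != nil {
		return *parent
	}
	if requested == "" {
		return ChatClientTypeAPI
	}
	return requested
}

func main() {
	ui := ChatClientTypeUI
	fmt.Println(resolveClientType("", nil))                 // api
	fmt.Println(resolveClientType(ChatClientTypeAPI, &ui)) // ui
}
```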
Ethan	b9bc0ad6df	test: skip TestSubscribeRelayEstablishedMidStream (#24431) Relates to https://github.com/coder/internal/issues/1455 From that issue: > Going to skip this test until the underlying race in chatd is fixed. https://github.com/coder/coder/pull/24279 was a band-aid fix that I no longer think is worth pursuing in the short term. Hugo is working on an RFC for a redesign of the system to prevent this class of race condition in the future.	2026-04-16 23:55:41 +10:00
Cian Johnston	d7439a9de0	feat: add Prometheus metrics for chatd subsystem (#24371) Adds 7 Prometheus metrics to the chatd subsystem and introduces a typed `ActivityBumpReason` for attributing deadline bumps.
| Metric | Type | Labels |
|--------|------|--------|
| `coderd_chatd_chats` | Gauge | `state` (streaming, waiting) |
| `coderd_chatd_message_count` | Histogram | `provider` |
| `coderd_chatd_prompt_size_bytes` | Histogram | `provider` |
| `coderd_chatd_tool_result_size_bytes` | Histogram | `provider`, `tool_name` |
| `coderd_chatd_ttft_seconds` | Histogram | `provider` |
| `coderd_chatd_compaction_total` | Counter | `provider`, `result` |
| `coderd_chatd_steps_total` | Counter | `provider` |
> 🤖	2026-04-15 19:53:10 +01:00
Yevhenii Shcherbina	dd73ea54bd	feat: add allow-byok option for ai-gateway (#24274) ## Summary Adds an `--ai-gateway-allow-byok` deployment option to control whether users can use Bring Your Own Key (BYOK) mode with AI Gateway. When disabled (`--ai-gateway-allow-byok=false`), BYOK requests are rejected with a 403 and a message directing the admin to enable the flag. Centralized key authentication works regardless of this setting. Defaults to `true` (BYOK allowed). --------- Co-authored-by: Danny Kopping <danny@coder.com>	2026-04-15 14:16:49 -04:00
Cian Johnston	6194bd6f57	fix: address post-merge review findings for chat org scoping (#24297) Addresses review findings on #23827 that were raised after it merged: - Persisted attachments now store `organizationId`; attachments from a mismatched org are pruned on restore - Workspace selection reconciliation: stale IDs from previous orgs are dropped via a derived `effectiveWorkspaceId` - Org picker uses `permittedOrganizations()` for RBAC-aware filtering - Org picker hidden when the user belongs to only one org - Ref-sync `useEffect` replaced with `useEffectEvent` - `CreateWorkspace()` and `ListTemplates()` take `organizationID` and `db` as required function parameters instead of optional struct fields, so the compiler enforces them and scattered nil guards are removed - Cross-org template check in `CreateWorkspace` is now unconditional - `ListTemplates` org-scoping filter now has test coverage - `setupChatInfra` comment fixed; test helpers use params structs instead of positional UUIDs - Enterprise test documents that an org admin only sees their own chats (the handler hardcodes `OwnerID`; sidebar UI work is needed before lifting that restriction) > 🤖	2026-04-15 11:39:05 +01:00
Cian Johnston	c552f9f281	fix: stop group spend limits from leaking across org boundaries (#24294) Three SQL queries (`GetUserGroupSpendLimit`, `ResolveUserChatSpendLimit`, `GetUserChatSpendInPeriod`) aggregated chat spend limits and usage globally across all organizations, so a restrictive group limit in org A would bleed into org B. ## Changes - Add an `organization_id` parameter to all three SQL queries in `coderd/database/queries/chats.sql` - When a nil UUID is passed, queries fall back to global behavior (backward compatibility for HTTP dashboard endpoints) - When a real org ID is passed, limits and spend are scoped to that organization - Thread `organizationID` through `ResolveUsageLimitStatus` → `checkUsageLimit` → all chatd call sites - Update dbauthz wrappers for new param structs - HTTP endpoints (`chatCostSummary`, `getMyChatUsageLimitStatus`) pass `uuid.Nil` with a TODO for future org-scoped UI - Add `TestResolveUsageLimitStatus_OrgScoped` with 5 test cases covering org isolation, nil-UUID fallback, spend scoping, and user override priority Closes coder/internal#1466 > 🤖	2026-04-14 16:56:17 +01:00
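The nil-UUID fallback amounts to a conditional filter. This sketch models it in memory rather than SQL; `OrgID` is a stand-in for `uuid.UUID` (its zero value plays the role of `uuid.Nil`), and `spendRow`/`spendInPeriod` are hypothetical names.

```go
package main

import "fmt"

// OrgID stands in for uuid.UUID; its zero value plays the role of uuid.Nil.
type OrgID [16]byte

// spendRow is a hypothetical per-chat spend record.
type spendRow struct {
	Org   OrgID
	Cents int64
}

// spendInPeriod sums spend for orgID, or across every organization when
// orgID is the zero value (the backward-compatible global behavior).
func spendInPeriod(rows []spendRow, orgID OrgID) int64 {
	var total int64
	for _, r := range rows {
		if orgID == (OrgID{}) || r.Org == orgID {
			total += r.Cents
		}
	}
	return total
}

func main() {
	orgA, orgB := OrgID{1}, OrgID{2}
	rows := []spendRow{{orgA, 300}, {orgB, 50}, {orgA, 200}}
	fmt.Println(spendInPeriod(rows, orgA), spendInPeriod(rows, orgB), spendInPeriod(rows, OrgID{})) // 500 50 550
}
```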
Yevhenii Shcherbina	b78eba9f9d	feat: make sure creds are always masked (#24241) ## Summary Adds a `sanitizeCredentialHint` safety check in the db-to-SDK conversion layer to ensure credential hints are always masked before being exposed in the API. Also adds `credential_kind` and `credential_hint` assertions to the session threads API test.	2026-04-13 10:14:38 -04:00
Cian Johnston	22062ec52e	feat: add organization scoping to chats (#23827) Fixes https://github.com/coder/internal/issues/1436 * Adds organization_id to chats with backfill (workspace org → user org membership → default org) * No support yet for ACLs (follow-up issue) - Cross-org workspace binding rejected (both in `CreateChatRequest` and in the `create_workspace` tool) - Adds `OrganizationAutocomplete` to `AgentCreateForm` - Docs updated with `organization_id` in chats-api.md > 🤖 Written by a Coder Agent. Reviewed by many humans and many agents. --------- Co-authored-by: Mathias Fredriksson <mafredri@gmail.com>	2026-04-13 12:31:25 +01:00
Cian Johnston	7b0421d8c6	fix: revert auto-assign agents-access role enabled (#24170) This reverts commit `d4a9c63e91` (#23968). --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-04-08 20:56:17 +01:00
Jon Ayers	08bd9e672a	fix: resolve Test_batcherFlush/RetriesOnTransientFailure flake (#24112) Fixes https://github.com/coder/internal/issues/1452	2026-04-07 13:46:26 -05:00
Kayla はな	c5f1a2fccf	feat: make service accounts a Premium feature (#24020)	2026-04-07 12:25:32 -06:00
Kyle Carberry	f3f0a2c553	fix(enterprise/coderd/x/chatd): harden TestSubscribeRelayEstablishedMidStream against CI flakes (#24108) Fixes coder/internal#1455 Three changes to eliminate the timing-sensitive flake in `TestSubscribeRelayEstablishedMidStream`: 1. Reduce `PendingChatAcquireInterval` from `time.Hour` to `time.Second`. The primary trigger is still `signalWake()` from `SendMessage`, but a short fallback poll ensures the worker picks up the pending chat even under heavy CI goroutine scheduling contention. 2. Increase context timeout from `WaitLong` (25s) to `WaitSuperLong` (60s). The worker pipeline (model resolution, message loading, LLM call) involves multiple DB round-trips that can be slow when PostgreSQL is shared with many parallel test packages. 3. Add a status-polling loop while waiting for the streaming request. If the worker errors out during chat processing, the test now fails immediately with the error status and message instead of silently timing out. > Generated by Coder Agents	2026-04-07 13:41:33 -04:00
George K	86ca61d6ca	perf: cap count queries and emit native UUID comparisons for audit/connection logs (#23835) Audit and connection log pages were timing out due to expensive COUNT(*) queries over large tables. This commit adds opt-in count capping: responses can include a `count_cap` field signaling that the count was truncated at a threshold, avoiding the full table scans that caused page timeouts. Text-cast UUID comparisons in regosql-generated authorization queries also contributed to the slowdown by preventing index usage for connection and audit log queries; these queries now use native UUID comparisons. Frontend changes handle the capped state in `usePaginatedQuery` and `PaginationWidget`, optionally displaying a capped count in the pagination UI (e.g. "Showing 2,076 to 2,100 of 2,000+ logs"). Related to: https://linear.app/codercom/issue/PLAT-31/connectionaudit-log-performance-issue	2026-04-07 07:24:53 -07:00
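A sketch of the capped-count semantics, assuming the database counts over a subquery limited to `cap+1` rows (an assumption; the commit does not show the SQL) and a cap of 2,000 chosen to match the example. `capCount` and `formatTotal` are hypothetical helpers; the real formatting lives in the frontend.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// capCount turns the result of a count bounded at capN+1 rows into the
// value to report and a flag saying whether it was truncated.
func capCount(counted, capN int64) (int64, bool) {
	if counted > capN {
		return capN, true
	}
	return counted, false
}

// formatTotal renders a non-negative total with thousands separators,
// appending "+" when the count was capped.
func formatTotal(n int64, capped bool) string {
	s := strconv.FormatInt(n, 10)
	var b strings.Builder
	for i, r := range s {
		if i > 0 && (len(s)-i)%3 == 0 {
			b.WriteByte(',')
		}
		b.WriteRune(r)
	}
	if capped {
		b.WriteByte('+')
	}
	return b.String()
}

func main() {
	total, capped := capCount(2001, 2000)
	fmt.Printf("Showing 2,076 to 2,100 of %s logs\n", formatTotal(total, capped))
}
```

Counting at most `cap+1` rows is what lets the query stop early: the extra row is only needed to know that the true total exceeds the cap.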
Kyle Carberry	e18094825a	fix: retain message_part buffer for cross-replica relay (#24031)	2026-04-04 17:24:41 -04:00
Jon Ayers	a1d51f0dab	feat: batch connection logs to avoid DB lock contention (#23727) - Running 30k connections generated heavy lock contention in the DB	2026-04-03 15:47:26 -05:00
Paweł Banaszewski	8369fa88fd	feat: add columns for cached tokens from aibridge (#23832) Two new columns added to aibridge_token_usages: - cache_read_input_tokens (BIGINT, default 0) - cache_write_input_tokens (BIGINT, default 0) The migration backfills existing rows by extracting values from the metadata JSONB column: cache_read_input, input_cached, and prompt_cached for reads (the max value is selected, since only one should be set), and cache_creation_input for writes. All references to data from metadata were updated to reference the new columns. There are no other changes beyond where the data is extracted from. Requires an aibridge library version bump to include: https://github.com/coder/aibridge/pull/229 Fixes: https://github.com/coder/aibridge/issues/150	2026-04-03 16:27:31 +02:00
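The backfill rule can be read as the following Go sketch. The real migration is SQL over the JSONB column; `cacheTokensFromMetadata` is a hypothetical helper that assumes the metadata is a flat JSON object whose token counts are plain numbers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cacheTokensFromMetadata derives cache_read_input_tokens and
// cache_write_input_tokens from a legacy metadata JSON object.
// Reads take the maximum of the provider-specific keys because at most one
// is expected to be set; missing keys count as 0.
func cacheTokensFromMetadata(raw []byte) (int64, int64, error) {
	var md map[string]any
	if err := json.Unmarshal(raw, &md); err != nil {
		return 0, 0, err
	}
	// num reads a JSON number (decoded as float64) and treats anything else as 0.
	num := func(key string) int64 {
		if f, ok := md[key].(float64); ok {
			return int64(f)
		}
		return 0
	}
	var read int64
	for _, key := range []string{"cache_read_input", "input_cached", "prompt_cached"} {
		if v := num(key); v > read {
			read = v
		}
	}
	return read, num("cache_creation_input"), nil
}

func main() {
	read, write, err := cacheTokensFromMetadata([]byte(`{"input_cached": 120, "cache_creation_input": 40}`))
	fmt.Println(read, write, err) // 120 40 <nil>
}
```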
Michael Suchacz	7d0a0c6495	feat: provider key policies and user provider settings (#23751)	2026-04-02 19:46:42 +02:00
Cian Johnston	d4a9c63e91	feat: auto-assign agents-access role to new users when experiment enabled (#23968) When the `agents` experiment is enabled, new users are automatically granted the `agents-access` role at creation time so they can use Coder Agents without manual admin intervention. - Auto-assigns in `CreateUser()`, which covers the admin API, OAuth, and OIDC creation paths - Skips auto-assign for OIDC users when enterprise site role sync is enabled (sync overwrites roles on every login; those admins should use `--oidc-user-role-default` instead) - CLI `create-admin-user` bypasses `CreateUser()` but creates `owner` users who already have all permissions > 🤖 Written by a Coder Agent. Will be reviewed by a human.	2026-04-02 14:46:47 +01:00
Ethan	7757cd8e08	refactor(coderd/x/chatd): insert chats directly as pending on creation (#23888) Previously, `CreateChat` inserted the `chats` row with the DB default status (`waiting`), then updated it to `pending` in the same transaction via `setChatPendingWithStore`. This wasted two extra queries per chat creation (`GetChatByID` + `UpdateChatStatus`) and rewrote the same row immediately after inserting it. Now `CreateChat` passes the status directly to `InsertChat`, so the row is written once in its final create-time state. The `setChatPendingWithStore` helper is removed entirely. `InsertChat` now requires an explicit `status` parameter at all callsites instead of relying on a DB column default. ## Motivation On an experimental branch we're trialing firing all chatd notifications from plpgsql triggers. The old two-step insert made that awkward: in an `AFTER INSERT` trigger, `NEW` only contained the insert-time row (`waiting`), not the final committed state (`pending`). To emit the correct event payload the trigger had to be deferred and re-read the row from `chats` at commit time. With this change, `NEW` already contains the correct row to publish: no deferred trigger, no extra `SELECT`, and simpler, cheaper trigger logic. That said, this seems like a worthwhile change regardless of the trigger experiment: writing the final row state once removes unnecessary DB work on every chat creation and makes the create path easier to reason about.	2026-04-02 14:13:51 +11:00
Cian Johnston	d6df78c9b9	chore: remove racy ChatStatusPending assertions after CreateChat (#23882) Removes 6 fragile `require.Equal(t, codersdk.ChatStatusPending, chat.Status)` assertions from chat relay and creation tests. Root cause: In HA tests with two replicas sharing the same DB, the worker can acquire a just-created chat (flipping `pending → running` via `AcquireChats`) before the HTTP response reaches the test. All affected tests already synchronize via `require.Eventually` waiting for `running` status, making the initial assertion both redundant and racy. - Remove 5 assertions in `enterprise/coderd/exp_chats_test.go` (all `TestChatStreamRelay` subtests) - Remove 1 assertion in `coderd/exp_chats_test.go` (`TestPostChats`) - An existing comment in `TestPostChats/Success` already documents this exact race Fixes flake: https://github.com/coder/coder/actions/runs/23807597632/job/69385425724 > 🤖 Written by a Coder Agent. Will be reviewed by a human.	2026-04-01 10:00:50 +01:00
Danny Kopping	9fa103929a	perf: make `ListAIBridgeSessions` 10x faster (#23774) _Disclaimer: produced using Claude Opus 4.6, reviewed by me, and validated against Dogfood dataset._ The `ListAIBridgeSessions` query materialized and aggregated all matching interceptions before paginating, then ran expensive token/prompt lookups across the full dataset. For a page of 25 sessions against ~200k interceptions (our dogfood dataset), this meant: - Three CTEs scanning all rows (filtered_interceptions, session_tokens, session_root) - ARRAY_AGG(fi.id) collecting every interception ID per session - Lateral prompt lookup via ANY(array_of_all_ids) running for every session, not just the page - ~90MB of disk sorts and JIT compilation kicking in The fix restructures the query to paginate first and enrich afterwards: a single CTE groups interceptions into sessions with only cheap aggregates (MIN, MAX, COUNT), applies cursor pagination and LIMIT, then lateral joins fetch metadata, tokens, and prompts for just the ~25-row page. Measured against 220k interceptions / 160k sessions:
| Metric | Before | After |
|--------------------|--------|-------|
| Execution time | 1800ms | 185ms |
| Shared buffer hits | 737k | 2.6k |
| Disk sort spill | 86MB | 16MB |
| Lateral loops | 160k | 25 |
https://grafana.dev.coder.com/goto/fbODPGtvR?orgId=1 The results are identical, just _much_ faster. --- Also includes some additional tests which I added prior to refactoring the query to ensure no regressions on edge cases. --------- Signed-off-by: Danny Kopping <danny@coder.com>	2026-03-31 14:42:23 +02:00
Cian Johnston	3ce82bb885	feat: add chat-access site-wide role to gate chat creation (#23724) - Add `chat-access` built-in role granting chat CRUD at User scope - Exclude `ResourceChat` from member, org member, and org service account `allPermsExcept` calls - Allow system, owner, and user-admin to assign the new role - Migration auto-assigns role to users who have ever created a chat - Update RBAC test matrix: `memberMe` denied, `chatAccessUser` allowed Breaking change: Members without `chat-access` lose chat creation ability. Migration covers existing chat creators. Members who have never created a chat are not automatically assigned this role. > 🤖 This PR was created by a Coder Agent and reviewed by me.	2026-03-31 10:07:21 +01:00
Ethan	13dfc9a9bb	test: harden chatd relay test setup (#23759) These chatd relay tests were seeding chats through `subscriber.CreateChat(...)`, which wakes the subscriber and can race local acquisition against the intended remote-worker setup. Seed waiting and remote-running chats directly in the database instead, and point the default OpenAI provider at a local safety-net server so accidental processing fails locally instead of reaching the live API. Closes https://github.com/coder/internal/issues/1430	2026-03-30 17:52:01 +11:00
Jake Howell	71a492a374	feat: implement `<ClientFilter />` to AI Bridge request logs (#22694) Closes #22136 This pull request adds a `<ClientFilter />` to the AI Bridge `Request Logs` page, allowing users to select a client to filter on. The backend can already filter on multiple clients at once, but the frontend doesn't yet have a good way to support this (future improvement). --------- Co-authored-by: Jeremy Ruppel <jeremy.ruppel@gmail.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 17:18:28 -04:00
Jaayden Halko	86c3983fc0	feat: add AI Governance seat capacity banners (#23411) ## Summary Add site-wide banners for AI Governance seat usage thresholds: 1. 90% capacity warning (admin-only): When actual AI Governance seats are ≥90% and <100% of the license limit, admins see: > "You have used 90% of your AI governance add-on seats." 2. Over-limit banner (admin-only): When actual seats exceed the license limit, admins see a prominent warning: > "Your organization is using {actual} / {limit} AI Governance user seats ({X}% over the limit). Contact sales@coder.com" - Percentages are floored to whole numbers (Go integer division / `Math.floor`) - Includes a clickable `mailto:sales@coder.com` link	2026-03-27 05:51:51 +00:00
Danny Kopping	801e57d430	feat: session detail API (#23203)	2026-03-26 18:09:53 +02:00
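A sketch of the two thresholds with floored integer percentages. `seatBanner` is hypothetical, and it assumes `{X}` is the floored percentage by which `actual` exceeds `limit`; the conditions follow the ranges stated above (warning at ≥90% and <100%, over-limit banner when actual exceeds limit).

```go
package main

import "fmt"

// seatBanner returns the admin banner for AI Governance seat usage, or ""
// when none applies. Percentages are floored by integer division.
func seatBanner(actual, limit int64) string {
	if limit <= 0 {
		return ""
	}
	if actual > limit {
		over := (actual - limit) * 100 / limit
		return fmt.Sprintf("Your organization is using %d / %d AI Governance user seats (%d%% over the limit). Contact sales@coder.com", actual, limit, over)
	}
	if actual < limit && actual*100/limit >= 90 {
		return "You have used 90% of your AI governance add-on seats."
	}
	return ""
}

func main() {
	// 1 seat over a limit of 30 is 3.33%, floored to 3%.
	fmt.Println(seatBanner(31, 30))
}
```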
Ethan	4d74603045	fix(coderd/x/chatd): respect provider Retry-After headers in chat retry loop (#23351) > PR Stack > 1. #23351 ← `#23282` (you are here) > 2. #23282 ← `#23275` > 3. #23275 ← `#23349` > 4. #23349 ← `main` --- ## Summary `chatretry.Retry()` used pure exponential backoff (1 s, 2 s, 4 s, …) and never consulted provider `Retry-After` headers. Fantasy's `ProviderError` carries `ResponseHeaders` including `Retry-After`, but `chaterror.Classify()` only parsed error text and silently dropped the structured transport metadata. This PR makes `Retry-After` a first-class signal in the classification → retry pipeline. ## Changes ### `coderd/chatd/chaterror/classify.go` - Added a `RetryAfter time.Duration` field to `ClassifiedError`: a normalized minimum retry delay derived from provider response metadata. - `Classify()` now calls `extractProviderErrorDetails()` before falling back to text heuristics. A structured `ProviderError.StatusCode` takes priority over regex extraction. - `normalizeClassification()` preserves and clamps `RetryAfter`. ### `coderd/chatd/chaterror/provider_error.go` (new) Provider-specific extraction, isolated from the text-based classification logic: - `extractProviderErrorDetails()` unwraps `fantasy.ProviderError` from the error chain via `errors.As`. - `retryAfterFromHeaders()` parses headers in priority order: 1. `retry-after-ms` (OpenAI-specific, millisecond precision) 2. `retry-after` (standard HTTP: integer seconds or HTTP-date) - Case-insensitive header key lookup. ### `coderd/chatd/chatretry/chatretry.go` - `effectiveDelay(attempt, classified)` computes `max(Delay(attempt), classified.RetryAfter)`; the provider hint acts as a floor without weakening the local exponential backoff.
- `Retry()` now uses `effectiveDelay` and passes the effective delay to both `onRetry(...)` and the sleep timer, so downstream payloads, logs, and the frontend countdown stay aligned automatically. ### Tests - `classify_test.go`: Structured provider status + `Retry-After` extraction, `retry-after-ms` priority, HTTP-date parsing, invalid header fallback, `WithProvider` preservation. - `chatretry_test.go`: Retry-after-as-floor semantics: a longer hint wins, a shorter hint keeps the base delay. ## Design notes - No SDK/API/frontend changes needed. `codersdk.ChatStreamRetry` already carries `DelayMs` and `RetryingAt`, and the frontend already consumes them. The fix is purely in the server-side delay computation. - Existing retryability rules are unchanged. This fixes when we sleep, not whether an error is retryable. - Provider hint is a floor: `max(baseDelay, RetryAfter)` ensures we never retry earlier than the provider asks, and never weaken our own backoff curve.	2026-03-27 01:20:46 +11:00
Danny Kopping	8eade29e68	chore: update AI Bridge warning to require AI Governance Add-On (#23662) Disclaimer: implemented by a Coder Agent using Claude Opus 4.6, reviewed by me. Replace the transitional soft warning message: > AI Bridge is now Generally Available in v2.30. In a future Coder version, your deployment will require the AI Governance Add-On to continue using this feature. Please reach out to your account team or sales@coder.com to learn more. with the definitive requirement message: > The AI Governance Add-On is required to use AI Bridge. Please reach out to your account team or sales@coder.com to learn more. Updated in: - `enterprise/coderd/license/license.go` - `enterprise/coderd/license/license_test.go` (2 occurrences)	2026-03-26 11:10:53 +02:00
Jake Howell	0cea4de69e	fix: `AI governance` into `AI Governance` (#23553)	2026-03-25 20:06:48 +11:00
Ethan	70f031d793	feat(coderd/chatd): structured chat error classification and retry hardening (#23275) > PR Stack > 1. #23351 ← `#23282` > 2. #23282 ← `#23275` > 3. #23275 ← `#23349` (you are here) > 4. #23349 ← `main` --- ## Summary Extracts a structured error classification subsystem for agent chat (`chatd`) so that retry and error payloads carry machine-readable metadata (error kind, provider name, HTTP status code, and retryability) instead of raw error strings. This is the backend half of the error-handling work. The frontend counterpart is in #23282. ## Changes ### New package: `coderd/chatd/chaterror/` Canonical error classification: extracts error kind, provider, status code, and user-facing message from raw provider errors. One source of truth that drives both retry policy and stream payloads. - `kind.go`: Error kind enum (`rate_limit`, `timeout`, `auth`, `config`, `overloaded`, `unknown`). - `signals.go`: Signal extraction; parses provider name, HTTP status code, and retryability from error strings and wrapped types. - `classify.go`: Classification logic; maps extracted signals to an error kind. - `message.go`: User-facing message templates keyed by kind + signals. - `payload.go`: Projectors that build `ChatStreamError` and `ChatStreamRetry` payloads from a classified error. ### Modified - `codersdk/chats.go`: Added `Kind`, `Provider`, `Retryable`, `StatusCode` fields to `ChatStreamError` and `ChatStreamRetry`. - `coderd/chatd/chatretry/`: Thinned to retry-policy only; classification logic moved to `chaterror`. - `coderd/chatd/chatloop/`: Added a per-attempt first-chunk timeout (60 s) via a `guardedStream` wrapper, which produces retryable `startup_timeout` errors instead of hanging forever. - `coderd/chatd/chatd.go`: Publishes normalized retry/error payloads via `chaterror` projectors.	2026-03-25 13:47:54 +11:00
Mathias Fredriksson	38f723288f	fix: correct malformed struct tags in organizationroles and scim_test (#23497) Fixes a leading space in a table tag and escaped-quote tag syntax. Extracted from #23201.	2026-03-25 13:11:08 +11:00
Asher	81188b9ac9	feat: add filtering by service account (#23468) You can now include or exclude service accounts using `service_account:true/false` or the filter dropdown.	2026-03-24 10:13:25 -08:00
Danny Kopping	dba9f68b11	chore!: remove members' ability to read their own interceptions; rationalize RBAC requirements (#23320) _Disclaimer: produced by Claude Opus 4.6, reviewed by me._ This is a breaking change. Users who do not have the `owner` or site-wide `auditor` role will no longer be able to view interceptions. Regular users should not need to view this information; in fact, a malicious insider could use it to learn what we do and don't track, and then exfiltrate data or perform actions unobserved. --- Changed authorization for AI Bridge interception-related operations from system-level permissions to resource-specific permissions. The following functions now authorize against `rbac.ResourceAibridgeInterception` instead of `rbac.ResourceSystem`: - `ListAIBridgeTokenUsagesByInterceptionIDs` - `ListAIBridgeToolUsagesByInterceptionIDs` - `ListAIBridgeUserPromptsByInterceptionIDs` Updated RBAC roles to grant AI Bridge interception permissions: - User/Member roles: can create and update AI Bridge interceptions but cannot read them back - Service accounts: same create/update permissions without read access - Owners/Auditors: retain full read access to all interceptions Removed the system-level authorization bypass in the `populatedAndConvertAIBridgeInterceptions` function, allowing proper resource-level authorization checks. Updated tests to reflect the new permission model, where members cannot view AI Bridge interceptions, even their own, while owners and auditors keep full visibility.	2026-03-24 12:03:20 +02:00
Danny Kopping	43a1af3cd6	feat: session list API (#23202) _Disclaimer: initially produced by Claude Opus 4.6, heavily modified and reviewed by me._ Closes https://github.com/coder/internal/issues/1360 Adds a new `/api/v2/aibridge/sessions` API which returns "sessions". Sessions, as defined in the [RFC](https://www.notion.so/coderhq/AI-Bridge-Sessions-Threads-2ccd579be59280f28021d3baf7472fbe?source=copy_link), are sets of interceptions logically grouped by a session key issued by the client. The API design for this endpoint was done in [this doc](https://github.com/coder/internal/issues/1360). If the client has not provided a session ID, we fall back to the thread root ID, and if that's not present we use the interception's own ID (i.e. a session of a single interception, which is effectively what we currently show in our `/api/v2/aibridge/interceptions` API). The SQL query looks gnarly but it's relatively simple, and seems to perform well (~200ms) even when I import dogfood's `aibridge_*` tables into my workspace. If we need to improve performance later we can investigate materialized views, perhaps, but for now I don't think it's warranted. --- _The PR looks large but it's got a lot of generated code; the actual changes aren't huge._	2026-03-24 08:58:47 +02:00
Cian Johnston	80a172f932	chore: move chatd and related packages to /x/ subpackage (#23445) - Moves `coderd/chatd/`, `coderd/gitsync/`, `enterprise/coderd/chatd/` under `x/` parent directories to signal instability - Adds `Experimental:` glue code comments in `coderd/coderd.go` > 🤖 This PR was created with the help of Coder Agents, and was reviewed by my human. 🧑‍💻	2026-03-23 17:34:43 +00:00
Cian Johnston	ef14654078	chore: move chat methods to ExperimentalClient (#23441) - Changes all 41 chat method receivers in `codersdk/chats.go` from `Client` to `ExperimentalClient` to ensure that callers are aware that these reference potentially unstable `/api/experimental` endpoints. > 🤖 This PR was created with the help of Coder Agents, and has been reviewed by my human. 🧑‍💻	2026-03-23 14:32:11 +00:00
Asher	24ab216dd1	feat: add new group members endpoint with filtering and pagination (#23067) Partially addresses #21813 (still need to make changes to the "add user" button to be complete) Since there are a lot of user tests already, I moved them into `coderdtest` to be shared.	2026-03-20 12:43:03 -08:00
Jaayden Halko	6f244cddde	feat: display the addon license UI (#22948) --------- Co-authored-by: Steven Masley <stevenmasley@gmail.com>	2026-03-20 16:34:17 +00:00
Ethan	a1e912a763	fix(chatd): deliver retry control events via pubsub (#23349) > PR Stack > 1. #23351 ← `#23282` > 2. #23282 ← `#23275` > 3. #23275 ← `#23349` > 4. #23349 ← `main` (you are here) --- Retry events were published only to the local in-process stream via `publishEvent()`. When pubsub is active, `Subscribe()`'s merge loop only forwarded durable events (messages, status, errors) from pubsub notifications, so retry events were silently dropped for cross-replica subscribers. This adds a `publishRetry()` helper that publishes both locally and via pubsub, and extends the `Subscribe()` notification handler to forward retry events. Changes: - `coderd/pubsub/chatstreamnotify.go`: add `Retry` field to notify message - `coderd/chatd/chatd.go`: add `publishRetry()`, update `OnRetry` callback, extend `Subscribe()` to forward `notify.Retry` - `coderd/chatd/chatd_internal_test.go`: focused pubsub delivery test - `enterprise/coderd/chatd/chatd_test.go`: cross-replica end-to-end test	2026-03-20 15:19:41 +00:00
Kyle Carberry	d8ff67fb68	feat: add MCP server configuration backend for chats (#23227) ## Summary Adds the database schema, API endpoints, SDK types, and encryption wrappers for admin-managed MCP (Model Context Protocol) server configurations that chatd can consume. This is the backend foundation for allowing external MCP tools (Sentry, Linear, GitHub, etc.) to be used during AI chat sessions. ## Database Two new tables: - `mcp_server_configs`: Admin-managed server definitions with URL, transport (Streamable HTTP / SSE), auth config (none / OAuth2 / API key / custom headers), tool allow/deny lists, and an availability policy (`force_on` / `default_on` / `default_off`). Includes CHECK constraints on transport, auth_type, and availability values. - `mcp_server_user_tokens`: Per-user OAuth2 tokens for servers requiring individual authentication. Cascades on user/config deletion. New column on `chats` table: - `mcp_server_ids UUID[]`: Per-chat MCP server selection, following the same pattern as `model_config_id`: passed at chat creation, changeable per-message with nil-means-no-change semantics. ## API Endpoints All routes are under `/api/experimental/mcp/servers/` and gated behind the `agents` experiment.
Admin endpoints (`ResourceDeploymentConfig` auth): - `POST /` — Create MCP server config - `PATCH /{id}` — Update MCP server config (full-replace) - `DELETE /{id}` — Delete MCP server config Authenticated endpoints (all users, enabled servers only for non-admins): - `GET /` — List configs (admins see all, members see enabled-only with admin fields redacted) - `GET /{id}` — Get config by ID (with `auth_connected` populated per-user) OAuth2 per-user auth flow: - `GET /{id}/oauth2/connect` — Initiate OAuth2 flow (state cookie CSRF protection) - `GET /{id}/oauth2/callback` — Handle OAuth2 callback, store tokens - `DELETE /{id}/oauth2/disconnect` — Remove stored OAuth2 tokens ## Security - Secrets never returned: `OAuth2ClientSecret`, `APIKeyValue`, and `CustomHeaders` are never in API responses — only boolean indicators (`has_oauth2_secret`, `has_api_key`, `has_custom_headers`). - Field redaction for non-admins: `convertMCPServerConfigRedacted` strips `OAuth2ClientID`, auth URLs, scopes, and `APIKeyHeader` from non-admin responses. - dbcrypt encryption at rest: All 5 secret fields use `dbcrypt_keys` encryption with full encrypt-on-write / decrypt-on-read wrappers (11 dbcrypt method overrides + 2 helpers), following the same pattern as `chat_providers.api_key`. - OAuth2 CSRF protection: State parameter stored in `HttpOnly` cookie with `HTTPCookies.Apply()` for correct `Secure`/`SameSite` behind TLS-terminating proxies. - dbauthz authorization: All 18 querier methods have authorization wrappers. Read operations use `ActionRead`, write operations use `ActionUpdate` on `ResourceDeploymentConfig`. 
## Governance Model \| Control \| Implementation \| \|---------\|---------------\| \| Global kill switch \| `enabled` defaults to `false` \| \| Availability policy \| `force_on` (always injected), `default_on` (pre-selected), `default_off` (opt-in) \| \| Per-chat selection \| `mcp_server_ids` on `CreateChatRequest` / `CreateChatMessageRequest` \| \| Auth gate \| OAuth2 servers require per-user auth before tools are injected \| \| Tool-level allow/deny \| Arrays on `mcp_server_configs` for granular tool filtering \| \| Secrets encrypted at rest \| Uses `dbcrypt_keys` (same pattern as `chat_providers.api_key`) \| ## Tests 8 test functions covering: - Full CRUD lifecycle (create, list, update, delete) - Non-admin visibility filtering (enabled-only, field redaction) - `auth_connected` population for OAuth2 vs non-OAuth2 servers - Availability policy validation (valid values + invalid rejection) - Unique slug enforcement (409 Conflict) - OAuth2 disconnect idempotency - Chat creation with `mcp_server_ids` persistence ## Known Limitations (Deferred) These are documented and intentional for an experimental feature: - Audit logging not yet wired — will add when feature stabilizes - Cross-field validation (e.g., OAuth2 fields required when `auth_type=oauth2`) — admin-only endpoint, will add when stabilizing - `force_on` auto-injection — query exists but not yet wired into chatd tool injection (follow-up) - Additional test coverage — 403 auth tests, GET-by-ID tests, callback CSRF tests planned for follow-up ## What's NOT in this PR - Frontend UI (admin panel + chat picker) - Actual MCP client connections (`chatd/chatmcp/` manager) - Tool injection into `chatloop/`	2026-03-19 14:07:36 +00:00
Steven Masley	84de391f26	chore: add tallyman events for ai seat tracking (#22689 ) AI seat tracking is inserted as a heartbeat into the usage table.	2026-03-18 09:30:22 -05:00
George K	91ec0f1484	feat: add service_accounts workspace sharing mode (#23093 ) Introduce a three-way workspace sharing setting (none, everyone, service_accounts) replacing the boolean workspace_sharing_disabled. In service_accounts mode, only service account-owned workspaces can be shared while regular members' share permissions are removed. Adds a new organization-service-account system role with per-org permissions reconciled alongside the existing organization-member system role. Related to: https://linear.app/codercom/issue/PLAT-28/feat-service-accounts-sharing-mode-and-rbac-role --------- Co-authored-by: Steven Masley <Emyrk@users.noreply.github.com> Co-authored-by: Kayla はな <mckayla@hey.com>	2026-03-17 12:16:43 -07:00
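The three sharing modes reduce to a simple rule about who may share a workspace. The Go sketch below condenses that rule into a single predicate. This is an assumption-laden simplification: the commit enforces the policy through RBAC system roles (`organization-member` and the new `organization-service-account`), not through a function like the hypothetical `CanShareWorkspace` shown here.

```go
package main

import "fmt"

// WorkspaceSharingMode models the three-way setting that replaces the
// old workspace_sharing_disabled boolean.
type WorkspaceSharingMode string

const (
	SharingNone            WorkspaceSharingMode = "none"
	SharingEveryone        WorkspaceSharingMode = "everyone"
	SharingServiceAccounts WorkspaceSharingMode = "service_accounts"
)

// CanShareWorkspace reports whether a workspace may be shared under mode,
// given whether its owner is a service account.
func CanShareWorkspace(mode WorkspaceSharingMode, ownerIsServiceAccount bool) bool {
	switch mode {
	case SharingEveryone:
		return true
	case SharingServiceAccounts:
		return ownerIsServiceAccount
	default:
		// "none", and any unrecognized value, denies sharing.
		return false
	}
}

func main() {
	modes := []WorkspaceSharingMode{SharingNone, SharingEveryone, SharingServiceAccounts}
	for _, mode := range modes {
		fmt.Printf("%-16s member=%v service_account=%v\n",
			mode, CanShareWorkspace(mode, false), CanShareWorkspace(mode, true))
	}
}
```

Treating unknown values as "deny" keeps the sketch fail-closed if a new mode is added without updating the check.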
Steven Masley	93b9d70a9b	chore: add audit log entry when ai seat is consumed (#22683 ) When an AI seat is consumed, an audit log entry is created. This only happens the first time a seat is used.	2026-03-16 15:30:25 -05:00
Steven Masley	abf59ee7a6	feat: track ai seat usage (#22682 ) When a user uses an AI feature, we record them in `ai_seat_state` as consuming a seat. Added debouncing to prevent excessive writes to the database for this feature, since there is no need for frequent updates.	2026-03-16 12:36:26 -05:00
Mathias Fredriksson	bdbcd3428b	feat(coderd/chatd): unify chat storage on SDK parts and fix file-reference rendering (#22958 ) File-reference parts in user messages were flattened to `TextContent` at write time because fantasy has no file-reference content type. The frontend never saw them as structured parts. This moves all write paths (user, assistant, tool) from fantasy envelope format to `codersdk.ChatMessagePart`. The streaming layer (`chatloop`) is untouched; the conversion happens at the serialization boundary in `persistStep`. Old rows are still readable. `ParseContent` uses a structural heuristic (`isFantasyEnvelopeFormat`) to distinguish legacy envelopes from SDK parts. We chose this over try/fallback because fantasy envelopes partially unmarshal into `ChatMessagePart` (the `type` field matches) while silently losing content. A guard test enforces that no SDK part can produce the envelope shape. This is forward-only: new rows are unreadable by old code. Chat is behind a feature flag, so rollback risk is contained. Also adds a typed `ChatMessageRole` to replace raw strings and `fantasy.MessageRole` casts at the persistence boundary. The type covers `ChatMessage.Role`, `ChatStreamMessagePart.Role`, the `PublishMessagePart` callback chain, and all DB write sites. `fantasy.MessageRole` remains only where we build `fantasy.Message` structs for LLM dispatch. Separately, `ProviderMetadata` was leaking to SSE clients via `publishMessagePart`. `StripInternal` now runs on both the SSE and REST paths, which fixes the leak. Other cleanup: - Old `db2sdk.contentBlockToPart` silently dropped metadata on text/reasoning/tool-call content. New code preserves it. - `providerMetadataToOptions` now logs warnings instead of silently returning nil. - `db2sdk` shrinks from ~250 lines of parallel conversion to ~15 lines delegating to `chatprompt.ParseContent()`, removing the `fantasy` import entirely. Refs #22821	2026-03-13 17:53:26 +02:00
Mathias Fredriksson	57af7abf1f	test: add testutil.WaitBuffer and replace time.Sleep in tests (#22922 ) WaitBuffer is a thread-safe io.Writer that supports blocking until accumulated output matches a substring or custom predicate. It replaces ad-hoc safeBuffer/syncWriter types and time.Sleep-based poll loops in tests with signal-driven waits. - WaitFor/WaitForNth/WaitForCond for blocking on output - Replace custom buffer types in cli/sync_test.go and provisionersdk/agent_test.go - Convert time.Sleep poll loops to require.Eventually/require.Never in cli/ssh_test.go, coderd/activitybump_test.go, coderd/workspaceagentsrpc_test.go, workspaceproxy_test.go, and scaletest tests	2026-03-12 18:07:52 +02:00
George K	e5c19d0af4	feat: backend support for creating and storing service accounts (#22698 ) Add is_service_account column to users table with CHECK constraints enforcing login_type='none' and empty email for service accounts. Update user creation API to validate service account constraints. Related to: https://linear.app/codercom/issue/PLAT-27/feat-backend-support-for-creating-and-storing-service-accounts	2026-03-11 10:19:08 -07:00
Kyle Carberry	eecb7d0b66	fix: resolve bugs in chatd streaming system (#22720 ) Split from #22693 per review feedback. Fixes multiple bugs in coderd/chatd and sub-packages including race conditions, transaction safety, stream buffer bounds, retry limits, and enterprise relay improvements. See commit message for full list.	2026-03-06 21:02:25 +00:00
Danny Kopping	13e3df67d6	feat: track client sessions (#22470 ) This change adds support for tracking client session IDs in AI Bridge interceptions to enable better session-based auditing. Depends on https://github.com/coder/aibridge/pull/198 Fixes https://github.com/coder/internal/issues/1337 The session ID field is optional and not universally supported by all clients.	2026-03-06 14:43:53 +02:00

797 Commits