Introduce a three-way workspace sharing setting (none, everyone,
service_accounts) replacing the boolean workspace_sharing_disabled.
In service_accounts mode, only service account-owned workspaces can be
shared while regular members' share permissions are removed. Adds a
new organization-service-account system role with per-org permissions
reconciled alongside the existing organization-member system role.
Related to:
https://linear.app/codercom/issue/PLAT-28/feat-service-accounts-sharing-mode-and-rbac-role
---------
Co-authored-by: Steven Masley <Emyrk@users.noreply.github.com>
Co-authored-by: Kayla はな <mckayla@hey.com>
This PR adds a `WatchAllWorkspaces` function with `watch-all-workspaces`
endpoint, which can be used to listen on a single global pubsub channel
for _all_ workspace build updates, and makes use of it in the autostart
scaletest.
This negates the need to use a workspace watch pubsub channel _per_
workspace, which has auth overhead associated with each call. This is
especially relevant in situations such as the autostart scaletest, where
we need to start/stop a set of workspaces before we can configure their
autostart config. The overhead associated with all the watch requests
skews the scaletest results and makes it harder to reason about the
performance of the autostart feature itself.
The autostart scaletest also no longer generates its own metrics nor
does it wait for all the workspaces to actually start via autostart. We
should update the scaletest dashboard after both PRs are merged to
measure autostart performance via the new metrics.
The new function/endpoint and its usage in the autostart scaletest are
gated behind an experiment feature flag, this is something we should
discuss whether we want to enable the endpoint in prod by default or
not. If so, we can remove the experiment.
---------
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Callum Styan <callum@coder.com>
Implement the backend for the desktop feature for agents.
- Adds a new `/api/experimental/chats/$id/desktop` endpoint to coderd
which exposes a VNC stream from a
[portabledesktop](https://github.com/coder/portabledesktop) process
running inside the workspace
- Adds a new `spawn_computer_use_agent` tool to chatd, which spawns a
subagent that has access to the `computer` tool which lets it interact
with the `portabledesktop` process running inside the workspace
- Adds the plumbing to make the above possible
There's a follow up frontend PR here:
https://github.com/coder/coder/pull/23006
Add cost tracking for LLM chat interactions with microdollar precision.
## Changes
- Add `chatcost` package for per-message cost calculation using
`shopspring/decimal` for intermediate arithmetic
- **Ceil rounding policy**: fractional micros round UP to next whole
micro (applied once after summing all components)
- Database migration: `total_cost_micros` BIGINT column with historical
backfill and `created_at` index
- API endpoints: per-user cost summary and admin rollup under
`/api/experimental/chats/cost/`
- SDK types: `ChatCostSummary`, `ChatCostModelBreakdown`,
`ChatCostUserRollup`
- Fix `modeloptionsgen` to handle `decimal.Decimal` as opaque numeric
type
- Update frontend pricing test fixtures for string decimal types
## Design decisions
- `NULL` = unpriced (no matching model config), `0` = free
- Reasoning tokens included in output tokens (no double-counting)
- Integer microdollars (BIGINT) for storage and API responses
- Price config uses `decimal.Decimal` for exact parsing; totals use
`int64`
Frontend: #23037
_Disclaimer: implemented by a Coder Agent using Claude Opus 4.6._
Marks the injected MCP approach in AI Bridge as deprecated across the
codebase.
## Changes
- **`codersdk/deployment.go`**: Deprecated `ExternalAuthConfig.MCPURL`,
`.MCPToolAllowRegex`, `.MCPToolDenyRegex` fields; deprecated and hid the
`--aibridge-inject-coder-mcp-tools` server flag; deprecated
`AIBridgeConfig.InjectCoderMCPTools`.
- **`coderd/externalauth/externalauth.go`**: Deprecated `Config.MCPURL`,
`.MCPToolAllowRegex`, `.MCPToolDenyRegex`.
- **`enterprise/aibridgedserver/aibridgedserver.go`**: Added runtime
deprecation warning when `CODER_AIBRIDGE_INJECT_CODER_MCP_TOOLS` is
enabled; deprecated `getCoderMCPServerConfig`.
- **`enterprise/aibridged/mcp.go`**: Deprecated `MCPProxyBuilder`
interface and `MCPProxyFactory` struct.
- **`docs/ai-coder/ai-bridge/mcp.md`**: Added deprecation warning
banner.
- Adds `_API_BASE_URL` to `CODER_EXTERNAL_AUTH_CONFIG_`
- Extracts and refactors existing GitHub PR sync logic to new packages
`coderd/gitsync` and `coderd/externalauth/gitprovider`
- Associated wiring and tests
Created using Opus 4.6
The chat API is experimental (behind `ExperimentAgents`) and not ready
for public documentation yet. This removes swagger annotations from the
chat handlers so they no longer appear in the generated API reference at
https://coder.com/docs/reference/api/chats.
## Changes
- Remove `@swagger` annotations from 5 chat handlers in
`coderd/chats.go`
- Regenerate `coderd/apidoc/swagger.json` and `docs.go`
- Delete `docs/reference/api/chats.md`
- Remove Chats entry from `docs/manifest.json`
This change adds support for image attachments to chat via add button
and clipboard paste. Files are stored in a new `chat_files` table and
referenced by ID in message content. File data is resolved from storage
at LLM dispatch time, keeping the message content column small.
Upload validates MIME types via content type or content sniffing against
an allowlist (png, jpeg, gif, webp). The retrieval endpoint serves files
with immutable caching headers. On the frontend, uploads start eagerly
on attach with a background fetch to pre-warm the browser HTTP cache so
the timeline renders instantly after send.
Adds real-time git status watching for workspace agents, so the frontend
can subscribe over WebSocket and show
git file changes in near real-time.
1. Subscription is scoped to a **chat** via `GET
/api/experimental/chats/{chat}/git/watch`.
2. The workspace agent automatically determines which paths to watch
based on tool calls made by the chat (and its ancestor chats).
3. Workspace agent polls subscribed repo working trees on a 30s
interval, on tools calls, and on explicit `refresh` from the client.
4. Scans are rate-limited to at most once per second.
5. Edited paths are tracked **in-memory** inside the workspace agent.
There is no database persistence — state is lost on agent restart. This
will be addresses in a future PR.
6. Messages sent over WebSocket include a full-repo snapshot (unified
diff, branch, origin). A new message is emitted only when the snapshot
changes.
This PR was implemented with AI with me closely controlling what it's
doing. The code follows a plan file that was updated continuously during
implementation. Here's the file if you'd like to see it:
[project.md](https://gist.github.com/hugodutka/8722cf80c92f8a56555f7bc595b770e2).
It reflects the current state of the PR.
## Description
Adds optional TLS support for the AI Bridge Proxy listener. When TLS cert and key files are provided, the proxy serves over HTTPS instead of plain HTTP.
## Changes
* New configuration options to enable TLS on the proxy listener
* Wraps the TCP listener in `tls.NewListener` when configured
* Tests for validation errors, invalid files, and full integration (tunneled + MITM) through a TLS listener
Note: Documentation for TLS listener setup and client configuration will be handled in a follow-up PR.
Related to: https://github.com/coder/internal/issues/1335
closes https://github.com/coder/internal/issues/464
# Summary
This PR resolves a flaky test that was sensitive to DST transitions in
various time zones. The root of the flake was:
* a bug; the query and its tests assume 24 hours per day
* the tests used local system time, which resulted in failures for dates
proximal to DST transitions
# Changes
Query:
The original query assumed 24 hour intervals between each day, which is
not a valid assumption. It now increments `1 day` at a time.
Database tests:
Database level tests for the query all assumed 24 hour days. They now
increment in DST-aware days instead. Instead of using time.Now() as a
base for testing, the test uses a series of dates over the course of an
entire year, to ensure that DST transition dates are present in every
test run.
# API Endpoint
The endpoint that delivers the user status chart now accepts an IANA
timezone name as a parameter and passes it, keeping the existing offset
as a fallback, to the database query.
API level tests were added to ensure the correct response form and error
behaviour. Correctness of content is tested at the database level.
The pause/resume endpoints were only registered under /api/experimental
but the frontend and Go SDK were calling /api/v2, resulting in 404s.
Register the routes in the v2 group, update the SDK client paths, and
fix swagger annotations (Accept → Produce) since these POST endpoints
have no request body.
This pull-request follows up #22060
Felt wrong to only make use of Geist when there is a Monospace variant
here too. Felt best we default to this as the default font as its inline
with the rest of the application. This also updates the lower line for
Workspace Statistics 🙂
## Summary
The UI has always labeled the action as "Archive agent" but the backend
was performing a hard `DELETE`, permanently destroying chats and all
their messages.
This change replaces the hard delete with a soft archive, consistent
with the pattern used by template versions.
## Changes
### Database
- **Migration 000423**: Add `archived boolean DEFAULT false NOT NULL`
column to `chats` table
- Replace `DeleteChatByID` query with `ArchiveChatByID` (`UPDATE SET
archived = true`)
- Add `UnarchiveChatByID` query (`UPDATE SET archived = false`)
- Filter archived chats from `GetChatsByOwnerID` (`WHERE archived =
false`)
### API
- Remove `DELETE /api/experimental/chats/{chat}`
- Add `POST /api/experimental/chats/{chat}/archive` — archives a chat
and all its descendants
- Add `POST /api/experimental/chats/{chat}/unarchive` — unarchives a
single chat (API only, no UI yet)
### Backend
- `archiveChatTree()` recursively archives child chats (replaces
`deleteChatTree()` which hard-deleted)
- Chat daemon's `ArchiveChat()` archives the full chat tree in a
transaction
- Authorization uses `ActionUpdate` instead of `ActionDelete`
### SDK
- Replace `DeleteChat()` with `ArchiveChat()` and `UnarchiveChat()`
- Add `Archived` field to `Chat` struct
### Frontend
- `archiveChat` API call uses `POST .../archive` instead of `DELETE`
- No UI changes — the "Archive agent" button now actually archives
instead of deleting
## Design Decision
This follows the **template version archive pattern** (Pattern B in the
codebase):
- `archived boolean` column (not `deleted boolean`)
- Dedicated `POST .../archive` and `POST .../unarchive` routes (not
repurposing `DELETE`)
- Reversible — users can unarchive via the API (UI for this will come
later)
This pull-request implements a simple filtering logic so that we're able
to pick which model the user actually used when logs were sent to AI
Bridge.
- Add `GET /aibridge/models` API endpoint that returns distinct model
names from AI Bridge interceptions, with pagination and search support
- New `ListAIBridgeModels` SQL query using case-sensitive prefix
matching (`LIKE model || '%'`) to allow B-tree index usage
- Hand-written `ListAuthorizedAIBridgeModels` in `modelqueries.go` for
RBAC authorization filter injection
- `AIBridgeModels` search query parser in searchquery/search.go
(defaults bare terms to the `model` field)
- dbauthz wrappers, dbmetrics, and dbmock implementations for the new
query
<img width="292" height="185" alt="image"
src="https://github.com/user-attachments/assets/134771df-2d26-4c54-acc4-27f58128b351"
/>
## Summary
Moves expired token filtering from client-side to server-side by adding
an `include_expired` parameter to the `GetAPIKeysByLoginType` and
`GetAPIKeysByUserID` database queries. This is more efficient for large
deployments with many expired/short-lived tokens.
## Changes
- Add `include_expired` parameter to SQL queries using `OR`
short-circuit
- Add `include_expired` query parameter to `GET
/users/{user}/keys/tokens`
- Add `IncludeExpired` field to `codersdk.TokensFilter`
- Remove client-side filtering from CLI `tokens list` command
- Add `TestTokensFilterExpired` test
Fixescoder/internal#1357
`--secure-auth-cookie` now automatically sources it's default value from `--access-url`
If the access url uses HTTPS, secure is set to `true`.
To revert to old behavior, set the value explicitly to `false`
If a deployment has 2 domains, overriding the oidc url allows the oidc
redirect to differ from the access_url
response to https://github.com/coder/coder/discussions/21500
**This config setting is hidden by default**
In relation to
[`internal#1281`](https://github.com/coder/internal/issues/1281)
Remove the `soft_limit` field from the `Feature` type and simplify
license limit handling. This change:
- Removes the `soft_limit` field from the API and SDK
- Uses the soft limit value as the single `limit` value in the UI and
API
- Simplifies warning logic to only show warnings when the limit is
exceeded
- Updates tests to reflect the new behavior
- Updates the UI to use the single limit value for display
## Summary
Custom roles that can create workspaces on behalf of other users need to
be able to list users to populate the owner dropdown in the workspace
creation UI. Previously, this required a separate `user:read`
permission, causing the dropdown to fail for custom roles.
## Changes
- Modified `GetUsers` in `dbauthz` to check if the user can create
workspaces for any owner (`workspace:create` with `owner_id: *`)
- If the user has this permission, they can list all users without
needing explicit `user:read` permission
- Added tests to verify the new behavior
## Testing
- Updated mock tests to assert the new authorization check
- Added integration tests for both positive and negative cases
Fixes#18203
Relates to https://github.com/coder/internal/issues/1252
When a workspace with a TaskID hits its deadline, use
BuildReasonTaskAutoPause instead of BuildReasonAutostop. This allows
downstream systems to distinguish between regular autostop and task
workspace pauses.
Created by Mux using Opus 4.5.
## Summary
> NOTE: Calling this out as a breaking change in case existing consumers
of the CLI depend on being able to see expired tokens OR being able to
delete tokens immediately.
Updates the `coder tokens rm` command to immediately expire a token by
ID, preserving the token record for audit trail purposes. Tokens can
still be deleted by passing `--delete`.
## Problem
During an incident on dev.coder.com, operators needed to urgently expire
an API key that was stuck in a hot loop. The only way to do this was via
direct database access:
```sql
UPDATE api_keys SET expires_at = NOW() WHERE id = '...';
```
This is not ideal for operators who may not have direct DB access or
want to avoid manual SQL.
## Solution
This PR adds:
- **API endpoint**: `PUT /api/v2/users/{user}/keys/{keyid}/expire` -
Sets the token's `expires_at` to now
- **SDK method**: `ExpireAPIKey(ctx, userID, keyID)`
- **Updates CLI**: `coder tokens rm <name|id|token>` now _expires_ by
default. You can still delete by passing the `--delete` flag. The `coder
tokens list` command now also hides expired tokens by default. You can
`--include-expired` if needed to include them.
- **Audit logging**: The expire action is logged with old and new key
states
## Test plan
- Tests cover: owner expiring own token, admin expiring other user's
token, non-admin cannot expire other's token, 404 for non-existent token
Closes#21782🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
**Background**
Reported in #17417, there is a `deleted` query parameter supported by
/api/v2/templates, but we do not respect this field on the client,
showing the "Create Workspace" button for deleted templates.
**Expected Behavior**
Don't show the "Create Workspace" button for deleted templates.
**Notes**
This PR adds a new `deleted` field to the templates API response.
Co-authored-by: Danielle Maywood <danielle@themaywoods.com>
Closes https://github.com/coder/internal/issues/1261.
This pull request adds an endpoint to pause coder tasks by stopping the
underlying workspace.
* Instead of `POST /api/v2/tasks/{user}/{task}/pause`, the endpoint is
currently experimental.
* We do not currently set the build reason to `task_manual_pause`,
because build reasons are currently only used on stop transitions.
Adds support for filtering workspaces by health status using
healthy:true or healthy:false in the search query.
This is done by changing `has-agent` to accept a list of statuses and
aliasing `health:true` to `has-agent:connected` and `healthy:false` to
`has-agent:timeout,disconnected`.
Fixes#21623
* Adds support for parameter `format=text` in the following API routes:
* `/api/v2/workspaceagents/:id/logs`
* `/api/v2/workspacebuilds/:id/logs`
* `/api/v2/templateversions/:id/logs`
* `/api/v2/templateversions/:id/dry-run/:id/logs`
* Adds links to view raw logs on the following pages:
* Workspace build page
* Template editor page
* Template version page
* Refactors existing log formatting in `cli/logs.go` to live in `codersdk`.
🤖 Generated with Claude Opus 4.5, reviewed by me.
---------
Co-authored-by: Claude <noreply@anthropic.com>
Previously the task logs endpoint only worked when the workspace was
running, leaving users unable to view task history after pausing.
This change adds snapshot retrieval with state-based branching: active
tasks fetch live logs from AgentAPI, paused/initializing/pending tasks
return stored snapshots (providing continuity during pause/resume), and
error/unknown states return HTTP 409 Conflict.
The response includes snapshot metadata (snapshot, snapshot_at) to
indicate whether logs are live or historical.
Closescoder/internal#1254
feat: add boundary usage telemetry database schema and RBAC
Adds the foundation for tracking boundary usage telemetry across Coder
replicas. This includes:
- Database schema: `boundary_usage_stats` table with per-replica stats
(unique workspaces, unique users, allowed/denied request counts)
- Database queries: upsert stats, get aggregated summary, reset stats,
delete by replica ID
- RBAC: `boundary_usage` resource type with read/update/delete actions,
accessible only via system `BoundaryUsageTracker` subject (not regular
user roles)
- Tracker skeleton + docs: stub implementation in `coderd/boundaryusage/`
The tracker accumulates stats in memory and periodically flushes to the
database. Stats are aggregated across replicas for telemetry reporting,
then reset when a new reporting period begins. The tracker implementation
and plumbing will be done in a subsequent commit/PR.
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
This change adds a POST /workspaceagents/me/tasks/{task}/log-snapshot
endpoint for agents to upload task conversation history during
workspace shutdown. This allows users to view task logs even when the
workspace is stopped.
The endpoint accepts agentapi format payloads (typically last 10
messages, max 64KB), wraps them in a format envelope, and upserts to the
task_snapshots table. Uses agent token auth and validates the task
belongs to the agent's workspace.
Closescoder/internal#1253
## Summary
Add circuit breaker support for AI Bridge to protect against cascading
failures from upstream AI provider rate limits (HTTP 429, 503, and
Anthropic's 529 overloaded responses).
## Changes
- Add 5 new CLI options for circuit breaker configuration:
- `--aibridge-circuit-breaker-enabled` (default: false)
- `--aibridge-circuit-breaker-failure-threshold` (default: 5)
- `--aibridge-circuit-breaker-interval` (default: 10s)
- `--aibridge-circuit-breaker-timeout` (default: 30s)
- `--aibridge-circuit-breaker-max-requests` (default: 3)
- Update aibridge dependency to include circuit breaker support
- Add tests for pool creation with circuit breaker providers
## Notes
- Circuit breaker is **disabled by default** for backward compatibility
- When enabled, applies to both OpenAI and Anthropic providers
- Uses sony/gobreaker internally via the aibridge library
## Testing
```
make test RUN=TestPoolWithCircuitBreakerProviders
```
## Description
Adds upstream proxy support for AI Bridge Proxy passthrough requests.
This allows aiproxy to forward non-allowlisted requests through an
upstream proxy. Currently, the only supported configuration is when
aiproxy is the first proxy in the chain (client → aiproxy → upstream
proxy).
## Changes
* Add `--aibridge-proxy-upstream` option to configure an upstream
HTTP/HTTPS proxy URL for passthrough requests
* Add `--aibridge-proxy-upstream-ca` option to trust custom CA
certificates for HTTPS upstream proxies
* Passthrough requests (non-allowlisted domains) are forwarded through
the upstream proxy
* MITM'd requests (allowlisted domains) continue to go directly to
aibridge, not through the upstream proxy
* Add tests for upstream proxy configuration and request routing
Closes: https://github.com/coder/internal/issues/1204
Add comprehensive OAuth2 enum types to codersdk following RFC specifications:
- OAuth2ProviderGrantType (RFC 6749)
- OAuth2ProviderResponseType (RFC 6749)
- OAuth2TokenEndpointAuthMethod (RFC 7591)
- OAuth2PKCECodeChallengeMethod (RFC 7636)
- OAuth2TokenType (RFC 6749, RFC 9449)
- OAuth2RevocationTokenTypeHint (RFC 7009)
- OAuth2ErrorCode (RFC 6749, RFC 7009, RFC 8707)
Add OAuth2TokenRequest, OAuth2TokenResponse, OAuth2TokenRevocationRequest,
and OAuth2Error structs to the SDK. Update OAuth2ClientRegistrationRequest,
OAuth2ClientRegistrationResponse, OAuth2ClientConfiguration, and
OAuth2AuthorizationServerMetadata to use typed enums instead of raw strings.
This makes codersdk the single source of truth for OAuth2 types, eliminating
duplication between SDK and server-side structs.
Closes#21476
Adds a per-organization setting to disable workspace sharing. When enabled,
all existing workspace ACLs in the organization are cleared and the workspace
ACL mutation API endpoints return `403 Forbidden`.
This complements the existing site-wide `--disable-workspace-sharing` flag by
providing more granular control at the organization level.
Closes https://github.com/coder/internal/issues/1073 (part 2)
---------
Co-authored-by: Steven Masley <Emyrk@users.noreply.github.com>
## Description
Implements selective MITM (Man-in-the-Middle) in `aibridgeproxyd` so
that only requests to allowlisted domains are intercepted and decrypted.
Requests to all other domains are tunneled directly without decryption.
## Changes
* New config option: `CODER_AIBRIDGE_PROXY_DOMAIN_ALLOWLIST` (default:
`api.anthropic.com`,`api.openai.com`)
* Selective MITM: Uses `goproxy.ReqHostIs()` to only intercept `CONNECT`
requests to allowlisted hosts
* Certificate caching: Now only generates/caches certificates for
allowlisted domains
* Validation: Startup fails if domain allowlist is empty or contains
invalid entries
Closes: https://github.com/coder/internal/issues/1182
Closes https://github.com/coder/coder/issues/21360
A few considerations/notes:
- I've kept the number of conns to 10 in all other places, except coderd
- which uses the config value
- I opted to also make idle conns configurable; the greater the delta
between max open and max idle, the more connection churn
- Postgres maintains a [_process_ per
connection](https://www.postgresql.org/docs/current/connect-estab.html),
contrary to what the comment said previously
- Operators should be able to tune this, since process churn can
negatively affect OS scheduling
- I've set the value to `"auto"` by default so it's not another knob one
_has to_ twiddle, and sets max idle = max conns / 3
---------
Signed-off-by: Danny Kopping <danny@coder.com>