## Summary
In this pull request we're updating search to support queries with
spaces in addition to the `field:value` pattern that is currently
supported.
Additionally templates search now defaults to `display_name` (since
`display_name` is optional the search will fallback to `name`) when
searching without the `field:value` pattern
Closes: https://github.com/coder/coder/issues/14384
### Downsides with searching on `name` and `display_name`
Because the `name` field cannot include spaces, we end up in a situation
where including a space in the query will result in no results since the
query searches on both `name` AND `display_name`. In the following
example, we can see the results of searching by both `name` and
`display_name` on these templates:
| Name | Display Name |
| ------ | ------------- |
| docker | Docker Template |
| faketemplate | A Fake Template |
| azure | Fake Azure Template |
| anotherfake | Another Fake Template |
| azurefake | Another Fake Fake Azure Template |
https://github.com/user-attachments/assets/b0e0793e-e77d-46bc-9a42-d7cf4f8bd910
### Proposal: Search on `display_name` by default and allow for `name`
using the `field:value` pattern
If we remove `name` from the default template search, we're now able to
search with spaces on template `display_names`. Since `display_names`
are what users see in the templates list they might expect the search to
work this way.
Below is an example of `name` being removed from the default template
search.
https://github.com/user-attachments/assets/9aba5911-4960-4384-befb-08ea1acaa3ab
With this approach users would still be able to search on template names
by specifying `exact_name:foo`.
### Testing
Added additional test cases to ensure spaces were handled as expected in
combination with `field:value` patterns.
Closes https://github.com/coder/internal/issues/885
Adds a new database method GetProvisionerJobByIDWithLock that uses FOR
UPDATE without SKIP LOCKED to fix workspace build cancellation returning
500 errors when jobs are locked.
* provisionerdserver: Expires prebuild user token for workspace, if it
exists, when regenerating session token.
* dbauthz: disallow prebuilds user from creating api keys
* dbpurge: added functionality to expire stale api keys owned by the
prebuilds user
This PR should resolve https://github.com/coder/internal/issues/719 by
limiting the `workspace_builds` rows selected by the query to the most
recent 100 builds of a template, as opposed to all builds in the last
30d. For our own internal templates with the most builds (1700-2000 in a
30d period) this should cut the query execution time by about 80%.
Unless we have some restriction on keeping the 30d period, contract
related or otherwise, this seems like a safe change to make. In addition
to the execution speed improvements it also means the memory for the
query is bounded as well.
If we want to keep a 30d time period for the avg build time value I
think it's worth exploring a purpose built solution such as histogram
structures where the build times could be bucketized by template ID as
they're observed.
---------
Signed-off-by: Callum Styan <callumstyan@gmail.com>
- Removes GetManagedAgentCount query
- Adds new table `usage_events_daily` which stores aggregated usage
events by the type and UTC day
- Adds trigger to update the values in this table when a new row is
inserted into `usage_events`
- Adds a migration that adds `usage_events_daily` rows for existing data
in `usage_events`
- Adds tests for the trigger
- Adds tests for the backfill query in the migration
Since the `usage_events` table is unreleased currently, this migration
will do nothing on real deployments and will only affect preview
deployments such as dogfood.
Closes https://github.com/coder/internal/issues/943
## Description
This PR introduces one counter and two histograms related to workspace
creation and claiming. The goal is to provide clearer observability into
how workspaces are created (regular vs prebuild) and the time cost of
those operations.
### `coderd_workspace_creation_total`
* Metric type: Counter
* Name: `coderd_workspace_creation_total`
* Labels: `organization_name`, `template_name`, `preset_name`
This counter tracks whether a regular workspace (not created from a
prebuild pool) was created using a preset or not.
Currently, we already expose `coderd_prebuilt_workspaces_claimed_total`
for claimed prebuilt workspaces, but we lack a comparable metric for
regular workspace creations. This metric fills that gap, making it
possible to compare regular creations against claims.
Implementation notes:
* Exposed as a `coderd_` metric, consistent with other workspace-related
metrics (e.g. `coderd_api_workspace_latest_build`:
https://github.com/coder/coder/blob/main/coderd/prometheusmetrics/prometheusmetrics.go#L149).
* Every `defaultRefreshRate` (1 minute ), DB query
`GetRegularWorkspaceCreateMetrics` is executed to fetch all regular
workspaces (not created from a prebuild pool).
* The counter is updated with the total from all time (not just since
metric introduction). This differs from the histograms below, which only
accumulate from their introduction forward.
### `coderd_workspace_creation_duration_seconds` &
`coderd_prebuilt_workspace_claim_duration_seconds`
* Metric types: Histogram
* Names:
* `coderd_workspace_creation_duration_seconds`
* Labels: `organization_name`, `template_name`, `preset_name`, `type`
(`regular`, `prebuild`)
* `coderd_prebuilt_workspace_claim_duration_seconds`
* Labels: `organization_name`, `template_name`, `preset_name`
We already have `coderd_provisionerd_workspace_build_timings_seconds`,
which tracks build run times for all workspace builds handled by the
provisioner daemon.
However, in the context of this issue, we are only interested in
creation and claim build times, not all transitions; additionally, this
metric does not include `preset_name`, and adding it there would
significantly increase cardinality. Therefore, separate more focused
metrics are introduced here:
* `coderd_workspace_creation_duration_seconds`: Build time to create a
workspace (either a regular workspace or the build into a prebuild pool,
for prebuild initial provisioning build).
* `coderd_prebuilt_workspace_claim_duration_seconds`: Time to claim a
prebuilt workspace from the pool.
The reason for two separate histograms is that:
* Creation (regular or prebuild): provisioning builds with similar time
magnitude, generally expected to take longer than a claim operation.
* Claim: expected to be a much faster provisioning build.
#### Native histogram usage
Provisioning times vary widely between projects. Using static buckets
risks unbalanced or poorly informative histograms.
To address this, these metrics use [Prometheus native
histograms](https://prometheus.io/docs/specs/native_histograms/):
* First introduced in Prometheus v2.40.0
* Recommended stable usage from v2.45+
* Requires Go client `prometheus/client_golang` v1.15.0+
* Experimental and must be explicitly enabled on the server
(`--enable-feature=native-histograms`)
For compatibility, we also retain a classic bucket definition (aligned
with the existing provisioner metric:
https://github.com/coder/coder/blob/main/provisionerd/provisionerd.go#L182-L189).
* If native histograms are enabled, Prometheus ingests the
high-resolution histogram.
* If not, it falls back to the predefined buckets.
Implementation notes:
* Unlike the counter, these histograms are updated in real-time at
workspace build job completion.
* They reflect data only from the point of introduction forward (no
historical backfill).
## Relates to
Closes: https://github.com/coder/coder/issues/19528
Native histograms tested in observability stack:
https://github.com/coder/observability/pull/50
closes https://github.com/coder/coder/issues/18274
This pull request makes system users visible in various group related
queries so that they can be added to and removed from groups. This
allows system user quotas to be configured. System users are still
ignored in certain queries, such as when license seat consumption is
determined.
This pull request further ensures the existence of a
"coder_prebuilt_workspaces" group in any organization that needs
prebuilt workspaces
---------
Co-authored-by: Susana Ferreira <susana@coder.com>
Closes https://github.com/coder/internal/issues/716
This prevents a scan over the entire `workspace_build` table by removing
a `join`. This is still imperfect as we are still scanning over the
number of builds for the workspaces in the arguments. Ideally we would
have some index or something precomputed. Then we could skip scanning
over the builds for the correct workspaces that are not the latest.
## Summary
In this pull request we're adding support for additional filtering
options to the `provisioners list` CLI command and the
`/provisionerdaemons` API endpoint.
Resolves: https://github.com/coder/coder/issues/18783
### Changes
#### Added CLI Options
- `--show-offline`: When this option is provided, all provisioner
daemons will be returned. This means that when `--show-offline` is not
provided only `idle` and `busy` provisioner daemons will be returned.
- `--status=<list_of_statuses>`: When this option is provided with a
comma-separated list of valid statuses (`idle`, `busy`, or `offline`)
only provisioner daemons that have these statuses will be returned.
- `--max-age=<duration>`: When this option is provided with a valid
duration value (e.g., `24h`, `30s`) only provisioner daemons with a
`last_seen_at` timestamp within the provided max age will be returned.
#### Query Params
- `?offline=true`: Include offline provisioner daemons in the results.
Offline provisioner daemons will be excluded if `?offline=false` or if
offline is not provided.
- `?status=<list_of_statuses>`: Include provisioner daemons with the
specified statuses.
- `?max_age=<duration>`: Include provisioner daemons with a
`last_seen_at` timestamp within the max age duration.
#### Frontend
- Since offline provisioners will not be returned by default anymore
(`--show-offline` has to be provided to see them), a checkbox was added
to the provisioners list page to allow for offline provisioners to be
displayed
- A revamp of the provisioners page will be done in:
https://github.com/coder/coder/issues/17156, this checkbox change was
just added to maintain currently functionality with the backend updates
Current provisioners page (without checkbox)
<img width="1329" height="574" alt="Screenshot 2025-08-20 at 10 51
00 AM"
src="https://github.com/user-attachments/assets/77b73650-0b62-44f0-a77f-acbe5710809f"
/>
Provisioners page with checkbox (unchecked)
<img width="1314" height="626" alt="Screenshot 2025-08-20 at 10 48
40 AM"
src="https://github.com/user-attachments/assets/7ba164ad-6d3f-417b-bd39-338c0161b145"
/>
Provisioner page with checkbox (checked) and URL updated with query
parameters
<img width="1306" height="597" alt="Screenshot 2025-08-20 at 10 50
14 AM"
src="https://github.com/user-attachments/assets/e78d0986-bbf8-491b-9d56-b682973237a0"
/>
### Show Offline vs Offline Status
To list offline provisioner daemons, users can either:
1. Include the `--show-offline` option
OR
2. Include `offline` in the list of values provided to the `--status`
option
## Description
This PR ensures that activity-based deadline extensions ("activity
bumping") are not applied to prebuilt workspaces. Prebuilds are managed
by the reconciliation loop and must not have `deadline` or
`max_deadline` values set or extended, as they are not part of the
regular lifecycle executor path.
## Changes
- Update `ActivityBumpWorkspace` SQL query to discard prebuilt
workspaces
- Update application layer to avoid calling activity bump logic on
prebuilt workspaces
Related with:
* Issue: https://github.com/coder/coder/issues/18898
* PR: https://github.com/coder/coder/pull/19252
Closes https://github.com/coder/coder/issues/18356.
This change finds and selects a matching preset if one was not chosen
during workspace creation. This solidifies the relationship between
presets and parameters.
When a workspace is created without in explicitly chosen preset, it will
now still be eligible to claim a prebuilt workspace if one is available.
## Description
This PR ensures that lifecycle-related changes made via template
schedule updates do **not affect prebuilt workspaces**. Since prebuilds
are managed by the reconciliation loop and do not participate in the
regular lifecycle executor flow, they must be excluded from any updates
triggered by template configuration changes.
This includes changes to TTL, dormant-deletion scheduling, deadline and
autostart scheduling.
## Changes
- Updated SQL query `UpdateWorkspacesTTLByTemplateID` to exclude
prebuilt workspaces
- Updated SQL query `UpdateWorkspacesDormantDeletingAtByTemplateID` to
exclude prebuilt workspaces
- Updated application-layer logic to skip any updates to lifecycle
parameters if a workspace is a prebuild
- Preserved all existing update behavior for regular user workspaces
This change guarantees that only lifecycle-managed workspaces are
affected when template-level configurations are modified, preserving
strict boundaries between prebuild and user workspace lifecycles.
Related with:
* Issue: https://github.com/coder/coder/issues/18898
* PR: https://github.com/coder/coder/pull/19252
This pull request introduces support for external workspace management, allowing users to register and manage workspaces that are provisioned and managed outside of the Coder.
* Added has_external_agent field to workspace builds and template versions
Not used in coderd yet, see stack.
Adds two new packages:
- `coderd/usage`: provides an interface for the "Collector" as well as a stub implementation for AGPL
- `enterprise/coderd/usage`: provides an interface for the "Publisher" as well as a Tallyman implementation
Relates to https://github.com/coder/internal/issues/814
External auth refresh errors lose the original error thrown on the first
refresh. This PR saves that error to the database to be raised on
subsequent refresh attempts
## Description
This PR ensures that prebuilt workspaces are properly excluded from the
lifecycle executor and treated as a separate class of workspaces, fully
managed by the prebuild reconciliation loop.
It introduces two lifecycle guarantees:
* When a prebuilt workspace is created (i.e., when the workspace build
completes), all lifecycle-related fields are unset, ensuring the
workspace does not participate in TTL, autostop, autostart, dormancy, or
auto-deletion logic.
* When a prebuilt workspace is claimed, it transitions into a regular
user workspace. At this point, all lifecycle fields are correctly
populated according to template-level configurations, allowing the
workspace to be managed by the lifecycle executor as expected.
## Changes
* Prebuilt workspaces now have all lifecycle-relevant fields unset
during creation
* When a prebuild is claimed:
* Lifecycle fields are set based on template and workspace level
configurations. This ensures a clean transition into the standard
workspace lifecycle flow.
* Updated lifecycle-related SQL update queries to explicitly exclude
prebuilt workspaces.
## Relates
Related issue: https://github.com/coder/coder/issues/18898
To reduce the scope of this PR and make the review process more
manageable, the original implementation has been split into the
following focused PRs:
* https://github.com/coder/coder/pull/19259
* https://github.com/coder/coder/pull/19263
* https://github.com/coder/coder/pull/19264
* https://github.com/coder/coder/pull/19265
These PRs should be considered in conjunction with this one to
understand the complete set of lifecycle separation changes for prebuilt
workspaces.
Instead of creating tasks with a specialized call to `CreateWorkspace`
on the frontend, we instead lift this to the backend and allow the
frontend to simply call `CreateAITask`.
This change removes the `GetLatestWorkspaceBuilds` query which includes
all workspaces for all time (including deleted). This allows us to also
stop using `GetProvisionerJobsByIDs` for said builds as the job status
is included in `GetWorkspaces` called separately.
**BREAKING CHANGE**: The `coderd_api_workspace_latest_build` Prometheus
metric no longer includes builds belonging to deleted workspaces, as
such, this metric will show fewer statuses.
Fixescoder/internal#717
closes https://github.com/coder/coder/issues/18274
This pull request makes system users visible in various group related
queries so that they can be added to and removed from groups. This
allows system user quotas to be configured. System users are still
ignored in certain queries, such as when license seat consumption is
determined.
This pull request further ensures the existence of a
"coder_prebuilt_workspaces" group in any organization that needs
prebuilt workspaces
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Organization and group member listings now include system users.
* **Bug Fixes**
* Updated tests to reflect the inclusion of system users in member and
group queries.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Closes https://github.com/coder/internal/issues/780
## Summary of changes:
- added `user_secrets` table
- `user_secrets` table contains `env_name` and `file_path` fields which
are not used at the moment, but will be used in later PRs
- `user_secrets` table doesn't contain `value_key_id`, I will add it in
a separate migration in a dbcrypt PR
- on one hand I don't want to add fields which are not used (because
it's a risk smth may change in implementation later), on the other hand
I don't want to add too many migrations for user secrets table
- added unique sql indexes
- added sql queries for CRUD operations on user-secrets
- introduced new `ResourceUserSecret` resource
- basic unit-tests for CRUD ops and authorization behavior
- Role updates:
- owner:
- remove `ResourceUserSecret` from site-wide perms
- add `ResourceUserSecret` to user-wide perms
- orgAdmin
- remove `ResourceUserSecret` from org-wide perms; seems it's not
strictly required, because `ResourceUserSecret` is not tied to
organization in dbauthz wrappers?
- memberRole
- no need to change memberRole because it implicitly has access to
user-secrets thanks to the `allPermsExcept`
- is it enough changes to roles?
Main questions:
- [ ] We will have 2 migrations for user-secrets:
- initial migration (in current PR)
- adding `value_key_id` in dbcrypt PR
- is this approach reasonable?
- [ ] Are changes to roles's permissions are correct?
- [ ] Are changes in roles_test.go are correct?
---------
Co-authored-by: Steven Masley <Emyrk@users.noreply.github.com>
Can do `author:username` to filter templates created by a certain
author. Adding to help clean out some templates that I created on our
dev instance. This makes sorting a bit easier.
This PR sets a constraint of 1MB on the provisioner job logs written to
the database. This is consistent with the constraint we place on
workspace agent logs:
https://github.com/coder/coder/blob/4ac6be6d835dc36c242e35a26b584b784040bf28/coderd/database/dump.sql#L2030
It also adds a message printed to the front end about the provisioner
log overflow, and updates the message printed to the front end when
workspace startup logs exceed the max, as it was causing some customers
to think their startup script had failed to run.
Solves https://github.com/coder/coder/issues/15096
This is a slight rework/refactor of the earlier PRs from @dannykopping
and @Emyrk:
- https://github.com/coder/coder/pull/15669
- https://github.com/coder/coder/pull/15684
- https://github.com/coder/coder/pull/17596
Rather than having a per-app CORS behaviour setting and additionally a
template level setting for ports, this PR adds a single template level
CORS behaviour setting that is then used by all apps/ports for
workspaces created from that template.
The main changes are in `proxy.go` and `request.go` to:
a) get the CORS behaviour setting from the template
b) have `HandleSubdomain` bypass the CORS middleware handler if the
selected behaviour is `passthru`
c) in `proxyWorkspaceApp`, do not modify the response if the selected
behaviour is `passthru`
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added support for configuring CORS behavior ("simple" or "passthru")
at the template level for all shared ports.
* Introduced a new "CORS Behavior" setting in the template creation and
settings forms.
* API endpoints and responses now include the optional `cors_behavior`
property for templates.
* Workspace apps and proxy now honor the specified CORS behavior,
enabling conditional CORS middleware application.
* Enhanced workspace app tests with comprehensive scenarios covering
CORS behaviors and authentication states.
* **Bug Fixes**
* None.
* **Documentation**
* Updated API and admin documentation to describe the new
`cors_behavior` property and its usage.
* Added examples and schema references for CORS behavior in relevant API
docs.
* **Tests**
* Extended automated tests to cover different CORS behavior scenarios
for templates and workspace apps.
* **Chores**
* Updated audit logging to track changes to the `cors_behavior` field on
templates.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Callum Styan <callumstyan@gmail.com>
## Description
This PR adds support for `description` and `icon` fields to
`template_version_presets`. These fields will allow displaying richer
information for presets in the UI, improving the user experience when
creating a workspace.
Both fields are optional, non-nullable, and default to empty strings.
## Changes
* Database migration with the addition of `description VARCHAR(128)` and
`icon VARCHAR(256)` columns to the `template_version_presets` table.
* Updated the `CreateWorkspacePageView` in the UI
Note: UI changes will be addressed in a separate PR
- Adds a query for counting managed agent workspace builds between two
timestamps
- The "Actual" field in the feature entitlement for managed agents is
now populated with the value read from the database
- The wsbuilder package now validates AI agent usage against the limit
when a license is installed
Closescoder/internal#777
### Breaking change (changelog note):
>With new connection events appearing in the Connection Log, connection events older than 90 days will now be deleted from the Audit Log. If you require this legacy data, we recommend querying it from the REST API or making a backup of the database/these events before upgrading your Coder deployment. Please see the PR for details on what exactly will be deleted.
Of note is that there are currently no plans to delete connection events from the Connection Log.
### Context
This is the fifth PR for moving connection events out of the audit log.
In previous PRs:
- **New** connection logs have been routed to the `connection_logs` table. They will *not* appear in the audit log.
- These new connection logs are served from the new `/api/v2/connectionlog` endpoint.
In this PR:
- We'll now clean existing connection events out of the audit log, if they are older than 90 days, We do this in batches of 1000, every 10 minutes.
The criteria for deletion is simple:
```
WHERE
(
action = 'connect'
OR action = 'disconnect'
OR action = 'open'
OR action = 'close'
)
AND "time" < @before_time::timestamp with time zone
```
where `@before_time` is currently configured to 90 days in the past.
Future PRs:
- Write documentation for the endpoint / feature
This is the third PR for moving connection events out of the audit log.
This PR populates `count` on `ConnectionLogResponse` using a separate query, to preemptively mitigate the issue described in #17689. It's structurally identical to a portion of https://github.com/coder/coder/pull/18600, but for the connection log instead of the audit log.
Future PRs:
- Implement a table in the Web UI for viewing connection logs.
- Write a query to delete old events from the audit log, call it from dbpurge.
- Write documentation for the endpoint / feature
This is the second PR for moving connection events out of the audit log.
This PR:
- Adds the `/api/v2/connectionlog` endpoint
- Adds filtering for `GetAuthorizedConnectionLogsOffset` and thus the endpoint.
There's quite a few, but I was aiming for feature parity with the audit log.
1. `organization:<id|name>`
2. `workspace_owner:<username>`
3. `workspace_owner_email:<email>`
4. `type:<ssh|vscode|jetbrains|reconnecting_pty|workspace_app|port_forwarding>`
5. `username:<username>`
- Only includes web-based connection events (workspace apps, web port forwarding) as only those include user metadata.
6. `user_email:<email>`
7. `connected_after:<time>`
8. `connected_before:<time>`
9. `workspace_id:<id>`
10. `connection_id:<id>`
- If you have one snapshot of the connection log, and some sessions are ongoing in that snapshot, you could use this filter to check if they've been closed since.
11. `status:<connected|disconnected>`
- If `connected` only sessions with a null `close_time` are returned, if `disconnected`, only those with a non-null `close_time`. If filter is omitted, both are returned.
Future PRs:
- Populate `count` on `ConnectionLogResponse` using a seperate query (to preemptively mitigate the issue described in #17689)
- Implement a table in the Web UI for viewing connection logs.
- Write a query to delete old events from the audit log, call it from dbpurge.
- Write documentation for the endpoint / feature (including these filters)
### Breaking Change (changelog note):
> User connections to workspaces, and the opening of workspace apps or ports will no longer create entries in the audit log. Those events will now be included in the 'Connection Log'.
Please see the 'Connection Log' page in the dashboard, and the Connection Log [documentation](https://coder.com/docs/admin/monitoring/connection-logs) for details. Those with permission to view the Audit Log will also be able to view the Connection Log. The new Connection Log has the same licensing restrictions as the Audit Log, and requires a Premium Coder deployment.
### Context
This is the first PR of a few for moving connection events out of the audit log, and into a new database table and web UI page called the 'Connection Log'.
This PR:
- Creates the new table
- Adds and tests queries for inserting and reading, including reading with an RBAC filter.
- Implements the corresponding RBAC changes, such that anyone who can view the audit log can read from the table
- Implements, under the enterprise package, a `ConnectionLogger` abstraction to replace the `Auditor` abstraction for these logs. (No-op'd in AGPL, like the `Auditor`)
- Routes SSH connection and Workspace App events into the new `ConnectionLogger`
- Updates all existing tests to check the values of the `ConnectionLogger` instead of the `Auditor`.
Future PRs:
- Add filtering to the query
- Add an enterprise endpoint to query the new table
- Write a query to delete old events from the audit log, call it from dbpurge.
- Implement a table in the Web UI for viewing connection logs.
> [!NOTE]
> The PRs in this stack obviously won't be (completely) atomic. Whilst they'll each pass CI, the stack is designed to be merged all at once. I'm splitting them up for the sake of those reviewing, and so changes can be reviewed as early as possible. Despite this, it's really hard to make this PR any smaller than it already is. I'll be keeping it in draft until it's actually ready to merge.
## Description
This PR updates the lifecycle executor to explicitly exclude prebuilt
workspaces from being considered for lifecycle operations such as
`autostart`, `autostop`, `dormancy`, `default TTL` and `failure TTL`.
Prebuilt workspaces (i.e., those owned by the prebuild system user) are
handled separately by the prebuild reconciliation loop. Including them
in the lifecycle executor could lead to unintended behavior such as
incorrect scheduling or state transitions.
## Changes
* Updated the lifecycle executor query
`GetWorkspacesEligibleForTransition` to exclude workspaces with
`owner_id = 'c42fdf75-3097-471c-8c33-fb52454d81c0'` (prebuilds).
* Added tests to verify prebuilt workspaces are not considered in:
* Autostop
* Autostart
* Default TTL
* Dormancy
* Failure TTL
Fixes: https://github.com/coder/coder/issues/18740
Related to: https://github.com/coder/coder/issues/18658
# Implement OAuth2 Dynamic Client Registration (RFC 7591/7592)
This PR implements OAuth2 Dynamic Client Registration according to RFC 7591 and Client Configuration Management according to RFC 7592. These standards allow OAuth2 clients to register themselves programmatically with Coder as an authorization server.
Key changes include:
1. Added database schema extensions to support RFC 7591/7592 fields in the `oauth2_provider_apps` table
2. Implemented `/oauth2/register` endpoint for dynamic client registration (RFC 7591)
3. Added client configuration management endpoints (RFC 7592):
- GET/PUT/DELETE `/oauth2/clients/{client_id}`
- Registration access token validation middleware
4. Added comprehensive validation for OAuth2 client metadata:
- URI validation with support for custom schemes for native apps
- Grant type and response type validation
- Token endpoint authentication method validation
5. Enhanced developer documentation with:
- RFC compliance guidelines
- Testing best practices to avoid race conditions
- Systematic debugging approaches for OAuth2 implementations
The implementation follows security best practices from the RFCs, including proper token handling, secure defaults, and appropriate error responses. This enables third-party applications to integrate with Coder's OAuth2 provider capabilities programmatically.
This pull request implements RFC 8707, Resource Indicators for OAuth 2.0 (https://datatracker.ietf.org/doc/html/rfc8707), to enhance the security of our OAuth 2.0 provider.
This change enables proper audience validation and binds access tokens to their intended resource, which is crucial
for preventing token misuse in multi-tenant environments or deployments with multiple resource servers.
## Key Changes:
* Resource Parameter Support: Adds support for the resource parameter in both the authorization (`/oauth2/authorize`) and token (`/oauth2/token`) endpoints, allowing clients to specify the intended resource server.
* Audience Validation: Implements server-side validation to ensure that the resource parameter provided during the token exchange matches the one from the authorization request.
* API Middleware Enforcement: Introduces a new validation step in the API authentication middleware (`coderd/httpmw/apikey.go`) to verify that the audience of the access token matches the resource server being accessed.
* Database Schema Updates:
* Adds a `resource_uri` column to the `oauth2_provider_app_codes` table to store the resource requested during authorization.
* Adds an `audience` column to the `oauth2_provider_app_tokens` table to bind the issued token to a specific audience.
* Enhanced PKCE: Includes a minor enhancement to the PKCE implementation to protect against timing attacks.
* Comprehensive Testing: Adds extensive new tests to `coderd/oauth2_test.go` to cover various RFC 8707 scenarios, including valid flows, mismatched resources, and refresh token validation.
## How it Works:
1. An OAuth2 client specifies the target resource (e.g., https://coder.example.com) using the resource parameter in the authorization request.
2. The authorization server stores this resource URI with the authorization code.
3. During the token exchange, the server validates that the client provides the same resource parameter.
4. The server issues an access token with an audience claim set to the validated resource URI.
5. When the client uses the access token to call an API endpoint, the middleware verifies that the token's audience matches the URL of the Coder deployment, rejecting any tokens intended for a different resource.
This ensures that a token issued for one Coder deployment cannot be used to access another, significantly strengthening our authentication security.
---
Change-Id: I3924cb2139e837e3ac0b0bd40a5aeb59637ebc1b
Signed-off-by: Thomas Kosiewski <tk@coder.com>
This PR provides two commands:
* `coder prebuilds pause`
* `coder prebuilds resume`
These allow the suspension of all prebuilds activity, intended for use
if prebuilds are misbehaving.
## Summary
This PR implements critical MCP OAuth2 compliance features for Coder's authorization server, adding PKCE support, resource parameter handling, and OAuth2 server metadata discovery. This brings Coder's OAuth2 implementation significantly closer to production readiness for MCP (Model Context Protocol)
integrations.
## What's Added
### OAuth2 Authorization Server Metadata (RFC 8414)
- Add `/.well-known/oauth-authorization-server` endpoint for automatic client discovery
- Returns standardized metadata including supported grant types, response types, and PKCE methods
- Essential for MCP client compatibility and OAuth2 standards compliance
### PKCE Support (RFC 7636)
- Implement Proof Key for Code Exchange with S256 challenge method
- Add `code_challenge` and `code_challenge_method` parameters to authorization flow
- Add `code_verifier` validation in token exchange
- Provides enhanced security for public clients (mobile apps, CLIs)
### Resource Parameter Support (RFC 8707)
- Add `resource` parameter to authorization and token endpoints
- Store resource URI and bind tokens to specific audiences
- Critical for MCP's resource-bound token model
### Enhanced OAuth2 Error Handling
- Add OAuth2-compliant error responses with proper error codes
- Use standard error format: `{"error": "code", "error_description": "details"}`
- Improve error consistency across OAuth2 endpoints
### Authorization UI Improvements
- Fix authorization flow to use POST-based consent instead of GET redirects
- Remove dependency on referer headers for security decisions
- Improve CSRF protection with proper state parameter validation
## Why This Matters
**For MCP Integration:** MCP requires OAuth2 authorization servers to support PKCE, resource parameters, and metadata discovery. Without these features, MCP clients cannot securely authenticate with Coder.
**For Security:** PKCE prevents authorization code interception attacks, especially critical for public clients. Resource binding ensures tokens are only valid for intended services.
**For Standards Compliance:** These are widely adopted OAuth2 extensions that improve interoperability with modern OAuth2 clients.
## Database Changes
- **Migration 000343:** Adds `code_challenge`, `code_challenge_method`, `resource_uri` to `oauth2_provider_app_codes`
- **Migration 000343:** Adds `audience` field to `oauth2_provider_app_tokens` for resource binding
- **Audit Updates:** New OAuth2 fields properly tracked in audit system
- **Backward Compatibility:** All changes maintain compatibility with existing OAuth2 flows
## Test Coverage
- Comprehensive PKCE test suite in `coderd/identityprovider/pkce_test.go`
- OAuth2 metadata endpoint tests in `coderd/oauth2_metadata_test.go`
- Integration tests covering PKCE + resource parameter combinations
- Negative tests for invalid PKCE verifiers and malformed requests
## Testing Instructions
```bash
# Run the comprehensive OAuth2 test suite
./scripts/oauth2/test-mcp-oauth2.sh
Manual Testing with Interactive Server
# Start Coder in development mode
./scripts/develop.sh
# In another terminal, set up test app and run interactive flow
eval $(./scripts/oauth2/setup-test-app.sh)
./scripts/oauth2/test-manual-flow.sh
# Opens browser with OAuth2 flow, handles callback automatically
# Clean up when done
./scripts/oauth2/cleanup-test-app.sh
Individual Component Testing
# Test metadata endpoint
curl -s http://localhost:3000/.well-known/oauth-authorization-server | jq .
# Test PKCE generation
./scripts/oauth2/generate-pkce.sh
# Run specific test suites
go test -v ./coderd/identityprovider -run TestVerifyPKCE
go test -v ./coderd -run TestOAuth2AuthorizationServerMetadata
```
### Breaking Changes
None. All changes maintain backward compatibility with existing OAuth2 flows.
---
Change-Id: Ifbd0d9a543d545f9f56ecaa77ff2238542ff954a
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Closes#17689
This PR optimizes the audit logs query performance by extracting the
count operation into a separate query and replacing the OR-based
workspace_builds with conditional joins.
## Query changes
* Extracted count query to separate one
* Replaced single `workspace_builds` join with OR conditions with
separate conditional joins
* Added conditional joins
* `wb_build` for workspace_build audit logs (which is a direct lookup)
* `wb_workspace` for workspace create audit logs (via workspace)
Optimized AuditLogsOffset query:
https://explain.dalibo.com/plan/4g1hbedg4a564bg8
New CountAuditLogs query:
https://explain.dalibo.com/plan/ga2fbcecb9efbce3