Commit Graph

356 Commits

Author SHA1 Message Date
Brett Kolodny 854f3c0187 feat: add workspaces/acl [delete] endpoint (#19772)
Closes
[coder/internal#971](https://github.com/coder/internal/issues/971)
2025-09-12 12:21:01 -04:00
Susana Ferreira eec6c8c120 feat: support custom notifications (#19751)
## Description

Adds support for sending an ad‑hoc custom notification to the
authenticated user via API and CLI. This is useful for surfacing the
result of scripts or long‑running tasks. Notifications are delivered
through the configured method and the dashboard Inbox, respecting
existing preferences and delivery settings.

## Changes

* New notification template: “Custom Notification” with a label for a
custom title and a custom message.
* New API endpoint: `POST /api/v2/notifications/custom` to send a custom
notification to the requesting user.
* New API endpoint: `GET /notifications/templates/custom` to get custom
notification template.
* New CLI subcommand: `coder notifications custom <title> <message>` to
send a custom notification to the requesting user.
* Documentation updates: Add a “Custom notifications” section under
Administration > Monitoring > Notifications, including instructions on
sending custom notifications and examples of when to use them.

Closes: https://github.com/coder/coder/issues/19611
2025-09-11 15:08:57 +02:00
Kacper Sawicki 776231d025 fix(coderd): add blocking GetProvisionerJobByIDWithLock for workspace build cancellation (#19737)
Closes https://github.com/coder/internal/issues/885

Adds a new database method GetProvisionerJobByIDWithLock that uses FOR
UPDATE without SKIP LOCKED to fix workspace build cancellation returning
500 errors when jobs are locked.
2025-09-08 15:40:14 +02:00
Cian Johnston 06cbb2890f fix: expire token for prebuilds user when regenerating session token (#19667)
* provisionerdserver: Expires prebuild user token for workspace, if it
exists, when regenerating session token.
* dbauthz: disallow prebuilds user from creating api keys
* dbpurge: added functionality to expire stale api keys owned by the
prebuilds user
2025-09-02 09:38:43 +01:00
Callum Styan 4fab14b40b fix: limit the scope of the template average build time query to the last 100 (#19648)
This PR should resolve https://github.com/coder/internal/issues/719 by
limiting the `workspace_builds` rows selected by the query to the most
recent 100 builds of a template, as opposed to all builds in the last
30d. For our own internal templates with the most builds (1700-2000 in a
30d period) this should cut the query execution time by about 80%.

Unless we have some restriction on keeping the 30d period, contract
related or otherwise, this seems like a safe change to make. In addition
to the execution speed improvements it also means the memory for the
query is bounded as well.

If we want to keep a 30d time period for the avg build time value I
think it's worth exploring a purpose built solution such as histogram
structures where the build times could be bucketized by template ID as
they're observed.

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2025-09-01 09:31:21 -07:00
Dean Sheather 39bf3ba628 chore: replace GetManagedAgentCount query with aggregate table (#19636)
- Removes GetManagedAgentCount query
- Adds new table `usage_events_daily` which stores aggregated usage
events by the type and UTC day
- Adds trigger to update the values in this table when a new row is
inserted into `usage_events`
- Adds a migration that adds `usage_events_daily` rows for existing data
in `usage_events`
- Adds tests for the trigger
- Adds tests for the backfill query in the migration

Since the `usage_events` table is unreleased currently, this migration
will do nothing on real deployments and will only affect preview
deployments such as dogfood.

Closes https://github.com/coder/internal/issues/943
2025-08-30 03:39:37 +10:00
Hugo Dutka 7365da1110 chore(coderd/database/dbauthz): migrate TestSystemFunctions to mocked DB (#19301)
Related to https://github.com/coder/internal/issues/869

---------

Signed-off-by: Danny Kopping <danny@coder.com>
Co-authored-by: Danny Kopping <danny@coder.com>
2025-08-29 11:04:11 +02:00
Susana Ferreira 0ab345ca84 feat: add prebuild timing metrics to Prometheus (#19503)
## Description

This PR introduces one counter and two histograms related to workspace
creation and claiming. The goal is to provide clearer observability into
how workspaces are created (regular vs prebuild) and the time cost of
those operations.

### `coderd_workspace_creation_total`

* Metric type: Counter
* Name: `coderd_workspace_creation_total`
* Labels: `organization_name`, `template_name`, `preset_name`

This counter tracks whether a regular workspace (not created from a
prebuild pool) was created using a preset or not.
Currently, we already expose `coderd_prebuilt_workspaces_claimed_total`
for claimed prebuilt workspaces, but we lack a comparable metric for
regular workspace creations. This metric fills that gap, making it
possible to compare regular creations against claims.

Implementation notes:
* Exposed as a `coderd_` metric, consistent with other workspace-related
metrics (e.g. `coderd_api_workspace_latest_build`:
https://github.com/coder/coder/blob/main/coderd/prometheusmetrics/prometheusmetrics.go#L149).
* Every `defaultRefreshRate` (1 minute ), DB query
`GetRegularWorkspaceCreateMetrics` is executed to fetch all regular
workspaces (not created from a prebuild pool).
* The counter is updated with the total from all time (not just since
metric introduction). This differs from the histograms below, which only
accumulate from their introduction forward.

### `coderd_workspace_creation_duration_seconds` &
`coderd_prebuilt_workspace_claim_duration_seconds`

* Metric types: Histogram
* Names:
  * `coderd_workspace_creation_duration_seconds`
* Labels: `organization_name`, `template_name`, `preset_name`, `type`
(`regular`, `prebuild`)
  * `coderd_prebuilt_workspace_claim_duration_seconds`
    * Labels: `organization_name`, `template_name`, `preset_name`

We already have `coderd_provisionerd_workspace_build_timings_seconds`,
which tracks build run times for all workspace builds handled by the
provisioner daemon.
However, in the context of this issue, we are only interested in
creation and claim build times, not all transitions; additionally, this
metric does not include `preset_name`, and adding it there would
significantly increase cardinality. Therefore, separate more focused
metrics are introduced here:
* `coderd_workspace_creation_duration_seconds`: Build time to create a
workspace (either a regular workspace or the build into a prebuild pool,
for prebuild initial provisioning build).
* `coderd_prebuilt_workspace_claim_duration_seconds`: Time to claim a
prebuilt workspace from the pool.

The reason for two separate histograms is that:
* Creation (regular or prebuild): provisioning builds with similar time
magnitude, generally expected to take longer than a claim operation.
* Claim: expected to be a much faster provisioning build.

#### Native histogram usage

Provisioning times vary widely between projects. Using static buckets
risks unbalanced or poorly informative histograms.
To address this, these metrics use [Prometheus native
histograms](https://prometheus.io/docs/specs/native_histograms/):
* First introduced in Prometheus v2.40.0
* Recommended stable usage from v2.45+
* Requires Go client `prometheus/client_golang` v1.15.0+
* Experimental and must be explicitly enabled on the server
(`--enable-feature=native-histograms`)

For compatibility, we also retain a classic bucket definition (aligned
with the existing provisioner metric:
https://github.com/coder/coder/blob/main/provisionerd/provisionerd.go#L182-L189).
* If native histograms are enabled, Prometheus ingests the
high-resolution histogram.
* If not, it falls back to the predefined buckets.

Implementation notes:
* Unlike the counter, these histograms are updated in real-time at
workspace build job completion.
* They reflect data only from the point of introduction forward (no
historical backfill).

## Relates to 

Closes: https://github.com/coder/coder/issues/19528
Native histograms tested in observability stack:
https://github.com/coder/observability/pull/50
2025-08-28 15:00:26 +01:00
Cian Johnston 8c731a087e chore(coderd/database/dbauthz): refactor TestPing, TestNew, TestInTX to use dbmock (#19604)
Part of https://github.com/coder/internal/issues/869
2025-08-28 12:37:13 +01:00
Hugo Dutka cc308d1754 chore(coderd/database/dbauthz): migrate TestWorkspace to mocked DB (#19306)
Related to https://github.com/coder/internal/issues/869

---------

Co-authored-by: Cian Johnston <cian@coder.com>
2025-08-27 19:11:28 +02:00
Hugo Dutka 2888055782 chore(coderd/database/dbauthz): migrate the ProvisionerJob and Organization tests to mocked DB (#19303)
Related to https://github.com/coder/internal/issues/869

---------

Co-authored-by: Steven Masley <stevenmasley@gmail.com>
2025-08-27 18:30:04 +02:00
Hugo Dutka 88c0edce24 chore(coderd/database/dbauthz): migrate the Notifications and Prebuilds tests to use mocked DB (#19302)
Related to https://github.com/coder/internal/issues/869

---------

Co-authored-by: Steven Masley <stevenmasley@gmail.com>
2025-08-27 18:27:35 +02:00
Sas Swart 4e9ee80882 feat(enterprise/coderd): allow system users to be added to groups (#19518)
closes https://github.com/coder/coder/issues/18274

This pull request makes system users visible in various group related
queries so that they can be added to and removed from groups. This
allows system user quotas to be configured. System users are still
ignored in certain queries, such as when license seat consumption is
determined.

This pull request further ensures the existence of a
"coder_prebuilt_workspaces" group in any organization that needs
prebuilt workspaces

---------

Co-authored-by: Susana Ferreira <susana@coder.com>
2025-08-27 16:57:59 +02:00
ケイラ d7ee1019c0 feat: add endpoint for retrieving workspace acl (#19375)
Implements `/acl [get]` for workspaces, with tests.
Blocked by experiment enablement
2025-08-25 07:11:18 -05:00
Hugo Dutka d536b91bfc chore(coderd/database/dbauthz): migrate more tests to mocked db (#19300)
Related to https://github.com/coder/internal/issues/869

---------

Co-authored-by: blink-so[bot] <211532188+blink-so[bot]@users.noreply.github.com>
Co-authored-by: Steven Masley <stevenmasley@gmail.com>
2025-08-20 14:10:18 +00:00
Dean Sheather 6eb02d1c2a chore: wire up usage tracking for managed agents (#19096)
Wires up the usage collector and publisher to coderd.

Relates to coder/internal#814
2025-08-20 23:38:09 +10:00
Sas Swart f9a6adc704 feat: claim prebuilds based on workspace parameters instead of preset id (#19279)
Closes https://github.com/coder/coder/issues/18356.

This change finds and selects a matching preset if one was not chosen
during workspace creation. This solidifies the relationship between
presets and parameters.

When a workspace is created without in explicitly chosen preset, it will
now still be eligible to claim a prebuilt workspace if one is available.
2025-08-20 11:02:53 +02:00
Kacper Sawicki 5e4aa79a9d feat(coderd): add has_external_agent flag to template_versions and workspace_builds (#19285)
This pull request introduces support for external workspace management, allowing users to register and manage workspaces that are provisioned and managed outside of the Coder.

* Added has_external_agent field to workspace builds and template versions
2025-08-19 10:29:51 +02:00
Dean Sheather a25d85631b chore: add usage tracking package (#19095)
Not used in coderd yet, see stack.

Adds two new packages:
- `coderd/usage`: provides an interface for the "Collector" as well as a stub implementation for AGPL
- `enterprise/coderd/usage`: provides an interface for the "Publisher" as well as a Tallyman implementation

Relates to https://github.com/coder/internal/issues/814
2025-08-16 01:31:00 +10:00
Hugo Dutka af97b78e76 chore(coderd/database/dbauthz): migrate TestTemplate to use mocked DB (#19304)
Related to https://github.com/coder/internal/issues/869

---------

Co-authored-by: Steven Masley <stevenmasley@gmail.com>
2025-08-14 11:32:53 +02:00
Hugo Dutka e10f29c481 chore(coderd/database/dbauthz): migrate File, Group, APIKey, AuditLogs, and ConnectionLogs tests to mocked db (#19299)
Related to https://github.com/coder/internal/issues/869

---------

Co-authored-by: blink-so[bot] <211532188+blink-so[bot]@users.noreply.github.com>
Co-authored-by: Steven Masley <stevenmasley@gmail.com>
2025-08-13 09:30:48 -05:00
Danielle Maywood f349edcc3c refactor: create tasks in coderd instead of frontend (#19280)
Instead of creating tasks with a specialized call to `CreateWorkspace`
on the frontend, we instead lift this to the backend and allow the
frontend to simply call `CreateAITask`.
2025-08-12 11:23:55 +01:00
Hugo Dutka cda1a3a593 chore(coderd/database/dbauthz): migrate TestUser to mocked db (#19305)
Related to https://github.com/coder/internal/issues/869

---------

Co-authored-by: Steven Masley <stevenmasley@gmail.com>
2025-08-12 09:20:12 +00:00
Mathias Fredriksson 1b66495b70 fix(coderd/prometheusmetrics)!: filter deleted wsbuilds to reduce db load (#19197)
This change removes the `GetLatestWorkspaceBuilds` query which includes
all workspaces for all time (including deleted). This allows us to also
stop using `GetProvisionerJobsByIDs` for said builds as the job status
is included in `GetWorkspaces` called separately.

**BREAKING CHANGE**: The `coderd_api_workspace_latest_build` Prometheus
metric no longer includes builds belonging to deleted workspaces, as
such, this metric will show fewer statuses.

Fixes coder/internal#717
2025-08-11 14:48:31 +03:00
Steven Masley ce935657f6 test: start migrating dbauthz tests to mocked db (#19257)
This PR adds a framework to move to a mocked db. And therefore massively speed up these tests.
2025-08-08 13:46:24 -05:00
Cian Johnston afb54f6884 chore: revert feat(enterprise/coderd): allow system users to be added to groups (#19254)
This reverts commit b200fc8e67
(https://github.com/coder/coder/pull/18341).
2025-08-08 12:18:07 +01:00
Sas Swart b200fc8e67 feat(enterprise/coderd): allow system users to be added to groups (#18341)
closes https://github.com/coder/coder/issues/18274

This pull request makes system users visible in various group related
queries so that they can be added to and removed from groups. This
allows system user quotas to be configured. System users are still
ignored in certain queries, such as when license seat consumption is
determined.

This pull request further ensures the existence of a
"coder_prebuilt_workspaces" group in any organization that needs
prebuilt workspaces

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
  * Organization and group member listings now include system users.
* **Bug Fixes**
* Updated tests to reflect the inclusion of system users in member and
group queries.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-08-08 11:03:17 +02:00
Yevhenii Shcherbina c65996a041 feat: add user_secrets table (#19162)
Closes https://github.com/coder/internal/issues/780

## Summary of changes:
- added `user_secrets` table
- `user_secrets` table contains `env_name` and `file_path` fields which
are not used at the moment, but will be used in later PRs
- `user_secrets` table doesn't contain `value_key_id`, I will add it in
a separate migration in a dbcrypt PR
- on one hand I don't want to add fields which are not used (because
it's a risk smth may change in implementation later), on the other hand
I don't want to add too many migrations for user secrets table
- added unique sql indexes
- added sql queries for CRUD operations on user-secrets
- introduced new `ResourceUserSecret` resource
- basic unit-tests for CRUD ops and authorization behavior
- Role updates:
  - owner:
    - remove `ResourceUserSecret` from site-wide perms
    - add `ResourceUserSecret` to user-wide perms
   - orgAdmin
- remove `ResourceUserSecret` from org-wide perms; seems it's not
strictly required, because `ResourceUserSecret` is not tied to
organization in dbauthz wrappers?
   - memberRole
- no need to change memberRole because it implicitly has access to
user-secrets thanks to the `allPermsExcept`
   - is it enough changes to roles?
   
Main questions:
- [ ] We will have 2 migrations for user-secrets:
  - initial migration (in current PR)
  - adding `value_key_id` in dbcrypt PR
  - is this approach reasonable?
- [ ] Are changes to roles's permissions are correct?
- [ ] Are changes in roles_test.go are correct?

---------

Co-authored-by: Steven Masley <Emyrk@users.noreply.github.com>
2025-08-07 15:58:59 -04:00
ケイラ 26458cd6f0 refactor: consolidate template and workspace acl validation (#19192) 2025-08-07 10:14:58 -06:00
Danielle Maywood cc609cbd53 fix(site): display tasks link when no templates contain an AI task (#19184)
Fixes https://github.com/coder/coder/issues/19059

This change ensures we always show the "Tasks" link, even when a deployment doesn't yet have a tasks template set up.
2025-08-05 19:30:17 +01:00
ケイラ 1cffd11619 feat: add workspace sharing page (#19107) 2025-07-31 15:05:09 +00:00
Benjamin Peinhardt e4dc2d9418 fix: add constraint and runtime check for provisioner logs size limit (#18893)
This PR sets a constraint of 1MB on the provisioner job logs written to
the database. This is consistent with the constraint we place on
workspace agent logs:
https://github.com/coder/coder/blob/4ac6be6d835dc36c242e35a26b584b784040bf28/coderd/database/dump.sql#L2030

It also adds a message printed to the front end about the provisioner
log overflow, and updates the message printed to the front end when
workspace startup logs exceed the max, as it was causing some customers
to think their startup script had failed to run.
2025-07-30 19:09:53 -05:00
Callum Styan ffbfaf2a6f feat: allow bypassing current CORS magic based on template config (#18706)
Solves https://github.com/coder/coder/issues/15096

This is a slight rework/refactor of the earlier PRs from @dannykopping
and @Emyrk:
- https://github.com/coder/coder/pull/15669
- https://github.com/coder/coder/pull/15684
- https://github.com/coder/coder/pull/17596

Rather than having a per-app CORS behaviour setting and additionally a
template level setting for ports, this PR adds a single template level
CORS behaviour setting that is then used by all apps/ports for
workspaces created from that template.

The main changes are in `proxy.go` and `request.go` to:
a) get the CORS behaviour setting from the template
b) have `HandleSubdomain` bypass the CORS middleware handler if the
selected behaviour is `passthru`
c) in `proxyWorkspaceApp`, do not modify the response if the selected
behaviour is `passthru`

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added support for configuring CORS behavior ("simple" or "passthru")
at the template level for all shared ports.
* Introduced a new "CORS Behavior" setting in the template creation and
settings forms.
* API endpoints and responses now include the optional `cors_behavior`
property for templates.
* Workspace apps and proxy now honor the specified CORS behavior,
enabling conditional CORS middleware application.
* Enhanced workspace app tests with comprehensive scenarios covering
CORS behaviors and authentication states.

* **Bug Fixes**
  * None.

* **Documentation**
* Updated API and admin documentation to describe the new
`cors_behavior` property and its usage.
* Added examples and schema references for CORS behavior in relevant API
docs.

* **Tests**
* Extended automated tests to cover different CORS behavior scenarios
for templates and workspace apps.

* **Chores**
* Updated audit logging to track changes to the `cors_behavior` field on
templates.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2025-07-30 13:42:39 -07:00
Dean Sheather 9a6dd73f68 feat: add managed agent license limit checks (#18937)
- Adds a query for counting managed agent workspace builds between two
timestamps
- The "Actual" field in the feature entitlement for managed agents is
now populated with the value read from the database
- The wsbuilder package now validates AI agent usage against the limit
when a license is installed

Closes coder/internal#777
2025-07-22 13:39:26 +10:00
Cian Johnston 198d50dbc2 chore: replace original GetPrebuiltWorkspaces with optimized version (#18832)
Fixes https://github.com/coder/internal/issues/715

Follow-up from https://github.com/coder/coder/pull/18717

Now that we've determined the updated query is safe, remove the duplication.
2025-07-21 15:31:11 +01:00
Ethan f42de9fe12 chore!: delete old connection events from audit log (#18735)
### Breaking change (changelog note):
>With new connection events appearing in the Connection Log, connection events older than 90 days will now be deleted from the Audit Log. If you require this legacy data, we recommend querying it from the REST API or making a backup of the database/these events before upgrading your Coder deployment. Please see the PR for details on what exactly will be deleted. 
Of note is that there are currently no plans to delete connection events from the Connection Log.


### Context

This is the fifth PR for moving connection events out of the audit log.

In previous PRs:
- **New** connection logs have been routed to the `connection_logs` table. They will *not* appear in the audit log.
- These new connection logs are served from the new `/api/v2/connectionlog` endpoint.

In this PR:
- We'll now clean existing connection events out of the audit log, if they are older than 90 days, We do this in batches of 1000, every 10 minutes.

The criteria for deletion is simple:
```
WHERE
(
     action = 'connect'
     OR action = 'disconnect'
     OR action = 'open'
     OR action = 'close'
)
AND "time" < @before_time::timestamp with time zone
```
where `@before_time` is currently configured to 90 days in the past.


Future PRs:
- Write documentation for the endpoint / feature
2025-07-15 15:45:36 +10:00
Ethan 7c077d39c5 chore: populate connectionlog count using a separate query (#18629)
This is the third PR for moving connection events out of the audit log.

This PR populates `count` on `ConnectionLogResponse` using a separate query, to preemptively mitigate the issue described in #17689. It's structurally identical to a portion of https://github.com/coder/coder/pull/18600, but for the connection log instead of the audit log.
       
Future PRs:
- Implement a table in the Web UI for viewing connection logs.
- Write a query to delete old events from the audit log, call it from dbpurge.
- Write documentation for the endpoint / feature
2025-07-15 15:03:30 +10:00
Ethan 08e17a07fc chore!: route connection logs to new table (#18340)
### Breaking Change (changelog note):
> User connections to workspaces, and the opening of workspace apps or ports will no longer create entries in the audit log. Those events will now be included in the 'Connection Log'.
Please see the 'Connection Log' page in the dashboard, and the Connection Log [documentation](https://coder.com/docs/admin/monitoring/connection-logs) for details. Those with permission to view the Audit Log will also be able to view the Connection Log. The new Connection Log has the same licensing restrictions as the Audit Log, and requires a Premium Coder deployment.

### Context

This is the first PR of a few for moving connection events out of the audit log, and into a new database table and web UI page called the 'Connection Log'.

This PR:
- Creates the new table
- Adds and tests queries for inserting and reading, including reading with an RBAC filter.
- Implements the corresponding RBAC changes, such that anyone who can view the audit log can read from the table
- Implements, under the enterprise package, a `ConnectionLogger` abstraction to replace the `Auditor` abstraction for these logs. (No-op'd in AGPL, like the `Auditor`)
- Routes SSH connection and Workspace App events into the new `ConnectionLogger`
- Updates all existing tests to check the values of the `ConnectionLogger` instead of the `Auditor`.

Future PRs:
- Add filtering to the query
- Add an enterprise endpoint to query the new table
- Write a query to delete old events from the audit log, call it from dbpurge.
- Implement a table in the Web UI for viewing connection logs.


> [!NOTE]
> The PRs in this stack obviously won't be (completely) atomic. Whilst they'll each pass CI, the stack is designed to be merged all at once. I'm splitting them up for the sake of those reviewing, and so changes can be reviewed as early as possible.  Despite this, it's really hard to make this PR any smaller than it already is. I'll be keeping it in draft until it's actually ready to merge.
2025-07-15 14:36:06 +10:00
Cian Johnston 0367dbac43 chore: optimize GetPrebuiltWorkspaces query (#18717)
* Adds GetRunningPrebuiltWorkspacesOptimized query
* Runs both original and updated query side-by-side and logs diffs
2025-07-09 11:30:42 +01:00
Hugo Dutka 3c2f3d640b chore: remove dbmem (#18803)
Remove the in-memory database. Addresses #15109.
2025-07-09 09:46:31 +02:00
Kacper Sawicki 8202514ce0 feat!: add ability to cancel pending workspace build (#18713)
Closes #17791 

This PR adds ability to cancel workspace builds that are in "pending"
status.

Breaking changes:
- CancelWorkspaceBuild method in codersdk now accepts an optional
request parameter

API:
- Added `expect_status` query parameter to the cancel workspace build
endpoint
- This parameter ensures the job hasn't changed state before canceling
- API returns `412 Precondition Failed` if the job is not in the
expected status
- Valid values: `running` or `pending`
- Wrapped the entire cancel method in a database transaction

UI:
- Added confirmation dialog to the `Cancel` button, since it's a
destructive operation

![image](https://github.com/user-attachments/assets/437aa5f4-5669-45b6-82a0-e46f277114bf)

![image](https://github.com/user-attachments/assets/423b5cb1-a4fb-4a10-933b-c1c73f4b838c)


- Enabled cancel action for pending workspaces (`expect_status=pending`
is sent if workspace is in pending status)

![image](https://github.com/user-attachments/assets/32d35ff1-12e6-4f7b-9f6c-fde9da9de6cf)

---------

Co-authored-by: Dean Sheather <dean@deansheather.com>
2025-07-08 11:02:58 +02:00
Thomas Kosiewski 74e1d5c4b6 feat: implement OAuth2 dynamic client registration (RFC 7591/7592) (#18645)
# Implement OAuth2 Dynamic Client Registration (RFC 7591/7592)

This PR implements OAuth2 Dynamic Client Registration according to RFC 7591 and Client Configuration Management according to RFC 7592. These standards allow OAuth2 clients to register themselves programmatically with Coder as an authorization server.

Key changes include:

1. Added database schema extensions to support RFC 7591/7592 fields in the `oauth2_provider_apps` table
2. Implemented `/oauth2/register` endpoint for dynamic client registration (RFC 7591)
3. Added client configuration management endpoints (RFC 7592):
   - GET/PUT/DELETE `/oauth2/clients/{client_id}`
   - Registration access token validation middleware

4. Added comprehensive validation for OAuth2 client metadata:
   - URI validation with support for custom schemes for native apps
   - Grant type and response type validation
   - Token endpoint authentication method validation

5. Enhanced developer documentation with:
   - RFC compliance guidelines
   - Testing best practices to avoid race conditions
   - Systematic debugging approaches for OAuth2 implementations

The implementation follows security best practices from the RFCs, including proper token handling, secure defaults, and appropriate error responses. This enables third-party applications to integrate with Coder's OAuth2 provider capabilities programmatically.
2025-07-03 18:33:47 +02:00
Thomas Kosiewski f0c9c4dbcd feat: oauth2 - add RFC 8707 resource indicators and audience validation (#18575)
This pull request implements RFC 8707, Resource Indicators for OAuth 2.0 (https://datatracker.ietf.org/doc/html/rfc8707), to enhance the security of our OAuth 2.0 provider. 

This change enables proper audience validation and binds access tokens to their intended resource, which is crucial
  for preventing token misuse in multi-tenant environments or deployments with multiple resource servers.

##  Key Changes:


   * Resource Parameter Support: Adds support for the resource parameter in both the authorization (`/oauth2/authorize`) and token (`/oauth2/token`) endpoints, allowing clients to specify the intended resource server.
   * Audience Validation: Implements server-side validation to ensure that the resource parameter provided during the token exchange matches the one from the authorization request.
   * API Middleware Enforcement: Introduces a new validation step in the API authentication middleware (`coderd/httpmw/apikey.go`) to verify that the audience of the access token matches the resource server being accessed.
   * Database Schema Updates:
       * Adds a `resource_uri` column to the `oauth2_provider_app_codes` table to store the resource requested during authorization.
       * Adds an `audience` column to the `oauth2_provider_app_tokens` table to bind the issued token to a specific audience.
   * Enhanced PKCE: Includes a minor enhancement to the PKCE implementation to protect against timing attacks.
   * Comprehensive Testing: Adds extensive new tests to `coderd/oauth2_test.go` to cover various RFC 8707 scenarios, including valid flows, mismatched resources, and refresh token validation.

##  How it Works:


   1. An OAuth2 client specifies the target resource (e.g., https://coder.example.com) using the resource parameter in the authorization request.
   2. The authorization server stores this resource URI with the authorization code.
   3. During the token exchange, the server validates that the client provides the same resource parameter.
   4. The server issues an access token with an audience claim set to the validated resource URI.
   5. When the client uses the access token to call an API endpoint, the middleware verifies that the token's audience matches the URL of the Coder deployment, rejecting any tokens intended for a different resource.


  This ensures that a token issued for one Coder deployment cannot be used to access another, significantly strengthening our authentication security.

---

Change-Id: I3924cb2139e837e3ac0b0bd40a5aeb59637ebc1b
Signed-off-by: Thomas Kosiewski <tk@coder.com>
2025-07-02 17:49:00 +02:00
Sas Swart 01163ea57b feat: allow users to pause prebuilt workspace reconciliation (#18700)
This PR provides two commands:
* `coder prebuilds pause`
* `coder prebuilds resume`

These allow the suspension of all prebuilds activity, intended for use
if prebuilds are misbehaving.
2025-07-02 15:05:42 +00:00
Thomas Kosiewski 6f2834f62a feat: oauth2 - add authorization server metadata endpoint and PKCE support (#18548)
## Summary

  This PR implements critical MCP OAuth2 compliance features for Coder's authorization server, adding PKCE support, resource parameter handling, and OAuth2 server metadata discovery. This brings Coder's OAuth2 implementation significantly closer to production readiness for MCP (Model Context Protocol)
  integrations.

  ## What's Added

  ### OAuth2 Authorization Server Metadata (RFC 8414)
  - Add `/.well-known/oauth-authorization-server` endpoint for automatic client discovery
  - Returns standardized metadata including supported grant types, response types, and PKCE methods
  - Essential for MCP client compatibility and OAuth2 standards compliance

  ### PKCE Support (RFC 7636)
  - Implement Proof Key for Code Exchange with S256 challenge method
  - Add `code_challenge` and `code_challenge_method` parameters to authorization flow
  - Add `code_verifier` validation in token exchange
  - Provides enhanced security for public clients (mobile apps, CLIs)

  ### Resource Parameter Support (RFC 8707)
  - Add `resource` parameter to authorization and token endpoints
  - Store resource URI and bind tokens to specific audiences
  - Critical for MCP's resource-bound token model

  ### Enhanced OAuth2 Error Handling
  - Add OAuth2-compliant error responses with proper error codes
  - Use standard error format: `{"error": "code", "error_description": "details"}`
  - Improve error consistency across OAuth2 endpoints

  ### Authorization UI Improvements
  - Fix authorization flow to use POST-based consent instead of GET redirects
  - Remove dependency on referer headers for security decisions
  - Improve CSRF protection with proper state parameter validation

  ## Why This Matters

  **For MCP Integration:** MCP requires OAuth2 authorization servers to support PKCE, resource parameters, and metadata discovery. Without these features, MCP clients cannot securely authenticate with Coder.

  **For Security:** PKCE prevents authorization code interception attacks, especially critical for public clients. Resource binding ensures tokens are only valid for intended services.

  **For Standards Compliance:** These are widely adopted OAuth2 extensions that improve interoperability with modern OAuth2 clients.

  ## Database Changes

  - **Migration 000343:** Adds `code_challenge`, `code_challenge_method`, `resource_uri` to `oauth2_provider_app_codes`
  - **Migration 000343:** Adds `audience` field to `oauth2_provider_app_tokens` for resource binding
  - **Audit Updates:** New OAuth2 fields properly tracked in audit system
  - **Backward Compatibility:** All changes maintain compatibility with existing OAuth2 flows

  ## Test Coverage

  - Comprehensive PKCE test suite in `coderd/identityprovider/pkce_test.go`
  - OAuth2 metadata endpoint tests in `coderd/oauth2_metadata_test.go`
  - Integration tests covering PKCE + resource parameter combinations
  - Negative tests for invalid PKCE verifiers and malformed requests

  ## Testing Instructions

  ```bash
  # Run the comprehensive OAuth2 test suite
  ./scripts/oauth2/test-mcp-oauth2.sh

  Manual Testing with Interactive Server

  # Start Coder in development mode
  ./scripts/develop.sh

  # In another terminal, set up test app and run interactive flow
  eval $(./scripts/oauth2/setup-test-app.sh)
  ./scripts/oauth2/test-manual-flow.sh
  # Opens browser with OAuth2 flow, handles callback automatically

  # Clean up when done
  ./scripts/oauth2/cleanup-test-app.sh

  Individual Component Testing

  # Test metadata endpoint
  curl -s http://localhost:3000/.well-known/oauth-authorization-server | jq .

  # Test PKCE generation
  ./scripts/oauth2/generate-pkce.sh

  # Run specific test suites
  go test -v ./coderd/identityprovider -run TestVerifyPKCE
  go test -v ./coderd -run TestOAuth2AuthorizationServerMetadata
```

  ### Breaking Changes

  None. All changes maintain backward compatibility with existing OAuth2 flows.

---

Change-Id: Ifbd0d9a543d545f9f56ecaa77ff2238542ff954a
Signed-off-by: Thomas Kosiewski <tk@coder.com>
2025-07-01 15:39:29 +02:00
Kacper Sawicki 695de6e0c0 chore(coderd/database): optimize AuditLogs queries (#18600)
Closes #17689

This PR optimizes the audit logs query performance by extracting the
count operation into a separate query and replacing the OR-based
workspace_builds with conditional joins.

## Query changes
* Extracted count query to separate one
* Replaced single `workspace_builds` join with OR conditions with
separate conditional joins
* Added conditional joins
* `wb_build` for workspace_build audit logs (which is a direct lookup)
    * `wb_workspace` for workspace create audit logs (via workspace)

Optimized AuditLogsOffset query:
https://explain.dalibo.com/plan/4g1hbedg4a564bg8

New CountAuditLogs query:
https://explain.dalibo.com/plan/ga2fbcecb9efbce3
2025-07-01 07:31:14 +02:00
Sas Swart c6e0ba12d3 feat: graduate prebuilds to general availability (#18607)
This PR removes the prebuilds experiment and allows the use of prebuilds
without opting into an experiment.
2025-06-26 15:54:52 +02:00
Danny Kopping 688d2ee3eb chore: remove chats experiment (#18535) 2025-06-25 13:03:32 +00:00
Susana Ferreira f44969b689 chore: reorder prebuilt workspace authorization logic (#18506)
## Description

Follow-up from PR https://github.com/coder/coder/pull/18333
Related with:
https://github.com/coder/coder/pull/18333#discussion_r2159300881

This changes the authorization logic to first try the normal workspace
authorization check, and only if the resource is a prebuilt workspace,
fall back to the prebuilt workspace authorization check. Since prebuilt
workspaces are a subset of workspaces, the normal workspace check is
more likely to succeed. This is a small optimization to reduce
unnecessary prebuilt authorization calls.
2025-06-24 16:33:21 +01:00
Steven Masley 45ab265df2 chore: add permissions to autobuilder & prebuilder to run wsbuild (#18527)
Read organization member and read files is now required for dynamic
param building.
2025-06-24 08:45:41 -05:00