# Add API key allow_list for resource-scoped tokens
This PR adds support for API key allow lists, enabling tokens to be scoped to specific resources. The implementation:
1. Adds a new `allow_list` field to the `CreateTokenRequest` struct, allowing clients to specify resource-specific scopes when creating API tokens
2. Implements `APIAllowListTarget` type to represent resource targets in the format `<type>:<id>` with support for wildcards
3. Adds validation and normalization logic for allow lists to handle wildcards and deduplication
4. Integrates with RBAC by creating an `APIKeyEffectiveScope` that merges API key scopes with allow list restrictions
5. Updates API documentation and TypeScript types to reflect the new functionality
This feature enables creating tokens that are limited to specific resources (like workspaces or templates) by ID, making it possible to create more granular API tokens with limited access.
relates to: https://github.com/coder/internal/issues/1026
On POSIX systems (macOS and Linux) we compile the dbcleaner binary into a temp directory. This allows us to explicitly separate the compilation step and report the time it takes. We suspect this might be a contributing factor in the above linked flakes we see on macOS.
This doesn't work on Windows because Go tests clean up the temp directory at the end of the test and the dbcleaner binary will still be executing. On Windows you cannot delete a file being executed nor the directory. However, we are not seeing any flakes on Windows so the old behavior seems to be OK.
# Add API Key Scope Wildcards
This PR adds wildcard API key scopes (`resource:*`) for all RBAC resources to ensure every resource has a matching wildcard value. It also adds all individual `resource:action` scopes to the API documentation and TypeScript definitions.
The changes include:
- Adding a new database migration (000377) that adds wildcard API key scopes
- Updating the API documentation to include all available scopes
- Enhancing the scope generation scripts to include all resource wildcards
- Updating the TypeScript definitions to match the expanded scope list
These changes make creating API keys with comprehensive permissions for specific resource types easier.
Relates to https://github.com/coder/internal/issues/934
This PR provides a mechanism to filter provisioner jobs according to who
initiated the job.
This will be used to find pending prebuild jobs when prebuilds have
overwhelmed the provisioner job queue. They can then be canceled.
If prebuilds are overwhelming provisioners, the following steps will be
taken:
```bash
# pause prebuild reconciliation to limit provisioner queue pollution:
coder prebuilds pause
# cancel pending provisioner jobs to clear the queue
coder provisioner jobs list --initiator="prebuilds" --status="pending" | jq ... | xargs -n1 -I{} coder provisioner jobs cancel {}
# push a fixed template and wait for the import to complete
coder templates push ... # push a fixed template
# resume prebuild reconciliation
coder prebuilds resume
```
This interface differs somewhat from what was specified in the issue,
but still provides a mechanism that addresses the issue. The original
proposal was made by myself and this simpler implementation makes sense.
I might add a `--search` parameter in a follow-up if there is appetite
for it.
Potential follow ups:
* Support for this usage: `coder provisioner jobs list --search
"initiator:prebuilds status:pending"`
* Adding the same parameters to `coder provisioner jobs cancel` as a
convenience feature so that operators don't have to pipe through `jq`
and `xargs`
## Description
Send a notification to the workspace owner when an AI task’s app state
becomes `Working` or `Idle`.
An AI task is identified by a workspace build with `HasAITask = true`
and `AITaskSidebarAppID` matching the agent app’s ID.
## Changes
* Add `TemplateTaskWorking` notification template.
* Add `TemplateTaskIdle` notification template.
* Add `GetLatestWorkspaceAppStatusesByAppID` SQL query to get the
workspace app statuses ordered by latest first.
* Update `PATCH /workspaceagents/me/app-status` to enqueue:
* `TemplateTaskWorking` when state transitions to `working`
* `TemplateTaskIdle` when state transitions to `idle`
* Notification labels include:
* `task`: task initial prompt
* `workspace`: workspace name
* Notification dedupe: include a minute-bucketed timestamp (UTC
truncated to the minute) in the enqueue data to allow identical content
to resend within the same day (but not more than once per minute).
Closes: https://github.com/coder/coder/issues/19776
# Add Composite API Key Scopes
This PR adds high-level composite API key scopes to simplify token creation with common permission sets:
- `coder:workspaces.create` - Create and update workspaces
- `coder:workspaces.operate` - Read and update workspaces
- `coder:workspaces.delete` - Read and delete workspaces
- `coder:workspaces.access` - Read, SSH, and connect to workspace applications
- `coder:templates.build` - Read templates and create/read files
- `coder:templates.author` - Full template management with insights
- `coder:apikeys.manage_self` - Manage your own API keys
These composite scopes are persisted in the database and expanded during authorization, providing a more intuitive way to grant permissions compared to the granular resource:action scopes.
## Summary
In this pull request we're removing `agent_name` from subdomains in APP
urls when an `app` is used in the subdomain. `agent_names` will still be
used when a `port` is used in the subdomain.
Closes: https://github.com/coder/coder/issues/18485
### Changes
- Updated regex to support an optional agent name
- Added logic to support checking the app slug for a matching port
(e.g., 8080 or 8080s)
### Testing
- Updated all tests to support an optional `agent_name`
# Canonicalize API Key Scopes
This PR introduces canonical API key scopes with a `coder:` namespace prefix to avoid collisions with low-level resource:action names. It:
1. Renames special API key scopes in the database:
- `all` → `coder:all`
- `application_connect` → `coder:application_connect`
2. Adds support for a new `scopes` field in the API key creation request, allowing multiple scopes to be specified while maintaining backward compatibility with the singular `scope` field.
3. Updates the API documentation to reflect these changes, including the new endpoint for listing public API key scopes.
4. Ensures backward compatibility by mapping between legacy and canonical scope names in relevant code paths.
relates to https://github.com/coder/internal/issues/927
Refactors dbtestutil to use a `Broker` struct to create test databases. Additionally uses a `coder_testing` database to record test databases that are created and when they are dropped.
This is in preparation for the PR above this in the stack which adds a "cleaner" subprocess that cleans out any databases that were left when the test process ends.
Adds shared_with_user and shared_with_group filters to the /workspaces
endpoint.
- `shared_with_user`: filters workspaces shared with a specific user.
Accepts a user UUID or username.
- `shared_with_group`: filters workspaces shared with a specific group.
Accepts:
- a group UUID, or
- `<organization name>/<group name>`, or
- `<group name>` (resolved in the default organization).
Closes
[coder/internal#1004](https://github.com/coder/internal/issues/1004)
Breaking API Change:
> The presence of the `ip` field on `codersdk.ConnectionLog` cannot be
guaranteed, and so the field has been made optional. It may be omitted
on API responses.
When running a scaletest, I noticed logs of the form:
```
2025-09-12 06:34:10.924 [erro] coderd.workspaceapps: upsert connection log failed trace=0xa17580 span=0xa17620 workspace_id=81b937d7-5777-4df5-b5cb-80241f30326f agent_id=78b2ff6d-b4a6-4a4e-88a7-283e05455a88 app_id=00000000-0000-0000-0000-000000000000 user_id=00000000-0000-0000-0000-000000000000 user_agent="" app_slug_or_port=terminal status_code=404 request_id=67f03cf8-9523-444a-97bc-90de080a54c8 ...
error= 1 error occurred:
* pq: null value in column "ip" of relation "connection_logs" violates not-null constraint
```
to ensure logs are never omitted from the connection log due to a
missing IP again (i.e. I'm not sure if we can always rely on a valid,
parseable, IP from `(http.Request).RemoteAddr`), I've removed the `NOT
NULL` constraint on `ip` on `connection_logs`, and made `ip` on the API
response optional.
The specific cause for these null IPs was the
`/workspaceproxies/me/issue-signed-app-token [post]` endpoint
constructing it's own `http.Request` without a `RemoteAddr` set, and
then passing that to the token issuer.
To solve this, we'll have workspace proxies send the real IP of the
client when calling `/workspaceproxies/me/issue-signed-app-token [post]`
via the header `Coder-Workspace-Proxy-Real-IP`.
## Description
Adds support for sending an ad‑hoc custom notification to the
authenticated user via API and CLI. This is useful for surfacing the
result of scripts or long‑running tasks. Notifications are delivered
through the configured method and the dashboard Inbox, respecting
existing preferences and delivery settings.
## Changes
* New notification template: “Custom Notification” with a label for a
custom title and a custom message.
* New API endpoint: `POST /api/v2/notifications/custom` to send a custom
notification to the requesting user.
* New API endpoint: `GET /notifications/templates/custom` to get custom
notification template.
* New CLI subcommand: `coder notifications custom <title> <message>` to
send a custom notification to the requesting user.
* Documentation updates: Add a “Custom notifications” section under
Administration > Monitoring > Notifications, including instructions on
sending custom notifications and examples of when to use them.
Closes: https://github.com/coder/coder/issues/19611
## Summary
In this pull request we're updating search to support queries with
spaces in addition to the `field:value` pattern that is currently
supported.
Additionally templates search now defaults to `display_name` (since
`display_name` is optional the search will fallback to `name`) when
searching without the `field:value` pattern
Closes: https://github.com/coder/coder/issues/14384
### Downsides with searching on `name` and `display_name`
Because the `name` field cannot include spaces, we end up in a situation
where including a space in the query will result in no results since the
query searches on both `name` AND `display_name`. In the following
example, we can see the results of searching by both `name` and
`display_name` on these templates:
| Name | Display Name |
| ------ | ------------- |
| docker | Docker Template |
| faketemplate | A Fake Template |
| azure | Fake Azure Template |
| anotherfake | Another Fake Template |
| azurefake | Another Fake Fake Azure Template |
https://github.com/user-attachments/assets/b0e0793e-e77d-46bc-9a42-d7cf4f8bd910
### Proposal: Search on `display_name` by default and allow for `name`
using the `field:value` pattern
If we remove `name` from the default template search, we're now able to
search with spaces on template `display_names`. Since `display_names`
are what users see in the templates list they might expect the search to
work this way.
Below is an example of `name` being removed from the default template
search.
https://github.com/user-attachments/assets/9aba5911-4960-4384-befb-08ea1acaa3ab
With this approach users would still be able to search on template names
by specifying `exact_name:foo`.
### Testing
Added additional test cases to ensure spaces were handled as expected in
combination with `field:value` patterns.
Closes https://github.com/coder/internal/issues/885
Adds a new database method GetProvisionerJobByIDWithLock that uses FOR
UPDATE without SKIP LOCKED to fix workspace build cancellation returning
500 errors when jobs are locked.
This change adds a support-index for `GetAPIKeysLastUsedAfter`.
On dogfood (24h): called 4.4k times, 380ms average.
Change for tested time range: `170ms` -> `3.3ms`.
Fixescoder/internal#724
This change adds an index to optimize the
`GetProvisionerDaemonsWithStatusByOrganization` query.
Execution time dropped from `18s 838ms` to `107ms`.
see https://github.com/coder/internal/issues/959 but the tl; dr is:
- we call this DB query on an interval (every 15s) and it would be
called on each coderd replica as well
- the generated values update very infrequently (for our most used
internal template I saw the builds created/claimed update twice in a 1h
period)
- we have no index on the initiator ID, so this query has to scan the
entire workspace_builds table on every request
In reality this should likely just be a Prometheus metric, and
Prometheus can handle the counter reset behaviour at query time, but for
now this should at least cut the load of the query to 25% of it's
current impact.
---------
Signed-off-by: Callum Styan <callumstyan@gmail.com>
The latest release of all `pg_dump` major versions, going back to 13,
started inserting `\restrict` `\unrestrict` keywords into dumps. This
currently breaks sqlc in `gen/dump` and our check migration script. Full
details of the postgres change are available here:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=575f54d4c
To fix, we'll always use the `pg_dump` in our postgres 13.21 docker
image for schema dumps, instead of what's on the runner/local machine.
Coder doesn't restore from postgres dumps, so we're not vulnerable to
attacks that would be patched by the latest postgres version.
Regardless, we'll unpin ASAP.
Once sqlc is updated to handle these keywords, we need to start
stripping them when comparing the schema in the migration check script,
and then we can unpin the pg_dump version. This is being tracked at
https://github.com/coder/internal/issues/965
* provisionerdserver: Expires prebuild user token for workspace, if it
exists, when regenerating session token.
* dbauthz: disallow prebuilds user from creating api keys
* dbpurge: added functionality to expire stale api keys owned by the
prebuilds user
This PR should resolve https://github.com/coder/internal/issues/719 by
limiting the `workspace_builds` rows selected by the query to the most
recent 100 builds of a template, as opposed to all builds in the last
30d. For our own internal templates with the most builds (1700-2000 in a
30d period) this should cut the query execution time by about 80%.
Unless we have some restriction on keeping the 30d period, contract
related or otherwise, this seems like a safe change to make. In addition
to the execution speed improvements it also means the memory for the
query is bounded as well.
If we want to keep a 30d time period for the avg build time value I
think it's worth exploring a purpose built solution such as histogram
structures where the build times could be bucketized by template ID as
they're observed.
---------
Signed-off-by: Callum Styan <callumstyan@gmail.com>
- Removes GetManagedAgentCount query
- Adds new table `usage_events_daily` which stores aggregated usage
events by the type and UTC day
- Adds trigger to update the values in this table when a new row is
inserted into `usage_events`
- Adds a migration that adds `usage_events_daily` rows for existing data
in `usage_events`
- Adds tests for the trigger
- Adds tests for the backfill query in the migration
Since the `usage_events` table is unreleased currently, this migration
will do nothing on real deployments and will only affect preview
deployments such as dogfood.
Closes https://github.com/coder/internal/issues/943
## Description
This PR introduces one counter and two histograms related to workspace
creation and claiming. The goal is to provide clearer observability into
how workspaces are created (regular vs prebuild) and the time cost of
those operations.
### `coderd_workspace_creation_total`
* Metric type: Counter
* Name: `coderd_workspace_creation_total`
* Labels: `organization_name`, `template_name`, `preset_name`
This counter tracks whether a regular workspace (not created from a
prebuild pool) was created using a preset or not.
Currently, we already expose `coderd_prebuilt_workspaces_claimed_total`
for claimed prebuilt workspaces, but we lack a comparable metric for
regular workspace creations. This metric fills that gap, making it
possible to compare regular creations against claims.
Implementation notes:
* Exposed as a `coderd_` metric, consistent with other workspace-related
metrics (e.g. `coderd_api_workspace_latest_build`:
https://github.com/coder/coder/blob/main/coderd/prometheusmetrics/prometheusmetrics.go#L149).
* Every `defaultRefreshRate` (1 minute ), DB query
`GetRegularWorkspaceCreateMetrics` is executed to fetch all regular
workspaces (not created from a prebuild pool).
* The counter is updated with the total from all time (not just since
metric introduction). This differs from the histograms below, which only
accumulate from their introduction forward.
### `coderd_workspace_creation_duration_seconds` &
`coderd_prebuilt_workspace_claim_duration_seconds`
* Metric types: Histogram
* Names:
* `coderd_workspace_creation_duration_seconds`
* Labels: `organization_name`, `template_name`, `preset_name`, `type`
(`regular`, `prebuild`)
* `coderd_prebuilt_workspace_claim_duration_seconds`
* Labels: `organization_name`, `template_name`, `preset_name`
We already have `coderd_provisionerd_workspace_build_timings_seconds`,
which tracks build run times for all workspace builds handled by the
provisioner daemon.
However, in the context of this issue, we are only interested in
creation and claim build times, not all transitions; additionally, this
metric does not include `preset_name`, and adding it there would
significantly increase cardinality. Therefore, separate more focused
metrics are introduced here:
* `coderd_workspace_creation_duration_seconds`: Build time to create a
workspace (either a regular workspace or the build into a prebuild pool,
for prebuild initial provisioning build).
* `coderd_prebuilt_workspace_claim_duration_seconds`: Time to claim a
prebuilt workspace from the pool.
The reason for two separate histograms is that:
* Creation (regular or prebuild): provisioning builds with similar time
magnitude, generally expected to take longer than a claim operation.
* Claim: expected to be a much faster provisioning build.
#### Native histogram usage
Provisioning times vary widely between projects. Using static buckets
risks unbalanced or poorly informative histograms.
To address this, these metrics use [Prometheus native
histograms](https://prometheus.io/docs/specs/native_histograms/):
* First introduced in Prometheus v2.40.0
* Recommended stable usage from v2.45+
* Requires Go client `prometheus/client_golang` v1.15.0+
* Experimental and must be explicitly enabled on the server
(`--enable-feature=native-histograms`)
For compatibility, we also retain a classic bucket definition (aligned
with the existing provisioner metric:
https://github.com/coder/coder/blob/main/provisionerd/provisionerd.go#L182-L189).
* If native histograms are enabled, Prometheus ingests the
high-resolution histogram.
* If not, it falls back to the predefined buckets.
Implementation notes:
* Unlike the counter, these histograms are updated in real-time at
workspace build job completion.
* They reflect data only from the point of introduction forward (no
historical backfill).
## Relates to
Closes: https://github.com/coder/coder/issues/19528
Native histograms tested in observability stack:
https://github.com/coder/observability/pull/50
closes https://github.com/coder/coder/issues/18274
This pull request makes system users visible in various group related
queries so that they can be added to and removed from groups. This
allows system user quotas to be configured. System users are still
ignored in certain queries, such as when license seat consumption is
determined.
This pull request further ensures the existence of a
"coder_prebuilt_workspaces" group in any organization that needs
prebuilt workspaces
---------
Co-authored-by: Susana Ferreira <susana@coder.com>
Closes https://github.com/coder/internal/issues/716
This prevents a scan over the entire `workspace_build` table by removing
a `join`. This is still imperfect as we are still scanning over the
number of builds for the workspaces in the arguments. Ideally we would
have some index or something precomputed. Then we could skip scanning
over the builds for the correct workspaces that are not the latest.
## Summary
In this pull request we're adding support for additional filtering
options to the `provisioners list` CLI command and the
`/provisionerdaemons` API endpoint.
Resolves: https://github.com/coder/coder/issues/18783
### Changes
#### Added CLI Options
- `--show-offline`: When this option is provided, all provisioner
daemons will be returned. This means that when `--show-offline` is not
provided only `idle` and `busy` provisioner daemons will be returned.
- `--status=<list_of_statuses>`: When this option is provided with a
comma-separated list of valid statuses (`idle`, `busy`, or `offline`)
only provisioner daemons that have these statuses will be returned.
- `--max-age=<duration>`: When this option is provided with a valid
duration value (e.g., `24h`, `30s`) only provisioner daemons with a
`last_seen_at` timestamp within the provided max age will be returned.
#### Query Params
- `?offline=true`: Include offline provisioner daemons in the results.
Offline provisioner daemons will be excluded if `?offline=false` or if
offline is not provided.
- `?status=<list_of_statuses>`: Include provisioner daemons with the
specified statuses.
- `?max_age=<duration>`: Include provisioner daemons with a
`last_seen_at` timestamp within the max age duration.
#### Frontend
- Since offline provisioners will not be returned by default anymore
(`--show-offline` has to be provided to see them), a checkbox was added
to the provisioners list page to allow for offline provisioners to be
displayed
- A revamp of the provisioners page will be done in:
https://github.com/coder/coder/issues/17156, this checkbox change was
just added to maintain currently functionality with the backend updates
Current provisioners page (without checkbox)
<img width="1329" height="574" alt="Screenshot 2025-08-20 at 10 51
00 AM"
src="https://github.com/user-attachments/assets/77b73650-0b62-44f0-a77f-acbe5710809f"
/>
Provisioners page with checkbox (unchecked)
<img width="1314" height="626" alt="Screenshot 2025-08-20 at 10 48
40 AM"
src="https://github.com/user-attachments/assets/7ba164ad-6d3f-417b-bd39-338c0161b145"
/>
Provisioner page with checkbox (checked) and URL updated with query
parameters
<img width="1306" height="597" alt="Screenshot 2025-08-20 at 10 50
14 AM"
src="https://github.com/user-attachments/assets/e78d0986-bbf8-491b-9d56-b682973237a0"
/>
### Show Offline vs Offline Status
To list offline provisioner daemons, users can either:
1. Include the `--show-offline` option
OR
2. Include `offline` in the list of values provided to the `--status`
option
Username length and format, via regex, are already enforced at the
application layer, but we have some code paths with database queries
where we could optimize away many of the DB query calls if we could be
sure at the database level that the username is never an empty string.
For example: https://github.com/coder/coder/pull/19395
---------
Signed-off-by: Callum Styan <callumstyan@gmail.com>