Commit Graph

1048 Commits

Author SHA1 Message Date
Hugo Dutka e62c5db678 chore: remove references to dbtestutil.WillUsePostgres (#20436)
Addresses https://github.com/coder/internal/issues/758.

This PR only cleans up dead code, it makes no changes to test logic.
2025-10-23 14:24:54 +02:00
Jake Howell d455f6ea2b fix: rename total to count in AIBridgeListInterceptionsResponse (#20410)
Thanks to the great work in #20393, we’ve successfully introduced
offset-based pagination for this endpoint. However, the frontend expects
a `count` field in the response rather than `total`. This PR updates the
response payload to rename the returned key to `count` for consistency
with frontend expectations and existing API patterns.

This is necessary to unblock the work in #20331
2025-10-23 13:19:12 +11:00
Marcin Tojek caeca1097b chore: refactor license validation (#20411) 2025-10-22 16:12:36 +02:00
Marcin Tojek f2a410566c feat: add support buttons (#20339)
Fixes: https://github.com/coder/coder/issues/16804
2025-10-22 15:35:16 +02:00
Dean Sheather 69c2c40512 chore: add user details to aibridge interception list endpoint (#20397)
- Adds FK from `aibridge_interceptions.initiator_id` to `users.id`
- This is enforced by deleting any rows that don't have any users. Since
this is an experimental feature AND coder never deletes user rows I
think this is acceptable.
- Adds `name` as a property on `codersdk.MinimalUser`
- This matches the `visible_users` view in the database. I'm unsure why
`name` wasn't already included given that `username` is.
- Adds a new `initiator` field to `codersdk.AIBridgeInterception` which
contains `codersdk.MinimalUser` (ID, username, name, avatar URL)
- Removes `initiator_id` from `codersdk.AIBridgeInterception`
    - Should be fine since we're still in early access
2025-10-22 16:18:31 +11:00
Dean Sheather ea261a1f7c chore: add offset-based pagination support to aibridge list endpoint (#20393)
Necessary for the frontend to be able to paginate easily. Cursor
pagination is good for fetching all events, but doesn't play very well
when a pagination component gets involved.

Adds support for `?offset=x` to the existing endpoint. The cursor-based
pagination (`?after_id=x`) is still supported. The two pagination modes
are mutually exclusive, and are documented as such. If both are
supplied, the request will be rejected.

Also adds a `total` property to the response that contains the full
count of items matching the filter. We already have indices in place so
I don't think this will impact performance (or we can revisit it before
GA).
2025-10-21 11:50:00 +00:00
Dean Sheather 5887867e9b chore: rework wsproxy mesh tests to avoid flakes (#20296)
- Attempts pings twice per replicasync callback in wsproxy
- Reworks the test setup code to be more lenient and retry proxy
registration on failure

Closes coder/internal#957
2025-10-16 18:39:06 +11:00
Mathias Fredriksson 408b09a1f2 feat(coderd): add audit resource for tasks (#20301)
Updates coder/internal#976
2025-10-15 16:13:59 +00:00
Spike Curtis 05b037bdea fix: avoid deadlock race writing to a disconnected mapper (#20303)
fixes https://github.com/coder/internal/issues/1045

Fixes a race condition in our PG Coordinator when a peer disconnects. We issue database queries to find the peer mappings (node structures for each peer connected via a tunnel), and then send these to the "mapper" that generates diffs and eventually writes the update to the websocket.

Before this change we erroneously used the querier's context for this update, which has the same lifetime as the coordinator itself. If the peer has disconnected, the mapper might not be reading from its channel, and this causes a deadlock in a querier worker. This also prevents us from doing any more work on the peer.

I also added some more debug logging that would have been helpful when tracking this down.
2025-10-15 15:56:07 +04:00
ケイラ caeff49aba chore: refactor roles to support multiple permission sets scoped by org id (#20186)
In preparation for adding the "member" permission level, which will also
be grouped by org ID, do a bit of a refactor to make room for it and the
existing "org" level to live in the same `map`
2025-10-09 11:08:34 -06:00
Sas Swart 544f15523c fix: adjust workspace claims to be initiated by users (#20179)
The prebuilds user never initiates a workspace claim autonomously. A
claim can only happen when a user attempts to create a workspace. When
listing prebuild provisioner jobs, it would not make sense to see jobs
related to users who are creating workspaces and have gotten a prebuilt
workspace. When cleaning up an overwhelmed provisioner queue, we should
not delete claims as they have humans waiting for them and are not part
of the thundering herd.

Therefore, this PR ensures that provisioner jobs that claim workspaces
are considered to be initiated by the user, not the prebuilds system.
2025-10-08 10:40:54 +02:00
Sas Swart d17dd5d787 feat: add filtering by initiator to provisioner job listing in the CLI (#20137)
Relates to https://github.com/coder/internal/issues/934

This PR provides a mechanism to filter provisioner jobs according to who
initiated the job.
This will be used to find pending prebuild jobs when prebuilds have
overwhelmed the provisioner job queue. They can then be canceled.

If prebuilds are overwhelming provisioners, the following steps will be
taken:

```bash
# pause prebuild reconciliation to limit provisioner queue pollution:
coder prebuilds pause 
# cancel pending provisioner jobs to clear the queue
coder provisioner jobs list --initiator="prebuilds" --status="pending" | jq ... | xargs -n1 -I{} coder provisioner jobs cancel {}
# push a fixed template and wait for the import to complete
coder templates push ... # push a fixed template
# resume prebuild reconciliation
coder prebuilds resume
```

This interface differs somewhat from what was specified in the issue,
but still provides a mechanism that addresses the issue. The original
proposal was made by myself and this simpler implementation makes sense.
I might add a `--search` parameter in a follow-up if there is appetite
for it.

Potential follow ups:
* Support for this usage: `coder provisioner jobs list --search
"initiator:prebuilds status:pending"`
* Adding the same parameters to `coder provisioner jobs cancel` as a
convenience feature so that operators don't have to pipe through `jq`
and `xargs`
2025-10-06 08:56:43 +00:00
Zach 4d1003eace fix: remove initial global HTTP client usage (#20128)
This PR makes the initial steps at removing usage of the global Go HTTP
client, which was seen to have impacts on test flakiness in
https://github.com/coder/internal/issues/1020. The first commit removes
uses from tests, with the exception of one test that is tightly coupled
to the default client. The second commit makes easy/low-risk removals
from application code. This should have some impact to reduce test flakiness.
2025-10-02 11:43:13 -06:00
Cian Johnston ff930ad4f3 feat(coderd): add ability to search org members by user_id, is_system, github_user_id (#20048)
Adds the ability to search org members by query.
Supported fields: `user_id`, `is_system`, `github_user_id`.
2025-09-30 23:54:21 +01:00
Cian Johnston 3e1f6afd66 chore: work around timing issue in TestReplicas/ErrorWithoutLicense (#20002)
Closes https://github.com/coder/internal/issues/268

Wraps the assertions in a `testutil.Eventually` so that hopefully any
transient timing issues resolve themselves. If this does not resolve the
issue, we may need to plumb through some kind of `chan struct{}` into
`api.Entitlements.Update()`
2025-09-29 10:20:26 +01:00
Dean Sheather fc58996bbf chore: add StripPrefix to aibridge server handler (#19990)
oops
2025-09-26 15:40:42 +00:00
Dean Sheather 43415f0144 chore: add enterprise feature for aibridge (#19976)
Adds enterprise feature "aibridge" and gates the aibridge CRUD and LLM API endpoints behind it.
2025-09-27 01:13:06 +10:00
Paweł Banaszewski 65f2895c0d chore: add CLI command to list aibridge interceptions (#19935)
Co-authored-by: Dean Sheather <dean@deansheather.com>
2025-09-27 00:58:12 +10:00
Paweł Banaszewski 0a6ba5d51a feat: add endpoint to list aibridge interceptions (#19929)
Co-authored-by: Dean Sheather <dean@deansheather.com>
2025-09-27 00:20:33 +10:00
Thomas Kosiewski d0db9ec88f feat: add multi-scope support to API keys (#19917)
# Canonicalize API Key Scopes

This PR introduces canonical API key scopes with a `coder:` namespace prefix to avoid collisions with low-level resource:action names. It:

1. Renames special API key scopes in the database:
   - `all` → `coder:all`
   - `application_connect` → `coder:application_connect`

2. Adds support for a new `scopes` field in the API key creation request, allowing multiple scopes to be specified while maintaining backward compatibility with the singular `scope` field.

3. Updates the API documentation to reflect these changes, including the new endpoint for listing public API key scopes.

4. Ensures backward compatibility by mapping between legacy and canonical scope names in relevant code paths.
2025-09-26 11:56:34 +02:00
Danny Kopping 0a79817050 feat: initialize aibridged & mount API handler (#19798)
Addresses https://github.com/coder/internal/issues/987
2025-09-25 16:37:28 +02:00
Danny Kopping 6971f612be feat: aibridged mcp handling (#19911)
If you have used AI to produce some or all of this PR, please ensure you have read our [AI Contribution guidelines](https://coder.com/docs/about/contributing/AI_CONTRIBUTING) before submitting.
2025-09-25 16:01:56 +02:00
Danny Kopping fc9bff7107 feat: add aibridged package (#19797)
Addresses https://github.com/coder/internal/issues/987
2025-09-25 15:40:25 +02:00
Danny Kopping 615585d5d1 feat: add aibridgedserver pkg (#19902) 2025-09-25 13:32:16 +02:00
Dean Sheather 42dd544d90 fix: use unique cookies for workspace proxies (#19930)
There is currently an issue with subdomain workspace apps on workspace
proxies, where if you have a workspace proxy wildcard nested beneath the
primary wildcard, cookies from the primary may be sent to the server
before cookies from the proxy specifically.

Currently:
1. Use a subdomain app via the primary proxy `*.coder.corp.com`
    a. Client sends no cookies
    a. Server does token smuggling flow
a. Server sets a cookie `coder_subdomain_app_session_token` on
`*.coder.corp.com`
    a. Server redirects client to reload the page
    a. Request should succeed as usual
1. Wait until the primary proxy's session token cookie has expired in
the database (or make it invalid yourself)
1. Use a subdomain app via a separate proxy `*.sydney.coder.corp.com`
a. Client sends `coder_subdomain_app_session_token` cookie from
`*.coder.corp.com`
    a. Server validates supplied cookie, it fails because it's expired
    a. Server does token smuggling flow
a. Server sets a cookie `coder_subdomain_app_session_token` on
`*.sydney.coder.corp.com`
    a. Server redirects client to reload page
    a. Client sends BOTH cookies.
a. The server will only process the first cookie it receives, so if the
expired cookie for the primary proxy is sent first the request will end
up in a permanent loop on step b.

The fix is to append `_{hash(wildcard_access_url)}` to the subdomain
cookies as we cannot control browser behavior further. This avoids the
conflict as each proxy will only read it's specific cookie.
2025-09-25 00:30:02 +10:00
Spike Curtis 5f1ddb6d57 test: skip ProbeOK test until fixed to not flake (#19931)
temp skip of flaky test: https://github.com/coder/internal/issues/957
2025-09-24 10:31:51 +04:00
Thomas Kosiewski fb0ce389a6 feat: implement API key scopes database migration (#19861)
Added database migration for API key scopes.

Fixes #19845
2025-09-22 19:26:51 +02:00
Spike Curtis 606ae897b7 chore: refactor to directly create Client in Command Handlers (#19760)
Refactors the CLI to create the `*codersdk.Client` in the handlers. This is groundwork for changing the `rootCmd.InitClient()` to use the new `ClientOption`​s.

It also improves variable locality, scoping the Client to the handler. This makes misuse less likely and reduces the memory allocations to just the command being executed, rather than allocating a Client for every command regardless of whether it is executed.
2025-09-22 17:14:07 +04:00
Brett Kolodny 38ca98745b feat: add shared_with_group: and shared_with_user: filters to /workspaces endpoint (#19875)
Adds shared_with_user and shared_with_group filters to the /workspaces
endpoint.

- `shared_with_user`: filters workspaces shared with a specific user.
Accepts a user UUID or username.
- `shared_with_group`: filters workspaces shared with a specific group.
Accepts:
  - a group UUID, or
  - `<organization name>/<group name>`, or
  - `<group name>` (resolved in the default organization).


Closes
[coder/internal#1004](https://github.com/coder/internal/issues/1004)
2025-09-19 16:05:27 -04:00
Brett Kolodny e6b04d1918 feat: add shared filter to workspaces query (#19807)
Adds a `shared:<boolean>` search query to the `/workspaces [get]`
endpoint


https://github.com/user-attachments/assets/ccf84bd9-c1fd-4085-825b-2e3176a2d488

Closes
[coder/internal#972](https://github.com/coder/internal/issues/972)
2025-09-16 12:37:39 -04:00
Ethan 995b330250 test: avoid sharing deployment values between subtests (#19833)
Blink didn't figure out a CI failure on main was caused by a data race; fixing it.


I've also updated the [blink prompt](https://gist.githubusercontent.com/ethanndickson/8dea9f1db3957ac1baf30ae8ce6f1a42/raw/060aea7fabb82bef0029a17dad9a5daee7940760/blink-flake-instructions.md).

https://github.com/coder/coder/actions/runs/17737809615
2025-09-16 13:51:26 +10:00
Ethan 6a9b896f5b fix!: use client ip when creating connection logs for workspace proxied app accesses (#19788)
Breaking API Change: 
> The presence of the `ip` field on `codersdk.ConnectionLog` cannot be
guaranteed, and so the field has been made optional. It may be omitted
on API responses.

When running a scaletest, I noticed logs of the form:
```
2025-09-12 06:34:10.924 [erro]  coderd.workspaceapps: upsert connection log failed  trace=0xa17580  span=0xa17620  workspace_id=81b937d7-5777-4df5-b5cb-80241f30326f  agent_id=78b2ff6d-b4a6-4a4e-88a7-283e05455a88  app_id=00000000-0000-0000-0000-000000000000  user_id=00000000-0000-0000-0000-000000000000  user_agent=""  app_slug_or_port=terminal  status_code=404  request_id=67f03cf8-9523-444a-97bc-90de080a54c8 ...
    error= 1 error occurred:
           	* pq: null value in column "ip" of relation "connection_logs" violates not-null constraint
```

to ensure logs are never omitted from the connection log due to a
missing IP again (i.e. I'm not sure if we can always rely on a valid,
parseable, IP from `(http.Request).RemoteAddr`), I've removed the `NOT
NULL` constraint on `ip` on `connection_logs`, and made `ip` on the API
response optional.


The specific cause for these null IPs was the
`/workspaceproxies/me/issue-signed-app-token [post]` endpoint
constructing it's own `http.Request` without a `RemoteAddr` set, and
then passing that to the token issuer.

To solve this, we'll have workspace proxies send the real IP of the
client when calling `/workspaceproxies/me/issue-signed-app-token [post]`
via the header `Coder-Workspace-Proxy-Real-IP`.
2025-09-15 12:30:17 +10:00
Thomas Kosiewski 088d14933c feat: ensure OAuth2 refresh tokens outlive access tokens (#19769) 2025-09-13 08:57:26 +02:00
Brett Kolodny 854f3c0187 feat: add workspaces/acl [delete] endpoint (#19772)
Closes
[coder/internal#971](https://github.com/coder/internal/issues/971)
2025-09-12 12:21:01 -04:00
Brett Kolodny 8d5c566a98 feat: add sharing remove command to the CLI (#19767)
Closes
[coder/internal#861](https://github.com/coder/internal/issues/861)
2025-09-11 16:22:25 -04:00
Kacper Sawicki 3074547322 perf(enterprise): remove expensive GetWorkspaces query from entitlements (#19747)
Closes: https://github.com/coder/internal/issues/964

This PR addresses the significant database load issue where the
`GetWorkspaces` query was causing performance problems in the license
entitlements code.
2025-09-09 15:46:11 +02:00
Brett Kolodny 065c7c3d5d feat: add sharing show command to the CLI (#19707)
Closes https://github.com/coder/internal/issues/860
2025-09-08 09:30:08 -04:00
Brett Kolodny 909acbc833 feat: add sharing add command to the CLI (#19576)
Adds a `sharing add` command for sharing Workspaces with other users and
groups.

The command allows sharing with multiple users, and groups within one
command as well as specifying the role (`use`, or `admin`) defaulting to
`use` if none is specified.

In the current implementation when the command completes we show the
user the current state of the workspace ACL.

```
$ coder sharing add apricot-catfish-86 --user=member:admin --group=contractors:use
USER    GROUP        ROLE
member  -            admin
member  contractors  use
```

If a user is a part of multiple groups, or the workspace has been
individually shared with them they will show up multiple times. Although
this is a bit confusing at first glance it's important to be able to
tell what the maximum role a user may have, and via what ACL they have
it.

---

One piece of UX to consider is that in order to be able to share a
Workspace with a user they must have a role that can read that user. In
the tests we give the user the `ScopedRoleOrgAuditor` role.

Closes
[coder/internal#859](https://github.com/coder/internal/issues/859)
2025-09-04 17:37:16 -04:00
Callum Styan 0ec9df390b fix: reduce impact of GetPrebuildMetrics on database (#19694)
see https://github.com/coder/internal/issues/959 but the tl; dr is:
- we call this DB query on an interval (every 15s) and it would be
called on each coderd replica as well
- the generated values update very infrequently (for our most used
internal template I saw the builds created/claimed update twice in a 1h
period)
- we have no index on the initiator ID, so this query has to scan the
entire workspace_builds table on every request

In reality this should likely just be a Prometheus metric, and
Prometheus can handle the counter reset behaviour at query time, but for
now this should at least cut the load of the query to 25% of it's
current impact.

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2025-09-04 13:43:50 -07:00
Ethan 50704a5014 ci: improve 'tfail in goroutine' ruleguard rule (#19682)
This PR improves the ruleguard rule for detecting `t.Fail` calls in goroutines. It picks up additional violations, of which are fixed in this PR.
See self-review for details.

The motivation for fixing this comes from a flake I fixed in https://github.com/coder/coder/pull/19599, where tests would fail from a `require` in an `Eventually`.
2025-09-04 14:28:29 +10:00
Spike Curtis 04dfda8a0e fix: change enqueue error to debug log level (#19686)
fixes https://github.com/coder/internal/issues/958

Logging was being done at error level, but most likely any errors are from simple races between an update triggered around the same time as a client disconnecting. Debug is fine for these.
2025-09-03 13:42:02 +04:00
Spike Curtis 1354d84eb4 chore: refactor instance identity to be a SessionTokenProvider (#19566)
Refactors Agent instance identity to be a SessionTokenProvider.

Refactors the CLI to create Agent clients via a centralized function, rather than add-hoc via individual command handlers and their flags.

This allows commands besides `coder agent`, but which still use the agent identity, to support instance identity authentication.

Fixes #19111 by unifying all API requests to go thru the SessionTokenProvider for auth credentials.
2025-09-03 10:38:42 +04:00
Dean Sheather 39bf3ba628 chore: replace GetManagedAgentCount query with aggregate table (#19636)
- Removes GetManagedAgentCount query
- Adds new table `usage_events_daily` which stores aggregated usage
events by the type and UTC day
- Adds trigger to update the values in this table when a new row is
inserted into `usage_events`
- Adds a migration that adds `usage_events_daily` rows for existing data
in `usage_events`
- Adds tests for the trigger
- Adds tests for the backfill query in the migration

Since the `usage_events` table is unreleased currently, this migration
will do nothing on real deployments and will only affect preview
deployments such as dogfood.

Closes https://github.com/coder/internal/issues/943
2025-08-30 03:39:37 +10:00
Susana Ferreira 353f5dedc1 fix(coderd): fix logic for reporting prebuilt workspace duration metric (#19641)
## Description

When creating a prebuilt workspace, both `flags.IsPrebuild` and
`flags.IsFirstBuild` are true. Previously, the logic rejected cases with
multiple flags, so `coderd_workspace_creation_duration_seconds` wasn’t
updated for prebuilt creations. This is the only valid scenario where
two flags can be true.

## Changes

* Fix logic to update `coderd_workspace_creation_duration_seconds`
metric for prebuilt workspaces.
* Add prebuild helper functions to coderdenttest (other prebuild tests
can reuse this).
* Update workspace's provisionerdmetric tests to include this metric.

Follow-up: https://github.com/coder/coder/pull/19503
Related to: https://github.com/coder/coder/issues/19528
2025-08-29 15:48:48 +01:00
Dean Sheather 605dad8b1f fix: suppress license expiry warning if a new license covers the gap (#19601)
Previously, if you had a new license that would start before the current
one fully expired, you would get a warning. Now, the license validity
periods are merged together, and a warning is only generated based on
the end of the current contiguous period of license coverage.

Closes #19498
2025-08-29 13:53:23 +00:00
Spike Curtis 192c81e8f9 chore: refactor codersdk to use SessionTokenProvider (#19565)
Refactors `codersdk.Client`'s use of session tokens to use a `SessionTokenProvider`, which abstracts the obtaining and storing of the session token.

The main motiviation is to unify Agent authentication an an upstack PR, which can use cloud instance identity via token exchange, rather than a fixed session token.

However, the abstraction could also allow functionality like obtaining the session token from other external sources like the OS credential manager, or an external secret/key management system like Vault.
2025-08-29 10:41:32 +02:00
Callum Styan 321c2b8fce fix: fix flake in TestExecutorAutostartSkipsWhenNoProvisionersAvailable (#19478)
The flake here had two causes:
1. related to usage of time.Now() in MustWaitForProvisionersAvailable
and
2. the fact that UpdateProvisionerLastSeenAt can not use a time that is
further in the past than the current LastSeenAt time

Previously the test here was calling
`coderdtest.MustWaitForProvisionersAvailable` which was using `time.Now`
rather than the next tick time like the real `hasProvisionersAvailable`
function does. Additionally, when using `UpdateProvisionerLastSeenAt`
the underlying db query enforces that the time we're trying to set
`LastSeenAt` to cannot be older than the current value.

I was able to reliably reproduce the flake by executing both the
`UpdateProvisionerLastSeenAt` call and `tickCh <- next` in their own
goroutines, the former with a small sleep to reliably ensure we'd
trigger the autobuild before we set the `LastSeenAt` time. That's when I
also noticed that `coderdtest.MustWaitForProvisionersAvailable` was
using `time.Now` instead of the tick time. When I updated that function
to take in a tick time + added a 2nd call to
`UpdateProvisionerLastSeenAt` to set an original non-stale time, we
could then never get the test to pass because the later call to set the
stale time would not actually modify `LastSeenAt`. On top of that,
calling the provisioner daemons closer in the middle of the function
doesn't really do anything of value in this test.

**The fix for the flake is to keep the go routines, ensuring there would
be a flake if there was not a relevant fix, but to include the fix which
is to ensure that we explicitly wait for the provisioner to be stale
before passing the time to `tickCh`.**

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2025-08-28 12:07:50 -07:00
Susana Ferreira 0ab345ca84 feat: add prebuild timing metrics to Prometheus (#19503)
## Description

This PR introduces one counter and two histograms related to workspace
creation and claiming. The goal is to provide clearer observability into
how workspaces are created (regular vs prebuild) and the time cost of
those operations.

### `coderd_workspace_creation_total`

* Metric type: Counter
* Name: `coderd_workspace_creation_total`
* Labels: `organization_name`, `template_name`, `preset_name`

This counter tracks whether a regular workspace (not created from a
prebuild pool) was created using a preset or not.
Currently, we already expose `coderd_prebuilt_workspaces_claimed_total`
for claimed prebuilt workspaces, but we lack a comparable metric for
regular workspace creations. This metric fills that gap, making it
possible to compare regular creations against claims.

Implementation notes:
* Exposed as a `coderd_` metric, consistent with other workspace-related
metrics (e.g. `coderd_api_workspace_latest_build`:
https://github.com/coder/coder/blob/main/coderd/prometheusmetrics/prometheusmetrics.go#L149).
* Every `defaultRefreshRate` (1 minute ), DB query
`GetRegularWorkspaceCreateMetrics` is executed to fetch all regular
workspaces (not created from a prebuild pool).
* The counter is updated with the total from all time (not just since
metric introduction). This differs from the histograms below, which only
accumulate from their introduction forward.

### `coderd_workspace_creation_duration_seconds` &
`coderd_prebuilt_workspace_claim_duration_seconds`

* Metric types: Histogram
* Names:
  * `coderd_workspace_creation_duration_seconds`
* Labels: `organization_name`, `template_name`, `preset_name`, `type`
(`regular`, `prebuild`)
  * `coderd_prebuilt_workspace_claim_duration_seconds`
    * Labels: `organization_name`, `template_name`, `preset_name`

We already have `coderd_provisionerd_workspace_build_timings_seconds`,
which tracks build run times for all workspace builds handled by the
provisioner daemon.
However, in the context of this issue, we are only interested in
creation and claim build times, not all transitions; additionally, this
metric does not include `preset_name`, and adding it there would
significantly increase cardinality. Therefore, separate more focused
metrics are introduced here:
* `coderd_workspace_creation_duration_seconds`: Build time to create a
workspace (either a regular workspace or the build into a prebuild pool,
for prebuild initial provisioning build).
* `coderd_prebuilt_workspace_claim_duration_seconds`: Time to claim a
prebuilt workspace from the pool.

The reason for two separate histograms is that:
* Creation (regular or prebuild): provisioning builds with similar time
magnitude, generally expected to take longer than a claim operation.
* Claim: expected to be a much faster provisioning build.

#### Native histogram usage

Provisioning times vary widely between projects. Using static buckets
risks unbalanced or poorly informative histograms.
To address this, these metrics use [Prometheus native
histograms](https://prometheus.io/docs/specs/native_histograms/):
* First introduced in Prometheus v2.40.0
* Recommended stable usage from v2.45+
* Requires Go client `prometheus/client_golang` v1.15.0+
* Experimental and must be explicitly enabled on the server
(`--enable-feature=native-histograms`)

For compatibility, we also retain a classic bucket definition (aligned
with the existing provisioner metric:
https://github.com/coder/coder/blob/main/provisionerd/provisionerd.go#L182-L189).
* If native histograms are enabled, Prometheus ingests the
high-resolution histogram.
* If not, it falls back to the predefined buckets.

Implementation notes:
* Unlike the counter, these histograms are updated in real-time at
workspace build job completion.
* They reflect data only from the point of introduction forward (no
historical backfill).

## Relates to 

Closes: https://github.com/coder/coder/issues/19528
Native histograms tested in observability stack:
https://github.com/coder/observability/pull/50
2025-08-28 15:00:26 +01:00
Sas Swart 4e9ee80882 feat(enterprise/coderd): allow system users to be added to groups (#19518)
closes https://github.com/coder/coder/issues/18274

This pull request makes system users visible in various group related
queries so that they can be added to and removed from groups. This
allows system user quotas to be configured. System users are still
ignored in certain queries, such as when license seat consumption is
determined.

This pull request further ensures the existence of a
"coder_prebuilt_workspaces" group in any organization that needs
prebuilt workspaces

---------

Co-authored-by: Susana Ferreira <susana@coder.com>
2025-08-27 16:57:59 +02:00
ケイラ d7ee1019c0 feat: add endpoint for retrieving workspace acl (#19375)
Implements `/acl [get]` for workspaces, with tests.
Blocked by experiment enablement
2025-08-25 07:11:18 -05:00