Commit Graph

732 Commits

Author SHA1 Message Date
Spike Curtis 192c81e8f9 chore: refactor codersdk to use SessionTokenProvider (#19565)
Refactors `codersdk.Client`'s use of session tokens to use a `SessionTokenProvider`, which abstracts the obtaining and storing of the session token.

The main motiviation is to unify Agent authentication an an upstack PR, which can use cloud instance identity via token exchange, rather than a fixed session token.

However, the abstraction could also allow functionality like obtaining the session token from other external sources like the OS credential manager, or an external secret/key management system like Vault.
2025-08-29 10:41:32 +02:00
Callum Styan 321c2b8fce fix: fix flake in TestExecutorAutostartSkipsWhenNoProvisionersAvailable (#19478)
The flake here had two causes:
1. related to usage of time.Now() in MustWaitForProvisionersAvailable
and
2. the fact that UpdateProvisionerLastSeenAt can not use a time that is
further in the past than the current LastSeenAt time

Previously the test here was calling
`coderdtest.MustWaitForProvisionersAvailable` which was using `time.Now`
rather than the next tick time like the real `hasProvisionersAvailable`
function does. Additionally, when using `UpdateProvisionerLastSeenAt`
the underlying db query enforces that the time we're trying to set
`LastSeenAt` to cannot be older than the current value.

I was able to reliably reproduce the flake by executing both the
`UpdateProvisionerLastSeenAt` call and `tickCh <- next` in their own
goroutines, the former with a small sleep to reliably ensure we'd
trigger the autobuild before we set the `LastSeenAt` time. That's when I
also noticed that `coderdtest.MustWaitForProvisionersAvailable` was
using `time.Now` instead of the tick time. When I updated that function
to take in a tick time + added a 2nd call to
`UpdateProvisionerLastSeenAt` to set an original non-stale time, we
could then never get the test to pass because the later call to set the
stale time would not actually modify `LastSeenAt`. On top of that,
calling the provisioner daemons closer in the middle of the function
doesn't really do anything of value in this test.

**The fix for the flake is to keep the go routines, ensuring there would
be a flake if there was not a relevant fix, but to include the fix which
is to ensure that we explicitly wait for the provisioner to be stale
before passing the time to `tickCh`.**

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2025-08-28 12:07:50 -07:00
Susana Ferreira 0ab345ca84 feat: add prebuild timing metrics to Prometheus (#19503)
## Description

This PR introduces one counter and two histograms related to workspace
creation and claiming. The goal is to provide clearer observability into
how workspaces are created (regular vs prebuild) and the time cost of
those operations.

### `coderd_workspace_creation_total`

* Metric type: Counter
* Name: `coderd_workspace_creation_total`
* Labels: `organization_name`, `template_name`, `preset_name`

This counter tracks whether a regular workspace (not created from a
prebuild pool) was created using a preset or not.
Currently, we already expose `coderd_prebuilt_workspaces_claimed_total`
for claimed prebuilt workspaces, but we lack a comparable metric for
regular workspace creations. This metric fills that gap, making it
possible to compare regular creations against claims.

Implementation notes:
* Exposed as a `coderd_` metric, consistent with other workspace-related
metrics (e.g. `coderd_api_workspace_latest_build`:
https://github.com/coder/coder/blob/main/coderd/prometheusmetrics/prometheusmetrics.go#L149).
* Every `defaultRefreshRate` (1 minute ), DB query
`GetRegularWorkspaceCreateMetrics` is executed to fetch all regular
workspaces (not created from a prebuild pool).
* The counter is updated with the total from all time (not just since
metric introduction). This differs from the histograms below, which only
accumulate from their introduction forward.

### `coderd_workspace_creation_duration_seconds` &
`coderd_prebuilt_workspace_claim_duration_seconds`

* Metric types: Histogram
* Names:
  * `coderd_workspace_creation_duration_seconds`
* Labels: `organization_name`, `template_name`, `preset_name`, `type`
(`regular`, `prebuild`)
  * `coderd_prebuilt_workspace_claim_duration_seconds`
    * Labels: `organization_name`, `template_name`, `preset_name`

We already have `coderd_provisionerd_workspace_build_timings_seconds`,
which tracks build run times for all workspace builds handled by the
provisioner daemon.
However, in the context of this issue, we are only interested in
creation and claim build times, not all transitions; additionally, this
metric does not include `preset_name`, and adding it there would
significantly increase cardinality. Therefore, separate more focused
metrics are introduced here:
* `coderd_workspace_creation_duration_seconds`: Build time to create a
workspace (either a regular workspace or the build into a prebuild pool,
for prebuild initial provisioning build).
* `coderd_prebuilt_workspace_claim_duration_seconds`: Time to claim a
prebuilt workspace from the pool.

The reason for two separate histograms is that:
* Creation (regular or prebuild): provisioning builds with similar time
magnitude, generally expected to take longer than a claim operation.
* Claim: expected to be a much faster provisioning build.

#### Native histogram usage

Provisioning times vary widely between projects. Using static buckets
risks unbalanced or poorly informative histograms.
To address this, these metrics use [Prometheus native
histograms](https://prometheus.io/docs/specs/native_histograms/):
* First introduced in Prometheus v2.40.0
* Recommended stable usage from v2.45+
* Requires Go client `prometheus/client_golang` v1.15.0+
* Experimental and must be explicitly enabled on the server
(`--enable-feature=native-histograms`)

For compatibility, we also retain a classic bucket definition (aligned
with the existing provisioner metric:
https://github.com/coder/coder/blob/main/provisionerd/provisionerd.go#L182-L189).
* If native histograms are enabled, Prometheus ingests the
high-resolution histogram.
* If not, it falls back to the predefined buckets.

Implementation notes:
* Unlike the counter, these histograms are updated in real-time at
workspace build job completion.
* They reflect data only from the point of introduction forward (no
historical backfill).

## Relates to 

Closes: https://github.com/coder/coder/issues/19528
Native histograms tested in observability stack:
https://github.com/coder/observability/pull/50
2025-08-28 15:00:26 +01:00
Sas Swart 4e9ee80882 feat(enterprise/coderd): allow system users to be added to groups (#19518)
closes https://github.com/coder/coder/issues/18274

This pull request makes system users visible in various group related
queries so that they can be added to and removed from groups. This
allows system user quotas to be configured. System users are still
ignored in certain queries, such as when license seat consumption is
determined.

This pull request further ensures the existence of a
"coder_prebuilt_workspaces" group in any organization that needs
prebuilt workspaces

---------

Co-authored-by: Susana Ferreira <susana@coder.com>
2025-08-27 16:57:59 +02:00
ケイラ d7ee1019c0 feat: add endpoint for retrieving workspace acl (#19375)
Implements `/acl [get]` for workspaces, with tests.
Blocked by experiment enablement
2025-08-25 07:11:18 -05:00
Dean Sheather 82f2e15974 chore: add unknown usage event type error (#19436)
- Adds `usagetypes.UnknownEventTypeError` type, which is returned by
`ParseEventWithType`
- Changes `ParseEvent` to not be a generic function since it doesn't
really need it
- Adds `User-Agent` to tallyman requests
2025-08-22 16:32:35 +10:00
Cian Johnston 86f9bed608 chore: fix TestCheckInactiveUsers flake (#19469)
THIS CODE WAS NOT WRITTEN BY A HUMAN.
Use a fixed time interval to avoid timing flakes.
2025-08-21 16:35:31 +01:00
Dean Sheather bfd392b0bf fix: use int64 in publisher delay (#19457) 2025-08-21 11:23:50 +10:00
Rafael Rodriguez 5b1e809862 fix: support oidc group allowlist in oss (#19430)
## Summary

In this pull request we're adding support for OIDC allowed groups in the
OSS version as part of work for
https://github.com/coder/coder/issues/17027.

### Changes

- Restored support for parsing group allow list in OSS code

### Testing

- Added tests for OSS code
- Tested allowed/prohibited group OIDC flows in premium and OSS
2025-08-20 10:09:13 -05:00
Dean Sheather 1a601c30ad chore: move usage types to new package (#19103) 2025-08-20 23:48:38 +10:00
Dean Sheather 6eb02d1c2a chore: wire up usage tracking for managed agents (#19096)
Wires up the usage collector and publisher to coderd.

Relates to coder/internal#814
2025-08-20 23:38:09 +10:00
Susana Ferreira 560cf84251 fix: prevent activity bump for prebuilt workspaces (#19263)
## Description

This PR ensures that activity-based deadline extensions ("activity
bumping") are not applied to prebuilt workspaces. Prebuilds are managed
by the reconciliation loop and must not have `deadline` or
`max_deadline` values set or extended, as they are not part of the
regular lifecycle executor path.

## Changes

- Update `ActivityBumpWorkspace` SQL query to discard prebuilt
workspaces
- Update application layer to avoid calling activity bump logic on
prebuilt workspaces

Related with: 
* Issue: https://github.com/coder/coder/issues/18898
* PR: https://github.com/coder/coder/pull/19252
2025-08-20 12:19:14 +01:00
Callum Styan f2ee89c36a fix: fix more TestWorkspaceAutobuild flakes (#19417)
made these commits yesterday but apparently I forgot to push so they got
missed in https://github.com/coder/coder/pull/19398

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2025-08-19 10:24:40 -07:00
Callum Styan 9e5c83ae0d fix: fix flakes in TestWorkspaceAutobuild due to incorrect tick time (#19398)
we missed these in the previous PR, we find `tickTime2`
and pass it to the `tickCh`, but we were incorrectly passing `tickTime`
to `UpdateProvisionerLastSeenAt` in some places

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2025-08-19 09:24:40 -07:00
Susana Ferreira d79a7797c2 fix: exclude prebuilt workspaces from template-level lifecycle updates (#19265)
## Description

This PR ensures that lifecycle-related changes made via template
schedule updates do **not affect prebuilt workspaces**. Since prebuilds
are managed by the reconciliation loop and do not participate in the
regular lifecycle executor flow, they must be excluded from any updates
triggered by template configuration changes.

This includes changes to TTL, dormant-deletion scheduling, deadline and
autostart scheduling.

## Changes

- Updated SQL query `UpdateWorkspacesTTLByTemplateID` to exclude
prebuilt workspaces
- Updated SQL query `UpdateWorkspacesDormantDeletingAtByTemplateID` to
exclude prebuilt workspaces
- Updated application-layer logic to skip any updates to lifecycle
parameters if a workspace is a prebuild
- Preserved all existing update behavior for regular user workspaces

This change guarantees that only lifecycle-managed workspaces are
affected when template-level configurations are modified, preserving
strict boundaries between prebuild and user workspace lifecycles.

Related with: 
* Issue: https://github.com/coder/coder/issues/18898
* PR: https://github.com/coder/coder/pull/19252
2025-08-19 13:08:01 +01:00
Kacper Sawicki c7cfa65961 fix(enterprise): correct order for external agent init cmd (#19408)
### Description

`CODER_AGENT_TOKEN` env variable was incorrectly being passed to the
curl command instead of the executed script during agent initialization.

Fixed the command order to ensure `CODER_AGENT_TOKEN` is properly passed
to the script execution context rather than the download command.
2025-08-19 11:00:40 +00:00
Kacper Sawicki 9edceef0bf feat(coderd): add support for external agents to API's and provisioner (#19286)
This pull request introduces support for external workspace management, allowing users to register and manage workspaces that are provisioned and managed outside of the Coder.

Depends on: https://github.com/coder/terraform-provider-coder/pull/424

* GET /api/v2/init-script - Gets the agent initialization script
  * By default, it returns a script for Linux (amd64), but with query parameters (os and arch) you can get the init script for different platforms
* GET /api/v2/workspaces/{workspace}/external-agent/{agent}/credentials - Gets credentials for an external agent **(enterprise)**
* Updated queries to filter workspaces/templates by the has_external_agent field
2025-08-19 10:41:33 +02:00
Dean Sheather 8f9f0cda11 chore: avoid DNS lookups for DERP in tests (#19385)
Closes https://github.com/coder/internal/issues/886
2025-08-18 03:44:37 +00:00
Callum Styan 6c902a7410 fix: don't create autostart workspace builds with no available provisioners (#19067)
This should fix https://github.com/coder/coder/issues/17941 by introducing a check for whether there are any valid (non-stale provisioners for a job in the autobuild executor code path.

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2025-08-15 08:50:51 -07:00
Dean Sheather a25d85631b chore: add usage tracking package (#19095)
Not used in coderd yet, see stack.

Adds two new packages:
- `coderd/usage`: provides an interface for the "Collector" as well as a stub implementation for AGPL
- `enterprise/coderd/usage`: provides an interface for the "Publisher" as well as a Tallyman implementation

Relates to https://github.com/coder/internal/issues/814
2025-08-16 01:31:00 +10:00
Ethan d7bdb3cdef ci: add paralleltestctx to lint/go (#19369)
Closes https://github.com/coder/internal/issues/884

We're adding this as a `go run` in `lint/go` for now, since adding it to
golangci-lint ourselves involves recompiling golangci-lint and then
running that new binary. I'll look into proposing it being added to the
public golangci-lint linters.

Doesn't appear to cause the lint ci job to take any longer, which is
nice.
2025-08-15 16:16:18 +10:00
Susana Ferreira 734299de71 fix: disallow lifecycle endpoints for prebuilt workspaces (#19264)
## Description

This PR updates the API to prevent lifecycle configuration endpoints
from being used on prebuilt workspaces. Since prebuilds are managed by
the reconciliation loop and do not participate in the regular workspace
lifecycle, they must not support per-workspace overrides for fields like
deadline, TTL, autostart, or dormancy.

Attempting to use these endpoints on a prebuilt workspace will now
return a clear validation error (`409 Conflict`) with an appropriate
explanation. This prevents accidental misconfiguration and preserves the
lifecycle separation between prebuilds and regular workspaces.

## Changes

The following endpoints now return an error if the target workspace is a
prebuild:

* `PUT /workspaces/{workspace}/extend`
* `PUT /workspaces/{workspace}/ttl`
* `PUT /workspaces/{workspace}/autostart`
* `PUT /workspaces/{workspace}/dormant`

Update endpoints logic to use the API clock in order to allow time
mocking in tests.

Related with: 
* Issue: https://github.com/coder/coder/issues/18898
* PR: https://github.com/coder/coder/pull/19252
2025-08-14 11:30:19 +01:00
Susana Ferreira 8567ecbe52 fix: set prebuilds lifecycle parameters on creation and claim (#19252)
## Description

This PR ensures that prebuilt workspaces are properly excluded from the
lifecycle executor and treated as a separate class of workspaces, fully
managed by the prebuild reconciliation loop.

It introduces two lifecycle guarantees:
* When a prebuilt workspace is created (i.e., when the workspace build
completes), all lifecycle-related fields are unset, ensuring the
workspace does not participate in TTL, autostop, autostart, dormancy, or
auto-deletion logic.
* When a prebuilt workspace is claimed, it transitions into a regular
user workspace. At this point, all lifecycle fields are correctly
populated according to template-level configurations, allowing the
workspace to be managed by the lifecycle executor as expected.

## Changes

* Prebuilt workspaces now have all lifecycle-relevant fields unset
during creation
* When a prebuild is claimed:
* Lifecycle fields are set based on template and workspace level
configurations. This ensures a clean transition into the standard
workspace lifecycle flow.
* Updated lifecycle-related SQL update queries to explicitly exclude
prebuilt workspaces.

## Relates 

Related issue: https://github.com/coder/coder/issues/18898

To reduce the scope of this PR and make the review process more
manageable, the original implementation has been split into the
following focused PRs:
* https://github.com/coder/coder/pull/19259
* https://github.com/coder/coder/pull/19263
* https://github.com/coder/coder/pull/19264
* https://github.com/coder/coder/pull/19265

These PRs should be considered in conjunction with this one to
understand the complete set of lifecycle separation changes for prebuilt
workspaces.
2025-08-13 12:45:46 +01:00
Rafael Rodriguez aab2ccdb38 fix!: support empty or default fields when updating templates (#19256)
Breaking change: Field types in `codersdk.UpdateTemplateMeta` for
`Icon`, `Description`, and `DisplayName` moved to `*string`

## Summary

In this pull request we're updating the `UpdateTemplateMeta` struct to
allow `DisplayName`, `Description`, and `Icon` to be set as empty `""`
or default to the value from the template if not provided in an update
call.

Fixes https://github.com/coder/coder/issues/19036

### The bug

The reported bug occurred when clients were attempting to update a
metadata field in a template via an edit call. When the request was
decoded into an `UpdateTemplateMeta` struct the default values for
fields in the struct were used to update the template even if they
weren't provided. This led to fields like `Icon` being set to `""` (the
default value).

### Changes

To allow for specific fields to be set to `""` these fields were updated
to be `*string` as opposed to `string`. This allows for clients to set
these fields as `""` in an update request or they will default to the
template value if they are not provided in the update request (will be
`nil`).

Added tests to confirm empty and nil values and updated other tests that
use these fields.
2025-08-12 11:37:44 -05:00
Cian Johnston 060fb57d28 chore(enterprise/coderd): ignore log errors in TestGetCryptoKeys/Unauthorized (#19282)
Fixes https://github.com/coder/internal/issues/797
2025-08-11 12:32:29 +01:00
Cian Johnston afb54f6884 chore: revert feat(enterprise/coderd): allow system users to be added to groups (#19254)
This reverts commit b200fc8e67
(https://github.com/coder/coder/pull/18341).
2025-08-08 12:18:07 +01:00
Sas Swart b200fc8e67 feat(enterprise/coderd): allow system users to be added to groups (#18341)
closes https://github.com/coder/coder/issues/18274

This pull request makes system users visible in various group related
queries so that they can be added to and removed from groups. This
allows system user quotas to be configured. System users are still
ignored in certain queries, such as when license seat consumption is
determined.

This pull request further ensures the existence of a
"coder_prebuilt_workspaces" group in any organization that needs
prebuilt workspaces

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
  * Organization and group member listings now include system users.
* **Bug Fixes**
* Updated tests to reflect the inclusion of system users in member and
group queries.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-08-08 11:03:17 +02:00
ケイラ 7bb52e1f8a test: add tests for updating workspace acl (#19240) 2025-08-07 17:09:46 -06:00
Steven Masley 8ba8b4f061 chore: add profiling labels for pprof analysis (#19232)
PProf labels segment the code into groups for determing the source of
cpu/memory profiles. Since the web server and background jobs share a
lot of the same code (eg wsbuilder), it helps to know if the load is
user induced, or background job based.
2025-08-07 11:21:17 -05:00
ケイラ 26458cd6f0 refactor: consolidate template and workspace acl validation (#19192) 2025-08-07 10:14:58 -06:00
Dean Sheather dc598856e3 chore: improve build deadline code (#19203)
- Adds/improves a lot of comments to make the autostop calculation code
clearer
- Changes the behavior of the enterprise template schedule store to
match the behavior of the workspace TTL endpoint when the new TTL is
zero
- Fixes a bug in the workspace TTL endpoint where it could unset the
build deadline, even though a max_deadline was specified
- Adds a new constraint to the workspace_builds table that enforces the
deadline is non-zero and below the max_deadline if it is set
- Adds CHECK constraint enum generation to scripts/dbgen, used for
testing the above constraint
- Adds Dean and Danielle as CODEOWNERS for the autostop calculation code
2025-08-07 11:00:31 +10:00
ケイラ 1cffd11619 feat: add workspace sharing page (#19107) 2025-07-31 15:05:09 +00:00
Dean Sheather 9a05b4679b chore: fix TestManagedAgentLimit flake (#19026)
Closes https://github.com/coder/internal/issues/812
2025-07-24 05:13:15 +00:00
Dean Sheather 0ebd4356a0 fix: use system context for managed agent count query (#18985) 2025-07-22 06:03:35 +00:00
Dean Sheather 9a6dd73f68 feat: add managed agent license limit checks (#18937)
- Adds a query for counting managed agent workspace builds between two
timestamps
- The "Actual" field in the feature entitlement for managed agents is
now populated with the value read from the database
- The wsbuilder package now validates AI agent usage against the limit
when a license is installed

Closes coder/internal#777
2025-07-22 13:39:26 +10:00
Steven Masley aedc019b4e feat: include template variables in dynamic parameter rendering (#18819)
Closes https://github.com/coder/coder/issues/18671

Template variables now loaded into dynamic parameters.
2025-07-21 13:02:31 -05:00
Cian Johnston 198d50dbc2 chore: replace original GetPrebuiltWorkspaces with optimized version (#18832)
Fixes https://github.com/coder/internal/issues/715

Follow-up from https://github.com/coder/coder/pull/18717

Now that we've determined the updated query is safe, remove the duplication.
2025-07-21 15:31:11 +01:00
Dean Sheather 183a6ebbdf chore: add managed_agent_limit licensing feature (#18876)
Note that enforcement and checking usage will come in a future PR.

This feature is implemented differently than existing features in a few
ways.

It's highly recommended that reviewers read:
- This document which outlines the methods we could've used for license
enforcement:
https://www.notion.so/coderhq/AI-Agent-License-Enforcement-21ed579be59280c088b9c1dc5e364ee8
- Phase 0 of the actual RFC document:
https://www.notion.so/coderhq/Usage-based-Billing-AI-b-210d579be592800eb257de7eecd2d26d

### Multiple features in the license, a single feature in codersdk

Firstly, the feature is represented as a single feature in the codersdk
world, but is represented with multiple features in the license.

E.g. in the license you may have:

    {
      "features": {
        "managed_agent_limit_soft": 100,
        "managed_agent_limit_hard": 200
      }
    }

But the entitlements endpoint will return a single feature:

    {
      "features": {
        "managed_agent_limit": {
          "limit": 200,
          "soft_limit": 100
        }
      }
    }

This is required because of our rigid parsing that uses a
`map[string]int64` for features in the license. To avoid requiring all
customers to upgrade to use new licenses, the decision was made to just
use two features and merge them into one. Older Coder deployments will
parse this feature (from new licenses) as two separate features, but
it's not a problem because they don't get used anywhere obviously.

The reason we want to differentiate between a "soft" and "hard" limit is
so we can show admins how much of the usage is "included" vs. how much
they can use before they get hard cut-off.

### Usage period features will be compared and trump based on license
issuance time

The second major difference to other features is that "usage period"
features such as `managed_agent_limit` will now be primarily compared by
the `iat` (issued at) claim of the license they come from. This differs
from previous features. The reason this was done was so we could reduce
limits with newer licenses, which the current comparison code does not
allow for.

This effectively means if you have two active licenses:
- `iat`: 2025-07-14, `managed_agent_limit_soft`: 100,
`managed_agent_limit_hard`: 200
- `iat`: 2025-07-15, `managed_agent_limit_soft`: 50,
`managed_agent_limit_hard`: 100

Then the resulting `managed_agent_limit` entitlement will come from the
second license, even though the values are smaller than another valid
license. The existing comparison code would prefer the first license
even though it was issued earlier.

### Usage period features will count usage between the start and end
dates of the license

Existing limit features, like the user limit, just measure the current
usage value of the feature. The active user count is a gauge that goes
up and down, whereas agent usage can only be incremented, so it doesn't
make sense to use a continually incrementing counter forever and ever
for managed agents.

For managed agent limit, we count the usage between `nbf` (not before)
and `exp` (expires at) of the license that the entitlement comes from.
In the example above, we'd use the issued at date and expiry of the
second license as this date range.

This essentially means, when you get a new license, the usage resets to
zero.

The actual usage counting code will be implemented in a follow-up PR.

### Managed agent limit has a default entitlement value

Temporarily (until further notice), we will be providing licenses with
`feature_set` set to `premium` a default limit.
- Soft limit: `800 * user_limit`
- Hard limit: `1000 * user_limit`

"Enterprise" licenses do not get any default limit and are not entitled
to use the feature.

Unlicensed customers (e.g. OSS) will be permitted to use the feature as
much as they want without limits. This will be implemented when the
counting code is implemented in a follow-up PR.

Closes https://github.com/coder/internal/issues/760
2025-07-17 20:19:01 +10:00
Ethan 7c077d39c5 chore: populate connectionlog count using a separate query (#18629)
This is the third PR for moving connection events out of the audit log.

This PR populates `count` on `ConnectionLogResponse` using a separate query, to preemptively mitigate the issue described in #17689. It's structurally identical to a portion of https://github.com/coder/coder/pull/18600, but for the connection log instead of the audit log.
       
Future PRs:
- Implement a table in the Web UI for viewing connection logs.
- Write a query to delete old events from the audit log, call it from dbpurge.
- Write documentation for the endpoint / feature
2025-07-15 15:03:30 +10:00
Ethan 7a339a1ffe feat: add connectionlogs API (#18628)
This is the second PR for moving connection events out of the audit log.

This PR:
- Adds the `/api/v2/connectionlog` endpoint
- Adds filtering for `GetAuthorizedConnectionLogsOffset` and thus the endpoint. 
There's quite a few, but I was aiming for feature parity with the audit log.
  1. `organization:<id|name>`
  2. `workspace_owner:<username>`
  3. `workspace_owner_email:<email>`
  4. `type:<ssh|vscode|jetbrains|reconnecting_pty|workspace_app|port_forwarding>`
  5. `username:<username>` 
     - Only includes web-based connection events (workspace apps, web port forwarding) as only those include user metadata.
  6. `user_email:<email>`
  7. `connected_after:<time>`
  8. `connected_before:<time>`
  9. `workspace_id:<id>`
  10. `connection_id:<id>`
      - If you have one snapshot of the connection log, and some sessions are ongoing in that snapshot, you could use this filter to check if they've been closed since.
  11. `status:<connected|disconnected>`
       - If `connected` only sessions with a null `close_time` are returned, if `disconnected`, only those with a non-null `close_time`. If filter is omitted, both are returned.
       
Future PRs:
- Populate `count` on `ConnectionLogResponse` using a seperate query (to preemptively mitigate the issue described in #17689)
- Implement a table in the Web UI for viewing connection logs.
- Write a query to delete old events from the audit log, call it from dbpurge.
- Write documentation for the endpoint / feature (including these filters)
2025-07-15 14:55:34 +10:00
Ethan 08e17a07fc chore!: route connection logs to new table (#18340)
### Breaking Change (changelog note):
> User connections to workspaces, and the opening of workspace apps or ports will no longer create entries in the audit log. Those events will now be included in the 'Connection Log'.
Please see the 'Connection Log' page in the dashboard, and the Connection Log [documentation](https://coder.com/docs/admin/monitoring/connection-logs) for details. Those with permission to view the Audit Log will also be able to view the Connection Log. The new Connection Log has the same licensing restrictions as the Audit Log, and requires a Premium Coder deployment.

### Context

This is the first PR of a few for moving connection events out of the audit log, and into a new database table and web UI page called the 'Connection Log'.

This PR:
- Creates the new table
- Adds and tests queries for inserting and reading, including reading with an RBAC filter.
- Implements the corresponding RBAC changes, such that anyone who can view the audit log can read from the table
- Implements, under the enterprise package, a `ConnectionLogger` abstraction to replace the `Auditor` abstraction for these logs. (No-op'd in AGPL, like the `Auditor`)
- Routes SSH connection and Workspace App events into the new `ConnectionLogger`
- Updates all existing tests to check the values of the `ConnectionLogger` instead of the `Auditor`.

Future PRs:
- Add filtering to the query
- Add an enterprise endpoint to query the new table
- Write a query to delete old events from the audit log, call it from dbpurge.
- Implement a table in the Web UI for viewing connection logs.


> [!NOTE]
> The PRs in this stack obviously won't be (completely) atomic. Whilst they'll each pass CI, the stack is designed to be merged all at once. I'm splitting them up for the sake of those reviewing, and so changes can be reviewed as early as possible.  Despite this, it's really hard to make this PR any smaller than it already is. I'll be keeping it in draft until it's actually ready to merge.
2025-07-15 14:36:06 +10:00
Steven Masley 00ba0278d2 chore: modify parameter dynamic immutability behavior (#18583)
Immutability behavior is determined by the current build, not affected by the previous
2025-07-09 08:45:24 -06:00
Cian Johnston 0367dbac43 chore: optimize GetPrebuiltWorkspaces query (#18717)
* Adds GetRunningPrebuiltWorkspacesOptimized query
* Runs both original and updated query side-by-side and logs diffs
2025-07-09 11:30:42 +01:00
Hugo Dutka 3c2f3d640b chore: remove dbmem (#18803)
Remove the in-memory database. Addresses #15109.
2025-07-09 09:46:31 +02:00
Steven Masley 1319ae293f chore: support zip filetypes in the file cache (#18750) 2025-07-08 15:46:39 -06:00
Hugo Dutka b65e133a17 chore(enterprise/coderd): remove dbmem from tests (#18797)
Related to https://github.com/coder/coder/issues/15109.
2025-07-09 01:16:46 +10:00
Susana Ferreira 211393a69c fix: exclude prebuilt workspaces from lifecycle executor (#18762)
## Description

This PR updates the lifecycle executor to explicitly exclude prebuilt
workspaces from being considered for lifecycle operations such as
`autostart`, `autostop`, `dormancy`, `default TTL` and `failure TTL`.

Prebuilt workspaces (i.e., those owned by the prebuild system user) are
handled separately by the prebuild reconciliation loop. Including them
in the lifecycle executor could lead to unintended behavior such as
incorrect scheduling or state transitions.

## Changes

* Updated the lifecycle executor query
`GetWorkspacesEligibleForTransition` to exclude workspaces with
`owner_id = 'c42fdf75-3097-471c-8c33-fb52454d81c0'` (prebuilds).
* Added tests to verify prebuilt workspaces are not considered in:
  * Autostop
  * Autostart
  * Default TTL
  * Dormancy
  * Failure TTL

Fixes: https://github.com/coder/coder/issues/18740
Related to: https://github.com/coder/coder/issues/18658
2025-07-08 11:35:28 +01:00
ケイラ f2983164f5 chore: fix some small groups and acl typos (#18732)
- Add `format:"uri"` to `Group.AvatarURL` (matches `User.AvatarURL`
field)
- `<user_id>` and `<group_id>` were backwards in the `example:` tags
- The `@Success` annotation for `/acl [get]` had an incorrect type
2025-07-07 11:01:17 -06:00
Steven Masley a099a8a25c feat: use preview to compute workspace tags from terraform (#18720)
If using dynamic parameters, workspace tags are extracted using
`coder/preview`.
2025-07-03 14:35:44 -05:00
Sas Swart 01163ea57b feat: allow users to pause prebuilt workspace reconciliation (#18700)
This PR provides two commands:
* `coder prebuilds pause`
* `coder prebuilds resume`

These allow the suspension of all prebuilds activity, intended for use
if prebuilds are misbehaving.
2025-07-02 15:05:42 +00:00