The provisioner state for a workspace build was being loaded for every
long-lived agent rpc connection. Since this state can be anywhere from
kilobytes to megabytes this can gradually cause the `coderd` memory
footprint to grow over time. It's also a lot of unnecessary allocations
for every query that fetches a workspace build since only a few callers
ever actually reference the provisioner state.
This PR removes it from the returned workspace build and adds a query to
fetch the provisioner state explicitly.
feat: add boundary usage telemetry database schema and RBAC
Adds the foundation for tracking boundary usage telemetry across Coder
replicas. This includes:
- Database schema: `boundary_usage_stats` table with per-replica stats
(unique workspaces, unique users, allowed/denied request counts)
- Database queries: upsert stats, get aggregated summary, reset stats,
delete by replica ID
- RBAC: `boundary_usage` resource type with read/update/delete actions,
accessible only via system `BoundaryUsageTracker` subject (not regular
user roles)
- Tracker skeleton + docs: stub implementation in `coderd/boundaryusage/`
The tracker accumulates stats in memory and periodically flushes to the
database. Stats are aggregated across replicas for telemetry reporting,
then reset when a new reporting period begins. The tracker implementation
and plumbing will be done in a subsequent commit/PR.
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Related to
[`internal#1139`](https://github.com/coder/internal/issues/1139)
Continuation of #21074
This implements some RBAC role specificity for `dbpurge`, ensuring that
we follow the least-privileged model for removing data from the
database. It is specified as following.
```go
Site: rbac.Permissions(map[string][]policy.Action{
// DeleteOldWorkspaceAgentLogs
// DeleteOldWorkspaceAgentStats
// DeleteOldProvisionerDaemons
// DeleteOldTelemetryLocks
// DeleteOldAuditLogConnectionEvents
// DeleteOldConnectionLogs
rbac.ResourceSystem.Type: {policy.ActionDelete},
// DeleteOldNotificationMessages
rbac.ResourceNotificationMessage.Type: {policy.ActionDelete},
// ExpirePrebuildsAPIKeys
// DeleteExpiredAPIKeys
rbac.ResourceApiKey.Type: {policy.ActionDelete},
// DeleteOldAIBridgeRecords
rbac.ResourceAibridgeInterception.Type: {policy.ActionDelete},
}),
```
| Position | Pull-request |
| -------- | ------------ |
| | [feat: add prometheus observability metrics for
`dbpurge`](https://github.com/coder/coder/pull/21074) |
| ✅ | [feat: add rbac specificity for
`dbpurge`](https://github.com/coder/coder/pull/21088) |
Not used in coderd yet, see stack.
Adds two new packages:
- `coderd/usage`: provides an interface for the "Collector" as well as a stub implementation for AGPL
- `enterprise/coderd/usage`: provides an interface for the "Publisher" as well as a Tallyman implementation
Relates to https://github.com/coder/internal/issues/814
### Breaking Change (changelog note):
> User connections to workspaces, and the opening of workspace apps or ports will no longer create entries in the audit log. Those events will now be included in the 'Connection Log'.
Please see the 'Connection Log' page in the dashboard, and the Connection Log [documentation](https://coder.com/docs/admin/monitoring/connection-logs) for details. Those with permission to view the Audit Log will also be able to view the Connection Log. The new Connection Log has the same licensing restrictions as the Audit Log, and requires a Premium Coder deployment.
### Context
This is the first PR of a few for moving connection events out of the audit log, and into a new database table and web UI page called the 'Connection Log'.
This PR:
- Creates the new table
- Adds and tests queries for inserting and reading, including reading with an RBAC filter.
- Implements the corresponding RBAC changes, such that anyone who can view the audit log can read from the table
- Implements, under the enterprise package, a `ConnectionLogger` abstraction to replace the `Auditor` abstraction for these logs. (No-op'd in AGPL, like the `Auditor`)
- Routes SSH connection and Workspace App events into the new `ConnectionLogger`
- Updates all existing tests to check the values of the `ConnectionLogger` instead of the `Auditor`.
Future PRs:
- Add filtering to the query
- Add an enterprise endpoint to query the new table
- Write a query to delete old events from the audit log, call it from dbpurge.
- Implement a table in the Web UI for viewing connection logs.
> [!NOTE]
> The PRs in this stack obviously won't be (completely) atomic. Whilst they'll each pass CI, the stack is designed to be merged all at once. I'm splitting them up for the sake of those reviewing, and so changes can be reviewed as early as possible. Despite this, it's really hard to make this PR any smaller than it already is. I'll be keeping it in draft until it's actually ready to merge.
The file cache was caching the `Unauthorized` errors if a user without
the right perms opened the file first. So all future opens would fail.
Now the cache always opens with a subject that can read files. And authz
is checked on the Acquire per user.
Closes https://github.com/coder/internal/issues/619
Implement the `coderd` side of the AgentAPI for the upcoming
dev-container agents work.
`agent/agenttest/client.go` is left unimplemented for a future PR
working to implement the agent side of this feature.
Opting into rego v1. Rego v1 requires `if` for all rule statements.
This PR updates the dependencies and the rego policy itself.
Golang imports upgraded for opa/rego
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* chore: authz 'any_org' to return if at least 1 org has perms
Allows checking if a user can do an action in any organization,
rather than a specific one. Allows asking general questions on the
UI to determine which elements to show.
* more strict, add comments to policy
* add unit tests and extend to /authcheck api
* make field optional
* chore: refactor user subject logic to be in 1 place
* test: implement test to assert deleted custom roles are omitted
* add unit test for deleted role
* chore: create type for unique role names
Using `string` was confusing when something should be combined with
org context, and when not to. Naming this new name, "RoleIdentifier"
Removes our pseudo rbac resources like `WorkspaceApplicationConnect` in favor of additional verbs like `ssh`. This is to make more intuitive permissions for building custom roles.
The source of truth is now `policy.go`
Just moved `rbac.Action` -> `policy.Action`. This is for the stacked PR to not have circular dependencies when doing autogen. Without this, the autogen can produce broken golang code, which prevents the autogen from compiling.
So just avoiding circular dependencies. Doing this in it's own PR to reduce LoC diffs in the primary PR, since this has 0 functional changes.
* chore: merge authorization contexts
Instead of 2 auth contexts from apikey and dbauthz, merge them to
just use dbauthz. It is annoying to have two.
* fixup authorization reference
#7439 added global caching of RBAC results.
Calls are cached based on hash(subject, object, action).
We often use dbauthz.AsSystemRestricted to handle "internal" authz calls, and these are often repeated with similar arguments and are likely to get cached.
So a transient error doing an authz check on a system function will be cached for up to a minute.
I'm just starting off with excluding context.Canceled but there's likely a whole suite of different errors we want to also exclude from the global cache.
* chore: add /v2 to import module path
go mod requires semantic versioning with versions greater than 1.x
This was a mechanical update by running:
```
go install github.com/marwan-at-work/mod/cmd/mod@latest
mod upgrade
```
Migrate generated files to import /v2
* Fix gen
* test: Add benchmark for static rbac roles
* static roles should only be allocated once
* A unit test that modifies the ast value should not mess with the globals
* Cache subject AST values to avoid reallocating slices
* chore: Optimize rego policy evaluation allocations
Manually convert to ast.Value instead of using generic
json.Marshal conversion.
* Add a unit test that prevents regressions of rego input
The optimized input is always compared to the normal json
marshal parser.
* chore: Allow RecordingAuthorizer to record multiple rbac authz calls
Prior iteration only recorded the last call. This is required for
more comprehensive testing
* chore: Implement standard rbac.Subject to be reused everywhere
An rbac subject is created in multiple spots because of the way we
expand roles, scopes, etc. This difference in use creates a list
of arguments which is unwieldy.
Use of the expander interface lets us conform to a single subject
in every case
* chore: Authz should support non-named roles
Named roles are a construct for users to assign/interact with roles.
For authzlayer implementation, we need to create "system" users.
To enforce strict security, we are making specific roles with
the exact required permissions for the system action.
These new roles should not be available to the user. There is a
clear code divide with this implementation that allows a RoleNames
implemenation for users to user, and system users can create their
own implementation
* feat: Implement allow_list for scopes for resource specific permissions
Feature that adds an allow_list for scopes to specify particular resources.
This enables workspace agent tokens to use the same RBAC system as users.
- Add ID to compileSQL matchers
* Plumb through WithID on rbac objects
* Rename Scope -> ScopeName
* Update input.json with scope allow_list
Co-authored-by: Cian Johnston <cian@coder.com>
* chore: More complete tracing for RBAC functions
* Add input.json as example rbac input for rego cli
The input.json is required to play with the rego cli and debug
the policy without golang. It is good to have an example to run
the commands in the readme.md
* Add span events to capture authorize and prepared results
* chore: Add prometheus metrics to rbac authorizer
* chore: Rewrite rbac rego -> SQL clause
Previous code was challenging to read with edge cases
- bug: OrgAdmin could not make new groups
- Also refactor some function names
* feat: Convert rego queries into SQL clauses
* Fix postgres quotes to single quotes
* Ensure all test cases can compile into SQL clauses
* Do not export extra types
* Add custom query with rbac filter
* First draft of a custom authorized db call
* Add comments + tests
* Support better regex style matching for variables
* Handle jsonb arrays
* Remove auth call on workspaces
* Fix PG endpoints test
* Match psql implementation
* Add some comments
* Remove unused argument
* Add query name for tracking
* Handle nested types
This solves it without proper types in our AST.
Might bite the bullet and implement some better types
* Add comment
* Renaming function call to GetAuthorizedWorkspaces