Commit Graph

3372 Commits

Author SHA1 Message Date
Danny Kopping fc9bff7107 feat: add aibridged package (#19797)
Addresses https://github.com/coder/internal/issues/987
2025-09-25 15:40:25 +02:00
Spike Curtis 289f0217c7 feat: add scaletest Runner for dynamicparameters load gen (#19890)
relates to https://github.com/coder/internal/issues/912

Adds a new scaletest Runner to generate dynamic parameters load.

A later PR will add the CLI command, including creating the template & version.
2025-09-25 16:18:37 +04:00
Mathias Fredriksson 5317d309d0 feat(coderd): add experimental tasks send endpoint (#19941)
Fixes coder/internal#902
2025-09-25 12:12:00 +00:00
Danny Kopping 615585d5d1 feat: add aibridgedserver pkg (#19902) 2025-09-25 13:32:16 +02:00
ケイラ a6fc28cc6f chore: bring back x-auth-checks with a length limit (#19928) 2025-09-24 10:46:50 -06:00
Thomas Kosiewski adb7521066 feat: generate RBAC scope name constants (#19896)
# Generate RBAC scope name constants

This PR adds a new generated file `coderd/rbac/scopes_constants_gen.go` that contains typed constants for all RBAC scope names in the format `Scope<Resource><Action>`. For example, `ScopeWorkspaceRead` for the scope "workspace:read".

These constants make it easier to reference specific scopes in code without using string literals, improving type safety and making refactoring easier.

The PR:
- Adds a new template file `scripts/typegen/scopenames.gotmpl`
- Updates the typegen script to support generating scope name constants
- Updates the Makefile to include the new generated file in build targets
2025-09-24 18:40:36 +02:00
Dean Sheather 42dd544d90 fix: use unique cookies for workspace proxies (#19930)
There is currently an issue with subdomain workspace apps on workspace
proxies, where if you have a workspace proxy wildcard nested beneath the
primary wildcard, cookies from the primary may be sent to the server
before cookies from the proxy specifically.

Currently:
1. Use a subdomain app via the primary proxy `*.coder.corp.com`
    a. Client sends no cookies
    a. Server does token smuggling flow
a. Server sets a cookie `coder_subdomain_app_session_token` on
`*.coder.corp.com`
    a. Server redirects client to reload the page
    a. Request should succeed as usual
1. Wait until the primary proxy's session token cookie has expired in
the database (or make it invalid yourself)
1. Use a subdomain app via a separate proxy `*.sydney.coder.corp.com`
a. Client sends `coder_subdomain_app_session_token` cookie from
`*.coder.corp.com`
    a. Server validates supplied cookie, it fails because it's expired
    a. Server does token smuggling flow
a. Server sets a cookie `coder_subdomain_app_session_token` on
`*.sydney.coder.corp.com`
    a. Server redirects client to reload page
    a. Client sends BOTH cookies.
a. The server will only process the first cookie it receives, so if the
expired cookie for the primary proxy is sent first the request will end
up in a permanent loop on step b.

The fix is to append `_{hash(wildcard_access_url)}` to the subdomain
cookies as we cannot control browser behavior further. This avoids the
conflict as each proxy will only read it's specific cookie.
2025-09-25 00:30:02 +10:00
Cian Johnston 4a56a4094b chore: apptest: update test helpers to context-aware testutil.Eventually (#19918)
Relates to https://github.com/coder/internal/issues/960
2025-09-23 09:16:33 +01:00
Thomas Kosiewski fb0ce389a6 feat: implement API key scopes database migration (#19861)
Added database migration for API key scopes.

Fixes #19845
2025-09-22 19:26:51 +02:00
Spike Curtis a30c30724b chore: refactor cli and coderd to use ClientOptions (#19763)
Refactors CLI and coderd to use the ClientBuilder pattern rather than directly instantiating the Client.
2025-09-22 21:02:56 +04:00
Thomas Kosiewski 6fb4cc6b82 fix: add retry logic to OAuth2 metadata tests to avoid race conditions (#19813)
This PR adds a readiness wait to OAuth2 metadata endpoint tests to avoid rare races with server startup. Instead of immediately making HTTP requests, the tests now use `testutil.Eventually` to retry the requests until they succeed, with a short interval between attempts. This helps prevent flaky tests that might fail due to timing issues during server initialization.  
  
Fixes: https://github.com/coder/internal/issues/996
2025-09-22 18:30:36 +02:00
Spike Curtis e2f5401fb2 test: add test database cleaner in subprocess (#19844)
fixes https://github.com/coder/internal/issues/927

Adds a small subprocess that outlives the testing process to clean up any leaked test databases.
2025-09-22 15:27:06 +04:00
Spike Curtis 596fdcba81 test: refactor dbtestutil to record database creation (#19843)
relates to https://github.com/coder/internal/issues/927

Refactors dbtestutil to use a `Broker` struct to create test databases. Additionally uses a `coder_testing` database to record test databases that are created and when they are dropped.

This is in preparation for the PR above this in the stack which adds a "cleaner" subprocess that cleans out any databases that were left when the test process ends.
2025-09-22 15:13:26 +04:00
Brett Kolodny 38ca98745b feat: add shared_with_group: and shared_with_user: filters to /workspaces endpoint (#19875)
Adds shared_with_user and shared_with_group filters to the /workspaces
endpoint.

- `shared_with_user`: filters workspaces shared with a specific user.
Accepts a user UUID or username.
- `shared_with_group`: filters workspaces shared with a specific group.
Accepts:
  - a group UUID, or
  - `<organization name>/<group name>`, or
  - `<group name>` (resolved in the default organization).


Closes
[coder/internal#1004](https://github.com/coder/internal/issues/1004)
2025-09-19 16:05:27 -04:00
Paweł Banaszewski 439b041780 feat: add best effort attempt to revoke oauth access token in external auth provider (#19775)
Solves #15575
Adds OAuth access token revocation when unlinking external auth
provider. Due to revocation not being consistently implemented by
providers this is only best effort attempt. Unsuccessful revocation
won't influence link removal.
2025-09-19 16:27:02 +02:00
Steven Masley 679179f404 feat: scope allow_list to include resource_type (#19748)
This feature allows the `allow_list` in the scopes to specify the `type`
2025-09-17 08:32:14 -05:00
Steven Masley 3df9d8e902 test: set test flags from within an init to limit maximum test parallelism (#19575) 2025-09-17 08:24:19 -05:00
Ethan 356604eca6 chore(coderd/notifications): avoid generating warning logs for trivial enqueue failures (#19840)
I noticed during a scaletest that many warning logs were being generated when enqueuing notifications. The error was:
```
failed to notify of workspace creation: notification is not enabled
```
I don't think we should be warning if automated notifications fail to send to users because they have them disabled.

To fix, we'll stop returning these errors.
2025-09-17 18:17:18 +10:00
Danny Kopping 422bba44d9 chore: add aibridge database resources & define RBAC policies (#19796)
Closes https://github.com/coder/internal/issues/986
2025-09-16 21:31:17 +02:00
Danny Kopping 348a2e0285 feat: add configs for external auth MCP usage + tool allow/denylist (#19794)
Closes https://github.com/coder/internal/issues/988

The logic for allowing/denying tools can be found in https://github.com/coder/aibridge/pull/4/files#diff-330a6371a583dd8cadeed79b95499e3a87960ad8ea4d6a94061e8f88a44834c3 (`ProxyBase.filterAllowedTools`).
2025-09-16 20:31:29 +02:00
Spike Curtis 655a36c392 test: fix TestAgentConnectionMonitor_PingTimeout race with mock assertions (#19836)
Fixes https://github.com/coder/internal/issues/970

The test doesn't wait for `monitor()` to complete, and the mock database call that we assert takes place in a `defer` within `monitor()`. This allows the mock assertions to race with the defer and flake the test.

Solution is to explicitly wait for `monitor()` to complete before the end of the test, so that mock assertions (which happen in a `t.Cleanup()`) don't race.
2025-09-16 21:54:50 +04:00
Brett Kolodny e6b04d1918 feat: add shared filter to workspaces query (#19807)
Adds a `shared:<boolean>` search query to the `/workspaces [get]`
endpoint


https://github.com/user-attachments/assets/ccf84bd9-c1fd-4085-825b-2e3176a2d488

Closes
[coder/internal#972](https://github.com/coder/internal/issues/972)
2025-09-16 12:37:39 -04:00
Danny Kopping 8487216548 chore: add aibridge configs & experiment (#19793) 2025-09-16 11:45:23 +02:00
Ethan 995b330250 test: avoid sharing deployment values between subtests (#19833)
Blink didn't figure out a CI failure on main was caused by a data race; fixing it.


I've also updated the [blink prompt](https://gist.githubusercontent.com/ethanndickson/8dea9f1db3957ac1baf30ae8ce6f1a42/raw/060aea7fabb82bef0029a17dad9a5daee7940760/blink-flake-instructions.md).

https://github.com/coder/coder/actions/runs/17737809615
2025-09-16 13:51:26 +10:00
Thomas Kosiewski d238480c7a fix: trim whitespace from API tokens (#19814) 2025-09-15 10:02:10 +02:00
Ethan 6a9b896f5b fix!: use client ip when creating connection logs for workspace proxied app accesses (#19788)
Breaking API Change: 
> The presence of the `ip` field on `codersdk.ConnectionLog` cannot be
guaranteed, and so the field has been made optional. It may be omitted
on API responses.

When running a scaletest, I noticed logs of the form:
```
2025-09-12 06:34:10.924 [erro]  coderd.workspaceapps: upsert connection log failed  trace=0xa17580  span=0xa17620  workspace_id=81b937d7-5777-4df5-b5cb-80241f30326f  agent_id=78b2ff6d-b4a6-4a4e-88a7-283e05455a88  app_id=00000000-0000-0000-0000-000000000000  user_id=00000000-0000-0000-0000-000000000000  user_agent=""  app_slug_or_port=terminal  status_code=404  request_id=67f03cf8-9523-444a-97bc-90de080a54c8 ...
    error= 1 error occurred:
           	* pq: null value in column "ip" of relation "connection_logs" violates not-null constraint
```

to ensure logs are never omitted from the connection log due to a
missing IP again (i.e. I'm not sure if we can always rely on a valid,
parseable, IP from `(http.Request).RemoteAddr`), I've removed the `NOT
NULL` constraint on `ip` on `connection_logs`, and made `ip` on the API
response optional.


The specific cause for these null IPs was the
`/workspaceproxies/me/issue-signed-app-token [post]` endpoint
constructing it's own `http.Request` without a `RemoteAddr` set, and
then passing that to the token issuer.

To solve this, we'll have workspace proxies send the real IP of the
client when calling `/workspaceproxies/me/issue-signed-app-token [post]`
via the header `Coder-Workspace-Proxy-Real-IP`.
2025-09-15 12:30:17 +10:00
Thomas Kosiewski 088d14933c feat: ensure OAuth2 refresh tokens outlive access tokens (#19769) 2025-09-13 08:57:26 +02:00
Brett Kolodny 854f3c0187 feat: add workspaces/acl [delete] endpoint (#19772)
Closes
[coder/internal#971](https://github.com/coder/internal/issues/971)
2025-09-12 12:21:01 -04:00
Asher 6d39077087 chore: log error when checking if codersdk.Err (#19784) 2025-09-11 13:17:09 -08:00
Brett Kolodny 8d5c566a98 feat: add sharing remove command to the CLI (#19767)
Closes
[coder/internal#861](https://github.com/coder/internal/issues/861)
2025-09-11 16:22:25 -04:00
Susana Ferreira eec6c8c120 feat: support custom notifications (#19751)
## Description

Adds support for sending an ad‑hoc custom notification to the
authenticated user via API and CLI. This is useful for surfacing the
result of scripts or long‑running tasks. Notifications are delivered
through the configured method and the dashboard Inbox, respecting
existing preferences and delivery settings.

## Changes

* New notification template: “Custom Notification” with a label for a
custom title and a custom message.
* New API endpoint: `POST /api/v2/notifications/custom` to send a custom
notification to the requesting user.
* New API endpoint: `GET /notifications/templates/custom` to get custom
notification template.
* New CLI subcommand: `coder notifications custom <title> <message>` to
send a custom notification to the requesting user.
* Documentation updates: Add a “Custom notifications” section under
Administration > Monitoring > Notifications, including instructions on
sending custom notifications and examples of when to use them.

Closes: https://github.com/coder/coder/issues/19611
2025-09-11 15:08:57 +02:00
Rafael Rodriguez e53bc247e9 feat: add tooltip field to workspace app that renders as markdown (#19651)
In this pull request we're adding an optional `tooltip` field. The
`tooltip` field is a string field (with markdown support) that will be
used to display tooltips on hover over app buttons in a workspace
dashboard.

Tooltip screenshot

<img width="816" height="275" alt="Screenshot 2025-08-29 at 4 11 56 PM"
src="https://github.com/user-attachments/assets/52c736a1-f632-465b-89a0-35ca99bd367b"
/>

Tooltip video


https://github.com/user-attachments/assets/21806337-accc-4acf-b8c6-450c031d98f1

Issue: https://github.com/coder/coder/issues/18431
Related provider PR:
https://github.com/coder/terraform-provider-coder/pull/435

### Changes

- Added migration to add `tooltip` column to `workspace_apps` table
- Updated queries to get/set the new `tooltip` column
- Updated frontend to render tooltip as markdown (primary tool tip takes
precedence over template tooltip)

### Testing

- Added storybook test for `Applink` markdown rendering
2025-09-10 11:01:54 -05:00
Danielle Maywood 4cd0ada0bb chore(coderd/database): add tasks data model (#19749)
Part of https://github.com/coder/internal/issues/948

Adds the initial part of the tasks data model as per the RFC.
2025-09-10 10:01:50 +01:00
Asher 4bf63b4068 feat: add coder_workspace_read_file MCP tool (#19562)
Follows similarly to the bash tool (and some code to connect to an agent
was extracted from it).

There are two main parts: a new agent endpoint, and then a new MCP tool
that consumes that endpoint.
2025-09-09 15:12:24 -08:00
Steven Masley d527f91f47 chore: update rego policy to respect user and organisation scopes (#19741)
Prior to this change, user and org scopes were always rejected
2025-09-09 05:50:08 -05:00
Danielle Maywood f3b152bb06 feat(coderd): allow specifying a name for a task (#19745)
Relates to https://github.com/coder/internal/issues/955

Add the ability to the tasks endpoint to give a task a custom name.
2025-09-09 11:08:28 +01:00
Kacper Sawicki b4e4173347 test: improve workspace build job completion logging (#19740)
Closes https://github.com/coder/internal/issues/935

This PR enhances the AwaitWorkspaceBuildJobCompleted func in coderdtest
pkg to provide better visibility into test failures and debugging
information.
2025-09-09 08:38:32 +02:00
Rafael Rodriguez 1677a30a1d fix: add support for spaces in search & enable searching by display name in templates (#19552)
## Summary

In this pull request we're updating search to support queries with
spaces in addition to the `field:value` pattern that is currently
supported.

Additionally templates search now defaults to `display_name` (since
`display_name` is optional the search will fallback to `name`) when
searching without the `field:value` pattern

Closes: https://github.com/coder/coder/issues/14384

### Downsides with searching on `name` and `display_name`

Because the `name` field cannot include spaces, we end up in a situation
where including a space in the query will result in no results since the
query searches on both `name` AND `display_name`. In the following
example, we can see the results of searching by both `name` and
`display_name` on these templates:

| Name | Display Name |
| ------ | ------------- |
| docker | Docker Template |
| faketemplate | A Fake Template |
| azure | Fake Azure Template |
| anotherfake | Another Fake Template |
| azurefake | Another Fake Fake Azure Template |



https://github.com/user-attachments/assets/b0e0793e-e77d-46bc-9a42-d7cf4f8bd910

### Proposal: Search on `display_name` by default and allow for `name`
using the `field:value` pattern

If we remove `name` from the default template search, we're now able to
search with spaces on template `display_names`. Since `display_names`
are what users see in the templates list they might expect the search to
work this way.

Below is an example of `name` being removed from the default template
search.


https://github.com/user-attachments/assets/9aba5911-4960-4384-befb-08ea1acaa3ab

With this approach users would still be able to search on template names
by specifying `exact_name:foo`.

### Testing

Added additional test cases to ensure spaces were handled as expected in
combination with `field:value` patterns.
2025-09-08 17:13:27 -05:00
Kacper Sawicki 776231d025 fix(coderd): add blocking GetProvisionerJobByIDWithLock for workspace build cancellation (#19737)
Closes https://github.com/coder/internal/issues/885

Adds a new database method GetProvisionerJobByIDWithLock that uses FOR
UPDATE without SKIP LOCKED to fix workspace build cancellation returning
500 errors when jobs are locked.
2025-09-08 15:40:14 +02:00
Thomas Kosiewski 2701d5588e fix: support path parameters in OAuth2 metadata endpoints (#19729)
Update OAuth2 metadata endpoint routes to support path suffixes

This PR updates the OAuth2 metadata endpoint routes to include a wildcard character (*) at the end of the paths. This change allows the endpoints to match requests with path suffixes, making our OAuth2 discovery implementation more flexible and compliant with the relevant RFCs.

The updated routes are:
- `/.well-known/oauth-authorization-server*` for RFC 8414 discovery
- `/.well-known/oauth-protected-resource*` for RFC 9728 discovery
2025-09-08 14:21:57 +02:00
Mathias Fredriksson 21402c7aaa perf(coderd/database): optimize GetAPIKeysLastUsedAfter (#19725)
This change adds a support-index for `GetAPIKeysLastUsedAfter`.

On dogfood (24h): called 4.4k times, 380ms average.

Change for tested time range: `170ms` -> `3.3ms`.
2025-09-08 13:13:12 +03:00
Mathias Fredriksson fb9753edf2 fix(coderd/database): rename duplicate migration (#19724) 2025-09-08 08:54:12 +00:00
Mathias Fredriksson 9db265d508 fix(coderd/database): optimize provisioner daemon with status query using index (#19703)
Fixes coder/internal#724

This change adds an index to optimize the
`GetProvisionerDaemonsWithStatusByOrganization` query.

Execution time dropped from `18s 838ms` to `107ms`.
2025-09-08 11:10:53 +03:00
Ethan dae19039d7 test: fix TestCache_DeploymentStats flake (#19683)
Closes https://github.com/coder/internal/issues/961
Likely the same deal as in #19599, the body of `require.Eventually` now fires immediately, when it used to fire after 250ms (the interval). Presumably, the deployment stats become ready before the vs code session count gets incremented. This was never an issue with the 250ms delay, as this flake has only cropped up after the testify version bump.

We'll fix the issue by making it possible to wait for a full metrics cache refresh, i.e. removing `require.Eventually` in this test altogether.
2025-09-08 12:07:38 +10:00
Danielle Maywood e12b621ff0 fix(coderd): ensure agent WebSocket conn is cleaned up (#19711)
When clients disconnected from the /containers/watch endpoint, the WebSocket 
connection between coderd and the agent stayed open. This caused heartbeat 
traffic every 15s that was incorrectly counted as workspace activity, 
extending workspace lifetimes indefinitely.

Now properly cancels the agent connection context when the client disconnects.
2025-09-05 14:26:46 +01:00
Callum Styan 0ec9df390b fix: reduce impact of GetPrebuildMetrics on database (#19694)
see https://github.com/coder/internal/issues/959 but the tl; dr is:
- we call this DB query on an interval (every 15s) and it would be
called on each coderd replica as well
- the generated values update very infrequently (for our most used
internal template I saw the builds created/claimed update twice in a 1h
period)
- we have no index on the initiator ID, so this query has to scan the
entire workspace_builds table on every request

In reality this should likely just be a Prometheus metric, and
Prometheus can handle the counter reset behaviour at query time, but for
now this should at least cut the load of the query to 25% of it's
current impact.

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2025-09-04 13:43:50 -07:00
Ethan 50704a5014 ci: improve 'tfail in goroutine' ruleguard rule (#19682)
This PR improves the ruleguard rule for detecting `t.Fail` calls in goroutines. It picks up additional violations, of which are fixed in this PR.
See self-review for details.

The motivation for fixing this comes from a flake I fixed in https://github.com/coder/coder/pull/19599, where tests would fail from a `require` in an `Eventually`.
2025-09-04 14:28:29 +10:00
Ethan 1b4ce0909c fix: pin pg_dump version when generating schema (#19696)
The latest release of all `pg_dump` major versions, going back to 13,
started inserting `\restrict` `\unrestrict` keywords into dumps. This
currently breaks sqlc in `gen/dump` and our check migration script. Full
details of the postgres change are available here:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=575f54d4c

To fix, we'll always use the `pg_dump` in our postgres 13.21 docker
image for schema dumps, instead of what's on the runner/local machine.

Coder doesn't restore from postgres dumps, so we're not vulnerable to
attacks that would be patched by the latest postgres version.
Regardless, we'll unpin ASAP.

Once sqlc is updated to handle these keywords, we need to start
stripping them when comparing the schema in the migration check script,
and then we can unpin the pg_dump version. This is being tracked at
https://github.com/coder/internal/issues/965
2025-09-04 14:00:21 +10:00
Spike Curtis 1354d84eb4 chore: refactor instance identity to be a SessionTokenProvider (#19566)
Refactors Agent instance identity to be a SessionTokenProvider.

Refactors the CLI to create Agent clients via a centralized function, rather than add-hoc via individual command handlers and their flags.

This allows commands besides `coder agent`, but which still use the agent identity, to support instance identity authentication.

Fixes #19111 by unifying all API requests to go thru the SessionTokenProvider for auth credentials.
2025-09-03 10:38:42 +04:00
Spike Curtis ee35ad3a57 fix: remove hold WaitGroup in TestConcurrentFetch (#19617)
Fixes: https://github.com/coder/internal/issues/950

Pretty sure the intention of the `hold` wait group is to try to get the two goroutines that the test starts running at the same time. But, that should be the case for two goroutines started anyway.

The use of `hold` doesn't actually guarantee concurrent execution of `Acquire`, just that both goroutines get as far as `Done()` --- the go scheduler could run them serially without incident.

So I've chosen to just remove the use of `hold` to simplify.

But, for posterity, the data race was due to incrementing by 1 in the loop along with the goroutine that calls Done. You could increment by 1 and then back down to 0 before the second iteration of the loop starts. This then causes a data race with calling `Wait()` in the first goroutine and `Add()` in the second iteration. c.f. https://pkg.go.dev/sync#WaitGroup.Add
2025-09-03 09:25:49 +04:00