Builds upon https://github.com/coder/coder/pull/19970
I got kinda carried away when I saw the extra stuff we could add in
here, so I went ahead and added it:
* User ID
* Organization IDs
* Roles
This technically duplicates functionality from `coder users show` but I
figure folks may find it useful.
* Improves logic for `exp task status --watch` so that it will also exit
on task idle status.
* Adds workspace agent health to `exp task status` output.
This changes the task get endpoint to omit app statuses for previous
'lifetimes' of a workspace.
It also introduces a [breaking
change](https://github.com/coder/coder/blob/release/2.26/codersdk/aitasks.go#L83)
to bring `TaskStateComplete` in line with
`WorkspaceAppStatusStateComplete`. I can alternatively revert this
change and add a conversion function between the two SDK types.
This addresses a long-standing gripe of mine: to get your logged in
username you would have to do
```bash
coder whoami | awk '{print $9}'
```
This allows you to do:
```
coder whoami -o json | jq -r '.username'
```
or
```
coder whoami -f table -c username
```
Closes#19812
## Problem
> When I try to SSH into my workspace with multiple agents. It does not
provide an intuitive way to do that successfully and instead misguides
by printing wrong instructions.
This PR enhances the error handling to provide suggestions with SSH
commands that users can copy and paste directly.
Before:
```
Encountered an error running "coder ssh", see "coder ssh --help" for more information
error: multiple agents found, please specify the agent name, available agents: [coder dev]
```
After:
```
Encountered an error running "coder ssh", see "coder ssh --help" for more information
error: multiple agents found, please specify the agent name, available agents: [coder dev]
Try running:
$ ssh coder.dogfood.me.coder
$ ssh dev.dogfood.me.coder
```
As part of converting production code to use the new ClientBuilder, I noticed some dead code that creates a client with a URL for the only purpose of later accessing the URL. This PR removes the cruft.
Refactors the CLI to create the `*codersdk.Client` in the handlers. This is groundwork for changing the `rootCmd.InitClient()` to use the new `ClientOption`s.
It also improves variable locality, scoping the Client to the handler. This makes misuse less likely and reduces the memory allocations to just the command being executed, rather than allocating a Client for every command regardless of whether it is executed.
Relates to https://github.com/coder/internal/issues/985.
Some scaletest runners would autogenerate names if they weren't supplied on the config, while others required a name be supplied, and a name was autogenerated in the CLI command handler. This PR unifies the runners to make names and emails optional on each config, and generate them in the scaletest runner if omitted.
The create user runner in the PR above in the stack will do this too.
Solves #15575
Adds OAuth access token revocation when unlinking external auth
provider. Due to revocation not being consistently implemented by
providers this is only best effort attempt. Unsuccessful revocation
won't influence link removal.
fixes https://github.com/coder/internal/issues/946
Some tests tear down the server before we are done with PostgreSQL work, and the default `clitest` infrastructure fails the test if any errors like that are thrown. This PR modifies the tests like that to ignore postgreSQL errors like this.
fixes https://github.com/coder/internal/issues/966
TestCloserStack_Timeout creates `asyncCloser`s which allow control over the exact timing and order of their close method returning. They also, as a final backstop will throw an error if the test context ends before they are unblocked.
TestCloserStack_Timeout unblocks all `asyncCloser`s in a defer and then ends the test. This defer _unblocks_ the running close goroutines, but does not wait for them to finish. Since the test context is canceled as soon as the test completes, this creates a race condition where the close goroutines can trigger the context cancelled arm of the `select` statement.
The fix is to both unblock and wait for all close goroutines to complete before ending the test and cancelling the context.
## Description
Adds support for sending an ad‑hoc custom notification to the
authenticated user via API and CLI. This is useful for surfacing the
result of scripts or long‑running tasks. Notifications are delivered
through the configured method and the dashboard Inbox, respecting
existing preferences and delivery settings.
## Changes
* New notification template: “Custom Notification” with a label for a
custom title and a custom message.
* New API endpoint: `POST /api/v2/notifications/custom` to send a custom
notification to the requesting user.
* New API endpoint: `GET /notifications/templates/custom` to get custom
notification template.
* New CLI subcommand: `coder notifications custom <title> <message>` to
send a custom notification to the requesting user.
* Documentation updates: Add a “Custom notifications” section under
Administration > Monitoring > Notifications, including instructions on
sending custom notifications and examples of when to use them.
Closes: https://github.com/coder/coder/issues/19611
Adds a `sharing add` command for sharing Workspaces with other users and
groups.
The command allows sharing with multiple users, and groups within one
command as well as specifying the role (`use`, or `admin`) defaulting to
`use` if none is specified.
In the current implementation when the command completes we show the
user the current state of the workspace ACL.
```
$ coder sharing add apricot-catfish-86 --user=member:admin --group=contractors:use
USER GROUP ROLE
member - admin
member contractors use
```
If a user is a part of multiple groups, or the workspace has been
individually shared with them they will show up multiple times. Although
this is a bit confusing at first glance it's important to be able to
tell what the maximum role a user may have, and via what ACL they have
it.
---
One piece of UX to consider is that in order to be able to share a
Workspace with a user they must have a role that can read that user. In
the tests we give the user the `ScopedRoleOrgAuditor` role.
Closes
[coder/internal#859](https://github.com/coder/internal/issues/859)
In trying to address confusion with the `-` (for stdin) directory flag last year, I had `template push` read from stdin if stdin was not a TTY. However, I made the mistake of checking if the directory flag was set or not by comparing it to the default value. This meant in something like GitHub Actions, where you don't have a TTY for stdin, it was impossible to read from the current working directory. The fix is just to check if the flag was explicitly set, using pflags.
If users encounter this bug, and this fix is unavailable in their version of Coder, they can workaround it by setting `-d "$(pwd)"`
Fixes https://github.com/coder/internal/issues/933
Refactors CLI tests that check the `--auth` flag parsing for various public clouds into a unit test that just creates the agent Client and asserts on the type.
Testing that the agent client actually authenticates correctly with these auth types is well covered by Coderd tests, so we don't need to retread that ground here, and the deleted tests were flaky on Windows.
Refactors Agent instance identity to be a SessionTokenProvider.
Refactors the CLI to create Agent clients via a centralized function, rather than add-hoc via individual command handlers and their flags.
This allows commands besides `coder agent`, but which still use the agent identity, to support instance identity authentication.
Fixes#19111 by unifying all API requests to go thru the SessionTokenProvider for auth credentials.
Relates to https://github.com/coder/internal/issues/893
Instead of `coder task create <template> --input <input>`, it is now
`coder task create <input> --template <template>`.
If there is only one AI task template on the deployment, the
`--template` parameter can be omitted.
Refactors `codersdk.Client`'s use of session tokens to use a `SessionTokenProvider`, which abstracts the obtaining and storing of the session token.
The main motiviation is to unify Agent authentication an an upstack PR, which can use cloud instance identity via token exchange, rather than a fixed session token.
However, the abstraction could also allow functionality like obtaining the session token from other external sources like the OS credential manager, or an external secret/key management system like Vault.
Relates to https://github.com/coder/internal/issues/888
As part of our renewed connection scaletesting efforts, we want to
scaletest coder in a scenario where direct connections aren't available
(relatively common for our customers), and all concurrent connections
are relayed via DERP.
This PR adds a flag, `--disable-direct` that can be included on the
existing`coder exp scaletest workspace-traffic -ssh` to disable direct
connections.
Closes https://github.com/coder/internal/issues/949
Adds the following fields to `codersdk.Task`
- OwnerName
- TemplateName
- TemplateDisplayName
- TemplateIcon
- WorkspaceAgentID
- WorkspaceAgentLifecycle
- WorkspaceAgentHealth
The implementation is unfortunately not compatible with multiple agents
as we have no reliable way to tell which agent has the AI task running
in it. For now we just pick the first agent found, but in the future
this will need to be changed.
## Description
This PR introduces one counter and two histograms related to workspace
creation and claiming. The goal is to provide clearer observability into
how workspaces are created (regular vs prebuild) and the time cost of
those operations.
### `coderd_workspace_creation_total`
* Metric type: Counter
* Name: `coderd_workspace_creation_total`
* Labels: `organization_name`, `template_name`, `preset_name`
This counter tracks whether a regular workspace (not created from a
prebuild pool) was created using a preset or not.
Currently, we already expose `coderd_prebuilt_workspaces_claimed_total`
for claimed prebuilt workspaces, but we lack a comparable metric for
regular workspace creations. This metric fills that gap, making it
possible to compare regular creations against claims.
Implementation notes:
* Exposed as a `coderd_` metric, consistent with other workspace-related
metrics (e.g. `coderd_api_workspace_latest_build`:
https://github.com/coder/coder/blob/main/coderd/prometheusmetrics/prometheusmetrics.go#L149).
* Every `defaultRefreshRate` (1 minute ), DB query
`GetRegularWorkspaceCreateMetrics` is executed to fetch all regular
workspaces (not created from a prebuild pool).
* The counter is updated with the total from all time (not just since
metric introduction). This differs from the histograms below, which only
accumulate from their introduction forward.
### `coderd_workspace_creation_duration_seconds` &
`coderd_prebuilt_workspace_claim_duration_seconds`
* Metric types: Histogram
* Names:
* `coderd_workspace_creation_duration_seconds`
* Labels: `organization_name`, `template_name`, `preset_name`, `type`
(`regular`, `prebuild`)
* `coderd_prebuilt_workspace_claim_duration_seconds`
* Labels: `organization_name`, `template_name`, `preset_name`
We already have `coderd_provisionerd_workspace_build_timings_seconds`,
which tracks build run times for all workspace builds handled by the
provisioner daemon.
However, in the context of this issue, we are only interested in
creation and claim build times, not all transitions; additionally, this
metric does not include `preset_name`, and adding it there would
significantly increase cardinality. Therefore, separate more focused
metrics are introduced here:
* `coderd_workspace_creation_duration_seconds`: Build time to create a
workspace (either a regular workspace or the build into a prebuild pool,
for prebuild initial provisioning build).
* `coderd_prebuilt_workspace_claim_duration_seconds`: Time to claim a
prebuilt workspace from the pool.
The reason for two separate histograms is that:
* Creation (regular or prebuild): provisioning builds with similar time
magnitude, generally expected to take longer than a claim operation.
* Claim: expected to be a much faster provisioning build.
#### Native histogram usage
Provisioning times vary widely between projects. Using static buckets
risks unbalanced or poorly informative histograms.
To address this, these metrics use [Prometheus native
histograms](https://prometheus.io/docs/specs/native_histograms/):
* First introduced in Prometheus v2.40.0
* Recommended stable usage from v2.45+
* Requires Go client `prometheus/client_golang` v1.15.0+
* Experimental and must be explicitly enabled on the server
(`--enable-feature=native-histograms`)
For compatibility, we also retain a classic bucket definition (aligned
with the existing provisioner metric:
https://github.com/coder/coder/blob/main/provisionerd/provisionerd.go#L182-L189).
* If native histograms are enabled, Prometheus ingests the
high-resolution histogram.
* If not, it falls back to the predefined buckets.
Implementation notes:
* Unlike the counter, these histograms are updated in real-time at
workspace build job completion.
* They reflect data only from the point of introduction forward (no
historical backfill).
## Relates to
Closes: https://github.com/coder/coder/issues/19528
Native histograms tested in observability stack:
https://github.com/coder/observability/pull/50
Closes https://github.com/coder/internal/issues/942
The flakey test, `RemoteForwardUnixSocket`, was using `netstat` to check if the unix socket was forwarded properly. In the flake, it looks like netstat was hanging. This PR has `RemoteForwardUnixSocket` be rewritten to match the implementation of `RemoteForwardMultipleUnixSockets`, where we send bytes over the socket in-process instead. More importantly, that test hasn't flaked (yet).
Note: The implementation has been copied directly from the other test, comments and all.
Partially implements https://github.com/coder/internal/issues/893
This isn't the full implementation of `coder exp tasks create` as
defined in the issue, but it is the minimum required to create a task.
Fixescoder/internal#892Fixescoder/internal#896
Example output:
```
❯ coder exp task list
ID NAME STATUS STATE STATE CHANGED MESSAGE
a7a27450-ca16-4553-a6c5-9d6f04808569 task-hardcore-herschel-bd08 running idle 5h22m3s ago Listed root directory contents, working directory reset
50f92138-f463-4f2b-abad-1816264b065f task-musing-dewdney-f058 running idle 6h3m8s ago Completed arithmetic calculation
```
Note: this commit was partially authored by AI.
- Replaces coderdtest.CreateTemplate/TemplateVersion() with direct dbgen
calls. We do not need a fully functional template for these tests.
- Removes provisioner daemon creation/cleanup. We do not need a running
provisioner daemon here; this functionality is tested elsewhere.
- Simplifies provisioner job creation test helpers.
This reduces the test runtime by over 50%:
Old:
```
time go test -count=100 ./cli -test.run=TestProvisionerJobs
ok github.com/coder/coder/v2/cli 50.149s
```
New:
```
time go test -count=100 ./cli -test.run=TestProvisionerJobs
ok github.com/coder/coder/v2/cli 21.898
```
## Summary
In this pull request we're adding support for additional filtering
options to the `provisioners list` CLI command and the
`/provisionerdaemons` API endpoint.
Resolves: https://github.com/coder/coder/issues/18783
### Changes
#### Added CLI Options
- `--show-offline`: When this option is provided, all provisioner
daemons will be returned. This means that when `--show-offline` is not
provided only `idle` and `busy` provisioner daemons will be returned.
- `--status=<list_of_statuses>`: When this option is provided with a
comma-separated list of valid statuses (`idle`, `busy`, or `offline`)
only provisioner daemons that have these statuses will be returned.
- `--max-age=<duration>`: When this option is provided with a valid
duration value (e.g., `24h`, `30s`) only provisioner daemons with a
`last_seen_at` timestamp within the provided max age will be returned.
#### Query Params
- `?offline=true`: Include offline provisioner daemons in the results.
Offline provisioner daemons will be excluded if `?offline=false` or if
offline is not provided.
- `?status=<list_of_statuses>`: Include provisioner daemons with the
specified statuses.
- `?max_age=<duration>`: Include provisioner daemons with a
`last_seen_at` timestamp within the max age duration.
#### Frontend
- Since offline provisioners will not be returned by default anymore
(`--show-offline` has to be provided to see them), a checkbox was added
to the provisioners list page to allow for offline provisioners to be
displayed
- A revamp of the provisioners page will be done in:
https://github.com/coder/coder/issues/17156, this checkbox change was
just added to maintain currently functionality with the backend updates
Current provisioners page (without checkbox)
<img width="1329" height="574" alt="Screenshot 2025-08-20 at 10 51
00 AM"
src="https://github.com/user-attachments/assets/77b73650-0b62-44f0-a77f-acbe5710809f"
/>
Provisioners page with checkbox (unchecked)
<img width="1314" height="626" alt="Screenshot 2025-08-20 at 10 48
40 AM"
src="https://github.com/user-attachments/assets/7ba164ad-6d3f-417b-bd39-338c0161b145"
/>
Provisioner page with checkbox (checked) and URL updated with query
parameters
<img width="1306" height="597" alt="Screenshot 2025-08-20 at 10 50
14 AM"
src="https://github.com/user-attachments/assets/e78d0986-bbf8-491b-9d56-b682973237a0"
/>
### Show Offline vs Offline Status
To list offline provisioner daemons, users can either:
1. Include the `--show-offline` option
OR
2. Include `offline` in the list of values provided to the `--status`
option
fixes https://github.com/coder/internal/issues/559
This test is looking to see that after calling `coder schedule extend
<workspace> 10h`, the scheduled stop time of the workspace is updated
appropriately (or at least that the information printed to the terminal
indicates that).
By using two `time.Now()` calls for the current time and the expected
time, there was the possibility that the second call just barely crossed
over the hour mark. This is shown in the error message when the test
would flake: `wanted "2025-04-07T22:"; got " 2025-04-07T23:00:00+05:30
\r\n"` (the 00:00 letting us know we just barely crossed the hour).
Using the same time object to construct the expected time should fix the
issue.