For the https://github.com/coder/internal/issues/913 we are going to be targeting running workspaces. So this PR modularizes the CLI flags and logic that select those targets so we can reuse it.
<!--
If you have used AI to produce some or all of this PR, please ensure you have read our [AI Contribution guidelines](https://coder.com/docs/about/contributing/AI_CONTRIBUTING) before submitting.
-->
Add support for scoped API tokens in CLI
This PR adds CLI support for creating and viewing API tokens with scopes and allow lists. It includes:
- New `--scope` and `--allow` flags for the `tokens create` command
- A new `tokens view` command to display detailed information about a token
- Updated table columns in `tokens list` to show scopes and allow list entries
- Updated help text and examples
These changes enable users to create tokens with limited permissions through the CLI, similar to the existing functionality in the web UI.
If you were to somehow get a 401, or some other unexpected HTTP status code when following a workspace's build logs, `coder ssh` would swallow the error, and give you a different error that didn't make sense.
In this case I was getting a 401 on `/templateversions/{templateversion}/dry-run/{jobID}/logs` , but the CLI error would say the workspace had no agents.
I ran into the 401 when running a scaletest, and then whilst attempting to reproduce the issue locally, I ran `coder ssh` from one build of Coder that used `coder_session_token` as the session token cookie name, whilst the other build used `dev_coder_session_token` (as set by `develop.sh`). For reference, the CLI uses cookies when following the build logs.
Adds CompletionHandler to the ssh command that dynamically suggests
workspace and agent targets based on the user's running workspaces.
Features:
- Suggests workspace name for single-agent workspaces
- Suggests agent.workspace format for all agents in multi-agent
workspaces
- Only shows running workspaces (matches immediate availability)
- Alphabetically sorted completions for better UX
Tests cover single-agent, multi-agent, and network error scenarios.
Amp-Thread-ID:
https://ampcode.com/threads/T-d137d343-53f3-4ece-be5a-584249bbd9e8
<!--
If you have used AI to produce some or all of this PR, please ensure you
have read our [AI Contribution
guidelines](https://coder.com/docs/about/contributing/AI_CONTRIBUTING)
before submitting.
-->
closes#20158
Demo:
https://github.com/user-attachments/assets/e1000463-ded6-4bc9-b013-61780453f019
---------
Co-authored-by: Ethan Dickson <ethan@coder.com>
Updates the UI to use the new API endpoints for tasks and use its new
data model.
Disclaimer: Since the base data model for tasks changed, we had to do a
quite large refactor and I'm sorry for that 🙏, but you'll notice most of
the changes are to adjust the types.
Closescoder/internal#976
---------
Co-authored-by: Bruno Quaresma <bruno_nonato_quaresma@hotmail.com>
Since `coder exp scaletest dynamic-parameters` ends up creating template versions, provisioner tags may be required to create the template versions.
On our own scaletest clusters, we tag every provisioner, so an untagged template version won't build and won't get imported.
This PR extends the scaletest notification runner with SMTP support.
If the `--smtp-api-url` flag is provided, the runner will also watch for SMTP notifications using the specified URL.
#### Changes
- Added a new watcher to retrieve emails sent to the runner user
- Tracked WebSocket and SMTP latencies separately
- Updated metrics to include `notification_id` and `notification_type` labels
#### CLI Flags
- `--smtp-api-url`: Address of the SMTP mock HTTP API used to retrieve email notifications
#### Metrics
- `notification_delivery_latency_seconds` now includes:
- `notification_id`
- `notification_type` (`websocket` or `smtp`)
This PR adds a fake SMTP server for scale testing. It collects emails sent during tests, which you can then check using the HTTP API.
#### Changes
- Added mock SMTP server
- Added `coder scaletest smtp` CLI command
- Implemented HTTP API endpoints to retrieve messages by email
- Added auto-purge to prevent memory issues
#### HTTP API Endpoints
- `GET /messages?email=<email>` – Get messages sent to an email address
- `POST /purge` – Clear all messages from memory
The HTTP API parses raw email messages to extract the **date**, **subject**, and **notification ID**.
Notification IDs are sent in emails like this:
```html
<p>
<a href="http://127.0.0.1:3000/settings/notifications?disabled=4e19c0ac-94e1-4532-9515-d1801aa283b2"
style="color: #2563eb; text-decoration: none;">
Stop receiving emails like this
</a>
</p>
```
#### CLI
```bash
coder scaletest smtp --host localhost --port 33199 --api-port 8080 --purge-at-count 1000
```
**Flags:**
- `--host`: Host for the mock SMTP and API server (default: localhost)
- `--port`: Port for the mock SMTP server (random if not specified)
- `--api-port`: Port for the HTTP API server (random if not specified)
- `--purge-at-count`: Max number of messages before auto-purging (default: 100000)
- Adds FK from `aibridge_interceptions.initiator_id` to `users.id`
- This is enforced by deleting any rows that don't have any users. Since
this is an experimental feature AND coder never deletes user rows I
think this is acceptable.
- Adds `name` as a property on `codersdk.MinimalUser`
- This matches the `visible_users` view in the database. I'm unsure why
`name` wasn't already included given that `username` is.
- Adds a new `initiator` field to `codersdk.AIBridgeInterception` which
contains `codersdk.MinimalUser` (ID, username, name, avatar URL)
- Removes `initiator_id` from `codersdk.AIBridgeInterception`
- Should be fine since we're still in early access
Sometimes tests would fail because the port embedded postgres tries to
use is already in use. This is because there's no way to tell postgres
to use an ephemeral port in tests. This change adds retries to starting
embedded postgres when the port is not explicitly defined (e.g. tests) which
should rid of, or at least significantly reduce, these flakes.
https://github.com/coder/internal/issues/658
Adds some coderd integration tests for `coder exp tasks (send|logs)`.
The actual agentapi interaction is faked out. I figure we don't want to
actually start a real agentapi instance here.
Authored by Claude with some manual cleanup.
## Summary
In this pull request we're adding a simple backoff to the workspace
agent polling. This backoff is being added to address seemingly random
cases of elevated number of calls that we've seen to the
`api/v2/workspaceagent/{agent_id}` endpoint.
For more information on the investigation, see:
https://github.com/coder/internal/issues/725
### Changes
- Updated the polling to use predefined progressive intervals for
polling instead of continuously polling every 500ms
### Testing
- Added a test for the function used to calculate the progressive
polling interval
Co-authored-by: Spike Curtis <spike@coder.com>
Closes https://github.com/coder/internal/issues/978
- Introduce `CODER_TASK_ID` and `CODER_TASK_PROMPT` to the provisioner
environment
- Make use of new `app_id` field in provider, with a fallback to
`sidebar_app.id` for backwards compatibility
**For now** I've left the `taskPrompt` and `taskID` as a TODO as we do
not yet create these values.
part of https://github.com/coder/internal/issues/912
Adds CLI command `coder exp scaletest dynamic-parameters`
I've left out the configuration of tracing and timeouts for now. I think I want to do some refactoring of the scaletest CLI to make handling those flags take up less boiler plate.
I will add tracing and timeout flags in a follow up PR.
Reverts coder/coder#20181
I realized we don’t need this in the task response. When loading a task,
we already need much more workspace information, so it makes more sense
to fetch the workspace data separately instead of trying to embed all
its details into the response.
I think we can keep the task response clean and focused on the essential
information needed to list tasks. For more specific details, we can
fetch the related resources as needed. So, I’m reverting this PR.
In https://github.com/coder/coder/pull/20137, we added a new flag to
`coder provisioner jobs list`, namely `--initiator`.
To make some follow-up worth it, I need to rename an API param used in
the process before it becomes part of our released and tagged API.
Instead of only accepting UUIDs, we accept an arbitrary string.
We still validate it as a UUID now, but we will expand its validation to
allow any string and then resolve that string the same way that we
resolve the user parameter elsewhere in the API.
Relates to https://github.com/coder/internal/issues/934
This PR provides a mechanism to filter provisioner jobs according to who
initiated the job.
This will be used to find pending prebuild jobs when prebuilds have
overwhelmed the provisioner job queue. They can then be canceled.
If prebuilds are overwhelming provisioners, the following steps will be
taken:
```bash
# pause prebuild reconciliation to limit provisioner queue pollution:
coder prebuilds pause
# cancel pending provisioner jobs to clear the queue
coder provisioner jobs list --initiator="prebuilds" --status="pending" | jq ... | xargs -n1 -I{} coder provisioner jobs cancel {}
# push a fixed template and wait for the import to complete
coder templates push ... # push a fixed template
# resume prebuild reconciliation
coder prebuilds resume
```
This interface differs somewhat from what was specified in the issue,
but still provides a mechanism that addresses the issue. The original
proposal was made by myself and this simpler implementation makes sense.
I might add a `--search` parameter in a follow-up if there is appetite
for it.
Potential follow ups:
* Support for this usage: `coder provisioner jobs list --search
"initiator:prebuilds status:pending"`
* Adding the same parameters to `coder provisioner jobs cancel` as a
convenience feature so that operators don't have to pipe through `jq`
and `xargs`
## Summary
In this pull request we're adding support in the CLI for prompting the
user for any missing required template variables in the `coder templates
push` command and automatically retrying the template build once a user
has provided any missing variable values.
Closes: https://github.com/coder/coder/issues/19782
### Demo
In the following recording I created a simple template terraform file
that used different variable types (string, number, boolean, and
sensitive) and prompted the user to enter a value for each variable.
<details>
<summary>See example template terraform file</summary>
```tf
...
# Required variables for testing interactive prompting
variable "docker_image" {
description = "Docker image to use for the workspace"
type = string
}
variable "workspace_name" {
description = "Name of the workspace"
type = string
}
variable "cpu_limit" {
description = "CPU limit for the container (number of cores)"
type = number
}
variable "enable_gpu" {
description = "Enable GPU access for the container"
type = bool
}
variable "api_key" {
description = "API key for external services (sensitive)"
type = string
sensitive = true
}
# Optional variable with default
variable "docker_socket" {
default = "/var/run/docker.sock"
description = "Docker socket path"
type = string
}
...
```
</details>
Once the user entered a valid value for each variable, the template
build would be retried.
https://github.com/user-attachments/assets/770cf954-3cbc-4464-925e-2be4e32a97de
<details>
<summary>See output from recording</summary>
```shell
$ ./scripts/coder-dev.sh templates push test-required-params -d examples/templates/test-required-params/
INFO : Overriding codersdk.SessionTokenCookie as we are developing inside a Coder workspace.
/home/coder/coder/build/coder-slim_2.26.0-devel+a68122ca3_linux_amd64
Provisioner tags: <none>
WARN: No .terraform.lock.hcl file found
| When provisioning, Coder will be unable to cache providers without a lockfile and must download them from the internet each time.
| Create one by running terraform init in your template directory.
> Upload "examples/templates/test-required-params"? (yes/no) yes
=== ✔ Queued [0ms]
==> ⧗ Running
==> ⧗ Running
=== ✔ Running [4ms]
==> ⧗ Setting up
=== ✔ Setting up [0ms]
==> ⧗ Parsing template parameters
=== ✔ Parsing template parameters [8ms]
==> ⧗ Cleaning Up
=== ✘ Cleaning Up [4ms]
=== ✘ Cleaning Up [8ms]
Found 5 missing required variables:
- docker_image (string): Docker image to use for the workspace
- workspace_name (string): Name of the workspace
- cpu_limit (number): CPU limit for the container (number of cores)
- enable_gpu (bool): Enable GPU access for the container
- api_key (string): API key for external services (sensitive)
The template requires values for the following variables:
var.docker_image (required)
Description: Docker image to use for the workspace
Type: string
Current value: <empty>
> Enter value: image-name
var.workspace_name (required)
Description: Name of the workspace
Type: string
Current value: <empty>
> Enter value: workspace-name
var.cpu_limit (required)
Description: CPU limit for the container (number of cores)
Type: number
Current value: <empty>
> Enter value: 1
var.enable_gpu (required)
Description: Enable GPU access for the container
Type: bool
Current value: <empty>
? Select value: false
var.api_key (required), sensitive
Description: API key for external services (sensitive)
Type: string
Current value: <empty>
> Enter value: (*redacted*) ******
Retrying template build with provided variables...
=== ✔ Queued [0ms]
==> ⧗ Running
==> ⧗ Running
=== ✔ Running [2ms]
==> ⧗ Setting up
=== ✔ Setting up [0ms]
==> ⧗ Parsing template parameters
=== ✔ Parsing template parameters [7ms]
==> ⧗ Detecting persistent resources
2025-09-25 22:34:14.731Z Terraform 1.13.0
2025-09-25 22:34:15.140Z data.coder_provisioner.me: Refreshing...
2025-09-25 22:34:15.140Z data.coder_workspace.me: Refreshing...
2025-09-25 22:34:15.140Z data.coder_workspace_owner.me: Refreshing...
2025-09-25 22:34:15.141Z data.coder_provisioner.me: Refresh complete after 0s [id=2bd73098-d127-4362-b3a5-628e5bce6998]
2025-09-25 22:34:15.141Z data.coder_workspace_owner.me: Refresh complete after 0s [id=c2006933-4f3e-4c45-9e04-79612c3a5eca]
2025-09-25 22:34:15.141Z data.coder_workspace.me: Refresh complete after 0s [id=36f2dc6f-0bf2-43bd-bc4d-b29768334e02]
2025-09-25 22:34:15.186Z coder_agent.main: Plan to create
2025-09-25 22:34:15.186Z module.code-server[0].coder_app.code-server: Plan to create
2025-09-25 22:34:15.186Z docker_volume.home_volume: Plan to create
2025-09-25 22:34:15.186Z module.code-server[0].coder_script.code-server: Plan to create
2025-09-25 22:34:15.187Z docker_container.workspace[0]: Plan to create
2025-09-25 22:34:15.187Z Plan: 5 to add, 0 to change, 0 to destroy.
=== ✔ Detecting persistent resources [3104ms]
==> ⧗ Detecting ephemeral resources
2025-09-25 22:34:16.033Z Terraform 1.13.0
2025-09-25 22:34:16.428Z data.coder_workspace.me: Refreshing...
2025-09-25 22:34:16.428Z data.coder_provisioner.me: Refreshing...
2025-09-25 22:34:16.429Z data.coder_workspace_owner.me: Refreshing...
2025-09-25 22:34:16.429Z data.coder_provisioner.me: Refresh complete after 0s [id=2d2f7083-88e6-425c-9df3-856a3bf4cc73]
2025-09-25 22:34:16.429Z data.coder_workspace.me: Refresh complete after 0s [id=c723575e-c7d3-43d7-bf54-0e34d0959dc3]
2025-09-25 22:34:16.431Z data.coder_workspace_owner.me: Refresh complete after 0s [id=d43470c2-236e-4ae9-a977-6b53688c2cb1]
2025-09-25 22:34:16.453Z coder_agent.main: Plan to create
2025-09-25 22:34:16.453Z docker_volume.home_volume: Plan to create
2025-09-25 22:34:16.454Z Plan: 2 to add, 0 to change, 0 to destroy.
=== ✔ Detecting ephemeral resources [1278ms]
==> ⧗ Cleaning Up
=== ✔ Cleaning Up [6ms]
┌──────────────────────────────────┐
│ Template Preview │
├──────────────────────────────────┤
│ RESOURCE │
├──────────────────────────────────┤
│ docker_container.workspace │
│ └─ main (linux, amd64) │
├──────────────────────────────────┤
│ docker_volume.home_volume │
└──────────────────────────────────┘
The test-required-params template has been created at Sep 25
22:34:16! Developers can provision a workspace with this template using:
Updated version at Sep 25 22:34:16!
```
</details>
### Changes
- Added a new function to check if the provisioner failed due to a
template missing required variables
- Added a handler function that is called when a provisioner fails due
to the "missing required variables" error. The handler function will:
- Check for provided template variables and identify any missing
variables
- Prompt the user for any missing variables (prompt is adapted based on
the variable type)
- Validate user input for missing variables
- Retry the template build when all variables have been provided by the
user
### Testing
Added tests for the following scenarios:
- Ensure validation based on variable type
- Ensure users are not prompted for variables with a default value
- Ensure variables provided via a variables files (`--variables-file`)
or a variable flag (`--variable`) take precedence over a template
This PR makes the initial steps at removing usage of the global Go HTTP
client, which was seen to have impacts on test flakiness in
https://github.com/coder/internal/issues/1020. The first commit removes
uses from tests, with the exception of one test that is tightly coupled
to the default client. The second commit makes easy/low-risk removals
from application code. This should have some impact to reduce test flakiness.
Fixes https://github.com/coder/internal/issues/1035
Or, at least, closes a remaining race that seems pretty likely.
The tests in question write a file, close the file, then execute the file. Sometimes Linux errors saying "text file busy" which means the file is still open for writing.
What I think is going on is:
1. Test_sshConfigProxyCommandEscape goroutine opens the file and begins writing.
2. Some other, unrelated test execs a command, which causes a `fork()` syscall. The child process now has a copy of the file descriptor to our open file.
3. Test_sshConfigProxyCommandEscape goroutine executes the file and gets "text file busy".
4. The child process calls the `exec` syscall, which closes the file (due to `CLOEXEC` being set).
The race is very tight because 3 has to happen before 4 (and, 3 involves it's own fork/exec), but it's not impossible on a busy system.
c.f. #14233 which was an earlier attempt to fix this. It only prevented the subtests from running in parallel. When the subtests were all running in parallel, the flake was fairly likely because you've got all this fork() activity happening at the same time. But, since the main test was in parallel there is still a chance a totally different test is `fork`'ing at in inopportune time.
Builds upon https://github.com/coder/coder/pull/19970
I got kinda carried away when I saw the extra stuff we could add in
here, so I went ahead and added it:
* User ID
* Organization IDs
* Roles
This technically duplicates functionality from `coder users show` but I
figure folks may find it useful.
* Improves logic for `exp task status --watch` so that it will also exit
on task idle status.
* Adds workspace agent health to `exp task status` output.