This PR extends the scaletest notification runner with SMTP support.
If the `--smtp-api-url` flag is provided, the runner will also watch for SMTP notifications using the specified URL.
#### Changes
- Added a new watcher to retrieve emails sent to the runner user
- Tracked WebSocket and SMTP latencies separately
- Updated metrics to include `notification_id` and `notification_type` labels
#### CLI Flags
- `--smtp-api-url`: Address of the SMTP mock HTTP API used to retrieve email notifications
#### Metrics
- `notification_delivery_latency_seconds` now includes:
- `notification_id`
- `notification_type` (`websocket` or `smtp`)
This PR adds a fake SMTP server for scale testing. It collects emails sent during tests, which you can then check using the HTTP API.
#### Changes
- Added mock SMTP server
- Added `coder scaletest smtp` CLI command
- Implemented HTTP API endpoints to retrieve messages by email
- Added auto-purge to prevent memory issues
#### HTTP API Endpoints
- `GET /messages?email=<email>` – Get messages sent to an email address
- `POST /purge` – Clear all messages from memory
The HTTP API parses raw email messages to extract the **date**, **subject**, and **notification ID**.
Notification IDs are sent in emails like this:
```html
<p>
<a href="http://127.0.0.1:3000/settings/notifications?disabled=4e19c0ac-94e1-4532-9515-d1801aa283b2"
style="color: #2563eb; text-decoration: none;">
Stop receiving emails like this
</a>
</p>
```
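A minimal sketch of how the notification ID could be pulled out of a link like the one above. The function name `extractNotificationID` and the regular expression are assumptions for illustration, not the parser used in this PR. Real emails may also be quoted-printable encoded, in which case the body would need decoding first.

```go
package main

import (
	"fmt"
	"regexp"
)

// disabledParamRe matches the `disabled=<uuid>` query parameter of the
// "Stop receiving emails like this" link. The pattern is an assumption.
var disabledParamRe = regexp.MustCompile(
	`[?&]disabled=([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})`,
)

// extractNotificationID returns the notification ID embedded in the email
// body, or false if no matching link is present.
func extractNotificationID(body string) (string, bool) {
	m := disabledParamRe.FindStringSubmatch(body)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	body := `<a href="http://127.0.0.1:3000/settings/notifications?disabled=4e19c0ac-94e1-4532-9515-d1801aa283b2">Stop receiving emails like this</a>`
	id, ok := extractNotificationID(body)
	fmt.Println(id, ok)
}
```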
#### CLI
```bash
coder scaletest smtp --host localhost --port 33199 --api-port 8080 --purge-at-count 1000
```
**Flags:**
- `--host`: Host for the mock SMTP and API server (default: localhost)
- `--port`: Port for the mock SMTP server (random if not specified)
- `--api-port`: Port for the HTTP API server (random if not specified)
- `--purge-at-count`: Max number of messages before auto-purging (default: 100000)
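To illustrate the auto-purge behavior, here is a rough sketch of an in-memory store that drops everything once a message threshold is reached. The `messageStore` type and its methods are hypothetical, and the exact purge semantics (purging before the message that would exceed the limit) are an assumption rather than a description of this PR's implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// messageStore is a hypothetical in-memory store keyed by recipient address.
type messageStore struct {
	mu           sync.Mutex
	byEmail      map[string][]string
	count        int
	purgeAtCount int
}

func newMessageStore(purgeAtCount int) *messageStore {
	return &messageStore{byEmail: make(map[string][]string), purgeAtCount: purgeAtCount}
}

// Add records a message. If the store already holds purgeAtCount messages,
// it is emptied first so memory use stays bounded.
func (s *messageStore) Add(email, msg string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.purgeAtCount > 0 && s.count >= s.purgeAtCount {
		s.byEmail = make(map[string][]string)
		s.count = 0
	}
	s.byEmail[email] = append(s.byEmail[email], msg)
	s.count++
}

// Messages returns a copy of the messages stored for one recipient.
func (s *messageStore) Messages(email string) []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return append([]string(nil), s.byEmail[email]...)
}

// Purge clears all messages, mirroring what `POST /purge` is described to do.
func (s *messageStore) Purge() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.byEmail = make(map[string][]string)
	s.count = 0
}

func main() {
	s := newMessageStore(2)
	s.Add("a@example.com", "one")
	s.Add("a@example.com", "two")
	s.Add("b@example.com", "three") // triggers the auto-purge first
	fmt.Println(len(s.Messages("a@example.com")), len(s.Messages("b@example.com")))
}
```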
- Adds FK from `aibridge_interceptions.initiator_id` to `users.id`
- This is enforced by deleting any rows that don't reference an existing
user. Since this is an experimental feature AND coder never deletes user
rows I think this is acceptable.
- Adds `name` as a property on `codersdk.MinimalUser`
- This matches the `visible_users` view in the database. I'm unsure why
`name` wasn't already included given that `username` is.
- Adds a new `initiator` field to `codersdk.AIBridgeInterception` which
contains `codersdk.MinimalUser` (ID, username, name, avatar URL)
- Removes `initiator_id` from `codersdk.AIBridgeInterception`
- Should be fine since we're still in early access
Sometimes tests would fail because the port embedded postgres tries to
use is already in use. This is because there's no way to tell postgres
to use an ephemeral port in tests. This change adds retries to starting
embedded postgres when the port is not explicitly defined (e.g. tests) which
should get rid of, or at least significantly reduce, these flakes.
https://github.com/coder/internal/issues/658
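The retry idea can be sketched roughly as follows. This is not the code from this change: `startWithRetry`, `pickFreePort`, and `errPortInUse` are hypothetical names, and the `start` callback stands in for whatever launches embedded postgres on the chosen port.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// errPortInUse is a stand-in for the failure seen when the port is taken.
var errPortInUse = errors.New("port already in use")

// pickFreePort asks the kernel for a currently free TCP port. Another
// process can still grab it before we use it, which is why callers retry.
func pickFreePort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}

// startWithRetry picks a new port for every attempt and gives up after
// maxAttempts failed starts.
func startWithRetry(maxAttempts int, pickPort func() (int, error), start func(port int) error) (int, error) {
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		port, err := pickPort()
		if err != nil {
			return 0, err
		}
		if err := start(port); err != nil {
			lastErr = err
			continue
		}
		return port, nil
	}
	return 0, fmt.Errorf("giving up after %d attempts: %w", maxAttempts, lastErr)
}

func main() {
	calls := 0
	// Simulate a server that fails to bind twice before succeeding.
	port, err := startWithRetry(3, pickFreePort, func(_ int) error {
		calls++
		if calls < 3 {
			return errPortInUse
		}
		return nil
	})
	fmt.Println(port > 0, calls, err)
}
```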
Adds some coderd integration tests for `coder exp tasks (send|logs)`.
The actual agentapi interaction is faked out. I figure we don't want to
actually start a real agentapi instance here.
Authored by Claude with some manual cleanup.
## Summary
In this pull request we're adding a simple backoff to the workspace
agent polling. This backoff is being added to address seemingly random
spikes in the number of calls that we've seen to the
`api/v2/workspaceagent/{agent_id}` endpoint.
For more information on the investigation, see:
https://github.com/coder/internal/issues/725
### Changes
- Updated the polling to use predefined progressive intervals for
polling instead of continuously polling every 500ms
### Testing
- Added a test for the function used to calculate the progressive
polling interval
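As a rough illustration of progressive intervals, the sketch below maps an attempt number to a wait time and caps at the last entry. The interval values and the name `pollInterval` are assumptions; the PR description only states that polling no longer happens at a fixed 500ms.

```go
package main

import (
	"fmt"
	"time"
)

// pollIntervals holds hypothetical values; the actual intervals used in the
// change are not listed in this description.
var pollIntervals = []time.Duration{
	500 * time.Millisecond,
	1 * time.Second,
	2 * time.Second,
	5 * time.Second,
}

// pollInterval returns the wait time for the given zero-based attempt and
// keeps returning the last interval once the list is exhausted.
func pollInterval(attempt int) time.Duration {
	if attempt < 0 {
		attempt = 0
	}
	if attempt >= len(pollIntervals) {
		return pollIntervals[len(pollIntervals)-1]
	}
	return pollIntervals[attempt]
}

func main() {
	for attempt := 0; attempt < 6; attempt++ {
		fmt.Println(attempt, pollInterval(attempt))
	}
}
```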
Co-authored-by: Spike Curtis <spike@coder.com>
Closes https://github.com/coder/internal/issues/978
- Introduce `CODER_TASK_ID` and `CODER_TASK_PROMPT` to the provisioner
environment
- Make use of new `app_id` field in provider, with a fallback to
`sidebar_app.id` for backwards compatibility
**For now** I've left the `taskPrompt` and `taskID` as a TODO as we do
not yet create these values.
part of https://github.com/coder/internal/issues/912
Adds CLI command `coder exp scaletest dynamic-parameters`
I've left out the configuration of tracing and timeouts for now. I think I want to do some refactoring of the scaletest CLI so that handling those flags takes up less boilerplate.
I will add tracing and timeout flags in a follow up PR.
Reverts coder/coder#20181
I realized we don’t need this in the task response. When loading a task,
we already need much more workspace information, so it makes more sense
to fetch the workspace data separately instead of trying to embed all
its details into the response.
I think we can keep the task response clean and focused on the essential
information needed to list tasks. For more specific details, we can
fetch the related resources as needed. So, I’m reverting this PR.
In https://github.com/coder/coder/pull/20137, we added a new flag to
`coder provisioner jobs list`, namely `--initiator`.
To make a follow-up change worthwhile, I need to rename an API param used in
the process before it becomes part of our released and tagged API.
Instead of only accepting UUIDs, we accept an arbitrary string.
We still validate it as a UUID now, but we will expand its validation to
allow any string and then resolve that string the same way that we
resolve the user parameter elsewhere in the API.
Relates to https://github.com/coder/internal/issues/934
This PR provides a mechanism to filter provisioner jobs according to who
initiated the job.
This will be used to find pending prebuild jobs when prebuilds have
overwhelmed the provisioner job queue. They can then be canceled.
If prebuilds are overwhelming provisioners, the following steps will be
taken:
```bash
# pause prebuild reconciliation to limit provisioner queue pollution:
coder prebuilds pause
# cancel pending provisioner jobs to clear the queue
coder provisioner jobs list --initiator="prebuilds" --status="pending" | jq ... | xargs -n1 -I{} coder provisioner jobs cancel {}
# push a fixed template and wait for the import to complete
coder templates push ... # push a fixed template
# resume prebuild reconciliation
coder prebuilds resume
```
This interface differs somewhat from what was specified in the issue,
but still provides a mechanism that addresses the issue. The original
proposal was mine, and this simpler implementation makes sense.
I might add a `--search` parameter in a follow-up if there is appetite
for it.
Potential follow ups:
* Support for this usage: `coder provisioner jobs list --search
"initiator:prebuilds status:pending"`
* Adding the same parameters to `coder provisioner jobs cancel` as a
convenience feature so that operators don't have to pipe through `jq`
and `xargs`
## Summary
In this pull request we're adding support in the CLI for prompting the
user for any missing required template variables in the `coder templates
push` command and automatically retrying the template build once a user
has provided any missing variable values.
Closes: https://github.com/coder/coder/issues/19782
### Demo
In the following recording I created a simple Terraform template file
that used different variable types (string, number, boolean, and
sensitive) and prompted the user to enter a value for each variable.
<details>
<summary>See example template terraform file</summary>
```tf
...
# Required variables for testing interactive prompting
variable "docker_image" {
description = "Docker image to use for the workspace"
type = string
}
variable "workspace_name" {
description = "Name of the workspace"
type = string
}
variable "cpu_limit" {
description = "CPU limit for the container (number of cores)"
type = number
}
variable "enable_gpu" {
description = "Enable GPU access for the container"
type = bool
}
variable "api_key" {
description = "API key for external services (sensitive)"
type = string
sensitive = true
}
# Optional variable with default
variable "docker_socket" {
default = "/var/run/docker.sock"
description = "Docker socket path"
type = string
}
...
```
</details>
Once the user entered a valid value for each variable, the template
build would be retried.
https://github.com/user-attachments/assets/770cf954-3cbc-4464-925e-2be4e32a97de
<details>
<summary>See output from recording</summary>
```shell
$ ./scripts/coder-dev.sh templates push test-required-params -d examples/templates/test-required-params/
INFO : Overriding codersdk.SessionTokenCookie as we are developing inside a Coder workspace.
/home/coder/coder/build/coder-slim_2.26.0-devel+a68122ca3_linux_amd64
Provisioner tags: <none>
WARN: No .terraform.lock.hcl file found
| When provisioning, Coder will be unable to cache providers without a lockfile and must download them from the internet each time.
| Create one by running terraform init in your template directory.
> Upload "examples/templates/test-required-params"? (yes/no) yes
=== ✔ Queued [0ms]
==> ⧗ Running
==> ⧗ Running
=== ✔ Running [4ms]
==> ⧗ Setting up
=== ✔ Setting up [0ms]
==> ⧗ Parsing template parameters
=== ✔ Parsing template parameters [8ms]
==> ⧗ Cleaning Up
=== ✘ Cleaning Up [4ms]
=== ✘ Cleaning Up [8ms]
Found 5 missing required variables:
- docker_image (string): Docker image to use for the workspace
- workspace_name (string): Name of the workspace
- cpu_limit (number): CPU limit for the container (number of cores)
- enable_gpu (bool): Enable GPU access for the container
- api_key (string): API key for external services (sensitive)
The template requires values for the following variables:
var.docker_image (required)
Description: Docker image to use for the workspace
Type: string
Current value: <empty>
> Enter value: image-name
var.workspace_name (required)
Description: Name of the workspace
Type: string
Current value: <empty>
> Enter value: workspace-name
var.cpu_limit (required)
Description: CPU limit for the container (number of cores)
Type: number
Current value: <empty>
> Enter value: 1
var.enable_gpu (required)
Description: Enable GPU access for the container
Type: bool
Current value: <empty>
? Select value: false
var.api_key (required), sensitive
Description: API key for external services (sensitive)
Type: string
Current value: <empty>
> Enter value: (*redacted*) ******
Retrying template build with provided variables...
=== ✔ Queued [0ms]
==> ⧗ Running
==> ⧗ Running
=== ✔ Running [2ms]
==> ⧗ Setting up
=== ✔ Setting up [0ms]
==> ⧗ Parsing template parameters
=== ✔ Parsing template parameters [7ms]
==> ⧗ Detecting persistent resources
2025-09-25 22:34:14.731Z Terraform 1.13.0
2025-09-25 22:34:15.140Z data.coder_provisioner.me: Refreshing...
2025-09-25 22:34:15.140Z data.coder_workspace.me: Refreshing...
2025-09-25 22:34:15.140Z data.coder_workspace_owner.me: Refreshing...
2025-09-25 22:34:15.141Z data.coder_provisioner.me: Refresh complete after 0s [id=2bd73098-d127-4362-b3a5-628e5bce6998]
2025-09-25 22:34:15.141Z data.coder_workspace_owner.me: Refresh complete after 0s [id=c2006933-4f3e-4c45-9e04-79612c3a5eca]
2025-09-25 22:34:15.141Z data.coder_workspace.me: Refresh complete after 0s [id=36f2dc6f-0bf2-43bd-bc4d-b29768334e02]
2025-09-25 22:34:15.186Z coder_agent.main: Plan to create
2025-09-25 22:34:15.186Z module.code-server[0].coder_app.code-server: Plan to create
2025-09-25 22:34:15.186Z docker_volume.home_volume: Plan to create
2025-09-25 22:34:15.186Z module.code-server[0].coder_script.code-server: Plan to create
2025-09-25 22:34:15.187Z docker_container.workspace[0]: Plan to create
2025-09-25 22:34:15.187Z Plan: 5 to add, 0 to change, 0 to destroy.
=== ✔ Detecting persistent resources [3104ms]
==> ⧗ Detecting ephemeral resources
2025-09-25 22:34:16.033Z Terraform 1.13.0
2025-09-25 22:34:16.428Z data.coder_workspace.me: Refreshing...
2025-09-25 22:34:16.428Z data.coder_provisioner.me: Refreshing...
2025-09-25 22:34:16.429Z data.coder_workspace_owner.me: Refreshing...
2025-09-25 22:34:16.429Z data.coder_provisioner.me: Refresh complete after 0s [id=2d2f7083-88e6-425c-9df3-856a3bf4cc73]
2025-09-25 22:34:16.429Z data.coder_workspace.me: Refresh complete after 0s [id=c723575e-c7d3-43d7-bf54-0e34d0959dc3]
2025-09-25 22:34:16.431Z data.coder_workspace_owner.me: Refresh complete after 0s [id=d43470c2-236e-4ae9-a977-6b53688c2cb1]
2025-09-25 22:34:16.453Z coder_agent.main: Plan to create
2025-09-25 22:34:16.453Z docker_volume.home_volume: Plan to create
2025-09-25 22:34:16.454Z Plan: 2 to add, 0 to change, 0 to destroy.
=== ✔ Detecting ephemeral resources [1278ms]
==> ⧗ Cleaning Up
=== ✔ Cleaning Up [6ms]
┌──────────────────────────────────┐
│ Template Preview │
├──────────────────────────────────┤
│ RESOURCE │
├──────────────────────────────────┤
│ docker_container.workspace │
│ └─ main (linux, amd64) │
├──────────────────────────────────┤
│ docker_volume.home_volume │
└──────────────────────────────────┘
The test-required-params template has been created at Sep 25
22:34:16! Developers can provision a workspace with this template using:
Updated version at Sep 25 22:34:16!
```
</details>
### Changes
- Added a new function to check if the provisioner failed due to a
template missing required variables
- Added a handler function that is called when a provisioner fails due
to the "missing required variables" error. The handler function will:
- Check for provided template variables and identify any missing
variables
- Prompt the user for any missing variables (prompt is adapted based on
the variable type)
- Validate user input for missing variables
- Retry the template build when all variables have been provided by the
user
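The type-based validation step might look something like the sketch below. The function name `validateVariableValue` and the accepted type strings are assumptions based on the variable types shown in the demo (string, number, bool), not the code added in this PR.

```go
package main

import (
	"fmt"
	"strconv"
)

// validateVariableValue checks user input against the declared variable
// type. Only the types from the demo template are handled here.
func validateVariableValue(varType, value string) error {
	switch varType {
	case "string":
		return nil
	case "number":
		if _, err := strconv.ParseFloat(value, 64); err != nil {
			return fmt.Errorf("%q is not a valid number", value)
		}
		return nil
	case "bool":
		if value != "true" && value != "false" {
			return fmt.Errorf("%q is not a valid bool (use true or false)", value)
		}
		return nil
	default:
		return fmt.Errorf("unsupported variable type %q", varType)
	}
}

func main() {
	fmt.Println(validateVariableValue("number", "1"))
	fmt.Println(validateVariableValue("number", "one"))
	fmt.Println(validateVariableValue("bool", "false"))
}
```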
### Testing
Added tests for the following scenarios:
- Ensure validation based on variable type
- Ensure users are not prompted for variables with a default value
- Ensure variables provided via a variables file (`--variables-file`)
or a variable flag (`--variable`) take precedence over a template
This PR takes the initial steps toward removing usage of the global Go HTTP
client, which was seen to contribute to test flakiness in
https://github.com/coder/internal/issues/1020. The first commit removes
uses from tests, with the exception of one test that is tightly coupled
to the default client. The second commit makes easy/low-risk removals
from application code. This should help reduce test flakiness.
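For readers unfamiliar with the pattern, here is a small sketch of passing an explicit `*http.Client` instead of relying on `http.DefaultClient`. The helper names (`fetchStatus`, `runDemo`, `roundTripperFunc`) and the URL are made up for illustration and do not come from the commits in this PR.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
)

// roundTripperFunc adapts a function to http.RoundTripper so the demo can
// answer requests without opening a network connection.
type roundTripperFunc func(*http.Request) (*http.Response, error)

func (f roundTripperFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// fetchStatus performs a GET with the client it is given rather than
// http.Get, which uses the shared http.DefaultClient.
func fetchStatus(ctx context.Context, client *http.Client, url string) (int, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return 0, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	_, _ = io.Copy(io.Discard, resp.Body)
	return resp.StatusCode, nil
}

// runDemo builds a private client with its own transport, so nothing is
// shared with other callers or tests in the same process.
func runDemo() (int, error) {
	client := &http.Client{
		Transport: roundTripperFunc(func(r *http.Request) (*http.Response, error) {
			return &http.Response{
				StatusCode: http.StatusNoContent,
				Header:     make(http.Header),
				Body:       http.NoBody,
				Request:    r,
			}, nil
		}),
	}
	return fetchStatus(context.Background(), client, "http://example.invalid/healthz")
}

func main() {
	status, err := runDemo()
	fmt.Println(status, err)
}
```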
Fixes https://github.com/coder/internal/issues/1035
Or, at least, closes a remaining race that seems pretty likely.
The tests in question write a file, close the file, then execute the file. Sometimes Linux returns the error "text file busy", which means the file is still open for writing.
What I think is going on is:
1. Test_sshConfigProxyCommandEscape goroutine opens the file and begins writing.
2. Some other, unrelated test execs a command, which causes a `fork()` syscall. The child process now has a copy of the file descriptor to our open file.
3. Test_sshConfigProxyCommandEscape goroutine executes the file and gets "text file busy".
4. The child process calls the `exec` syscall, which closes the file (due to `CLOEXEC` being set).
The race is very tight because 3 has to happen before 4 (and, 3 involves its own fork/exec), but it's not impossible on a busy system.
cf. #14233, which was an earlier attempt to fix this. It only prevented the subtests from running in parallel. When the subtests were all running in parallel, the flake was fairly likely because you've got all this fork() activity happening at the same time. But, since the main test was in parallel, there is still a chance a totally different test is `fork`'ing at an inopportune time.
Builds upon https://github.com/coder/coder/pull/19970
I got kinda carried away when I saw the extra stuff we could add in
here, so I went ahead and added it:
* User ID
* Organization IDs
* Roles
This technically duplicates functionality from `coder users show` but I
figure folks may find it useful.
* Improves logic for `exp task status --watch` so that it will also exit
on task idle status.
* Adds workspace agent health to `exp task status` output.
This changes the task get endpoint to omit app statuses for previous
'lifetimes' of a workspace.
It also introduces a [breaking
change](https://github.com/coder/coder/blob/release/2.26/codersdk/aitasks.go#L83)
to bring `TaskStateComplete` in line with
`WorkspaceAppStatusStateComplete`. I can alternatively revert this
change and add a conversion function between the two SDK types.
This addresses a long-standing gripe of mine: to get your logged-in
username you would have to do
```bash
coder whoami | awk '{print $9}'
```
This allows you to do:
```bash
coder whoami -o json | jq -r '.username'
```
or
```bash
coder whoami -f table -c username
```
Closes #19812
## Problem
> When I try to SSH into my workspace with multiple agents. It does not
provide an intuitive way to do that successfully and instead misguides
by printing wrong instructions.
This PR enhances the error handling to provide suggestions with SSH
commands that users can copy and paste directly.
Before:
```
Encountered an error running "coder ssh", see "coder ssh --help" for more information
error: multiple agents found, please specify the agent name, available agents: [coder dev]
```
After:
```
Encountered an error running "coder ssh", see "coder ssh --help" for more information
error: multiple agents found, please specify the agent name, available agents: [coder dev]
Try running:
$ ssh coder.dogfood.me.coder
$ ssh dev.dogfood.me.coder
```
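A sketch of how the suggestions in the "After" output could be assembled. The hostname layout `<agent>.<workspace>.<owner>.<suffix>` is inferred from the example output above, and `sshSuggestions` is a hypothetical helper, not the function added in this PR.

```go
package main

import "fmt"

// sshSuggestions builds one copy-pasteable ssh command per agent. The
// hostname layout is an assumption inferred from the example output.
func sshSuggestions(owner, workspace, suffix string, agents []string) []string {
	out := make([]string, 0, len(agents))
	for _, agent := range agents {
		out = append(out, fmt.Sprintf("ssh %s.%s.%s.%s", agent, workspace, owner, suffix))
	}
	return out
}

func main() {
	fmt.Println("Try running:")
	for _, cmd := range sshSuggestions("me", "dogfood", "coder", []string{"coder", "dev"}) {
		fmt.Println("$ " + cmd)
	}
}
```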
As part of converting production code to use the new ClientBuilder, I noticed some dead code that creates a client with a URL for the only purpose of later accessing the URL. This PR removes the cruft.
Refactors the CLI to create the `*codersdk.Client` in the handlers. This is groundwork for changing the `rootCmd.InitClient()` to use the new `ClientOption`s.
It also improves variable locality, scoping the Client to the handler. This makes misuse less likely and reduces the memory allocations to just the command being executed, rather than allocating a Client for every command regardless of whether it is executed.
Relates to https://github.com/coder/internal/issues/985.
Some scaletest runners would autogenerate names if they weren't supplied on the config, while others required a name be supplied, and a name was autogenerated in the CLI command handler. This PR unifies the runners to make names and emails optional on each config, and generate them in the scaletest runner if omitted.
The create user runner in the PR above in the stack will do this too.
Solves #15575
Adds OAuth access token revocation when unlinking an external auth
provider. Because revocation is not consistently implemented by
providers, this is only a best-effort attempt. Unsuccessful revocation
won't prevent link removal.
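The best-effort flow can be sketched as below: try to revoke, log any failure, and remove the link regardless. The request shape follows the general OAuth 2.0 token revocation convention (a form-encoded POST with a `token` field); client authentication is omitted, and all function names and URLs here are hypothetical rather than taken from this PR.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"log"
	"net/http"
	"net/url"
	"strings"
)

// roundTripperFunc lets the demo simulate a provider without a network.
type roundTripperFunc func(*http.Request) (*http.Response, error)

func (f roundTripperFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// revokeToken posts the token to the provider's revocation endpoint.
func revokeToken(ctx context.Context, client *http.Client, revokeURL, token string) error {
	form := url.Values{"token": {token}, "token_type_hint": {"access_token"}}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, revokeURL, strings.NewReader(form.Encode()))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("revocation endpoint returned status %d", resp.StatusCode)
	}
	return nil
}

// unlink attempts revocation but never lets a failure block link removal.
func unlink(ctx context.Context, client *http.Client, revokeURL, token string, removeLink func() error) error {
	if err := revokeToken(ctx, client, revokeURL, token); err != nil {
		log.Printf("best-effort token revocation failed: %v", err)
	}
	return removeLink()
}

// runDemo simulates a provider whose revocation call always fails.
func runDemo() (removed bool, err error) {
	client := &http.Client{Transport: roundTripperFunc(func(*http.Request) (*http.Response, error) {
		return nil, errors.New("provider does not support revocation")
	})}
	err = unlink(context.Background(), client, "http://provider.invalid/revoke", "example-token", func() error {
		removed = true
		return nil
	})
	return removed, err
}

func main() {
	removed, err := runDemo()
	fmt.Println(removed, err)
}
```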
fixes https://github.com/coder/internal/issues/946
Some tests tear down the server before we are done with PostgreSQL work, and the default `clitest` infrastructure fails the test if any such errors are thrown. This PR modifies those tests to ignore PostgreSQL errors of this kind.
fixes https://github.com/coder/internal/issues/966
TestCloserStack_Timeout creates `asyncCloser`s which allow control over the exact timing and order of their close method returning. As a final backstop, they also throw an error if the test context ends before they are unblocked.
TestCloserStack_Timeout unblocks all `asyncCloser`s in a defer and then ends the test. This defer _unblocks_ the running close goroutines, but does not wait for them to finish. Since the test context is canceled as soon as the test completes, this creates a race condition where the close goroutines can trigger the context cancelled arm of the `select` statement.
The fix is to both unblock and wait for all close goroutines to complete before ending the test and cancelling the context.
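The "unblock and wait" pattern can be sketched in isolation as follows. This is a simplified stand-in that leaves out the test context and `select` from the real `asyncCloser`; `runClosers` is a hypothetical name.

```go
package main

import (
	"fmt"
	"sync"
)

// runClosers starts n goroutines that block like a pending Close call,
// then unblocks them and waits for all of them before returning.
func runClosers(n int) int {
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		closed  int
		unblock = make(chan struct{})
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-unblock // blocked until the test releases the closer
			mu.Lock()
			closed++
			mu.Unlock()
		}()
	}
	close(unblock) // step 1: unblock every closer
	wg.Wait()      // step 2: wait for them to finish before the test ends
	return closed
}

func main() {
	fmt.Println(runClosers(5))
}
```

Without the `wg.Wait()` call, the function could return while some goroutines are still running, which is the same shape of race the description above attributes to the test.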