Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example:
```
import (
"context"
"time"
"github.com/prometheus/client_golang/prometheus"
"golang.org/x/xerrors"
"gopkg.in/natefinch/lumberjack.v2"
"cdr.dev/slog/v3"
"github.com/coder/coder/v2/codersdk/agentsdk"
"github.com/coder/serpent"
)
```
3 groups: standard library, 3rd partly libs, Coder libs.
This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.
Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptible call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder).
It also updates dependencies that also use slog and were updated.
I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule.
Other dependencies, I pushed new tags.
While scale testing, I noticed that our load generators send basically
all requests to a single Coderd instance.
e.g.

This is because our scale test commands create all `Runner`s using the
same codersdk Client, which means they share an underlying HTTP client.
With HTTP/2 a single TCP session can multiplex many different HTTP
requests (including websockets). So, it creates a single TCP connection
to a single coderd, and then sends all the requests down the one TCP
connections.
This PR modifies the `exp scaletest` load generator commands to create
an independent HTTP client per `Runner`. This means that each runner
will create its own TCP connection. This should help spread the load and
make a more realistic test, because in a real deployment, scaled out
load will be coming over different TCP connections.
Adds `--max-failures` flag to `coder exp scaletest create-workspaces` so that we can tolerate a few failures without failing the command.
When running our scale test infra, we create Kubernetes Jobs to create the initial cluster workspaces, then we have load-generation jobs that depend on them. At high scale, it's kind of expected that some of the requests will fail: even with 99.9% success, you still expect one failure per 1000. It's useful to be able to carry on with the scale test anyway and proceed to traffic generation.
## Description
Fixes the prebuilds scaletest command where the prometheus server was
being shut down before waiting for metrics to be scraped.
The issue was the defer order - since defers execute in LIFO (last-in,
first-out) order:
**Before (broken):**
1. Register tracing defer (includes wait for prometheus scrape)
2. Register prometheus server defer
Execution order: prometheus closes first, then wait happens (server
already gone!)
**After (fixed):**
1. Register prometheus server defer
2. Register tracing defer (includes wait for prometheus scrape)
Execution order: wait happens first (server still up), then prometheus
closes.
This matches the pattern used in other scaletest commands.
## Impact
The `coderd_scaletest_prebuild_deletion_jobs_completed` metric (and
potentially others) was always showing 0 because the server shut down
before Prometheus could scrape the final values.
_This PR was generated by [`mux`](https://github.com/coder/mux) and
reviewed by a human._
For the https://github.com/coder/internal/issues/913 we are going to be targeting running workspaces. So this PR modularizes the CLI flags and logic that select those targets so we can reuse it.
This PR extends the scaletest notification runner with SMTP support.
If the `--smtp-api-url` flag is provided, the runner will also watch for SMTP notifications using the specified URL.
#### Changes
- Added a new watcher to retrieve emails sent to the runner user
- Tracked WebSocket and SMTP latencies separately
- Updated metrics to include `notification_id` and `notification_type` labels
#### CLI Flags
- `--smtp-api-url`: Address of the SMTP mock HTTP API used to retrieve email notifications
#### Metrics
- `notification_delivery_latency_seconds` now includes:
- `notification_id`
- `notification_type` (`websocket` or `smtp`)
This PR adds a fake SMTP server for scale testing. It collects emails sent during tests, which you can then check using the HTTP API.
#### Changes
- Added mock SMTP server
- Added `coder scaletest smtp` CLI command
- Implemented HTTP API endpoints to retrieve messages by email
- Added auto-purge to prevent memory issues
#### HTTP API Endpoints
- `GET /messages?email=<email>` – Get messages sent to an email address
- `POST /purge` – Clear all messages from memory
The HTTP API parses raw email messages to extract the **date**, **subject**, and **notification ID**.
Notification IDs are sent in emails like this:
```html
<p>
<a href="http://127.0.0.1:3000/settings/notifications?disabled=4e19c0ac-94e1-4532-9515-d1801aa283b2"
style="color: #2563eb; text-decoration: none;">
Stop receiving emails like this
</a>
</p>
```
#### CLI
```bash
coder scaletest smtp --host localhost --port 33199 --api-port 8080 --purge-at-count 1000
```
**Flags:**
- `--host`: Host for the mock SMTP and API server (default: localhost)
- `--port`: Port for the mock SMTP server (random if not specified)
- `--api-port`: Port for the HTTP API server (random if not specified)
- `--purge-at-count`: Max number of messages before auto-purging (default: 100000)
part of https://github.com/coder/internal/issues/912
Adds CLI command `coder exp scaletest dynamic-parameters`
I've left out the configuration of tracing and timeouts for now. I think I want to do some refactoring of the scaletest CLI to make handling those flags take up less boiler plate.
I will add tracing and timeout flags in a follow up PR.
Refactors the CLI to create the `*codersdk.Client` in the handlers. This is groundwork for changing the `rootCmd.InitClient()` to use the new `ClientOption`s.
It also improves variable locality, scoping the Client to the handler. This makes misuse less likely and reduces the memory allocations to just the command being executed, rather than allocating a Client for every command regardless of whether it is executed.
Relates to https://github.com/coder/internal/issues/985.
Some scaletest runners would autogenerate names if they weren't supplied on the config, while others required a name be supplied, and a name was autogenerated in the CLI command handler. This PR unifies the runners to make names and emails optional on each config, and generate them in the scaletest runner if omitted.
The create user runner in the PR above in the stack will do this too.
Relates to https://github.com/coder/internal/issues/888
As part of our renewed connection scaletesting efforts, we want to
scaletest coder in a scenario where direct connections aren't available
(relatively common for our customers), and all concurrent connections
are relayed via DERP.
This PR adds a flag, `--disable-direct` that can be included on the
existing`coder exp scaletest workspace-traffic -ssh` to disable direct
connections.
The experimental functions in `golang.org/x/exp/slices` are now
available in the standard library since Go 1.21.
Reference: https://go.dev/doc/go1.21#slices
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
- Adds `--use-host-login` to `coder exp scaletest workspace-traffic`
- Modifies getScaletestWorkspaces to conditionally filter workspaces if `CODER_DISABLE_OWNER_WORKSPACE_ACCESS` is set
- Adds a warning if `CODER_DISABLE_OWNER_WORKSPACE_ACCESS` is set and scaletest workspaces are filtered out due to ownership mismatch.
- Modifies `coderdtest.New` to detect cross-test bleed of `CODER_DISABLE_OWNER_WORKSPACE_ACCESS` and fast-fail.
Currently, importing `codersdk` just to interact with the API requires
importing tailscale, which causes builds to fail unless manually using
our fork.
* fix: separate signals for passive, active, and forced shutdown
`SIGTERM`: Passive shutdown stopping provisioner daemons from accepting new
jobs but waiting for existing jobs to successfully complete.
`SIGINT` (old existing behavior): Notify provisioner daemons to cancel in-flight jobs, wait 5s for jobs to be exited, then force quit.
`SIGKILL`: Untouched from before, will force-quit.
* Revert dramatic signal changes
* Rename
* Fix shutdown behavior for provisioner daemons
* Add test for graceful shutdown
Adds a Logger to cli Invocation and standardizes CLI commands to use it. clitest creates a test logger by default so that CLI command logs are captured in the test logs.
CLI commands that do their own log configuration are modified to add sinks to the existing logger, rather than create a new one. This ensures we still capture logs in CLI tests.
- Set viewport size to avoid responsive mode
- Added way more debug logging
- Added facility to write a screenshot on error in verbose mode.
- Added a deadline for each iteraction of clicking on and waiting for a thing.
- Removes the `exp scaletest` command from the slim binary
- Updates scaletest-runner template to fetch the full binary from the running Coder instance
* Adds a set of actions to automatically interact with a Coder instance using chromedp
* Integrates the chromedp actions into the scaletest dashboard command,
* Re-enables the previously disabled unit tests for scaletest/dashboard
* Removes previous dashboard actions based around codersdk
* chore: add /v2 to import module path
go mod requires semantic versioning with versions greater than 1.x
This was a mechanical update by running:
```
go install github.com/marwan-at-work/mod/cmd/mod@latest
mod upgrade
```
Migrate generated files to import /v2
* Fix gen