Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example:
```
import (
"context"
"time"
"github.com/prometheus/client_golang/prometheus"
"golang.org/x/xerrors"
"gopkg.in/natefinch/lumberjack.v2"
"cdr.dev/slog/v3"
"github.com/coder/coder/v2/codersdk/agentsdk"
"github.com/coder/serpent"
)
```
3 groups: standard library, 3rd partly libs, Coder libs.
This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.
Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptible call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder).
It also updates dependencies that also use slog and were updated.
I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule.
Other dependencies, I pushed new tags.
This change updates how SMTP notifications are polled during scale
tests.
Before, each of the ~2,000 pollers created its own http.Client, which
opened thousands of short-lived TCP connections.
Under heavy load, this ran out of available network ports and caused
errors like `connect: cannot assign requested address`
Now, all pollers share one HTTP connection pool. This prevents port
exhaustion and makes polling faster and more stable.
If a network error happens, the poller will now retry instead of
stopping, so tests keep running until all notifications are received.
The `SMTPRequestTimeout` is now applied per request using a context,
instead of being set on the `http.Client`.
This PR refactors the notification scale test to use template admins and template deletion as the notification trigger. Additionally, I've added a configurable timeout for SMTP requests.
Previously, notifications were triggered by creating/deleting a user, and notifications were received by users with the owner role. However, because of how many notifications were generated by the runners, we had too many notifications to reliably test notification delivery.
This PR extends the scaletest notification runner with SMTP support.
If the `--smtp-api-url` flag is provided, the runner will also watch for SMTP notifications using the specified URL.
#### Changes
- Added a new watcher to retrieve emails sent to the runner user
- Tracked WebSocket and SMTP latencies separately
- Updated metrics to include `notification_id` and `notification_type` labels
#### CLI Flags
- `--smtp-api-url`: Address of the SMTP mock HTTP API used to retrieve email notifications
#### Metrics
- `notification_delivery_latency_seconds` now includes:
- `notification_id`
- `notification_type` (`websocket` or `smtp`)
Relates to https://github.com/coder/internal/issues/910
This PR adds a scaletest runner that simulates users receiving notifications through WebSocket connections.
An instance of this notification runner does the following:
1. Creates a user (optionally with specific roles like owner).
2. Connects to /api/v2/notifications/inbox/watch via WebSocket to receive notifications in real-time.
3. Waits for all other concurrently executing runners (per the DialBarrier WaitGroup) to also connect their websockets.
4. For receiving users: Watches the WebSocket for expected notifications and records delivery latency for each notification type.
5. For regular users: Maintains WebSocket connections to simulate concurrent load while receiving users wait for notifications.
6. Waits on the ReceivingWatchBarrier to coordinate between receiving and regular users.
7. Cleans up the created user after the test completes.
Exposes three prometheus metrics:
1. notification_delivery_latency_seconds - HistogramVec. Labels = {username, notification_type}
2. notification_delivery_errors_total - CounterVec. Labels = {username, action}
3. notification_delivery_missed_total - CounterVec. Labels = {username}
The runner measures end-to-end notification latency from when a notification-triggering event occurs (e.g., user creation/deletion) to when the notification is received by a WebSocket client.