Commit Graph

179 Commits

Author SHA1 Message Date
Zach 81e2be69e9 test: use typed atomics in test files (#25071)
Use typed atomics (atomic.Int64, atomic.Int32, etc.) in test files to prevent
mixing atomic and non-atomic access on the same value, guarantee 64-bit
alignment on 32-bit platforms, and provide a cleaner API.
2026-05-11 08:41:17 -06:00
Sas Swart 1ba7139f21 feat: add session correlation fields to BoundaryLog proto (#24809)
1 of 9 [next >>](https://github.com/coder/coder/pull/24811)

RFC: [Bridge ↔ Boundaries Correlation
RFC](https://www.notion.so/Bridge-Boundaries-Correlation-313d579be59281f3b4efdbfd6896775a)

Adds three new proto fields for boundary session correlation.

**`ReportBoundaryLogsRequest`**
- `session_id` (string, field 2) — UUID generated by boundary at
startup,
  shared across all batches from a single run.
- `confined_process` (string, field 3) — name of the confined process
  (e.g. `claude-code`, `codex`, `copilot`).

**`BoundaryLog`**
- `sequence_number` (uint64, field 4) — monotonically increasing counter
  per session, primary ordering key when boundary is in use.

`BoundaryLog.time` already existed at field 2; no change needed there.

API version bumped to v2.9.

No behaviour change in coderd or the agent. This is a pure schema bump
that the boundary repo will consume in its own stack.

> Generated by Coder Agents
2026-05-05 10:36:26 +02:00
Zach 1c7064c066 fix: use atomic.Int64 for workspace traffic metrics (#24844)
connMetrics.total was written to via atomic.AddInt64 and read as a plain
int64, producing a data race. Fix by switching the field to atomic.Int64
and using its typed Add/Load methods.

The testMetrics mock had a similar issue where mutex use was missing
where GetTotalBytes read the total bytes, which is also fixed in this
change.
2026-05-01 09:16:42 -06:00
Callum Styan 950660e392 fix(cli): extend exp scaletest cleanup to properly clean up prebuilds (#23628)
Signed-off-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 10:28:27 -07:00
Callum Styan 7d044fa598 fix(scaletest/prebuilds): fix Runner.Cleanup() to delete workspaces (#23627) 2026-04-22 13:11:42 -07:00
Callum Styan b714fe8e71 fix(scaletest/prebuilds): make measureDeletion more reliable/less brittle (#23614) 2026-04-22 13:10:34 -07:00
Callum Styan 730edba87a fix: fix false positive disconnected agent metric reporting (#24225)
We noticed during higher active workspace counts that the agent
connection metric, generated via a query to the database, would report a
relatively high amount of agents as disconnected. Somewhere between 5
and 20%. However, other metrics such as # of websocket connections would
suggest that all agent connections are healthy.

Looking at the `Agents` function in prometheus metrics, plus the query
execution time (not accounting for actual database RT time) revealed
that this reporting of agents as disconnected was almost certainly false
positives due to clock drift in the way we're generating the metric
values. At 10k metrics, with a p50 of 2ms and p99 of 5ms, the entire
`agents` function could take upwards of 50s to execute. Because we were
doing a query/database RT to query th apps for each agent individually,
and grabbing a `time.Now` value on each iteration of that loop, it's
likely the portion of agents that were reported as disconnected were
those that had last heartbeat the furthest in the past.

The fix here is to set a consistent `now` before fetching agent data to
avoid clock drift inflating the inactive timeout comparison, and replace
the per-agent app query N+1 with a single batched lookup to prevent loop
execution time from pushing agents over the disconnected threshold.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 22:23:06 -07:00
Jon Ayers 7e63fe68f7 fix: avoid instantiating a logger if provided /dev/null (#24027)
- Adds some additional context to workspace traffic logging
- Fails traffic tests if 0 bytes read from connection
2026-04-03 16:26:14 -05:00
Ethan f7aa46c4ba fix(scaletest/llmmock): emit Anthropic SSE event lines (#23587)
The llmmock Anthropic stream wrote each chunk as `data:` only, so
Anthropic clients never saw the named SSE events they dispatch on and
Claude responses arrived empty even though the stream completed with
HTTP 200.

Update `sendAnthropicStream()` to emit `event: <type>` and `data:
<json>` for each Anthropic chunk while leaving the OpenAI-style streams
unchanged.
2026-03-30 12:21:53 +11:00
Mathias Fredriksson 147df5c971 refactor: replace sort.Strings with slices.Sort (#23457)
The slices package provides type-safe generic replacements for the
old typed sort convenience functions. The codebase already uses
slices.Sort in 43 call sites; this finishes the migration for the
remaining 29.

- sort.Strings(x)          -> slices.Sort(x)
- sort.Float64s(x)         -> slices.Sort(x)
- sort.StringsAreSorted(x) -> slices.IsSorted(x)
2026-03-23 23:19:23 +02:00
Mathias Fredriksson c49170b6b3 fix(scaletest): handle ignored io.ReadAll error in bridge runner (#22850)
Surface the io.ReadAll error in the error message when an HTTP
request fails with a non-200 status, instead of silently
discarding it.
2026-03-23 15:58:14 +02:00
Jon Ayers f135ffdb3a fix: limit calls to GetWorkspaceAgentByID in agentapi (#23015)
We currently call GetWorkspaceAgentByID millions of times at scale
unnecessarily. This PR embeds immutable fields into the relevant
services instead of fetching for them every time.

resolves https://github.com/coder/scaletest/issues/84

Confirmed with a 10k scaletest that this changeset takes the query from
10M+ queries down to 39k
2026-03-20 15:42:05 -05:00
Spike Curtis 4fdd48b3f5 chore: randomize task status update times in load generator (#23058)
fixes https://github.com/coder/scaletest/issues/92

Randomizes the time between task status updates so that we don't send them all at the same time for load testing.
2026-03-16 12:06:29 -04:00
Callum Styan 36665e17b2 feat: add WatchAllWorkspaceBuilds endpoint for autostart scaletests (#22057)
This PR adds a `WatchAllWorkspaces` function with `watch-all-workspaces`
endpoint, which can be used to listen on a single global pubsub channel
for _all_ workspace build updates, and makes use of it in the autostart
scaletest.

This negates the need to use a workspace watch pubsub channel _per_
workspace, which has auth overhead associated with each call. This is
especially relevant in situations such as the autostart scaletest, where
we need to start/stop a set of workspaces before we can configure their
autostart config. The overhead associated with all the watch requests
skews the scaletest results and makes it harder to reason about the
performance of the autostart feature itself.

The autostart scaletest also no longer generates its own metrics nor
does it wait for all the workspaces to actually start via autostart. We
should update the scaletest dashboard after both PRs are merged to
measure autostart performance via the new metrics.



The new function/endpoint and its usage in the autostart scaletest are
gated behind an experiment feature flag, this is something we should
discuss whether we want to enable the endpoint in prod by default or
not. If so, we can remove the experiment.

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Callum Styan <callum@coder.com>
2026-03-13 20:37:41 -07:00
Mathias Fredriksson 57af7abf1f test: add testutil.WaitBuffer and replace time.Sleep in tests (#22922)
WaitBuffer is a thread-safe io.Writer that supports blocking until
accumulated output matches a substring or custom predicate. It
replaces ad-hoc safeBuffer/syncWriter types and time.Sleep-based
poll loops in tests with signal-driven waits.

- WaitFor/WaitForNth/WaitForCond for blocking on output
- Replace custom buffer types in cli/sync_test.go and
  provisionersdk/agent_test.go
- Convert time.Sleep poll loops to require.Eventually/require.Never
  in cli/ssh_test.go, coderd/activitybump_test.go,
  coderd/workspaceagentsrpc_test.go, workspaceproxy_test.go, and
  scaletest tests
2026-03-12 18:07:52 +02:00
Callum Styan c2534c19f6 feat: add codersdk constructor that uses an independent transport (#22282)
This is useful at least in the case of scaletests but potentially in
other places as well. I noticed that scaletest workspace creation
hammers a single coderd replica.
---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2026-03-10 10:33:49 -07:00
Spike Curtis fda181bb26 chore: modify task status scaletest to use Agent API dRPC (#22356)
relates to #21335

Modifies our taskstatus scaletest load generator to use the dRPC connection to mimic what an actual running Task would do via the MCP server (c.f. PRs below this one in the stack).

Disclosure: I used AI to generate large portions of this PR, but hand-reviewed and tweaked.
2026-03-04 22:12:35 +04:00
Jon Ayers 4e365e59b6 fix: add provision/tags to prebuilds scenario (#22294) 2026-02-25 11:16:20 -06:00
Danielle Maywood 92a6d6c2c0 chore: remove unnecessary loop variable captures (#22180)
Since Go 1.22, the loop variable capture issue is resolved. Variables
declared by for loops are now per-iteration rather than per-loop, making
the 'v := v' pattern unnecessary.
2026-02-19 09:02:19 +00:00
Sas Swart b9d237b42c perf: improve memory use and cpu usage for OpenAI requests handled by bridge (#21838)
Apply optimizations:
* https://github.com/openai/openai-go/pull/602
* https://github.com/coder/aibridge/pull/160

These reduce CPU time and allocation count for OpenAI `chat/completions`
and `responses` APIs, making the use of OpenAI chat models through AI
Bridge more performant.

In order to test these changes, we add scaletesting support for the
responses API.
2026-02-02 16:16:16 +02:00
Sas Swart fdd928e01c fix: set bridge scaletest generator timeout per request (#21626)
<!--

If you have used AI to produce some or all of this PR, please ensure you
have read our [AI Contribution
guidelines](https://coder.com/docs/about/contributing/AI_CONTRIBUTING)
before submitting.

-->
2026-01-22 10:56:40 +02:00
Sas Swart 0ebe8e57ad chore: add scaletesting tools for aibridge (#21279)
This pull request adds scaletesting tools for aibridge.

See
https://www.notion.so/Scale-tests-2c5d579be5928088b565d15dd8bdea41?source=copy_link
for information and instructions.

closes: https://github.com/coder/internal/issues/1156
closes: https://github.com/coder/internal/issues/1155
closes: https://github.com/coder/internal/issues/1158
2026-01-15 17:05:46 +02:00
Spike Curtis 61ae5b81ab fix: fix scaletest sdkclient duplication (#21475)
Fixes an issue introduce in #21288 

The default sdkclient created by the CLI root includes several additional http.RoundTripper wrappers to check versions and attach telemetry, so `DupClientCopyingHeaders` would break and scale tests would fail.

Instead of explicitly adding support for these additional wrappers to `DupClientCopyingHeaders` I think we should just stop unwrapping and move on. Scale tests don't need these wrapped functions.

This is a bit fragile, since it depends on the fact that the headers wrapper needs to be outermost, but that needs to be true for other uses, since things like dialing DERP do a similar thing where they unwrap and extract the auth headers. More long term this needs a refactor to make HTTP headers in the SDK a more first-class resource instead of this hacky RoundTripper wrapping, but that's for a different day.
2026-01-13 11:14:06 +04:00
Spike Curtis bddb808b25 chore: arrange imports in a standard way (#21452)
Fixes all our Go file imports to match the preferred spec that we've _mostly_ been using. For example:

```
import (
	"context"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"golang.org/x/xerrors"
	"gopkg.in/natefinch/lumberjack.v2"

	"cdr.dev/slog/v3"
	"github.com/coder/coder/v2/codersdk/agentsdk"
	"github.com/coder/serpent"
)
```

3 groups: standard library, 3rd partly libs, Coder libs.

This PR makes the change across the codebase. The PR in the stack above modifies our formatting to maintain this state of affairs, and is a separate PR so it's possible to review that one in detail.
2026-01-08 15:24:11 +04:00
Spike Curtis 49b34a716a fix: fix slog to always use array of Fields (#21426)
Upgrades to slog v3 which includes a small, but backward incompatible API change to the acceptible call arguments when logging. This change allows us to verify via compile time type checking that arguments are correct and won't cause a panic, as was possible in slog v1, which this replaces (v2 was tagged but never used in coder/coder).

It also updates dependencies that also use slog and were updated.

I've left the `aibridge` dependency as a commit SHA, under the assumption that the team there (cc @pawbana @dannykopping ) will tag and update the dependency soon and on their own schedule.

Other dependencies, I pushed new tags.
2026-01-08 10:29:41 +04:00
Spike Curtis 73253df6bf fix: use separate HTTP clients in scale test load generators (#21288)
While scale testing, I noticed that our load generators send basically
all requests to a single Coderd instance.

e.g. 


![image.png](https://app.graphite.com/user-attachments/assets/e259862a-adf1-47e7-a37b-fd14e420058e.png)


This is because our scale test commands create all `Runner`s using the
same codersdk Client, which means they share an underlying HTTP client.
With HTTP/2 a single TCP session can multiplex many different HTTP
requests (including websockets). So, it creates a single TCP connection
to a single coderd, and then sends all the requests down the one TCP
connections.

This PR modifies the `exp scaletest` load generator commands to create
an independent HTTP client per `Runner`. This means that each runner
will create its own TCP connection. This should help spread the load and
make a more realistic test, because in a real deployment, scaled out
load will be coming over different TCP connections.
2025-12-19 12:22:49 +04:00
Steven Masley 3194bcfc9e chore: distinct operations for provisioner's 'parse', 'init', 'plan', 'apply', 'graph' (#21064)
Provisioner steps broken into smaller granular actions.
Changes:
- `ExtractArchive` moved to `init` request (was in `configure`)
- Writing `tfstate` moved to `plan` (was in `configure`)
- Moved most plan/apply outputs to `GraphComplete`
2025-12-15 11:26:41 -06:00
Mathias Fredriksson 498c565fc7 test(scaletest/workspacetraffic): fix test flake due to io.EOF on close (#21231)
Fixes coder/internal#119
2025-12-12 11:36:16 +00:00
Kacper Sawicki 6d41bfad81 fix: improve http connection pooling for smtp notifications (#20605)
This change updates how SMTP notifications are polled during scale
tests.

Before, each of the ~2,000 pollers created its own http.Client, which
opened thousands of short-lived TCP connections.
Under heavy load, this ran out of available network ports and caused
errors like `connect: cannot assign requested address`

Now, all pollers share one HTTP connection pool. This prevents port
exhaustion and makes polling faster and more stable.
If a network error happens, the poller will now retry instead of
stopping, so tests keep running until all notifications are received.

The `SMTPRequestTimeout` is now applied per request using a context,
instead of being set on the `http.Client`.
2025-11-24 14:25:18 +01:00
Spike Curtis 8ee6e9457e fix: wait for build in task status load generator (#20800)
Wait for the External workspace build job to complete before attempting to pull its credentials from Coder. This resolves a race in the load generator.
2025-11-19 10:35:31 +04:00
Spike Curtis 0bbb7dd0a3 feat: add cleanup to task-status load test runner (#20799)
Implement Cleanup in the task status Runner, to delete the external workspaces created.
2025-11-19 10:24:30 +04:00
Ethan 897286f335 test: fix flake in scaletest/workspaceupdates/TestRun (#20773)
Closes https://github.com/coder/internal/issues/1127

In the workspace updates scaletest load generator, we end the test once all clients have seen a workspace update for their workspace. These workspace updates are generated when the workspace is created, NOT when the workspace build has finished. 
This means when the runner goes to clean up the workspaces, they may still be building. The runner attempts to address this by cancelling the build, but that fails with a 403 since the runner users don't have permission.

The fix is to simply always wait for the build to finish, regardless of whether we were able to successfully cancel it.
In this test it's probably faster to just wait for the build to finish then the overhead of cancelling it, so that's what I've gone with here.
2025-11-17 12:00:07 +11:00
Ethan b31d09865e feat(scaletest): add runner for prebuilds (#20571)
Relates to https://github.com/coder/internal/issues/914

Adds a runner for scaletesting prebuilds. The runner uploads a no-op template with prebuilds, watches for the corresponding workspaces to be created, and then does the same to tear them down. I didn't originally plan on having metrics for the teardown, but I figured we might as well as it's still the same prebuilds reconciliation loop mechanism being tested.
2025-11-14 17:57:22 +11:00
Spike Curtis a3c851c0e6 feat: add task status reporting load generator runner (#20538)
Adds the Runner, Config, and Metrics for the scaletest load generator for task status.

Part of https://github.com/coder/internal/issues/913
2025-11-13 16:53:02 +04:00
Kacper Sawicki 8f78baddb1 feat(scaletest): switch notification trigger from creating a user to template deletion (#20512)
This PR refactors the notification scale test to use template admins and template deletion as the notification trigger. Additionally, I've added a configurable timeout for SMTP requests.

Previously, notifications were triggered by creating/deleting a user, and notifications were received by users with the owner role. However, because of how many notifications were generated by the runners, we had too many notifications to reliably test notification delivery.
2025-10-31 09:43:06 +01:00
Ethan b90c74a94d chore(scaletest): avoid polling workspace builds during workspace-updates tests (#20534)
This PR is just committing the changes I made while running the
`workspace-updates` load generator.

It ensures we're not polling the workspace build progress in the
background (while we also watch for workspace updates via the tailnet),
and also removes an unnecessary query to `/api/v2/workspace/{id}` after
each workspace is built.
2025-10-30 12:14:25 +11:00
Spike Curtis 0f342ecc04 feat: add provisioner tags to dynamic-parameters scaletest (#20435)
Since `coder exp scaletest dynamic-parameters` ends up creating template versions, provisioner tags may be required to create the template versions.

On our own scaletest clusters, we tag every provisioner, so an untagged template version won't build and won't get imported.
2025-10-23 16:49:09 +04:00
Kacper Sawicki 1230cacf78 feat(scaletest): extend notifications runner with smtp support (#20222)
This PR extends the scaletest notification runner with SMTP support.

If the `--smtp-api-url` flag is provided, the runner will also watch for SMTP notifications using the specified URL.

#### Changes
- Added a new watcher to retrieve emails sent to the runner user  
- Tracked WebSocket and SMTP latencies separately  
- Updated metrics to include `notification_id` and `notification_type` labels  

#### CLI Flags
- `--smtp-api-url`: Address of the SMTP mock HTTP API used to retrieve email notifications  

#### Metrics
- `notification_delivery_latency_seconds` now includes:
  - `notification_id`
  - `notification_type` (`websocket` or `smtp`)
2025-10-22 12:09:35 +02:00
Kacper Sawicki 7bbeef4999 feat(cli): add mock SMTP server for testing scaletest notifications (#20221)
This PR adds a fake SMTP server for scale testing. It collects emails sent during tests, which you can then check using the HTTP API.

#### Changes
- Added mock SMTP server  
- Added `coder scaletest smtp` CLI command  
- Implemented HTTP API endpoints to retrieve messages by email  
- Added auto-purge to prevent memory issues  

#### HTTP API Endpoints
- `GET /messages?email=<email>` – Get messages sent to an email address  
- `POST /purge` – Clear all messages from memory  

The HTTP API parses raw email messages to extract the **date**, **subject**, and **notification ID**.

Notification IDs are sent in emails like this:
```html
<p>
  <a href="http://127.0.0.1:3000/settings/notifications?disabled=4e19c0ac-94e1-4532-9515-d1801aa283b2"
     style="color: #2563eb; text-decoration: none;">
    Stop receiving emails like this
  </a>
</p>
```

#### CLI
```bash
coder scaletest smtp --host localhost --port 33199 --api-port 8080 --purge-at-count 1000
```

**Flags:**
- `--host`: Host for the mock SMTP and API server (default: localhost)  
- `--port`: Port for the mock SMTP server (random if not specified)  
- `--api-port`: Port for the HTTP API server (random if not specified)  
- `--purge-at-count`: Max number of messages before auto-purging (default: 100000)
2025-10-22 11:14:49 +02:00
Spike Curtis f1d3f31c10 test: increase timeout in TestRun (scaletest/workspaceupdates) (#20211)
fixes https://github.com/coder/internal/issues/1050

Extends the timeout on the affected test. It uses a postgres database, and so 15s timeout isn't enough to not flake with our test infra these days.
2025-10-08 16:17:41 +04:00
Spike Curtis 65335bc7d4 feat: add cli command scaletest dynamic-parameters (#20034)
part of https://github.com/coder/internal/issues/912

Adds CLI command `coder exp scaletest dynamic-parameters`

I've left out the configuration of tracing and timeouts for now. I think I want to do some refactoring of the scaletest CLI to make handling those flags take up less boiler plate.

I will add tracing and timeout flags in a follow up PR.
2025-10-07 21:53:59 +04:00
Kacper Sawicki 05f8f67ced feat(scaletest): add runner for notifications delivery (#20091)
Relates to https://github.com/coder/internal/issues/910

This PR adds a scaletest runner that simulates users receiving notifications through WebSocket connections.

An instance of this notification runner does the following:

1. Creates a user (optionally with specific roles like owner).
2. Connects to /api/v2/notifications/inbox/watch via WebSocket to receive notifications in real-time.
3. Waits for all other concurrently executing runners (per the DialBarrier WaitGroup) to also connect their websockets.
4. For receiving users: Watches the WebSocket for expected notifications and records delivery latency for each notification type.
5. For regular users: Maintains WebSocket connections to simulate concurrent load while receiving users wait for notifications.
6. Waits on the ReceivingWatchBarrier to coordinate between receiving and regular users.
7. Cleans up the created user after the test completes.


Exposes three prometheus metrics:

1. notification_delivery_latency_seconds - HistogramVec. Labels = {username, notification_type}
2. notification_delivery_errors_total - CounterVec. Labels = {username, action}
3. notification_delivery_missed_total - CounterVec. Labels = {username}

The runner measures end-to-end notification latency from when a notification-triggering event occurs (e.g., user creation/deletion) to when the notification is received by a WebSocket client.
2025-10-07 09:59:15 +02:00
Ethan 2b4485575c feat(scaletest): add autostart scaletest command (#20161)
Closes https://github.com/coder/internal/issues/911
2025-10-03 20:50:21 +10:00
Ethan f84a789b40 feat(scaletest): add runner for thundering herd autostart (#19998)
Relates to https://github.com/coder/internal/issues/911

This PR adds a scaletest runner that simulates a user with a workspace configured to autostart at a specific time.

An instance of this autostart runner does the following:
1. Creates a user.
2. Creates a workspace.
3. Listens on `/api/v2/workspaces/<ws-id>/watch` to wait for build completions throughout the run.
4. Stops the workspace, waits for the build to finish.
4. Waits for all other concurrently executing runners (per the `SetupBarrier` `WaitGroup`) to also have a stopped workspace.
5. Sets the workspace autostart time to `time.Now().Add(cfg.AutoStartDelay)`
6. Waits for the autostart workspace build to begin, enter the `pending` status, and then `running`.
7. Waits for the autostart workspace build to end.

Exposes four prometheus metrics:
- `autostart_job_creation_latency_seconds` - HistogramVec. Labels = {username, workspace_name}
- `autostart_job_acquired_latency_seconds` - HistogramVec. Labels = {username, workspace_name}
- `autostart_total_latency_seconds` - `HistogramVec`. Labels = `{username, workspace_name}`.
- `autostart_errors_total` - `CounterVec`. Labels = `{username, action}`.
2025-10-02 10:45:52 +10:00
Ethan ec417ded24 refactor(scaletest): make workspacebuild return the built workspace (#19997)
Similar idea as in #19811, this runner doesn't need to conform to `Runnable`, so we have it return the workspace from the `RunReturningWorkspace` function, instead of the more fragile `Run`, followed by a `.WorkspaceID()`.
2025-09-30 18:19:31 +10:00
Ethan 65ac6cb42a feat(scaletest): add runner for coder connect load gen (#19904)
Relates to https://github.com/coder/internal/issues/889.

This PR adds a scaletest runner that simulates a single Coder Connect client receiving workspace updates.

An instance of a workspace updates runner does the following:
- Creates a user, if a session token is not supplied.
- Attempts to repeatedly dial the Coder Connect endpoint, with a configurable (two minutes by default) timeout. 
- Once dialed successfully, waits for any other concurrently executing runners to also dial successfully, or timeout (using the barrier).
- Starts a configurable number of workspace builds.
- Waits for that many workspaces to be seen over the workspace updates stream (with a configurable timeout).

Exposes two prometheus metrics:
- `workspace_updates_latency_seconds` - `HistogramVec`. Labels = `{username, num_owned_workspaces, workspace_name}`
  - This is the time between starting a workspace build, and receiving both the corresponding workspace update.
- `workspace_updates_errors_total` - `NewCounterVec`. Labels = `{username, num_owned_workspaces, action}`
  - The number of times a specific action of the runner has failed, per user/client.
2025-09-30 17:28:58 +10:00
Spike Curtis 289f0217c7 feat: add scaletest Runner for dynamicparameters load gen (#19890)
relates to https://github.com/coder/internal/issues/912

Adds a new scaletest Runner to generate dynamic parameters load.

A later PR will add the CLI command, including creating the template & version.
2025-09-25 16:18:37 +04:00
Cian Johnston a68122ca30 chore: fix Test_Runner/CleanupPendingBuild with testutil.Eventually (#19924)
Fixes https://github.com/coder/internal/issues/968
2025-09-23 16:53:53 +01:00
Ethan e13fcaf865 refactor(scaletest): support exposing arbitrary metrics on scaletest runs (#19886)
Relates to https://github.com/coder/internal/issues/889

The existing implementation for exposing read and written bytes was a little awkward - we're going to be adding a bunch of scaletest runners / load generators that *don't* transfer any bytes. This PR has the scaletest reports expose a map of arbitrary string-keyed metrics instead.

FWIW, the latest iteration of the scaletesting infrastructure doesn't parse these reports right now - they're just logged to stdout, so we're good to break the json schema here.
2025-09-22 17:45:45 +10:00
Ethan d02ff5f9b2 refactor(scaletest): add runner for creating users (#19811)
Closes https://github.com/coder/internal/issues/985

Simple refactor of the user creation logic into it's own test runner. This lets us create users independently of workspaces, for use in a bunch of load generators, including the Coder Connect load generator.

This PR creates the new runner, and has the existing `createworkspaces` runner use it.
2025-09-22 17:35:36 +10:00