DRAFT: I'd like feedback on this approach for 1k before I give the others the same treatment and add a 10k document.
- Bumps database requirements to 8 vCPU, 30 GB memory. In our testing, the database was nearly always the bottleneck. (This could come back down again with improvements to how we use it.)
- Removes specific machine type recommendations.
  - Machine types only apply to VM-based deployments, and many of our customers use Kubernetes.
  - The major clouds upgrade their machine tiers, so our recommendations go out of date.
  - In their place we just give CPU and memory requirements.
- Removes API requests per second
  - It's not a metric that many operators will know until they are already operating
  - Our API requests vary wildly in cost depending on what they are
- Replaces them with Users | Running Workspaces | Concurrent Builds, which represent our scale testing scenarios and are easier for operators to reason about.
- Removes specific advice about workspace sizing, instead gives the minimum specs for the agent
- Gives Kubernetes resource requests/limits in notes
- Adds advice about not needing high performance disks for Coderd, but that provisioners will benefit.
Removes references to adding database replicas from the scaling docs, as Coder only allows a single connection URL. These passages were added in error.
## Add Dynamic Parameters test procedure to 10k users validated architecture
This PR adds a new test procedure for Dynamic Parameters to the 10k users validated architecture documentation. No changes to the recommended hardware specs as this test case succeeded with no issues.
Adds a new document for our ongoing efforts to achieve 10k user scale. The content is caveated as work in progress, but represents what we have tested so far.
closes: https://github.com/coder/internal/issues/1025
We've successfully migrated the latest iteration of our scaletest
infrastructure (`scaletest/terraform/action`) to
https://github.com/coder/scaletest (private repo). This PR removes the
older iterations, and the scripts for spinning up & running the load
generators against that infrastructure (`scaletest.sh`). The tooling for
generating load against a Coder deployment remains untouched, as does
the public documentation for that tooling (i.e. `coder exp scaletest`).
If we ever need that old scaletest Terraform code, it's always in the
git history!
Enhances the Performance efficiency section in the validated
architectures documentation with specific instance type recommendations
for AWS, Azure, and GCP.
**Changes:**
- Added recommended instance types for small, medium, and large
deployments across all three major cloud providers
- Included guidance on avoiding burstable instances (t-family, B-series)
for production workloads
- Added note about CPU baseline limitations for burstable instances
This addresses customer questions about appropriate database instance
sizing.
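
As a rough illustration of the burstable-instance guidance above, the following sketch flags instance type names that look like AWS t-family or Azure B-series. The name patterns and the `is_burstable` helper are assumptions made for this example; they are not part of the documentation change and are not an exhaustive or official list of burstable types.

```python
import re

# Assumed naming patterns for burstable instances (illustrative only).
_BURSTABLE_PATTERNS = [
    # AWS t-family, with an optional RDS "db." prefix, e.g. t3.medium, db.t4g.large
    re.compile(r"^(db\.)?t\d[a-z]*\.", re.IGNORECASE),
    # Azure B-series, e.g. Standard_B2s
    re.compile(r"^Standard_B\d", re.IGNORECASE),
]


def is_burstable(instance_type: str) -> bool:
    """Return True if the name matches one of the assumed burstable patterns."""
    name = instance_type.strip()
    return any(pattern.search(name) for pattern in _BURSTABLE_PATTERNS)


if __name__ == "__main__":
    for candidate in ["t3.medium", "db.t4g.large", "Standard_B2s", "m5.xlarge"]:
        verdict = "avoid for production" if is_burstable(candidate) else "ok"
        print(f"{candidate}: {verdict}")
```

A check like this could run in a pre-deployment review script; anything it flags would still need a human to confirm against the cloud provider's current instance catalog.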
---------
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
Co-authored-by: blink-so[bot] <211532188+blink-so[bot]@users.noreply.github.com>
Co-authored-by: dannykopping <373762+dannykopping@users.noreply.github.com>
Co-authored-by: Danny Kopping <dannykopping@gmail.com>