from @jatcod3r on Slack: > for the AWS recs on our [validated arch](https://coder.com/docs/admin/infrastructure/validated-architectures/1k-users) docs, should we be referencing customers to use non-T type instances? > Once you've exceeded EC2's [CPU credits](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-performance-instances.html) Coder starts performing poorly. > We do suggest to [scale for peak demand](https://coder.com/docs/tutorials/best-practices/scale-coder#scaling-3), so does recommending something from the [cpu](https://aws.amazon.com/ec2/instance-types/#Compute_Optimized) or [memory optimized](https://aws.amazon.com/ec2/instance-types/#Memory_Optimized) types make sense? [preview](https://coder.com/docs/@aws-ec2-arch/admin/infrastructure/validated-architectures#aws-instance-types) --------- Co-authored-by: EdwardAngert <17991901+EdwardAngert@users.noreply.github.com>
# Reference Architecture: up to 3,000 users

The 3,000 users architecture targets large-scale enterprises, possibly with
on-premises network and cloud deployments.

**Target load**: API: up to 550 RPS

**High Availability**: Typically, such scale requires a fully-managed HA
PostgreSQL service, and all Coder observability features enabled for operational
purposes.

**Observability**: Deploy monitoring solutions to gather Prometheus metrics and
visualize them with Grafana to gain detailed insights into infrastructure and
application behavior. This allows operators to respond quickly to incidents and
continuously improve the reliability and performance of the platform.

## Hardware recommendations

### Coderd nodes

| Users       | Node capacity        | Replicas               | GCP             | AWS         | Azure             |
|-------------|----------------------|------------------------|-----------------|-------------|-------------------|
| Up to 3,000 | 8 vCPU, 32 GB memory | 4 nodes, 1 coderd each | `n1-standard-4` | `m5.xlarge` | `Standard_D4s_v3` |

### Provisioner nodes

| Users       | Node capacity        | Replicas                      | GCP              | AWS          | Azure             |
|-------------|----------------------|-------------------------------|------------------|--------------|-------------------|
| Up to 3,000 | 8 vCPU, 32 GB memory | 8 nodes, 30 provisioners each | `t2d-standard-8` | `c5.2xlarge` | `Standard_D8s_v3` |

**Footnotes**:

- An external provisioner is deployed as a Kubernetes pod.
- It is strongly discouraged to run provisioner daemons on `coderd` nodes at
  this level of scale.
- Separate provisioners into different namespaces to support zero-trust or
  multi-cloud deployments.

### Workspace nodes

| Users       | Node capacity        | Replicas                      | GCP              | AWS          | Azure             |
|-------------|----------------------|-------------------------------|------------------|--------------|-------------------|
| Up to 3,000 | 8 vCPU, 32 GB memory | 256 nodes, 12 workspaces each | `t2d-standard-8` | `m5.2xlarge` | `Standard_D8s_v3` |

**Footnotes**:

- Assumes that each workspace user needs 2 GB of memory.
- Maximum number of Kubernetes workspace pods per node: 256
- As workspace nodes can be distributed between regions, on-premises networks
  and cloud areas, consider different namespaces to support zero-trust or
  multi-cloud deployments.

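
The workspace node sizing above can be sanity-checked with a few lines of arithmetic. This is a minimal sketch and not part of Coder: it assumes one running workspace per user, and the function names are hypothetical. The constants come from the table and footnotes.

```python
import math

USERS = 3000                 # target user count from the table
WORKSPACES_PER_NODE = 12     # from the "Replicas" column
NODE_MEMORY_GB = 32          # from the "Node capacity" column
MEMORY_PER_WORKSPACE_GB = 2  # from the footnote


def nodes_needed(users: int, per_node: int) -> int:
    """Smallest node count that fits one workspace per user (assumption)."""
    return math.ceil(users / per_node)


def memory_headroom_gb(node_gb: int, per_node: int, per_workspace_gb: int) -> int:
    """Node memory left over after every workspace takes its share."""
    return node_gb - per_node * per_workspace_gb


print(nodes_needed(USERS, WORKSPACES_PER_NODE))  # 250
print(memory_headroom_gb(NODE_MEMORY_GB, WORKSPACES_PER_NODE, MEMORY_PER_WORKSPACE_GB))  # 8
```

Under this assumption, 250 nodes is the minimum, so the 256 nodes in the table (3,072 workspaces) leave a small margin, and each node keeps 8 GB of memory for everything other than workspaces.
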
### Database nodes

| Users       | Node capacity        | Replicas | Storage | GCP                 | AWS             | Azure             |
|-------------|----------------------|----------|---------|---------------------|-----------------|-------------------|
| Up to 3,000 | 8 vCPU, 32 GB memory | 2 nodes  | 1.5 TB  | `db-custom-8-30720` | `db.m5.2xlarge` | `Standard_D8s_v3` |

**Footnotes**:

- Consider adding more replicas if the workspace activity is higher than 1,500
  workspace builds per day or to achieve higher RPS.

**Footnotes for AWS instance types**:

- For production deployments, we recommend using non-burstable instance types,
  such as `m5` or `c5`, instead of burstable instances, such as `t3`.
  Burstable instances can experience significant performance degradation once
  CPU credits are exhausted, leading to a poor user experience under sustained load.
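
The burstable-instance recommendation can be enforced with a small check before provisioning. This is a minimal sketch under a stated assumption: AWS burstable families are the T families (for example `t2`, `t3`, `t3a`, `t4g`), so any family name starting with `t` followed by a digit is treated as burstable. The helper `is_burstable` is hypothetical and applies to AWS names only; GCP's `t2d-standard-8` is not an AWS type and should not be passed to it.

```python
import re

# Assumption: burstable EC2 families are the T families (t2, t3, t3a, t4g),
# so a family starting with "t" followed by a digit is treated as burstable.
_BURSTABLE_FAMILY = re.compile(r"^t\d")


def is_burstable(instance_type: str) -> bool:
    """Return True for AWS T-family types such as t3.large or db.t3.medium."""
    name = instance_type.lower()
    # RDS instance classes carry a "db." prefix in front of the family.
    if name.startswith("db."):
        name = name[len("db."):]
    family = name.split(".", 1)[0]
    return bool(_BURSTABLE_FAMILY.match(family))


# AWS types from the tables above, plus one burstable type for contrast.
recommended = ["m5.xlarge", "c5.2xlarge", "m5.2xlarge", "db.m5.2xlarge"]
for instance_type in recommended + ["t3.large"]:
    print(instance_type, is_burstable(instance_type))
```

Run as written, only `t3.large` is reported as burstable; the AWS instance types listed in the tables all pass the check.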