docs: add new security doc to best practices section (#15805)

[preview](https://coder.com/docs/@bp-security/tutorials/best-practices/security-best-practices) --------- Co-authored-by: EdwardAngert <2408959-EdwardAngert@users.noreply.gitlab.com> Co-authored-by: EdwardAngert <17991901+EdwardAngert@users.noreply.github.com> Co-authored-by: Spike Curtis <spike@coder.com>
2026-06-02 20:48:20 +00:00 · 2024-12-16 15:47:38 -05:00
parent 89479cc843
commit 74f07ec9ae
6 changed files with 536 additions and 11 deletions
@@ -0,0 +1,502 @@
+# Security - best practices
+
+December 16, 2024
+
+---
+
+This best practices guide is separated into parts to help you secure aspects of
+your Coder deployment.
+
+Each section briefly introduces the threat model for one component of the
+deployment, then suggests steps or concepts to help implement security
+improvements such as authentication and encryption.
+
+As with any security guide, the steps and suggestions outlined in this document
+are not meant to be exhaustive and do not offer any guarantee.
+
+## Coder Server
+
+Coder Server is the central control plane of a Coder deployment.
+
+If the Coder Server is compromised in a security incident, it can affect every
+other part of your deployment. Even a successful read-only attack against the
+Coder Server could result in a complete compromise of the Coder deployment if
+credentials are stolen.
+
+### User authentication
+
+Configure [OIDC authentication](../../admin/users/oidc-auth.md) against your
+organization’s Identity Provider (IdP), such as Okta, to enable single sign-on
+(SSO).
+
+1. Enable and require two-factor authentication in your identity provider.
+1. Enable [IdP Sync](../../admin/users/idp-sync.md) to manage users’ roles and
+   groups in Coder.
+1. Use SCIM to automatically suspend users when they leave the organization.
+
+This allows you to manage user credentials according to your company’s central
+requirements, such as password complexity, 2FA, passkeys, and others.
+
+Using IdP sync and SCIM means that the central Identity Provider is the source
+of truth, so that when users change roles or leave, their permissions in Coder
+are automatically up to date.
+
+### Encryption in transit
+
+Place Coder behind a TLS-capable reverse proxy/load balancer and enable
+[Strict Transport Security](../../reference/cli/server.md#--strict-transport-security)
+so that connections from end users are always encrypted.
+
+Enable [TLS](../../reference/cli/server.md#--tls-address) on Coder Server and
+encrypt traffic from the reverse proxy/load balancer to Coder Server, so that
+even if an attacker gains access to your network, they cannot snoop on Coder
+Server traffic.
+
+### Encryption at rest
+
+Coder Server persists no state locally. No action is required.
+
+### Server logs and audit logs
+
+Capture the logging output of all Coder Server instances and persist them.
+
+Retain all logs for a minimum of thirty days, ideally ninety days. Filter audit
+logs (which have `msg: audit_log`) and retain them for a minimum of two years
+(ideally five years) in a secure system that resists tampering.
+
+If a security incident with Coder does occur, audit logs are invaluable in
+determining the nature and scope of the impact.
+
+## PostgreSQL
+
+PostgreSQL is the persistent datastore underlying the entire Coder deployment.
+If the database is compromised, it may leave every other part of your deployment
+vulnerable.
+
+Coder session tokens and API keys are salted and hashed, so a read-only
+compromise of the database is unlikely to allow an attacker to log into Coder.
+However, the database contains the Terraform state for all workspaces, OIDC
+tokens, and agent tokens, so it is possible that a read-only attack could enable
+lateral movement to other systems.
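As a generic illustration of why a read-only database leak does not directly yield usable tokens, this Go sketch stores only a random salt and a SHA-256 hash of a token secret, then verifies presented secrets in constant time. The `storedKey` type and its helpers are hypothetical and do not reflect Coder's actual schema or hashing scheme.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// storedKey is what a database row might hold for an API key: a random
// salt and the hash of salt+secret, never the secret itself.
type storedKey struct {
	Salt []byte
	Hash [32]byte
}

func hashSecret(salt []byte, secret string) [32]byte {
	return sha256.Sum256(append(append([]byte{}, salt...), secret...))
}

func newStoredKey(secret string) (storedKey, error) {
	salt := make([]byte, 16)
	if _, err := rand.Read(salt); err != nil {
		return storedKey{}, err
	}
	return storedKey{Salt: salt, Hash: hashSecret(salt, secret)}, nil
}

// verify recomputes the hash for a presented secret and compares it in
// constant time, so response timing does not reveal partial matches.
func (k storedKey) verify(secret string) bool {
	h := hashSecret(k.Salt, secret)
	return subtle.ConstantTimeCompare(h[:], k.Hash[:]) == 1
}

func main() {
	k, err := newStoredKey("s3cr3t-token-value")
	if err != nil {
		panic(err)
	}
	fmt.Println("stored hash:", hex.EncodeToString(k.Hash[:]))
	fmt.Println(k.verify("s3cr3t-token-value"), k.verify("guess")) // true false
}
```

A fast hash such as SHA-256 is only reasonable here because machine-generated tokens have high entropy; human-chosen passwords need a slow key-derivation function instead.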
+
+A successful attack that modifies database state could be escalated to a full
+takeover of an owner account in Coder, which could lead to a complete
+compromise of the Coder deployment.
+
+### Authentication
+
+1. Generate a strong, random password for accessing PostgreSQL and store it
+   securely.
+
+1. Use environment variables to pass the PostgreSQL URL to Coder.
+
+1. If on Kubernetes, use a Kubernetes secret to set the environment variable.
+
+### Encryption in transit
+
+Enable TLS on PostgreSQL and set `sslmode=verify-full` in your
+[postgres URL](../../reference/cli/server.md#--postgres-url) on Coder Server.
+This configures Coder Server to only establish TLS connections to PostgreSQL and
+check that the PostgreSQL server’s certificate is valid and matches the expected
+hostname.
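A startup or CI check can catch a URL that quietly downgrades this setting. The Go sketch below parses a URL-form connection string and rejects anything other than `sslmode=verify-full`; the `requireVerifyFull` helper is hypothetical and does not handle the `key=value` connection-string form.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// requireVerifyFull returns an error unless the PostgreSQL URL asks the
// client to use TLS and verify both the certificate chain and the hostname.
func requireVerifyFull(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("parse postgres URL: %w", err)
	}
	if u.Scheme != "postgres" && u.Scheme != "postgresql" {
		return fmt.Errorf("unexpected scheme %q", u.Scheme)
	}
	mode := u.Query().Get("sslmode")
	if mode != "verify-full" {
		if mode == "" {
			mode = "(unset)" // falls back to the driver's default
		}
		return errors.New("sslmode is " + mode + "; want verify-full")
	}
	return nil
}

func main() {
	for _, raw := range []string{
		"postgres://coder@db.example.com:5432/coder?sslmode=verify-full",
		"postgres://coder@db.example.com:5432/coder?sslmode=require",
		"postgres://coder@db.example.com:5432/coder",
	} {
		fmt.Println(requireVerifyFull(raw))
	}
}
```

Weaker modes are not equivalent: in libpq's terms, `require` encrypts without verifying the certificate, and `verify-ca` checks the certificate chain but not that the certificate matches the hostname.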
+
+### Encryption at rest
+
+Run PostgreSQL on servers with full disk encryption enabled and configured.
+
+Coder supports
+[encrypting some particularly sensitive data](../../admin/security/database-encryption.md)
+including OIDC tokens using an encryption key managed independently of the
+database, so even a user with full administrative privileges on the PostgreSQL
+server(s) cannot read the data without the separate key.
+
+If you use this feature:
+
+1. Generate a random encryption key and store it in a central secrets management
+   system like Vault.
+
+1. Inject the secret using an environment variable.
+
+   - If you're using Kubernetes, use a Kubernetes secret rather than including
+     the secret directly in the podspec.
+
+1. Follow your organization's policies about key rotation on a fixed schedule.
+
+   - If you suspect the key has been leaked or compromised,
+     [rotate the key immediately](../../admin/security/database-encryption.md#rotating-keys).
+
+## Provisioner daemons
+
+Provisioner daemons are deployed with credentials that give them power to make
+requests to cluster/cloud APIs.
+
+If one of those credentials is compromised, the potential severity of the
+compromise depends on the permissions granted to the credentials, but will
+almost certainly include code execution inside the cluster/cloud since the whole
+purpose of Coder is to deploy workspaces in the cluster/cloud that can run
+developer code.
+
+In addition, provisioner daemons are given access to parameters entered by end
+users, which could include sensitive data like credentials for additional
+systems.
+
+### External provisioner daemons
+
+When Coder workspaces are deployed into multiple clusters/clouds, or workspaces
+are in a different cluster/cloud than the Coder Server, use external provisioner
+daemons.
+
+Running provisioner daemons within the same cluster/cloud as the workspaces they
+provision:
+
+- Allows you to use infrastructure-provided credentials (see **Authentication**
+  below) which are typically easier to manage and have shorter lifetimes than
+  credentials issued outside the cloud/cluster.
+- Means that you don’t have to open any ingress ports on the clusters/clouds
+  that host workspaces.
+  - The external provisioner daemons dial out to Coder Server.
+  - Provisioner daemons run in the cluster, so you don’t need to expose
+    cluster/cloud APIs externally.
+- Each cloud/cluster is isolated, so a compromise of a provisioner daemon is
+  limited to a single cluster.
+
+### Authentication
+
+1. Use a [scoped key](../../admin/provisioners.md#scoped-key-recommended) to
+   authenticate the provisioner daemons with Coder. These keys can only be used
+   to authenticate provisioner daemons (not other APIs on the Coder Server).
+
+1. Store the keys securely and use environment variables to pass them to the
+   provisioner daemon.
+
+1. If on Kubernetes, use a Kubernetes secret to set the environment variable.
+
+1. Tag provisioners with identifiers for the specific cluster/cloud.
+
+   This allows your templates to target a specific cluster/cloud such as for
+   geographic proximity to the end user, or for specific features like GPUs or
+   managed services.
+
+1. Scope your keys to organizations and the specific cluster/cloud using the
+   same tags when creating the keys.
+
+   This ensures that a compromised key will not allow an attacker to gain access
+   to jobs for other clusters or organizations.
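To see why matching tags on keys and jobs limits the blast radius of a leaked key, consider the simplified model below. It is hypothetical and does not reproduce Coder's exact matching rules: a daemon holding a key may run a job only if both belong to the same organization and every tag on the job appears with the same value on the key.

```go
package main

import "fmt"

// scopedKey is a hypothetical model of a provisioner key scoped to one
// organization and a fixed set of tags.
type scopedKey struct {
	Org  string
	Tags map[string]string
}

type job struct {
	Org  string
	Tags map[string]string
}

// canAcquire reports whether a daemon holding key k may run job j in this
// simplified model.
func canAcquire(k scopedKey, j job) bool {
	if k.Org != j.Org {
		return false
	}
	for name, want := range j.Tags {
		if got, ok := k.Tags[name]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	euKey := scopedKey{Org: "platform", Tags: map[string]string{"cluster": "eu-west"}}
	fmt.Println(canAcquire(euKey, job{Org: "platform", Tags: map[string]string{"cluster": "eu-west"}})) // true
	fmt.Println(canAcquire(euKey, job{Org: "platform", Tags: map[string]string{"cluster": "us-east"}})) // false
	fmt.Println(canAcquire(euKey, job{Org: "research", Tags: map[string]string{"cluster": "eu-west"}})) // false
}
```

In this model a job with no tags would match any key in its organization, so tagging every template explicitly keeps jobs from landing on an unintended cluster.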
+
+Provisioner daemons should have access only to cluster/cloud API credentials for
+the specific cluster/cloud they are for. This ensures that compromise of one
+provisioner daemon does not compromise all clusters/clouds.
+
+Deploy the provisioner daemon to the cloud and leverage infrastructure-provided
+credentials, if available:
+
+- [Service account tokens on Kubernetes](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/)
+- [IAM roles for EC2 on AWS](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html)
+- [Attached service accounts on Google Cloud](https://cloud.google.com/iam/docs/attach-service-accounts)
+
+### Encryption in transit
+
+Enable TLS on Coder Server and ensure you use an `https://` URL to access the
+Coder Server.
+
+See the **Encryption in transit** subheading of the
+[Templates](#workspace-templates) section for more about encrypting
+cluster/cloud API calls.
+
+### Encryption at rest
+
+Run provisioner daemons only on systems with full disk encryption enabled.
+
+- Provisioner daemons temporarily persist Terraform template files and resource
+  state to disk. Either of these could contain sensitive information, including
+  credentials.
+
+  This temporary state is on disk only while actively building workspaces, but
+  an attacker who compromises physical disks could successfully read this
+  information if not encrypted.
+
+- Provisioner daemons store cached copies of Terraform provider binaries. These
+  are generally not sensitive in terms of confidentiality, but it is important
+  to maintain their integrity. An attacker that can modify these binaries could
+  inject malicious code.
+
+## Workspace proxies
+
+Workspace proxies authenticate end users and then proxy network traffic to
+workspaces.
+
+Coder takes care to ensure the user credentials processed by workspace proxies
+are scoped to application access and do not grant full access to the Coder API
+on behalf of the user. Still, a fully compromised workspace proxy would be in a
+privileged position to phish unrestricted user credentials.
+
+Workspace proxies have unrestricted access to establish encrypted tunnels to
+workspaces and can access any port on any running workspace.
+
+### Authentication
+
+1. Securely store the workspace proxy token generated by
+   [`coder wsproxy create`](../../admin/networking/workspace-proxies.md#step-1-create-the-proxy).
+
+1. Inject the token to the workspace proxy process via an environment variable,
+   rather than via an argument.
+
+1. If on Kubernetes, use a Kubernetes secret to set the environment variable.
+
+### Encryption in transit
+
+Enable TLS on Coder Server and ensure you use an `https://` URL to access the
+Coder Server.
+
+Communication to the proxied workspace applications is always encrypted with
+WireGuard. No action is required.
+
+### Encryption at rest
+
+Workspace proxies persist no state locally. No action is required.
+
+## Workspace templates
+
+Coder templates are executed on provisioner daemons and can include arbitrary
+code via the
+[local-exec provisioner](https://developer.hashicorp.com/terraform/language/resources/provisioners/local-exec).
+
+Furthermore, Coder templates are designed to provision compute resources in one
+or more clusters/clouds, and template authors are generally in full control over
+code and scripts executed by the Coder agent in those compute resources.
+
+This means that template admins have remote code execution privileges for any
+provisioner daemons in their organization and within any cluster/cloud those
+provisioner daemons are credentialed to access.
+
+Template admin is a powerful, highly-trusted role that you should not assign
+lightly. Instead of directly assigning the role to anyone who might need to edit
+a template, use [GitOps](#gitops) to allow users to author and edit templates.
+
+### Secrets
+
+Never include credentials or any other secrets directly in templates, including
+in `.tfvars` or other files uploaded with the template.
+
+Instead, do one of the following:
+
+- Store secrets in a central secrets manager.
+
+  - Access the secrets at build time via a Terraform provider.
+
+    This can be through
+    [Vault](https://registry.terraform.io/providers/hashicorp/vault/latest/docs)
+    or
+    [AWS Secrets Manager](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/secretsmanager_secret).
+
+- Place secrets in `TF_VAR_*` environment variables.
+
+  - Provide the secrets to the relevant provisioner daemons and access them via
+    Terraform variables with `sensitive = true`.
+
+- Use Coder parameters to accept secrets from end users at build time.
+
+Coder does not attempt to obscure the contents of template files from users
+authorized to view and edit templates, so secrets included directly could
+inadvertently appear on screen while template authors do their work.
+
+Template versions are persisted indefinitely in the PostgreSQL database, so if
+secrets are inadvertently included, they should be revoked as soon as practical.
+Pushing a new template version does not expunge them from the database. Contact
+support if you need assistance expunging any particularly sensitive data.
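A pre-push check in CI can catch the most obvious mistakes before a template version is persisted. The Go sketch below flags lines that look like AWS access key IDs or PEM private keys. The two patterns and the `scanTemplate` helper are illustrative only; a dedicated secret scanner covers far more credential formats.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// A deliberately small set of patterns for illustration.
var secretPatterns = []struct {
	kind string
	re   *regexp.Regexp
}{
	{"AWS access key ID", regexp.MustCompile(`\bAKIA[0-9A-Z]{16}\b`)},
	{"PEM private key", regexp.MustCompile(`-----BEGIN [A-Z ]*PRIVATE KEY-----`)},
}

type finding struct {
	line int
	kind string
}

// scanTemplate reports lines of a template file (for example main.tf or a
// .tfvars file) that match any pattern.
func scanTemplate(content string) []finding {
	var out []finding
	sc := bufio.NewScanner(strings.NewReader(content))
	for n := 1; sc.Scan(); n++ {
		for _, p := range secretPatterns {
			if p.re.MatchString(sc.Text()) {
				out = append(out, finding{line: n, kind: p.kind})
			}
		}
	}
	return out
}

func main() {
	tf := `variable "region" { default = "us-east-1" }
access_key = "AKIAIOSFODNN7EXAMPLE"
secret_key = var.aws_secret_key`
	findings := scanTemplate(tf)
	for _, f := range findings {
		fmt.Printf("line %d: possible %s\n", f.line, f.kind)
	}
	if len(findings) > 0 {
		fmt.Println("refusing to push template")
	}
}
```

Note that the third line passes: referencing a Terraform variable is exactly the pattern the list above recommends.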
+
+### Encryption in transit
+
+Always use encrypted transport to access any infrastructure APIs. Crucially,
+this protects confidentiality of the credentials used to access the APIs.
+
+Configuration of this depends on the specific Terraform providers in use and is
+beyond the scope of this document.
+
+### Encryption at rest
+
+While your most privileged secrets should never be included in template files,
+templates are still likely to contain confidential or sensitive data about your
+operations and/or infrastructure.
+
+- Ensure that operators who write, review or modify Coder templates are working
+  on laptops/workstations with full disk encryption, or do their work inside a
+  Coder workspace with full disk encryption.
+- Ensure [PostgreSQL](#postgresql) is encrypted at rest.
+- Ensure any [source code repositories that store templates](#gitops) are
+  encrypted at rest and have appropriate access controls.
+
+### GitOps
+
+GitOps is the practice of using a Git repository as the source of truth for
+operational config and reconciling the config in Git with operational systems
+each time the `main` (or, archaically, `master`) branch of the repository is
+updated.
+
+1. Store Coder templates in a single Git repository, or a single repository per
+   Coder organization, and use the
+   [Coderd Terraform provider](https://registry.terraform.io/providers/coder/coderd/latest/docs/resources/template)
+   to push changes from the main branch to Coder using a CI/CD tool.
+
+   This gives you an easily browsable, auditable history of template changes and
+   who made them. Coder audit logs record who made changes and when, but Git
+   repositories are particularly handy for analyzing exactly what changed in
+   each template.
+
+1. Use a Coder user account exclusively for the purpose of pushing template
+   changes and do not give any human users the credentials.
+
+   This ensures any actions taken by the account correspond exactly to CI/CD
+   actions from the repository and allows you to avoid granting the template
+   admin role widely in your organization.
+
+1. Use
+   [GitHub branch protection](https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/managing-protected-branches/about-protected-branches),
+   or the equivalent for your source repository to enforce code review of
+   changes to templates.
+
+   Code review increases the chance that someone will catch a potential security
+   bug in your template.
+
+These protections also mitigate the risk of a single trusted insider “going
+rogue” and acting unilaterally to maliciously modify Coder templates.
+
+## Workspaces
+
+The central purpose of Coder is to give end users access to managed compute in
+clusters/clouds designated by Coder’s operators (like platform or developer
+experience teams). End users are granted shell access and from there can execute
+arbitrary commands.
+
+This means that end users have remote code execution privileges within the
+clusters/clouds that host Coder workspaces.
+
+It is important to limit Coder users to trusted insiders and/or take steps to
+constrain malicious activity that could be undertaken from a Coder workspace.
+
+Example constraints include:
+
+- Network policy or segmentation
+- Runtime protections on the workspace host (e.g. SELinux)
+- Limiting privileges of the account or role assigned to the workspace such as a
+  service account on Kubernetes, or IAM role on public clouds
+- Monitoring and/or auditing for suspicious activity such as cryptomining or
+  exfiltration
+
+### Outbound network access
+
+Identify network assets like production systems or highly confidential
+datastores and configure the network to limit access from Coder workspaces.
+
+If production systems or confidential data reside in the same cluster/cloud, use
+separate node pools and network boundaries.
+
+If extraordinary access is required, follow
+[Zero Trust](https://en.wikipedia.org/wiki/Zero_trust_security_model)
+principles:
+
+- Authenticate the user and the workspace using strong cryptography
+- Apply strict authorization controls
+- Audit access in a tamper-resistant secure store
+
+Consider the network assets end users will need to do their job and the level of
+trust the company has with them. In-house, full-time employees typically warrant
+different access than temporary contractors or third-party service providers.
+Restrict access as appropriate.
+
+A non-exhaustive list of network assets to consider:
+
+- Access to the public Internet
+  - If end users will access the workspace over the public Internet, you must
+    allow outbound access to establish the encrypted tunnels.
+- Access to internal corporate networks
+  - If end users will access the workspace over the corporate network, you must
+    allow outbound access to establish the encrypted tunnels.
+- Access to staging or production systems
+- Access to confidential data (e.g. payment processing data, health records,
+  personally identifiable information)
+- Access to other clusters/clouds
+
+### Inbound network access
+
+Coder manages inbound network access to your workspaces via a set of
+WireGuard-encrypted tunnels. These tunnels are established by sending outbound
+packets, so on stateful firewalls, disable inbound connections to workspaces to
+ensure inbound connections are handled exclusively by the encrypted tunnels.
+
+#### DERP
+
+[DERP](https://tailscale.com/kb/1232/derp-servers) is a relay protocol developed
+by Tailscale.
+
+Coder Server and workspace proxies include a DERP service by default. Tailscale
+also runs a set of globally distributed public DERP servers.
+
+All DERP messages are end-to-end encrypted, so the DERP service only learns the
+(public) IP addresses of the participants.
+
+If you consider these addresses or the fact that pairs of them communicate over
+DERP to be sensitive, stick to the Coder-provided DERP services which run on
+your own infrastructure. If not, feel free to configure Tailscale DERP servers
+for global coverage.
+
+#### STUN
+
+[STUN](https://en.wikipedia.org/wiki/STUN) is an IETF standard protocol that
+allows network endpoints behind NAT to learn their public address and port
+mappings. It is an essential component of Coder’s networking to enable encrypted
+tunnels to be established without a relay for best performance.
+
+Coder does not ship with a STUN service because it needs to be run directly
+connected to the network, not behind a reverse proxy or load balancer as Coder
+usually is.
+
+STUN messages are not encrypted, but they do not carry any tunneled data; they
+simply query the public address and port. As such, a STUN service learns the
+public address and port (that is, the address and port on the NAT device) of
+Coder workspaces and, if STUN is configured for them, of end users' devices.
+
+Unlike DERP, it doesn’t definitively learn about communicating pairs of IPs.
+
+If you consider the public IP and port information to be sensitive, do not use
+public STUN servers.
+
+You may choose not to configure any STUN servers, in which case most workspace
+traffic will need to be relayed via DERP. You may also deploy your own STUN
+servers, either on the public Internet or on your corporate network, and
+[configure Coder to use them](../../reference/cli/server.md#--derp-server-stun-addresses).
+
+If you do not consider the addresses and ports to be sensitive, we recommend
+using the default set of STUN servers operated by Google.
+
+#### Workspace apps
+
+Coder workspace apps are a way to allow users to access web applications running
+in the workspace via the Coder Server or Workspace Proxy.
+
+1. [Disable workspace apps on sub-paths](../../reference/cli/server.md#--disable-path-apps)
+   of the main Coder domain name.
+
+1. [Use a separate, wildcard domain name](../../admin/setup/index.md#wildcard-access-url)
+   for forwarding.
+
+   Because of the default
+   [same-origin policy](https://en.wikipedia.org/wiki/Same-origin_policy) in
+   browsers, serving web apps on the main Coder domain would allow those apps to
+   send API requests to the Coder Server, authenticated as the logged-in user
+   without their explicit consent.
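The Go sketch below makes the difference concrete by computing the scheme, host, and port triple that browsers compare under the same-origin policy. The dashboard and app URLs are hypothetical; the point is that an app served under a path of the Coder domain shares an origin with the API, while an app on a separate wildcard subdomain does not.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// origin returns the scheme://host:port triple browsers use for the
// same-origin policy, filling in default ports for http and https.
func origin(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	port := u.Port()
	if port == "" {
		switch u.Scheme {
		case "https":
			port = "443"
		case "http":
			port = "80"
		}
	}
	return strings.ToLower(u.Scheme) + "://" + strings.ToLower(u.Hostname()) + ":" + port, nil
}

func sameOrigin(a, b string) (bool, error) {
	oa, err := origin(a)
	if err != nil {
		return false, err
	}
	ob, err := origin(b)
	if err != nil {
		return false, err
	}
	return oa == ob, nil
}

func main() {
	api := "https://coder.example.com/api/v2/users/me"
	for _, app := range []string{
		"https://coder.example.com/@alice/dev.main/apps/code-server/", // path-based app
		"https://code-server--main--dev--alice.apps.example.com/",    // subdomain app
	} {
		same, err := sameOrigin(api, app)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("same origin as API: %-5v %s\n", same, app)
	}
}
```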
+
+#### Port sharing
+
+Coder can allow users to designate specific network ports on their workspace as
+shared, which allows others to access those ports via the Coder Server.
+
+Consider restricting the maximum sharing level for workspaces in the settings of
+the corresponding template.
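As an illustration of what a maximum sharing level enforces, the Go sketch below orders sharing levels from least to most exposed and rejects any request above the template's limit. The level names mirror the owner, authenticated, and public options as we understand them; treat the `shareLevel` type and the `allowShare` helper as hypothetical, and note that your Coder version may offer additional levels.

```go
package main

import "fmt"

// shareLevel models port sharing levels from least to most exposed.
type shareLevel int

const (
	levelOwner shareLevel = iota
	levelAuthenticated
	levelPublic
)

func (l shareLevel) String() string {
	switch l {
	case levelOwner:
		return "owner"
	case levelAuthenticated:
		return "authenticated"
	case levelPublic:
		return "public"
	}
	return fmt.Sprintf("shareLevel(%d)", int(l))
}

// allowShare rejects a request to share a port more widely than the
// template's configured maximum.
func allowShare(requested, templateMax shareLevel) error {
	if requested > templateMax {
		return fmt.Errorf("sharing level %s exceeds template maximum %s", requested, templateMax)
	}
	return nil
}

func main() {
	templateMax := levelAuthenticated
	fmt.Println(allowShare(levelAuthenticated, templateMax)) // <nil>
	fmt.Println(allowShare(levelPublic, templateMax))        // sharing level public exceeds template maximum authenticated
}
```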
+
+### Encryption at rest
+
+Deploy Coder workspaces using full disk encryption for all volumes.
+
+This mitigates attempts to recover sensitive data in the workspace by attackers
+who gain physical access to the disk(s).