docs: add process priority management documentation (#22626)

This commit is contained in:
Jon Ayers
2026-03-05 14:16:29 -06:00
committed by GitHub
parent 51f298f2de
commit 25dac6e5f7
2 changed files with 159 additions and 0 deletions
@@ -0,0 +1,154 @@
# Improving Agent Resiliency
Coder's agent can automatically lower the scheduling priority
and raise the OOM (out-of-memory) kill score of user processes
so the agent itself stays alive under resource pressure.
## Prerequisites
- **Linux** — The feature is ignored on other operating systems.
- **`CAP_SYS_NICE`** — Required if the agent needs to lower
the nice value below its current value. In Kubernetes, add
it to the container's security context:
```hcl
container {
security_context {
capabilities {
add = ["CAP_SYS_NICE"]
}
}
}
```
## Environment variables
Configure the feature with environment variables in the
environment that launches the agent binary. These must be set
on the workspace container or host, not in the `coder_agent`
resource's `env` block — the agent reads them from its own
process environment at startup.
| Variable | Required | Default | Description |
|-------------------------|----------|-------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `CODER_PROC_PRIO_MGMT` | Yes | — | Set to enable the feature. The agent checks whether the variable is present, not its value — even an empty string enables it. Use `1` by convention. To disable, unset the variable entirely. |
| `CODER_PROC_OOM_SCORE` | No | Computed from agent's score | Explicit `oom_score_adj` value for child processes. Range: `-1000` to `1000`. |
| `CODER_PROC_NICE_SCORE` | No | Agent nice + 5 (capped at 19) | Explicit nice value for child processes. Range: `-20` to `19` (higher = lower priority). |
### OOM score defaults
If you do not set `CODER_PROC_OOM_SCORE`, the agent computes a
value based on its own `oom_score_adj`:
| Agent's `oom_score_adj` | Child score | Rationale |
|-------------------------|-------------|------------------------------------------------|
| Negative (< 0) | `0` | Children are treated as normal processes. |
| >= 998 | `1000` | Children get the maximum score (killed first). |
| Any other value | `998` | Children get a near-maximum score. |
The goal is for the kernel's OOM killer to target child
processes before the agent, keeping remote connectivity alive
even when a workspace runs out of memory.
### Nice score defaults
If you do not set `CODER_PROC_NICE_SCORE`, the agent sets
children to its own nice value plus 5, capped at 19. This
gives the agent more CPU scheduling priority than user
workloads.
## Example
The following Kubernetes template snippet enables process
priority management on the workspace container:
```hcl
resource "kubernetes_deployment" "workspace" {
# ... other configuration
spec {
template {
spec {
container {
name = "dev"
image = "codercom/enterprise-base:ubuntu"
env {
name = "CODER_AGENT_TOKEN"
value = coder_agent.main.token
}
env {
name = "CODER_PROC_PRIO_MGMT"
value = "1"
}
env {
name = "CODER_PROC_OOM_SCORE"
value = "10"
}
env {
name = "CODER_PROC_NICE_SCORE"
value = "1"
}
security_context {
capabilities {
add = ["CAP_SYS_NICE"]
}
}
}
}
}
}
}
```
- `CODER_PROC_OOM_SCORE=10` gives child processes a slightly
elevated OOM score while keeping them well below the maximum.
- `CODER_PROC_NICE_SCORE=1` gives children a mildly lower CPU
priority than the agent.
- `CAP_SYS_NICE` allows the agent to set nice values.
## Troubleshooting
### OOM score adjustment fails
If you see `failed to adjust oom score` in stderr but the
process still starts, the agent likely lacks permission to
write to `/proc/self/oom_score_adj`. Ensure the process is
dumpable — this is handled automatically by the agent, but
can fail if the container runtime restricts `prctl` calls.
### Nice value is not applied
If you see `failed to adjust niceness` in stderr, nice values
can only be increased (lowered in priority) without
`CAP_SYS_NICE`. If your template sets a `CODER_PROC_NICE_SCORE`
lower than the agent's current nice value, add the capability
to the container's security context.
### Environment variables leak to nested Coder agents
The agent strips all `CODER_PROC_*` variables from child
environments automatically. This prevents interference in
"Coder on Coder" development scenarios where a workspace
runs another Coder agent.
### Verifying the feature is enabled
The agent logs whether process priority management is active
at startup. Look for these lines in the agent log:
```text
"process priority management enabled"
"process priority management not enabled (linux-only)"
```
The log entry includes the `CODER_PROC_PRIO_MGMT` value and
the operating system. Check the agent log file at
`<log-dir>/coder-agent.log` or stderr output.
### Feature has no effect on macOS or Windows
Process priority management is Linux-only. Setting
`CODER_PROC_PRIO_MGMT` on other operating systems is safe
but has no effect.
+5
View File
@@ -674,6 +674,11 @@
"description": "Extend templates with containerized dev environments",
"path": "./admin/templates/extending-templates/devcontainers.md"
},
{
"title": "Improving Agent Resiliency",
"description": "Manage agent child process CPU and OOM priority",
"path": "./admin/templates/extending-templates/process-priority.md"
},
{
"title": "Process Logging",
"description": "Log workspace processes",