Relates to CODAGT-115
Adds metric `coderd_api_websocket_probes_total`. Every successful
heartbeat for a given path will increment the metric.
Comparing this with `coderd_api_concurrent_websockets` will give an
indication of how many websocket connections are open but in a 'wedged'
state (when heartbeats stopped versus when we closed the connection).
Adds an optional dbcrypt wrapper around gitsshkeys.private_key. The
column is encrypted on insert and update through enterprise/dbcrypt when
external token encryption is configured, and decrypted on read.
A new private_key_key_id column references
dbcrypt_keys(active_key_digest) so revocation safety is enforced by the
existing foreign key. Rows with a NULL key_id stay plaintext and remain
readable. Existing plaintext rows can be backfilled by running `coder
server dbcrypt rotate`.
Generated with assistance from Coder Agents.
Several relative links in the docs pointed at pages that no longer exist
or rendered incorrectly on coder.com.
Fixes:
- `start/first-template.md`: IDE links repointed from the removed
`../ides.md` / `../ides/web-ides.md` to their current homes under
`user-guides/workspace-access/`.
- `tutorials/example-guide.md`: contributing link repointed to
`../about/contributing/documentation.md`.
- `about/contributing/backend.md`: the `migrations/testdata/fixtures`
and `full_dumps` references (and the `000024_example.up.sql` example)
used relative paths that escape `docs/` and render as bogus
`/docs/coderd/...` routes on the site. Normalized to the canonical
`github.com/coder/coder/(blob|tree)/main/...` form already used by ~120
other source links in the docs.
- Normalized extensionless directory links (`ai-coder/ai-gateway`,
`user-guides/workspace-access`, `install`) to their `/index.md` targets
for consistency with the rest of the docs.
This class of bug is invisible to the local doc checks (`make
lint/markdown` / `pnpm check-docs` only run markdownlint + table
formatting); only CI's Linkspector job validates link targets. Found via
a relative-link audit while investigating the docs preview on #25816.
Source-link version-awareness (so older docs versions don't all point at
`main`) is tracked separately in DOCS-268 and will be handled in the
coder.com render layer.
Linear: DOCS-278
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Removes the inline security advisory table and the standalone advisory
file (`0001_user_apikeys_invalidation.md`). The advisories section now
directs readers to [GitHub Security
Advisories](https://github.com/coder/coder/security/advisories).
> Generated by Coder Agents on behalf of @jdomeracki-coder
Add metrics for `aibridged` and `aibridgeproxyd`'s provider statuses. AI providers can be modified, and possibly misconfigured, at runtime. These metrics help operators understand the state of these provider definitions in case unexpected behaviour is observed.
Closes DOCS-66.
Adds a `[!NOTE]` callout to `docs/admin/monitoring/logs.md` documenting
that `--log-human=""` (empty string) does not disable human-readable
logging; the working value is `--log-human=/dev/null`.
## Context
Reported by Bjorn Robertsson in `#docs` on 2026-04-29. Operators trying
to silence the human-readable log stream had been setting `--log-human`
(or `CODER_LOGGING_HUMAN`) to an empty string and getting unchanged log
output. The empty-string path hits a 2023-vintage code path that falls
back to the default `/dev/stderr` instead of disabling output.
This PR documents the workaround on the admin-facing logs page. The CLI
flag reference under `docs/reference/cli/server.md` is auto-generated
and intentionally left unchanged. A separate engineering issue may be
worth filing to fix the root cause (empty string should either disable
or surface a warning).
> [!NOTE]
> This is a docs-only change. No product code was modified.
---
*Generated by Coder Agents on behalf of @nickvigilante.*
## Summary
Three small docs fixes:
- **`docs/admin/integrations/oauth2-provider.md`**: Replace broken
relative link to `scripts/oauth2/README.md` with an absolute GitHub URL.
The previous link escaped the `docs/` tree
(`../../../scripts/oauth2/README.md`) and does not resolve in the
published docs site.
- **`docs/install/releases/feature-stages.md`**: Point the "Coder
documentation" link to `docs/about/contributing/documentation.md`. The
previous `../../README.md` target does not exist under `docs/`.
- **`docs/manifest.json`**: Add the missing `users oidc-claims` entry
alongside the other `users` CLI subcommands so the generated reference
page (`docs/reference/cli/users_oidc-claims.md`) is reachable from the
sidebar.
## Validation
- Confirmed each new link target exists on `main`
(`docs/about/contributing/documentation.md`, `scripts/oauth2/README.md`,
`docs/reference/cli/users_oidc-claims.md`).
- Pre-commit hooks pass (`fmt/markdown`, `lint/markdown`, `lint/emdash`,
`lint/typos`, etc.).
---
_This PR was prepared by a [Coder Agents](https://coder.com/) session on
behalf of @nickvigilante. Human review requested since this is a
docs-only change._
Update the user secrets user guide, the admin security secrets
reference, and the docs manifest to label the feature as Beta instead of
Early Access, and link to the beta section of the feature stages doc.
The `multi-select` form type description in the dynamic parameters docs
incorrectly stated it renders checkboxes. The actual UI is a searchable
dropdown combobox (`MultiSelectCombobox`) with selected items shown as
removable chips.
> This PR was authored by Coder Agents on behalf of @uzair-coder07.
Adds options matching new AI Gateway naming.
New options are added as alias for old options. Old options are still
working.
Old options have deprecated message.
No conflict detection was added.
Updated documentation so it mentions only new options. Added note about
old options still working.
> Various AI tools where used to create this PR
> Mux updated this PR on behalf of Mike.
## Stack Context
This PR is the storage, permissions, API, and SDK layer for experimental
personal skills. #25362 has landed on `main`, so this branch is
restacked directly on `main`.
Stack order:
1. #25363 storage, permissions, API, and SDK
2. #25365 API test coverage
3. #25366 chattool and chatd integration
4. #25066 settings UI and docs
5. #25386 personal skills slash menu
## What?
Adds the `user_skills` database table, generated queries, RBAC resources
and scopes, audit resource handling, experimental user-scoped CRUD
endpoints, SDK types, and generated API/site types.
Follow-up review and restack fixes:
- Enforce a bounded personal skill description in parser and database
constraints.
- Return `403 Forbidden` for unauthorized create and update attempts.
- Return explicit conflict responses when soft-deleted users are
targeted.
- Keep user admins out of personal skills, while site owners can read
and delete but not create or update.
- Document trigger-raised constraint names and keep schema constants
covered by tests.
- Reuse `UserSkillMetadata` in the full `UserSkill` SDK response type.
- Generate user skill IDs in Go instead of relying on a database
default.
- Rebase on latest `main` and renumber the user skills migration to
`000502_user_skills`.
## Why?
Personal skills need durable user-owned storage with owner
authorization, limited site-owner moderation, and a hidden API surface
before chatd can consume them.
## Validation
- `make gen`
- `go test ./coderd/database -run '^TestUserSkillSchemaConstants$'
-count=1`
- `go test ./coderd/database/dbauthz -run
'^TestMethodTestSuite/TestUserSkills$' -count=1`
- `go test ./coderd -run '^TestPatchUserSkill$' -count=1`
- `go test ./codersdk ./coderd/database/db2sdk`
- `make lint`
- pre-commit hook on `97fd58108d`
## Summary
The GitHub App walkthrough in `docs/admin/external-auth/index.md` stops
after \"install the app for your organization,\" which is enough for the
admin who created the app but not for anyone else. Every other Coder
user hitting **Link GitHub** lands on a GitHub 404 (`This is not the web
page you are looking for`) because:
1. New GitHub Apps default to **\"Only on this account\"** / not public.
GitHub returns 404 from the OAuth-authorize URL for any user other than
the owner.
2. `CODER_EXTERNAL_AUTH_0_APP_INSTALL_URL` — the env var that makes
Coder render an \"Install GitHub App\" link in the UI — is undocumented
today.
This PR adds one extra step at the end of the GitHub App configuration
walkthrough covering both.
## Test plan
- [x] \`make fmt/markdown\` clean
- [x] Doc reviewer eyes
Moves the `coderd_agents_first_connection_seconds` histogram from the
polling-based `prometheusmetrics.Agents()` loop to the event-driven
`agentConnectionMonitor.init()` path. The metric is now recorded exactly
once when an agent first connects over the RPC websocket, instead of
being retroactively computed each polling tick.
The `username` and `workspace_name` labels are removed to reduce
cardinality; only `template_name` and `agent_name` are retained.
Adds unit tests covering both the happy path (first connection recorded)
and the negative-duration guard (clock skew logs a warning, no sample
emitted).
When OpenAI's Responses API returns `Previous response with id ... not
found` for a chained turn, classify it as a `ChainBroken` retry, clear
`previous_response_id`, exit chain mode, reload full history, and let
`chatretry` retry. Self-heals chats whose anchor was poisoned before
#25074 stopped truncated streams from being persisted as a successful
turn with a stored response id.
The new state is exposed via the existing
`coderd_chatd_stream_retries_total` counter as a
`chain_broken="true"|"false"` label. Aggregating queries (`sum`, `rate`
over `provider`/`model`/`kind`) keep working without changes; raw-series
matchers without aggregation will now see two series per `(provider,
model, kind)` where they previously saw one. The metric is internal-only
so the blast radius should be small, but if you have dashboards that
index by exact label matchers without aggregation they will need an
extra `sum` or an explicit `chain_broken` selector.
> 🤖 This PR was created with the help of Coder Agents, and was reviewed by a human 🧑💻
#11918 took away advanced settings during template creation however it
did not clean up the documentation of a reference to customising the
template permissions during template creation -
https://coder.com/docs/admin/templates/template-permissions
> By default the Everyone group is assigned to each template meaning any
Coder user can use the template to create a workspace. To prevent this,
disable the Allow everyone to use the template setting when creating a
template.
This setting is no longer present in Coder, so removing it from the
docs.
Adds an `[!IMPORTANT]` callout under the SCIM heading in the OIDC auth
docs noting that Coder's SCIM 2.0 implementation is not a fully
certified or guaranteed implementation of the spec. It covers common
provisioning/deprovisioning flows with major IdPs (Okta, Entra ID, etc.)
but specific attributes, endpoints, or behaviors may not be supported
and may change between releases.
This matches what we say in conversations with prospects and avoids
setting an expectation we can't always meet. Background: #15830 (current
implementation is an MVP scoped to Okta cloud; `PATCH` is not RFC 7644
compliant; user updates only change status, not groups/orgs/roles).
Companion PR: coder/coder.com#738 removes the SCIM row from the pricing
comparison.
> Generated with [Coder Agents](https://coder.com/agents)
Persists the agent-generated turn-end summary on `chats` and shows it as
the Agents sidebar subtitle when present, falling back to the model
name. Errors still take precedence.
> Mux is acting on Mike's behalf.
## What changes
**Storage.** New nullable `last_turn_summary` column on `chats`
(migration `000486`). New `UpdateChatLastTurnSummary` query normalizes
blank/whitespace input to `NULL`, preserves `updated_at` (so the chat
does not jump to the top of the sidebar on summary writes), and uses an
`expected_updated_at` stale-write guard so an older async summary cannot
overwrite a newer turn.
**Backend.** `coderd/x/chatd/chatd.go` decouples summary generation from
webpush. Generated summaries persist for completed parent turns even
when webpush is unconfigured or has no subscriptions. The same generated
text is reused as the webpush body when webpush is configured, so the
summary model is not called twice. Generic fallback push text is no
longer persisted; it clears any stale summary instead.
Error/interrupt/pending-action terminal paths clear `last_turn_summary`
for the latest turn.
**Frontend.** `AgentsSidebar.tsx` subtitle priority is now `errorReason
|| lastTurnSummary || modelName`, normalized via the existing
`asNonEmptyString` helper from `blockUtils.ts`.
## Tests
- `TestUpdateChatLastTurnSummary` (database): success,
whitespace-to-NULL, stale guard rejects, `updated_at` preserved.
- `TestUpdateLastTurnSummaryRejectsStaleWrites` (chatd internal): direct
stale-`expected_updated_at` test.
- `TestSuccessfulChatPersistsTurnSummaryWithoutWebPush`: persistence
works without webpush subscriptions.
- `TestSuccessfulChatSendsWebPushWithSummary`: same generated text
drives both DB and push body.
-
`TestSuccessfulChatSendsWebPushFallbackWithoutSummaryForEmptyAssistantText`:
fallback text is not persisted.
- `TestErroredChatClearsLastTurnSummaryAndSendsWebPush`: error path
clears the field.
- `TestInterruptChatDoesNotSendWebPushNotification`: interrupt path
clears the field, no push fires.
- `AgentsSidebar.test.tsx`: subtitle priority for summary-present,
error-wins, no-summary fallback, whitespace fallback.
- `AgentsSidebar.stories.tsx`: `ChatWithTurnSummary` and
`ChatWithTurnSummaryAndError`.
## Notes
- No backfill. Existing chats keep showing the model name until their
next turn completes.
- Parent chats only in this iteration; the field is rendered on any
`Chat` if a future change extends generation to children.
- Decoupling generation from webpush adds quickgen model calls for
completed parent turns that previously skipped generation when no
subscriptions existed. Existing parent-only, assistant-text-present,
`PushSummaryModel` configured, and bounded-timeout gates keep this
behavior bounded.
Emit user secret audit log entries for create/update/delete operations.
Reads stay un-audited, matching every other resource.
Audit log entries record changes in user secret name, environment
variable name, file path, and value. The secret value column is marked
`ActionSecret` so the diff records the change without showing the
ciphertext or plaintext.
Closes a TOCTOU window on delete to ensure no phantom audit logs for a
delete of a non-existent secret. Secret update accepts a small TOCTOU
window matching the other audited resources (templates, workspaces,
chats). The two-query pattern is wrapped in a transaction so audit state
can't leak from a failed mutation.
Adds a new "Governance Layer" section to the architecture page with
short descriptions of AI Gateway and Agent Firewall, linking to their
dedicated reference pages.
> Generated by Coder Agents
---------
Co-authored-by: Danny Kopping <danny@coder.com>
Adds production-observability metrics to coderd/x/chatd/ for
model-level correlation and a chatStreams memory-leak investigation.
- Label per-request chatd metrics (steps_total, message_count,
prompt_size_bytes, tool_result_size_bytes, ttft_seconds,
compaction_total) with `model` and enrich the per-turn logger
with provider/model.
- Add `coderd_chatd_stream_retries_total{provider, model, kind}`
counter incremented in chatloop before OnRetry.
- Register a prometheus.Collector exposing `streams_active`,
`stream_buffer_size_max`, `stream_buffer_events`,
`stream_subscribers` from p.chatStreams.
- Add `coderd_chatd_stream_buffer_dropped_total` counter,
incremented per publishToStream drop independently of the
existing log-rate-limited bufferDropCount.
- Snapshot logger/model before the title-generation goroutine to
avoid a data race with the logger/model rebind below it.
> 🤖
Adds a coder secret command group for managing user secrets from the
CLI, with create, update, list, and delete subcommands backed by the
existing user secret API.
This branch adds CLI test coverage and refreshes the generated help
output and CLI reference docs for the new command group.
_Disclaimer: produced by Claude Opus 4.6_
Adds a `coder_build_info` metric which allows operators to see which
versions of Coder are currently running.
---------
Signed-off-by: Danny Kopping <danny@coder.com>
## Summary
Add `coderd_agents_first_connection_seconds` histogram metric that
records the
duration from workspace agent creation to first connection. This fills
an
observability gap — provisioner job timings and startup script metrics
exist,
but the agent connection phase (which can take several minutes) was not
exposed
to Prometheus.
Closes https://github.com/coder/coder/issues/21282
## Changes
- **`coderd/prometheusmetrics/prometheusmetrics.go`** — Define and
register a
`HistogramVec` in the existing `Agents()` polling loop. Observe
`first_connected_at - created_at` exactly once per agent via a
deduplication
map, pruned each tick to prevent unbounded memory growth.
- **`coderd/prometheusmetrics/prometheusmetrics_test.go`** — Update
`TestAgents`
to set `first_connected_at` on the test agent and assert the histogram
is
collected with correct labels, sample count, and sample sum.
- **`docs/admin/integrations/prometheus.md`**,
**`scripts/metricsdocgen/generated_metrics`** —
Auto-generated documentation updates from `make gen`.
## Metric details
| Property | Value |
|---|---|
| Name | `coderd_agents_first_connection_seconds` |
| Type | histogram |
| Labels | `template_name`, `agent_name`, `username`, `workspace_name` |
| Buckets | 1s, 10s, 30s, 1m, 2m, 5m, 10m, 30m, 1h |
## Example PromQL
```promql
# P95 agent connection time by template
histogram_quantile(0.95,
sum(rate(coderd_agents_first_connection_seconds_bucket[1h])) by (le, template_name)
)
```
<details>
<summary>Implementation notes</summary>
### Design decisions
- **Histogram over gauge**: Enables `histogram_quantile()` for
percentile queries.
- **Observe in `Agents()` polling loop**: All required data is already
fetched by
`GetWorkspaceAgentsForMetrics()` — no new DB queries.
- **Dedup via `map[uuid.UUID]struct{}`**: Prevents re-observing the same
agent
across polling ticks. Pruned each cycle to bound memory.
- **Buckets**: Aligned with
`coderd_provisionerd_workspace_build_timings_seconds`
range (1s–1h).
### Overhead at scale (100k active workspaces)
The deduplication map (`observedFirstConnection`) and per-tick pruning
map
(`currentAgentIDs`) are both `map[[16]byte]struct{}`. At 100k agents:
- **Memory**: ~2.25 MB persistent + ~2.25 MB transient per tick = **~4.5
MB peak**.
- **CPU**: ~25 ms of map operations per tick (one tick per minute) =
**<0.05% of one core**.
Both are negligible relative to the existing cost of the `Agents()` loop
(the DB
query, per-agent `GetWorkspaceAppsByAgentID` calls, and coordinator node
lookups
dominate).
</details>
> 🤖 Generated by Coder Agents
Add dbcrypt support for user secret values. When database encryption is
enabled, secret values are transparently encrypted on write and
decrypted on read through the existing dbcrypt store wrapper.
- Wrap `CreateUserSecret`, `GetUserSecretByUserIDAndName`,
`ListUserSecretsWithValues`, and `UpdateUserSecretByUserIDAndName` in
enterprise/dbcrypt/dbcrypt.go.
- Add rotate and decrypt support for user secrets in
enterprise/dbcrypt/cliutil.go (`server dbcrypt rotate` and `server
dbcrypt decrypt`).
- Add internal tests covering encrypt-on-create, decrypt-on-read,
re-encrypt-on-update, and plaintext passthrough when no cipher is
configured.
Go's html/template has a built-in security filter (urlFilter) that only
allows http, https, and mailto URL schemes. Any other scheme gets
replaced with #ZgotmplZ.
The OAuth2 app's callback URL uses custom URI scheme which the filter
considers unsafe. For example the Coder JetBrains plugin exposes a
callback URI with the scheme jetbrains:// - which was effectively
changed by the template engine into #ZgotmplZ. Of course this is not an
actual callback. When users clicked the cancel button nothing happened.
The fix was simple - we now wrap the apps registered callback URI into
htmltemplate.URL. Usually this needs some validation otherwise the
linter will complain about it. The callback URI used by the Cancel logic
is actually validated by our backend when the client app
programmatically registered via the dynamic OAuth2 registration
endpoints, so we refactored the validation around that code and re-used
some of it in the Cancel handling to make sure we don't allow URIs like
`javascript` and `data`, even though in theory these URIs were already
validated.
In addition, while testing this PR with
https://github.com/coder/coder-jetbrains-toolbox/pull/209 I discovered
that we are also not compliant with
https://www.rfc-editor.org/rfc/rfc6749#section-4.1.2.1 which requires
the server to attach the local state if it was provided by the client in
the original request. Also it is optional but generally a good practice
to include `error_description` in the error responses. In fact we follow
this pattern for the other types of error responses. So this is not a
one off.
- resolves#20323
<img width="1485" height="771" alt="Cancel_page_with_invalid_uri"
src="https://github.com/user-attachments/assets/5539d234-9ce3-4dda-b421-d023fc9aa99e"
/>
<img width="486" height="746" alt="Coder Toolbox handling the Cancel
button"
src="https://github.com/user-attachments/assets/acab71a6-d29c-4fa9-80ba-3c0095bbdc8f"
/>
<!--
If you have used AI to produce some or all of this PR, please ensure you
have read our [AI Contribution
guidelines](https://coder.com/docs/about/contributing/AI_CONTRIBUTING)
before submitting.
-->
## Problem
The Sysbox docker-in-workspaces docs examples use `sudo dockerd &` in
`startup_script` to start Docker. This causes workspaces to report as
unhealthy because `dockerd` keeps references to stdout/stderr after the
script exits.
## Fix
Replace `sudo dockerd &` with `sudo service docker start`, which
properly daemonizes Docker through the service manager and returns
cleanly. This matches the pattern used in our [dogfood
template](https://github.com/coder/coder/blob/main/dogfood/coder/main.tf#L614).
## Validation
Created a test template and workspace on dogfood — agent reported `✔
healthy` and `docker info` confirmed the daemon running inside the
workspace.
Fixes#21166
> 🤖 This PR was created with the help of Coder Agents, and has been
reviewed by my human. 🧑💻
## What
Documents that Coder license keys are validated locally using
cryptographic signatures and do not require an outbound connection to
Coder's servers. This is a common question from customers evaluating
Coder for air-gapped environments.
## Changes
- **`docs/admin/licensing/index.md`**: Added an "Offline license
validation" section explaining that license keys are signed JWTs
validated locally with no phone-home requirement.
- **`docs/install/airgap.md`**: Added a "License validation" row to the
air-gapped comparison table, confirming no changes are needed for
offline license validation and linking to the licensing docs.
## Why
While the air-gapped docs state that "all Coder features are supported"
offline, there was no explicit mention that the license itself doesn't
require connectivity. This is a frequent question from
security-conscious and air-gapped customers.
---------
Co-authored-by: blink-so[bot] <211532188+blink-so[bot]@users.noreply.github.com>
Co-authored-by: Matyas Danter <mdanter@gmail.com>