feat: add harness engineering layer for agent workflows (#24791)

This PR adds an opinionated harness-engineering layer for agent-driven
workflows: a small set of agent-readable docs, mechanical structure
checks, structured CI failure summaries, an architecture-lint umbrella,
and per-worktree dev-server isolation. The goal is to make local dev,
tests, and CI mechanically inspectable by agents without changing app
runtime behavior.

## What landed

**Agent docs and navigation**
- `.claude/docs/OBSERVABILITY.md`, `.claude/docs/DEV_ISOLATION.md`,
`.claude/docs/AGENT_FAILURES.md`: task-oriented guides for logs,
tracing, Prometheus, dev-server isolation, and a seeded failure catalog.
- `AGENTS.md`: added an `Agent navigation` block, then trimmed the file
from 375 to 229 lines by migrating duplicated detail into
`WORKFLOWS.md`, `GO.md`, `TESTING.md`, and `DATABASE.md`. The
user-managed custom-instructions block is preserved.
- `.agents/docs`: symlink mirror of `.claude/docs` for agent runtimes
that look under `.agents`.

**Mechanical checks**
- `scripts/check_agents_structure.sh`: validates `@...` references in
tracked `AGENTS.md` files and warns when root grows past 600 lines.
Wired as `make lint/agents` and into `make lint`.
- `scripts/audit-agent-readiness.sh`: report-first audit of harness
readiness. Currently `10 ok, 0 warn, 0 fail`.
- `scripts/check_architecture.sh` / `make lint/architecture`: umbrella
architecture-lint target. Consolidates the existing
`check_enterprise_imports.sh` and `check_codersdk_imports.sh` so they
run exactly once via the umbrella. Slot is open for new high-confidence
rules.

**Structured CI failure summaries**
- `scripts/playwright-failure-summary.sh`: parses
`site/test-results/results.json` and writes Markdown to
`$GITHUB_STEP_SUMMARY` on failure. Wired into the `test-e2e` matrix job.
- `scripts/go-test-failure-summary.sh`: parses `go test -json`
line-delimited output the same way. Wired into `test-go-pg`,
`test-go-pg-17`, and `test-go-race-pg` by injecting `gotestsum
--jsonfile` in the workflow without touching `Makefile`. The JSON file is
also uploaded as a CI artifact on failure.
- `site/e2e/playwright.config.ts`: enables `screenshot:
only-on-failure`, `trace: retain-on-failure`, JSON reporter, and HTML
reporter alongside existing reporters.
- `.github/workflows/ci.yaml`: failure artifact uploads for Playwright
now use `if: failure()` and predictable names
(`playwright-artifacts-<variant>-<sha>`).

**Per-worktree dev-server isolation** (`scripts/develop/main.go`)
- Deterministic FNV-64a hash of the worktree path produces a port offset
in `[0, 1000)` (50 buckets, step 20 to avoid API/proxy overlap across
adjacent buckets).
- Offset is applied only to defaults; explicit env vars (`CODER_DEV_PORT`,
`CODER_DEV_WEB_PORT`, `CODER_DEV_PROXY_PORT`,
`CODER_DEV_PROMETHEUS_PORT`) and CLI flags both retain priority.
- Hardcoded ports `9090` (embedded Prometheus UI) and `12345` (Delve)
are unchanged by design.
- Startup banner shows each port's source: `default`, `offset`, or
`explicit`.
- Unit tests in `scripts/develop/main_test.go` cover determinism,
bounds, no-overlap across the four ports, and explicit-skip behavior.
- State (`.coderv2/`) was already worktree-isolated via `os.Getwd()`, so
no state-dir changes were needed.
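The offset derivation described above can be sketched as follows. This is an illustration under stated assumptions, not the code in `scripts/develop/main.go`: the name `portOffset` and the choice to pick the bucket as the hash modulo the bucket count are hypothetical. The 50 buckets, the step of 20, and FNV-64a come from the description.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const (
	portBuckets = 50 // number of distinct offsets
	portStep    = 20 // gap between adjacent buckets
)

// portOffset maps a worktree path to a deterministic offset in [0, 1000).
// Assumption: the bucket is the FNV-64a hash modulo the bucket count.
func portOffset(worktreePath string) int {
	h := fnv.New64a()
	_, _ = h.Write([]byte(worktreePath)) // hash.Hash.Write never returns an error
	return int(h.Sum64()%portBuckets) * portStep
}

func main() {
	// The same path always yields the same offset; different paths
	// usually, but not always, land in different buckets.
	for _, p := range []string{"/home/dev/coder", "/home/dev/coder-feature"} {
		fmt.Printf("%s -> +%d\n", p, portOffset(p))
	}
}
```

With only 50 buckets, two worktree paths can map to the same offset, which is why explicit env vars and CLI flags keep priority over the derived default.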

## Validation

`make lint/agents`, `make lint/architecture`, `make lint/emdash`, `bash
scripts/audit-agent-readiness.sh` (10 ok, 0 warn, 0 fail), `shellcheck`
on all 5 new scripts, `go test ./scripts/develop/...`, and `js-yaml`
parse of `ci.yaml` all pass. Synthetic fixtures verify both
failure-summary scripts handle empty/missing input (silent exit 0),
ANSI-stripped output, and parent/subtest formatting.

## Known follow-ups (deferred)

- Frontend Storybook/Vitest failure summary: lowest-leverage slice of
the failure-summary work. Deferred until it causes observed pain.
- Architecture lint currently only delegates to existing import checks;
new rules (`InTx` outer-store detection, swagger-annotation lint) plug
in as needed.
- 50 port-offset buckets means two worktree paths can occasionally
collide. The DEV_ISOLATION doc tells users to set the relevant env var
when this happens.

> Mux opened this PR on Mike's behalf.
Michael Suchacz
2026-05-11 17:27:29 +02:00
committed by GitHub
parent 915956460a
commit 85792d08bc
20 changed files with 1561 additions and 237 deletions
+60 -194
@@ -3,6 +3,15 @@
You are an experienced, pragmatic software engineer. You don't over-engineer a solution when a simple one is possible.
Rule #1: If you want exception to ANY rule, YOU MUST STOP and get explicit permission first. BREAKING THE LETTER OR SPIRIT OF THE RULES IS FAILURE.
## Agent navigation
- Day-to-day: Start with [Development Workflows and Guidelines](.claude/docs/WORKFLOWS.md) for dev servers, git workflow, hooks, and routine checks.
- Observability and isolation: Use [Observability Guide for Agents](.claude/docs/OBSERVABILITY.md) for logs, tracing, and metrics, and [Development Isolation Guide for Agents](.claude/docs/DEV_ISOLATION.md) for ports, state, readiness, and cleanup.
- Failures: Use [Agent Failure Catalog](.claude/docs/AGENT_FAILURES.md) for repeatable failure formats and seeded diagnostics.
- Language and area docs: Use [Modern Go](.claude/docs/GO.md), [Testing Patterns and Best Practices](.claude/docs/TESTING.md), [Database Development Patterns](.claude/docs/DATABASE.md), [OAuth2 Development Guide](.claude/docs/OAUTH2.md), [Coder Architecture](.claude/docs/ARCHITECTURE.md), [Troubleshooting Guide](.claude/docs/TROUBLESHOOTING.md), [Documentation Style Guide](.claude/docs/DOCS_STYLE_GUIDE.md), and [Pull Request Description Style Guide](.claude/docs/PR_STYLE_GUIDE.md) when that area is in scope.
- Compatibility: `.agents/docs` symlinks to `.claude/docs` for agent runtimes that look there.
- Frontend: Read [Frontend Development Guidelines](site/AGENTS.md) before changing anything under `site/`.
## Foundational rules
- Doing it right is better than doing it fast. You are not in a rush. NEVER skip steps or take shortcuts.
@@ -60,82 +69,33 @@ Only pause to ask for confirmation when:
## Critical Patterns
### Database Changes (ALWAYS FOLLOW)
Detailed workflow and topic guidance lives in the imported docs. Keep root
instructions focused on guardrails that agents should see immediately.
1. Modify `coderd/database/queries/*.sql` files
2. Run `make gen`
3. If audit errors: update `enterprise/audit/table.go`
4. Run `make gen` again
### LSP Navigation (USE FIRST)
#### Go LSP (for backend code)
- **Find definitions**: `mcp__go-language-server__definition symbolName`
- **Find references**: `mcp__go-language-server__references symbolName`
- **Get type info**: `mcp__go-language-server__hover filePath line column`
- **Rename symbol**: `mcp__go-language-server__rename_symbol filePath line column newName`
#### TypeScript LSP (for frontend code in site/)
- **Find definitions**: `mcp__typescript-language-server__definition symbolName`
- **Find references**: `mcp__typescript-language-server__references symbolName`
- **Get type info**: `mcp__typescript-language-server__hover filePath line column`
- **Rename symbol**: `mcp__typescript-language-server__rename_symbol filePath line column newName`
### OAuth2 Error Handling
```go
// OAuth2-compliant error responses
writeOAuth2Error(ctx, rw, http.StatusBadRequest, "invalid_grant", "description")
```
### Authorization Context
```go
// Public endpoints needing system access
app, err := api.Database.GetOAuth2ProviderAppByClientID(dbauthz.AsSystemRestricted(ctx), clientID)
// Authenticated endpoints with user context
app, err := api.Database.GetOAuth2ProviderAppByClientID(ctx, clientID)
```
### API Design
- Add swagger annotations when introducing new HTTP endpoints. Do this in
the same change as the handler so the docs do not get missed before
release.
- For user-scoped or resource-scoped routes, prefer path parameters over
query parameters when that matches existing route patterns.
- For experimental or unstable API paths, skip public doc generation with
`// @x-apidocgen {"skip": true}` after the `@Router` annotation. This
keeps them out of the published API reference until they stabilize.
### Database Query Naming
- Use `ByX` when `X` is the lookup or filter column.
- Use `PerX` or `GroupedByX` when `X` is the aggregation or grouping
dimension.
- Avoid `ByX` names for grouped queries.
### Database-to-SDK Conversions
- Extract explicit db-to-SDK conversion helpers instead of inlining large
conversion blocks inside handlers.
- Keep nullable-field handling, type coercion, and response shaping in the
converter so handlers stay focused on request flow and authorization.
### Transactions and `InTx`
- Inside `db.InTx(...)` closures, do not use the outer store (`api.Database`,
`p.db`, etc.) directly or indirectly. Use the `tx` handle for DB work inside
the closure, or fetch read-only inputs before opening the transaction.
- Watch for helper methods on a receiver that hide outer-store access. A call
like `p.someHelper(ctx)` is still unsafe inside `InTx` if that helper uses
`p.db` internally.
- Using the outer store while a transaction is open can hold one connection and
then block on another pool checkout, which can cause pool starvation and
`idle in transaction` incidents under load.
- **Database changes**: Follow
[Database Development Patterns](.claude/docs/DATABASE.md). Modify
`coderd/database/queries/*.sql`, run `make gen`, update
`enterprise/audit/table.go` for audit errors, then run `make gen` again.
- **LSP navigation**: Use LSP tools first. See
[Modern Go](.claude/docs/GO.md) for Go LSP and
[Frontend Development Guidelines](site/AGENTS.md) for TypeScript LSP.
- **OAuth2 and authorization**: Follow
[OAuth2 Development Guide](.claude/docs/OAUTH2.md). OAuth2 endpoints must
use RFC-compliant errors such as `writeOAuth2Error(...)`, and public
endpoints that need system access should use `dbauthz.AsSystemRestricted`.
- **API design**: Follow the API guardrails in
[Development Workflows and Guidelines](.claude/docs/WORKFLOWS.md),
including swagger annotations for new public HTTP endpoints.
- **Transactions and conversions**: Keep `InTx` work on the transaction
handle, and prefer explicit db-to-SDK converters. See
[Database Development Patterns](.claude/docs/DATABASE.md).
- **Testing**: Follow
[Testing Patterns and Best Practices](.claude/docs/TESTING.md). Use unique
identifiers in concurrent tests and do not use `time.Sleep` to mitigate
timing issues.
- **Frontend**: Read [Frontend Development Guidelines](site/AGENTS.md)
before changing anything under `site/`. Reuse shared UI primitives when
possible and prefer Storybook stories for component and page testing.
## Quick Reference
@@ -143,61 +103,26 @@ app, err := api.Database.GetOAuth2ProviderAppByClientID(ctx, clientID)
### Git Hooks (MANDATORY - DO NOT SKIP)
**You MUST install and use the git hooks. NEVER bypass them with
`--no-verify`. Skipping hooks wastes CI cycles and is unacceptable.**
You MUST install and use the git hooks. NEVER bypass them with
`--no-verify`. Skipping hooks wastes CI cycles and is unacceptable.
The first run will be slow as caches warm up. Consecutive runs are
**significantly faster** (often 10x) thanks to Go build cache,
generated file timestamps, and warm node_modules. This is NOT a
reason to skip them. Wait for hooks to complete before proceeding,
no matter how long they take.
The first run can be slow while caches warm up. Wait for hooks to complete,
even when `git commit` or `git push` appears to hang.
```sh
git config core.hooksPath scripts/githooks
```
Two hooks run automatically:
- **pre-commit**: Classifies staged files by type and runs either
the full `make pre-commit` or the lightweight `make pre-commit-light`
depending on whether Go, TypeScript, SQL, proto, or Makefile
changes are present. Falls back to the full target when
`CODER_HOOK_RUN_ALL=1` is set. A markdown-only commit takes
seconds; a Go change takes several minutes.
- **pre-push**: Classifies changed files (vs remote branch or
merge-base) and runs `make pre-push` when Go, TypeScript, SQL,
proto, or Makefile changes are detected. Skips tests entirely
for lightweight changes. Allowlisted in
`scripts/githooks/pre-push`. Runs only for developers who opt
in. Falls back to `make pre-push` when the diff range can't
be determined or `CODER_HOOK_RUN_ALL=1` is set. Allow at least
15 minutes for a full run.
`git commit` and `git push` will appear to hang while hooks run.
This is normal. Do not interrupt, retry, or reduce the timeout.
NEVER run `git config core.hooksPath` to change or disable hooks.
If a hook fails, fix the issue and retry. Do not work around the
failure by skipping the hook.
See [Development Workflows and Guidelines](.claude/docs/WORKFLOWS.md) for
hook setup, pre-commit behavior, pre-push behavior, and failure handling.
### Git Workflow
When working on existing PRs, check out the branch first:
```sh
git fetch origin
git checkout branch-name
git pull origin branch-name
```
Don't use `git push --force` unless explicitly requested.
When working on existing PRs, check out the branch first. See
[Development Workflows and Guidelines](.claude/docs/WORKFLOWS.md) for the
full workflow. Don't use `git push --force` unless explicitly requested.
### New Feature Checklist
- [ ] Run `git pull` to ensure latest code
- [ ] Check if feature touches database - you'll need migrations
- [ ] Check if feature touches audit logs - update `enterprise/audit/table.go`
See [Development Workflows and Guidelines](.claude/docs/WORKFLOWS.md) for
the new feature checklist, including `git pull`, database migration checks,
and audit table checks.
## Architecture
@@ -206,23 +131,6 @@ Don't use `git push --force` unless explicitly requested.
- **Agents**: Workspace services (SSH, port forwarding)
- **Database**: PostgreSQL with `dbauthz` authorization
## Testing
### Race Condition Prevention
- Use unique identifiers: `fmt.Sprintf("test-client-%s-%d", t.Name(), time.Now().UnixNano())`
- Never use hardcoded names in concurrent tests
### OAuth2 Testing
- Full suite: `./scripts/oauth2/test-mcp-oauth2.sh`
- Manual testing: `./scripts/oauth2/test-manual-flow.sh`
### Timing Issues
NEVER use `time.Sleep` to mitigate timing issues. If an issue
seems like it should use `time.Sleep`, read through https://github.com/coder/quartz and specifically the [README](https://github.com/coder/quartz/blob/main/README.md) to better understand how to handle timing issues.
## Code Style
### Detailed guidelines in imported WORKFLOWS.md
@@ -250,38 +158,11 @@ seems like it should use `time.Sleep`, read through https://github.com/coder/qua
`renderHook()` that do not require DOM assertions, and query/cache
operations with no rendered output.
### Writing Comments
### Writing Comments and Avoiding Unnecessary Changes
Code comments should be clear, well-formatted, and add meaningful context.
**Proper sentence structure**: Comments are sentences and should end with
periods or other appropriate punctuation. This improves readability and
maintains professional code standards.
**Explain why, not what**: Good comments explain the reasoning behind code
rather than describing what the code does. The code itself should be
self-documenting through clear naming and structure. Focus your comments on
non-obvious decisions, edge cases, or business logic that isn't immediately
apparent from reading the implementation.
**Line length and wrapping**: Keep comment lines to 80 characters wide
(including the comment prefix like `//` or `#`). When a comment spans multiple
lines, wrap it naturally at word boundaries rather than writing one sentence
per line. This creates more readable, paragraph-like blocks of documentation.
```go
// Good: Explains the rationale with proper sentence structure.
// We need a custom timeout here because workspace builds can take several
// minutes on slow networks, and the default 30s timeout causes false
// failures during initial template imports.
ctx, cancel := context.WithTimeout(ctx, 5*time.Minute)
// Bad: Describes what the code does without punctuation or wrapping
// Set a custom timeout
// Workspace builds can take a long time
// Default timeout is too short
ctx, cancel := context.WithTimeout(ctx, 5*time.Minute)
```
See [Modern Go](.claude/docs/GO.md) for comment formatting and the rule to
avoid unrelated edits. Preserve existing comments that explain non-obvious
behavior unless the task directly requires changing them.
### No Emdash or Endash
@@ -299,21 +180,6 @@ caught by `make lint/emdash`.
// This is slow, so we should cache it.
```
### Avoid Unnecessary Changes
When fixing a bug or adding a feature, don't modify code unrelated to your
task. Unnecessary changes make PRs harder to review and can introduce
regressions.
**Don't reword existing comments or code** unless the change is directly
motivated by your task. Rewording comments to be shorter or "cleaner" wastes
reviewer time and clutters the diff.
**Don't delete existing comments** that explain non-obvious behavior. These
comments preserve important context about why code works a certain way.
**When adding tests for new behavior**, read existing tests first to understand what's covered. Add new cases for uncovered behavior. Edit existing tests as needed, but don't change what they verify.
## Detailed Development Guides
@.claude/docs/ARCHITECTURE.md
@@ -330,18 +196,18 @@ manually before starting work:
**Always read:**
- `.claude/docs/WORKFLOWS.md` dev server, git workflow, hooks
- `.claude/docs/WORKFLOWS.md` - dev server, git workflow, hooks
**Read when relevant to your task:**
- `.claude/docs/GO.md` Go patterns and modern Go usage (any Go changes)
- `.claude/docs/TESTING.md` testing patterns, race conditions (any test changes)
- `.claude/docs/DATABASE.md` migrations, SQLC, audit table (any DB changes)
- `.claude/docs/ARCHITECTURE.md` system overview (orientation or architecture work)
- `.claude/docs/PR_STYLE_GUIDE.md` PR description format (when writing PRs)
- `.claude/docs/OAUTH2.md` OAuth2 and RFC compliance (when touching auth)
- `.claude/docs/TROUBLESHOOTING.md` common failures and fixes (when stuck)
- `.claude/docs/DOCS_STYLE_GUIDE.md` docs conventions (when writing `docs/`)
- `.claude/docs/GO.md` - Go patterns and modern Go usage (any Go changes)
- `.claude/docs/TESTING.md` - testing patterns, race conditions (any test changes)
- `.claude/docs/DATABASE.md` - migrations, SQLC, audit table (any DB changes)
- `.claude/docs/ARCHITECTURE.md` - system overview (orientation or architecture work)
- `.claude/docs/PR_STYLE_GUIDE.md` - PR description format (when writing PRs)
- `.claude/docs/OAUTH2.md` - OAuth2 and RFC compliance (when touching auth)
- `.claude/docs/TROUBLESHOOTING.md` - common failures and fixes (when stuck)
- `.claude/docs/DOCS_STYLE_GUIDE.md` - docs conventions (when writing `docs/`)
**For frontend work**, also read `site/AGENTS.md` before making any changes
in `site/`.