Files
coder/.claude/docs/AGENT_FAILURES.md
T
Michael Suchacz bb8c40e764 feat: stream go test failure summary and drop raw json artifact (#25146)
This follows up on
https://github.com/coder/coder/actions/runs/25684936801/job/75406131184?pr=25139
by replacing the large raw Go test JSON artifact with inline structured
summaries and a compact failures-only artifact.

## What changed

- Added `scripts/gotestsummary`, a streaming Go tool that reads
gotestsum JSON and renders failed tests as Markdown.
- Updated the three Go test jobs to publish per-test `<details>`
sections in the job summary.
- Removed upload of the raw `go-test.json` artifact.
- Added upload of `go-test-failures-*.ndjson` with compact failure
records for deeper inspection.
- Deleted the old bash and `jq` summary script.

## Why

- The previous raw artifact was about 35 MB compressed and 445 MB raw in
the linked run.
- Passing-test output made the artifact noisy and slow to inspect.
- The old summary truncated output to 600 characters.
- The new path keeps streaming, bounded output and writes structured
diagnostics for only final failed tests.

## Validation

- `gofmt -w scripts/gotestsummary`
- `gofmt -l scripts/gotestsummary`
- `go test ./scripts/gotestsummary/...`
- `go vet ./scripts/gotestsummary/...`
- `grep -rn 'go-test-failure-summary.sh' . || true`
- `grep -rn 'go-test-failure-summary.sh\|go-test.json\|go-test-json-'
.claude .agents docs AGENTS.md || true`
- `make lint/agents`
- `make lint/emdash`
- `make lint/markdown`
- `make lint/shellcheck`
- `git diff --check origin/main..HEAD`

> This PR was prepared by Mux working on Mike's behalf.
2026-05-12 00:08:37 +02:00

142 lines
7.1 KiB
Markdown

# Agent Failure Catalog
Use this catalog for repeatable agent failures. Keep each entry short,
actionable, and tied to existing docs or tools. Use the exact entry format
shown below when adding new failures.
```markdown
## Symptom: <short description>
- Likely cause:
- How to reproduce:
- How to diagnose:
- Existing docs or tools:
- Missing harness piece:
- Proposed prevention:
```
## Symptom: Stale generated DB code after SQL changes
- Likely cause: A query or migration changed without running `make gen`.
- How to reproduce: Modify `coderd/database/queries/*.sql` and run tests or
builds without regenerating `coderd/database/queries.sql.go` and related
generated files.
- How to diagnose: Check `git diff` for SQL changes without generated Go
changes. Run `make gen` and inspect the resulting diff.
- Existing docs or tools: `AGENTS.md`, [Database Development Patterns](DATABASE.md),
and the `make gen` target.
- Missing harness piece: No preflight doc checklist currently points agents at
generated DB drift before they run unrelated checks.
- Proposed prevention: Always run `make gen` after database query or migration
edits, then include the generated diff in the same commit.
## Symptom: Missing audit table updates
- Likely cause: A database schema change affects audited data but
`enterprise/audit/table.go` was not updated.
- How to reproduce: Add or change a table that audit logging expects, run
`make gen`, and observe audit-related generation or test failures.
- How to diagnose: Inspect the `make gen` failure, then compare the changed
database tables with `enterprise/audit/table.go`.
- Existing docs or tools: `AGENTS.md`, [Database Development Patterns](DATABASE.md),
and `make gen`.
- Missing harness piece: Agents need a failure catalog entry that connects
generation failures to audit table maintenance.
- Proposed prevention: After database changes, run `make gen`, update
`enterprise/audit/table.go` when generation reports audit drift, and rerun
`make gen`.
## Symptom: Playwright failure without artifacts
- Likely cause: The failing run did not preserve screenshots, traces, videos,
browser console output, or the Playwright report path.
- How to reproduce: Run a Playwright test from `site` with
`pnpm playwright:test`, let it fail, and discard the generated output before
reporting the failure.
- How to diagnose: Check `site/e2e/playwright.config.ts`, `site/e2e/README.md`,
and the terminal output for the report or `test-results` location.
- Existing docs or tools: [Frontend Development Guidelines](../../site/AGENTS.md),
`site/e2e/README.md`, and `pnpm playwright:test`.
- Missing harness piece: No central checklist tells agents which browser
artifacts must be attached to a failure report.
- Proposed prevention: Capture the Playwright report path, screenshot, trace,
video, browser console output, and command output before retrying or cleaning
the workspace.
## Symptom: Go test failure without preserved diagnostics
- Likely cause: The failing CI job summary or compact failures artifact was
discarded before reporting or retrying the failure.
- How to reproduce: Let a Go test job fail in CI, then report the failure using
only the final job status instead of the job summary and artifacts.
- How to diagnose: Open the failed Go test job summary for the inline failure
table and per-test details. Download `go-test-failures-*.ndjson` for deeper
inspection of the compact failures-only records.
- Existing docs or tools: `.github/workflows/ci.yaml` Go test jobs and
`scripts/gotestsummary`.
- Missing harness piece: Agents need a central reminder to preserve the small
Go test diagnostics artifact instead of the old raw test log.
- Proposed prevention: Attach or summarize the inline job summary and preserve
`go-test-failures-*.ndjson` when reporting CI Go test failures.
## Symptom: Port collision across worktrees
- Likely cause: Multiple worktrees use the same default develop ports.
- How to reproduce: Start `./scripts/develop.sh` in one worktree, then start it
in another worktree without overriding ports.
- How to diagnose: Look for `port <n> is already in use` or conflict errors in
the develop output. Check listeners with `lsof -iTCP:<port> -sTCP:LISTEN`.
- Existing docs or tools: [Development Isolation Guide for Agents](DEV_ISOLATION.md)
and `scripts/develop/main.go`.
- Missing harness piece: There is no automatic per-worktree port allocator.
- Proposed prevention: Assign each worktree a unique `CODER_DEV_PORT`,
`CODER_DEV_WEB_PORT`, `CODER_DEV_PROXY_PORT`, and
`CODER_DEV_PROMETHEUS_PORT` before starting the app.
## Symptom: Test using `time.Sleep`
- Likely cause: A test waits for time to pass instead of synchronizing on a
deterministic condition or using the quartz clock.
- How to reproduce: Add a test that depends on `time.Sleep`, then run it under
load or with the race detector until it flakes.
- How to diagnose: Search the test diff for `time.Sleep`. Inspect whether the
code under test can use `quartz` or another explicit synchronization point.
- Existing docs or tools: `AGENTS.md`, [Testing Patterns and Best Practices](TESTING.md),
and the quartz README referenced from `AGENTS.md`.
- Missing harness piece: Agents need a failure entry that labels sleep-based
waiting as a flake risk before review.
- Proposed prevention: Replace `time.Sleep` with a fake clock, trapped ticker,
channel, poll with timeout, or another deterministic signal.
## Symptom: DB work inside `InTx` uses the outer store
- Likely cause: Code inside a transaction closure calls `api.Database`, `p.db`,
or a helper that uses the outer store instead of the `tx` handle.
- How to reproduce: Add DB work inside `db.InTx(...)` that calls back into the
outer store, then exercise it under concurrent load.
- How to diagnose: Inspect the closure and helper call graph for database calls
that do not use the transaction handle. Look for pool waits, idle in
transaction symptoms, or deadlocks under load.
- Existing docs or tools: `AGENTS.md`, [Database Development Patterns](DATABASE.md),
and code review of `InTx` closures.
- Missing harness piece: No automated check currently proves every helper used
inside `InTx` stays on the transaction handle.
- Proposed prevention: Fetch read-only inputs before opening the transaction,
pass `tx` into helpers that need DB access, and avoid receiver helpers that
hide outer-store usage.
## Symptom: New API endpoint missing swagger annotations
- Likely cause: A handler or route was added without matching swagger comments.
- How to reproduce: Add a stable HTTP endpoint and skip `@Summary`, `@Router`,
or related annotations.
- How to diagnose: Compare the new handler with nearby handlers and inspect
generated API docs for the route.
- Existing docs or tools: `AGENTS.md`, [Documentation Style Guide](DOCS_STYLE_GUIDE.md),
and API generation checks.
- Missing harness piece: Agents need a doc reminder that endpoint work includes
docs unless the route is intentionally experimental.
- Proposed prevention: Add swagger annotations in the same change as stable
endpoints. For experimental or unstable API paths, add
`// @x-apidocgen {"skip": true}` after `@Router`.