Files
coder/.claude/docs/TESTING.md
T
Michael Suchacz 85792d08bc feat: add harness engineering layer for agent workflows (#24791)
This PR adds an opinionated harness-engineering layer for agent-driven
workflows: a small set of agent-readable docs, mechanical structure
checks, structured CI failure summaries, an architecture-lint umbrella,
and per-worktree dev-server isolation. The goal is to make local dev,
tests, and CI mechanically inspectable by agents without changing app
runtime behavior.

## What landed

**Agent docs and navigation**
- `.claude/docs/OBSERVABILITY.md`, `.claude/docs/DEV_ISOLATION.md`,
`.claude/docs/AGENT_FAILURES.md`: task-oriented guides for logs,
tracing, Prometheus, dev-server isolation, and a seeded failure catalog.
- `AGENTS.md`: added an `Agent navigation` block, then trimmed the file
from 375 to 229 lines by migrating duplicated detail into
`WORKFLOWS.md`, `GO.md`, `TESTING.md`, and `DATABASE.md`. The
user-managed custom-instructions block is preserved.
- `.agents/docs`: symlink mirror of `.claude/docs` for agent runtimes
that look under `.agents`.

**Mechanical checks**
- `scripts/check_agents_structure.sh`: validates `@...` references in
tracked `AGENTS.md` files and warns when root grows past 600 lines.
Wired as `make lint/agents` and into `make lint`.
- `scripts/audit-agent-readiness.sh`: report-first audit of harness
readiness. Currently `10 ok, 0 warn, 0 fail`.
- `scripts/check_architecture.sh` / `make lint/architecture`: umbrella
architecture-lint target. Consolidates the existing
`check_enterprise_imports.sh` and `check_codersdk_imports.sh` so they
run exactly once via the umbrella. Slot is open for new high-confidence
rules.

**Structured CI failure summaries**
- `scripts/playwright-failure-summary.sh`: parses
`site/test-results/results.json` and writes Markdown to
`$GITHUB_STEP_SUMMARY` on failure. Wired into the `test-e2e` matrix job.
- `scripts/go-test-failure-summary.sh`: parses `go test -json`
line-delimited output the same way. Wired into `test-go-pg`,
`test-go-pg-17`, and `test-go-race-pg` by injecting `gotestsum
--jsonfile` in the workflow without touching `Makefile`. JSON also
uploaded as a CI artifact on failure.
- `site/e2e/playwright.config.ts`: enables `screenshot:
only-on-failure`, `trace: retain-on-failure`, JSON reporter, and HTML
reporter alongside existing reporters.
- `.github/workflows/ci.yaml`: failure artifact uploads for Playwright
now use `if: failure()` and predictable names
(`playwright-artifacts-<variant>-<sha>`).

**Per-worktree dev-server isolation** (`scripts/develop/main.go`)
- Deterministic FNV-64a hash of the worktree path produces a port offset
in `[0, 1000)` (50 buckets, step 20 to avoid API/proxy overlap across
adjacent buckets).
- Offset is applied only to defaults; both env vars (`CODER_DEV_PORT`,
`CODER_DEV_WEB_PORT`, `CODER_DEV_PROXY_PORT`,
`CODER_DEV_PROMETHEUS_PORT`) and CLI flags retain priority.
- Hardcoded ports `9090` (embedded Prometheus UI) and `12345` (Delve)
are unchanged by design.
- Startup banner shows each port's source: `default`, `offset`, or
`explicit`.
- Unit tests in `scripts/develop/main_test.go` cover determinism,
bounds, no-overlap across the four ports, and explicit-skip behavior.
- State (`.coderv2/`) was already worktree-isolated via `os.Getwd()`, so
no state-dir changes were needed.

## Validation

`make lint/agents`, `make lint/architecture`, `make lint/emdash`, `bash
scripts/audit-agent-readiness.sh` (10 ok, 0 warn, 0 fail), `shellcheck`
on all 5 new scripts, `go test ./scripts/develop/...`, and `js-yaml`
parse of `ci.yaml` all pass. Synthetic fixtures verify both
failure-summary scripts handle empty/missing input (silent exit 0),
ANSI-stripped output, and parent/subtest formatting.

## Known follow-ups (deferred)

- Frontend Storybook/Vitest failure summary: lowest-leverage slice of
the failure-summary work. Skipping until observed pain.
- Architecture lint currently only delegates to existing import checks;
new rules (`InTx` outer-store detection, swagger-annotation lint) plug
in as needed.
- 50 port-offset buckets means two worktree paths can occasionally
collide. The DEV_ISOLATION doc tells users to set the relevant env var
when this happens.

> Mux opened this PR on Mike's behalf.
2026-05-11 17:27:29 +02:00

5.8 KiB

Testing Patterns and Best Practices

Testing Best Practices

Avoiding Race Conditions

  1. Unique Test Identifiers:

    • Never use hardcoded names in concurrent tests
    • Use time.Now().UnixNano() or similar for unique identifiers
    • Example: fmt.Sprintf("test-client-%s-%d", t.Name(), time.Now().UnixNano())
  2. Database Constraint Awareness:

    • Understand unique constraints that can cause test conflicts
    • Generate unique values for all constrained fields
    • Test name isolation prevents cross-test interference

Testing Patterns

  • Use table-driven tests for comprehensive coverage
  • Mock external dependencies
  • Test both positive and negative cases
  • Use testutil.WaitLong for timeouts in tests

Timing Issues

NEVER use time.Sleep to mitigate timing issues. If an issue seems like it should use time.Sleep, read through https://github.com/coder/quartz and specifically the README to better understand how to handle timing issues.

Test Package Naming

  • Test packages: Use package_test naming (e.g., identityprovider_test) for black-box testing

RFC Protocol Testing

Compliance Test Coverage

  1. Test all RFC-defined error codes and responses
  2. Validate proper HTTP status codes for different scenarios
  3. Test protocol-specific edge cases (URI formats, token formats, etc.)

Security Boundary Testing

  1. Test client isolation and privilege separation
  2. Verify information disclosure protections
  3. Test token security and proper invalidation

Test Organization

Test File Structure

coderd/
├── oauth2.go                    # Implementation
├── oauth2_test.go              # Main tests
├── oauth2_test_helpers.go      # Test utilities
└── oauth2_validation.go        # Validation logic

Test Categories

  1. Unit Tests: Test individual functions in isolation
  2. Integration Tests: Test API endpoints with database
  3. End-to-End Tests: Full workflow testing
  4. Race Tests: Concurrent access testing

Test Commands

Running Tests

Command Purpose
make test Run all Go tests
make test RUN=TestFunctionName Run specific test
go test -v ./path/to/package -run TestFunctionName Run test with verbose output
make test-race Run tests with Go race detector
make test-e2e Run end-to-end tests

Frontend Testing

Command Purpose
pnpm test Run frontend tests
pnpm check Run code checks

Common Testing Issues

  1. SQL type errors - Use sql.Null* types for nullable fields
  2. Race conditions in tests - Use unique identifiers instead of hardcoded names

OAuth2 Testing

  1. PKCE tests failing - Verify both authorization code storage and token exchange handle PKCE fields
  2. Resource indicator validation failing - Ensure database stores and retrieves resource parameters correctly

OAuth2 Test Scripts

  • Full suite: ./scripts/oauth2/test-mcp-oauth2.sh
  • Manual testing: ./scripts/oauth2/test-manual-flow.sh

General Issues

  1. Missing newlines - Ensure files end with newline character
  2. Package naming errors - Use package_test naming for test files
  3. Log message formatting errors - Use lowercase, descriptive messages without special characters

Systematic Testing Approach

Multi-Issue Problem Solving

When facing multiple failing tests or complex integration issues:

  1. Identify Root Causes:

    • Run failing tests individually to isolate issues
    • Use LSP tools to trace through call chains
    • Check both compilation and runtime errors
  2. Fix in Logical Order:

    • Address compilation issues first (imports, syntax)
    • Fix authorization and RBAC issues next
    • Resolve business logic and validation issues
    • Handle edge cases and race conditions last
  3. Verification Strategy:

    • Test each fix individually before moving to next issue
    • Use make lint and make gen after database changes
    • Verify RFC compliance with actual specifications
    • Run comprehensive test suites before considering complete

Test Data Management

Unique Test Data

// Good: Unique identifiers prevent conflicts
clientName := fmt.Sprintf("test-client-%s-%d", t.Name(), time.Now().UnixNano())

// Bad: Hardcoded names cause race conditions
clientName := "test-client"

Test Cleanup

func TestSomething(t *testing.T) {
    // Setup
    client := coderdtest.New(t, nil)

    // Test code here

    // Cleanup happens automatically via t.Cleanup() in coderdtest
}

Test Utilities

Common Test Patterns

// Table-driven tests
tests := []struct {
    name     string
    input    InputType
    expected OutputType
    wantErr  bool
}{
    {
        name:     "valid input",
        input:    validInput,
        expected: expectedOutput,
        wantErr:  false,
    },
    // ... more test cases
}

for _, tt := range tests {
    t.Run(tt.name, func(t *testing.T) {
        result, err := functionUnderTest(tt.input)
        if tt.wantErr {
            require.Error(t, err)
            return
        }
        require.NoError(t, err)
        require.Equal(t, tt.expected, result)
    })
}

Test Assertions

// Use testify/require for assertions
require.NoError(t, err)
require.Equal(t, expected, actual)
require.NotNil(t, result)
require.True(t, condition)

Performance Testing

Load Testing

  • Use scaletest/ directory for load testing scenarios
  • Run ./scaletest/scaletest.sh for performance testing

Benchmarking

func BenchmarkFunction(b *testing.B) {
    for i := 0; i < b.N; i++ {
        // Function call to benchmark
        _ = functionUnderTest(input)
    }
}

Run benchmarks with:

go test -bench=. -benchmem ./package/path