mirror of https://github.com/coder/coder.git synced 2026-06-02 20:48:20 +00:00

Files

T

Michael Suchacz 85792d08bc feat: add harness engineering layer for agent workflows (#24791 )

This PR adds an opinionated harness-engineering layer for agent-driven
workflows: a small set of agent-readable docs, mechanical structure
checks, structured CI failure summaries, an architecture-lint umbrella,
and per-worktree dev-server isolation. The goal is to make local dev,
tests, and CI mechanically inspectable by agents without changing app
runtime behavior.

## What landed

**Agent docs and navigation**
- `.claude/docs/OBSERVABILITY.md`, `.claude/docs/DEV_ISOLATION.md`,
`.claude/docs/AGENT_FAILURES.md`: task-oriented guides for logs,
tracing, Prometheus, dev-server isolation, and a seeded failure catalog.
- `AGENTS.md`: added an `Agent navigation` block, then trimmed the file
from 375 to 229 lines by migrating duplicated detail into
`WORKFLOWS.md`, `GO.md`, `TESTING.md`, and `DATABASE.md`. The
user-managed custom-instructions block is preserved.
- `.agents/docs`: symlink mirror of `.claude/docs` for agent runtimes
that look under `.agents`.

**Mechanical checks**
- `scripts/check_agents_structure.sh`: validates `@...` references in
tracked `AGENTS.md` files and warns when root grows past 600 lines.
Wired as `make lint/agents` and into `make lint`.
- `scripts/audit-agent-readiness.sh`: report-first audit of harness
readiness. Currently `10 ok, 0 warn, 0 fail`.
- `scripts/check_architecture.sh` / `make lint/architecture`: umbrella
architecture-lint target. Consolidates the existing
`check_enterprise_imports.sh` and `check_codersdk_imports.sh` so they
run exactly once via the umbrella. Slot is open for new high-confidence
rules.

**Structured CI failure summaries**
- `scripts/playwright-failure-summary.sh`: parses
`site/test-results/results.json` and writes Markdown to
`$GITHUB_STEP_SUMMARY` on failure. Wired into the `test-e2e` matrix job.
- `scripts/go-test-failure-summary.sh`: parses `go test -json`
line-delimited output the same way. Wired into `test-go-pg`,
`test-go-pg-17`, and `test-go-race-pg` by injecting `gotestsum
--jsonfile` in the workflow without touching `Makefile`. JSON also
uploaded as a CI artifact on failure.
- `site/e2e/playwright.config.ts`: enables `screenshot:
only-on-failure`, `trace: retain-on-failure`, JSON reporter, and HTML
reporter alongside existing reporters.
- `.github/workflows/ci.yaml`: failure artifact uploads for Playwright
now use `if: failure()` and predictable names
(`playwright-artifacts-<variant>-<sha>`).

**Per-worktree dev-server isolation** (`scripts/develop/main.go`)
- Deterministic FNV-64a hash of the worktree path produces a port offset
in `[0, 1000)` (50 buckets, step 20 to avoid API/proxy overlap across
adjacent buckets).
- Offset is applied only to defaults; both env vars (`CODER_DEV_PORT`,
`CODER_DEV_WEB_PORT`, `CODER_DEV_PROXY_PORT`,
`CODER_DEV_PROMETHEUS_PORT`) and CLI flags retain priority.
- Hardcoded ports `9090` (embedded Prometheus UI) and `12345` (Delve)
are unchanged by design.
- Startup banner shows each port's source: `default`, `offset`, or
`explicit`.
- Unit tests in `scripts/develop/main_test.go` cover determinism,
bounds, no-overlap across the four ports, and explicit-skip behavior.
- State (`.coderv2/`) was already worktree-isolated via `os.Getwd()`, so
no state-dir changes were needed.

## Validation

`make lint/agents`, `make lint/architecture`, `make lint/emdash`, `bash
scripts/audit-agent-readiness.sh` (10 ok, 0 warn, 0 fail), `shellcheck`
on all 5 new scripts, `go test ./scripts/develop/...`, and `js-yaml`
parse of `ci.yaml` all pass. Synthetic fixtures verify both
failure-summary scripts handle empty/missing input (silent exit 0),
ANSI-stripped output, and parent/subtest formatting.

## Known follow-ups (deferred)

- Frontend Storybook/Vitest failure summary: lowest-leverage slice of
the failure-summary work. Skipping until observed pain.
- Architecture lint currently only delegates to existing import checks;
new rules (`InTx` outer-store detection, swagger-annotation lint) plug
in as needed.
- 50 port-offset buckets means two worktree paths can occasionally
collide. The DEV_ISOLATION doc tells users to set the relevant env var
when this happens.

> Mux opened this PR on Mike's behalf.

2026-05-11 17:27:29 +02:00

5.8 KiB

Raw Blame History

Testing Patterns and Best Practices

Testing Best Practices

Avoiding Race Conditions

Unique Test Identifiers:
- Never use hardcoded names in concurrent tests
- Use time.Now().UnixNano() or similar for unique identifiers
- Example: fmt.Sprintf("test-client-%s-%d", t.Name(), time.Now().UnixNano())
Database Constraint Awareness:
- Understand unique constraints that can cause test conflicts
- Generate unique values for all constrained fields
- Test name isolation prevents cross-test interference

Testing Patterns

Use table-driven tests for comprehensive coverage
Mock external dependencies
Test both positive and negative cases
Use testutil.WaitLong for timeouts in tests

Timing Issues

NEVER use time.Sleep to mitigate timing issues. If an issue seems like it should use time.Sleep, read through https://github.com/coder/quartz and specifically the README to better understand how to handle timing issues.

Test Package Naming

Test packages: Use package_test naming (e.g., identityprovider_test) for black-box testing

RFC Protocol Testing

Compliance Test Coverage

Test all RFC-defined error codes and responses
Validate proper HTTP status codes for different scenarios
Test protocol-specific edge cases (URI formats, token formats, etc.)

Security Boundary Testing

Test client isolation and privilege separation
Verify information disclosure protections
Test token security and proper invalidation

Test Organization

Test File Structure

coderd/
├── oauth2.go                    # Implementation
├── oauth2_test.go              # Main tests
├── oauth2_test_helpers.go      # Test utilities
└── oauth2_validation.go        # Validation logic

Test Categories

Unit Tests: Test individual functions in isolation
Integration Tests: Test API endpoints with database
End-to-End Tests: Full workflow testing
Race Tests: Concurrent access testing

Test Commands

Running Tests

Command	Purpose
`make test`	Run all Go tests
`make test RUN=TestFunctionName`	Run specific test
`go test -v ./path/to/package -run TestFunctionName`	Run test with verbose output
`make test-race`	Run tests with Go race detector
`make test-e2e`	Run end-to-end tests

Frontend Testing

Command	Purpose
`pnpm test`	Run frontend tests
`pnpm check`	Run code checks

Common Testing Issues

SQL type errors - Use sql.Null* types for nullable fields
Race conditions in tests - Use unique identifiers instead of hardcoded names

OAuth2 Testing

PKCE tests failing - Verify both authorization code storage and token exchange handle PKCE fields
Resource indicator validation failing - Ensure database stores and retrieves resource parameters correctly

OAuth2 Test Scripts

Full suite: ./scripts/oauth2/test-mcp-oauth2.sh
Manual testing: ./scripts/oauth2/test-manual-flow.sh

General Issues

Missing newlines - Ensure files end with newline character
Package naming errors - Use package_test naming for test files
Log message formatting errors - Use lowercase, descriptive messages without special characters

Systematic Testing Approach

Multi-Issue Problem Solving

When facing multiple failing tests or complex integration issues:

Identify Root Causes:
- Run failing tests individually to isolate issues
- Use LSP tools to trace through call chains
- Check both compilation and runtime errors
Fix in Logical Order:
- Address compilation issues first (imports, syntax)
- Fix authorization and RBAC issues next
- Resolve business logic and validation issues
- Handle edge cases and race conditions last
Verification Strategy:
- Test each fix individually before moving to next issue
- Use make lint and make gen after database changes
- Verify RFC compliance with actual specifications
- Run comprehensive test suites before considering complete

Test Data Management

Unique Test Data

// Good: Unique identifiers prevent conflicts
clientName := fmt.Sprintf("test-client-%s-%d", t.Name(), time.Now().UnixNano())

// Bad: Hardcoded names cause race conditions
clientName := "test-client"

Test Cleanup

func TestSomething(t *testing.T) {
    // Setup
    client := coderdtest.New(t, nil)

    // Test code here

    // Cleanup happens automatically via t.Cleanup() in coderdtest
}

Test Utilities

Common Test Patterns

// Table-driven tests
tests := []struct {
    name     string
    input    InputType
    expected OutputType
    wantErr  bool
}{
    {
        name:     "valid input",
        input:    validInput,
        expected: expectedOutput,
        wantErr:  false,
    },
    // ... more test cases
}

for _, tt := range tests {
    t.Run(tt.name, func(t *testing.T) {
        result, err := functionUnderTest(tt.input)
        if tt.wantErr {
            require.Error(t, err)
            return
        }
        require.NoError(t, err)
        require.Equal(t, tt.expected, result)
    })
}

Test Assertions

// Use testify/require for assertions
require.NoError(t, err)
require.Equal(t, expected, actual)
require.NotNil(t, result)
require.True(t, condition)

Performance Testing

Load Testing

Use scaletest/ directory for load testing scenarios
Run ./scaletest/scaletest.sh for performance testing

Benchmarking

func BenchmarkFunction(b *testing.B) {
    for i := 0; i < b.N; i++ {
        // Function call to benchmark
        _ = functionUnderTest(input)
    }
}

Run benchmarks with:

go test -bench=. -benchmem ./package/path

5.8 KiB Raw Blame History