diff --git a/.agents/docs b/.agents/docs new file mode 120000 index 0000000000..daf0269c61 --- /dev/null +++ b/.agents/docs @@ -0,0 +1 @@ +../.claude/docs \ No newline at end of file diff --git a/.claude/docs/AGENT_FAILURES.md b/.claude/docs/AGENT_FAILURES.md new file mode 100644 index 0000000000..62e651f2a8 --- /dev/null +++ b/.claude/docs/AGENT_FAILURES.md @@ -0,0 +1,125 @@ +# Agent Failure Catalog + +Use this catalog for repeatable agent failures. Keep each entry short, +actionable, and tied to existing docs or tools. Use the exact entry format +shown below when adding new failures. + +```markdown +## Symptom: + +- Likely cause: +- How to reproduce: +- How to diagnose: +- Existing docs or tools: +- Missing harness piece: +- Proposed prevention: +``` + +## Symptom: Stale generated DB code after SQL changes + +- Likely cause: A query or migration changed without running `make gen`. +- How to reproduce: Modify `coderd/database/queries/*.sql` and run tests or + builds without regenerating `coderd/database/queries.sql.go` and related + generated files. +- How to diagnose: Check `git diff` for SQL changes without generated Go + changes. Run `make gen` and inspect the resulting diff. +- Existing docs or tools: `AGENTS.md`, [Database Development Patterns](DATABASE.md), + and the `make gen` target. +- Missing harness piece: No preflight doc checklist currently points agents at + generated DB drift before they run unrelated checks. +- Proposed prevention: Always run `make gen` after database query or migration + edits, then include the generated diff in the same commit. + +## Symptom: Missing audit table updates + +- Likely cause: A database schema change affects audited data but + `enterprise/audit/table.go` was not updated. +- How to reproduce: Add or change a table that audit logging expects, run + `make gen`, and observe audit-related generation or test failures. +- How to diagnose: Inspect the `make gen` failure, then compare the changed + database tables with `enterprise/audit/table.go`. +- Existing docs or tools: `AGENTS.md`, [Database Development Patterns](DATABASE.md), + and `make gen`. +- Missing harness piece: Agents need a failure catalog entry that connects + generation failures to audit table maintenance. +- Proposed prevention: After database changes, run `make gen`, update + `enterprise/audit/table.go` when generation reports audit drift, and rerun + `make gen`. + +## Symptom: Playwright failure without artifacts + +- Likely cause: The failing run did not preserve screenshots, traces, videos, + browser console output, or the Playwright report path. +- How to reproduce: Run a Playwright test from `site` with + `pnpm playwright:test`, let it fail, and discard the generated output before + reporting the failure. +- How to diagnose: Check `site/e2e/playwright.config.ts`, `site/e2e/README.md`, + and the terminal output for the report or `test-results` location. +- Existing docs or tools: [Frontend Development Guidelines](../../site/AGENTS.md), + `site/e2e/README.md`, and `pnpm playwright:test`. +- Missing harness piece: No central checklist tells agents which browser + artifacts must be attached to a failure report. +- Proposed prevention: Capture the Playwright report path, screenshot, trace, + video, browser console output, and command output before retrying or cleaning + the workspace. + +## Symptom: Port collision across worktrees + +- Likely cause: Multiple worktrees use the same default develop ports. +- How to reproduce: Start `./scripts/develop.sh` in one worktree, then start it + in another worktree without overriding ports. +- How to diagnose: Look for `port is already in use` or conflict errors in + the develop output. Check listeners with `lsof -iTCP: -sTCP:LISTEN`. +- Existing docs or tools: [Development Isolation Guide for Agents](DEV_ISOLATION.md) + and `scripts/develop/main.go`. +- Missing harness piece: There is no automatic per-worktree port allocator. +- Proposed prevention: Assign each worktree a unique `CODER_DEV_PORT`, + `CODER_DEV_WEB_PORT`, `CODER_DEV_PROXY_PORT`, and + `CODER_DEV_PROMETHEUS_PORT` before starting the app. + +## Symptom: Test using `time.Sleep` + +- Likely cause: A test waits for time to pass instead of synchronizing on a + deterministic condition or using the quartz clock. +- How to reproduce: Add a test that depends on `time.Sleep`, then run it under + load or with the race detector until it flakes. +- How to diagnose: Search the test diff for `time.Sleep`. Inspect whether the + code under test can use `quartz` or another explicit synchronization point. +- Existing docs or tools: `AGENTS.md`, [Testing Patterns and Best Practices](TESTING.md), + and the quartz README referenced from `AGENTS.md`. +- Missing harness piece: Agents need a failure entry that labels sleep-based + waiting as a flake risk before review. +- Proposed prevention: Replace `time.Sleep` with a fake clock, trapped ticker, + channel, poll with timeout, or another deterministic signal. + +## Symptom: DB work inside `InTx` uses the outer store + +- Likely cause: Code inside a transaction closure calls `api.Database`, `p.db`, + or a helper that uses the outer store instead of the `tx` handle. +- How to reproduce: Add DB work inside `db.InTx(...)` that calls back into the + outer store, then exercise it under concurrent load. +- How to diagnose: Inspect the closure and helper call graph for database calls + that do not use the transaction handle. Look for pool waits, idle in + transaction symptoms, or deadlocks under load. +- Existing docs or tools: `AGENTS.md`, [Database Development Patterns](DATABASE.md), + and code review of `InTx` closures. +- Missing harness piece: No automated check currently proves every helper used + inside `InTx` stays on the transaction handle. +- Proposed prevention: Fetch read-only inputs before opening the transaction, + pass `tx` into helpers that need DB access, and avoid receiver helpers that + hide outer-store usage. + +## Symptom: New API endpoint missing swagger annotations + +- Likely cause: A handler or route was added without matching swagger comments. +- How to reproduce: Add a stable HTTP endpoint and skip `@Summary`, `@Router`, + or related annotations. +- How to diagnose: Compare the new handler with nearby handlers and inspect + generated API docs for the route. +- Existing docs or tools: `AGENTS.md`, [Documentation Style Guide](DOCS_STYLE_GUIDE.md), + and API generation checks. +- Missing harness piece: Agents need a doc reminder that endpoint work includes + docs unless the route is intentionally experimental. +- Proposed prevention: Add swagger annotations in the same change as stable + endpoints. For experimental or unstable API paths, add + `// @x-apidocgen {"skip": true}` after `@Router`. diff --git a/.claude/docs/DATABASE.md b/.claude/docs/DATABASE.md index 0bbca221db..84f6125fa4 100644 --- a/.claude/docs/DATABASE.md +++ b/.claude/docs/DATABASE.md @@ -34,6 +34,13 @@ - **MUST DO**: Queries are grouped in files relating to context - e.g. `prebuilds.sql`, `users.sql`, `oauth2.sql` - After making changes to any `coderd/database/queries/*.sql` files you must run `make gen` to generate respective ORM changes +### Query Naming + +- Use `ByX` when `X` is the lookup or filter column. +- Use `PerX` or `GroupedByX` when `X` is the aggregation or grouping + dimension. +- Avoid `ByX` names for grouped queries. + ## Handling Nullable Fields Use `sql.NullString`, `sql.NullBool`, etc. for optional database fields: @@ -47,6 +54,13 @@ CodeChallenge: sql.NullString{ Set `.Valid = true` when providing values. +## Database-to-SDK Conversions + +- Extract explicit db-to-SDK conversion helpers instead of inlining large + conversion blocks inside handlers. +- Keep nullable-field handling, type coercion, and response shaping in the + converter so handlers stay focused on request flow and authorization. + ## Audit Table Updates If adding fields to auditable types: @@ -129,6 +143,19 @@ func TestDatabaseFunction(t *testing.T) { 3. **Use transactions**: For related operations that must succeed together 4. **Optimize queries**: Use EXPLAIN to understand query performance +### Transaction Safety with `InTx` + +- Inside `db.InTx(...)` closures, do not use the outer store + (`api.Database`, `p.db`, etc.) directly or indirectly. Use the `tx` + handle for DB work inside the closure, or fetch read-only inputs before + opening the transaction. +- Watch for helper methods on a receiver that hide outer-store access. A + call like `p.someHelper(ctx)` is still unsafe inside `InTx` if that + helper uses `p.db` internally. +- Using the outer store while a transaction is open can hold one + connection and then block on another pool checkout, which can cause + pool starvation and `idle in transaction` incidents under load. + ### Migration Writing 1. **Make migrations reversible**: Always include down migration diff --git a/.claude/docs/DEV_ISOLATION.md b/.claude/docs/DEV_ISOLATION.md new file mode 100644 index 0000000000..770f2fd527 --- /dev/null +++ b/.claude/docs/DEV_ISOLATION.md @@ -0,0 +1,131 @@ +# Development Isolation Guide for Agents + +This guide documents the local resources that the existing harness uses. It is +for avoiding collisions across worktrees and cleaning up after failed runs. Do +not add new readiness or debug endpoints for these workflows. + +## Default local ports + +`scripts/develop/main.go` defines these base defaults: + +| Resource | Base default | Override | +|----------|--------------|----------| +| API server | `3000` | `--port`, `CODER_DEV_PORT` | +| Frontend dev server | `8080` | `--web-port`, `CODER_DEV_WEB_PORT` | +| Workspace proxy | `3010` | `--proxy-port`, `CODER_DEV_PROXY_PORT` | +| Coder Prometheus metrics | `2114` | `--prometheus-port`, `CODER_DEV_PROMETHEUS_PORT` | +| Embedded Prometheus UI | `9090` | Fixed in `scripts/develop/main.go` | +| Delve debugger | `12345` | Fixed when `--debug` is used | + +By default, plain `./scripts/develop.sh` uses the base defaults exactly: +`3000`, `8080`, `3010`, and `2114` for Coder Prometheus metrics. Set +`--port-offset` or `CODER_DEV_PORT_OFFSET=true` to opt in to a deterministic +per-worktree offset for API, frontend, workspace proxy, and Coder Prometheus +metrics ports. + +When enabled, the develop script hashes the project root with FNV-64a, maps it +into one of 50 buckets, multiplies by 20, and adds that value to each unset base +default. The same worktree path always gets the same effective ports. A flag or +environment variable overrides only that port. Other unset ports still receive +the opt-in offset. The workspace proxy is only started when `--use-proxy` is +set. The embedded Prometheus UI is only started when `--prometheus-server` or +`CODER_DEV_PROMETHEUS_SERVER` is set, Docker is available, and the host is +Linux. The Prometheus UI port `9090` and Delve port `12345` remain hardcoded. + +## Other useful develop flags and environment variables + +The develop script also supports these existing flags and environment +variables: + +| Purpose | Flag | Environment variable | +|---------|------|----------------------| +| Per-worktree port offset | `--port-offset` | `CODER_DEV_PORT_OFFSET` | +| Access URL | `--access-url` | `CODER_DEV_ACCESS_URL` | +| Admin password | `--password` | `CODER_DEV_ADMIN_PASSWORD` | +| Starter template | `--starter-template` | `CODER_DEV_STARTER_TEMPLATE` | +| Roll back missing migrations | `--db-rollback` | `CODER_DEV_DB_ROLLBACK` | +| Reset the development database | `--db-reset` | `CODER_DEV_DB_RESET` | +| Accept changed migration tracking | `--db-continue` | `CODER_DEV_DB_CONTINUE` | + +Extra `coder server` flags can be passed after `--`. For example, +`./scripts/develop.sh -- --trace` passes `--trace` to the API server. + +## Multi-worktree guidance + +Each worktree gets its own `.coderv2` directory because `scripts/develop.sh` +sets the global config directory to `/.coderv2`. This isolates +built-in Postgres data, local session data, and Prometheus container storage on +disk. + +The configurable develop ports use canonical defaults unless you opt in with +`--port-offset` or `CODER_DEV_PORT_OFFSET=true`. Enable the offset when running +multiple worktrees in parallel and you want most concurrent runs to avoid manual +port selection. When the offset is enabled, the startup banner prints the +effective API, web, proxy, and Coder metrics ports with their offset status. + +Use overrides when you need fixed ports or when two worktree paths hash to the +same offset. For example: + +```sh +CODER_DEV_PORT=3100 \ +CODER_DEV_WEB_PORT=8180 \ +CODER_DEV_PROXY_PORT=3110 \ +CODER_DEV_PROMETHEUS_PORT=2214 \ +./scripts/develop.sh --use-proxy +``` + +If you also need the embedded Prometheus UI in more than one worktree, use only +one at a time. The UI port is fixed at `9090`, and the Docker container name is +fixed to `coder-prometheus`. Delve is fixed at `127.0.0.1:12345` when `--debug` +is used. + +## Known collision risks + +- Two worktree paths can hash to the same opt-in offset. If preflight reports a + busy effective port, set the relevant `CODER_DEV_*` environment variables or + flags for one worktree. +- The embedded Prometheus UI always uses port `9090`. +- The embedded Prometheus Docker container name is always `coder-prometheus`. +- The Delve debugger always listens on `127.0.0.1:12345` when `--debug` is + used. +- The develop script only checks the proxy port when `--use-proxy` is set, so + a stale process on the effective proxy port can go unnoticed until the proxy + is enabled. +- External databases configured through `CODER_PG_CONNECTION_URL` are shared if + multiple worktrees point at the same database. + +## Readiness without new probes + +Do not invent a new readiness probe. The develop script already waits for the +API server to answer `GET /healthz` for up to 60 seconds, then logs `server is +ready to accept connections`. After setup completes, it prints a banner with +`Coder is now running in development mode`, the effective port list, and the API +and Web UI URLs. + +For agent-driven runs, treat the banner as the ready signal for browser work. +If the banner does not appear, inspect the preceding `api`, `site`, database +recovery, and port conflict logs. + +## Cleanup + +Use the least destructive cleanup that fixes the problem: + +1. Stop `./scripts/develop.sh` with `Ctrl+C` so child processes receive the + orchestrator shutdown signal. +2. If a child process remains, identify it with `lsof -iTCP: -sTCP:LISTEN` + or `ps`, then terminate only that stale process. +3. To reset the built-in development database for the current worktree, rerun + with `./scripts/develop.sh --db-reset` or remove `.coderv2/postgres` after + stopping the app. +4. To clear local Coder session and generated state for the current worktree, + remove the specific files under `.coderv2` that are relevant to the failure. +5. To clean the embedded Prometheus container, stop the develop script first, + then remove the `coder-prometheus` container if it remains. +6. To clean test databases, prefer the owning test harness cleanup. If tests + were interrupted, inspect the local PostgreSQL instance used by the test + suite before dropping any database. + +For database migration mismatches, prefer the develop script's recovery flags +before deleting state. Use `--db-rollback` when a migration disappeared from the +current branch, `--db-continue` after you manually reconcile changed migration +tracking, and `--db-reset` only when data loss is acceptable. diff --git a/.claude/docs/GO.md b/.claude/docs/GO.md index 65511709e2..a9e2631533 100644 --- a/.claude/docs/GO.md +++ b/.claude/docs/GO.md @@ -1,10 +1,59 @@ -# Modern Go (1.18–1.26) +# Modern Go (1.18-1.26) Reference for writing idiomatic Go. Covers what changed, what it replaced, and what to reach for. Respect the project's `go.mod` `go` line: don't emit features from a version newer than what the module declares. Check `go.mod` before writing code. +## Go LSP Navigation + +Use Go LSP tools first for backend code navigation: + +- **Find definitions**: `mcp__go-language-server__definition symbolName` +- **Find references**: `mcp__go-language-server__references symbolName` +- **Get type info**: `mcp__go-language-server__hover filePath line column` +- **Rename symbol**: `mcp__go-language-server__rename_symbol filePath line column newName` + +## Code Comments + +Code comments should be clear, well-formatted, and add meaningful context. + +- Comments are sentences and should end with periods or other appropriate + punctuation. +- Explain why, not what. The code itself should be self-documenting + through clear naming and structure. Focus comments on non-obvious + decisions, edge cases, or business logic. +- Keep comment lines to 80 characters wide, including the comment prefix + like `//` or `#`. When a comment spans multiple lines, wrap it + naturally at word boundaries. + +```go +// Good: Explains the rationale with proper sentence structure. +// We need a custom timeout here because workspace builds can take several +// minutes on slow networks, and the default 30s timeout causes false +// failures during initial template imports. +ctx, cancel := context.WithTimeout(ctx, 5*time.Minute) + +// Bad: Describes what the code does without punctuation or wrapping. +// Set a custom timeout +// Workspace builds can take a long time +// Default timeout is too short +ctx, cancel := context.WithTimeout(ctx, 5*time.Minute) +``` + +## Avoid Unnecessary Changes + +When fixing a bug or adding a feature, don't modify code unrelated to your +task. Unnecessary changes make PRs harder to review and can introduce +regressions. + +- Don't reword existing comments or code unless the change is directly + motivated by your task. +- Don't delete existing comments that explain non-obvious behavior. +- When adding tests for new behavior, read existing tests first to + understand what's covered. Add new cases for uncovered behavior. Edit + existing tests as needed, but don't change what they verify. + ## How modern Go thinks differently **Generics** (1.18): Design reusable code with type parameters instead @@ -24,7 +73,7 @@ etc., they replace ad-hoc "loop and append" code with composable, lazy pipelines. When a sequence is consumed only once, prefer an iterator over materializing a slice. -**Error trees** (1.20–1.26): Errors compose as trees, not chains. +**Error trees** (1.20-1.26): Errors compose as trees, not chains. `errors.Join` aggregates multiple errors. `fmt.Errorf` accepts multiple `%w` verbs. `errors.Is`/`As` traverse the full tree. Custom error types that wrap multiple causes must implement `Unwrap() []error` (the diff --git a/.claude/docs/OBSERVABILITY.md b/.claude/docs/OBSERVABILITY.md new file mode 100644 index 0000000000..629a40915b --- /dev/null +++ b/.claude/docs/OBSERVABILITY.md @@ -0,0 +1,148 @@ +# Observability Guide for Agents + +This guide maps the observability surfaces that already exist in local +Coder development. Do not add new endpoints for agent debugging. Prefer the +existing logs, tracing, Prometheus metrics, browser artifacts, and command +output described here. + +## Start the app + +Use `./scripts/develop.sh` for local development. See +[Development Workflows and Guidelines](WORKFLOWS.md) for the full workflow. +The script builds the dev orchestrator, starts the API server and frontend, +waits for the API server to answer `/healthz`, creates the first user if +needed, and prints a banner with the local URLs. + +Useful defaults from `scripts/develop/main.go` are: + +- API server: `http://localhost:3000`. +- Frontend dev server: `http://localhost:8080`. +- Workspace proxy, when `--use-proxy` is set: `http://localhost:3010`. +- Coder Prometheus metrics: `http://localhost:2114/`. +- Embedded Prometheus UI, when `--prometheus-server` is set and Docker is + available on Linux: `http://localhost:9090`. + +## Local logs + +`./scripts/develop.sh` writes orchestrator and child process logs to the +terminal. The orchestrator uses `sloghuman`, and each child process is logged +under a named logger such as `api`, `site`, `proxy`, `ext-provisioner`, or +`prometheus`. + +HTTP request logging is implemented in `coderd/httpmw/loggermw`. Request log +fields include `user_agent`, `host`, `path`, `proto`, `remote_addr`, `start`, +`status_code`, `latency_ms`, route params, and selected safe query params. +Responses with status codes of 500 or higher include the response body in the +request log. Successful `GET /api/v2` requests are skipped. + +When investigating failures, keep the full terminal output from +`./scripts/develop.sh`. If you ran a command through Mux or another harness, +record the command, exit code, and artifact path for the captured output. + +## Tracing + +HTTP tracing lives in `coderd/tracing`. The middleware covers `/api`, +`/api/**`, workspace app routes, and external auth callback routes. When an +active trace span exists, responses include `X-Trace-ID`, `X-Span-ID`, and a +W3C `traceparent` header. + +Tracing export is controlled by existing server flags and environment +variables, not by the develop orchestrator itself: + +- `--trace` or `CODER_TRACE_ENABLE` enables application tracing. +- `--trace-logs` or `CODER_TRACE_LOGS` adds log events to traces. +- `--trace-honeycomb-api-key` or `CODER_TRACE_HONEYCOMB_API_KEY` enables the + Honeycomb exporter. +- `--trace-datadog` or `CODER_TRACE_DATADOG` enables sending Go runtime + traces to the local DataDog agent. + +To pass server flags through the develop script, put them after `--`. For +example, use `./scripts/develop.sh -- --trace` when you already have an OTLP +backend configured through the standard OpenTelemetry environment variables. + +## Prometheus metrics + +`./scripts/develop.sh` enables Coder Prometheus metrics by default on +`0.0.0.0:2114`, served at `http://localhost:2114/`. The port is controlled by +`--prometheus-port` or `CODER_DEV_PROMETHEUS_PORT`. Set it to `0` to disable +metrics. The develop script passes these existing server flags when metrics are +enabled: `--prometheus-enable`, `--prometheus-address`, +`--prometheus-collect-agent-stats`, and `--prometheus-collect-db-metrics`. + +If `--prometheus-server` or `CODER_DEV_PROMETHEUS_SERVER` is set, the develop +script attempts to start a Docker container named `coder-prometheus` on Linux. +The Prometheus UI listens on `http://localhost:9090`. If a previous container +is reused, confirm the scrape target because it may point at an older metrics +port. + +Relevant metric implementations include: + +- `coderd/httpmw/prometheus.go` for HTTP request counters, concurrency gauges, + websocket gauges, and latency histograms. +- `coderd/prometheusmetrics/` for active users, workspaces, agents, build + info, experiments, insights, and agent stats collectors. +- `coderd/database/dbmetrics/` for database query and transaction metrics. +- `docs/admin/integrations/prometheus.md` for the user-facing Prometheus + integration guide and metric reference. + +## Correlating a failed action + +Use this sequence when a browser or API action fails: + +1. Record the local clock time, browser action, URL, HTTP method, and response + status from the browser network panel or test output. +2. If the response includes `X-Trace-ID` or `X-Span-ID`, copy both values. If + not, copy the `traceparent` header if present. +3. Search the `./scripts/develop.sh` terminal output for the route, method, + status code, response body, or timestamp. Match fields such as `path`, + `status_code`, and `latency_ms`. +4. Check `http://localhost:2114/` for metrics that match the route or subsystem. + Start with `coderd_api_requests_processed_total`, + `coderd_api_request_latencies_seconds`, and database metrics under the + `coderd_db_` prefix. +5. Attach the browser screenshot, trace, video, or command output artifact to + the failure report when the harness produced one. + +## If an API request fails + +- Capture method, URL, status code, response body, and response headers. +- Check the API log line for matching `path`, `status_code`, and `latency_ms`. +- If the status is 500 or higher, include the logged response body. +- Check `coderd_api_requests_processed_total` and + `coderd_api_request_latencies_seconds` for the matching route. +- If database work is involved, check `coderd_db_query_counts_total`, + `coderd_db_query_latencies_seconds`, and transaction metrics. + +## If the frontend hangs + +- Confirm that the develop banner printed both the API and Web UI URLs. +- Check the `site` logger output for Vite errors and dependency failures. +- Use the browser network panel to separate frontend asset failures from API + failures. +- If API calls are pending or failing, follow the API request checklist above. +- Capture browser console output and screenshots before retrying. + +## If a workspace provision fails + +- Capture the workspace build ID, template name, workspace name, user, and + action that triggered the build. +- Search logs for `provisioner`, `workspace`, `build`, and the workspace build + ID. +- Check whether `ext-provisioner` is running in the develop output. +- Review metrics for API request failures, database latency, and agent stats if + the failure reaches agent startup. +- Preserve provisioner logs, template files, command output, and any browser + artifacts from the failed flow. + +## Failure report checklist + +Include these details in every observability failure report: + +- Absolute timestamp with timezone and the local command that was running. +- Git branch, commit SHA, and whether generated files were fresh. +- Browser action, API method, URL, route, status code, and response body. +- `X-Trace-ID`, `X-Span-ID`, or `traceparent` when present. +- Relevant log lines with nearby context. +- Prometheus metrics checked and the observed values or absence of values. +- Artifact paths for screenshots, traces, videos, logs, and command output. +- Any cleanup performed before reproducing the failure again. diff --git a/.claude/docs/TESTING.md b/.claude/docs/TESTING.md index 392db0fdf3..21aff372bb 100644 --- a/.claude/docs/TESTING.md +++ b/.claude/docs/TESTING.md @@ -21,6 +21,13 @@ - Test both positive and negative cases - Use `testutil.WaitLong` for timeouts in tests +### Timing Issues + +NEVER use `time.Sleep` to mitigate timing issues. If an issue seems like +it should use `time.Sleep`, read through https://github.com/coder/quartz +and specifically the README to better understand how to handle timing +issues. + ### Test Package Naming - **Test packages**: Use `package_test` naming (e.g., `identityprovider_test`) for black-box testing @@ -89,6 +96,11 @@ coderd/ 1. **PKCE tests failing** - Verify both authorization code storage and token exchange handle PKCE fields 2. **Resource indicator validation failing** - Ensure database stores and retrieves resource parameters correctly +### OAuth2 Test Scripts + +- Full suite: `./scripts/oauth2/test-mcp-oauth2.sh` +- Manual testing: `./scripts/oauth2/test-manual-flow.sh` + ### General Issues 1. **Missing newlines** - Ensure files end with newline character diff --git a/.claude/docs/WORKFLOWS.md b/.claude/docs/WORKFLOWS.md index 4d2bab4898..f549d702d1 100644 --- a/.claude/docs/WORKFLOWS.md +++ b/.claude/docs/WORKFLOWS.md @@ -103,6 +103,17 @@ 4. **Add tests** in `coderd/*_test.go` files 5. **Update OpenAPI** by running `make gen` +### API Design Guardrails + +- Add swagger annotations when introducing new HTTP endpoints. Do this in + the same change as the handler so the docs do not get missed before + release. +- For user-scoped or resource-scoped routes, prefer path parameters over + query parameters when that matches existing route patterns. +- For experimental or unstable API paths, skip public doc generation with + `// @x-apidocgen {"skip": true}` after the `@Router` annotation. This + keeps them out of the published API reference until they stabilize. + ## Testing Workflows ### Test Execution @@ -122,6 +133,46 @@ ## Git Workflow +### Git Hooks + +**You MUST install and use the git hooks. NEVER bypass them with +`--no-verify`. Skipping hooks wastes CI cycles and is unacceptable.** + +The first run will be slow as caches warm up. Consecutive runs are +**significantly faster** (often 10x) thanks to Go build cache, +generated file timestamps, and warm node_modules. This is NOT a +reason to skip them. Wait for hooks to complete before proceeding, +no matter how long they take. + +```sh +git config core.hooksPath scripts/githooks +``` + +Two hooks run automatically: + +- **pre-commit**: Classifies staged files by type and runs either + the full `make pre-commit` or the lightweight `make pre-commit-light` + depending on whether Go, TypeScript, SQL, proto, or Makefile + changes are present. Falls back to the full target when + `CODER_HOOK_RUN_ALL=1` is set. A markdown-only commit takes + seconds; a Go change takes several minutes. +- **pre-push**: Classifies changed files (vs remote branch or + merge-base) and runs `make pre-push` when Go, TypeScript, SQL, + proto, or Makefile changes are detected. Skips tests entirely + for lightweight changes. Allowlisted in + `scripts/githooks/pre-push`. Runs only for developers who opt + in. Falls back to `make pre-push` when the diff range can't + be determined or `CODER_HOOK_RUN_ALL=1` is set. Allow at least + 15 minutes for a full run. + +`git commit` and `git push` will appear to hang while hooks run. +This is normal. Do not interrupt, retry, or reduce the timeout. + +NEVER run `git config core.hooksPath` to change or disable hooks. + +If a hook fails, fix the issue and retry. Do not work around the +failure by skipping the hook. + ### Working on PR branches When working on an existing PR branch: diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml index 54ffeeabe2..2407459cb3 100644 --- a/.github/workflows/ci.yaml +++ b/.github/workflows/ci.yaml @@ -464,6 +464,24 @@ jobs: source scripts/normalize_path.sh normalize_path_with_symlinks "$RUNNER_TEMP/sym" "$(dirname "$(which terraform)")" + - name: Configure Go test JSON capture + if: runner.os == 'Linux' + shell: bash + run: | + set -euo pipefail + bin_dir="${RUNNER_TEMP}/go-test-json-bin" + mkdir -p "$bin_dir" + + real_gotestsum="$(command -v gotestsum)" + real_gotestsum_quoted="$(printf '%q' "$real_gotestsum")" + printf '%s\n' \ + '#!/usr/bin/env bash' \ + 'set -euo pipefail' \ + "exec ${real_gotestsum_quoted} --jsonfile \"\${RUNNER_TEMP}/go-test.json\" \"\$@\"" \ + > "${bin_dir}/gotestsum" + chmod +x "${bin_dir}/gotestsum" + echo "$bin_dir" >> "$GITHUB_PATH" + - name: Setup RAM disk for Embedded Postgres (Windows) if: runner.os == 'Windows' shell: bash @@ -540,6 +558,18 @@ jobs: embedded-pg-path: "R:/temp/embedded-pg" embedded-pg-cache: ${{ steps.embedded-pg-cache.outputs.embedded-pg-cache }} + - name: Publish Go test failure summary + if: failure() && github.actor != 'dependabot[bot]' && runner.os == 'Linux' && !github.event.pull_request.head.repo.fork + run: bash scripts/go-test-failure-summary.sh "${RUNNER_TEMP}/go-test.json" >> "$GITHUB_STEP_SUMMARY" + + - name: Upload Go test JSON + if: failure() && github.actor != 'dependabot[bot]' && runner.os == 'Linux' && !github.event.pull_request.head.repo.fork + uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1 + with: + name: go-test-json-${{ github.job }}-${{ github.sha }} + path: ${{ runner.temp }}/go-test.json + retention-days: 7 + - name: Upload failed test db dumps uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1 with: @@ -610,6 +640,24 @@ jobs: source scripts/normalize_path.sh normalize_path_with_symlinks "$RUNNER_TEMP/sym" "$(dirname "$(which terraform)")" + - name: Configure Go test JSON capture + if: runner.os == 'Linux' + shell: bash + run: | + set -euo pipefail + bin_dir="${RUNNER_TEMP}/go-test-json-bin" + mkdir -p "$bin_dir" + + real_gotestsum="$(command -v gotestsum)" + real_gotestsum_quoted="$(printf '%q' "$real_gotestsum")" + printf '%s\n' \ + '#!/usr/bin/env bash' \ + 'set -euo pipefail' \ + "exec ${real_gotestsum_quoted} --jsonfile \"\${RUNNER_TEMP}/go-test.json\" \"\$@\"" \ + > "${bin_dir}/gotestsum" + chmod +x "${bin_dir}/gotestsum" + echo "$bin_dir" >> "$GITHUB_PATH" + - name: Test with PostgreSQL Database uses: ./.github/actions/test-go-pg with: @@ -621,6 +669,18 @@ jobs: # On main, run tests without cache for the inverse. test-count: ${{ github.ref == 'refs/heads/main' && '1' || '' }} + - name: Publish Go test failure summary + if: failure() && github.actor != 'dependabot[bot]' && runner.os == 'Linux' && !github.event.pull_request.head.repo.fork + run: bash scripts/go-test-failure-summary.sh "${RUNNER_TEMP}/go-test.json" >> "$GITHUB_STEP_SUMMARY" + + - name: Upload Go test JSON + if: failure() && github.actor != 'dependabot[bot]' && runner.os == 'Linux' && !github.event.pull_request.head.repo.fork + uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1 + with: + name: go-test-json-${{ github.job }}-${{ github.sha }} + path: ${{ runner.temp }}/go-test.json + retention-days: 7 + - name: Upload Test Cache uses: ./.github/actions/test-cache/upload with: @@ -678,6 +738,24 @@ jobs: # c.f. discussion on https://github.com/coder/coder/pull/15106 # Our Linux runners have 16 cores, but we reduce parallelism since race detection adds a lot of overhead. # We aim to have parallelism match CPU count (4*4=16) to avoid making flakes worse. + - name: Configure Go test JSON capture + if: runner.os == 'Linux' + shell: bash + run: | + set -euo pipefail + bin_dir="${RUNNER_TEMP}/go-test-json-bin" + mkdir -p "$bin_dir" + + real_gotestsum="$(command -v gotestsum)" + real_gotestsum_quoted="$(printf '%q' "$real_gotestsum")" + printf '%s\n' \ + '#!/usr/bin/env bash' \ + 'set -euo pipefail' \ + "exec ${real_gotestsum_quoted} --jsonfile \"\${RUNNER_TEMP}/go-test.json\" \"\$@\"" \ + > "${bin_dir}/gotestsum" + chmod +x "${bin_dir}/gotestsum" + echo "$bin_dir" >> "$GITHUB_PATH" + - name: Run Tests uses: ./.github/actions/test-go-pg with: @@ -686,6 +764,18 @@ jobs: test-parallelism-tests: "4" race-detection: "true" + - name: Publish Go test failure summary + if: failure() && github.actor != 'dependabot[bot]' && runner.os == 'Linux' && !github.event.pull_request.head.repo.fork + run: bash scripts/go-test-failure-summary.sh "${RUNNER_TEMP}/go-test.json" >> "$GITHUB_STEP_SUMMARY" + + - name: Upload Go test JSON + if: failure() && github.actor != 'dependabot[bot]' && runner.os == 'Linux' && !github.event.pull_request.head.repo.fork + uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1 + with: + name: go-test-json-${{ github.job }}-${{ github.sha }} + path: ${{ runner.temp }}/go-test.json + retention-days: 7 + - name: Upload Test Cache uses: ./.github/actions/test-cache/upload with: @@ -820,27 +910,36 @@ jobs: CODER_E2E_REQUIRE_PREMIUM_TESTS: "1" working-directory: site - - name: Upload Playwright Failed Tests - if: always() && github.actor != 'dependabot[bot]' && runner.os == 'Linux' && !github.event.pull_request.head.repo.fork + - name: Upload Playwright failure artifacts + if: failure() && github.actor != 'dependabot[bot]' && runner.os == 'Linux' && !github.event.pull_request.head.repo.fork uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1 with: - name: failed-test-videos${{ matrix.variant.premium && '-premium' || '' }} - path: ./site/test-results/**/*.webm + name: playwright-artifacts-${{ matrix.variant.name }}-${{ github.sha }} + path: | + ./site/test-results/** + ./site/playwright-report/** retention-days: 7 + - name: Publish Playwright failure summary + if: failure() && github.actor != 'dependabot[bot]' && runner.os == 'Linux' && !github.event.pull_request.head.repo.fork + env: + MATRIX_VARIANT: ${{ matrix.variant.name }} + GITHUB_SHA_SHORT: ${{ github.sha }} + run: bash scripts/playwright-failure-summary.sh site/test-results/results.json >> "$GITHUB_STEP_SUMMARY" + - name: Upload debug log - if: always() && github.actor != 'dependabot[bot]' && runner.os == 'Linux' && !github.event.pull_request.head.repo.fork + if: failure() && github.actor != 'dependabot[bot]' && runner.os == 'Linux' && !github.event.pull_request.head.repo.fork uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1 with: - name: coderd-debug-logs${{ matrix.variant.premium && '-premium' || '' }} + name: coderd-debug-logs-${{ matrix.variant.name }}-${{ github.sha }} path: ./site/e2e/test-results/debug.log retention-days: 7 - name: Upload pprof dumps - if: always() && github.actor != 'dependabot[bot]' && runner.os == 'Linux' && !github.event.pull_request.head.repo.fork + if: failure() && github.actor != 'dependabot[bot]' && runner.os == 'Linux' && !github.event.pull_request.head.repo.fork uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1 with: - name: debug-pprof-dumps${{ matrix.variant.premium && '-premium' || '' }} + name: debug-pprof-dumps-${{ matrix.variant.name }}-${{ github.sha }} path: ./site/test-results/**/debug-pprof-*.txt retention-days: 7 diff --git a/AGENTS.md b/AGENTS.md index 5542dded10..4517ffe21c 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -3,6 +3,15 @@ You are an experienced, pragmatic software engineer. You don't over-engineer a solution when a simple one is possible. Rule #1: If you want exception to ANY rule, YOU MUST STOP and get explicit permission first. BREAKING THE LETTER OR SPIRIT OF THE RULES IS FAILURE. +## Agent navigation + +- Day-to-day: Start with [Development Workflows and Guidelines](.claude/docs/WORKFLOWS.md) for dev servers, git workflow, hooks, and routine checks. +- Observability and isolation: Use [Observability Guide for Agents](.claude/docs/OBSERVABILITY.md) for logs, tracing, and metrics, and [Development Isolation Guide for Agents](.claude/docs/DEV_ISOLATION.md) for ports, state, readiness, and cleanup. +- Failures: Use [Agent Failure Catalog](.claude/docs/AGENT_FAILURES.md) for repeatable failure formats and seeded diagnostics. +- Language and area docs: Use [Modern Go](.claude/docs/GO.md), [Testing Patterns and Best Practices](.claude/docs/TESTING.md), [Database Development Patterns](.claude/docs/DATABASE.md), [OAuth2 Development Guide](.claude/docs/OAUTH2.md), [Coder Architecture](.claude/docs/ARCHITECTURE.md), [Troubleshooting Guide](.claude/docs/TROUBLESHOOTING.md), [Documentation Style Guide](.claude/docs/DOCS_STYLE_GUIDE.md), and [Pull Request Description Style Guide](.claude/docs/PR_STYLE_GUIDE.md) when that area is in scope. +- Compatibility: `.agents/docs` symlinks to `.claude/docs` for agent runtimes that look there. +- Frontend: Read [Frontend Development Guidelines](site/AGENTS.md) before changing anything under `site/`. + ## Foundational rules - Doing it right is better than doing it fast. You are not in a rush. NEVER skip steps or take shortcuts. @@ -60,82 +69,33 @@ Only pause to ask for confirmation when: ## Critical Patterns -### Database Changes (ALWAYS FOLLOW) +Detailed workflow and topic guidance lives in the imported docs. Keep root +instructions focused on guardrails that agents should see immediately. -1. Modify `coderd/database/queries/*.sql` files -2. Run `make gen` -3. If audit errors: update `enterprise/audit/table.go` -4. Run `make gen` again - -### LSP Navigation (USE FIRST) - -#### Go LSP (for backend code) - -- **Find definitions**: `mcp__go-language-server__definition symbolName` -- **Find references**: `mcp__go-language-server__references symbolName` -- **Get type info**: `mcp__go-language-server__hover filePath line column` -- **Rename symbol**: `mcp__go-language-server__rename_symbol filePath line column newName` - -#### TypeScript LSP (for frontend code in site/) - -- **Find definitions**: `mcp__typescript-language-server__definition symbolName` -- **Find references**: `mcp__typescript-language-server__references symbolName` -- **Get type info**: `mcp__typescript-language-server__hover filePath line column` -- **Rename symbol**: `mcp__typescript-language-server__rename_symbol filePath line column newName` - -### OAuth2 Error Handling - -```go -// OAuth2-compliant error responses -writeOAuth2Error(ctx, rw, http.StatusBadRequest, "invalid_grant", "description") -``` - -### Authorization Context - -```go -// Public endpoints needing system access -app, err := api.Database.GetOAuth2ProviderAppByClientID(dbauthz.AsSystemRestricted(ctx), clientID) - -// Authenticated endpoints with user context -app, err := api.Database.GetOAuth2ProviderAppByClientID(ctx, clientID) -``` - -### API Design - -- Add swagger annotations when introducing new HTTP endpoints. Do this in - the same change as the handler so the docs do not get missed before - release. -- For user-scoped or resource-scoped routes, prefer path parameters over - query parameters when that matches existing route patterns. -- For experimental or unstable API paths, skip public doc generation with - `// @x-apidocgen {"skip": true}` after the `@Router` annotation. This - keeps them out of the published API reference until they stabilize. - -### Database Query Naming - -- Use `ByX` when `X` is the lookup or filter column. -- Use `PerX` or `GroupedByX` when `X` is the aggregation or grouping - dimension. -- Avoid `ByX` names for grouped queries. - -### Database-to-SDK Conversions - -- Extract explicit db-to-SDK conversion helpers instead of inlining large - conversion blocks inside handlers. -- Keep nullable-field handling, type coercion, and response shaping in the - converter so handlers stay focused on request flow and authorization. - -### Transactions and `InTx` - -- Inside `db.InTx(...)` closures, do not use the outer store (`api.Database`, - `p.db`, etc.) directly or indirectly. Use the `tx` handle for DB work inside - the closure, or fetch read-only inputs before opening the transaction. -- Watch for helper methods on a receiver that hide outer-store access. A call - like `p.someHelper(ctx)` is still unsafe inside `InTx` if that helper uses - `p.db` internally. -- Using the outer store while a transaction is open can hold one connection and - then block on another pool checkout, which can cause pool starvation and - `idle in transaction` incidents under load. +- **Database changes**: Follow + [Database Development Patterns](.claude/docs/DATABASE.md). Modify + `coderd/database/queries/*.sql`, run `make gen`, update + `enterprise/audit/table.go` for audit errors, then run `make gen` again. +- **LSP navigation**: Use LSP tools first. See + [Modern Go](.claude/docs/GO.md) for Go LSP and + [Frontend Development Guidelines](site/AGENTS.md) for TypeScript LSP. +- **OAuth2 and authorization**: Follow + [OAuth2 Development Guide](.claude/docs/OAUTH2.md). OAuth2 endpoints must + use RFC-compliant errors such as `writeOAuth2Error(...)`, and public + endpoints that need system access should use `dbauthz.AsSystemRestricted`. +- **API design**: Follow the API guardrails in + [Development Workflows and Guidelines](.claude/docs/WORKFLOWS.md), + including swagger annotations for new public HTTP endpoints. +- **Transactions and conversions**: Keep `InTx` work on the transaction + handle, and prefer explicit db-to-SDK converters. See + [Database Development Patterns](.claude/docs/DATABASE.md). +- **Testing**: Follow + [Testing Patterns and Best Practices](.claude/docs/TESTING.md). Use unique + identifiers in concurrent tests and do not use `time.Sleep` to mitigate + timing issues. +- **Frontend**: Read [Frontend Development Guidelines](site/AGENTS.md) + before changing anything under `site/`. Reuse shared UI primitives when + possible and prefer Storybook stories for component and page testing. ## Quick Reference @@ -143,61 +103,26 @@ app, err := api.Database.GetOAuth2ProviderAppByClientID(ctx, clientID) ### Git Hooks (MANDATORY - DO NOT SKIP) -**You MUST install and use the git hooks. NEVER bypass them with -`--no-verify`. Skipping hooks wastes CI cycles and is unacceptable.** +You MUST install and use the git hooks. NEVER bypass them with +`--no-verify`. Skipping hooks wastes CI cycles and is unacceptable. -The first run will be slow as caches warm up. Consecutive runs are -**significantly faster** (often 10x) thanks to Go build cache, -generated file timestamps, and warm node_modules. This is NOT a -reason to skip them. Wait for hooks to complete before proceeding, -no matter how long they take. +The first run can be slow while caches warm up. Wait for hooks to complete, +even when `git commit` or `git push` appears to hang. -```sh -git config core.hooksPath scripts/githooks -``` - -Two hooks run automatically: - -- **pre-commit**: Classifies staged files by type and runs either - the full `make pre-commit` or the lightweight `make pre-commit-light` - depending on whether Go, TypeScript, SQL, proto, or Makefile - changes are present. Falls back to the full target when - `CODER_HOOK_RUN_ALL=1` is set. A markdown-only commit takes - seconds; a Go change takes several minutes. -- **pre-push**: Classifies changed files (vs remote branch or - merge-base) and runs `make pre-push` when Go, TypeScript, SQL, - proto, or Makefile changes are detected. Skips tests entirely - for lightweight changes. Allowlisted in - `scripts/githooks/pre-push`. Runs only for developers who opt - in. Falls back to `make pre-push` when the diff range can't - be determined or `CODER_HOOK_RUN_ALL=1` is set. Allow at least - 15 minutes for a full run. - -`git commit` and `git push` will appear to hang while hooks run. -This is normal. Do not interrupt, retry, or reduce the timeout. - -NEVER run `git config core.hooksPath` to change or disable hooks. - -If a hook fails, fix the issue and retry. Do not work around the -failure by skipping the hook. +See [Development Workflows and Guidelines](.claude/docs/WORKFLOWS.md) for +hook setup, pre-commit behavior, pre-push behavior, and failure handling. ### Git Workflow -When working on existing PRs, check out the branch first: - -```sh -git fetch origin -git checkout branch-name -git pull origin branch-name -``` - -Don't use `git push --force` unless explicitly requested. +When working on existing PRs, check out the branch first. See +[Development Workflows and Guidelines](.claude/docs/WORKFLOWS.md) for the +full workflow. Don't use `git push --force` unless explicitly requested. ### New Feature Checklist -- [ ] Run `git pull` to ensure latest code -- [ ] Check if feature touches database - you'll need migrations -- [ ] Check if feature touches audit logs - update `enterprise/audit/table.go` +See [Development Workflows and Guidelines](.claude/docs/WORKFLOWS.md) for +the new feature checklist, including `git pull`, database migration checks, +and audit table checks. ## Architecture @@ -206,23 +131,6 @@ Don't use `git push --force` unless explicitly requested. - **Agents**: Workspace services (SSH, port forwarding) - **Database**: PostgreSQL with `dbauthz` authorization -## Testing - -### Race Condition Prevention - -- Use unique identifiers: `fmt.Sprintf("test-client-%s-%d", t.Name(), time.Now().UnixNano())` -- Never use hardcoded names in concurrent tests - -### OAuth2 Testing - -- Full suite: `./scripts/oauth2/test-mcp-oauth2.sh` -- Manual testing: `./scripts/oauth2/test-manual-flow.sh` - -### Timing Issues - -NEVER use `time.Sleep` to mitigate timing issues. If an issue -seems like it should use `time.Sleep`, read through https://github.com/coder/quartz and specifically the [README](https://github.com/coder/quartz/blob/main/README.md) to better understand how to handle timing issues. - ## Code Style ### Detailed guidelines in imported WORKFLOWS.md @@ -250,38 +158,11 @@ seems like it should use `time.Sleep`, read through https://github.com/coder/qua `renderHook()` that do not require DOM assertions, and query/cache operations with no rendered output. -### Writing Comments +### Writing Comments and Avoiding Unnecessary Changes -Code comments should be clear, well-formatted, and add meaningful context. - -**Proper sentence structure**: Comments are sentences and should end with -periods or other appropriate punctuation. This improves readability and -maintains professional code standards. - -**Explain why, not what**: Good comments explain the reasoning behind code -rather than describing what the code does. The code itself should be -self-documenting through clear naming and structure. Focus your comments on -non-obvious decisions, edge cases, or business logic that isn't immediately -apparent from reading the implementation. - -**Line length and wrapping**: Keep comment lines to 80 characters wide -(including the comment prefix like `//` or `#`). When a comment spans multiple -lines, wrap it naturally at word boundaries rather than writing one sentence -per line. This creates more readable, paragraph-like blocks of documentation. - -```go -// Good: Explains the rationale with proper sentence structure. -// We need a custom timeout here because workspace builds can take several -// minutes on slow networks, and the default 30s timeout causes false -// failures during initial template imports. -ctx, cancel := context.WithTimeout(ctx, 5*time.Minute) - -// Bad: Describes what the code does without punctuation or wrapping -// Set a custom timeout -// Workspace builds can take a long time -// Default timeout is too short -ctx, cancel := context.WithTimeout(ctx, 5*time.Minute) -``` +See [Modern Go](.claude/docs/GO.md) for comment formatting and the rule to +avoid unrelated edits. Preserve existing comments that explain non-obvious +behavior unless the task directly requires changing them. ### No Emdash or Endash @@ -299,21 +180,6 @@ caught by `make lint/emdash`. // This is slow, so we should cache it. ``` -### Avoid Unnecessary Changes - -When fixing a bug or adding a feature, don't modify code unrelated to your -task. Unnecessary changes make PRs harder to review and can introduce -regressions. - -**Don't reword existing comments or code** unless the change is directly -motivated by your task. Rewording comments to be shorter or "cleaner" wastes -reviewer time and clutters the diff. - -**Don't delete existing comments** that explain non-obvious behavior. These -comments preserve important context about why code works a certain way. - -**When adding tests for new behavior**, read existing tests first to understand what's covered. Add new cases for uncovered behavior. Edit existing tests as needed, but don't change what they verify. - ## Detailed Development Guides @.claude/docs/ARCHITECTURE.md @@ -330,18 +196,18 @@ manually before starting work: **Always read:** -- `.claude/docs/WORKFLOWS.md` — dev server, git workflow, hooks +- `.claude/docs/WORKFLOWS.md` - dev server, git workflow, hooks **Read when relevant to your task:** -- `.claude/docs/GO.md` — Go patterns and modern Go usage (any Go changes) -- `.claude/docs/TESTING.md` — testing patterns, race conditions (any test changes) -- `.claude/docs/DATABASE.md` — migrations, SQLC, audit table (any DB changes) -- `.claude/docs/ARCHITECTURE.md` — system overview (orientation or architecture work) -- `.claude/docs/PR_STYLE_GUIDE.md` — PR description format (when writing PRs) -- `.claude/docs/OAUTH2.md` — OAuth2 and RFC compliance (when touching auth) -- `.claude/docs/TROUBLESHOOTING.md` — common failures and fixes (when stuck) -- `.claude/docs/DOCS_STYLE_GUIDE.md` — docs conventions (when writing `docs/`) +- `.claude/docs/GO.md` - Go patterns and modern Go usage (any Go changes) +- `.claude/docs/TESTING.md` - testing patterns, race conditions (any test changes) +- `.claude/docs/DATABASE.md` - migrations, SQLC, audit table (any DB changes) +- `.claude/docs/ARCHITECTURE.md` - system overview (orientation or architecture work) +- `.claude/docs/PR_STYLE_GUIDE.md` - PR description format (when writing PRs) +- `.claude/docs/OAUTH2.md` - OAuth2 and RFC compliance (when touching auth) +- `.claude/docs/TROUBLESHOOTING.md` - common failures and fixes (when stuck) +- `.claude/docs/DOCS_STYLE_GUIDE.md` - docs conventions (when writing `docs/`) **For frontend work**, also read `site/AGENTS.md` before making any changes in `site/`. diff --git a/Makefile b/Makefile index 617bfc9efe..5f02824d7f 100644 --- a/Makefile +++ b/Makefile @@ -728,7 +728,7 @@ endif # GitHub Actions linters are run in a separate CI job (lint-actions) that only # triggers when workflow files change, so we skip them here when CI=true. LINT_ACTIONS_TARGETS := $(if $(CI),,lint/actions/actionlint) -lint: lint/shellcheck lint/go lint/ts lint/examples lint/helm lint/site-icons lint/markdown lint/check-scopes lint/migrations lint/bootstrap lint/emdash $(LINT_ACTIONS_TARGETS) +lint: lint/shellcheck lint/go lint/ts lint/examples lint/helm lint/site-icons lint/markdown lint/check-scopes lint/migrations lint/bootstrap lint/architecture lint/emdash lint/agents $(LINT_ACTIONS_TARGETS) .PHONY: lint # Subset of lint that does not require Go or Node toolchains. @@ -745,8 +745,6 @@ lint/ts: site/node_modules/.installed .PHONY: lint/ts lint/go: - ./scripts/check_enterprise_imports.sh - ./scripts/check_codersdk_imports.sh linter_ver=$$(grep -oE 'GOLANGCI_LINT_VERSION=\S+' dogfood/coder/ubuntu-26.04/Dockerfile | cut -d '=' -f 2) go run github.com/golangci/golangci-lint/cmd/golangci-lint@v$$linter_ver run go tool github.com/coder/paralleltestctx/cmd/paralleltestctx -custom-funcs="testutil.Context,chatdTestContext" ./... @@ -771,6 +769,13 @@ lint/emdash: bash scripts/check_emdash.sh .PHONY: lint/emdash +lint/architecture: + ./scripts/check_architecture.sh +.PHONY: lint/architecture + +lint/agents: + ./scripts/check_agents_structure.sh +.PHONY: lint/agents lint/helm: cd helm/ diff --git a/scripts/audit-agent-readiness.sh b/scripts/audit-agent-readiness.sh new file mode 100755 index 0000000000..e08c75ebcd --- /dev/null +++ b/scripts/audit-agent-readiness.sh @@ -0,0 +1,130 @@ +#!/usr/bin/env bash +set -euo pipefail +# shellcheck source=scripts/lib.sh +# shellcheck disable=SC1091 +source "$(dirname "${BASH_SOURCE[0]}")/lib.sh" +cdroot + +usage() { + cat <<'USAGE' +Usage: scripts/audit-agent-readiness.sh [--help] + +Print a report-first audit of agent harness readiness. Warnings identify +aspirational checks and do not fail the script. Missing required harness docs +fail the script. Run manually with: + + bash scripts/audit-agent-readiness.sh +USAGE +} + +if [[ "${1:-}" == "--help" ]]; then + usage + exit 0 +fi + +ok_count=0 +warn_count=0 +fail_count=0 + +ok() { + printf '[ok] %s\n' "$1" + ((ok_count++)) || true +} + +warn() { + printf '[warn] %s\n' "$1" + ((warn_count++)) || true +} + +fail() { + printf '[fail] %s\n' "$1" + ((fail_count++)) || true +} + +contains() { + local file="$1" + local pattern="$2" + grep -qiE "$pattern" "$file" +} + +echo "Agent harness readiness audit" +echo +echo "Required harness docs" + +for doc in \ + ".claude/docs/OBSERVABILITY.md" \ + ".claude/docs/DEV_ISOLATION.md" \ + ".claude/docs/AGENT_FAILURES.md"; do + if [[ -f "$doc" ]]; then + ok "$doc exists." + else + fail "$doc is missing." + fi +done + +if [[ -L ".agents/docs" ]]; then + agents_docs_target="$(readlink ".agents/docs")" + if [[ "$agents_docs_target" == "../.claude/docs" ]]; then + ok ".agents/docs points to .claude/docs." + else + fail ".agents/docs points to $agents_docs_target, expected ../.claude/docs." + fi +else + fail ".agents/docs compatibility symlink is missing." +fi + +echo +echo "Navigation and report-first checks" + +if contains AGENTS.md '^##[[:space:]].*(Agent navigation|Where to look)' || + { grep -qF ".claude/docs/OBSERVABILITY.md" AGENTS.md && + grep -qF ".claude/docs/DEV_ISOLATION.md" AGENTS.md && + grep -qF ".claude/docs/AGENT_FAILURES.md" AGENTS.md; }; then + ok "Root AGENTS.md appears to include agent navigation." +else + warn "Root AGENTS.md may be missing agent navigation." +fi + +if contains site/e2e/playwright.config.ts 'screenshot' && + contains site/e2e/playwright.config.ts 'video' && + contains site/e2e/playwright.config.ts 'trace' && + contains site/e2e/playwright.config.ts 'failure'; then + ok "Playwright failure artifact settings appear configured." +else + warn "Playwright failure artifact settings were not all detected." +fi + +if grep -qi "playwright" .github/workflows/ci.yaml && + grep -q "upload-artifact" .github/workflows/ci.yaml && + grep -qF "failure()" .github/workflows/ci.yaml; then + ok "E2E CI failure artifact upload appears configured." +else + warn "E2E CI failure artifact upload was not detected." +fi + +if contains .claude/docs/OBSERVABILITY.md 'Prometheus' && + contains .claude/docs/OBSERVABILITY.md 'log'; then + ok "Observability doc mentions logs and Prometheus." +else + warn "Observability doc may be missing logs or Prometheus coverage." +fi + +if contains .claude/docs/DEV_ISOLATION.md 'port' && + contains .claude/docs/DEV_ISOLATION.md 'CODER_DEV|override'; then + ok "Development isolation doc mentions ports and overrides." +else + warn "Development isolation doc may be missing ports or override coverage." +fi + +if grep -q 'lint/architecture' Makefile; then + ok "Architecture lint target exists." +else + warn "Architecture lint target is not present yet." +fi + +echo +printf 'Summary: %d ok, %d warn, %d fail.\n' "$ok_count" "$warn_count" "$fail_count" + +if ((fail_count > 0)); then + exit 1 +fi diff --git a/scripts/check_agents_structure.sh b/scripts/check_agents_structure.sh new file mode 100755 index 0000000000..bdaddbc355 --- /dev/null +++ b/scripts/check_agents_structure.sh @@ -0,0 +1,96 @@ +#!/usr/bin/env bash +set -euo pipefail +# shellcheck source=scripts/lib.sh +# shellcheck disable=SC1091 +source "$(dirname "${BASH_SOURCE[0]}")/lib.sh" +cdroot + +echo "--- check agent docs structure" + +required_docs=( + ".claude/docs/OBSERVABILITY.md" + ".claude/docs/DEV_ISOLATION.md" + ".claude/docs/AGENT_FAILURES.md" +) + +fail=0 + +for doc in "${required_docs[@]}"; do + if [[ ! -f "$doc" ]]; then + echo "error: required harness doc is missing: $doc" + fail=1 + fi +done + +if [[ ! -L ".agents/docs" ]]; then + echo "error: agent docs compatibility symlink is missing: .agents/docs -> ../.claude/docs" + fail=1 +elif [[ "$(readlink ".agents/docs")" != "../.claude/docs" ]]; then + echo "error: agent docs compatibility symlink points to $(readlink ".agents/docs"), expected ../.claude/docs" + fail=1 +fi + +is_reference_path() { + local ref="$1" + case "$ref" in + */* | package.json | AGENTS.local.md) + return 0 + ;; + *) + return 1 + ;; + esac +} + +# TODO: Add circular AGENTS.md include detection if nested agent docs begin +# referencing each other. Current checks validate file existence only. +mapfile -t agent_files < <(git ls-files '*AGENTS.md' | sort) + +for agent_file in "${agent_files[@]}"; do + agent_dir="$(dirname "$agent_file")" + while IFS=$'\t' read -r line_number ref; do + if [[ -z "${line_number:-}" || -z "${ref:-}" ]]; then + continue + fi + if ! is_reference_path "$ref"; then + continue + fi + + candidate="$agent_dir/$ref" + candidate="${candidate#./}" + if [[ -e "$candidate" ]]; then + continue + fi + + if [[ "$(basename "$ref")" == "AGENTS.local.md" ]]; then + echo "warning: $agent_file:$line_number: optional local agent file is not present: $ref" + continue + fi + + echo "error: $agent_file:$line_number: referenced file does not exist: $ref" + fail=1 + done < <( + awk ' + /^[[:space:]]*(-[[:space:]]+)?@/ { + ref = $0 + sub(/^[[:space:]]*(-[[:space:]]+)?@/, "", ref) + sub(/[[:space:]`)>].*$/, "", ref) + sub(/[,:;)]+$/, "", ref) + print FNR "\t" ref + } + ' "$agent_file" + ) +done + +if [[ -f AGENTS.md ]]; then + root_agent_lines=$(wc -l 600)); then + echo "warning: AGENTS.md is $root_agent_lines lines, consider keeping the root guide concise." + fi +fi + +if [[ "$fail" -ne 0 ]]; then + exit 1 +fi + +echo "OK: agent docs structure looks valid." diff --git a/scripts/check_architecture.sh b/scripts/check_architecture.sh new file mode 100755 index 0000000000..bb8abe04bd --- /dev/null +++ b/scripts/check_architecture.sh @@ -0,0 +1,15 @@ +#!/usr/bin/env bash +# Umbrella architecture-boundary check. +# +# Delegates to existing import-boundary scripts. New architecture rules can be +# added here as needed. +set -euo pipefail + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + +echo "--- check architecture (import boundaries)" + +"$SCRIPT_DIR/check_enterprise_imports.sh" +"$SCRIPT_DIR/check_codersdk_imports.sh" + +echo "OK: architecture checks passed." diff --git a/scripts/develop/main.go b/scripts/develop/main.go index 6fa169cc70..8ddcce8f5c 100644 --- a/scripts/develop/main.go +++ b/scripts/develop/main.go @@ -10,6 +10,7 @@ import ( "context" "encoding/json" "fmt" + "hash/fnv" "net" "net/http" "net/url" @@ -52,7 +53,13 @@ const ( prometheusContainerName = "coder-prometheus" // defaultPrometheusPort avoids 2112 (agent prometheus) and // 2113 (agent debug) already bound inside Coder workspaces. - defaultPrometheusPort = "2114" + defaultPrometheusPort = "2114" + // portOffsetBuckets keeps the offset below 1000 while leaving + // enough hash buckets for common multi-worktree use. + portOffsetBuckets = 50 + // portOffsetStep avoids overlap between the default API and proxy + // ports when two worktrees land in adjacent buckets. + portOffsetStep = 20 prometheusImage = "prom/prometheus:v3.11.2" defaultAccessURL = "http://127.0.0.1:%d" defaultPassword = "SomeSecurePassword!" @@ -96,6 +103,13 @@ func main() { Description: "Prometheus metrics port. Set to 0 to disable.", Value: serpent.Int64Of(&cfg.coderMetricsPort), }, + { + Flag: "port-offset", + Env: "CODER_DEV_PORT_OFFSET", + Default: "false", + Description: "Apply a deterministic per-worktree offset to default API, web, proxy, and Coder metrics ports. Useful when running multiple worktrees in parallel.", + Value: serpent.BoolOf(&cfg.portOffsetEnabled), + }, { Flag: "prometheus-server", Env: "CODER_DEV_PROMETHEUS_SERVER", @@ -171,12 +185,13 @@ func main() { }, Handler: func(inv *serpent.Invocation) error { cfg.serverExtraArgs = inv.Args + cfg.portExplicit = portExplicitFromInvocation(inv) logger := slog.Make(sloghuman.Sink(inv.Stderr)) - if err := cfg.validate(); err != nil { + if err := cfg.resolveEnv(); err != nil { return err } - if err := cfg.resolveEnv(); err != nil { + if err := cfg.validate(); err != nil { return err } return develop(inv.Context(), logger, &cfg) @@ -191,30 +206,123 @@ func main() { } type devConfig struct { - apiPort int64 - webPort int64 - proxyPort int64 - coderMetricsPort int64 - prometheusServer bool - agpl bool - accessURL string - password string - useProxy bool - debug bool - skipSetup bool - multiOrg bool - starterTemplate string - dbRollback bool - dbReset bool - dbContinue bool - projectRoot string - binaryPath string - configDir string - childEnv []string + apiPort int64 + webPort int64 + proxyPort int64 + coderMetricsPort int64 + portOffsetEnabled bool + prometheusServer bool + agpl bool + accessURL string + password string + useProxy bool + debug bool + skipSetup bool + multiOrg bool + starterTemplate string + dbRollback bool + dbReset bool + dbContinue bool + projectRoot string + binaryPath string + configDir string + childEnv []string + portExplicit portExplicit + portOffset int + apiPortSource portSource + webPortSource portSource + proxyPortSource portSource + metricsPortSource portSource // Extra args after flags forwarded to "coder server". serverExtraArgs []string } +type portExplicit struct { + api bool + web bool + proxy bool + metrics bool +} + +type portSource string + +const ( + portSourceDefault portSource = "default" + portSourceExplicit portSource = "explicit" + portSourceOffset portSource = "offset" +) + +func portExplicitFromInvocation(inv *serpent.Invocation) portExplicit { + return portExplicit{ + api: isPortExplicit(inv, "port", "CODER_DEV_PORT"), + web: isPortExplicit(inv, "web-port", "CODER_DEV_WEB_PORT"), + proxy: isPortExplicit(inv, "proxy-port", "CODER_DEV_PROXY_PORT"), + metrics: isPortExplicit(inv, "prometheus-port", "CODER_DEV_PROMETHEUS_PORT"), + } +} + +func isPortExplicit(inv *serpent.Invocation, flagName, envName string) bool { + if flag := inv.ParsedFlags().Lookup(flagName); flag != nil && flag.Changed { + return true + } + if val, ok := inv.Environ.Lookup(envName); ok && val != "" { + return true + } + for _, opt := range inv.Command.Options { + if opt.Flag == flagName { + return opt.ValueSource == serpent.ValueSourceFlag || + opt.ValueSource == serpent.ValueSourceEnv + } + } + return false +} + +// portOffset returns a deterministic offset in [0, 1000) derived from the +// worktree path. Successive callers with the same projectRoot get the same +// offset; different projectRoots get different offsets with high probability. +func portOffset(projectRoot string) int { + h := fnv.New64a() + _, _ = h.Write([]byte(projectRoot)) + bucket := h.Sum64() % uint64(portOffsetBuckets) + return int(bucket) * portOffsetStep //nolint:gosec // Bucket is less than portOffsetBuckets. +} + +func (c *devConfig) applyPortOffset() { + c.portOffset = 0 + if !c.portOffsetEnabled { + return + } + c.portOffset = portOffset(c.projectRoot) + if c.portExplicit.api { + c.apiPortSource = portSourceExplicit + } else { + c.apiPortSource = c.applyDefaultPortOffset(&c.apiPort) + } + if c.portExplicit.web { + c.webPortSource = portSourceExplicit + } else { + c.webPortSource = c.applyDefaultPortOffset(&c.webPort) + } + if c.portExplicit.proxy { + c.proxyPortSource = portSourceExplicit + } else { + c.proxyPortSource = c.applyDefaultPortOffset(&c.proxyPort) + } + if c.portExplicit.metrics { + c.metricsPortSource = portSourceExplicit + } else { + c.metricsPortSource = c.applyDefaultPortOffset(&c.coderMetricsPort) + } +} + +func (c *devConfig) applyDefaultPortOffset(port *int64) portSource { + if c.portOffset == 0 { + return portSourceDefault + } + *port += int64(c.portOffset) + return portSourceOffset +} + func (c *devConfig) validate() error { if c.agpl && c.useProxy { return xerrors.New("cannot use both --agpl and --use-proxy") @@ -293,10 +401,6 @@ func (c *devConfig) validate() error { // resolveEnv sets defaults, unsets leaked credentials, resolves // filesystem paths, and computes the child process environment. func (c *devConfig) resolveEnv() error { - if strings.Contains(c.accessURL, "%d") { - c.accessURL = fmt.Sprintf(c.accessURL, c.apiPort) - } - // Prevent inherited credentials from leaking into child // processes or being picked up by config reads. _ = os.Unsetenv("CODER_SESSION_TOKEN") @@ -311,6 +415,11 @@ func (c *devConfig) resolveEnv() error { fmt.Sprintf("coder_%s_%s", runtime.GOOS, runtime.GOARCH)) c.configDir = filepath.Join(c.projectRoot, ".coderv2") + c.applyPortOffset() + if strings.Contains(c.accessURL, "%d") { + c.accessURL = fmt.Sprintf(c.accessURL, c.apiPort) + } + // Compute once, reused by cmd(). c.childEnv = filterEnv(os.Environ(), "CODER_SESSION_TOKEN", "CODER_URL") @@ -1120,6 +1229,28 @@ func prometheusBannerEntry(cfg *devConfig, prometheusServerStarted bool) (label } } +func portBannerLine(label string, port int64, source portSource, offset int) string { + portValue := strconv.FormatInt(port, 10) + if port == 0 { + portValue = "disabled" + } + if source == "" { + return fmt.Sprintf("%s: %s", label, portValue) + } + return fmt.Sprintf("%s: %s (%s)", label, portValue, portSourceLabel(source, offset)) +} + +func portSourceLabel(source portSource, offset int) string { + switch source { + case portSourceExplicit: + return fmt.Sprintf("explicit, offset +%d skipped", offset) + case portSourceOffset: + return fmt.Sprintf("offset +%d", offset) + default: + return fmt.Sprintf("default, offset +%d", offset) + } +} + func printBanner(ctx context.Context, logger slog.Logger, cfg *devConfig, prometheusServerStarted bool) { ifaces := []string{"localhost"} if addrs, err := net.InterfaceAddrs(); err == nil { @@ -1153,6 +1284,12 @@ func printBanner(ctx context.Context, logger slog.Logger, cfg *devConfig, promet "", indent("Coder is now running in development mode."), "", + "Effective ports:", + indent(portBannerLine("API", cfg.apiPort, cfg.apiPortSource, cfg.portOffset)), + indent(portBannerLine("Web UI", cfg.webPort, cfg.webPortSource, cfg.portOffset)), + indent(portBannerLine("Proxy", cfg.proxyPort, cfg.proxyPortSource, cfg.portOffset)), + indent(portBannerLine("Coder metrics", cfg.coderMetricsPort, cfg.metricsPortSource, cfg.portOffset)), + "", "API:", ) diff --git a/scripts/develop/main_test.go b/scripts/develop/main_test.go index e4112c20dc..98bcd79f06 100644 --- a/scripts/develop/main_test.go +++ b/scripts/develop/main_test.go @@ -146,6 +146,127 @@ func TestShellBool(t *testing.T) { assert.Equal(t, "0", shellBool(false)) } +func TestPortOffset(t *testing.T) { + t.Parallel() + + root := "/tmp/coder/worktree-a" + offset := portOffset(root) + assert.Equal(t, offset, portOffset(root)) + assert.GreaterOrEqual(t, offset, 0) + assert.Less(t, offset, 1000) + assert.Equal(t, 0, offset%10) + + var foundDifferent bool + for _, otherRoot := range []string{ + "/tmp/coder/worktree-b", + "/tmp/coder/worktree-c", + "/tmp/coder/worktree-d", + } { + if portOffset(otherRoot) != offset { + foundDifferent = true + break + } + } + assert.True(t, foundDifferent, "expected typical worktree paths to use different offsets") +} + +func TestApplyPortOffsetSkipsExplicitPorts(t *testing.T) { + t.Parallel() + + projectRoot := "/tmp/coder/worktree-offset" + for i := range 100 { + candidate := fmt.Sprintf("/tmp/coder/worktree-offset-%d", i) + if portOffset(candidate) != 0 { + projectRoot = candidate + break + } + } + offset := portOffset(projectRoot) + require.NotZero(t, offset) + + cfg := &devConfig{ + apiPort: 3000, + webPort: 8080, + proxyPort: 3010, + coderMetricsPort: 2114, + portOffsetEnabled: true, + projectRoot: projectRoot, + portExplicit: portExplicit{ + web: true, + metrics: true, + }, + } + cfg.applyPortOffset() + + assert.Equal(t, int64(3000+offset), cfg.apiPort) + assert.Equal(t, int64(8080), cfg.webPort) + assert.Equal(t, int64(3010+offset), cfg.proxyPort) + assert.Equal(t, int64(2114), cfg.coderMetricsPort) + assert.Equal(t, portSourceOffset, cfg.apiPortSource) + assert.Equal(t, portSourceExplicit, cfg.webPortSource) + assert.Equal(t, portSourceOffset, cfg.proxyPortSource) + assert.Equal(t, portSourceExplicit, cfg.metricsPortSource) +} + +func TestApplyPortOffsetDisabledUsesDefaultPorts(t *testing.T) { + t.Parallel() + + projectRoot := "/tmp/coder/worktree-offset" + for i := range 100 { + candidate := fmt.Sprintf("/tmp/coder/worktree-offset-disabled-%d", i) + if portOffset(candidate) != 0 { + projectRoot = candidate + break + } + } + require.NotZero(t, portOffset(projectRoot)) + + cfg := &devConfig{ + apiPort: 3000, + webPort: 8080, + proxyPort: 3010, + coderMetricsPort: 2114, + projectRoot: projectRoot, + } + cfg.applyPortOffset() + + assert.Equal(t, int64(3000), cfg.apiPort) + assert.Equal(t, int64(8080), cfg.webPort) + assert.Equal(t, int64(3010), cfg.proxyPort) + assert.Equal(t, int64(2114), cfg.coderMetricsPort) + assert.Zero(t, cfg.portOffset) + assert.Empty(t, cfg.apiPortSource) + assert.Empty(t, cfg.webPortSource) + assert.Empty(t, cfg.proxyPortSource) + assert.Empty(t, cfg.metricsPortSource) + assert.Equal(t, "API: 3000", portBannerLine("API", cfg.apiPort, cfg.apiPortSource, cfg.portOffset)) +} + +func TestPortOffsetDefaultPortsDoNotOverlap(t *testing.T) { + t.Parallel() + + ports := []struct { + name string + base int + }{ + {name: "API", base: 3000}, + {name: "Web UI", base: 8080}, + {name: "Proxy", base: 3010}, + {name: "Coder metrics", base: 2114}, + } + seen := make(map[int]string) + for bucket := range portOffsetBuckets { + offset := bucket * portOffsetStep + for _, port := range ports { + effective := port.base + offset + if other, ok := seen[effective]; ok { + t.Fatalf("%s collides with %s on port %d", port.name, other, effective) + } + seen[effective] = fmt.Sprintf("%s with offset %d", port.name, offset) + } + } +} + func TestDevelopInCoder(t *testing.T) { t.Run("DEVELOP_IN_CODER", func(t *testing.T) { t.Setenv("DEVELOP_IN_CODER", "1") @@ -431,15 +552,17 @@ func TestDevConfigResolveEnv(t *testing.T) { t.Setenv("CODER_SESSION_TOKEN", "leaked") t.Setenv("CODER_URL", "https://leaked.example.com") + wd, _ := os.Getwd() cfg := &devConfig{apiPort: 3000, accessURL: defaultAccessURL} require.NoError(t, cfg.resolveEnv()) - wd, _ := os.Getwd() assert.Equal(t, wd, cfg.projectRoot) assert.Equal(t, filepath.Join(wd, "build", fmt.Sprintf("coder_%s_%s", runtime.GOOS, runtime.GOARCH)), cfg.binaryPath) assert.Equal(t, filepath.Join(wd, ".coderv2"), cfg.configDir) assert.Equal(t, "http://127.0.0.1:3000", cfg.accessURL) + assert.Equal(t, int64(3000), cfg.apiPort) + assert.Zero(t, cfg.portOffset) // Should have unset leaked env vars. assert.Empty(t, os.Getenv("CODER_SESSION_TOKEN")) @@ -454,11 +577,92 @@ func TestDevConfigResolveEnv(t *testing.T) { } } +func TestDevConfigResolveEnvUsesDefaultPortsWithoutPortOffset(t *testing.T) { + t.Setenv("CODER_SESSION_TOKEN", "") + t.Setenv("CODER_URL", "") + + baseRoot := t.TempDir() + projectRoot := filepath.Join(baseRoot, "worktree") + for i := range 100 { + candidate := filepath.Join(baseRoot, fmt.Sprintf("worktree-default-%d", i)) + if portOffset(candidate) != 0 { + projectRoot = candidate + break + } + } + require.NotZero(t, portOffset(projectRoot)) + require.NoError(t, os.MkdirAll(projectRoot, 0o755)) + t.Chdir(projectRoot) + + cfg := &devConfig{ + apiPort: 3000, + webPort: 8080, + proxyPort: 3010, + coderMetricsPort: 2114, + accessURL: defaultAccessURL, + } + require.NoError(t, cfg.resolveEnv()) + + assert.Equal(t, projectRoot, cfg.projectRoot) + assert.Equal(t, int64(3000), cfg.apiPort) + assert.Equal(t, int64(8080), cfg.webPort) + assert.Equal(t, int64(3010), cfg.proxyPort) + assert.Equal(t, int64(2114), cfg.coderMetricsPort) + assert.Zero(t, cfg.portOffset) + assert.Empty(t, cfg.apiPortSource) + assert.Empty(t, cfg.webPortSource) + assert.Empty(t, cfg.proxyPortSource) + assert.Empty(t, cfg.metricsPortSource) + assert.Equal(t, "http://127.0.0.1:3000", cfg.accessURL) +} + +func TestDevConfigResolveEnvAppliesPortOffsetWhenEnabled(t *testing.T) { + t.Setenv("CODER_SESSION_TOKEN", "") + t.Setenv("CODER_URL", "") + + baseRoot := t.TempDir() + projectRoot := filepath.Join(baseRoot, "worktree") + for i := range 100 { + candidate := filepath.Join(baseRoot, fmt.Sprintf("worktree-%d", i)) + if portOffset(candidate) != 0 { + projectRoot = candidate + break + } + } + require.NotZero(t, portOffset(projectRoot)) + require.NoError(t, os.MkdirAll(projectRoot, 0o755)) + t.Chdir(projectRoot) + + cfg := &devConfig{ + apiPort: 3000, + webPort: 8080, + proxyPort: 3010, + coderMetricsPort: 2114, + portOffsetEnabled: true, + accessURL: defaultAccessURL, + } + require.NoError(t, cfg.resolveEnv()) + + offset := portOffset(projectRoot) + assert.Equal(t, projectRoot, cfg.projectRoot) + assert.Equal(t, int64(3000+offset), cfg.apiPort) + assert.Equal(t, int64(8080+offset), cfg.webPort) + assert.Equal(t, int64(3010+offset), cfg.proxyPort) + assert.Equal(t, int64(2114+offset), cfg.coderMetricsPort) + assert.Equal(t, offset, cfg.portOffset) + assert.Equal(t, portSourceOffset, cfg.apiPortSource) + assert.Equal(t, fmt.Sprintf("http://127.0.0.1:%d", 3000+offset), cfg.accessURL) +} + func TestDevConfigResolveEnvExplicitAccessURL(t *testing.T) { t.Setenv("CODER_SESSION_TOKEN", "") t.Setenv("CODER_URL", "") - cfg := &devConfig{apiPort: 5000, accessURL: "http://myhost:5000"} + cfg := &devConfig{ + apiPort: 5000, + accessURL: "http://myhost:5000", + portExplicit: portExplicit{api: true}, + } require.NoError(t, cfg.resolveEnv()) assert.Equal(t, "http://myhost:5000", cfg.accessURL) } diff --git a/scripts/go-test-failure-summary.sh b/scripts/go-test-failure-summary.sh new file mode 100755 index 0000000000..b3bfac4898 --- /dev/null +++ b/scripts/go-test-failure-summary.sh @@ -0,0 +1,100 @@ +#!/usr/bin/env bash + +# Summarize failed Go tests from go test JSON output. + +set -euo pipefail +# shellcheck source=scripts/lib.sh +# shellcheck disable=SC1091 +source "$(dirname "${BASH_SOURCE[0]}")/lib.sh" +cdroot + +if [[ $# -ne 1 ]]; then + error "Usage: go-test-failure-summary.sh " +fi + +results_file=$1 +if [[ ! -s "$results_file" ]]; then + exit 0 +fi + +if ! command -v jq >/dev/null; then + error "jq is required to summarize Go test failures." +fi + +jq -sr ' + def clean_block: + tostring + | gsub("\u001b\\[[0-9;?]*[ -/]*[@-~]"; "") + | gsub("```"; "``"); + def clean_inline: + tostring | gsub("`"; "") | gsub("[\r\n]"; " "); + def truncate($max): + if length > $max then .[0:$max] + "..." else . end; + def terminal_action: + .Action == "pass" or .Action == "fail" or .Action == "skip"; + def test_key: + (.Package // "") + "\u0000" + (.Test // ""); + def output_for($events; $package; $test): + [ + $events[] + | select(.Action == "output") + | select((.Package // "") == $package) + | select((.Test // "") == $test) + | .Output // "" + ] + | join("") + | clean_block + | if . == "" then "No output recorded." else . end + | truncate(600); + + map(select(type == "object")) as $events + | [ + $events + | to_entries[] + | .value + {idx: .key} + | select((.Test // "") != "") + | select(terminal_action) + ] as $terminal_tests + | [ + $terminal_tests + | group_by(test_key) + | .[] + | max_by(.idx) + | select(.Action == "fail") + | { + package: ((.Package // "unknown") | clean_inline), + test: ((.Test // "unknown") | clean_inline), + elapsed: (.Elapsed // 0), + output: output_for($events; (.Package // ""); (.Test // "")) + } + ] as $failures + | if ($failures | length) == 0 then + empty + else + ($failures | length) as $failed + | ($failures | map(.package) | unique | length) as $packages + | ([ + $events[] + | select((.Test // "") == "") + | select(.Action == "pass" or .Action == "fail") + | .Elapsed // 0 + ] | add // 0) as $duration + | ([ + $events[] + | select((.Test // "") == "") + | select(.Action == "fail") + | .Package // empty + ] | unique | length) as $package_failures + | [ + "## Go test failures (\($failed) in \($packages))", + "- Duration: \($duration)s", + "- Package failures: \($package_failures)", + "", + ($failures[] + | "### \(.package) :: \(.test)\n" + + "- Elapsed: \(.elapsed)s\n\n" + + "```\n\(.output)\n```\n") + ] + | join("\n") + end +' "$results_file" diff --git a/scripts/playwright-failure-summary.sh b/scripts/playwright-failure-summary.sh new file mode 100755 index 0000000000..8a4d268d62 --- /dev/null +++ b/scripts/playwright-failure-summary.sh @@ -0,0 +1,104 @@ +#!/usr/bin/env bash + +# Summarize failed Playwright tests from the JSON reporter output. + +set -euo pipefail +# shellcheck source=scripts/lib.sh +# shellcheck disable=SC1091 +source "$(dirname "${BASH_SOURCE[0]}")/lib.sh" +cdroot + +if [[ $# -ne 1 ]]; then + error "Usage: playwright-failure-summary.sh " +fi + +results_file=$1 +if [[ ! -f "$results_file" ]]; then + exit 0 +fi + +if ! command -v jq >/dev/null; then + error "jq is required to summarize Playwright failures." +fi + +artifact="playwright-artifacts-${MATRIX_VARIANT:-unknown}-${GITHUB_SHA_SHORT:-unknown}" + +jq -r --arg artifact "$artifact" --arg root "$PROJECT_ROOT" ' + def clean_block: + tostring + | gsub("\u001b\\[[0-9;]*[A-Za-z]"; "") + | gsub("```"; "``"); + def clean_inline: + tostring | gsub("`"; ""); + def truncate($max): + if length > $max then .[0:$max] + "..." else . end; + def failure_status: + . == "failed" or . == "timedOut" or . == "interrupted"; + def relpath($root): + if startswith($root + "/") then .[($root | length) + 1:] + elif startswith("site/") then . + elif startswith("e2e/") then "site/" + . + else "site/e2e/" + . + end; + def all_specs($titles): + ([$titles[], (.title // empty)] | map(select(. != ""))) as $next_titles + | ( + .specs[]? + | . + { + titlePath: ($next_titles + ([.title // ""] | map(select(. != "")))) + } + ), + (.suites[]? | all_specs($next_titles)); + def failure_entries: + [ + .suites[]? + | all_specs([]) as $spec + | $spec.tests[]? as $test + | select(($test.status // "") != "flaky") + | select( + (($test.status // "") == "unexpected") + or any($test.results[]?; .status | failure_status) + ) + | ([ $test.results[]? | select(.status | failure_status) ][0] + // ($test.results[0] // {})) as $result + | ((($result.error.message // "") | clean_block) as $message + | (($result.error.stack // "") | clean_block) as $stack + | { + file: (($spec.file // "") | relpath($root)), + line: ($spec.line // 0), + title: (($spec.titlePath // [$spec.title // ""]) | join(" > ") | clean_inline), + project: (($test.projectName // "unknown") | clean_inline), + message: (if $message != "" then $message else $stack end | if . != "" then . else "No error message recorded." end | truncate(600)), + attachments: ([ $result.attachments[]? | .name // empty | clean_inline ] | unique) + }) + ]; + failure_entries as $entries + | if ($entries | length) == 0 then + empty + else + (.stats // {}) as $stats + | ($stats.unexpected // 0) as $stats_failed + | ([($stats_failed | tonumber), ($entries | length)] | max) as $failed + | (($stats.expected // 0) + ($stats.unexpected // 0) + ($stats.flaky // 0) + ($stats.skipped // 0)) as $computed_total + | ($stats.total // $computed_total) as $total + | [ + "## Playwright failures (\($failed) of \($total))", + "- Duration: \($stats.duration // 0)ms", + "- Skipped: \($stats.skipped // 0), Flaky: \($stats.flaky // 0)", + "- Artifact: `\($artifact)` (download from the run summary)", + "", + ($entries[] + | "### \(.file):\(.line)\n" + + "- Test: `\(.title)`\n" + + "- Project: `\(.project)`\n" + + "- Attachments:\n" + + (if (.attachments | length) == 0 then + " - None recorded in artifact `\($artifact)`" + else + (.attachments | map(" - `\(.)` in artifact `\($artifact)`") | join("\n")) + end) + + "\n\n```\n\(.message)\n```\n") + ] + | join("\n") + end +' "$results_file" | sed -E $'s/\x1b\[[0-9;]*m//g' diff --git a/site/AGENTS.md b/site/AGENTS.md index 891a034f53..194765ff2e 100644 --- a/site/AGENTS.md +++ b/site/AGENTS.md @@ -26,6 +26,19 @@ When investigating or editing TypeScript/React code, always use the TypeScript l - `pnpm playwright:test` - Run playwright e2e tests. When running e2e tests, remind the user that a license is required to run all the tests - `pnpm format` - Format frontend code. Always run before creating a PR +## Failure artifacts + +Playwright writes per-test failure artifacts to `site/test-results/` when +running `pnpm playwright:test` from `site/`. Failed tests keep screenshots, +videos, and traces through the Playwright config. The HTML report is written +to `site/playwright-report/`, and the coderd debug log is written to +`site/e2e/test-results/debug.log`. + +In CI, the `test-e2e` job uploads failure artifacts to the workflow run's +Artifacts section. Look for artifact names prefixed with +`playwright-artifacts-`, followed by the matrix job name and commit SHA. +Debug logs and pprof dumps use the same job name and commit SHA convention. + ## Components - MUI components are deprecated - migrate away from these when encountered diff --git a/site/e2e/playwright.config.ts b/site/e2e/playwright.config.ts index 247eee1793..8a220d8d76 100644 --- a/site/e2e/playwright.config.ts +++ b/site/e2e/playwright.config.ts @@ -35,6 +35,7 @@ const localURL = (port: number, path: string): string => { export default defineConfig({ retries, globalSetup: require.resolve("./setup/preflight"), + outputDir: "../test-results", projects: [ { name: "testsSetup", @@ -47,10 +48,20 @@ export default defineConfig({ timeout: 30_000, }, ], - reporter: [["list"], ["./reporter.ts"]], + reporter: [ + ["list"], + ["html", { open: "never" }], + [ + "json", + { outputFile: path.join(__dirname, "../test-results/results.json") }, + ], + ["./reporter.ts"], + ], use: { actionTimeout: 5000, baseURL: `http://localhost:${coderPort}`, + screenshot: "only-on-failure", + trace: "retain-on-failure", video: "retain-on-failure", ...(wsEndpoint ? {