mirror of
https://github.com/coder/coder.git
synced 2026-06-03 04:58:23 +00:00
b1e80e6f3a
## Problem The gitsync worker polls every 10s and refreshes up to 50 stale `chat_diff_status` rows **sequentially**, sharing a single 10-second context timeout. With 50 rows × 1–3 HTTP calls each, the timeout is exhausted quickly, causing cascading `context deadline exceeded` errors. Rows with no linked OAuth token (`ErrNoTokenAvailable`) fail fast but recur every 120s, wasting batch capacity. ## Solution Three targeted fixes: ### 1. Concurrent refresh processing `Refresher.Refresh()` now launches goroutines bounded by a semaphore (`defaultConcurrency = 10`). Provider/token resolution remains sequential (fast DB lookups); only the HTTP calls run in parallel. Per-group rate-limit detection uses `atomic.Pointer[RateLimitError]` with best-effort skip of remaining rows — a rate-limit hit on one provider doesn't stall requests to other providers. ### 2. Decoupled tick timeout New `defaultTickTimeout = 30s`, separate from `defaultInterval = 10s`. The `tick()` method uses `tickTimeout` for its context deadline, giving concurrent HTTP calls enough headroom to complete without stalling the next polling cycle. ### 3. Longer backoff for no-token errors New `NoTokenBackoff = 10 * time.Minute` (exported). When `errors.Is(err, ErrNoTokenAvailable)`, the worker applies a 10-minute backoff instead of `DiffStatusTTL` (2 minutes). Retrying every 2 minutes is pointless until the user manually links their external auth account. ## Design decisions - Both `NewRefresher` and `NewWorker` accept variadic option functions (`RefresherOption`, `WorkerOption`) for backward compatibility — existing callers in `coderd/coderd.go` need no changes. - `WithConcurrency(n)` and `WithTickTimeout(d)` are available for tests and future tuning. - Added `resolvedGroup` struct to cleanly separate the pre-resolution phase from the concurrent execution phase. ## Testing - **`TestRefresher_RateLimitSkipsRemainingInGroup`** — rewritten to be goroutine-order-independent (verifies aggregate counts instead of per-index results). - **`TestRefresher_ConcurrentProcessing`** — new test using a gate channel to prove N goroutines enter `FetchPullRequestStatus` simultaneously. - **`TestWorker_RefresherError_BacksOffRow`** — rewritten to use branch-name-based failure determination instead of non-deterministic `callCount`. - **`TestWorker_NoTokenBackoff`** — new test verifying `ErrNoTokenAvailable` triggers 10-minute backoff. - All tests pass under `-race -count=3`.