Files
coder/coderd/gitsync
Kyle Carberry b1e80e6f3a fix(gitsync): concurrent refresh, decoupled timeout, and no-token backoff (#23004)
## Problem

The gitsync worker polls every 10s and refreshes up to 50 stale
`chat_diff_status` rows **sequentially**, sharing a single 10-second
context timeout. With 50 rows × 1–3 HTTP calls each, the timeout is
exhausted quickly, causing cascading `context deadline exceeded` errors.
Rows with no linked OAuth token (`ErrNoTokenAvailable`) fail fast but
recur every 120s, wasting batch capacity.

## Solution

Three targeted fixes:

### 1. Concurrent refresh processing
`Refresher.Refresh()` now launches goroutines bounded by a semaphore
(`defaultConcurrency = 10`). Provider/token resolution remains
sequential (fast DB lookups); only the HTTP calls run in parallel.
Per-group rate-limit detection uses `atomic.Pointer[RateLimitError]`
with best-effort skip of remaining rows — a rate-limit hit on one
provider doesn't stall requests to other providers.

### 2. Decoupled tick timeout
New `defaultTickTimeout = 30s`, separate from `defaultInterval = 10s`.
The `tick()` method uses `tickTimeout` for its context deadline, giving
concurrent HTTP calls enough headroom to complete without stalling the
next polling cycle.

### 3. Longer backoff for no-token errors
New `NoTokenBackoff = 10 * time.Minute` (exported). When `errors.Is(err,
ErrNoTokenAvailable)`, the worker applies a 10-minute backoff instead of
`DiffStatusTTL` (2 minutes). Retrying every 2 minutes is pointless until
the user manually links their external auth account.

## Design decisions

- Both `NewRefresher` and `NewWorker` accept variadic option functions
(`RefresherOption`, `WorkerOption`) for backward compatibility —
existing callers in `coderd/coderd.go` need no changes.
- `WithConcurrency(n)` and `WithTickTimeout(d)` are available for tests
and future tuning.
- Added `resolvedGroup` struct to cleanly separate the pre-resolution
phase from the concurrent execution phase.

## Testing

- **`TestRefresher_RateLimitSkipsRemainingInGroup`** — rewritten to be
goroutine-order-independent (verifies aggregate counts instead of
per-index results).
- **`TestRefresher_ConcurrentProcessing`** — new test using a gate
channel to prove N goroutines enter `FetchPullRequestStatus`
simultaneously.
- **`TestWorker_RefresherError_BacksOffRow`** — rewritten to use
branch-name-based failure determination instead of non-deterministic
`callCount`.
- **`TestWorker_NoTokenBackoff`** — new test verifying
`ErrNoTokenAvailable` triggers 10-minute backoff.
- All tests pass under `-race -count=3`.
2026-03-12 18:08:06 +00:00
..