Adds a new **Provider Configuration** reference page (`providers.md`) covering:

- The migration from environment-variable-based provider config to database-backed management introduced in v2.34, including the one-time seeding behavior and deprecation of `CODER_AI_GATEWAY_PROVIDER_<N>_*` and related flags
- All supported provider types (`openai`, `anthropic`, `bedrock`, `copilot`, `azure`, `google`, `openrouter`, `vercel`, `openai-compat`) with setup notes for each
- Provider lifecycle statuses (`enabled`, `disabled`, `error`) and their effect on request handling
- Reload behavior and how configuration changes apply without restarting `coderd`
- Bring Your Own Key (BYOK) and failure mode reference table

Updates **Setup** (`setup.md`) to replace the environment-variable-based provider configuration instructions with dashboard-driven steps (Add provider form, provider list, edit/disable flow), referencing the new `providers.md` page for deeper detail. Screenshots of the provider list, add, and edit forms are included.

Adds a **Provider metrics** section to **Monitoring** (`monitoring.md`) documenting the `coder_aibridged_*` and `coder_aibridgeproxyd_*` Prometheus metrics for provider status and reload timestamps, along with two suggested PromQL alert queries.
Monitoring
Note
AI Gateway requires the AI Governance Add-On. As of Coder v2.32, deployments without the add-on will not be able to access AI Gateway.
AI Gateway records the last user prompt, token usage, model reasoning, and every tool invocation for each intercepted request. Each capture is tied to a single "interception" that maps back to the authenticated Coder identity, making it easy to attribute spend and behaviour.
We provide an example Grafana dashboard that you can import as a starting point for your metrics. See the Grafana dashboard README.
These logs and metrics can be used to determine usage patterns, track costs, and evaluate tooling adoption.
Provider metrics
aibridged (the in-process daemon) and aibridgeproxyd (the external
proxy) each export Prometheus metrics describing the configured
provider pool and its reload loop. See
Provider Configuration for the lifecycle these
metrics describe.
| Metric | Type | Labels | Purpose |
|---|---|---|---|
| coder_aibridged_provider_info | gauge | provider_name, provider_type, status | One series per configured provider. Value is always 1; the status label (enabled, disabled, error) carries the alertable signal. |
| coder_aibridged_providers_last_reload_timestamp_seconds | gauge | | Unix timestamp of the last reload attempt, success or failure. |
| coder_aibridged_providers_last_reload_success_timestamp_seconds | gauge | | Unix timestamp of the last reload that successfully refreshed the pool. |
| coder_aibridgeproxyd_provider_info | gauge | provider_name, provider_type, status | Same shape as coder_aibridged_provider_info but reported by the external proxy. |
| coder_aibridgeproxyd_providers_last_reload_timestamp_seconds | gauge | | Last reload attempt timestamp in aibridgeproxyd. |
| coder_aibridgeproxyd_providers_last_reload_success_timestamp_seconds | gauge | | Last successful reload timestamp in aibridgeproxyd. |
| coder_aibridgeproxyd_connect_sessions_total | counter | type (mitm, tunneled) | CONNECT sessions established by the proxy. |
| coder_aibridgeproxyd_mitm_requests_total | counter | provider | MITM requests handled. |
| coder_aibridgeproxyd_inflight_mitm_requests | gauge | provider | In-flight MITM requests. |
| coder_aibridgeproxyd_mitm_responses_total | counter | code, provider | MITM responses by HTTP status code. |
Suggested alerts
Alert on any provider entering a non-enabled status:
sum by (provider_name, status) (coder_aibridged_provider_info{status!="enabled"}) > 0
Alert when the reload loop is still running but has failed to refresh the pool for more than five minutes (300 seconds):
(coder_aibridged_providers_last_reload_timestamp_seconds
- coder_aibridged_providers_last_reload_success_timestamp_seconds) > 300
Repeat the same query against coder_aibridgeproxyd_* if you run the
external proxy.
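When no Prometheus server is at hand, the same two conditions can be approximated against a raw scrape. The following is a minimal sketch, assuming the metrics arrive on stdin in the Prometheus text exposition format. The function names `non_enabled_providers` and `reload_lag_seconds` are hypothetical helpers, not part of Coder, and how you obtain the scrape depends on your deployment.

```shell
#!/bin/sh
# Hypothetical helpers that mirror the two suggested alerts on raw metrics text.
# Both read Prometheus text exposition format from stdin. The optional first
# argument is the metric prefix (default: coder_aibridged).

# Print every provider_info series whose status label is not "enabled".
non_enabled_providers() {
  awk -v metric="${1:-coder_aibridged}_provider_info" '
    index($0, metric "{") == 1 && $0 !~ /status="enabled"/ { print }
  '
}

# Print the gap in seconds between the last reload attempt and the last
# successful reload. A value above 300 matches the second alert.
reload_lag_seconds() {
  awk -v prefix="${1:-coder_aibridged}" '
    $1 == (prefix "_providers_last_reload_timestamp_seconds")         { attempt = $2 }
    $1 == (prefix "_providers_last_reload_success_timestamp_seconds") { success = $2 }
    END { printf "%d\n", attempt - success }
  '
}
```

Pass `coder_aibridgeproxyd` as the first argument to run the same checks against the external proxy's metrics.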
Structured Logging
AI Gateway can emit structured logs for every interception event to your existing log pipeline. This is useful for exporting data to external SIEM or observability platforms. See Structured Logging in the setup guide for configuration and a full list of record types.
Exporting Data
AI Gateway interception data can be exported for external analysis, compliance reporting, or integration with log aggregation systems.
REST API
You can retrieve AI Gateway sessions via the Coder API, with filtering and pagination support.
curl -X GET "https://coder.example.com/api/v2/aibridge/sessions" \
-H "Coder-Session-Token: $CODER_SESSION_TOKEN"
Available query filters:
- client - Filter by client name. Possible client values: Claude Code, Codex, Zed, GitHub Copilot (VS Code), GitHub Copilot (CLI), Kilo Code, Coder Agents, Mux, Cursor, Unknown.

  Note: Client classification is done on a best-effort basis using the User-Agent header; not all clients send this header in an easily identifiable manner.
- initiator - Filter by user ID or username
- provider - Filter by AI provider (e.g., openai, anthropic)
- model - Filter by model name
- started_after - Filter interceptions after a timestamp
- started_before - Filter interceptions before a timestamp
See the API documentation for full details.
CLI
Export interceptions as JSON using the CLI:
coder aibridge interceptions list --initiator me --limit 1000
You can filter by time range, provider, model, and user:
coder aibridge interceptions list \
--started-after "2025-01-01T00:00:00Z" \
--started-before "2025-02-01T00:00:00Z" \
--provider anthropic
See coder aibridge interceptions list --help for all options.
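For recurring exports, the time-range flags can be derived from a calendar month rather than typed by hand. The following is a minimal sketch using only the flags shown above. `month_window` and `export_month` are hypothetical helper names, and using the first instant of the next month as the `--started-before` bound mirrors the example above but is otherwise an assumption.

```shell
#!/bin/sh
# month_window YEAR MONTH
# Prints "START END": the first instant (UTC) of the given month and of the
# month that follows it, both as RFC 3339 timestamps.
month_window() (
  year=$1
  month=${2#0}                 # strip a leading zero so "08" is not read as octal
  next_year=$year
  next_month=$((month + 1))
  if [ "$next_month" -gt 12 ]; then
    next_month=1
    next_year=$((year + 1))
  fi
  printf '%04d-%02d-01T00:00:00Z %04d-%02d-01T00:00:00Z\n' \
    "$year" "$month" "$next_year" "$next_month"
)

# export_month YEAR MONTH [extra flags...]
# Runs the CLI export for one calendar month; any extra flags are passed through.
export_month() (
  window=$(month_window "$1" "$2")
  shift 2
  coder aibridge interceptions list \
    --started-after "${window% *}" \
    --started-before "${window#* }" \
    "$@"
)
```

For example, `export_month 2025 01 --provider anthropic` would run the same query as the command above. Add `--limit` if a month may hold more records than the CLI returns by default.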
Data Retention
AI Gateway data is retained for 60 days by default. Configure the retention period to balance storage costs with your organization's compliance and analysis needs.
For configuration options and details, see Data Retention in the AI Gateway setup guide.
Tracing
AI Gateway supports tracing via OpenTelemetry, providing visibility into request processing, upstream API calls, and MCP server interactions.
Enabling Tracing
AI Gateway tracing is enabled when tracing is enabled for the Coder server.
To enable tracing, set the CODER_TRACE_ENABLE environment variable or pass the
--trace CLI flag:
export CODER_TRACE_ENABLE=true
coder server --trace
What is Traced
AI Gateway creates spans for the following operations:
| Span Name | Description |
|---|---|
| CachedBridgePool.Acquire | Acquiring a request bridge instance from the pool |
| Intercept | Top-level span for processing an intercepted request |
| Intercept.CreateInterceptor | Creating the request interceptor |
| Intercept.ProcessRequest | Processing the request through the bridge |
| Intercept.ProcessRequest.Upstream | Forwarding the request to the upstream AI provider |
| Intercept.ProcessRequest.ToolCall | Executing a tool call requested by the AI model |
| Intercept.RecordInterception | Recording the creation of the interception record |
| Intercept.RecordPromptUsage | Recording prompt/message data |
| Intercept.RecordTokenUsage | Recording token consumption |
| Intercept.RecordToolUsage | Recording tool/function calls |
| Intercept.RecordInterceptionEnded | Recording the interception as completed |
| ServerProxyManager.Init | Initializing MCP server proxy connections |
| StreamableHTTPServerProxy.Init | Setting up HTTP-based MCP server proxies |
| StreamableHTTPServerProxy.Init.fetchTools | Fetching available tools from MCP servers |
Example trace of an interception using the Jaeger backend:
Capturing Logs in Traces
Note
Enabling log capture may generate a large volume of trace events.
To include log messages as trace events, enable trace log capture
by setting the CODER_TRACE_LOGS environment variable or passing the
--trace-logs flag:
export CODER_TRACE_ENABLE=true
export CODER_TRACE_LOGS=true
coder server --trace --trace-logs


