Anthropic replay can fail when stored history contains a
provider-executed tool call like `web_search` without the matching
provider-executed result. That orphaned call is incomplete
provider-internal state, so replaying it can make an otherwise usable
chat unreplayable even though there is no search result to preserve.
This fixes replay by dropping orphan provider-executed tool calls from
the model-visible prompt, preserving signed reasoning and the rest of
the assistant content, then revalidating before the request. We do not
synthesize tool results or drop reasoning. The database can retain the
historical artifact for inspection, while Anthropic only sees replayable
content.
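
As a rough illustration of the rule above, here is a minimal sketch that drops orphan provider-executed tool calls and leaves reasoning and text alone. The `Part` type, its field names, and `dropOrphanProviderCalls` are hypothetical simplifications for this example, not the types Coder actually uses.

```go
package main

import "fmt"

// Part is a hypothetical, simplified stand-in for one block of assistant
// content. The real persisted message types are assumed to be richer.
type Part struct {
	Kind             string // "text", "reasoning", "tool_call", or "tool_result"
	ToolCallID       string
	ProviderExecuted bool
	Text             string
}

// dropOrphanProviderCalls returns a copy of parts without provider-executed
// tool calls that lack a provider-executed result with the same ID.
// Reasoning, text, and complete call/result pairs pass through unchanged,
// and no tool result is synthesized.
func dropOrphanProviderCalls(parts []Part) []Part {
	hasResult := make(map[string]bool)
	for _, p := range parts {
		if p.Kind == "tool_result" && p.ProviderExecuted {
			hasResult[p.ToolCallID] = true
		}
	}
	out := make([]Part, 0, len(parts))
	for _, p := range parts {
		if p.Kind == "tool_call" && p.ProviderExecuted && !hasResult[p.ToolCallID] {
			continue // orphan: omit from the model-visible prompt only
		}
		out = append(out, p)
	}
	return out
}

func main() {
	stored := []Part{
		{Kind: "reasoning", Text: "signed-reasoning"},
		{Kind: "tool_call", ToolCallID: "srv_1", ProviderExecuted: true},
		{Kind: "text", Text: "Here is what I found."},
	}
	for _, p := range dropOrphanProviderCalls(stored) {
		fmt.Println(p.Kind)
	}
}
```

The input slice is never modified, which mirrors the idea that stored history can keep the artifact while only the outgoing prompt is repaired.
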
This matches permissively licensed prior art. Vercel AI SDK
(Apache-2.0), used by mux, keeps incomplete tool state in UI/history but
omits it from model requests with `convertToModelMessages(..., {
ignoreIncompleteToolCalls: true })`. LangChain, LiteLLM, and OpenAI
Agents (MIT for the relevant open-source code) also preserve Anthropic
signed reasoning as opaque replay data. Coder applies that model-visible
replay boundary explicitly because our persisted history is already in
provider-message form.
This matches mux's behavior and is cleaner than the older idea of not
persisting the search query tool. The model handles the repaired prompt fine.
Closes CODAGT-448
## Before
<img width="963" height="491" alt="image"
src="https://github.com/user-attachments/assets/a7788ebf-2728-4420-90cf-5e4f6905bdf7"
/>
## After
<img width="842" height="513" alt="image"
src="https://github.com/user-attachments/assets/ae39c262-7586-4e2d-b7db-1b639a7e8e15"
/>
Anthropic is strict about replaying the latest assistant turn once it
contains signed or redacted reasoning. We were still mutating that turn
in a few Coder-owned places: dropping empty reasoning blocks on replay,
rewriting provider-tool history during sanitization, and in the worst
case sending a prompt we already knew Anthropic would reject.
This patch keeps the latest signed assistant turn immutable through
Coder's replay and sanitization paths, preserves empty signed or redacted
reasoning anywhere Coder owns the ledger, and fails before the provider
call if the prompt is still unsafe.
It also bumps the existing `coder/fantasy` `coder_2_33` fork that `main`
already uses to the commit containing coder/fantasy#35. These fixes have
also been upstreamed to charmbracelet/fantasy.
Closes CODAGT-409.
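
To make the "preserve empty signed or redacted reasoning" rule concrete, here is a minimal sketch of a pruning pass that only drops reasoning blocks that are empty, unsigned, and not redacted. The `Part` type, its `Signature` and `Redacted` fields, and `pruneEmptyReasoning` are assumptions made for this example and do not reflect the real `coder/fantasy` types.

```go
package main

import "fmt"

// Part is a hypothetical, simplified content block.
type Part struct {
	Kind      string // "text" or "reasoning"
	Text      string
	Signature string // assumed non-empty when the provider signed the block
	Redacted  bool   // assumed true for redacted reasoning
}

// pruneEmptyReasoning drops reasoning blocks only when they carry nothing the
// provider could verify: no text, no signature, and not redacted. Empty
// blocks that are signed or redacted are kept as opaque replay data.
func pruneEmptyReasoning(parts []Part) []Part {
	out := make([]Part, 0, len(parts))
	for _, p := range parts {
		droppable := p.Kind == "reasoning" &&
			p.Text == "" && p.Signature == "" && !p.Redacted
		if droppable {
			continue
		}
		out = append(out, p)
	}
	return out
}

func main() {
	turn := []Part{
		{Kind: "reasoning", Signature: "sig-abc"}, // empty but signed: kept
		{Kind: "reasoning", Redacted: true},       // empty but redacted: kept
		{Kind: "reasoning"},                       // empty and unsigned: dropped
		{Kind: "text", Text: "Done."},
	}
	fmt.Println(len(pruneEmptyReasoning(turn)))
}
```
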
## Problem
Anthropic returns HTTP 400 when an assistant message contains a
`web_search_tool_result` block whose `tool_use_id` has no matching
earlier `server_tool_use` block in the same assistant message. A
previous fix (#24706) sanitized provider-executed tool calls without
matching results, but the opposite direction, orphaned or misordered
provider-executed results, could still slip through both the prompt
sanitizer and the persistence path.
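
The failing shape can be detected with a single ordered pass over one assistant message. The sketch below is illustrative only: `Block` and `firstUnmatchedResult` are hypothetical names, and the struct models just the two fields needed for the check.

```go
package main

import "fmt"

// Block is a hypothetical, minimal view of one Anthropic content block.
type Block struct {
	Type      string // e.g. "server_tool_use", "web_search_tool_result", "text"
	ID        string // set on server_tool_use blocks
	ToolUseID string // set on web_search_tool_result blocks
}

// firstUnmatchedResult walks the blocks in order and reports the index of the
// first web_search_tool_result whose tool_use_id has not been introduced by an
// earlier server_tool_use block. This catches both orphaned results and
// results that appear before their call.
func firstUnmatchedResult(blocks []Block) (int, bool) {
	seen := make(map[string]bool)
	for i, b := range blocks {
		switch b.Type {
		case "server_tool_use":
			seen[b.ID] = true
		case "web_search_tool_result":
			if !seen[b.ToolUseID] {
				return i, true
			}
		}
	}
	return -1, false
}

func main() {
	misordered := []Block{
		{Type: "web_search_tool_result", ToolUseID: "srvtoolu_1"},
		{Type: "server_tool_use", ID: "srvtoolu_1"},
	}
	idx, bad := firstUnmatchedResult(misordered)
	fmt.Println(idx, bad)
}
```
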
## Fix
Tighten Anthropic provider-executed tool history handling while
preserving the useful result payload as normal assistant text when the
provider-tool metadata is unsafe.
1. Extract Anthropic provider-tool sanitization into
`coderd/x/chatd/chatsanitize` so provider-specific repair logic is no
longer spread through `chatprompt` and `chatloop`.
2. `chatsanitize.SanitizeAnthropicProviderToolHistory` removes invalid
provider-executed tool structure for Anthropic prompts: orphans in
either direction, result-before-call, duplicate IDs, invalid JSON
inputs, empty IDs and tool names, unsupported tool names, mismatched
`ProviderExecuted` flags, provider-executed blocks outside assistant
messages, and web-search results without serializable Anthropic result
metadata. Provider-executed result payloads are textified instead of
being discarded when there is text to preserve.
3. `chatsanitize.SanitizeAnthropicProviderToolContent` applies the same
rules to streamed step content. Persisted history no longer
carries invalid provider-tool blocks forward, but it keeps the result
text for future turns.
4. `chatsanitize.ApplyAnthropicProviderToolGuard` only repairs
structurally invalid Anthropic provider-tool history. It no longer
strips otherwise-valid historical `web_search` blocks just because web
search is disabled for the current request. The fail-closed fallback
also textifies provider results before removing provider-tool metadata.
Tests cover prompt sanitization, validation reason strings, result
payload textification, content-level persistence sanitization, disabled
web-search history preservation, direct pre-request guard behavior, and
the fallback strip path.
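
The textification idea in the fix above can be sketched as follows. This is a simplified illustration, not the `chatsanitize` implementation: `Part` and `textifyUnmatchedResults` are hypothetical, and only the "result without an earlier matching call" case is handled.

```go
package main

import (
	"fmt"
	"strings"
)

// Part is a hypothetical, simplified content block.
type Part struct {
	Kind             string // "text", "tool_call", or "tool_result"
	ToolCallID       string
	ProviderExecuted bool
	Text             string
}

// textifyUnmatchedResults rewrites provider-executed results that have no
// earlier matching provider-executed call into plain text parts, so the
// payload survives without the unsafe provider-tool metadata. Results with
// no text to preserve are dropped.
func textifyUnmatchedResults(parts []Part) []Part {
	seenCalls := make(map[string]bool)
	out := make([]Part, 0, len(parts))
	for _, p := range parts {
		switch {
		case p.Kind == "tool_call" && p.ProviderExecuted:
			seenCalls[p.ToolCallID] = true
			out = append(out, p)
		case p.Kind == "tool_result" && p.ProviderExecuted && !seenCalls[p.ToolCallID]:
			if text := strings.TrimSpace(p.Text); text != "" {
				out = append(out, Part{Kind: "text", Text: text})
			}
		default:
			out = append(out, p)
		}
	}
	return out
}

func main() {
	parts := []Part{
		{Kind: "tool_result", ToolCallID: "srv_9", ProviderExecuted: true, Text: " Go 1.22 release notes "},
		{Kind: "text", Text: "Summary follows."},
	}
	for _, p := range textifyUnmatchedResults(parts) {
		fmt.Printf("%s: %q\n", p.Kind, p.Text)
	}
}
```
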
> Mux is acting on Mike's behalf.