Commit Graph

4 Commits

Author SHA1 Message Date
Hugo Dutka 48ab492f49 feat: agents git watch backend (#22565)
Adds real-time git status watching for workspace agents, so the frontend
can subscribe over WebSocket and show
git file changes in near real-time.

1. Subscription is scoped to a **chat** via `GET
/api/experimental/chats/{chat}/git/watch`.
2. The workspace agent automatically determines which paths to watch
based on tool calls made by the chat (and its ancestor chats).
3. Workspace agent polls subscribed repo working trees on a 30s
interval, on tools calls, and on explicit `refresh` from the client.
4. Scans are rate-limited to at most once per second.
5. Edited paths are tracked **in-memory** inside the workspace agent.
There is no database persistence — state is lost on agent restart. This
will be addresses in a future PR.
6. Messages sent over WebSocket include a full-repo snapshot (unified
diff, branch, origin). A new message is emitted only when the snapshot
changes.

This PR was implemented with AI with me closely controlling what it's
doing. The code follows a plan file that was updated continuously during
implementation. Here's the file if you'd like to see it:
[project.md](https://gist.github.com/hugodutka/8722cf80c92f8a56555f7bc595b770e2).
It reflects the current state of the PR.
2026-03-06 10:47:55 +01:00
Kyle Carberry 5945febf06 feat(agent): add fuzzy whitespace matching to edit_files tool (#22446)
Inspired by openai/codex's `apply_patch` implementation, this changes
the `edit_files` search-and-replace to use a cascading match strategy
when the exact search string isn't found:

1. **Exact substring match** (byte-for-byte) — existing behavior,
unchanged
2. **Line-by-line match ignoring trailing whitespace** — handles
trailing spaces/tabs the LLM omits
3. **Line-by-line match ignoring all leading/trailing whitespace** —
handles tabs-vs-spaces and wrong indentation depth

## Problem

When the chat agent uses `edit_files`, it generates a search string that
must match the file content exactly. LLMs frequently get whitespace
wrong:
- Emitting spaces when the file uses tabs (or vice versa)
- Getting the indentation depth wrong by one or more levels
- Omitting trailing whitespace that exists in the file

When this happens, the edit silently does nothing, and the agent falls
into a retry loop using `cat -A` to diagnose the exact whitespace
characters.

## Solution

Adopted the same cascading fuzzy match strategy that [openai/codex uses
in
`seek_sequence.rs`](https://github.com/openai/codex/blob/main/codex-rs/apply-patch/src/seek_sequence.rs):

- Pass 1: exact match (existing behavior)
- Pass 2: `TrimRight` each line before comparing (trailing whitespace
tolerance)
- Pass 3: `TrimSpace` each line before comparing (full indentation
tolerance)

When a fuzzy match is found, the matched lines in the original file are
replaced with the replacement text. This preserves surrounding content
exactly.

## Changes

- `agent/agentfiles/files.go`: Replaced `icholy/replace` streaming
transformer with in-memory `fuzzyReplace` + helper functions
(`seekLines`, `spliceLines`)
- `agent/agentfiles/files_test.go`: Added 6 new test cases covering
trailing whitespace, tabs-vs-spaces, different indent depths, exact
match preference, no-match behavior, and mixed whitespace multiline
edits
- Removed `icholy/replace` dependency from go.mod/go.sum

---------

Co-authored-by: Kyle Carberry <kylecarbs@users.noreply.github.com>
2026-02-28 17:02:57 -05:00
Kyle Carberry b65c0766d2 feat: add line-based read_file tool with safety limits (#22400)
## Summary

Adds a new line-based file reading endpoint to the workspace agent,
replacing the unbounded byte-based approach for the `read_file` chat
tool and `coder_workspace_read_file` MCP tool.

**Problem**: The current `read_file` tool returns the entire file
contents with no limits, which can blow up LLM context windows and cause
OOM issues with large files.

**Solution**: Inspired by [`coder/mux`](https://github.com/coder/mux)
and [`openai/codex`](https://github.com/openai/codex), implement a
line-based reader with safety limits.

## Changes

### Agent (`agent/agentfiles/`)
- New `/read-file-lines` endpoint with `HandleReadFileLines` handler
- Line-based `offset` (1-based line number, default: 1) and `limit`
(line count, default: 2000)
- Safety constants:
  | Constant | Value | Purpose |
  |---|---|---|
  | `MaxFileSize` | 1 MB | Reject files larger than this at stat |
| `MaxLineBytes` | 1,024 | Per-line truncation with `... [truncated]`
marker |
  | `MaxResponseLines` | 2,000 | Max lines per response |
  | `MaxResponseBytes` | 32 KB | Max total response size |
  | `DefaultLineLimit` | 2,000 | Default when no limit specified |
- Line numbering format: `1\tcontent` (tab-separated)
- Structured JSON response: `{ success, file_size, total_lines,
lines_read, content, error }`
- Hard errors when limits exceeded — tells the LLM to use
`offset`/`limit`
- Existing byte-based `/read-file` endpoint preserved (used by
`instruction.go`)

### SDK (`codersdk/workspacesdk/`)
- `ReadFileLinesResponse` type added
- `ReadFileLines` method added to `AgentConn` interface
- Mock regenerated

### Chat tool (`coderd/chatd/chattool/`)
- `read_file` tool now uses `conn.ReadFileLines()` instead of
`conn.ReadFile()`
- Updated tool description to document line-based parameters
- Response includes `file_size`, `total_lines`, `lines_read` metadata

### MCP tool (`codersdk/toolsdk/`)
- `coder_workspace_read_file` updated to use line-based reading
- Schema descriptions updated for line-based offset/limit
- Removed `maxFileLimit` constant (agent handles limits now)

### Tests
- 13 new test cases for `TestReadFileLines`:
- Path validation (empty, relative, non-existent, directory, no
permissions)
  - Empty file handling
  - Basic read, offset, limit, offset+limit combinations
  - Offset beyond file length
  - Long line truncation (>1024 bytes)
  - Large file rejection (>1MB)
- All existing tests pass unchanged

## Design decisions

| Decision | Rationale |
|---|---|
| Line-based, not byte-based | Both coder/mux and openai/codex use
line-based — matches how LLMs reason about code |
| Default limit of 2000 | Matches codex; prevents accidental full-file
dumps while being generous |
| 32 KB response cap | Compromise between mux (16 KB) and codex (no cap)
|
| 1024 byte/line truncation with marker | More generous than codex
(500), marker helps LLM know data is missing |
| Hard errors on overflow | Matches mux; forces LLM to paginate rather
than getting partial data |
| Preserve byte-based endpoint | `instruction.go` needs raw byte access
for AGENTS.md |
2026-02-27 15:12:56 -05:00
Asher ff9ed91811 chore: move agent's file API into separate package (#21531)
This makes it so we can test it directly without having to go through
Tailnet, which appears to be causing flakes in CI where the requests
time out and never make it to the agent.

Takes inspiration from the container-related API endpoints.

Would probably make sense to refactor the ls tests to also go through
the API (rather than be internal tests like they are currently) but I
left those alone for now to keep the diff minimal.
2026-01-16 17:03:17 -09:00