Skip to content

fix(agent): preserve tool_call structure across turns + Markdown->tool_call salvage parser — stop the apr-code text-loop that defeats tool-calling (CCPA m296)#2245

Open
noahgift wants to merge 1 commit into
mainfrom
beat/apr-code-harness-toolcall-retention
Open

fix(agent): preserve tool_call structure across turns + Markdown->tool_call salvage parser — stop the apr-code text-loop that defeats tool-calling (CCPA m296)#2245
noahgift wants to merge 1 commit into
mainfrom
beat/apr-code-harness-toolcall-retention

Conversation

@noahgift

Copy link
Copy Markdown
Contributor

Grounded bug (CCPA m296)

The apr-code agentic loop had a harness bug independent of the model. Even a format-correct model (one that emits a valid <tool_call>) reverted to 0/N tool_calls across a multi-turn run because a prior assistant TOOL_CALL turn could be retained / re-rendered as raw Markdown prose, re-priming prose mode — a self-reinforcing text loop. This is a hard prerequisite: until fixed, any fine-tune result is uninterpretable.

What was wrong

runtime.rs (the EndTurn branch) pushed response.text verbatim into multi-turn history as Message::Assistant(...). When the driver's parser failed to recognize a model's tool-call shape — anything outside the exact <tool_call> / ```json envelope (a bare {"name","input"} object, or a ```tool_call/```rust fence) — that turn was scored as inert prose and its raw tool-call Markdown was re-injected into the next turn's prompt, eroding tool-calling.

The fix — two correctness surfaces

  1. Salvage parser (realizar.rs::salvage_tool_calls): when the envelope parser finds nothing, conservatively recover a tool call from (a) a generically-fenced block (any language tag or none) whose body is tool-call JSON, or (b) a bare top-level {"name","input"} JSON object. Only objects with a string name AND an input field are salvaged — plain JSON / prose are never mistaken for tool calls. Salvage events are logged (salvage-N ids). Shared parser, so it applies to both the embedded RealizarDriver and the apr serve HTTP path. This recovers the "model almost emitted a tool_call" near-misses.

  2. Structured retention (runtime.rs::retain_assistant_text): an assistant turn's text is stripped of lingering <tool_call>/<tool_result> markup before it enters history, so a tool-using turn is never re-rendered as capability-breaking raw Markdown that re-primes prose mode. Genuine prose passes through unchanged; the structured AssistantToolUse/ToolResult messages already carry that turn's tool semantics (chat_template.rs renders them as the canonical <tool_call> + <tool_result> envelope — and there is no ### Continue: prose nudge after a tool turn).

Falsifiers (mutation-verified — oracle = the structured/expected render)

  • falsify_toolcall_retention_001 — next-turn render of a prior tool-call turn preserves <tool_call>+<tool_result> and contains no ### Continue: nudge.
  • falsify_toolcall_retention_002 — history keeps the structured tool messages, never raw <tool_call> markup as Assistant prose.
  • falsify_toolcall_retention_003retain_assistant_text strips residue, keeps prose.
  • test_salvage_* (realizar) — recover bare/fenced tool-call JSON; reject plain JSON and name-without-input; envelope still takes precedence.

Mutation results: making retain_assistant_text an identity turns 003 RED; reverting salvage to envelope-only turns the 3 salvage-recovery tests RED. (Per feedback_contracts_ratchet_not_radar.)

Contract

contracts/apr-code-toolcall-retention-v1.yaml (OBLIG-APR-CODE-TOOLCALL-RETENTION) — kind: kernel, 5 single-line cargo test falsifier refs. pv validate + pv lint contracts/ PASS (0 errors / 0 warnings on this file).

Green

  • cargo test -p aprender-orchestrate --lib: 6514 pass. The lone failure is agent::tool::mcp_client::test_discover_tools_via_echo — a pre-existing subprocess/stdio flake unrelated to this change (1 fail / 3 runs, different module; passes on pristine origin/main and 2/3 with this change).
  • cargo clippy -p aprender-orchestrate --all-targets: clean (exit 0).
  • cargo fmt: clean.

🤖 Generated with Claude Code

…l_call salvage parser — stop the apr-code text-loop that defeats tool-calling (CCPA m296)

GROUNDED BUG (CCPA m296 distill feasibility spike): the apr-code agentic loop
had a HARNESS bug independent of the model. Even a format-correct model (one
that emits a valid <tool_call>) reverted to 0/N tool_calls across a multi-turn
run because a prior assistant TOOL_CALL turn could be retained / re-rendered as
raw Markdown prose, re-priming prose mode — a self-reinforcing text loop. Until
fixed, ANY fine-tune result is uninterpretable.

WHAT WAS WRONG
- runtime.rs (EndTurn branch) pushed `response.text` verbatim into multi-turn
  history as `Message::Assistant(...)`. When the driver's parser failed to
  recognize a model's tool-call shape (anything outside the exact <tool_call> /
  ```json envelope — a bare {"name","input"} object or a ```tool_call/```rust
  fence), that turn was scored as inert prose AND its raw tool-call Markdown was
  re-injected into the next turn's prompt, eroding tool-calling.

THE FIX (two correctness surfaces)
1. SALVAGE PARSER (realizar.rs `salvage_tool_calls`): when the envelope parser
   finds nothing, conservatively recover a tool call from (a) a generically-
   fenced block (any language tag or none) whose body is tool-call JSON, or (b)
   a bare top-level {"name","input"} JSON object. Only objects with a string
   `name` AND an `input` field are salvaged — plain JSON / prose are never
   mistaken for tool calls. Salvage events are logged (id `salvage-N`). Applies
   to BOTH the embedded RealizarDriver and the apr-serve HTTP path (shared
   parser). This recovers the "model almost emitted a tool_call" near-misses.
2. STRUCTURED RETENTION (runtime.rs `retain_assistant_text`): an assistant
   turn's text is stripped of lingering <tool_call>/<tool_result> markup before
   it enters history, so a tool-using turn is NEVER re-rendered as capability-
   breaking raw Markdown that re-primes prose mode. Genuine prose passes through
   unchanged; the structured AssistantToolUse/ToolResult messages already carry
   that turn's tool semantics (chat_template.rs renders them as the canonical
   <tool_call> + <tool_result> envelope — no "### Continue:" prose nudge).

FALSIFIERS (mutation-verified, oracle = the structured/expected render)
- falsify_toolcall_retention_001: next-turn render of a prior tool-call turn
  preserves <tool_call>+<tool_result> and contains no "### Continue:" nudge.
- falsify_toolcall_retention_002: history keeps structured tool messages, never
  raw <tool_call> markup as Assistant prose.
- falsify_toolcall_retention_003: retain_assistant_text strips residue, keeps
  prose.
- test_salvage_* (realizar): recover bare/fenced tool-call JSON; reject plain
  JSON / name-without-input; envelope still takes precedence.
- Mutation verified: making retain_assistant_text identity turns 003 RED;
  reverting salvage to envelope-only turns the 3 salvage-recovery tests RED.

CONTRACT: contracts/apr-code-toolcall-retention-v1.yaml
(OBLIG-APR-CODE-TOOLCALL-RETENTION) — kind: kernel, 5 single-line cargo-test
falsifier refs. `pv validate` + `pv lint contracts/` PASS (0 errors/0 warnings
on this file).

GREEN: cargo test -p aprender-orchestrate --lib (6514 pass; the lone failure is
agent::tool::mcp_client::test_discover_tools_via_echo — a pre-existing
subprocess/stdio flake unrelated to this change, 1 fail / 3 runs, different
module). clippy -p aprender-orchestrate --all-targets clean (exit 0). fmt clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@noahgift noahgift enabled auto-merge June 25, 2026 19:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant