fix(agent): preserve tool_call structure across turns + Markdown->tool_call salvage parser — stop the apr-code text-loop that defeats tool-calling (CCPA m296) by noahgift · Pull Request #2245 · paiml/aprender

noahgift · 2026-06-25T19:21:03Z

Grounded bug (CCPA m296)

The apr-code agentic loop had a harness bug independent of the model. Even a format-correct model (one that emits a valid <tool_call>) reverted to 0/N tool_calls across a multi-turn run because a prior assistant TOOL_CALL turn could be retained / re-rendered as raw Markdown prose, re-priming prose mode — a self-reinforcing text loop. This is a hard prerequisite: until fixed, any fine-tune result is uninterpretable.

What was wrong

runtime.rs (the EndTurn branch) pushed response.text verbatim into multi-turn history as Message::Assistant(...). When the driver's parser failed to recognize a model's tool-call shape — anything outside the exact <tool_call> / ```json envelope (a bare {"name","input"} object, or a ```tool_call/```rust fence) — that turn was scored as inert prose and its raw tool-call Markdown was re-injected into the next turn's prompt, eroding tool-calling.
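
To make the failure mode concrete, here is a minimal sketch of the pre-fix behaviour described above. The `Message` enum, `end_turn_verbatim`, and `render_prompt` are simplified, hypothetical stand-ins written for illustration; they are not the actual types or functions in runtime.rs or chat_template.rs.

```rust
// Hypothetical, simplified stand-ins; the real types live in runtime.rs.
#[derive(Debug, Clone, PartialEq)]
enum Message {
    User(String),
    Assistant(String),
}

/// Pre-fix behaviour as described above: the response text is pushed
/// into history verbatim, whatever it contains.
fn end_turn_verbatim(history: &mut Vec<Message>, response_text: &str) {
    history.push(Message::Assistant(response_text.to_string()));
}

/// Toy prompt renderer (assumption): one role-prefixed entry per message.
fn render_prompt(history: &[Message]) -> String {
    history
        .iter()
        .map(|m| match m {
            Message::User(t) => format!("user: {}", t),
            Message::Assistant(t) => format!("assistant: {}", t),
        })
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let mut history = vec![Message::User("list the files".to_string())];

    // A near-miss: tool-call JSON inside a ```tool_call fence, a shape the
    // envelope-only parser did not recognize.
    let near_miss =
        "```tool_call\n{\"name\": \"shell\", \"input\": {\"cmd\": \"ls\"}}\n```";
    end_turn_verbatim(&mut history, near_miss);

    // The raw Markdown is now part of the next turn's prompt.
    let prompt = render_prompt(&history);
    assert!(prompt.contains("```tool_call"));
    println!("{}", prompt);
}
```

In this toy model the unrecognized tool call never executes, yet its Markdown is carried forward into every later prompt, which is the re-priming effect the description attributes to the EndTurn branch.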

The fix — two correctness surfaces

Salvage parser (realizar.rs::salvage_tool_calls): when the envelope parser finds nothing, conservatively recover a tool call from (a) a generically-fenced block (any language tag or none) whose body is tool-call JSON, or (b) a bare top-level {"name","input"} JSON object. Only objects with a string name AND an input field are salvaged — plain JSON / prose are never mistaken for tool calls. Salvage events are logged (salvage-N ids). Shared parser, so it applies to both the embedded RealizarDriver and the apr serve HTTP path. This recovers the "model almost emitted a tool_call" near-misses.
Structured retention (runtime.rs::retain_assistant_text): an assistant turn's text is stripped of lingering <tool_call>/<tool_result> markup before it enters history, so a tool-using turn is never re-rendered as capability-breaking raw Markdown that re-primes prose mode. Genuine prose passes through unchanged; the structured AssistantToolUse/ToolResult messages already carry that turn's tool semantics (chat_template.rs renders them as the canonical <tool_call> + <tool_result> envelope — and there is no ### Continue: prose nudge after a tool turn).
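
The salvage rule can be sketched in two steps: pull the body out of a generically-fenced block, then accept it only if it has the tool-call shape. The sketch below is an assumption-laden illustration, not the code in realizar.rs: `Json`, `fenced_body`, and `is_tool_call` are hypothetical names, the string-to-JSON parse step is omitted (the real code presumably uses a full JSON library), and the `salvage-N` logging is not modelled.

```rust
/// Minimal stand-in for a parsed JSON value (assumption: illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Str(String),
    Num(f64),
    Obj(Vec<(String, Json)>),
}

/// Body of the first fenced block, whatever its language tag (or none).
fn fenced_body(text: &str) -> Option<String> {
    let mut lines = text.lines();
    // Skip ahead to the opening fence; its language tag is ignored.
    lines.by_ref().find(|l| l.trim_start().starts_with("```"))?;
    let mut body = Vec::new();
    for line in lines {
        if line.trim() == "```" {
            return Some(body.join("\n"));
        }
        body.push(line);
    }
    None // unterminated fence: nothing is salvaged
}

/// Conservative acceptance rule from the description: an object with a
/// string `name` AND an `input` field. Anything else is not a tool call.
fn is_tool_call(value: &Json) -> bool {
    match value {
        Json::Obj(fields) => {
            let has_string_name = fields
                .iter()
                .any(|(k, v)| k == "name" && matches!(v, Json::Str(_)));
            let has_input = fields.iter().any(|(k, _)| k == "input");
            has_string_name && has_input
        }
        _ => false,
    }
}

fn main() {
    let reply =
        "Sure:\n```tool_call\n{\"name\": \"shell\", \"input\": {\"cmd\": \"ls\"}}\n```";
    assert_eq!(
        fenced_body(reply).as_deref(),
        Some("{\"name\": \"shell\", \"input\": {\"cmd\": \"ls\"}}")
    );

    let call = Json::Obj(vec![
        ("name".to_string(), Json::Str("shell".to_string())),
        ("input".to_string(), Json::Obj(vec![])),
    ]);
    let name_only = Json::Obj(vec![("name".to_string(), Json::Str("shell".to_string()))]);
    let plain = Json::Obj(vec![("count".to_string(), Json::Num(3.0))]);

    assert!(is_tool_call(&call));
    assert!(!is_tool_call(&name_only)); // name without input is rejected
    assert!(!is_tool_call(&plain)); // plain JSON is rejected
}
```

Requiring both fields keeps the salvage path narrow: a fenced JSON snippet that merely appears in an explanation is left alone unless it has the full tool-call shape.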

Falsifiers (mutation-verified — oracle = the structured/expected render)

falsify_toolcall_retention_001 — next-turn render of a prior tool-call turn preserves <tool_call>+<tool_result> and contains no ### Continue: nudge.
falsify_toolcall_retention_002 — history keeps the structured tool messages, never raw <tool_call> markup as Assistant prose.
falsify_toolcall_retention_003 — retain_assistant_text strips residue, keeps prose.
test_salvage_* (realizar) — recover bare/fenced tool-call JSON; reject plain JSON and name-without-input; envelope still takes precedence.
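
The property behind falsify_toolcall_retention_003 (strip residue, keep prose) can be illustrated with a small sketch. The body of `retain_assistant_text` below is a guess at one way to satisfy that property, not the implementation in runtime.rs; `strip_tag` is a hypothetical helper, and the handling of unclosed tags and of surrounding whitespace are assumptions.

```rust
/// Remove every `<tag>...</tag>` span. An unclosed opening tag drops the
/// rest of the text (assumption: treat it as truncated markup).
fn strip_tag(text: &str, tag: &str) -> String {
    let open = format!("<{}>", tag);
    let close = format!("</{}>", tag);
    let mut out = String::new();
    let mut rest = text;
    while let Some(start) = rest.find(open.as_str()) {
        out.push_str(&rest[..start]);
        match rest[start..].find(close.as_str()) {
            Some(end) => rest = &rest[start + end + close.len()..],
            None => rest = "",
        }
    }
    out.push_str(rest);
    out
}

/// Sketch of the retention step: strip tool markup, keep genuine prose.
fn retain_assistant_text(text: &str) -> String {
    let stripped = strip_tag(&strip_tag(text, "tool_call"), "tool_result");
    if stripped == text {
        stripped // genuine prose: returned byte-for-byte
    } else {
        stripped.trim().to_string() // tidy whitespace left behind by removal
    }
}

fn main() {
    // Prose passes through unchanged.
    assert_eq!(retain_assistant_text("The build is green."), "The build is green.");

    // Residual markup is stripped; the surrounding prose survives.
    let mixed = "Running ls.\n<tool_call>{\"name\":\"shell\",\"input\":{}}</tool_call>";
    assert_eq!(retain_assistant_text(mixed), "Running ls.");
}
```

Replacing `retain_assistant_text` with an identity function makes the second assertion fail, which mirrors the kind of mutation a falsifier like 003 is meant to catch.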

Mutation results: making retain_assistant_text an identity turns 003 RED; reverting salvage to envelope-only turns the 3 salvage-recovery tests RED. (Per feedback_contracts_ratchet_not_radar.)

Contract

contracts/apr-code-toolcall-retention-v1.yaml (OBLIG-APR-CODE-TOOLCALL-RETENTION) — kind: kernel, 5 single-line cargo test falsifier refs. pv validate + pv lint contracts/ PASS (0 errors / 0 warnings on this file).

Green

cargo test -p aprender-orchestrate --lib: 6514 pass. The lone failure is agent::tool::mcp_client::test_discover_tools_via_echo, a pre-existing subprocess/stdio flake in a different module and unrelated to this change: it failed in 1 of 3 runs with this change (passing the other 2) and passes on pristine origin/main.
cargo clippy -p aprender-orchestrate --all-targets: clean (exit 0).
cargo fmt: clean.

🤖 Generated with Claude Code

…l_call salvage parser: stop the apr-code text-loop that defeats tool-calling (CCPA m296)

GROUNDED BUG (CCPA m296 distill feasibility spike): the apr-code agentic loop had a HARNESS bug independent of the model. Even a format-correct model (one that emits a valid <tool_call>) reverted to 0/N tool_calls across a multi-turn run because a prior assistant TOOL_CALL turn could be retained / re-rendered as raw Markdown prose, re-priming prose mode: a self-reinforcing text loop. Until fixed, ANY fine-tune result is uninterpretable.

WHAT WAS WRONG
- runtime.rs (EndTurn branch) pushed `response.text` verbatim into multi-turn history as `Message::Assistant(...)`. When the driver's parser failed to recognize a model's tool-call shape (anything outside the exact <tool_call> / ```json envelope: a bare {"name","input"} object or a ```tool_call/```rust fence), that turn was scored as inert prose AND its raw tool-call Markdown was re-injected into the next turn's prompt, eroding tool-calling.

THE FIX (two correctness surfaces)
1. SALVAGE PARSER (realizar.rs `salvage_tool_calls`): when the envelope parser finds nothing, conservatively recover a tool call from (a) a generically-fenced block (any language tag or none) whose body is tool-call JSON, or (b) a bare top-level {"name","input"} JSON object. Only objects with a string `name` AND an `input` field are salvaged; plain JSON / prose are never mistaken for tool calls. Salvage events are logged (id `salvage-N`). Applies to BOTH the embedded RealizarDriver and the apr-serve HTTP path (shared parser). This recovers the "model almost emitted a tool_call" near-misses.
2. STRUCTURED RETENTION (runtime.rs `retain_assistant_text`): an assistant turn's text is stripped of lingering <tool_call>/<tool_result> markup before it enters history, so a tool-using turn is NEVER re-rendered as capability-breaking raw Markdown that re-primes prose mode. Genuine prose passes through unchanged; the structured AssistantToolUse/ToolResult messages already carry that turn's tool semantics (chat_template.rs renders them as the canonical <tool_call> + <tool_result> envelope, with no "### Continue:" prose nudge).

FALSIFIERS (mutation-verified, oracle = the structured/expected render)
- falsify_toolcall_retention_001: next-turn render of a prior tool-call turn preserves <tool_call>+<tool_result> and contains no "### Continue:" nudge.
- falsify_toolcall_retention_002: history keeps structured tool messages, never raw <tool_call> markup as Assistant prose.
- falsify_toolcall_retention_003: retain_assistant_text strips residue, keeps prose.
- test_salvage_* (realizar): recover bare/fenced tool-call JSON; reject plain JSON / name-without-input; envelope still takes precedence.
- Mutation verified: making retain_assistant_text identity turns 003 RED; reverting salvage to envelope-only turns the 3 salvage-recovery tests RED.

CONTRACT: contracts/apr-code-toolcall-retention-v1.yaml (OBLIG-APR-CODE-TOOLCALL-RETENTION), kind: kernel, 5 single-line cargo-test falsifier refs. `pv validate` + `pv lint contracts/` PASS (0 errors/0 warnings on this file).

GREEN: cargo test -p aprender-orchestrate --lib (6514 pass; the lone failure is agent::tool::mcp_client::test_discover_tools_via_echo, a pre-existing subprocess/stdio flake unrelated to this change, 1 fail / 3 runs, different module). clippy -p aprender-orchestrate --all-targets clean (exit 0). fmt clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

noahgift enabled auto-merge June 25, 2026 19:21

noahgift wants to merge 1 commit into main from beat/apr-code-harness-toolcall-retention
