fix(convert): preserve forward-affecting config metadata in GGUF->APR import — fixes .apr GPU F2 divergence on Blackwell (PMAT class)#2244

Open
noahgift wants to merge 1 commit into main from beat/apr-import-config-fidelity

@noahgift

Contributor

P4 correctness sweep — .apr-vs-.gguf GPU F2 divergence on Blackwell

Investigated the reported .apr-vs-.gguf GPU F2 per-position divergence for qwen2.5-coder-1.5b-instruct-q4_k_m, reproduced on real GB10 (sm_121). ORACLE throughout = the .gguf path (GGUFConfig::from_gguf); all falsifiers are oracle-based + mutation-verified (per feedback_contracts_ratchet_not_radar).

What was proven (CPU-side, no GPU)

The GGUFConfig built by from_apr (the .apr loader) is already byte-identical to from_gguf (the .gguf oracle) for this model — architecture, hidden_dim=1536, num_layers=28, num_heads=12, num_kv_heads=2, head_dim=128, intermediate=8960, rope_theta=1e6, rope_type=2 (NEOX), eps=1e-6, context_length=32768, attn_scale, BOS/EOS all match. The raw .apr metadata stamps rms_norm_eps≈1e-6, rope_type=2, rope_theta=1e6 correctly. The "missing config field causes pos-11 divergence" hypothesis is FALSIFIED at the config level — there is no format/config divergence for this model.

GPU re-verify (gx10 GB10, --features cuda)

.gguf and .apr behave identically: same load-time PARITY-GATE cosine 0.981714, same F2 result (GPU token 4740 != CPU token 16 BOS probe), and the same coherent output ("4" for "2+2=") on the 647-kernel CUDA graph under SKIP_PARITY_GATE=1. The F2 BOS-probe rejection is the known stale-gate behavior (apr-cpu-vs-gpu-output-parity-v1 v1.10.0 / PMAT-885), is format-independent, and is NOT an .apr-specific bug.

The real defect this audit surfaced (the fix)

GgufToAprQ4KConverter::convert resolved rms_norm_eps with a hard-coded unwrap_or(1e-5) (LLaMA's epsilon) for every architecture, while GGUFConfig::from_gguf falls back to the arch-specific ArchConstraints::default_eps (1e-6 for Qwen2/Qwen3). For any 1e-6-eps model whose GGUF omits the epsilon key (e.g. a weights-only Qwen2 export), the old code would stamp 1e-5 into the .apr → a real per-layer RMSNorm divergence vs the same model run as .gguf (pos-0 clean, compounds position-by-position — the F2 signature). Fix: route eps through a new resolve_rms_eps() helper mirroring from_gguf's arch-aware default. Raw-byte Q4K passthrough preserved (no requant).
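The arch-aware fallback described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the signature of `resolve_rms_eps`, the `default_eps_for_arch` helper (a hypothetical stand-in for `ArchConstraints::default_eps`), and the architecture strings are all assumptions. The `rms_norm` function is a plain reference RMSNorm, included only to show why a 1e-5 vs 1e-6 epsilon changes the output.

```rust
/// Hypothetical stand-in for the arch-specific default that the PR attributes
/// to `ArchConstraints::default_eps`. The arch strings are assumed.
fn default_eps_for_arch(arch: &str) -> f32 {
    match arch {
        "qwen2" | "qwen3" => 1e-6,
        // LLaMA-style default, assumed here for every other architecture.
        _ => 1e-5,
    }
}

/// Sketch of an arch-aware resolver: an explicit GGUF value is used verbatim,
/// otherwise the architecture's default applies (signature is an assumption).
fn resolve_rms_eps(arch: &str, gguf_eps: Option<f32>) -> f32 {
    gguf_eps.unwrap_or_else(|| default_eps_for_arch(arch))
}

/// Reference RMSNorm: y_i = x_i / sqrt(mean(x^2) + eps) * w_i.
fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(weight).map(|(v, w)| v * inv_rms * w).collect()
}
```

With the old `unwrap_or(1e-5)` behavior, `resolve_rms_eps("qwen2", None)` would effectively return 1e-5 instead of 1e-6. The effect on `rms_norm` is largest when the mean square of the activations is comparable in size to epsilon, and the per-layer difference then compounds through the stack.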

Falsifiers (oracle-based, mutation-verified)

  • resolve_rms_eps unit tests FALSIFY-APR-IMPORT-EPS-001..004: qwen2/qwen3 missing-key → 1e-6, llama → 1e-5, explicit GGUF eps used verbatim. Mutation-verified: reverting to unwrap_or(1e-5) turns the qwen2/qwen3 tests RED.
  • apr_import_config_fidelity integration test: from_apr config == from_gguf config field-for-field (host-gated; auto-skips without the fixture).
  • Contract contracts/apr-import-config-fidelity-v1.yaml (OBLIG-APR-IMPORT-CONFIG-FIDELITY); pv validate + pv lint contracts/ PASS.
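A field-for-field oracle comparison of the kind the integration test performs might look like the sketch below. Everything here is illustrative: `ConfigSketch` is a hypothetical reduced stand-in for `GGUFConfig` (the real struct has more fields and possibly different types), `mismatched_fields` is not a function from the codebase, and the `"qwen2"` architecture string is assumed. The numeric values are the ones reported in this PR for qwen2.5-coder-1.5b.

```rust
/// Hypothetical, reduced stand-in for the config struct under comparison.
#[derive(Debug, Clone, PartialEq)]
struct ConfigSketch {
    architecture: String,
    hidden_dim: usize,
    num_layers: usize,
    num_heads: usize,
    num_kv_heads: usize,
    rope_theta: f32,
    rope_type: u32,
    eps: f32,
    context_length: usize,
}

/// Returns the names of fields where `candidate` differs from `oracle`.
/// Floats are compared by bit pattern so the check is exact.
fn mismatched_fields(oracle: &ConfigSketch, candidate: &ConfigSketch) -> Vec<&'static str> {
    let mut out = Vec::new();
    if oracle.architecture != candidate.architecture { out.push("architecture"); }
    if oracle.hidden_dim != candidate.hidden_dim { out.push("hidden_dim"); }
    if oracle.num_layers != candidate.num_layers { out.push("num_layers"); }
    if oracle.num_heads != candidate.num_heads { out.push("num_heads"); }
    if oracle.num_kv_heads != candidate.num_kv_heads { out.push("num_kv_heads"); }
    if oracle.rope_theta.to_bits() != candidate.rope_theta.to_bits() { out.push("rope_theta"); }
    if oracle.rope_type != candidate.rope_type { out.push("rope_type"); }
    if oracle.eps.to_bits() != candidate.eps.to_bits() { out.push("eps"); }
    if oracle.context_length != candidate.context_length { out.push("context_length"); }
    out
}

/// Oracle values as reported in this PR (architecture string is assumed).
fn qwen_oracle_sketch() -> ConfigSketch {
    ConfigSketch {
        architecture: "qwen2".to_string(),
        hidden_dim: 1536,
        num_layers: 28,
        num_heads: 12,
        num_kv_heads: 2,
        rope_theta: 1e6,
        rope_type: 2,
        eps: 1e-6,
        context_length: 32768,
    }
}
```

Reporting the names of the mismatched fields, rather than a single boolean from `==`, makes a failure point directly at the offending key. A config stamped with the old 1e-5 default would surface as `["eps"]`.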

Tests

  • aprender-serve convert lib: 444 pass; integration: 2 pass; clippy --lib clean.
  • Pre-existing ffn_coverage/convert_coverage standalone test bins fail to compile on base (stale struct literals) — unrelated to this change.

Honest status

This PR ships a proven non-divergence (the format/config is faithful) plus a ratchet that fixes a latent arch-eps stamping gap. It does NOT claim to fix the F2 BOS-probe GPU fallback — that is a separate, format-independent CUDA-gate issue already tracked by apr-cpu-vs-gpu-output-parity-v1.

🤖 Generated with Claude Code

… import — fixes .apr GPU F2 divergence on Blackwell (PMAT class)

P4 correctness sweep on the reported .apr-vs-.gguf GPU F2 per-position
divergence (qwen2.5-coder-1.5b on GB10 sm_121). ORACLE = the .gguf path
(GGUFConfig::from_gguf); falsifiers are oracle-based and mutation-verified
per feedback_contracts_ratchet_not_radar.

WHAT WAS PROVEN (CPU-side, no GPU):
  GGUFConfig built by from_apr (the .apr loader) is ALREADY byte-identical to
  from_gguf (the .gguf oracle) for this model — architecture, hidden_dim=1536,
  num_layers=28, num_heads=12, num_kv_heads=2, head_dim=128, intermediate=8960,
  rope_theta=1e6, rope_type=2 (NEOX), eps=1e-6, context_length=32768,
  attn_scale, BOS/EOS all match. The raw .apr metadata stamps rms_norm_eps
  ~1e-6, rope_type=2, rope_theta=1e6 correctly. So the "missing config field"
  hypothesis is FALSIFIED at the config level; there is NO format/config
  divergence for this model.

GPU RE-VERIFY (gx10 GB10, --features cuda build): .gguf and .apr behave
  IDENTICALLY — same load-time PARITY-GATE cosine 0.981714, same F2 result
  (GPU token 4740 != CPU token 16 BOS probe), same coherent output "4" for
  "2+2=" under SKIP_PARITY_GATE=1 on the 647-kernel CUDA graph. The F2 BOS-probe
  rejection is the known stale-gate behavior (apr-cpu-vs-gpu-output-parity-v1
  v1.10.0 PMAT-885), format-independent, NOT an .apr-specific bug.

LATENT GAP FIXED (the real correctness defect the audit surfaced):
  GgufToAprQ4KConverter::convert resolved rms_norm_eps with a hard-coded
  `unwrap_or(1e-5)` (LLaMA's epsilon) for EVERY architecture, while
  GGUFConfig::from_gguf falls back to the arch-specific
  ArchConstraints::default_eps (1e-6 for Qwen2/Qwen3). For any 1e-6-eps model
  whose GGUF OMITS the epsilon key (e.g. a weights-only Qwen2 export) the old
  code would stamp 1e-5 into the .apr -> a real per-layer RMSNorm divergence vs
  the same model run as .gguf (pos-0 clean, compounds position-by-position).
  Fix: route eps through a new resolve_rms_eps() helper mirroring from_gguf's
  arch-aware default. Raw-byte Q4K passthrough preserved (no requant).

FALSIFIERS (oracle-based, mutation-verified):
  - resolve_rms_eps unit tests (FALSIFY-APR-IMPORT-EPS-001..004): qwen2/qwen3
    missing-key -> 1e-6, llama -> 1e-5, explicit GGUF eps used verbatim.
    Mutation: reverting to unwrap_or(1e-5) makes the qwen2/qwen3 tests RED.
  - apr_import_config_fidelity integration test: from_apr config == from_gguf
    config field-for-field (host-gated; auto-skips without the fixture).
  Contract contracts/apr-import-config-fidelity-v1.yaml
  (OBLIG-APR-IMPORT-CONFIG-FIDELITY); pv validate + pv lint contracts/ PASS.

Tests: aprender-serve convert lib 444 pass; integration 2 pass; clippy --lib
clean. (Pre-existing ffn_coverage/convert_coverage standalone test bins fail to
compile on base — stale struct literals — unrelated to this change.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@noahgift noahgift enabled auto-merge June 25, 2026 19:05
@noahgift noahgift added this pull request to the merge queue Jun 25, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 25, 2026