fix(convert): preserve forward-affecting config metadata in GGUF->APR import — fixes .apr GPU F2 divergence on Blackwell (PMAT class)#2244
Open
noahgift wants to merge 1 commit into
Conversation
… import — fixes .apr GPU F2 divergence on Blackwell (PMAT class)
P4 correctness sweep on the reported .apr-vs-.gguf GPU F2 per-position
divergence (qwen2.5-coder-1.5b on GB10 sm_121). ORACLE = the .gguf path
(GGUFConfig::from_gguf); falsifiers are oracle-based and mutation-verified
per feedback_contracts_ratchet_not_radar.
WHAT WAS PROVEN (CPU-side, no GPU):
GGUFConfig built by from_apr (the .apr loader) is ALREADY byte-identical to
from_gguf (the .gguf oracle) for this model — architecture, hidden_dim=1536,
num_layers=28, num_heads=12, num_kv_heads=2, head_dim=128, intermediate=8960,
rope_theta=1e6, rope_type=2 (NEOX), eps=1e-6, context_length=32768,
attn_scale, BOS/EOS all match. The raw .apr metadata stamps rms_norm_eps
~1e-6, rope_type=2, rope_theta=1e6 correctly. So the "missing config field"
hypothesis is FALSIFIED at the config level; there is NO format/config
divergence for this model.
GPU RE-VERIFY (gx10 GB10, --features cuda build): .gguf and .apr behave
IDENTICALLY — same load-time PARITY-GATE cosine 0.981714, same F2 result
(GPU token 4740 != CPU token 16 BOS probe), same coherent output "4" for
"2+2=" under SKIP_PARITY_GATE=1 on the 647-kernel CUDA graph. The F2 BOS-probe
rejection is the known stale-gate behavior (apr-cpu-vs-gpu-output-parity-v1
v1.10.0 PMAT-885), format-independent, NOT an .apr-specific bug.
LATENT GAP FIXED (the real correctness defect the audit surfaced):
GgufToAprQ4KConverter::convert resolved rms_norm_eps with a hard-coded
`unwrap_or(1e-5)` (LLaMA's epsilon) for EVERY architecture, while
GGUFConfig::from_gguf falls back to the arch-specific
ArchConstraints::default_eps (1e-6 for Qwen2/Qwen3). For any 1e-6-eps model
whose GGUF OMITS the epsilon key (e.g. a weights-only Qwen2 export) the old
code would stamp 1e-5 into the .apr -> a real per-layer RMSNorm divergence vs
the same model run as .gguf (pos-0 clean, compounds position-by-position).
Fix: route eps through a new resolve_rms_eps() helper mirroring from_gguf's
arch-aware default. Raw-byte Q4K passthrough preserved (no requant).
FALSIFIERS (oracle-based, mutation-verified):
- resolve_rms_eps unit tests (FALSIFY-APR-IMPORT-EPS-001..004): qwen2/qwen3
missing-key -> 1e-6, llama -> 1e-5, explicit GGUF eps used verbatim.
Mutation: reverting to unwrap_or(1e-5) makes the qwen2/qwen3 tests RED.
- apr_import_config_fidelity integration test: from_apr config == from_gguf
config field-for-field (host-gated; auto-skips without the fixture).
Contract contracts/apr-import-config-fidelity-v1.yaml
(OBLIG-APR-IMPORT-CONFIG-FIDELITY); pv validate + pv lint contracts/ PASS.
Tests: aprender-serve convert lib 444 pass; integration 2 pass; clippy --lib
clean. (Pre-existing ffn_coverage/convert_coverage standalone test bins fail to
compile on base — stale struct literals — unrelated to this change.)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
P4 correctness sweep —
.apr-vs-.ggufGPU F2 divergence on BlackwellInvestigated the reported
.apr-vs-.ggufGPU F2 per-position divergence forqwen2.5-coder-1.5b-instruct-q4_k_m, reproduced on real GB10 (sm_121). ORACLE throughout = the.ggufpath (GGUFConfig::from_gguf); all falsifiers are oracle-based + mutation-verified (perfeedback_contracts_ratchet_not_radar).What was proven (CPU-side, no GPU)
The
GGUFConfigbuilt byfrom_apr(the.aprloader) is already byte-identical tofrom_gguf(the.gguforacle) for this model — architecture, hidden_dim=1536, num_layers=28, num_heads=12, num_kv_heads=2, head_dim=128, intermediate=8960, rope_theta=1e6, rope_type=2 (NEOX), eps=1e-6, context_length=32768, attn_scale, BOS/EOS all match. The raw.aprmetadata stampsrms_norm_eps≈1e-6,rope_type=2,rope_theta=1e6correctly. The "missing config field causes pos-11 divergence" hypothesis is FALSIFIED at the config level — there is no format/config divergence for this model.GPU re-verify (gx10 GB10,
--features cuda).ggufand.aprbehave identically: same load-time PARITY-GATE cosine0.981714, same F2 result (GPU token 4740 != CPU token 16BOS probe), and the same coherent output ("4"for"2+2=") on the 647-kernel CUDA graph underSKIP_PARITY_GATE=1. The F2 BOS-probe rejection is the known stale-gate behavior (apr-cpu-vs-gpu-output-parity-v1v1.10.0 / PMAT-885), is format-independent, and is NOT an.apr-specific bug.The real defect this audit surfaced (the fix)
GgufToAprQ4KConverter::convertresolvedrms_norm_epswith a hard-codedunwrap_or(1e-5)(LLaMA's epsilon) for every architecture, whileGGUFConfig::from_gguffalls back to the arch-specificArchConstraints::default_eps(1e-6 for Qwen2/Qwen3). For any 1e-6-eps model whose GGUF omits the epsilon key (e.g. a weights-only Qwen2 export), the old code would stamp1e-5into the.apr→ a real per-layer RMSNorm divergence vs the same model run as.gguf(pos-0 clean, compounds position-by-position — the F2 signature). Fix: route eps through a newresolve_rms_eps()helper mirroringfrom_gguf's arch-aware default. Raw-byte Q4K passthrough preserved (no requant).Falsifiers (oracle-based, mutation-verified)
resolve_rms_epsunit tests FALSIFY-APR-IMPORT-EPS-001..004: qwen2/qwen3 missing-key → 1e-6, llama → 1e-5, explicit GGUF eps used verbatim. Mutation-verified: reverting tounwrap_or(1e-5)turns the qwen2/qwen3 tests RED.apr_import_config_fidelityintegration test:from_aprconfig ==from_ggufconfig field-for-field (host-gated; auto-skips without the fixture).contracts/apr-import-config-fidelity-v1.yaml(OBLIG-APR-IMPORT-CONFIG-FIDELITY);pv validate+pv lint contracts/PASS.Tests
aprender-serveconvert lib: 444 pass; integration: 2 pass;clippy --libclean.ffn_coverage/convert_coveragestandalone test bins fail to compile on base (stale struct literals) — unrelated to this change.Honest status
This PR ships a proven non-divergence (the format/config is faithful) plus a ratchet that fixes a latent arch-eps stamping gap. It does NOT claim to fix the F2 BOS-probe GPU fallback — that is a separate, format-independent CUDA-gate issue already tracked by
apr-cpu-vs-gpu-output-parity-v1.🤖 Generated with Claude Code