[BEAM] Fix Python VarIntCoder OverflowError on uint64 values#39047
Draft
AviKndr wants to merge 1 commit into
Draft
[BEAM] Fix Python VarIntCoder OverflowError on uint64 values#39047AviKndr wants to merge 1 commit into
AviKndr wants to merge 1 commit into
Conversation
The Cython write_var_int64/get_varint_size stream methods take a signed int64_t parameter. A Python int in the unsigned 64-bit range [2**63, 2**64) -- a uint64 -- is converted to that signed parameter at the call boundary and rejected with an OverflowError before the method body runs, even though the body already operates on the unsigned bit pattern and the VarInt wire encoding is well-defined. Fold such values to the signed int64 with the identical bit pattern (and thus identical VarInt encoding) before handing them to the stream. This matches Java's signed VarIntCoder on the wire; decoding remains signed. Values past 64 bits are left unchanged and still overflow downstream, preserving the coder's documented 64-bit limit. Adds test_varint_coder_uint64 covering no-overflow encoding, wire equivalence to the signed twin, size estimation, signed decode, and the still-raising out-of-range case (guarded on the compiled implementation). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
VarIntCoderraisesOverflowErrorwhen handed a Python int in the unsigned 64-bit range[2**63, 2**64)(a uint64), even though such a value has a well-defined VarInt encoding.Why
In the Cython build, the stream methods take a signed
int64_t:The body re-casts to
uint64_tand encodes the bit pattern correctly, but Cython converts the incoming Python int to the signedint64_tparameter at the call boundary — before the body runs — so any uint64 value is rejected withOverflowError. The same applies toget_varint_sizeused byestimate_size.Fix
A uint64 value
vand the signed int64v - 2**64(its two's-complement twin) produce identical VarInt bytes.VarIntCoderImplnow folds uint64 values to that signed twin before they cross into Cython, in bothencode_to_streamandestimate_size:VarIntCoder.slow_streampath produces the same bytes.>= 2**64are left unchanged and still overflow in the Cython path, preserving the coder's 64-bit contract.Note: decoding remains signed (
read_var_int64returnsint64_t), matching Java — so the encoding of2**64 - 1decodes back to-1. This change removes the crash and guarantees correct wire bytes; it does not make the coder round-trip uint64 back to unsigned (that would be a separate coder).Tests
Adds
test_varint_coder_uint64to the sharedcoders_test_common.pysuite (runs both compiled and uncompiled): no-overflow encoding, wire equivalence to the signed twin, size estimation, signed decode semantics, and the still-raising out-of-range case (guarded on the compiled implementation).Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>instead.CHANGES.mdwith noteworthy changes.🤖 Generated with Claude Code