Speed up Python GraphBinary deserialization #3493

Open
kirill-stepanishin wants to merge 2 commits into apache:master from kirill-stepanishin:python-graphbinary-int-dispatch

@kirill-stepanishin

Contributor

The GraphBinary reader built a DataType enum member from the type byte for every object it decoded. That per-object enum construction heavily degrades deserialization performance on large result sets.

The reader now builds a {type code: deserializer} lookup table once up front and dispatches on the raw integer instead, avoiding per-object enum construction. Behavior is unchanged: an unknown type code still raises ValueError("... is not a valid DataType").
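
As a rough illustration of the dispatch change described above, here is a minimal sketch, not the actual gremlin-python code. The names `DataType` members, `DISPATCH`, `read_object`, and the three reader functions are hypothetical, only three type codes are shown, and the null-value flag that follows the type code in the real format is omitted for brevity. The exact prefix of the error message is also an assumption.

```python
import struct
from enum import Enum


class DataType(Enum):
    # Illustrative subset; the real enum has many more members.
    INT = 0x01
    LONG = 0x02
    STRING = 0x03


def read_int(buf, pos):
    # 4-byte big-endian signed integer
    return struct.unpack_from(">i", buf, pos)[0], pos + 4


def read_long(buf, pos):
    # 8-byte big-endian signed integer
    return struct.unpack_from(">q", buf, pos)[0], pos + 8


def read_string(buf, pos):
    # Length-prefixed UTF-8 string
    length, pos = read_int(buf, pos)
    return buf[pos:pos + length].decode("utf-8"), pos + length


_READERS = {
    DataType.INT: read_int,
    DataType.LONG: read_long,
    DataType.STRING: read_string,
}

# Built once: maps the raw integer type code to its reader, so the hot path
# never constructs a DataType member.
DISPATCH = {data_type.value: reader for data_type, reader in _READERS.items()}


def read_object(buf, pos=0):
    type_code = buf[pos]  # indexing bytes yields a plain int
    reader = DISPATCH.get(type_code)
    if reader is None:
        # Keep the same exception type and message suffix as DataType(type_code).
        raise ValueError(f"{type_code} is not a valid DataType")
    return reader(buf, pos + 1)
```

For example, `read_object(b"\x03\x00\x00\x00\x02hi")` returns the decoded string together with the position of the next unread byte. The dictionary lookup replaces an `Enum.__call__` per object, which is where the saving would come from under these assumptions.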

Performance

Benchmarked on two cross-region EC2 instances (server in US-EAST-2, client in US-WEST-2) to capture realistic network latency, against the Modern graph over GraphBinary V4 on Python 3.11. Each query was run with and without this change, alternating back to back across 3 sweeps; the reported figures are the medians.

| Query | Before | After | Change |
| --- | --- | --- | --- |
| `g.V().repeat(both()).times(12)` (~200k results) | 7.97 s | 5.85 s | 26% faster |
| `g.V()` (6 results) | 0.107 s | 0.109 s | no change |

The improvement is significant on large result sets, where per-object deserialization cost dominates, and scales with the number of objects returned.
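
The alternating, median-based measurement described above could be sketched roughly as follows. This is not the script used for the numbers in this PR; `alternating_medians` and its parameters are hypothetical, and the callables passed in stand for "run the query with the change" and "run the query without it".

```python
import statistics
import time


def alternating_medians(variants, sweeps=3, timer=time.perf_counter):
    """Time each variant once per sweep, back to back, and return medians.

    variants: dict mapping a label to a zero-argument callable.
    """
    samples = {name: [] for name in variants}
    for _ in range(sweeps):
        # Alternate variants within each sweep so that drift in network or
        # server load affects both sides roughly equally.
        for name, run in variants.items():
            start = timer()
            run()
            samples[name].append(timer() - start)
    return {name: statistics.median(times) for name, times in samples.items()}
```

A call such as `alternating_medians({"before": run_old, "after": run_new})` would return one median per label, where `run_old` and `run_new` are placeholders for whatever submits the query in each configuration.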

Assisted-by: Claude Code:claude-opus-4-8
@codecov-commenter

codecov-commenter commented Jun 29, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.11%. Comparing base (a28cd1f) to head (2cfc742).
⚠️ Report is 167 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #3493      +/-   ##
============================================
- Coverage     76.35%   76.11%   -0.25%     
- Complexity    13424    13861     +437     
============================================
  Files          1012     1030      +18     
  Lines         60341    62712    +2371     
  Branches       7075     7338     +263     
============================================
+ Hits          46076    47731    +1655     
- Misses        11548    12018     +470     
- Partials       2717     2963     +246     


Comment thread gremlin-python/src/main/python/gremlin_python/structure/io/graphbinaryV4.py Outdated
@kenhuuu

kenhuuu commented Jun 30, 2026

Contributor

VOTE +1

@Cole-Greer Cole-Greer left a comment

Contributor

VOTE +1

4 participants