fix: support new HugeGraph edge id format#349
Conversation
imbajin
left a comment
There was a problem hiding this comment.
The fix is in the right area, but the compatibility logic should mirror the java-client edge-id semantics more directly and lock the legacy path with a test.
| } | ||
|
|
||
| String[] parts = SplicingIdGenerator.split(edgeId); | ||
| if (parts.length == 4) { |
There was a problem hiding this comment.
❗️ High priority: please align this parser with the java-client edge-id invariant instead of hardcoding each length/index pair.
Context
- Legacy client 1.3 parsed the old 4-part id as
parts[2]. - Current toolchain java-client parses the permanent 5/6-part formats in
Edge.name()asidParts[idParts.length - 2]after validating the part count. - So the stable semantic is: Computer's edge name is the edge sort-values segment, i.e. the penultimate part of a valid edge id.
Risk
The current implementation encodes the same rule as 4 -> parts[2], 5 -> parts[3], and 6 -> parts[4]. That works for these examples, but it re-implements java-client parsing in a more fragile form and makes future format/client upgrades easier to drift.
Suggestion
Keep the compatibility range explicit, but extract through the shared invariant:
if (parts.length >= 4 && parts.length <= 6) {
return parts[parts.length - 2];
}Please also add a 4-part regression test beside the new 5/6-part tests, since this method explicitly preserves HugeGraph 1.3 compatibility but the current coverage only locks the new formats.
|
❗️ Please also update the CI HugeGraph environment after fixing the parser. 🔗 Reference: Context
Required update Please update the workflow to use the latest GRAPH_ENV_VERSION: 1.7.0Test completeness After that, the related coverage should prove both compatibility directions:
|
|
Updated in
Local checks: mvn -pl computer-test -am -Djacoco.skip=true -Dtest=HugeConverterTest -Dsurefire.failIfNoSpecifiedTests=false test
git diff --check -- .github/workflows/computer-ci.yml computer/computer-core/src/main/java/org/apache/hugegraph/computer/core/input/HugeConverter.java computer/computer-test/src/main/java/org/apache/hugegraph/computer/core/input/HugeConverterTest.javaThe new GitHub Actions runs are currently waiting for workflow approval ( |
There was a problem hiding this comment.
Blocking: yes. Summary: Legacy edge-id compatibility still depends on client-version behavior, and latest Computer CI is cancelled. Evidence: JDK 11 HugeConverterTest passed locally; current-head computer-ci cancelled.
🔗 Please check the cancelled current-head Computer CI run: https://github.com/apache/hugegraph-computer/actions/runs/27905173009/job/82572760266
|
|
||
| String[] parts = SplicingIdGenerator.split(edgeId); | ||
| if (parts.length == LEGACY_EDGE_ID_PARTS) { | ||
| return edge.name(); |
There was a problem hiding this comment.
convertEdgeName() already splits the id, but the 4-part branch still delegates to edge.name(). That only works with the current hugegraph-client 1.3.0 dependency; the current java-client implementation only accepts 5/6-part ids and derives the name from idParts[idParts.length - 2], so this compatibility shim will break for legacy ids when the client dependency is aligned with the 1.7.0 runtime that this PR now validates against.
Please return parts[2] for LEGACY_EDGE_ID_PARTS directly, or use the shared parts[parts.length - 2] invariant for all accepted arities, and leave edge.name() only for null or unknown formats.
imbajin
left a comment
There was a problem hiding this comment.
Blocking: no. Summary: No obvious issues found in the current head. Evidence: git diff --check passed; HugeConverterTest and HugeClientCompatibilityTest passed (8 tests); latest gh pr checks reported no checks.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #349 +/- ##
============================================
- Coverage 85.03% 84.98% -0.05%
- Complexity 3296 3375 +79
============================================
Files 349 361 +12
Lines 12485 12716 +231
Branches 1130 1157 +27
============================================
+ Hits 10616 10807 +191
- Misses 1329 1340 +11
- Partials 540 569 +29 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
I checked the current Command used on both this branch and docker run --rm -v <repo>:/workspace -w /workspace maven:3.8.8-eclipse-temurin-11 \
mvn -f computer/pom.xml -pl computer-test -am -P unit-test \
-Dtest=NettyTransportClientTest,HeartbeatHandlerTest \
-Djacoco.skip=true -Dsurefire.failIfNoSpecifiedTests=false testResult:
I will keep this PR scoped to the edge-id compatibility fix unless you prefer me to include a separate flaky-test stabilization change here. |
imbajin
left a comment
There was a problem hiding this comment.
Requesting changes for the edge-id compatibility path. CI is green now, but the regression test still does not cover the server-generated 6-part directional ID format.
Reference context:
- Server writes directed 6-part edge IDs in
EdgeId.asString()viathis.direction.type().string(): https://github.com/apache/hugegraph/blob/master/hugegraph-struct/src/main/java/org/apache/hugegraph/id/EdgeId.java#L146-L157 - Server direction tokens are
O/I, not enum names: https://github.com/apache/hugegraph/blob/master/hugegraph-struct/src/main/java/org/apache/hugegraph/type/HugeType.java#L53-L57 - Server parses the 6-part direction with
HugeType.fromString(idParts[1]): https://github.com/apache/hugegraph/blob/master/hugegraph-struct/src/main/java/org/apache/hugegraph/id/EdgeId.java#L276-L283 - Current java-client derives the edge name from the penultimate segment for both 5/6-part IDs: https://github.com/apache/hugegraph-toolchain/blob/master/hugegraph-client/src/main/java/org/apache/hugegraph/structure/graph/Edge.java#L142-L149
The expected invariant is:
server 6-part directed edge id
ownerVertexId > O/I > edgeLabelId > subLabelId > sortValues > otherVertexId
^ ^
| |
server direction token edge name / sort values
java-client 1.7 Edge.name(): parts[parts.length - 2]
computer current guard: accepts EDGE_OUT / EDGE_IN only
So the compatible parser should accept the server tokens O and I, or more simply follow the java-client invariant and return the penultimate segment for known 5/6-part edge IDs while keeping the 4-part legacy branch.
| return parts[LEGACY_EDGE_NAME_INDEX]; | ||
| } else if (parts.length == PARENT_EDGE_ID_PARTS) { | ||
| return parts[PARENT_EDGE_NAME_INDEX]; | ||
| } else if (parts.length == DIRECTIONAL_EDGE_ID_PARTS && |
There was a problem hiding this comment.
❗️ High priority: this still does not parse the server-generated 6-part edge ID format correctly.
This branch only accepts parts[1] when it is EDGE_OUT or EDGE_IN, but server-side EdgeId.asString() writes the direction via this.direction.type().string(). In HugeGraph those values are O / I, not the enum names:
- server writer: https://github.com/apache/hugegraph/blob/master/hugegraph-struct/src/main/java/org/apache/hugegraph/id/EdgeId.java#L146-L157
- server direction tokens: https://github.com/apache/hugegraph/blob/master/hugegraph-struct/src/main/java/org/apache/hugegraph/type/HugeType.java#L53-L57
- server parser: https://github.com/apache/hugegraph/blob/master/hugegraph-struct/src/main/java/org/apache/hugegraph/id/EdgeId.java#L276-L283
So a valid current server ID like S1:178201>O>5>5>参数标准!3BA0>S4:239464 will skip this branch and fall back to edge.name(). That fallback is not safe here because Computer still depends on hugegraph-client 1.3.0, whose Edge.name() only supports the legacy 4-part ID format.
The current java-client invariant is also simpler: for 5/6-part IDs, Edge.name() returns the penultimate segment: https://github.com/apache/hugegraph-toolchain/blob/master/hugegraph-client/src/main/java/org/apache/hugegraph/structure/graph/Edge.java#L142-L149
ownerVertexId > O/I > edgeLabelId > subLabelId > sortValues > otherVertexId
^
expected edge name
Please align this parser with that invariant, or at least validate against the server tokens O and I. The regression tests should cover both >O> and >I> forms and prove that valid 6-part IDs do not fall back to the old client parser.
| public BeanDeserializerBuilder updateBuilder( | ||
| DeserializationConfig config, BeanDescription beanDesc, | ||
| BeanDeserializerBuilder builder) { | ||
| if (EdgeLabel.class.equals(beanDesc.getBeanClass())) { |
There was a problem hiding this comment.
This compatibility module ignores edgelabel_type, parent_label, and links for every EdgeLabel deserialization. That works for the pinned 1.3 client because those fields are unknown there, but in current java-client 1.7 they are real fields: sourceLabel() / targetLabel() derive from links.
A more version-tolerant approach would ignore unknown fields rather than explicitly dropping known current fields, for example with an @JsonIgnoreProperties(ignoreUnknown = true) mix-in, or by conditionally adding ignorables only when the runtime EdgeLabel class does not expose those fields. The tests should prove both sides: old client does not fail on new server fields, and current client semantics do not lose links.
imbajin
left a comment
There was a problem hiding this comment.
Blocking: no. Summary: No obvious issues found in the current head. Evidence: git diff --check passed; JDK 11 HugeConverterTest,HugeClientCompatibilityTest passed (12 tests); no visible checks reported for this branch.
Purpose of the PR
HugeGraph server now formats edge ids with 5 or 6 parts after the parent/child edge label change. The computer loader still called
Edge.name()from the older client, which expects exactly 4 parts and can fail while loading edges for algorithms such as LPA.Main Changes
HugeConverter.convertEdgeName()to read sort values from 4-part, 5-part, and 6-part edge ids.LoadServiceinstead of callingedge.name()directly.Verifying these changes
Does this PR potentially affect the following parts?
Documentation Status
Doc - TODODoc - DoneDoc - No Need