8382713: [VectorAPI] Perform late inlining of failed vector intrinsics by jatin-bhateja · Pull Request #30876 · openjdk/jdk

jatin-bhateja · 2026-04-22T09:49:38Z

Currently, we attempt lazy intrinsification of vector intrinsics during the incremental inlining stage. If intrinsification fails because an argument that the inline expander requires to be constant is not constant, a static call is generated, which incurs a call overhead penalty.

As per the following comment from @iwanowww on the JDK-8303762 pull request #24104 (comment):

We should attempt to inline the fallback implementation of failed vector intrinsics to avoid the penalties associated with call overhead; for vector operations whose fallback implementation uses other Vector APIs, this also saves the boxing penalty.

This patch addresses this concern by adding a new hybrid call generator (LateInlineVectorCallGenerator) which encapsulates both an intrinsic call generator and a parser call generator. During incremental inlining, the intrinsic gets multiple chances to succeed. If all attempts fail, the fallback implementation is inlined instead, eliminating the call overhead penalty.
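A minimal sketch of the control flow described above, assuming a much simplified model: the names `HybridVectorCallGen`, `IntrinsicExpander`, `FallbackParser` and the fixed attempt budget are hypothetical stand-ins, not the actual LateInlineVectorCallGenerator code. Each incremental-inlining round retries the intrinsic; once the budget is spent, the fallback body is inlined instead of leaving a static call in the graph.

```cpp
#include <cassert>

enum class Outcome { Intrinsified, FallbackInlined, Pending };

// Hypothetical stand-in for the intrinsic expander: it only succeeds once the
// arguments it needs (e.g. the vector species) have become constants.
struct IntrinsicExpander {
  bool try_expand(bool args_are_constant) const { return args_are_constant; }
};

// Hypothetical stand-in for the parse-based generator that inlines the Java
// fallback body; assumed to always succeed in this toy model.
struct FallbackParser {
  bool inline_body() const { return true; }
};

class HybridVectorCallGen {
 public:
  explicit HybridVectorCallGen(int max_attempts) : max_attempts_(max_attempts) {}

  // Called once per incremental-inlining round.
  Outcome do_late_inline(bool args_are_constant) {
    if (intrinsic_.try_expand(args_are_constant)) {
      return Outcome::Intrinsified;
    }
    if (++attempts_ < max_attempts_) {
      // Retry later: IGVN in between rounds may turn arguments into constants.
      return Outcome::Pending;
    }
    // All intrinsic attempts failed: inline the fallback instead of a call.
    return fallback_.inline_body() ? Outcome::FallbackInlined : Outcome::Pending;
  }

 private:
  IntrinsicExpander intrinsic_;
  FallbackParser fallback_;
  int max_attempts_;
  int attempts_ = 0;
};
```

Keeping both generators in one object lets each round pick the better strategy without re-resolving the call site.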

Please review and share your feedback.

Best Regards,
Jatin

I confirm that I make this contribution in accordance with the OpenJDK Interim AI Policy.

Progress

Change must not contain extraneous whitespace
Commit message must refer to an issue
Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

JDK-8382713: [VectorAPI] Perform late inlining of failed vector intrinsics (Enhancement - P4)

Reviewers

Vladimir Ivanov (@iwanowww - Reviewer)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/30876/head:pull/30876
$ git checkout pull/30876

Update a local copy of the PR:
$ git checkout pull/30876
$ git pull https://git.openjdk.org/jdk.git pull/30876/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 30876

View PR using the GUI difftool:
$ git pr show -t 30876

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/30876.diff

Using Webrev

Link to Webrev Comment

jatin-bhateja · 2026-04-22T09:49:53Z

/label add hotspot-compiler-dev

bridgekeeper · 2026-04-22T09:51:31Z

👋 Welcome back jbhateja! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2026-04-22T09:52:04Z

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

openjdk · 2026-04-22T09:53:14Z

@jatin-bhateja
The hotspot-compiler label was successfully added.

openjdk · 2026-04-22T09:53:21Z

The total number of required reviews for this PR has been set to 2 based on the presence of this label: hotspot-compiler. This can be overridden with the /reviewers command.

openjdk · 2026-04-22T09:53:58Z

@jatin-bhateja To determine the appropriate audience for reviewing this pull request, one or more labels corresponding to different subsystems will normally be applied automatically. However, no automatic labelling rule matches the changes in this pull request. In order to have an "RFR" email sent to the correct mailing list, you will need to add one or more applicable labels manually using the /label pull request command.

Applicable Labels

build
client
compiler
core-libs
hotspot
hotspot-compiler
hotspot-gc
hotspot-jfr
hotspot-runtime
i18n
ide-support
javadoc
jdk
net
nio
security
serviceability
shenandoah

mlbridge · 2026-04-22T09:57:48Z

Webrevs

iwanowww

Thanks, Jatin!

iwanowww · 2026-04-22T19:12:26Z

+  }
+};
+
+bool LateInlineVectorCallGenerator::inline_fallback() const {


What's the purpose of this method? All vector intrinsics do have fallback implementation. If there are any cases added later, then they don't have to rely on LateInlineVectorCallGenerator.

jatin-bhateja · 2026-04-27T05:36:27Z

Hi @iwanowww , your comments have been addressed.

jatin-bhateja · 2026-04-29T04:23:31Z

I modified the BlackScholes benchmark to use FloatVector.SPECIES_512, and then explicitly passed
-XX:UseAVX=2 to force intrinsic failure. Following are the performance numbers with and without
InlineVectorFallback; we see some improvement, although it is within the error margins.

CommandLine: java -jar target/benchmarks.jar -f 1 -i 5 -wi 1 -w 30 -jvmArgs "-XX:UseAVX=2 --add-modules=jdk.incubator.vector -XX:+UnlockDiagnosticVMOptions -XX:+InlineVectorFallback" BlackScholes.vector_black_scholes


With -XX:-InlineVectorFallback
Benchmark                          (size)   Mode  Cnt     Score      Error  Units
BlackScholes.vector_black_scholes    1024  thrpt    5  7460.391 ± 1412.273  ops/s

With -XX:+InlineVectorFallback
Benchmark                          (size)   Mode  Cnt     Score      Error  Units
BlackScholes.vector_black_scholes    1024  thrpt    5  7851.062 ± 1765.271  ops/s
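A quick check of the two results above, using only plain arithmetic on the reported Score and Error columns: mean throughput is roughly 5% higher with the flag on, but the score ± error ranges overlap, so this run alone does not establish a significant difference.

```cpp
#include <cassert>

// Score and error as reported by JMH (ops/s).
struct Result {
  double score;
  double error;
};

// Relative throughput of `b` compared to `a`.
double speedup(Result a, Result b) { return b.score / a.score; }

// True if the [score - error, score + error] ranges intersect.
bool ranges_overlap(Result a, Result b) {
  return a.score - a.error <= b.score + b.error &&
         b.score - b.error <= a.score + a.error;
}
```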

jatin-bhateja · 2026-05-01T06:58:58Z

Hi @iwanowww , your comments have been addressed.

iwanowww

Overall, looks good. Minor suggestions follow.

iwanowww · 2026-05-04T20:58:24Z

  product(bool, EnableVectorAggressiveReboxing, false, EXPERIMENTAL,        \
          "Enables aggressive reboxing of vectors")                         \
                                                                            \
+  product(bool, InlineVectorFallback, true, DIAGNOSTIC,                     \


Let's call it IncrementalInlineVector and put it next to IncrementalInline et al.

jatin-bhateja · 2026-05-06T07:44:06Z

Hi @iwanowww , your comments have been addressed, please share the results of your test run.

iwanowww · 2026-05-06T20:47:11Z

Unfortunately, I see multiple failures in Vector API-related tests. They failed mostly on linux-aarch64, but there were a few linux-x64 failures [1] as well. I'll take a closer look, but it seems the problem on linux-aarch64 is that fallback implementations are unconditionally inlined, which causes problems (multiple tests on 512-bit vectors fail due to memory exhaustion [2]).

[1] In particular:

compiler/vectorapi/TestVectorTest.java (w/ -XX:UseAVX=0)

Failed IR Rules (3) of Methods (2)
----------------------------------
1) Method "compiler.vectorapi.TestVectorTest::branch" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={}, failOn={"_#CMP_I#_", "_#CMOVE_I#_"}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - failOn: Graph contains forbidden nodes:
         * Constraint 1: "(\\d+(\\s){2}(CmpI.*)+(\\s){2}===.*)"

         * Constraint 2: "(\\d+(\\s){2}(CMoveI.*)+(\\s){2}===.*)"

2) Method "compiler.vectorapi.TestVectorTest::cmove" - [Failed IR rules: 2]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={}, failOn={"_#CMP_I#_"}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - failOn: Graph contains forbidden nodes:
         * Constraint 1: "(\\d+(\\s){2}(CmpI.*)+(\\s){2}===.*)"

   * @IR rule 2: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#VECTOR_TEST#_", "1", "_#CMOVE_I#_", "1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 2: "(\\d+(\\s){2}(CMoveI.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 7 = 1 [given]

compiler/vectorapi/VectorMaskCompareNotTest.java (w/ -ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation)

Failed IR Rules (1) of Methods (1)
----------------------------------
1) Method "compiler.vectorapi.VectorMaskCompareNotTest::testCompareULEMaskNotByte" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"asimd", "true", "avx", "true", "rvv", "true"}, counts={"_#XOR_V_MASK#_", "= 0", "_#XOR_V#_", "= 0", "_#VECTOR_MASK_CAST#_", "= 1", "_#VECTOR_MASK_CMP#_", "= 3"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(XorVMask.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 3 = 0 [given]
         
         * Constraint 2: "(\\d+(\\s){2}(XorV.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 3 = 0 [given]

[2] jdk/incubator/vector/ByteVector512LoadStoreTests.java

    --- Allocation timelime by phase ---
        Phase seq. number                             Bytes                  Nodes
                 (380 older entries lost)
          >4                  incrementalInline 157724064 (+0)        32839 (+0) 
...
           >227          incrementalInline_igvn 174587184 (+261984)   34935 (-18) 
          <4 (cont.)          incrementalInline 174587184 (+0)        34935 (+0) 
         <2 (cont.)                   optimizer 180612856 (+6025672)  34697 (-238) 
...
         <295 (cont.)                  regalloc 177447520 (+0)        83805 (+0) 
          >305                         buildIFG 221892144 (+44444624)  83398 (-407) 
...
         <295 (cont.)                  regalloc 299650328 (+0)        81331 (+0) 
          >318                    regAllocSplit 1073764680 (+774114352)  81331 (+0) 
    ---

#  Internal Error (.../src/hotspot/share/compiler/compilationMemoryStatistic.cpp:935), pid=1510298, tid=1510316
#  fatal error: c2 (1695) jdk/incubator/vector/ByteVector512$ByteShuffle512::intoMemorySegment((Ljava/lang/foreign/MemorySegment;JLjava/nio/ByteOrder;)V): Hit MemLimit - limit: 1073741824 now: 1073764680

jatin-bhateja · 2026-05-08T10:58:31Z

[1] In particular: compiler/vectorapi/TestVectorTest.java (w/ -XX:UseAVX=0)

Hi @iwanowww, this failure is related to the use of UseAVX=0. Here, fromBitsCoerced is not intrinsified; earlier it remained a CallStaticJavaNode, but now it gets inlined, and the inlined graph contains CMoveI and CmpI nodes, so the test fails because its IR rules do not expect these nodes. One target-agnostic fix is to guard these IR rules with the -XX:-IncrementalInlineVector flag, but that would defeat the purpose of this test since IncrementalInlineVector is on by default. Since the test runs on multiple targets, guarding by UseAVX > 0 may not be desirable.

Let me know what you think.

compiler/vectorapi/VectorMaskCompareNotTest.java (w/ -ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation)

With the patch, inlining of vector intrinsic fallbacks generates more code in the compilation unit, which affects inlining of other methods, e.g. AbstractMask::intoArray. When intoArray is NOT inlined, the mask must be boxed before being passed to the non-inlined method. As a result, the VectorMaskCmp encapsulated in the VectorBoxNode gets an additional user, a VectorStoreMask created at https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vector.cpp#L270 during VectorBoxNode scalarization.

This increases the outcnt of the VectorMaskCmpNode and inhibits the optimization which folds XorVMask(VectorMaskCmp, maskAll(true)):
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectornode.cpp#L2366

Increasing InlineSmallCode to 10000 allows intoArray to be inlined; the mask is not boxed, VectorMaskCmp has outcnt=1, XorVMask is folded, and the test passes.
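The folding constraint described above can be illustrated with a toy model. The `Node` and `Op` types below are hypothetical and far simpler than C2's IR; the sketch only captures the point made in the comment: the XorVMask(VectorMaskCmp, maskAll(true)) fold is blocked once the compare gains a second user (here, the VectorStoreMask introduced by boxing).

```cpp
#include <cassert>
#include <vector>

// Toy IR opcodes, assumed for illustration only (not C2's Node classes).
enum class Op { VectorMaskCmp, MaskAllTrue, XorVMask, VectorStoreMask };

struct Node {
  Op op;
  std::vector<const Node*> in;  // input edges
  int outcnt;                   // number of users
};

// Returns true if xor(cmp, all-true) may be folded into a negated compare.
// If the compare has another user, it must stay alive anyway, so folding
// would not remove it; the fold is therefore skipped.
bool can_fold_mask_not(const Node& x) {
  if (x.op != Op::XorVMask || x.in.size() != 2) return false;
  const Node* cmp = x.in[0];
  const Node* all = x.in[1];
  return cmp->op == Op::VectorMaskCmp &&
         all->op == Op::MaskAllTrue &&
         cmp->outcnt == 1;
}
```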

[2] jdk/incubator/vector/ByteVector512LoadStoreTests.java

Overall, there is a tradeoff in unconditionally inlining vector intrinsic fallbacks, since most of them are bulky and may impact inlining decisions within their calling context.

Do you think it's beneficial to limit the scope of inlining to only a few intrinsics initially, e.g.
https://github.com/jatin-bhateja/jdk/blob/46fcc9acc05bdef5fd01f4972ed9a66de5f07198/src/hotspot/share/opto/callGenerator.cpp#L463

Please let me know your views.

openjdk · 2026-05-08T11:06:40Z

@jatin-bhateja Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.

iwanowww · 2026-05-08T19:28:31Z

compiler/vectorapi/TestVectorTest.java (w/ -XX:UseAVX=0)

target agnostic fix is to guard these IR rules with -XX:-IncrementalInlineVector flag, but it will defeat the purpose of this test since IncrementalInlineVector is default on.

I don't see why it defeats the purpose of the test. It's an IR test and limiting possible IR shapes is fine.

compiler/vectorapi/VectorMaskCompareNotTest.java

With the patch, the vector intrinsic fallback inlining generates more code in the compilation unit, this effects inlining of
other methods, e.g. AbstractMask::intoArray.

Do we miss @ForceInline on AbstractMask::intoArray? Any other methods not inlined?

Do you think its beneficial to limit the scope of inlining to only few intrinsics initially.

I think regular inlining heuristics should be applied to vector fallback implementations.
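One way to read "regular inlining heuristics" for fallbacks is a size and graph-budget gate checked before inlining the fallback body. The sketch below is hypothetical: the struct, the function name and the thresholds are illustrative assumptions, not HotSpot's inlining policy or its flag defaults.

```cpp
#include <cassert>

// Hypothetical budget. Real C2 policy relies on flags such as MaxInlineSize,
// FreqInlineSize and node-count limits, with platform-dependent values.
struct InlineBudget {
  int max_bytecode_size;  // largest fallback body we are willing to inline
  int max_live_nodes;     // cap on graph size after inlining
};

// Decide whether to inline a vector intrinsic's fallback body or keep a call.
bool should_inline_fallback(int fallback_bytecode_size,
                            int live_nodes,
                            int estimated_new_nodes,
                            const InlineBudget& budget) {
  if (fallback_bytecode_size > budget.max_bytecode_size) {
    return false;  // too big: keep the static call
  }
  // Avoid growing the graph without bound (cf. the MemLimit failure above).
  return live_nodes + estimated_new_nodes <= budget.max_live_nodes;
}
```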

iwanowww · 2026-05-14T21:27:46Z

@@ -47,16 +47,16 @@ public static void main(String[] args) {
    public int call() { return 1; }

    @Test
-    @IR(failOn = {IRNode.CMP_I, IRNode.CMOVE_I})
+    @IR(failOn = {IRNode.CMP_I, IRNode.CMOVE_I}, applyIf = {"IncrementalInlineVector", "false"})


Does it mean that the rule is disabled unless the test is explicitly run with -XX:-IncrementalInlineVector?
I doubt it will be regularly executed in such mode. So, it defeats the purpose of the test, doesn't it?

Instead, why don't you explicitly run the test with -XX:-IncrementalInlineVector flag?

Done; I am now passing -XX:-IncrementalInlineVector to the test invocation.

iwanowww · 2026-05-14T21:28:53Z

@@ -1294,7 +1294,7 @@ public static void testCompareMaskNotDoubleNegative() {
    public static void main(String[] args) {
        TestFramework testFramework = new TestFramework();
        testFramework.setDefaultWarmup(5000)
-                     .addFlags("--add-modules=jdk.incubator.vector")
+                     .addFlags("--add-modules=jdk.incubator.vector", "-XX:InlineSmallCode=100000")


Should AbstractMask::intoArray() be marked w/ @ForceInline instead?

With @ForceInline on AbstractMask::intoArray, the test passes with "-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation" but fails with default options due to differences in inlining:

Failed IR Rules (1) of Methods (1)
----------------------------------
1) Method "compiler.vectorapi.VectorMaskCompareNotTest::testCompareNEMaskNotFloatNaN" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"asimd", "true", "avx", "true", "rvv", "true"}, counts={"_#XOR_V_MASK#_", "= 0", "_#XOR_V#_", "= 0", "_#VECTOR_MASK_CMP#_", "= 2"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 3: "(\d+(\s){2}(VectorMaskCmp.*)+(\s){2}===.*)"
           - Failed comparison: [found] 0 = 2 [given]
           - No nodes matched!

With -XX:InlineSmallCode=1000000 it passes with all the configurations.

Please, elaborate where it fails. Does func.apply(m).intoArray(mr, 0); in testCompareMaskNotFloat cause problems?

I investigated it further; here is my analysis.

Adding @ForceInline to AbstractMask::intoArray is desirable for vector intrinsic inlining, but it exposes a pre-existing bug in C2's switch profiling.

The bug is in Parse::do_tableswitch() in parse2.cpp: when a mature MDO has all-zero MultiBranchData counts, merge_ranges() marks every arm as never_reached, and jump_switch_ranges() collapses the entire switch to a single unstable_if trap. The parser should treat this as "no useful profile" (fall back to cnt = 1.0F), not "every arm is cold." I confirmed this analysis by passing -XX:-TieredCompilation or -XX:-UseSwitchProfiling — the test passes with either flag.
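The rule proposed in this analysis (treat an all-zero mature profile as "no useful profile" rather than "every arm is cold") can be sketched as a small helper. This is a hypothetical illustration, not the code in Parse::do_tableswitch() or merge_ranges(), and the diagnosis itself is still under discussion in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Convert raw per-arm switch profile counts into arm weights.
// If every count is zero, the profile carries no information, so every arm
// gets the neutral weight 1.0F instead of being marked as never reached.
std::vector<float> arm_weights(const std::vector<long>& counts) {
  long total = 0;
  for (long c : counts) total += c;
  std::vector<float> weights(counts.size());
  for (std::size_t i = 0; i < counts.size(); ++i) {
    weights[i] = (total == 0) ? 1.0F : static_cast<float>(counts[i]);
  }
  return weights;
}

// An arm is treated as cold (candidate for an uncommon trap) only when its
// weight is zero in a profile that has some non-zero counts.
bool never_reached(float weight) { return weight == 0.0F; }
```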

This profiling issue is orthogonal to the vector intrinsic late inlining work and should be addressed in a separate PR. For now, @ForceInline is not added to AbstractMask::intoArray, and -XX:InlineSmallCode=1000000 is added to the failing test as a workaround.

Thanks for the details. Hm, that doesn't sound right. There's no support for caller-sensitive profiling yet, so each method profile data is stored in a dedicated per-method MDO instance. (There are deoptimization counts which may depend on inlining, but regular branch counts should not be affected.) Anyway, let's continue investigating it separately.
Please, file a follow-up bug for it. Does -XX:-IncrementalInlineVector work as a workaround? I'm not fond of InlineSmallCode tweaks.

I have filed a follow-up JBS issue for this: https://bugs.openjdk.org/browse/JDK-8385134

iwanowww · 2026-05-18T21:32:40Z

@@ -1294,7 +1294,7 @@ public static void testCompareMaskNotDoubleNegative() {
    public static void main(String[] args) {
        TestFramework testFramework = new TestFramework();
        testFramework.setDefaultWarmup(5000)
-                     .addFlags("--add-modules=jdk.incubator.vector")
+                     .addFlags("--add-modules=jdk.incubator.vector", "-XX:InlineSmallCode=100000")


Please, elaborate where it fails. Does func.apply(m).intoArray(mr, 0); in testCompareMaskNotFloat cause problems?

iwanowww · 2026-05-19T21:12:02Z

Overall, looks good. I tweaked test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java [1] and submitted the patch for testing.

[1]

diff --git a/test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java b/test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java
index 935363f8526..4aeb5ba36b0 100644
--- a/test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java
+++ b/test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java
@@ -1295,7 +1295,7 @@ public static void main(String[] args) {
         TestFramework testFramework = new TestFramework();
         testFramework.setDefaultWarmup(5000)
                      .addFlags("--add-modules=jdk.incubator.vector",
-                               "-XX:InlineSmallCode=1000000")
+                               "-XX:-IncrementalInlineVector")
                      .start();
     }
 }

iwanowww · 2026-05-20T22:15:08Z

One more IR test failure:

Test: compiler/vectorapi/VectorCompareWithZeroTest.java
Platform: linux-aarch64
Flags: -ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation


Failed IR Rules (5) of Methods (5)
----------------------------------
1) Method "compiler.vectorapi.VectorCompareWithZeroTest::testByteVectorEqualToZero" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#VMASK_CMP_ZERO_I_NEON#_", ">= 1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "Final Code":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(vmaskcmp_zeroI_neon.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!

2) Method "compiler.vectorapi.VectorCompareWithZeroTest::testDoubleVectorLessThanZero" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#VMASK_CMP_ZERO_D_NEON#_", ">= 1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "Final Code":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(vmaskcmp_zeroD_neon.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!

3) Method "compiler.vectorapi.VectorCompareWithZeroTest::testFloatVectorLessEqualToZero" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#VMASK_CMP_ZERO_F_NEON#_", ">= 1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "Final Code":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(vmaskcmp_zeroF_neon.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!

4) Method "compiler.vectorapi.VectorCompareWithZeroTest::testLongVectorGreaterThanZero" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#VMASK_CMP_ZERO_L_NEON#_", ">= 1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "Final Code":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(vmaskcmp_zeroL_neon.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!

5) Method "compiler.vectorapi.VectorCompareWithZeroTest::testShortVectorNotEqualToZero" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#VMASK_CMP_ZERO_I_NEON#_", ">= 1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "Final Code":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(vmaskcmp_zeroI_neon.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 >= 1 [given]
           - No nodes matched!

openjdk · 2026-06-04T16:34:57Z

@jatin-bhateja this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout JDK-8382713
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push


jatin-bhateja · 2026-06-11T06:49:08Z

Hi @iwanowww, I added an RAII-based mechanism to prevent accumulation of spurious messages during fallback call generator selection. I also now explicitly print the message "late inline succeeded (vector intrinsic fallback)" when intrinsification fails but the fallback generator (inlining) succeeds.

Please let me know if the patch looks good to land now; your earlier comments have been addressed.
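A minimal sketch of the RAII idea described above, assuming a toy logger: `InlineLog` and `SuspendInlineLog` are hypothetical names, not HotSpot's inline-printing classes. Messages recorded while a guard is alive (for example, while probing fallback generators) are dropped, and logging resumes automatically when the guard goes out of scope on any exit path.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy inlining log (hypothetical).
class InlineLog {
 public:
  void record(const std::string& msg) {
    if (is_suspended()) return;  // drop messages from speculative probes
    messages_.push_back(msg);
  }
  bool is_suspended() const { return suspend_depth_ > 0; }
  const std::vector<std::string>& messages() const { return messages_; }

 private:
  friend class SuspendInlineLog;
  int suspend_depth_ = 0;  // a counter, so guards can nest
  std::vector<std::string> messages_;
};

// RAII guard: suspends logging for its lifetime and restores it in the
// destructor, including on early returns.
class SuspendInlineLog {
 public:
  explicit SuspendInlineLog(InlineLog& log) : log_(log) { ++log_.suspend_depth_; }
  ~SuspendInlineLog() { --log_.suspend_depth_; }
  SuspendInlineLog(const SuspendInlineLog&) = delete;
  SuspendInlineLog& operator=(const SuspendInlineLog&) = delete;

 private:
  InlineLog& log_;
};
```

Using a depth counter rather than a boolean keeps nested probes correct: logging only resumes when the outermost guard is destroyed.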

iwanowww

Overall, looks good.

iwanowww · 2026-06-12T15:23:48Z

    return &_nullStream;
  }
+  if (is_suspended()) {
+    locate(state, callee);


Why do you perform locate call?

locate() was retained to keep the IPInlineSite topology intact (one node per JVMState frame) while only the message append is suppressed. But that topology is already guaranteed by record ordering — the order in which record() calls happen during compilation ensures every node's parent exists before the node itself is needed — so locate() rebuilding the path during a suspended probe is unnecessary, and dropping it is safe.

jatin-bhateja · 2026-06-15T05:12:46Z

Overall, looks good.

Hi @iwanowww , Your comments have been addressed.

iwanowww

Looks good. Submitted for testing.

iwanowww · 2026-06-16T21:16:08Z

Strangely, compiler.vectorapi.VectorMaskCompareNotTest still fails. I noticed that you changed default warmup setting. Why did you do that?

Failed IR Rules (1) of Methods (1)
----------------------------------
1) Method "compiler.vectorapi.VectorMaskCompareNotTest::testCompareUGTMaskNotByteCast" - [Failed IR rules: 1]:
   * @IR rule 3: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"avx2", "true", "rvv", "true"}, counts={"_#XOR_V_MASK#_", "= 0", "_#XOR_V#_", "= 0", "_#VECTOR_MASK_CMP#_", "= 1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 3: "(\\d+(\\s){2}(VectorMaskCmp.*)+(\\s){2}===.*)"
           - Failed comparison: [found] 0 = 1 [given]
           - No nodes matched!

jatin-bhateja · 2026-06-17T04:26:28Z

Strangely, compiler.vectorapi.VectorMaskCompareNotTest still fails. I noticed that you changed default warmup setting. Why did you do that?

Updated the branch with the latest mainline; it looks like the warmup change was mistakenly introduced by the previous merge.

Kindly re-verify.

iwanowww

Looks good.

jatin-bhateja · 2026-06-19T04:27:23Z

Hi @mhaessig, we need one more approval here to transition this to the ready state; could you please take a look?

eme64

@jatin-bhateja This looks like important work, so thanks for working on it!

I have some questions about the tests below, I'm especially wondering why you had to set -XX:-IncrementalInlineVector in some of the IR tests? Because if the flag is now on by default, would it not be more important to have IR rules with the flag enabled? What are the affected IR rules?

Also: Could we have some new IR tests that demonstrate the benefit of late vector inlining, and make sure there won't be regressions on it?

eme64 · 2026-06-19T05:58:58Z

    public static void main(String[] args) {
-        TestFramework.runWithFlags("--add-modules=jdk.incubator.vector");
+        TestFramework.runWithFlags("--add-modules=jdk.incubator.vector",
+                                   "-XX:-IncrementalInlineVector");


Why did you add these flags here? Would the IR rules fail without?
Suggestion: can you have a run with and a run without the flag, and then show which IR rules are affected, guarding them with the flag?

eme64 · 2026-06-19T05:59:13Z

+                     .addFlags("--add-modules=jdk.incubator.vector",
+                               "-XX:-IncrementalInlineVector")


Same question about flag here.

eme64 · 2026-06-19T05:59:21Z

+                     .addFlags("--add-modules=jdk.incubator.vector",
+                               "-XX:-IncrementalInlineVector")


Same question about flag here.

8382713: [VectorAPI] Perform late inlining of failed vector intrinsics (c7e6fce)

openjdk Bot added the hotspot-compiler (hotspot-compiler-dev@openjdk.org) label Apr 22, 2026
openjdk Bot added the rfr (Pull request is ready for review) label Apr 22, 2026
jatin-bhateja mentioned this pull request Apr 22, 2026: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction #24104 (Closed, 4 tasks)
iwanowww reviewed Apr 22, 2026
Review comments resolutions (931d45e)
iwanowww reviewed Apr 27, 2026 (comment threads on src/hotspot/share/opto/compile.cpp (2) and src/hotspot/share/opto/callGenerator.cpp, now outdated)
Review comments resolutions (e779a2f)
iwanowww reviewed May 4, 2026
Review comments resolutions (5f46f5b)
iwanowww reviewed May 5, 2026 (comment thread on src/hotspot/share/opto/compile.cpp, now outdated)
Review comments resolution (d18bd2a)
Review comment resolution (8171911)
jatin-bhateja force-pushed the JDK-8382713 branch from 1cceb24 to 8171911, May 8, 2026
Review comments resolution (679e444)
iwanowww reviewed May 14, 2026
Review comments resolution (dc3ffe8)
iwanowww reviewed May 18, 2026
Review comments resolutions (77150d9)
iwanowww reviewed May 19, 2026 (comment thread on src/hotspot/share/opto/callGenerator.cpp, now outdated)
Review comments resolutions (ae4a373)
openjdk Bot added the merge-conflict (Pull request has merge conflict with target branch) label Jun 4, 2026
jatin-bhateja added 2 commits June 11, 2026:
  Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8382713 (60be1f5)
  RAII based mechanism to prevent accumulation of spurious messages during fallback call generator selection (21a5d97)
openjdk Bot removed the merge-conflict label Jun 11, 2026
iwanowww reviewed Jun 12, 2026
Review comments resolutions (f891ea6)
iwanowww reviewed Jun 15, 2026
jatin-bhateja added 2 commits June 17, 2026:
  Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8382713 (6b0cbc9)
  Review comments resolution (c9f1e69)
iwanowww approved these changes Jun 17, 2026
eme64 suggested changes Jun 19, 2026
