MathLoadTest_autosimd crash Illegal instruction vmState=0x00000000 Compiled_method= #19408
@hzongaro fyi |
This is a 0.45 release build.
|
Dup of #19377? These are all 64-bit JVMs. |
@BradleyWood, may I ask you to look at this one as well? |
https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.functional_x86-64_windows_Nightly_testList_0/709 - win2012x64-openj9-1a
|
See also #19424 (cmdLineTester_loopReduction_0) All the failures occur on win2012x64-openj9-1a |
We definitely have a problem with generating AVX-512 on 32-bit JVMs on a 64-bit machine. I think we have two issues here, since there is also a failure on a 64-bit JVM. @pshipton Could you get me the cpuid info of the machine that failed this test? |
The javacore states The javacore also states this, which makes no sense to me. Don't know how avx512dq can exist without avx512f. |
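For anyone who wants to check for the odd combination described above (an AVX-512 extension such as avx512dq reported without avx512f), here is a minimal sketch. The bit positions are the documented CPUID.(EAX=7,ECX=0):EBX feature bits; the helper names `hasInconsistentAvx512` and `sanitizeAvx512` are hypothetical and are not taken from OMR or OpenJ9. The raw EBX value is passed in as a parameter so the logic can be exercised on any machine.

```cpp
#include <cstdint>

// CPUID.(EAX=7,ECX=0):EBX feature bits, as documented in the Intel SDM.
constexpr uint32_t LEAF7_AVX512F  = 1u << 16; // foundation
constexpr uint32_t LEAF7_AVX512DQ = 1u << 17;
constexpr uint32_t LEAF7_AVX512CD = 1u << 28;
constexpr uint32_t LEAF7_AVX512BW = 1u << 30;
constexpr uint32_t LEAF7_AVX512VL = 1u << 31;

// Extensions that are only meaningful when AVX512F is also present.
constexpr uint32_t LEAF7_AVX512_DEPENDENTS =
    LEAF7_AVX512DQ | LEAF7_AVX512CD | LEAF7_AVX512BW | LEAF7_AVX512VL;

// Hypothetical helper: true when a dependent extension is reported
// but the AVX512F foundation bit is clear.
inline bool hasInconsistentAvx512(uint32_t leaf7Ebx) {
    return (leaf7Ebx & LEAF7_AVX512F) == 0 &&
           (leaf7Ebx & LEAF7_AVX512_DEPENDENTS) != 0;
}

// Hypothetical helper: clear the dependent bits when the foundation is
// missing, so later code never selects an EVEX encoding by accident.
inline uint32_t sanitizeAvx512(uint32_t leaf7Ebx) {
    if (hasInconsistentAvx512(leaf7Ebx)) {
        return leaf7Ebx & ~LEAF7_AVX512_DEPENDENTS;
    }
    return leaf7Ebx;
}
```

Whether masking the flags like this is the right policy for the JIT is an assumption on my part; it only illustrates treating such a report as "no AVX-512" rather than trusting the individual bits.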
I think @AdamBrousseau will have to obtain the cpuid info of win2012x64-openj9-1a |
It is a virtual machine, so perhaps it's messed up somehow. |
I've disabled https://openj9-jenkins.osuosl.org/computer/win2012x64%2Dopenj9%2D1a/ in jenkins since we don't need tests running on it and crashing. Also opened infrastructure/issues/9283 |
@pshipton Has anything similar happened on any other machine? This is the instruction causing problems. It is valid on AVX-512 supported hardware.
@AdamBrousseau Could you get me the cpuid info for win2012x64-openj9-1a |
No, I checked all the failures and they were on win2012x64-openj9-1a |
@AdamBrousseau I need the list of instruction set extensions supported by that CPU. Whatever command would be equivalent to lscpu on Linux. |
Hopefully this helps
https://www.intel.com/content/www/us/en/products/sku/120485/intel-xeon-gold-6140-processor-24-75m-cache-2-30-ghz/specifications.html |
@AdamBrousseau So the cpu in question does support AVX-512, and therefore the instruction in this issue. But that output does not tell me if it is enabled or not. |
@pshipton I assume this hasn't been seen since you disabled that machine. Are you going to remove the blocker tag? |
Done. |
@pshipton I'm looking into another report of this problem. You mentioned that win2012x64-openj9-1a was a virtual machine, would that have been on vmware? |
I don't know. @AdamBrousseau might. |
I highly doubt they are vmware given the licensing cost/model they (vmware) have moved to. I have asked in slack. |
Thanks, I'll be interested to know how it was set up. We haven't had any confirmation of the cause on my case yet. |
That machine (the older 2012 on classic infra) runs on Citrix Hypervisor. The newer ones on VPC run on KVM. |
Thanks. I went searching for documentation on Citrix config but I couldn't find anything about setting cpu features for the VM. I did find an old post about missing avx512 support for some processors depending on core and frequency settings, but I was a bit confused by it as I'd expect that to result in a freq cap rather than missing feature support. |
@BradleyWood could you double check something I'm a bit confused over? In OMR::X86::TreeEvaluator::maskLoadEvaluator ( https://github.com/eclipse/omr/blob/b5ef5eda4680b6b5cf0c2f954362f9f47353ce04/compiler/x/codegen/SIMDTreeEvaluator.cpp#L72-L92 ) we test for avx512f and if available take the body of the if statement. But if it's not available we call SIMDloadEvaluator(node, cg); which doesn't include any further tests of cpu feature flags. In the two cases where we have seen crashes avx512f has not been set, and you mentioned earlier that this seemed odd. I'm wondering if the code path to SIMDloadEvaluator with avx512f not set is a rare case that may not have been well tested, and perhaps is not valid? |
That method is for loading vector masks which I doubt is used without enabling the vector API. Are you seeing opcodes such as mload or mloadi? SIMDloadEvaluator checks for instruction support via the call to |
I only have the compiled method body to go on, where we crash on |
Ah, I missed calls to SIMDloadEvaluator from the amd64/codegen directory, so there are more possible paths. |
@JamesKingdon Well it could come from other places too, MOVDQURegMem becomes vmovdqu32 when encoded with EVEX (avx-512) prefix, but you have to explicitly mark it as EVEX_128 to get the instruction above. Generally that is only done after getting the encoding prefix from calling But I'm pretty sure that processor supports that instruction, so I am bewildered as to how we get illegal instruction crash. |
@BradleyWood |
The case that triggered my interest was closed, but has just come back to life again so I need to pick this up. I was thinking that the problem was with badly configured virtualisation layers that enable avx512vl but not avx512f, but I've noticed several cases recently with that configuration that haven't been for this problem, so at the very least it's not the only factor in reproducing the issue. |
We have found that this crash occurs on virtual machines running on AVX-512 capable hardware where those features have been disabled. The question remains: why do we detect support for these features if the hypervisor has disabled them? I would expect the hypervisor to modify the behaviour of the cpuid instruction, which is how we gather this information. |
In all likelihood Windows Server 2012 does not support AVX-512. I believe we did not check the ZMM OS support flag. This is likely a different issue from the one our internal customer is experiencing @JamesKingdon. |
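A sketch of what checking the OS support flag could look like, under stated assumptions: the CPUID and XCR0 bit positions are the documented ones, but the function name `osEnablesAvx512` and the way the values are passed in are hypothetical. In a real check the `xcr0` argument would come from XGETBV with ECX=0 (for example the `_xgetbv(0)` intrinsic), which is only legal when CPUID.1:ECX.OSXSAVE is set.

```cpp
#include <cstdint>

// CPUID.1:ECX bit 27: the OS has enabled XSAVE/XGETBV.
constexpr uint32_t CPUID1_ECX_OSXSAVE = 1u << 27;
// CPUID.(EAX=7,ECX=0):EBX bit 16: AVX-512 foundation.
constexpr uint32_t CPUID7_EBX_AVX512F = 1u << 16;

// XCR0 state-component bits that the OS must enable for AVX-512.
constexpr uint64_t XCR0_SSE       = 1ull << 1; // XMM registers
constexpr uint64_t XCR0_AVX       = 1ull << 2; // upper halves of YMM
constexpr uint64_t XCR0_OPMASK    = 1ull << 5; // k0-k7 mask registers
constexpr uint64_t XCR0_ZMM_HI256 = 1ull << 6; // upper halves of ZMM0-15
constexpr uint64_t XCR0_HI16_ZMM  = 1ull << 7; // ZMM16-31
constexpr uint64_t XCR0_AVX512_STATE =
    XCR0_SSE | XCR0_AVX | XCR0_OPMASK | XCR0_ZMM_HI256 | XCR0_HI16_ZMM;

// Hypothetical helper: AVX-512 is usable only if the CPU reports it AND
// the OS saves and restores all of the register state it needs.
inline bool osEnablesAvx512(uint32_t cpuid1Ecx, uint32_t cpuid7Ebx, uint64_t xcr0) {
    if ((cpuid1Ecx & CPUID1_ECX_OSXSAVE) == 0) {
        return false; // XGETBV is not available, so XCR0 cannot be trusted
    }
    if ((cpuid7Ebx & CPUID7_EBX_AVX512F) == 0) {
        return false; // hardware (or hypervisor) does not report AVX-512
    }
    return (xcr0 & XCR0_AVX512_STATE) == XCR0_AVX512_STATE;
}
```

An OS that enables only x87, SSE, and AVX state (XCR0 = 0x7) fails this check even when cpuid reports AVX-512, which would be consistent with the theory above, though that has not been confirmed for this machine.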
https://openj9-jenkins.osuosl.org/job/Test_openjdk11_j9_special.system_x86-64_windows_Personal_testList_0/185 - win2012x64-openj9-1a
MathLoadTest_autosimd_special_5m_12
-Xjit -Xgcpolicy:balanced -Xnocompressedrefs
https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk11_j9_special.system_x86-64_windows_Personal_testList_0/185/system_test_output.tar.gz
MathLoadTest_autosimd_special_5m_20
-Xcompressedrefs -Xgcpolicy:gencon -Xjit:counts=- - - - - - 1 1 1 1000 250 250 - - - 10000 100000 10000,gcOnResolve,rtResolve,sampleInterval=2,scorchingSampleThreshold=10000,quickProfile -Xmn512k -Xcheck:gc:vmthreads:all:quiet
Changes since last special.system build
f44a1c6...c5c5206
eclipse-openj9/openj9-omr@723d2e4...33a1542
ibmruntimes/openj9-openjdk-jdk11@95a3a61...b4574cc