Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: small OSR locals must be normalize on load #84000

Merged
merged 3 commits into from
Mar 28, 2023

Conversation

AndyAyersMS
Copy link
Member

@AndyAyersMS AndyAyersMS commented Mar 27, 2023

When I changed the importation strategy for OSR in #83910 it exposed a latent issue -- small OSR locals must normalized on load if they were exposed at Tier0.

Fixes #83959.

When I changed the importation strategy for OSR in dotnet#83910 it
exposed a latent issue -- small OSR locals must normalized on load.

Fixes dotnet#83959.
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 27, 2023
@ghost ghost assigned AndyAyersMS Mar 27, 2023
@ghost
Copy link

ghost commented Mar 27, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

When I changed the importation strategy for OSR in #83910 it exposed a latent issue -- small OSR locals must normalized on load.

Fixes #83959.

Author: AndyAyersMS
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@AndyAyersMS
Copy link
Member Author

@jakobbotsch PTAL
cc @dotnet/jit-contrib

@AndyAyersMS
Copy link
Member Author

Expecting a modest number of diffs for OSR methods.

@AndyAyersMS
Copy link
Member Author

/azp run runtime-coreclr libraries-pgo

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@AndyAyersMS
Copy link
Member Author

May also fix #83960 -- checking that now.

@AndyAyersMS
Copy link
Member Author

/azp run runtime-coreclr pgostress

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Mar 28, 2023

Running into various issues:

Assert failure(PID 24585 [0x00006009], Thread: 24602 [0x601a]): Assertion failed '!CodeGen::instIsFP(ins) && (EA_SIZE(attr) <= EA_32BYTE) && (reg != REG_NA)' in 'System.Runtime.Intrinsics.Tests.Vectors.Vector512Tests:Vector512SByteShuffleOneInputWithLocalIndicesTest():this' during 'Generate code' (IL size 296; hash 0xe5239cf9; Tier1-OSR)

;; osx
./RunTests.sh: line 168: 50069 Illegal instruction: 4  "$RUNTIME_PATH/dotnet" exec --runtimeconfig System.Runtime.Tests.runtimeconfig.json --depsfile System.Runtime.Tests.deps.json xunit.console.dll System.Runtime.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=AdditionalTimezoneChecks -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing $RSP_FILE
/private/tmp/helix/working/B38E099C/w/ACDA0999/e
----- end Mon Mar 27 20:14:10 EDT 2023 ----- exit code 132 ----------------------------------------------------------
exit code 132 means SIGILL Illegal Instruction. Core dumped. Likely codegen issue.

Assert failure(PID 12116 [0x00002f54], Thread: 4788 [0x12b4]): Assertion failed 'op1->gtEffectiveVal() == base' in 'System.RuntimeType:GetMethodBase(System.RuntimeType,int):System.Reflection.MethodBase' during 'Assertion prop' (IL size 480; hash 0xbca2468f; Tier1)

File: D:\a\_work\1\s\src\coreclr\jit\gentree.cpp Line: 4474

@AndyAyersMS
Copy link
Member Author

May also fix #83960 -- checking that now.

Seems like it does, but locally I am hitting another odd error. Running on a Coffee Lake i7 8700 so no AVX-512, I end up here when running under WSL2 when built against ed3721b.

cc @tannergooding

## JIT/opt/Compares/compares/compares.sh 

* thread #1, name = 'corerun', stop reason = signal SIGILL: illegal instruction operand
    frame #0: 0x00007ffff776560d libcoreclr.so`RtlRestoreContext at context2.S:140
   137      // See https://github.com/apple/darwin-xnu/blob/main/osfmk/i386/fpu.c#L174
   138
   139      // Restore the ZMM_Hi256 state
-> 140      vinsertf64x4 zmm0, zmm0, ymmword ptr [rdi + (CONTEXT_Zmm0H + 0 * 32)], 1
   141      vinsertf64x4 zmm1, zmm1, ymmword ptr [rdi + (CONTEXT_Zmm0H + 1 * 32)], 1
   142      vinsertf64x4 zmm2, zmm2, ymmword ptr [rdi + (CONTEXT_Zmm0H + 2 * 32)], 1
   143      vinsertf64x4 zmm3, zmm3, ymmword ptr [rdi + (CONTEXT_Zmm0H + 3 * 32)], 1

@tannergooding
Copy link
Member

This would be caused by #83784

We're supposed to skip this bit if CONTEXT.XStateFeaturesMask hasn't been marked with XSTATE_MASK_AVX512: https://github.com/dotnet/runtime/pull/83784/files#diff-7fb8fda5dbdf85daa35acb2847f3613b2ca50e1dafafae5475e5443589258eacR128-R129

That correspondingly should only be set if FPREG_HasAvx512Registers returns true: https://github.com/dotnet/runtime/pull/83784/files#diff-eb14b2a3e90e2d0548c465157fcf8480b33fa3b83c1594c068486fe894d32bfeR964-R979

Which itself should only be getting set if the Linux kernel itself reports XFEATURE_MASK_AVX512 in the native thread context: https://github.com/dotnet/runtime/pull/83784/files#diff-6bd99da7ca73739fa65674c40dcb32ecc9af9bcaaff94fdc3d0f4e13704c2331R511-R525

and correspondingly the CPUID query also reports all 5 ISAs as supported: https://github.com/dotnet/runtime/pull/83784/files#diff-eb14b2a3e90e2d0548c465157fcf8480b33fa3b83c1594c068486fe894d32bfeR336-R383

Will need to dig into this a bit to try and repro things...

@tannergooding
Copy link
Member

Found the issue: #84012

Comment on lines 1102 to 1109
(lvIsParam || m_addrExposed || lvIsStructField || lvIsOSRLocal);
}

bool lvNormalizeOnStore() const
{
return varTypeIsSmall(TypeGet()) &&
// lvIsStructField is treated the same as the aliased local, see fgDoNormalizeOnStore.
!(lvIsParam || m_addrExposed || lvIsStructField);
!(lvIsParam || m_addrExposed || lvIsStructField || lvIsOSRLocal);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a while to guess that the reason is that the local could have been marked normalize-on-load in tier-0 due to the more conservative address exposure there. Maybe add a comment to that effect?

Also, I guess if we wanted to we could scope this down to just OSR locals that were address exposed in tier 0, since we have that information available.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about scoping it down, so let me do that (will need a new bit on lclvar dsc).

@AndyAyersMS
Copy link
Member Author

/azp run runtime-coreclr libraries-pgo, runtime-coreclr pgostress

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@AndyAyersMS
Copy link
Member Author

So far just this one failure has recurred:

  Discovering: System.Runtime.Intrinsics.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Runtime.Intrinsics.Tests (found 1023 test cases)
  Starting:    System.Runtime.Intrinsics.Tests (parallel test collections = on, max threads = 2)

Assert failure(PID 23193 [0x00005a99], Thread: 23208 [0x5aa8]): Assertion failed '!CodeGen::instIsFP(ins) && (EA_SIZE(attr) <= EA_32BYTE) && (reg != REG_NA)' in 'System.Runtime.Intrinsics.Tests.Vectors.Vector512Tests:Vector512SByteShuffleOneInputWithLocalIndicesTest():this' during 'Generate code' (IL size 296; hash 0xe5239cf9; Tier1-OSR)

    File: /__w/1/s/src/coreclr/jit/emitxarch.cpp Line: 7810
    Image: /datadisks/disk1/work/B9AF0A02/p/dotnet

Does not repro locally, suspect it requires AVX-512.

@tannergooding how can I repro this w/o having AVX-512 capable hardware?

@AndyAyersMS
Copy link
Member Author

Also seeing an arm64 failure. Not sure it is related but will take a look.

set DOTNET_TieredCompilation=1
set DOTNET_ReadyToRun=0
set DOTNET_TC_QuickJitForLoops=1
set DOTNET_TieredPGO=1
set DOTNET_JitRandomGuardedDevirtualization=1
set DOTNET_JitRandomEdgeCounts=1
set DOTNET_JitRandomlyCollect64BitCounts=1
17:06:48.864 Running test: baseservices/threading/regressions/2164/foreground-shutdown/foreground-shutdown.cmd

Return code:      1
Raw output file:      C:\h\w\BAA209F1\w\ACE309A2\uploads\regressions\2164\foreground-shutdown\output.txt
Raw output:
BEGIN EXECUTION
 "C:\h\w\BAA209F1\p\corerun.exe" -p "System.Reflection.Metadata.MetadataUpdater.IsSupported=false"  foreground-shutdown.dll 
Xunit.Sdk.EqualException: Assert.Equal() Failure
Expected: 100
Actual:   101
   at Xunit.Assert.Equal[T](T expected, T actual, IEqualityComparer`1 comparer) in /_/src/xunit.assert/Asserts/EqualityAsserts.cs:line 96
   at Xunit.Assert.Equal[T](T expected, T actual) in /_/src/xunit.assert/Asserts/EqualityAsserts.cs:line 63
   at __GeneratedMainWrapper.Main()
Expected: 100
Actual: 101
END EXECUTION - FAILED

@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Mar 28, 2023

@tannergooding how can I repro this w/o having AVX-512 capable hardware?

Bruce pointed out a mismatched VM altjit would work -- I tried this via windows crossjit on linux -- and while I do indeed see zmm's in the disasm, I can't yet repro the assertion failure.

; Assembly listing for method System.Runtime.Intrinsics.Tests.Vectors.Vector512Tests:Vector512SByteShuffleOneInputWithLocalIndicesTest():this
; Emitting BLENDED_CODE for X64 CPU with AVX512 - Windows
; Tier-1 compilation
; OSR variant for entry point 0x11f
...
       4889842420010000     mov      qword ptr [rsp+120H], rax
       62F17C4810B42470030000 vmovups  zmm6, zmmword ptr[rsp+370H]
       8BB4246C030000       mov      esi, dword ptr [rsp+36CH]
...

@AndyAyersMS
Copy link
Member Author

Also seeing an arm64 failure. Not sure it is related but will take a look.

There aren't any OSR methods in this test, so the failure seems unrelated.

@AndyAyersMS
Copy link
Member Author

Going to merge since we have potential OSR silent bad code; will try and repro the AVX512 issue some other way.

@AndyAyersMS AndyAyersMS merged commit fd157a5 into dotnet:main Mar 28, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Apr 28, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[jitosr_stress_random] Failures in System.Text related tests
3 participants