Failures seen in nightly PGO runs #65585

Closed
AndyAyersMS opened this issue Feb 18, 2022 · 9 comments
Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)

@AndyAyersMS
Member

AndyAyersMS commented Feb 18, 2022

(1) There are consistent failures in some of the volatile tests. I believe this is a failure to suspend threads.

https://dev.azure.com/dnceng/public/_build/results?buildId=1610898

/root/helix/work/workitem/uploads/Reports/JIT.jit64/opt/cse/VolatileTest_op_mul/VolatileTest_op_mul.output.txt
Raw output:
BEGIN EXECUTION
/root/helix/work/correlation/corerun -p System.Reflection.Metadata.MetadataUpdater.IsSupported=false VolatileTest_op_mul.dll ''
this test is designed to hang if jit cse doesnt honor volatile
Gathering state for process 1419 corerun
Crashing thread 00000593 signal 00000005

(2) A sporadic failure seen six times in the past two weeks:
https://dev.azure.com/dnceng/public/_build/results?buildId=1618181

    JIT\Regression\JitBlue\Runtime_56953\Runtime_56953\Runtime_56953.cmd [FAIL]
      
      Assert failure(PID 8896 [0x000022c0], Thread: 5180 [0x143c]): Assertion failed '(constIndOffset % elemSize) == 0' in 'System.Collections.Generic.Dictionary`2[__Canon,__Canon][System.__Canon,System.__Canon]:TryInsert(System.__Canon,System.__Canon,ubyte):bool:this' during 'Do value numbering' (IL size 739)
      
          File: D:\a\_work\1\s\src\coreclr\jit\gentree.cpp Line: 17151
          Image: D:\h\w\9913081B\p\corerun.exe

(3) A one-off failure
https://dev.azure.com/dnceng/public/_build/results?buildId=1609234

    readytorun/coreroot_determinism/coreroot_determinism/coreroot_determinism.sh [FAIL]
      1 / 1 (100%, 1 failed): failed in 17700 msecs, exit code 134, expected 0: corerun /tmp/helix/working/B9CA09B3/p/crossgen2/crossgen2.dll @/private/tmp/helix/working/B9CA09B3/w/A707094C/e/readytorun/coreroot_determinism/coreroot_determinism/seed2/CPAOT-ret.out/System.Private.CoreLib.dll.rsp
      !! Assert failure(PID 6403 [0x00001903], Thread: 5047721 [0x4d05a9]): regNum >= 0 && regNum <= 30
      !!     File: /Users/runner/work/1/s/src/coreclr/vm/gcinfodecoder.cpp Line: 1641
      !!     Image: /private/tmp/helix/working/B9CA09B3/p/corerun
      !! task_for_pid(6403) FAILED 5 (os/kern) failure
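
For readers unfamiliar with the assertion in (2): judging only from its text, it appears to require that a constant byte offset into an array's element area be a whole multiple of the element size, so that the offset corresponds to an element boundary. The Python sketch below illustrates that invariant. The function name `const_offset_to_index` and the sample sizes are hypothetical and are not taken from gentree.cpp.

```python
def const_offset_to_index(const_ind_offset, elem_size):
    """Convert a constant byte offset into an element index.

    Hypothetical helper: it mirrors the shape of the assertion
    '(constIndOffset % elemSize) == 0', not the JIT's actual code.
    """
    if elem_size <= 0:
        raise ValueError("element size must be positive")
    if const_ind_offset % elem_size != 0:
        # The case the assertion rejects: the offset lands in the middle
        # of an element instead of on an element boundary.
        raise AssertionError("(constIndOffset % elemSize) == 0")
    return const_ind_offset // elem_size


print(const_offset_to_index(24, 8))  # offset 24 with 8-byte elements
```

An offset of 20 with 8-byte elements would raise here, which is the kind of mismatch the failing assert reports.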

@dotnet-issue-labeler (bot) added the area-CodeGen-coreclr and untriaged labels on Feb 18, 2022
@ghost

ghost commented Feb 18, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.


@AndyAyersMS
Member Author

@EgorBo let me know if you are interested in looking at these.

@AndyAyersMS
Member Author

AndyAyersMS commented Feb 18, 2022

Also, from libraries runs (subset of all failures)

  Starting:    System.Xml.XPath.XDocument.Tests (parallel test collections = on, max threads = 2)
Assert failure(PID 5500 [0x0000157c], Thread: 5506 [0x1582]): Assertion failed 'fgReachable(begBlk, endBlk)' in 'MS.Internal.Xml.XPath.FilterQuery:MatchNode(System.Xml.XPath.XPathNavigator):System.Xml.XPath.XPathNavigator:this' during 'Update flow graph opt pass' (IL size 444)

    File: /__w/1/s/src/coreclr/jit/optimizer.cpp Line: 167
    Image: /datadisks/disk1/work/AB7009C6/p/dotnet
  Starting:    System.Text.RegularExpressions.Tests (parallel test collections = on, max threads = 4)

Assert failure(PID 61 [0x0000003d], Thread: 72 [0x0048]): Assertion failed '(gcInfo.gcRegGCrefSetCur & killMask) == 0' in 'Microsoft.CodeAnalysis.CSharp.Binder:BindSimpleBinaryOperator(Microsoft.CodeAnalysis.CSharp.Syntax.BinaryExpressionSyntax,Microsoft.CodeAnalysis.CSharp.BindingDiagnosticBag,Microsoft.CodeAnalysis.CSharp.BoundExpression,Microsoft.CodeAnalysis.CSharp.BoundExpression,bool):Microsoft.CodeAnalysis.CSharp.BoundExpression:this' during 'Generate code' (IL size 831)

    File: /__w/1/s/src/coreclr/jit/codegenarmarch.cpp Line: 3201
    Image: /root/helix/work/correlation/dotnet
  Starting:    System.IO.FileSystem.Watcher.Tests (parallel test collections = on, max threads = 8)
    System.IO.Tests.FileSystemEventArgsTests.FileSystemEventArgs_ctor_RelativePathFromCurrentDirectoryInGivenDrive(directory: "C:", name: "foo.txt") [FAIL]
      Assert.Equal() Failure
                ↓ (pos 0)
      Expected: D:\h\w\B8AD09AE\w\AC3C0917\e\foo.txt
      Actual:   C:\foo.txt
                ↑ (pos 0)
      Stack Trace:
        /_/src/libraries/System.IO.FileSystem.Watcher/tests/Args.FileSystemEventArgs.cs(74,0): at System.IO.Tests.FileSystemEventArgsTests.FileSystemEventArgs_ctor_RelativePathFromCurrentDirectoryInGivenDrive(String directory, String name)
    System.IO.Tests.RenamedEventArgsTests.RenamedEventArgs_ctor_OldFullPath_DirectoryIsRelativePathFromCurrentDirectoryInGivenDrive(directory: "C:", name: "foo.txt", oldName: "bar.txt") [FAIL]
      Assert.Equal() Failure
                ↓ (pos 0)
      Expected: D:\h\w\B8AD09AE\w\AC3C0917\e\bar.txt
      Actual:   C:\bar.txt
                ↑ (pos 0)
      Stack Trace:
        /_/src/libraries/System.IO.FileSystem.Watcher/tests/Args.RenamedEventArgs.cs(77,0): at System.IO.Tests.RenamedEventArgsTests.RenamedEventArgs_ctor_OldFullPath_DirectoryIsRelativePathFromCurrentDirectoryInGivenDrive(String directory, String name, String oldName)
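
The `(gcInfo.gcRegGCrefSetCur & killMask) == 0` assertion in the log above reads as a bitmask disjointness check: no register that currently holds a GC reference may also be in the set of registers about to be killed. A minimal Python sketch of that check follows, assuming one bit per register. The register numbering and the helper names are illustrative assumptions, not the JIT's definitions.

```python
def reg_mask(*reg_nums):
    """Build a bitmask with one bit set per register number."""
    mask = 0
    for n in reg_nums:
        mask |= 1 << n
    return mask


def kill_is_safe(gc_ref_regs, kill_mask):
    """True when no GC-ref register is in the kill set."""
    return (gc_ref_regs & kill_mask) == 0


def conflicting_regs(gc_ref_regs, kill_mask):
    """List the register numbers that are both live GC refs and killed."""
    overlap = gc_ref_regs & kill_mask
    return [n for n in range(overlap.bit_length()) if (overlap >> n) & 1]


gc_refs = reg_mask(0, 19)      # hypothetical: registers 0 and 19 hold GC refs
kills = reg_mask(0, 1, 2)      # hypothetical: a call kills registers 0 to 2
print(kill_is_safe(gc_refs, kills), conflicting_regs(gc_refs, kills))
```

Listing the overlapping registers, rather than only testing for zero, is a debugging convenience added for this sketch.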

@EgorBo
Member

EgorBo commented Feb 18, 2022

@AndyAyersMS

regNum >= 0 && regNum <= 30

was fixed in #65253

(gcInfo.gcRegGCrefSetCur & killMask) == 0

was fixed by some recent @jakobbotsch PR I believe

@jakobbotsch
Member

> was fixed by some recent @jakobbotsch PR I believe

#65432 to be exact.

@AndyAyersMS
Member Author

AndyAyersMS commented Feb 20, 2022

(OSR) arm64 OSR is killing live-in registers in the prolog. Need to look more carefully at the set of initReg candidates.

Unhandled exception. System.NullReferenceException: Object reference not set to an instance of an object.
at BenchmarksGame.MandelBrot_7.DoBench(Int32 size, Int32 lineLength)
at BenchmarksGame.MandelBrot_7.Main(String[] args)
        00000000          ldr     x2, [fp,#320]         // load r2 from Tier0 frame
        00000000          mov     x2, #296              // trash r2 to load q17
        00000000          ldr     q17, [fp, x2]    
        00000000          mov     x2, #280
        00000000          ldr     q18, [fp, x2]
        00000000          mov     x2, #264
        00000000          ldr     q19, [fp, x2]
        00000000          ldr     q16, [fp,#248]
        00000000          ldr     w1, [fp,#244]
						;; bbWeight=1    PerfScore 27.50
G_M1905_IG02:              ;; offset=004CH
        00000000          fmul    v20.2d, v16.2d, v17.2d
        00000000          fsub    v21.2d, v20.2d, v18.2d
        00000000          sbfiz   x4, x1, #3, #32
        00000000          str     q21, [x0, x4]
        00000000          fmov    v21.2d, #1.0000
        00000000          fsub    v20.2d, v20.2d, v21.2d
        00000000          str     q20, [x2, x4]          // dies here with AV using trashed x2

Fixed by #65609.
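
The listing above can be modeled with a toy register file to show why the choice of scratch register matters. The sketch is in Python rather than arm64 assembly. The register names and offsets follow the listing, but `pick_scratch`, the candidate list, and the register values are hypothetical, and this is not a description of how #65609 implements the fix.

```python
def run_prolog(live_in_values, scratch):
    """Model a prolog that restores live-in registers from the Tier0 frame,
    then reuses `scratch` to hold frame offsets that the listing
    materializes in a register before each load."""
    regs = dict(live_in_values)          # e.g. ldr x2, [fp,#320]
    for offset in (296, 280, 264):       # offsets taken from the listing
        regs[scratch] = offset           # mov scratch, #offset
    # Report every live-in register whose restored value was overwritten.
    return sorted(r for r, v in live_in_values.items() if regs[r] != v)


def pick_scratch(candidates, live_in):
    """Return the first candidate that is not live into the method body."""
    for reg in candidates:
        if reg not in live_in:
            return reg
    raise RuntimeError("no safe scratch register available")


live_in = {"x0": 0x1000, "x2": 0x2000}   # hypothetical pointer values
print(run_prolog(live_in, "x2"))          # scratch collides with live-in x2
safe = pick_scratch(["x0", "x2", "x9"], live_in)
print(safe, run_prolog(live_in, safe))    # a non-live-in scratch clobbers nothing
```

With `x2` as the scratch register the model reports `x2` as clobbered, matching the later store through a trashed `x2` in the listing. Filtering candidates against the live-in set avoids the collision.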

@JulieLeeMSFT removed the untriaged label on Feb 22, 2022
@JulieLeeMSFT modified the milestones: 6.0.x, 7.0.0 on Feb 22, 2022
@JulieLeeMSFT
Member

Multiple issues are being tracked in this issue. Assigning @AndyAyersMS to drive the team on all the necessary fixes.

@EgorBo
Member

EgorBo commented Feb 26, 2022

More known issues from your list that turned out not to be PGO-related:

#64764:

Assertion failed 'fgReachable(begBlk, endBlk)'

#65311:

Assertion failed '(gcInfo.gcRegGCrefSetCur & killMask) == 0'

@AndyAyersMS
Member Author

Closing this; we will just open issues for specific failure cases.

@ghost locked as resolved and limited conversation to collaborators on Jul 3, 2022