Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assertion failed 'compCyclesAtEnd >= m_compCyclesAtEndOfInlining fails intermittently on freebsd-x64 Debug build #105472

Open
am11 opened this issue Jul 25, 2024 · 14 comments
Assignees
Labels
arch-x64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI os-freebsd FreeBSD OS Priority:2 Work that is important, but not critical for the release
Milestone

Comments

@am11
Copy link
Member

am11 commented Jul 25, 2024

It fails on random methods with Debug build, e.g.

Assert failure(PID 3357 [0x00000d1d], Thread: 100352 [0x18800]): Assertion failed 'compCyclesAtEnd >= m_compCyclesAtEndOfInlining' in 'System.CommandLine.Parsing.CommandResult:ValidateArguments(ubyte):this' during 'Post-Emit' (IL size 181; hash 0xcb6701c7; FullOpts)

    File: /runtime/src/coreclr/jit/compiler.cpp:9511
    Image: /root/.dotnet9/dotnet

Abort trap (core dumped)

then:

Assert failure(PID 3368 [0x00000d28], Thread: 101122 [0x18b02]): Assertion failed 'compCyclesAtEnd >= m_compCyclesAtEndOfInlining' in 'Microsoft.DotNet.Cli.EnvironmentVariableNames:.cctor()' during 'Post-Emit' (IL size 213; hash 0x1c79c842; FullOpts)

etc.

root@cbsdnode:~/foo # lldb -- ~/.dotnet9/dotnet restore --packages pkgs
...
(lldb) settings set -- target.run-args  "restore" "--packages" "pkgs"
(lldb) break set -f compiler.cpp -l 9511 -c "compCyclesAtEnd < m_compCyclesAtEndOfInlining"
Breakpoint 1: no locations (pending).
WARNING:  Unable to resolve breakpoint to any actual locations.
(lldb) r
Process 3356 launched: '/root/.dotnet9/dotnet' (x86_64)
2 locations added to breakpoint 1
warning: libclrjit.so was compiled with optimization - stepping may behave oddly; variables may not be available.
Process 3356 stopped
* thread #1, name = 'dotnet', stop reason = breakpoint 1.1
    frame #0: 0x00002269bb736a10 libclrjit.so`Compiler::compCompile(void**, unsigned int*, JitFlags*) [inlined] Compiler::RecordStateAtEndOfCompilation(this=0x00004bee10c87818) at compiler.cpp:9511:5 [opt]
(lldb) p compCyclesAtEnd
(uint64_t) $0 = 392445
(lldb) p m_compCyclesAtEndOfInlining
(uint64_t) $1 = 158170148

(side note: that warning is bogus because it's definitely a debug build)

cc @Thefrank

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 25, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Jul 25, 2024
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@am11
Copy link
Member Author

am11 commented Jul 25, 2024

Tried DOTNET_TieredCompilation=0, but it didn't make any difference.

@jakobbotsch
Copy link
Member

This uses PAL QueryThreadCycleTime, so presumably that doesn't work correctly on FreeBSD.

@jakobbotsch
Copy link
Member

It seems we could switch all of this to just use QueryPerformanceCounter instead, I'm not sure why we try to save number of cycles -- it seems like we convert it back to time whenever we output it anyway.

@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Jul 25, 2024
@JulieLeeMSFT JulieLeeMSFT added this to the 9.0.0 milestone Jul 25, 2024
@Thefrank
Copy link
Contributor

Is QueryThreadCycleTime failing something that PAL tests would catch? I do not recall any failures from PAL tests (-s clr.paltests?) but I have not run them in a year or more as things mostly just worked.

@am11
Copy link
Member Author

am11 commented Jul 25, 2024

Yup, PAL tests seem to have that API covered. If you can reproduce this issue with Debug build (basically dotnet new console && dotnet restore), could you try running it? I will also try that in my environment later.

@Thefrank
Copy link
Contributor

uname -a;freebsd-version
FreeBSD gitload 14.1-RELEASE FreeBSD 14.1-RELEASE releng/14.1-n267679-10e31f0946d8 GENERIC amd64
14.1-RELEASE-p2
./build.sh -c Debug -ci --restore --build /p:OFFICIALBUILDID=20240725.99 --subset Clr.PalTests
/root/runtime/artifacts/bin/coreclr/freebsd.x64.Debug/paltests/runpaltests.sh /root/runtime/artifacts/bin/coreclr/freebsd.x64.Debug/paltests /root/runtime/artifacts/bin/coreclr/freebsd.x64.Debug/paltestsoutputs
Passing test as dwProcessId is > 0
.paltest_getcurrentthread_test1 has been disabled on this platform
.paltest_getcurrentthread_test2 has been disabled on this platform
.........ReleaseMutex() test returned 0.
Test passed.
......................Global Counter Value: 1 
Global Counter Value: 2 
.Global Counter Value: 1 
Global Counter Value: 2 
.Wait for Single Object (hEvent) was successful.
Global Counter Value: 100000 
Wait for thread was successful
..
 Starting functional  eventing APIs tests  

 All eventing APIs were fired successfully 

Finished running PAL tests.

The following test(s) failed:
exception_handling/pal_sxs/test1/paltest_pal_sxs_test1. Exit code: 1

PAL Test Results:
  Passed: 336
  Failed: 1

Copied PAL test output files to /root/runtime/artifacts/bin/coreclr/freebsd.x64.Debug/paltestsoutputs.

I assume the Failed:1 is where this is failing?

@am11
Copy link
Member Author

am11 commented Jul 26, 2024

exception_handling/pal_sxs/test1/paltest_pal_sxs_test1. Exit code: 1

That one is different. @AustinWise is fixing it for illumos-x64 #105207. We can take a look at it for freebsd-x64 separately.

QueryThreadCycleTime intermittent failure is not reproduced with PAL tests. I tried it in my box and got the same results: https://dpaste.com/CGVJZEZSK (cat /tmp/PalTestOutput/default/pal_tests.xml | curl -s -F "content=<-" http://dpaste.com/api/v2/) which shows: <test name="threading.QueryThreadCycleTime.test1" type="threading.QueryThreadCycleTime" method="test1" time="0" result="Pass" />.

I then removed everything from paltestlist.txt except for threading.QueryThreadCycleTime.test1 and ran it 100x in the loop, all passed. So it could be in specific state of a busy thread, it fails?

@jakobbotsch's idea to avoid reliance on thread times is something we should try. There are only two usages of this PAL API in product code, one in jit and the other one in superpmi.

@AndyAyersMS
Copy link
Member

It seems we could switch all of this to just use QueryPerformanceCounter instead, I'm not sure why we try to save number of cycles -- it seems like we convert it back to time whenever we output it anyway.

I think the ambition here was to avoid charging the jit for time when the thread is suspended. Changing it to a wallclock measure should be ok.

@EgorBo EgorBo added the Priority:2 Work that is important, but not critical for the release label Jul 29, 2024
@EgorBo
Copy link
Member

EgorBo commented Aug 1, 2024

@am11 can this wait post 9.0 or better to be included in 9.0 ?

@JulieLeeMSFT JulieLeeMSFT assigned AndyAyersMS and unassigned EgorBo Aug 1, 2024
@am11
Copy link
Member Author

am11 commented Aug 1, 2024

I think this can wait until after .net 9.0; since it's Debug-only and (what I gathered from @Thefrank) it does not block publishing nativeaot apps on FreeBSD with Release build of shared framework.

@am11 am11 modified the milestones: 9.0.0, 10.0.0 Aug 1, 2024
@Thefrank
Copy link
Contributor

Thefrank commented Aug 2, 2024

src/tests/build.sh -nativeaot Release -tree:nativeaot
src/tests/run.sh --runnativeaottests Release

Time [secs] | Total | Passed | Failed | Skipped | Assembly Execution Summary
============================================================================
      0.331 |     1 |      1 |      0 |       0 | nativeaot.CustomMain.XUnitWrapper.dll
      0.074 |     1 |      1 |      0 |       0 | nativeaot.CustomMainWithStubExe.XUnitWrapper.dll
      0.064 |     1 |      1 |      0 |       0 | nativeaot.GenerateUnmanagedEntryPoints.XUnitWrapper.dll
      9.574 |    20 |     18 |      2 |       0 | nativeaot.SmokeTests.XUnitWrapper.dll
      0.068 |     1 |      1 |      0 |       0 | nativeaot.StartupHook.XUnitWrapper.dll
----------------------------------------------------------------------------
     10.111 |    24 |     22 |      2 |       0 | (total)


Creating repro files at: /root/runtime/artifacts/log/repro
Writing repro: /root/runtime/artifacts/log/repro/nativeaot/SmokeTests/HardwareIntrinsics/X64Avx2/repro_X64Avx2.sh
Writing repro: /root/runtime/artifacts/log/repro/nativeaot/SmokeTests/HardwareIntrinsics/X64Avx512/repro_X64Avx512.sh
Repro files written.

Log files at: /root/runtime/artifacts/log
Repro files at: /root/runtime/artifacts/log/repro

Same results with Debug

E5-2660 lacks AVX2 amd AVX512

@am11
Copy link
Member Author

am11 commented Aug 2, 2024

It was actually failing the assertion on dotnet restore (and other SDK commands) with debug build of runtime.

@Thefrank, please open an issue with details for the two failing tests.

@Thefrank
Copy link
Contributor

Thefrank commented Aug 3, 2024

Dug back into the test:

The required instruction sets are not supported by the current CPU.
/root/runtime/src/tests/Common/scripts/nativeaottest.sh: line 15: 58188 Abort trap              (core dumped) $_DebuggerFullPath $1/native/$exename "${@:3}"

For both the failed tests.
I don't have any hardware that supports AVX2 and AVX512 and that is running FreeBSD.
I can still open an issue if desired.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-x64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI os-freebsd FreeBSD OS Priority:2 Work that is important, but not critical for the release
Projects
None yet
Development

No branches or pull requests

6 participants