Add XUnitLogChecker to log libraries dumps, do not enable for NativeAOT tests #94868

Merged
merged 5 commits into from
Nov 29, 2023

Conversation

carlossanlop
Member

Second attempt of #93906. That change had to be reverted because it broke the NativeAOT tests in outerloop.

I added an extra commit that changes the NativeAOT condition per @MichalStrehovsky's suggestion in this comment.

@ghost

ghost commented Nov 16, 2023

Tagging subscribers to this area: @dotnet/area-infrastructure-libraries
See info in area-owners.md if you want to be subscribed.

Author: carlossanlop
Assignees: carlossanlop
Labels:

area-Infrastructure-libraries

Milestone: -

@carlossanlop
Member Author

/azp run


You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

@carlossanlop
Member Author

/azp list

This comment was marked as resolved.

@carlossanlop
Member Author

/azp run runtime-nativeaot-outerloop


Azure Pipelines successfully started running 1 pipeline(s).

Member

@hoyosjs hoyosjs left a comment


Lgtm as long as OuterLoop passes

@carlossanlop
Member Author

It failed again, but I see what's happening:

  • The Build step for NativeAOT jobs compiles the product with the /p:TestNativeAot=true property, which ensures XUnitLogChecker won't be built (log).
  • But the Send to helix step passes neither of the two properties that IsXUnitLogCheckerSupported checks (TestNativeAot or the freshly added RunNativeAotTestApps). As a result, IsXUnitLogCheckerSupported evaluates to true, and the tests fail because they try to run the tool but can't find it (log).

I'm thinking that one option would be to pass the /p:IsXUnitLogCheckerSupported=false property to the test command for NativeAOT tests. I need to determine if I can just pass any extra global custom properties to the postBuildSteps parameters.
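
To make the mismatch between the two steps concrete, here is a minimal shell sketch of the condition as described in this comment. The function name and argument order are hypothetical, and the real check lives in MSBuild, not in a shell script; only the two property names and the resulting true/false behavior are taken from the discussion.

```shell
# Hypothetical helper that mirrors the condition described above.
# It is NOT the actual MSBuild logic, only an illustration of it.
# $1 = value of TestNativeAot, $2 = value of RunNativeAotTestApps (either may be empty).
is_xunitlogchecker_supported() {
  local test_native_aot="${1:-}"
  local run_native_aot_test_apps="${2:-}"
  if [ "$test_native_aot" = "true" ] || [ "$run_native_aot_test_apps" = "true" ]; then
    echo "false"
  else
    echo "true"
  fi
}

# Build step: /p:TestNativeAot=true is passed, so the tool is treated as unsupported.
echo "build step:         $(is_xunitlogchecker_supported true "")"
# Send to helix step: neither property is passed, so the value falls back to true.
echo "send-to-helix step: $(is_xunitlogchecker_supported "" "")"
```

The second call is the failing case: the step that builds the tests and the step that sends them to Helix disagree about whether the tool exists.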

@carlossanlop
Member Author

/azp run runtime-nativeaot-outerloop


Azure Pipelines successfully started running 1 pipeline(s).

@carlossanlop
Member Author

carlossanlop commented Nov 17, 2023

@MichalStrehovsky @hoyosjs The latest commit seems to have worked: The NativeAOT runs hit unrelated test failures, but they are printing the expected message, are not running XUnitLogChecker, and the final returned exit code is the one from the test run itself:

   at System.Net.Sockets.SocketAsyncEngine.EventLoop() + 0xc4
./RunTests.sh: line 174:    23 Aborted                 (core dumped) ./System.Net.HttpListener.Tests -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing -xml testResults.xml $RSP_FILE
/root/helix/work/workitem/e
----- end Fri Nov 17 04:30:09 PM UTC 2023 ----- exit code 134 ----------------------------------------------------------
exit code 134 means SIGABRT Abort. Managed or native assert, or runtime check such as heap corruption, caused call to abort(). Core dumped.
ulimit -c value: unlimited
cat /proc/sys/kernel/core_pattern: /home/helixbot/dotnetbuild/dumps/core.%u.%p
cat /proc/sys/kernel/core_uses_pid: 0
cat: /proc/sys/kernel/coredump_filter: No such file or directory
cat /proc/sys/kernel/coredump_filter:
Looking around for any Linux dumps...
dmesg: read kernel buffer failed: Operation not permitted
The '__IsXUnitLogCheckerSupported' env var is not set.
+ export _commandExitCode=134
+ python /root/helix/work/correlation/reporter/run.py https://dev.azure.com/dnceng-public/ public 10820448 eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsIng1dCI6Im9PdmN6NU1fN3AtSGpJS2xGWHo5M3VfVjBabyJ9.eyJuYW1laWQiOiJjNzczZjJjMi01MTIwLTQyMDctYWZlMi1hZmFmMzVhOGJjMGEiLCJzY3AiOiJMb2NhdGlvblNlcnZpY2UuQ29ubmVjdCBQaXBlbGluZUNhY2hlLlJlYWRXcml0ZVJvb3RBY2Nlc3MgUmVhZEFuZFB1Ymxpc2hUZXN0OmNiYjE4MjYxLWM0OGYtNGFiYi04NjUxLThjZGNiNTQ3NDY0OSBSZWFkQW5kVXBkYXRlQnVpbGRCeVVyaTpjYmIxODI2MS1jNDhmLTRhYmItODY1MS04Y2RjYjU0NzQ2NDkvZG90bmV0L3J1bnRpbWUvMjY1OkJ1aWxkL0J1aWxkLzQ3MzkzMiIsImF1aSI6ImRmZWY2ZTMyLTI3YzItNDEzZi1hNWNhLWIyNDllMTZmMjJhOCIsInNpZCI6ImIxODMwZWQ3LWE2MWQtNGU4Ni1hZGU1LTY1ZjI1ZGM5MDZiYSIsIkJ1aWxkSWQiOiJjYmIxODI2MS1jNDhmLTRhYmItODY1MS04Y2RjYjU0NzQ2NDk7NDczOTMyIiwiam9icmVmIjoiMDQxNTFlYjgtMTY4Ny00ODY0LTlhZTctOWI5ZWQ4MDA0MGZkOjFhOTAzYmVjLTU2YzEtNWY2OS03YjU3LWYwYWI5YzFlMjM4ZSIsInBwaWQiOiJ2c3RmczovLy9CdWlsZC9CdWlsZC80NzM5MzIiLCJvcmNoaWQiOiIwNDE1MWViOC0xNjg3LTQ4NjQtOWFlNy05YjllZDgwMDQwZmQuYnVpbGQuYnVpbGRfbGludXhfYXJtNjRfcmVsZWFzZV9uYXRpdmVhb3RfbGlicy5fX2RlZmF1bHQiLCJyZXBvSWRzIjoiIiwiaXNzIjoiYXBwLnZzdG9rZW4udmlzdWFsc3R1ZGlvLmNvbSIsImF1ZCI6ImFwcC52c3Rva2VuLnZpc3VhbHN0dWRpby5jb218dnNvOjZmY2M5MmU1LTczYTctNGY4OC04ZDEzLWQ5MDQ1YjQ1ZmIyNyIsIm5iZiI6MTcwMDIzMjU4OCwiZXhwIjoxNzAwMjUxNzg4fQ.OUYvf3FjAJ6KoM4hsVyLeehlu3cqL6OIY2w8hK4k7pdLADD2aRNPDJNFGbGHa9oeup1Zbhmf8fPoKu4Rcvsq1_2JqjaPlvrogj_SfMUxS3agyLVRu8piL6ObEKV2ED5aAlk3kH48nH_UMNBFPm8XXJxxEsm1DjmAkgJOScK1NteCUJSO-0SY_9dLgleD84oi2_YCAQMBkE3jExI59Xf80qKZkmR9HIjWDVHjgXnKWxlCb8iP3fWr5bsxDzPtuyKHW50QDO89hnHUI1rKsbIQcHWNk0zF41LhYIxzi2kPqglxecBTzjNUBEy06vhN1GjcXIZKZOuOQc8cEUQ_Wx8Iww
2023-11-17T16:30:09.997Z	INFO   	run.py	run(48)	main	Beginning reading of test results.
2023-11-17T16:30:09.998Z	INFO   	run.py	__init__(42)	read_results	Searching '/root/helix/work/workitem/e' for test results files
2023-11-17T16:30:09.998Z	INFO   	run.py	__init__(42)	read_results	Searching '/root/helix/work/workitem/uploads' for test results files
2023-11-17T16:30:09.998Z	WARNING	run.py	__init__(55)	read_results	No results file found in any of the following formats: xunit, junit, trx
2023-11-17T16:30:09.998Z	INFO   	run.py	packing_test_reporter(30)	report_results	Packing 0 test reports to '/root/helix/work/workitem/e/__test_report.json'
2023-11-17T16:30:09.998Z	INFO   	run.py	packing_test_reporter(33)	report_results	Packed 1812 bytes
+ python /root/helix/work/correlation/gen-debug-dump-docs.py -buildid 473932 -workitem System.Net.HttpListener.Tests -jobid d4beded8-3f54-46c2-b4f3-f693c4dfa1b9 -outdir /root/helix/work/workitem/uploads -templatedir /root/helix/work/correlation -dumpdir /home/helixbot/dotnetbuild/dumps -productver 9.0.0
gen-debug-dump-docs.py: read file: /root/helix/work/correlation/debug-dump-template.md
gen-debug-dump-docs.py: writing output file: /root/helix/work/workitem/uploads/how-to-debug-dump.md
gen-debug-dump-docs.py: done writing debug dump information
+ exit 134
+ export _commandExitCode=134
+ chmod -R 777 /home/helixbot/dotnetbuild/dumps
+ exit 134

[END EXECUTION]
Exit Code:134
----- end Fri Nov 17 16:32:15 UTC 2023 ----- exit code 139 ----------------------------------------------------------
./RunTests.sh: line 174:    23 Segmentation fault      (core dumped) ./System.Linq.Expressions.Tests -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing -xml testResults.xml $RSP_FILE
/root/helix/work/workitem/e
----- end Fri Nov 17 16:32:15 UTC 2023 ----- exit code 139 ----------------------------------------------------------
exit code 139 means SIGSEGV Illegal memory access. Deref invalid pointer, overrunning buffer, stack overflow etc. Core dumped.
ulimit -c value: unlimited
cat /proc/sys/kernel/core_pattern: /home/helixbot/dotnetbuild/dumps/core.%u.%p
cat /proc/sys/kernel/core_uses_pid: 1
cat: /proc/sys/kernel/coredump_filter: No such file or directory
cat /proc/sys/kernel/coredump_filter:
Looking around for any Linux dumps...
Looking for files matching core.* ...
dmesg: klogctl: Operation not permitted
The '__IsXUnitLogCheckerSupported' env var is not set.
+ export '_commandExitCode=139'
+ python /root/helix/work/correlation/reporter/run.py https://dev.azure.com/dnceng-public/ public 10820498 eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsIng1dCI6Im9PdmN6NU1fN3AtSGpJS2xGWHo5M3VfVjBabyJ9.eyJuYW1laWQiOiJjNzczZjJjMi01MTIwLTQyMDctYWZlMi1hZmFmMzVhOGJjMGEiLCJzY3AiOiJMb2NhdGlvblNlcnZpY2UuQ29ubmVjdCBQaXBlbGluZUNhY2hlLlJlYWRXcml0ZVJvb3RBY2Nlc3MgUmVhZEFuZFB1Ymxpc2hUZXN0OmNiYjE4MjYxLWM0OGYtNGFiYi04NjUxLThjZGNiNTQ3NDY0OSBSZWFkQW5kVXBkYXRlQnVpbGRCeVVyaTpjYmIxODI2MS1jNDhmLTRhYmItODY1MS04Y2RjYjU0NzQ2NDkvZG90bmV0L3J1bnRpbWUvMjY1OkJ1aWxkL0J1aWxkLzQ3MzkzMiIsImF1aSI6ImQzNWUyZjJkLTkxMzktNDQ0Yi1hMjI0LTQ2YzJmMjBiMjZjMCIsInNpZCI6IjIwMzg5NWZjLTM0NTMtNGIzZC1iOTE0LWY2ZDRiNzQzYTcyNSIsIkJ1aWxkSWQiOiJjYmIxODI2MS1jNDhmLTRhYmItODY1MS04Y2RjYjU0NzQ2NDk7NDczOTMyIiwiam9icmVmIjoiMDQxNTFlYjgtMTY4Ny00ODY0LTlhZTctOWI5ZWQ4MDA0MGZkOjIwZTZmNjczLTBiMGMtNWM4NS0zNzdjLTVkOTQxZDRjOGVlMyIsInBwaWQiOiJ2c3RmczovLy9CdWlsZC9CdWlsZC80NzM5MzIiLCJvcmNoaWQiOiIwNDE1MWViOC0xNjg3LTQ4NjQtOWFlNy05YjllZDgwMDQwZmQuYnVpbGQuYnVpbGRfbGludXhfbXVzbF94NjRfcmVsZWFzZV9uYXRpdmVhb3RfbGlicy5fX2RlZmF1bHQiLCJyZXBvSWRzIjoiIiwiaXNzIjoiYXBwLnZzdG9rZW4udmlzdWFsc3R1ZGlvLmNvbSIsImF1ZCI6ImFwcC52c3Rva2VuLnZpc3VhbHN0dWRpby5jb218dnNvOjZmY2M5MmU1LTczYTctNGY4OC04ZDEzLWQ5MDQ1YjQ1ZmIyNyIsIm5iZiI6MTcwMDIzMjYyMiwiZXhwIjoxNzAwMjUxODIyfQ.M1LiFYfZyXiO5O8or7rXoOXcdeIfYe6lYTDg9sE-eFqlBwAg1H0qBkAcQ2_XTNv_G3k-hMYGfFp6cjaJVDUWOsr7nxloZNaU7-0jW-1pPS4cfack8gUoEWXzt--f6Xcuhdgoq_4oaZ3hDQvJnISFb4cr5DpGByjktMw9u3Q96LDpl3uecjPF6wyP_ciG2Vc9hv4TsFLQtyZGSkx-ntvEyQyN_qeXVwwUr3PHrwHjPQSXTmlCXJspRNGQLoeQ7lKQ_sYo-2FB_5nXhO_6GcC3SxTsYoIYZRYOJiClmdItag6MZMpJD56rT7TjkXl1de8VvuftInbiCDVS1OHs5NfuMA
2023-11-17T16:32:16.370Z	INFO   	run.py	run(48)	main	Beginning reading of test results.
2023-11-17T16:32:16.371Z	INFO   	run.py	__init__(42)	read_results	Searching '/root/helix/work/workitem/e' for test results files
2023-11-17T16:32:16.371Z	INFO   	run.py	__init__(42)	read_results	Searching '/root/helix/work/workitem/uploads' for test results files
2023-11-17T16:32:16.371Z	WARNING	run.py	__init__(55)	read_results	No results file found in any of the following formats: xunit, junit, trx
2023-11-17T16:32:16.372Z	INFO   	run.py	packing_test_reporter(30)	report_results	Packing 0 test reports to '/root/helix/work/workitem/e/__test_report.json'
2023-11-17T16:32:16.372Z	INFO   	run.py	packing_test_reporter(33)	report_results	Packed 1816 bytes
+ python /root/helix/work/correlation/gen-debug-dump-docs.py -buildid 473932 -workitem System.Linq.Expressions.Tests -jobid 08d04944-d50b-4b38-b20f-46778845c041 -outdir /root/helix/work/workitem/uploads -templatedir /root/helix/work/correlation -dumpdir /home/helixbot/dotnetbuild/dumps -productver 9.0.0
gen-debug-dump-docs.py: read file: /root/helix/work/correlation/debug-dump-template.md
gen-debug-dump-docs.py: writing output file: /root/helix/work/workitem/uploads/how-to-debug-dump.md
gen-debug-dump-docs.py: done writing debug dump information
+ exit 139
+ export '_commandExitCode=139'
+ chmod -R 777 /home/helixbot/dotnetbuild/dumps
+ exit 139

[END EXECUTION]
Exit Code:139
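
The behavior visible in these logs (a message that the variable is not set, no XUnitLogChecker run, and the test's own exit code returned) could be sketched roughly as follows. Only the `__IsXUnitLogCheckerSupported` name and the "not set" message come from the logs; the function name, the value `1` meaning "supported", and the other messages are assumptions for illustration.

```shell
# Hypothetical tail of a runner script. Only the env var name and the
# "not set" message are taken from the logs above; everything else is assumed.
run_log_checker_if_supported() {
  local test_exit_code="$1"
  if [ -z "${__IsXUnitLogCheckerSupported:-}" ]; then
    echo "The '__IsXUnitLogCheckerSupported' env var is not set."
  elif [ "$__IsXUnitLogCheckerSupported" = "1" ]; then
    # Placeholder: the real tool invocation is not shown in this thread.
    echo "Running XUnitLogChecker (placeholder)."
  else
    echo "XUnitLogChecker is not supported for this test run."
  fi
  # Always hand back the exit code of the test run itself.
  return "$test_exit_code"
}

# Usage: simulate a NativeAOT run where the test process aborted (exit code 134).
unset __IsXUnitLogCheckerSupported
code=0
run_log_checker_if_supported 134 || code=$?
echo "exit code: $code"
```

Returning the original exit code keeps the Helix result tied to the test failure (SIGABRT or SIGSEGV here) and not to anything the log checker does.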

Member

@agocke agocke left a comment


This seems likely to break user workflows, to me. Now test scripts have to be passed an additional flag, where they didn't before.

I don't think it's a good idea to introduce a new system to pass extra information about the product to the test. We already do this via customizing the RunScript today. https://github.com/dotnet/runtime/compare/main...agocke:runtime:xunitlogchecker?expand=1 shows an example of how we could encode the environment variable into the script. I think that's simpler and more reliable than the additions in this PR.

@ghost ghost added the needs-author-action An issue or pull request that requires more info or actions from the author. label Nov 18, 2023
@MichalStrehovsky
Member

Is the issue only that we don't have a workable dotnet.exe?

Instead of piping through a flag, could we detect this when we try to run it and emit a message to the log saying that we tried but couldn't run XUnitLogChecker?
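
A rough sketch of what such run-time detection might look like in a shell runner script is below. The function name, the tool path argument, and the message wording are all hypothetical; the only real command relied on is `dotnet <path-to-dll>`, which runs a framework-dependent app.

```shell
# Hypothetical sketch: detect at run time whether the checker can run at all.
# The path passed in and the messages printed are placeholders.
try_run_log_checker() {
  local checker_path="$1"
  if ! command -v dotnet >/dev/null 2>&1; then
    echo "Tried to run XUnitLogChecker, but no dotnet host was found; skipping."
    return 0
  fi
  if [ ! -f "$checker_path" ]; then
    echo "Tried to run XUnitLogChecker, but '$checker_path' does not exist; skipping."
    return 0
  fi
  dotnet "$checker_path"
}

# Usage with a path that is assumed not to exist.
try_run_log_checker "/nonexistent/XUnitLogChecker.dll"
```

The trade-off of this approach is that a genuinely missing tool on a platform that should have it would be logged as skipped and could go unnoticed.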

- Avoid using yml extraHelixArguments.
- Add the new embedded SetCommands optional section in runner scripts.
- Set __IsXUnitLogCheckerSupported in tests.targets as SetScriptCommand items instead of directly in sendtohelixhelp.proj.
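
As an illustration of the idea in this commit description, the sketch below embeds optional set commands into a runner script template at generation time, so no extra flag has to be passed when the script runs. The `[[SetCommands]]` placeholder spelling, the helper name, and the template contents are assumptions for the example, not the actual template or targets from the repository.

```shell
# Hypothetical sketch of embedding an optional SetCommands section into a
# runner script template. Reads the template on stdin, writes the script to stdout.
render_runner_script() {
  local set_commands="$1"   # newline-separated commands to embed; may be empty
  local line
  while IFS= read -r line; do
    if [ "$line" = "[[SetCommands]]" ]; then
      # Emit the optional section only when there is something to embed.
      if [ -n "$set_commands" ]; then
        printf '%s\n' "$set_commands"
      fi
    else
      printf '%s\n' "$line"
    fi
  done
}

# Usage: a made-up three-line template with the variable embedded.
printf '%s\n' 'echo ----- start -----' '[[SetCommands]]' 'echo ----- run tests -----' \
  | render_runner_script 'export __IsXUnitLogCheckerSupported=1'
```

When the set commands are empty, as they would be for a NativeAOT run in this sketch, the generated script simply never defines the variable, which matches the "env var is not set" message seen in the earlier logs.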
@ghost ghost removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Nov 28, 2023
@carlossanlop
Copy link
Member Author

I added a new commit based on the feedback by @agocke. I rebased and force pushed because there were conflicts.
@agocke can you please let me know if you agree with my latest commit?

Member

@agocke agocke left a comment


This looks directionally right to me, I think there may just be some parameter confusion around the xunitwrapper binaries.

And no YAML changes! Great!

…hether it was built or not has already been decided before, and the runner scripts know when to execute it.
@carlossanlop
Member Author

/azp run runtime-nativeaot-outerloop


Azure Pipelines successfully started running 1 pipeline(s).

@carlossanlop
Member Author

carlossanlop commented Nov 29, 2023

I manually triggered runtime-nativeaot-outerloop against my latest commit and it surfaced a build failure that consistently showed up in all the jobs. I don't think my change caused it. Unfortunately, the failure is preventing me from validating my changes, since a failure would show up after building. @dotnet/ilc-contrib any ideas on what the root cause could be?

Oh and I opened an issue to see if it could catch other hits besides the ones in my PR: #95367

@MichalStrehovsky
Member

> I manually triggered runtime-nativeaot-outerloop against my latest commit and it surfaced a build failure that consistently showed up in all the jobs. I don't think my change caused it. Unfortunately, the failure is preventing me from validating my changes, since a failure would show up after building. @dotnet/ilc-contrib any ideas on what the root cause could be?
>
> Oh and I opened an issue to see if it could catch other hits besides the ones in my PR: #95367

Thanks for reporting it! It's a JIT bug - I think I have a fix in #95383, but it's in the JIT codebase and I'm not a JIT dev so... we'll see.

@MichalStrehovsky
Member

/azp run runtime-nativeaot-outerloop


Azure Pipelines successfully started running 1 pipeline(s).

@carlossanlop
Member Author

Thanks @MichalStrehovsky for the quick fix!

The newly triggered run did not show the build error and was able to execute tests. The failures are unrelated to this PR and prove that my change worked and is not blocking nativeaot outerloop anymore:

@carlossanlop carlossanlop merged commit 35b866c into dotnet:main Nov 29, 2023
199 of 205 checks passed
@carlossanlop carlossanlop deleted the LibrariesXUnitLogChecker2 branch November 29, 2023 17:59
@github-actions github-actions bot locked and limited conversation to collaborators Dec 30, 2023

5 participants