[mono][infra] Disable failing apple mobile tests #92108

Merged
12 commits merged into dotnet:main on Sep 25, 2023

Conversation

kotlarmilos
Member

@kotlarmilos kotlarmilos commented Sep 15, 2023

Description

This PR aims to disable failing tests on the CI. Tracking issues are added for disabled tests.

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label label Sep 15, 2023
@kotlarmilos kotlarmilos self-assigned this Sep 15, 2023
@kotlarmilos kotlarmilos added area-Infrastructure-mono os-ios and removed needs-area-label labels Sep 15, 2023
@kotlarmilos kotlarmilos added this to the 9.0.0 milestone Sep 15, 2023
@ghost

ghost commented Sep 15, 2023

Tagging subscribers to this area: @directhex
See info in area-owners.md if you want to be subscribed.

Issue Details

Work in progress.

This PR aims to disable failing tests on the CI.

Author: kotlarmilos
Assignees: kotlarmilos
Labels:

area-Infrastructure-mono, os-ios, needs-area-label

Milestone: -

@kotlarmilos
Member Author

/azp run runtime-ioslike,runtime-ioslikesimulator

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@kotlarmilos
Member Author

/azp run runtime-ioslike,runtime-ioslikesimulator

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@kotlarmilos
Member Author

/azp run runtime-ioslikesimulator

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@kotlarmilos
Member Author

/azp run runtime-ioslikesimulator

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@ivanpovazan
Member

This PR will also address: #92129
Thanks @kotlarmilos!

@kotlarmilos
Member Author

/azp run runtime-ioslikesimulator

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@ivanpovazan
Member

Should we also consider #90460 to be covered here, as we are currently unable to reproduce the failure locally?

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@kotlarmilos
Member Author

Should we also consider #90460 to be covered here, as we are currently unable to reproduce the failure locally?

Yes, absolutely. The test is disabled and the tracking issue is added.

@kotlarmilos
Member Author

/azp run runtime-ioslikesimulator

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@kotlarmilos
Member Author

/azp run runtime-ioslikesimulator

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@kotlarmilos
Member Author

/azp run runtime-ioslikesimulator

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@kotlarmilos
Member Author

/azp run runtime-ioslikesimulator

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@kotlarmilos kotlarmilos marked this pull request as ready for review September 20, 2023 14:52
@kotlarmilos
Member Author

With more runtime tests enabled, the timeout for simulators went from 00:30:00 to 03:00:00.

@@ -37,8 +37,9 @@
'$(Scenario)' == 'gcstress0xc_jitstress1' or
'$(Scenario)' == 'gcstress0xc_jitstress2' or
'$(Scenario)' == 'gcstress0xc_jitminopts_heapverify1'">06:00:00</_workItemTimeout>
<_workItemTimeout Condition="'$(_workItemTimeout)' == '' and ('$(TargetOS)' == 'iossimulator' or '$(TargetOS)' == 'tvossimulator' or '$(TargetOS)' == 'maccatalyst' or '$(TargetOS)' == 'android')">00:30:00</_workItemTimeout>
<_workItemTimeout Condition="'$(_workItemTimeout)' == '' and ('$(TargetOS)' == 'ios' or '$(TargetOS)' == 'tvos')">00:45:00</_workItemTimeout>
<_workItemTimeout Condition="'$(_workItemTimeout)' == '' and ('$(TargetOS)' == 'iossimulator' or '$(TargetOS)' == 'tvossimulator' or '$(TargetOS)' == 'maccatalyst')">03:00:00</_workItemTimeout>
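
For a sense of scale, here is a minimal Python sketch (not part of the PR) that parses the `hh:mm:ss` values quoted in the hunk above and compares them. `parse_timeout` is a hypothetical helper introduced only for illustration.

```python
from datetime import timedelta


def parse_timeout(value: str) -> timedelta:
    """Parse an hh:mm:ss work item timeout such as '03:00:00'."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


# Values taken from the simulator timeout lines in the hunk above.
old = parse_timeout("00:30:00")
new = parse_timeout("03:00:00")

# Dividing two timedeltas yields a plain float ratio.
print(f"old={old}, new={new}, factor={new / old:.0f}x")
```

The ratio shows the simulator timeout growing sixfold, which is the concern raised in the review comments that follow.
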
Member

I don't know how I feel about this change. This has the potential to back things up considerably should the work item go on and on.

Member Author

It seems that simulators are too slow. Since there is coverage on devices, we may reduce the testing scope on the simulators, for example by disabling the JIT and SIMD subsets.

Member

3 hours seems like too much to me. Is it truly that slow?

Member

Where does it spend the most time: building or execution?

Member

@ivanpovazan ivanpovazan Sep 20, 2023

This is tvossimulator-x64 Release AllSubsets_Mono_RuntimeTests:

Screenshot 2023-09-20 at 18 40 03

During the cancelled Send to Helix job, JIT_Intrinsics alone takes up ~24 minutes, according to:
https://helix.dot.net/api/jobs/5ff7ae64-ef29-4e8c-88dd-3c3a7481ed6c/workitems/JIT_Intrinsics?api-version=2019-06-17

Member

@rolfbjarne do you also experience long execution times on ios-* simulators on xamarin's CI?

Usually just a few minutes, and it's not changed recently. Then again, we don't have that many tests either.

I find it weird that the simulator is so much slower than device though for you (assuming you run the same set of tests for both), in our experience simulator has always been the faster of the two.

Member Author

@kotlarmilos kotlarmilos Sep 21, 2023

I think the test execution on simulators is not identical to that on devices. For example, for tracing/eventpipe/buffersize the test duration is almost the same. However, the simulator log shows that the simulator is killed after each test, which may stall the execution:

info: Application has finished with exit code: 100 (as expected)
info: Cleaning up simulator 'iPhone X (iOS 15.0) - created by XHarness'
dbug: 
dbug: Running launchctl remove com.apple.CoreSimulator.CoreSimulatorService
dbug: Process simctl exited with 137
dbug: Process launchctl exited with 0
dbug: 
dbug: Running killall -9 "iPhone Simulator" "iOS Simulator" Simulator "Simulator (Watch)" com.apple.CoreSimulator.CoreSimulatorService ibtoold
dbug: Process killall exited with 0
info: Simulators cleaned up
dbug: Saving diagnostics data to '/tmp/helix/working/BA6B0A0B/w/9E9B08FA/e/diagnostics.json'
XHarness exit code: 0
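
To make the cleanup steps in a log like this easier to scan, here is a minimal Python sketch that pulls out the `Process ... exited with ...` lines. It assumes the line format shown above; `process_exit_codes` and the regular expression are hypothetical and not part of XHarness.

```python
import re

# Matches lines such as "dbug: Process simctl exited with 137".
EXIT_LINE = re.compile(r"^dbug: Process (?P<name>\S+) exited with (?P<code>\d+)$")


def process_exit_codes(log_text: str) -> list[tuple[str, int]]:
    """Return (process name, exit code) pairs in the order they appear."""
    results = []
    for line in log_text.splitlines():
        match = EXIT_LINE.match(line.strip())
        if match:
            results.append((match.group("name"), int(match.group("code"))))
    return results


# Excerpt copied from the cleanup log above.
SAMPLE = """\
info: Cleaning up simulator 'iPhone X (iOS 15.0) - created by XHarness'
dbug: Running launchctl remove com.apple.CoreSimulator.CoreSimulatorService
dbug: Process simctl exited with 137
dbug: Process launchctl exited with 0
dbug: Process killall exited with 0
info: Simulators cleaned up
"""

print(process_exit_codes(SAMPLE))
```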

Member Author

@kotlarmilos kotlarmilos Sep 21, 2023

According to the xharness source, this is expected:
https://github.com/dotnet/xharness/blob/a3a749a7056623c665bba226fe843152f413f044/src/Microsoft.DotNet.XHarness.Apple/Orchestration/BaseOrchestrator.cs#L476-L496

I measured the interval between two tracing/eventpipe tests. Assuming both test executions are approximately the same, it takes about 22s on a device and about 64s on a simulator, which is almost 3x slower.
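
The effect of that per-test gap on a whole work item can be sketched with a bit of arithmetic. The two intervals are the measurements quoted above; the test count of 30 is purely an assumption for illustration, and `estimated_minutes` is a hypothetical helper.

```python
DEVICE_SECONDS = 22     # measured interval between two tests on a device
SIMULATOR_SECONDS = 64  # measured interval between two tests on a simulator


def estimated_minutes(test_count: int, seconds_per_test: float) -> float:
    """Estimate the wall-clock minutes for a work item with test_count tests."""
    return test_count * seconds_per_test / 60


ratio = SIMULATOR_SECONDS / DEVICE_SECONDS
overhead = SIMULATOR_SECONDS - DEVICE_SECONDS
print(f"slowdown: {ratio:.1f}x, extra per test: {overhead}s")

# Hypothetical work item with 30 tests (the count is an assumption).
for label, seconds in (("device", DEVICE_SECONDS), ("simulator", SIMULATOR_SECONDS)):
    print(f"{label}: {estimated_minutes(30, seconds):.0f} min")
```

Under these assumed numbers the device estimate is 11 minutes while the simulator estimate is 32 minutes, which would already exceed a 00:30:00 work item timeout.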

At the start of the test execution, xharness tries to shut down the simulator:

info: Looking for available ios-simulator-64 simulators..
dbug: Looking for available ios-simulator-64 simulators. Storing logs into list-ios-simulator-64-20230920_033625.log
info: Found simulator device 'iPhone X (iOS 15.0) - created by XHarness'
info: Getting app bundle information from '/tmp/helix/working/BA6B0A0B/w/9E9B08FA/e/tracing_eventpipe.app'..
dbug: 
dbug: Running /usr/libexec/PlistBuddy -c "Print CFBundleName" /tmp/helix/working/BA6B0A0B/w/9E9B08FA/e/tracing_eventpipe.app/Info.plist
dbug: Process PlistBuddy exited with 0
dbug: 
dbug: Running /usr/libexec/PlistBuddy -c "Print CFBundleIdentifier" /tmp/helix/working/BA6B0A0B/w/9E9B08FA/e/tracing_eventpipe.app/Info.plist
dbug: Process PlistBuddy exited with 0
dbug: 
dbug: Running /usr/libexec/PlistBuddy -c "Print UIRequiredDeviceCapabilities" /tmp/helix/working/BA6B0A0B/w/9E9B08FA/e/tracing_eventpipe.app/Info.plist
dbug: Process PlistBuddy exited with 1
dbug: Property UIRequiredDeviceCapabilities not present in Info.plist, assuming 32-bit is not supported
dbug: 
dbug: Running /usr/libexec/PlistBuddy -c "Print CFBundleExecutable" /tmp/helix/working/BA6B0A0B/w/9E9B08FA/e/tracing_eventpipe.app/Info.plist
dbug: Process PlistBuddy exited with 0
info: Reseting simulator 'iPhone X (iOS 15.0) - created by XHarness'
dbug: 
dbug: Running launchctl remove com.apple.CoreSimulator.CoreSimulatorService
dbug: Process launchctl exited with 0
dbug: 
dbug: Running killall -9 "iPhone Simulator" "iOS Simulator" Simulator "Simulator (Watch)" com.apple.CoreSimulator.CoreSimulatorService ibtoold
dbug: No matching processes belonging to you were found
dbug: Process killall exited with 1
dbug: 
dbug: Running /Applications/Xcode131.app/Contents/Developer/usr/bin/simctl shutdown A912189F-5CF6-4921-B4EA-DDCD1ED23F10
dbug: An error was encountered processing the command (domain=com.apple.CoreSimulator.SimError, code=405):
dbug: Unable to shutdown device in current state: Shutdown
dbug: Process simctl exited with 149
dbug: 

After that, it restarts the simulator:

dbug: Running /Applications/Xcode131.app/Contents/Developer/usr/bin/simctl shutdown A912189F-5CF6-4921-B4EA-DDCD1ED23F10
dbug: An error was encountered processing the command (domain=com.apple.CoreSimulator.SimError, code=405):
dbug: Unable to shutdown device in current state: Shutdown
dbug: Process simctl exited with 149
info: Simulator reset finished

There is also a step at the start of the test execution that looks like it creates a new simulator:

dbug: Xamarin.Hosting: Booting iPhone X (iOS 15.0) - created by XHarness...
dbug: Xamarin.Hosting: Booted iPhone X (iOS 15.0) - created by XHarness successfully.

Member

The simulator runs get unstable after running for a while. The shutdown / restart is meant to protect against that.

Member Author

It seems that xcrun simctl shutdown and xcrun simctl boot are time-consuming. Without the --reset-simulator parameter, the tests take the same amount of time as they do for devices.

I suggest disabling it for the runtime tests and monitoring the CI.
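
One way to check such a claim locally is to time the individual commands. The following is a minimal Python sketch, not taken from the PR or from XHarness: `time_command` is a hypothetical helper, and the command it runs here is a stand-in so that the sketch works on any machine.

```python
import subprocess
import sys
import time


def time_command(argv: list[str]) -> tuple[int, float]:
    """Run argv and return (exit code, elapsed seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(argv, capture_output=True)
    return completed.returncode, time.perf_counter() - start


# Stand-in command so the sketch runs anywhere. On a macOS host one could pass
# ["xcrun", "simctl", "shutdown", "<udid>"] instead (<udid> is a placeholder).
code, elapsed = time_command([sys.executable, "-c", "pass"])
print(f"exit code {code}, {elapsed:.2f}s")
```

Timing the shutdown and boot commands separately, with and without a simulator reset, would show how much of the per-test gap each step accounts for.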

@kotlarmilos
Member Author

/azp run runtime-ioslikesimulator

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@kotlarmilos
Member Author

@ivanpovazan @steveisok Please take a look again.

@@ -3699,6 +3699,12 @@
<ExcludeList Include = "$(XunitTestBinBase)/baseservices/exceptions/unhandled/**">
<Issue>System.Diagnostics.Process is not supported</Issue>
</ExcludeList>
<ExcludeList Include = "$(XunitTestBinBase)/JIT/SIMD/Vector3Interop_r/**">
<Issue>https://github.com/dotnet/runtime/issues/92129</Issue>
Member

@ivanpovazan ivanpovazan Sep 25, 2023

Once this PR lands we should remove blocking CI labels.

@@ -5,6 +5,8 @@
<AllowUnsafeBlocks>true</AllowUnsafeBlocks>
<InvariantGlobalization>true</InvariantGlobalization>
<CLRTestTargetUnsupported Condition="'$(IlcMultiModule)' == 'true'">true</CLRTestTargetUnsupported>
<!-- Tracking issue: https://github.com/dotnet/runtime/issues/90460 -->
Member

Same as above.

Member

@ivanpovazan ivanpovazan left a comment

LGTM

@kotlarmilos kotlarmilos merged commit 65195ff into dotnet:main Sep 25, 2023
179 of 182 checks passed
@ghost ghost locked as resolved and limited conversation to collaborators Oct 25, 2023