Unstable duration of tests running in ADO #12392

sandreenko · 2019-04-01T17:56:38Z

We hit timeouts with GCStress on tests like baseservices/threading/regressions/beta2/437017 or JIT/HardwareIntrinsics/X86/Avx2/Avx2_ro from time to time. The timeout for GCStress is set to 1 hour per test. Kusto results show that sometimes these tests execution takes 20 minutes, sometimes 40, some fail with 1 hour timeout.

Duratation for beta2_437044_437044_sh Ubuntu.1604 gcstress0x3:

807.2075189
3600.1316769
1395.8818172
3558.8521005
794.5199284
778.0971284
1800.0648784
984.1676776
1586.0055026
1431.6268684
1117.4029117

The similar picture is for other tests, for example beta2_437044 on x64 WIndows.10 minopts:

So max is 35, min is 3, that doesn't look right.

(queue for this:

database('engineeringdata').TestResults
| where Method contains 'beta2_437044' 
| join kind=inner (database('engineeringdata').Jobs) on JobId
| where QueueName contains "Windows.10" and Properties contains "minopts" and Properties contains "x64"

`)

We have seen a similar problem with Jenkins but it was because we used XUnit ParallelRun=all and it was creating more threads than CPU cores and it was fixed by changing it to XUnit ParallelRun=collections (#22746). With ADO our understanding was that we run only one assembly on one machine so there were no difference between all and collections.

The text was updated successfully, but these errors were encountered:

sandreenko · 2019-04-01T17:57:04Z

cc @echesakov @dotnet/jit-contrib

echesakov · 2019-04-01T20:46:35Z

@sandreenko Just FYI, in Helix we are also using -parallel collections
https://github.com/dotnet/coreclr/blob/d686be5513cd09426704eb576858411d1e736d0d/tests/helixpublishwitharcade.proj#L96

sandreenko · 2019-04-04T22:12:30Z

I have checked some simple tests without threads/GC and they show the same instability. I think this should be in 3.0.

echesakov · 2019-04-09T00:33:36Z

regressions\beta2\437017 frequently times out on Ubuntu.1404.Arm32
This query

TestResults
| where Method == '_regressions_beta2_437017_437017_sh' and Result == 'Fail' and Duration > 600
| join kind=inner (Jobs) on JobId
| extend Configuration=extract('"configuration":"([^"]+)"', 1, Properties)

shows 24 records. 23 of them are Checked-no_tiered_compilation and one is Checked-jitstress1

CarolEidt · 2019-04-18T23:13:20Z

_tailcall_Desktop__il_relthread_race__il_relthread_race_sh frequently times out in multiple configurations (Checked, Checked-no_tiered_compilation, Checked-gcstress0xc, and Checked-jitstress2_jitstressregs2

sandreenko · 2019-05-18T00:02:04Z

beta2_437044 is just an unstable threading test. I will look at other failures soon.

sandreenko · 2019-05-22T20:31:37Z

More data about tests\src\JIT\Methodical\tailcall\Desktop_il_relthread-race
All numbers are in seconds for the default mode (Checked without complus vars):

arm32 Linux shows min = 2.2, max = 7, factor 3;
x64 Linux shows min = 1.1, max = 28, factor 25;
arm64 Linux shows min = 1.7, max = 600(timeout), factor 350.

x64 Windows min = 2, max = 256, factor 128;
x86 Windows min = 2, max = 350, factor 175;

x64 OSX min = 2, max = 32, factor 16;

the test uses 4 threads and one mutex.

sandreenko · 2019-05-22T21:39:37Z

If you ask Kusto a query like this:

database('engineeringdata').Jobs | where Source contains "ci/dotnet/coreclr" | project JobId, Properties
| join hint.strategy = broadcast (database('engineeringdata').TestResults 
| where Result == 'Pass' and Traits contains "TestGroup=JIT.jit64" // choose any group that you like
| project JobId, Method, Duration) on JobId
| project Method, Properties, Duration 
| summarize min(Duration), max(Duration), max_min_factor = max(Duration)/min(Duration), count() by Method, Properties
| order by max_min_factor
| project Method, max_min_factor, Properties, min_Duration, max_Duration, count_
| limit 100

it will show you data like this:

Method	max_min_factor	Properties	min_Duration	max_Duration	count_
_regress_ddb_87766_ddb87766_ddb87766_sh	1714.98374634029	{"architecture":"x64","operatingSystem":"Ubuntu.1804.Amd64.Open","configuration":"Checked"}	0.0066262	11.3638253	167
_opt_cse_HugeField2_HugeField2_sh	602.574396857634	{"operatingSystem":"Ubuntu.1804.Amd64.Open","configuration":"Checked-gcstress0x3","architecture":"x64"}	0.0089614	5.3999102	3
_regress_ddb_87766_ddb87766_ddb87766_sh	499.630942461785	{"operatingSystem":"Debian.9.Amd64.Open","configuration":"Checked-jitminopts","architecture":"x64"}	0.0128419	6.4162106	6
_opt_cse_HugeField2_HugeField2_sh	483.194188782788	{"architecture":"x64","configuration":"Checked-gcstress0xc","operatingSystem":"Ubuntu.1604.Amd64.Open"}	0.0110063	5.3181802	2
_opt_cse_HugeArray_HugeArray_sh	338.851130223	{"architecture":"x64","configuration":"Checked-gcstress0xc_jitstress2","operatingSystem":"Ubuntu.1804.Amd64.Open"}	0.0144529	4.8973815	2
_opt_cse_hugeexpr1_hugeexpr1_cmd	283.013152020376	{"operatingSystem":"Windows.10.Amd64.Open","architecture":"x86","configuration":"Checked-jitminopts"}	0.0546532	15.4675744	12
_opt_cse_HugeField2_HugeField2_sh	268.162398105807	{"operatingSystem":"Debian.9.Amd64.Open","architecture":"x64","configuration":"Checked-gcstress0xc"}	0.0110654	2.9673242	4
_regress_ddb_87766_ddb87766_ddb87766_sh	241.511345881109	{"operatingSystem":"Debian.9.Amd64.Open","architecture":"x64","configuration":"Checked-jitstress2"}	0.0083202	2.0094227	13

that means that we have a problem not with a particular set of tests or platform or test mode but it is a general issue that affects the whole testing.

briansull · 2019-05-22T21:56:18Z

That query indicates that a very few set of "huge tests" have widely varying executions times.
Which can happen when an O/S has to resort to paging the virtual memory out to disk.

_regress_ddb
_opt_cse_HugeField2
_opt_cse_HugeArray
_opt_cse_hugeexpr1

xiangzhai · 2020-02-12T07:11:14Z

:mips-interest

sandreenko · 2020-06-05T23:22:21Z

I am going to reenable the disabled tests and see if the issue still reproduces. There were many infrastructure changes since it was open, so it could have been fixed already.

In the past, I was trying to make a change to allocate 2 cores per running test in helix but it did not work well (https://github.com/dotnet/coreclr/pull/24749/files). On some machines I was getting an incorrect number of cores (like 2 cores VM on 64 cores physical arm machine was giving me 32, instead of 1), on others, it still was hitting timeouts, but less frequently, because the tests that hit them did not allocate 2 cores, they were allocating dozen each. The big downside of that change was that on average a test run was taking longer (like ~20% slower).

sandreenko · 2020-06-05T23:31:46Z

As I see from the recent GCStress runs it is still an issue: JIT/HardwareIntrinsics/Arm/AdvSimd/AdvSimd_r/AdvSimd_r.sh
and similar that were not disabled are still failing with timeouts (example)

echesakov · 2020-06-05T23:36:36Z

As I see from the recent GCStress runs it is still an issue: JIT/HardwareIntrinsics/Arm/AdvSimd/AdvSimd_r/AdvSimd_r.sh

@sandreenko I am working on splitting those up #37282

sandreenko · 2020-06-05T23:57:42Z

@sandreenko I am working on splitting those up #37282

Could you please try to reenabled tests from https://github.com/dotnet/runtime/pull/37530/files and close this issue in your PR?

echesakov · 2020-06-06T00:13:45Z

Could you please try to reenabled tests from https://github.com/dotnet/runtime/pull/37530/files and close this issue in your PR?

Sure

jashook · 2020-07-20T18:18:37Z

Addressed

sandreenko self-assigned this May 13, 2019

msftgits transferred this issue from dotnet/coreclr Jan 31, 2020

msftgits added this to the 5.0 milestone Jan 31, 2020

sandreenko mentioned this issue Jun 5, 2020

Reenable tests that were disabled because of unstable timeouts. #37530

Closed

jashook closed this as completed Jul 20, 2020

ghost locked as resolved and limited conversation to collaborators Dec 13, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unstable duration of tests running in ADO #12392

Unstable duration of tests running in ADO #12392

sandreenko commented Apr 1, 2019

sandreenko commented Apr 1, 2019

echesakov commented Apr 1, 2019

sandreenko commented Apr 4, 2019

echesakov commented Apr 9, 2019

CarolEidt commented Apr 18, 2019

sandreenko commented May 18, 2019

sandreenko commented May 22, 2019

sandreenko commented May 22, 2019

briansull commented May 22, 2019

xiangzhai commented Feb 12, 2020

sandreenko commented Jun 5, 2020

sandreenko commented Jun 5, 2020

echesakov commented Jun 5, 2020

sandreenko commented Jun 5, 2020

echesakov commented Jun 6, 2020

jashook commented Jul 20, 2020

Unstable duration of tests running in ADO #12392

Unstable duration of tests running in ADO #12392

Comments

sandreenko commented Apr 1, 2019

sandreenko commented Apr 1, 2019

echesakov commented Apr 1, 2019

sandreenko commented Apr 4, 2019

echesakov commented Apr 9, 2019

CarolEidt commented Apr 18, 2019

sandreenko commented May 18, 2019

sandreenko commented May 22, 2019

sandreenko commented May 22, 2019

briansull commented May 22, 2019

xiangzhai commented Feb 12, 2020

sandreenko commented Jun 5, 2020

sandreenko commented Jun 5, 2020

echesakov commented Jun 5, 2020

sandreenko commented Jun 5, 2020

echesakov commented Jun 6, 2020

jashook commented Jul 20, 2020