Long Running Test - System.Diagnostics.Tests.ProcessTests.Kill_ExitedChildProcess_DoesNotThrow #35506

ViktorHofer · 2020-04-27T10:45:43Z

https://dev.azure.com/dnceng/public/_build/results?buildId=618671&view=ms.vss-test-web.build-test-results-tab&runId=19376690&resultId=184383&paneView=attachments

console.a6c9799c.log:
https://helix.dot.net/api/2019-06-17/jobs/5607cdb0-5b11-499c-a476-0d70c6599ef4/workitems/System.Diagnostics.Process.Tests/files/console.a6c9799c.log

Configuration: netcoreapp5.0-Windows_NT-Release-x86-CoreCLR_release-Windows.7.Amd64.Open

C:\h\w\A661091E\w\A8EC08EF\e>"C:\h\w\A661091E\p\dotnet.exe" exec --runtimeconfig System.Diagnostics.Process.Tests.runtimeconfig.json --depsfile System.Diagnostics.Process.Tests.deps.json xunit.console.dll System.Diagnostics.Process.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing  
  Discovering: System.Diagnostics.Process.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Diagnostics.Process.Tests (found 233 of 253 test cases)
  Starting:    System.Diagnostics.Process.Tests (parallel test collections = on, max threads = 2)
    System.Diagnostics.Tests.ProcessStartInfoTests.ShellExecute_Nano_Fails_Start [SKIP]
      Condition(s) not met: "IsWindowsNanoServer"
Invalid number of parameters
0 File(s) copied
   System.Diagnostics.Process.Tests: [Long Running Test] 'System.Diagnostics.Tests.ProcessTests.Kill_ExitedChildProcess_DoesNotThrow', Elapsed: 00:02:10
   System.Diagnostics.Process.Tests: [Long Running Test] 'System.Diagnostics.Tests.ProcessTests.Kill_ExitedChildProcess_DoesNotThrow', Elapsed: 00:04:10
   System.Diagnostics.Process.Tests: [Long Running Test] 'System.Diagnostics.Tests.ProcessTests.Kill_ExitedChildProcess_DoesNotThrow', Elapsed: 00:06:10
   System.Diagnostics.Process.Tests: [Long Running Test] 'System.Diagnostics.Tests.ProcessTests.Kill_ExitedChildProcess_DoesNotThrow', Elapsed: 00:08:10
   System.Diagnostics.Process.Tests: [Long Running Test] 'System.Diagnostics.Tests.ProcessTests.Kill_ExitedChildProcess_DoesNotThrow', Elapsed: 00:10:10
   System.Diagnostics.Process.Tests: [Long Running Test] 'System.Diagnostics.Tests.ProcessTests.Kill_ExitedChildProcess_DoesNotThrow', Elapsed: 00:12:10
   System.Diagnostics.Process.Tests: [Long Running Test] 'System.Diagnostics.Tests.ProcessTests.Kill_ExitedChildProcess_DoesNotThrow', Elapsed: 00:14:10


ghost · 2020-04-27T10:45:46Z

Tagging subscribers to this area: @eiriktsarpalis
Notify danmosemsft if you want to be subscribed.

danmoseley · 2020-04-27T17:50:24Z

It timed out, it wasn't just long running. Not obvious why: we need dump files on hangs dotnet/dnceng#1216

wfurt · 2020-04-28T17:45:45Z

I've seen it on OSX as well. Maybe in this case the test can enforce a reasonable timeout and assert, to cause a core dump.

danmoseley · 2020-04-28T18:47:16Z

Not a bad idea, although the core dump won't help if the issue somehow is that the child process won't exit.

I'll make the change.

wfurt · 2020-04-28T19:01:40Z

I started on this as well, since I'm planning to dump the process list on failure. I'd be happy to stop and let you have some fun, @danmosemsft. I think for now the focus should be to cap test duration and collect useful info on failure.
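A minimal sketch of the idea in this comment: cap how long a test waits for a child process and, on timeout, capture a process list before failing. It is written in Python purely for illustration and is not the change that was made for this issue. The helper names `list_processes` and `wait_or_fail`, and the choice of `tasklist` / `ps` as the listing commands, are assumptions.

```python
import subprocess
import sys


def list_processes():
    """Best-effort snapshot of running processes (assumed commands: tasklist / ps)."""
    if sys.platform == "win32":
        cmd = ["tasklist"]
    else:
        cmd = ["ps", "-e", "-o", "pid,ppid,stat,comm"]
    try:
        return subprocess.run(cmd, capture_output=True, text=True, timeout=30).stdout
    except (OSError, subprocess.SubprocessError) as exc:
        return f"<process list unavailable: {exc}>"


def wait_or_fail(proc, timeout_seconds):
    """Wait for proc to exit; on timeout, collect diagnostics, kill it, and fail."""
    try:
        return proc.wait(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # Take the snapshot before killing, so the hung child is still listed.
        snapshot = list_processes()
        proc.kill()
        proc.wait()
        raise AssertionError(
            f"child {proc.pid} did not exit within {timeout_seconds}s\n{snapshot}"
        )
```

A test would call `wait_or_fail(child, 60)` instead of an unbounded wait, so a hang turns into a failure with diagnostics attached rather than a run that is eventually killed from outside.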

danmoseley · 2020-04-28T19:22:54Z

Oh you go ahead then since that sounds way better than the simple thing I was going to do.

For the RemoteExec case, we have a pretty good setup right now for timeouts and for gathering info on hangs, active processes etc.
https://github.com/dotnet/arcade/blob/590a102630c7efc7ca6f652f7c6c47dee4c4086c/src/Microsoft.DotNet.RemoteExecutor/src/RemoteInvokeHandle.cs#L139-L219

Do we have a way to trigger this when we're not in a RemoteExec context? It would be super handy to have a general mechanism. I can think of all kinds of ways we could extend it. (I hope we can avoid 2+ implementations of it)

danmoseley · 2020-04-28T19:26:31Z

And ultimately, we need to find a way to hook into xunit so that it triggers for hangs in arbitrary tests, without special effort in each test.

wfurt · 2020-04-28T19:30:40Z

I think the in-process case may be tricky. We may be able to mark the test as failed, but I don't know if there is a reliable and safe way to terminate a running function.

danmoseley · 2020-04-28T19:40:31Z

@wfurt one way would be to spawn the RemoteExecutor with a special flag "make a dump of my process". It would run the same code I linked above but against the PID provided. When it returned, the test could either throw XUnitException to fail itself and continue, or terminate the process.
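The shape of that suggestion can be sketched as follows, in Python for illustration only. This is not RemoteExecutor's API: `HELPER_SOURCE` and `request_diagnostics` are hypothetical names, and the helper is a stand-in that only reports the PID it was given, where a real one would capture a dump of that process.

```python
import os
import subprocess
import sys

# Hypothetical helper program. A real helper would create a dump of the
# target process; this stand-in only echoes the PID it was asked about.
HELPER_SOURCE = """
import sys
pid = int(sys.argv[1])
print(f"collected diagnostics for pid {pid}")
"""


def request_diagnostics(target_pid, timeout_seconds=60):
    """Spawn a separate helper process and hand it the PID to inspect."""
    result = subprocess.run(
        [sys.executable, "-c", HELPER_SOURCE, str(target_pid)],
        capture_output=True,
        text=True,
        timeout=timeout_seconds,
    )
    return result.returncode, result.stdout.strip()


def fail_hung_test():
    """Called by a test that detected a hang: gather data, then fail itself."""
    returncode, report = request_diagnostics(os.getpid())
    raise AssertionError(f"test hung; helper exited {returncode}: {report}")
```

Doing the collection in a separate process means the hung process never has to inspect itself. Once the helper returns, the test is free either to fail itself and continue or to terminate the process.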

danmoseley · 2020-04-28T19:40:47Z

cc @stephentoub in case he has another suggestion.

stephentoub · 2020-04-28T20:05:38Z

Doesn't the vstest infrastructure we're about to switch to make it straightforward to get dumps, or am I misremembering?

ViktorHofer · 2020-04-28T20:07:39Z

Yes, VSTest can be configured to kill the testhost after a specified timeout and will then collect the dump. @nohwnd has the details and can talk about cross-plat support of the dump collection.

danmoseley · 2020-04-28T20:11:08Z

It might be interesting to be able to get a dump, fail only that test, and continue.

It would also be good to have a hook to gather other data if and when we need it.

It would also be good to understand what platforms we can create dumps on.

@nohwnd could you share more about what vstest offers?

nohwnd · 2020-04-29T10:31:31Z

@danmosemsft

Update: Got my ARM and AMD abbreviations confused.

Hang dumps currently work only on Windows x86, x64 but not ~~amd~~ ARM. ~~So I don't think it will be much help here, based on the test name.~~

We are investigating how to make hang and crash dumps cross-platform in the helix prototype effort you are also part of.

> It might be interesting to be able to get a dump, fail only that test, and continue.

That would be very nice, but it is complicated by the fact that test host cannot safely dump itself, because it is not always safe to do that across platforms. So the blame data collector would have to detect the hang (that is already happening), dump the process when the test timeout is reached, and associate that dump with the test somehow, but NOT kill the process as it does now.

Instead it would wait until all tests finish running, or until they are all past their timeout threshold. This would in the worst case generate 1 full dump per test if all are hanging, and full dump is requested.

Once all tests are finished or timed out, the test host would stay hanging because it cannot terminate until its threads are done running, and because dotnet core does not allow threads to be aborted, we need to kill the process externally (or it may be able to terminate itself, I am not sure now).

The upside is that this approach should be test framework agnostic, because this happens above the test adapter level. And because there is a special parser for the Sequence file that is produced, AzDo is also able to mark those unfinished tests as aborted.
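The policy described above (dump each test once when it passes its timeout, do not kill the host right away, and kill it only once every test has either finished or timed out) can be sketched as one polling step. This is an illustrative Python model, not the vstest blame collector's code. `TrackedTest` and `watchdog_tick` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TrackedTest:
    name: str
    started_at: float   # seconds, on the same clock as `now`
    timeout: float      # per-test timeout in seconds
    finished: bool = False
    dumped: bool = False


def watchdog_tick(tests, now):
    """One polling step: return (names to dump now, whether to kill the host)."""
    to_dump = []
    for test in tests:
        overdue = not test.finished and now - test.started_at >= test.timeout
        if overdue and not test.dumped:
            test.dumped = True          # at most one dump per hanging test
            to_dump.append(test.name)
    # Every test has either completed or already had its dump taken.
    all_settled = all(t.finished or t.dumped for t in tests)
    # At least one test is still stuck, so the host will not exit by itself.
    any_hung = any(t.dumped and not t.finished for t in tests)
    return to_dump, all_settled and any_hung
```

In this model the worst case matches the comment: if every test hangs, each one is dumped once, and the host is killed only after the last of them passes its timeout.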

> It would also be good to have a hook to gather other data if and when we need it.

What kind of data would that be? I did not find that in this thread.

> It would also be good to understand what platforms we can create dumps on.

Windows x86 and x64 (but not ~~amd~~ ARM) currently, because the procdump tool is used for both hang dumps and crash dumps. But with the efforts around Microsoft.Diagnostics.NETCore.Client we should soon be able to produce hang dumps easily on Windows, Linux and macOS, at least on modern dotnets (3.1+ on Windows and Linux, and 5.0 on macOS).

I am experimenting with improving the overall experience here, where I collect dumps via vstest console across multiple operating systems, ideally by the end of the script all the ticks would be green:

https://dev.azure.com/jajares/blame/_build/results?buildId=18&view=results

danmoseley · 2020-04-29T15:55:47Z

Thanks @nohwnd for working on this!

> What kind of data would that be? I did not find that in this thread.

It would depend on the test (or the test failure) but one example that came up elsewhere was logging whether the machine was heavily loaded and what other processes were running. I could imagine we might want to log config files or registry keys, the versions of certain libraries, etc. Certainly this is secondary to reliably getting useable dumps.

wfurt · 2020-04-29T17:35:25Z

It seems like dotnet/dnceng#1216 is essentially a dup of https://github.com/dotnet/core-eng/issues/5380. The second one outlines steps for the NIX platforms and should not depend on architecture. Alternatively, we could use tools from the diag repo.

As far as the auxiliary information goes, load, m tail for kernel log and process lists with states and stats would be a good start IMHO. For networking tests, a dump of interfaces, DNS configuration, routing table and connections would be awesome.

ViktorHofer · 2020-05-05T08:28:17Z

> It would depend on the test (or the test failure) but one example that came up elsewhere was logging whether the machine was heavily loaded and what other processes were running. I could imagine we might want to log config files or registry keys, the versions of certain libraries, etc. Certainly this is secondary to reliably getting useable dumps.

@danmosemsft that could be achieved with a (diagnostics) data logger:

> DataCollectors are used to monitor test execution. Getting CPU or memory usage info, taking screenshot, recording screen activity, measuring code coverage, etc. while executing tests are a few common scenarios that can be realised through DataCollectors.

from https://github.com/Microsoft/vstest-docs/blob/master/docs/extensions/datacollector.md

Dotnet-GitSync-Bot added area-System.Diagnostics.Process untriaged New issue has not been triaged by the area owner labels Apr 27, 2020

ViktorHofer mentioned this issue Apr 27, 2020

skip Microsoft.XmlSerializer.Generator.Tests on FreeBSD #35494

Merged

wfurt mentioned this issue Apr 28, 2020

improve resiliency of process tests #35584

Merged

wfurt closed this as completed in #35584 Apr 30, 2020

jeffhandley removed the untriaged New issue has not been triaged by the area owner label Sep 17, 2020

ghost locked as resolved and limited conversation to collaborators Dec 9, 2020
