
Spurious 'Unable to read beyond end of stream' in CI while collecting results #210

Closed
Cyberboss opened this issue Oct 8, 2018 · 71 comments
Labels
bug Something isn't working

Comments

@Cyberboss

https://travis-ci.org/tgstation/tgstation-server/jobs/438449753#L1156

Not sure what causes this; it happens rarely and seemingly at random. Latest coverlet.console.

@p10tyr

p10tyr commented Oct 12, 2018

Could it be that the files are cleaned up before Coverlet finishes doing its thing?

That error usually occurs when a StreamReader opens a file and caches its Length in memory, then starts to stream the data... but the file is either empty or, on rare occasions, in the process of being deleted.

For the cached Length to exceed the actual length of the file is not possible unless the file was modified, as mentioned.

Because of this "randomness" it looks like a race condition somewhere.
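
To illustrate the failure mode (a minimal standalone sketch, not coverlet's code; the file name is made up): BinaryReader throws exactly this exception when the file on disk is shorter than the data the reader tries to consume.

using System.IO;

class TruncationDemo
{
    static void Main()
    {
        // Write a file that is shorter than one Int32 (4 bytes).
        File.WriteAllBytes("hits.tmp", new byte[2]);

        using (var reader = new BinaryReader(File.OpenRead("hits.tmp")))
        {
            // Throws System.IO.EndOfStreamException:
            // "Unable to read beyond the end of the stream."
            int value = reader.ReadInt32();
        }
    }
}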

@prime-gram

prime-gram commented Oct 24, 2018

I'm also facing this, but consistently, when dealing with a huge number of test cases (~17k), i.e. in our nightly builds (where more complete testing is done).

In regular use cases, I don't see this happening.

My use case:
I'm calling coverlet via dotnet test (through Cake's DotNetCoreTest).

Error Line:
[16:34:51] : [Step 2/3] C:\Users\SVC-SS-BUILDER\.nuget\packages\coverlet.msbuild\2.3.1\build\netstandard2.0\coverlet.msbuild.targets(27,5): error : Unable to read beyond the end of the stream.

More Detailed:

[16:34:51] :     [Step 2/3] Results File: D:\tc_ba2\510a05667bb2cb9a\test\DynamicPricing.Application.Nightly.Tests\TestResults\TestResults.trx
[16:34:51] :     [Step 2/3] 
[16:34:51] :     [Step 2/3] Total tests: 17071. Passed: 17071. Failed: 0. Skipped: 0.
[16:34:51] :     [Step 2/3] Test Run Successful.
[16:34:51] :     [Step 2/3] Test execution time: 26.1204 Minutes
[16:34:51]i:     [Step 2/3] TRX (1s)
[16:34:51] :     [Step 2/3] 
[16:34:51] :     [Step 2/3] Calculating coverage result...
[16:34:51] :     [Step 2/3] C:\Users\SVC-SS-BUILDER\.nuget\packages\coverlet.msbuild\2.3.1\build\netstandard2.0\coverlet.msbuild.targets(27,5): error : Unable to read beyond the end of the stream. [D:\tc_ba2\510a05667bb2cb9a\test\DynamicPricing.Application.Nightly.Tests\DynamicPricing.Application.Nightly.Tests.csproj]
[16:34:51]E:     [Step 2/3] An error occurred when executing task 'RunTests'.
[16:34:52]E:     [Step 2/3] Error message is logged
[16:34:51]E:     [Step 2/3] Error: System.AggregateException: One or more errors occurred. ---> Cake.Core.CakeException: .NET Core CLI: Process returned an error (exit code 1).
[16:34:51]E:     [Step 2/3]    at Cake.Core.Tooling.Tool`1.ProcessExitCode(Int32 exitCode)
[16:34:51]E:     [Step 2/3]    at Cake.Core.Tooling.Tool`1.Run(TSettings settings, ProcessArgumentBuilder arguments, ProcessSettings processSettings, Action`1 postAction)
[16:34:51]E:     [Step 2/3]    at Cake.Common.Tools.DotNetCore.Test.DotNetCoreTester.Test(String project, DotNetCoreTestSettings settings)
[16:34:51]E:     [Step 2/3]    at Submission#0.DotNetCoreTest(String project, DotNetCoreTestSettings settings)
[16:34:51]E:     [Step 2/3]    at Submission#0.<<Initialize>>b__0_27(String configuration, String provider, Tuple`2 connectionInfo, Boolean nightly, Boolean coverage)
[16:34:51]E:     [Step 2/3]    at Submission#0.<<Initialize>>b__0_26()

@p10tyr

p10tyr commented Oct 25, 2018

Yeah, it looks like coverlet is trying to open the file... and it's not there any more.
Again, the common thing here is a CI pipeline.
Can you run this locally on your machine using dotnet test with coverlet?
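
For reference, the minimal local repro would be something like this, run from the test project directory:

dotnet test /p:CollectCoverage=true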

@prime-gram

I ran it locally but was unable to reproduce it.

In CI it happened every time, but never on my laptop.

@p10tyr

p10tyr commented Oct 26, 2018

So, as I thought, it is a CI-related issue. It's not a build or coverlet problem; it's that the CI starts to clean up too quickly.
Please tell us more about how your CI is set up, which platform, etc.

The next step of problem elimination is to run it manually on the CI agent.

If it is Azure DevOps you should be able to get console access to your build agent (I am not sure about self-hosted agents, though) and try running the pipeline commands manually. See if it breaks the same way. If it does, that means it's an environment problem, but I seriously doubt that.

To Consider:

I think that when pipelines execute one after the other, a cleanup operation happens in between. Please check for any options like that and try doing the cleanup at the end, or do a clean of the sources as the first step and make sure no other cleanup options are enabled.

Please let us know what you find.

@prime-gram

That is still not clear to me.

19k Total Tests = (17k nightly + 2k regular baseline)

We in fact have 19k tests; the 17k tests I was able to run (until I got the exception) were therefore in the middle of the process, i.e. before any possible CI cleanup (which I'm not sure we do at the end anyway; I believe we clean things up at the beginning of a CI run).

Another hint: my exclusion lists were not working properly (I was passing separate exclusion lists via the command line, which doesn't work well in PowerShell; there's an issue for that), so the extra 17k tests were also being "covered" by coverlet, generating a massive output (my guess, since I'm unable to see the result because of the exception).

Cheers

@p10tyr

p10tyr commented Oct 26, 2018

Yeah... I am not entirely sure. The best thing to do is to try to recreate the problem somewhere outside of CI.
Sorry, that is all I can suggest for this.

@Cyberboss
Author

Cyberboss commented Oct 27, 2018

A potential workaround is to invoke bash and run sync afterwards,

i.e.

$HOME/.dotnet/tools/coverlet bin/$CONFIG/netcoreapp2.1/Tgstation.Server.Api.Tests.dll --target "bash" --targetargs "-c \"dotnet test -c $CONFIG --no-build && sync\"" --format opencover --output "../../TestResults/api.xml" --include "[Tgstation.Server*]*" --exclude "[Tgstation.Server.Api.Tests*]*"

@Cyberboss
Author

Disregard, it still errored out

@prime-gram

I have a "feeling" that this has to do with some asynchronous write of the results: coverlet tries to read the output before the write has finished.

But this is just a guess...

@leppie

leppie commented Oct 30, 2018

Getting this too on AppVeyor lately (last 2 days). Nothing has changed on our side.

@tonerdo
Collaborator

tonerdo commented Oct 31, 2018

> Not sure what causes this; it happens rarely and seemingly at random. Latest coverlet.console.

Out of curiosity, is everyone having this problem using coverlet.console?

@Cyberboss
Author

Cyberboss commented Oct 31, 2018 via email

@leppie

leppie commented Oct 31, 2018

> Not sure what causes this; it happens rarely and seemingly at random. Latest coverlet.console.
>
> Out of curiosity, is everyone having this problem using coverlet.console?

I am using coverlet.msbuild and just calling dotnet test /p:CollectCoverage=true on a per-project basis in AppVeyor.

Still on 2.3.0 here

@prime-gram

" just calling dotnet test /p:CollectCoverage=true on a per project basis"

Same here. Env: Windows + Teamcity + cake build script (which calls that, then).

@tonerdo
Collaborator

tonerdo commented Nov 4, 2018

Quick follow-up question: does this only happen on a CI server? Has anyone experienced it locally?

@Cyberboss
Author

Cyberboss commented Nov 4, 2018 via email

@WiredUK

WiredUK commented Nov 8, 2018

Just wanted to add my support to this issue. We also have intermittent failures on our Azure DevOps pipeline. Here's a snippet from the last failure, using coverlet.msbuild:

2018-11-08T17:03:48.4568642Z Results File: D:\agents\5\_work\_temp\SVC_TFSBuild_NCL-S-BUILD03_2018-11-08_17_03_47.trx
2018-11-08T17:03:48.4569469Z 
2018-11-08T17:03:48.4572151Z Total tests: 14. Passed: 14. Failed: 0. Skipped: 0.
2018-11-08T17:03:48.4572637Z Test Run Successful.
2018-11-08T17:03:48.4576890Z Test execution time: 12.9343 Seconds
2018-11-08T17:03:48.5061451Z 
2018-11-08T17:03:48.5062641Z Calculating coverage result...
2018-11-08T17:03:49.0712122Z D:\.nuget\packages\coverlet.msbuild\2.3.1\build\netstandard2.0\coverlet.msbuild.targets(27,5): error : Unable to read beyond the end of the stream. [D:\agents\5\_work\58\s\test\aaa.bbb.ccc.Unit.Tests\aaa.bbb.ccc.Unit.Tests.csproj]

@tonerdo added the bug label Nov 13, 2018
@WiredUK

WiredUK commented Nov 14, 2018

I've started doing some testing on this one. I have a custom version of coverlet with a bunch of extra logging that will hopefully show exactly where and why this is exploding. It's quite tedious, though, as a CI build needs to run and the hamsters in our build server are very tired. Will update when I get some more info...

@WiredUK

WiredUK commented Nov 14, 2018

And here is a better stack trace:

2018-11-14T14:47:34.2650390Z System.IO.EndOfStreamException: Unable to read beyond the end of the stream.
2018-11-14T14:47:34.2651311Z    at System.IO.BinaryReader.FillBuffer(Int32 numBytes)
2018-11-14T14:47:34.2652232Z    at System.IO.BinaryReader.ReadInt32()
2018-11-14T14:47:34.2652389Z    at Coverlet.Core.Coverage.CalculateCoverage()
2018-11-14T14:47:34.2652489Z    at Coverlet.Core.Coverage.GetCoverageResult()
2018-11-14T14:47:34.2652672Z    at Coverlet.MSbuild.Tasks.CoverageResultTask.Execute()

@p10tyr

p10tyr commented Nov 14, 2018

I know that error from working with files. It basically means the file was deleted while it was being read.

That log is cool... but it would be even better if you knew which file it is trying to read. I suppose Coverlet is trying to instrument the DLL and then all of a sudden it disappears...

This is where the error is propagating

https://github.com/tonerdo/coverlet/blob/7ede9d7cba383fea9709cee4d380c0de27b58b42/src/coverlet.core/Coverage.cs#L178

And by reading

https://docs.microsoft.com/en-us/dotnet/api/system.io.binaryreader.readint32?view=netframework-4.7.2

It seems like there may be a bug here where, in the loops doing the ReadInt32 calls, Coverlet expected more entries than were actually in the file.

Also it is difficult to understand what that code is actually doing...
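
For context, the reading code is roughly of the following shape (a simplified sketch, not the actual Coverage.cs source): a count is read from the start of the hits file, and the loop then trusts that count.

using System.IO;

class HitsFileReader
{
    static int[] ReadHits(string hitsFilePath)
    {
        using (var reader = new BinaryReader(File.OpenRead(hitsFilePath)))
        {
            // The file starts with the number of hit candidates. If the file
            // is empty (writer killed before any flush), this read throws.
            int hitCandidatesCount = reader.ReadInt32();
            var hits = new int[hitCandidatesCount];

            // One Int32 per candidate follows. If the file was truncated
            // mid-write, one of these reads throws EndOfStreamException.
            for (int i = 0; i < hitCandidatesCount; i++)
                hits[i] = reader.ReadInt32();

            return hits;
        }
    }
}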

@WiredUK

WiredUK commented Nov 14, 2018

@ppumkin Yeah, I'm going to dig a little further now that I know which part of the app to instrument. It's difficult to test as it's a sporadic issue, but I can queue up multiple builds at the same time, which helps.

@WiredUK

WiredUK commented Nov 14, 2018

@ppumkin Regarding your edit, the error could also be here:

https://github.com/tonerdo/coverlet/blob/7ede9d7cba383fea9709cee4d380c0de27b58b42/src/coverlet.core/Coverage.cs#L190

Edit:
But it's not; it is throwing when trying to read hitCandidatesCount.

@p10tyr

p10tyr commented Nov 14, 2018

Yes, true. That is why I wrote that it's difficult to understand the code: it has a few levels of nested loops, and it reads a 4-byte integer each time from result.HitsFilePath.
What is in that file? Some kind of temporary file? If it is temporary, maybe that is the problem on the CI builds: lack of permissions to access the file, or it gets cleaned up after the build runs?

What is the file? (the path to it and its contents)
@WiredUK

WiredUK commented Nov 14, 2018

The file that it was trying to read from varies, but for me it is of the form (where xxxx.yyyy.zzzz is a project name):

C:\Users\SVC_TFSBuild\AppData\Local\Temp\xxxx.yyyy.zzzz_d884df17-4b8f-41e4-9ebf-93b044af742d

@WiredUK

WiredUK commented Nov 14, 2018

I'm now wondering if it's something to do with it being in the temp directory...

@Cyberboss
Author

Here's one I just triggered in Travis: https://travis-ci.org/tgstation/tgstation-server/jobs/455035648#L1719

@p10tyr

p10tyr commented Nov 14, 2018

Yes! Temp directory! I bet something on Azure DevOps (or, apparently, other CIs like Travis) is cleaning up the temp directory after the build completes.

@WiredUK

WiredUK commented Nov 14, 2018

Possibly, yeah, but this build server of ours is local, not a hosted build.

@WiredUK

WiredUK commented Nov 15, 2018

That's a potentially good spot there @Cyberboss.

Question for all others: Are your build servers under heavy load or severely underpowered by any chance? Mine absolutely is (currently running on a shared host with a bunch of other VMs) and the disk is being hammered, so there's every possibility that it's taking too long to execute that process.

@Cyberboss
Author

Cyberboss commented Nov 15, 2018 via email

@petli
Collaborator

petli commented Nov 15, 2018

A way to avoid doing a lot of work in ProcessExit might be to use a memory-mapped file to keep track of hit counts, which would be pretty easy to implement after #181. Instead of allocating an int[], a memory-mapped file of size HitCandidates.Count * 4 would be allocated, and each hit would increment a counter at HitIndex * 4. This way there is no data to write to disk at the end of the test run.

It shouldn't be a significant performance hit to use the memory-mapped APIs instead of incrementing an int array directly, but that must be verified. It may not be necessary to use file-backed memory maps at all; they could just be used as a shared memory area between the coverlet process and the test process.

To avoid threads locking each other there would have to be one file per thread (much like there's one array per thread in #181), and Coverage.CalculateCoverage() would have to read all those files and tally them.
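
A minimal sketch of that idea (names are illustrative, not coverlet's actual implementation): the tracker increments counters directly in a shared memory map, so there is nothing to flush in ProcessExit.

using System.IO.MemoryMappedFiles;

class MmapHitTracker
{
    private readonly MemoryMappedViewAccessor _accessor;

    public MmapHitTracker(string mapName, int hitCandidatesCount)
    {
        // One Int32 counter per hit candidate; the map is shared with the
        // host process by name. (Named maps like this are Windows-only.)
        var mmf = MemoryMappedFile.CreateOrOpen(mapName, hitCandidatesCount * 4);
        _accessor = mmf.CreateViewAccessor();
    }

    public void RecordHit(int hitIndex)
    {
        // Read-increment-write is not atomic, which is why one map per
        // thread (as suggested above) would be needed to avoid races.
        long offset = (long)hitIndex * 4;
        _accessor.Write(offset, _accessor.ReadInt32(offset) + 1);
    }
}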

@p10tyr

p10tyr commented Nov 16, 2018

Is this a use case for the new Span<T>, maybe? I have never used it, but from listening to some conference talks it sounds like the kind of thing it could be used for.

Anyway, anything other than file-system writes is going to be better... Sounds like a lot of work, though.

Is there a "temporary" workaround in the meantime, maybe increasing this timeout?

@ElephantsGerald

I can also confirm that this starts to become a problem on a CI server under load.

@Marachkouski

We also have this problem in our build environment.

@codemzs

codemzs commented Jan 11, 2019

@tonerdo Is there a workaround for this? I'm running into this error at the end of all test execution, after I get all the code coverage results. It happens locally on my machine when I run a debug build, but not a release build.

@sharwell
Contributor

I narrowed this down with @codemzs today. The failure occurs when the hit file is truncated due to process termination while writing the file. After that point, subsequent loads of the file for coverage gathering fail because the file is either empty or contains fewer entries than claimed. I'm planning to address two parts of this and send a pull request:

  1. Improve write performance using memory mapped files, allowing the final flush to disk to occur after the process closes.
  2. During reads, handle files that are too short by assuming the rest of the data is zero-filled.

There was some mention above of possible timeouts. I'm not planning to address that part specifically, but if the allowed data-flush duration can be extended, it would likely round things out for good overall reliability.
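
A sketch of point 2 above (simplified, with hypothetical names; not the actual code from the pull request): treat a premature end of file as zero hit counts for the remaining entries instead of throwing.

using System.IO;

class TolerantHitsReader
{
    static int[] ReadHits(string hitsFilePath, int expectedCount)
    {
        var hits = new int[expectedCount]; // missing tail stays zero-filled

        using (var reader = new BinaryReader(File.OpenRead(hitsFilePath)))
        {
            for (int i = 0; i < expectedCount; i++)
            {
                // Stop at the truncation point instead of throwing
                // EndOfStreamException; the rest of the array remains 0.
                if (reader.BaseStream.Length - reader.BaseStream.Position < sizeof(int))
                    break;
                hits[i] = reader.ReadInt32();
            }
        }

        return hits;
    }
}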

@codemzs

codemzs commented Jan 14, 2019

Can we please get #276 merged to address the issue, and get a new NuGet package out? @Cyberboss @tonerdo @petli

@tonerdo
Collaborator

tonerdo commented Jan 17, 2019

@Cyberboss @codemzs New NuGet packages with the proposed fixes have been released.

@ViktorHofer
Contributor

I believe the fastest way to transfer data between the child (the test process) and the host (coverlet) would be to just write the unique id + the hit count to stdout and read it in the host. That would remove the complex MemoryMappedFile handling without any file I/O.

The problem with that solution is that coverlet.msbuild.tasks allows instrumenting an assembly and collecting its coverage without controlling how the process is launched. Therefore, reading from stdout is not possible in the build task.

cc @petli

@petli
Collaborator

petli commented Jan 21, 2019

@ViktorHofer Do you think named pipes would be as fast as stdout? The pipe name could be passed into the instrumented code via ModuleTrackerTemplate, much like the mmap name.
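
A rough sketch of the tracker side of that idea (pipe name and payload format are hypothetical; the host would listen with a matching NamedPipeServerStream):

using System.IO;
using System.IO.Pipes;

class PipeHitReporter
{
    // In the real proposal the pipe name would be injected into the
    // instrumented module via ModuleTrackerTemplate; here it is a parameter.
    public static void SendHits(string pipeName, int[] hits)
    {
        using (var pipe = new NamedPipeClientStream(".", pipeName, PipeDirection.Out))
        {
            pipe.Connect(5000); // wait up to 5s for the coverlet host to listen

            using (var writer = new BinaryWriter(pipe))
            {
                writer.Write(hits.Length);  // count header
                foreach (int hit in hits)
                    writer.Write(hit);      // one Int32 per hit candidate
            }
        }
    }
}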

@sharwell
Contributor

I'm starting to think the cause of this issue lies in the test runner itself, as opposed to the CLR. However, I haven't been able to pinpoint the sequence in the vstest runner that leads to the forced termination of the test process.

@ViktorHofer
Contributor

ViktorHofer commented Jan 24, 2019

Correct, the shutdown is triggered by vstest. This can easily be tested with the following code. It happens with both MSTest and xUnit, which proves that it's not the test framework itself but the test platform (vstest) that is responsible for stopping the execution early.

using Microsoft.VisualStudio.TestTools.UnitTesting;
using System;
using System.Diagnostics;
using System.Threading.Tasks;

namespace dotnettest
{
    [TestClass]
    public class UnitTest1
    {
        [TestMethod]
        public void Test1()
        {
            AppDomain.CurrentDomain.ProcessExit += CurrentDomain_ProcessExit;
        }

        private void CurrentDomain_ProcessExit(object sender, EventArgs e)
        {
            // If the host lets ProcessExit run to completion, abc.txt ends up
            // containing "end"; if the process is killed early, it still says "start".
            System.IO.File.WriteAllText("abc.txt", "start");
            Task.Delay(10000).Wait();
            System.IO.File.WriteAllText("abc.txt", "end");
        }
    }
}

Repro: dotnet test.

Related code paths in vstest:
https://github.com/Microsoft/vstest/blob/7b6248203164f8ea821f6795632bd22e0b69afb0/src/Microsoft.TestPlatform.CoreUtilities/Utilities/JobQueue.cs

VSTest, which is triggered by dotnet test, is responsible. I'll try to gather more information.

As a workaround you could invoke your test runner without dotnet test, e.g. via xunit.console.dll (unsupported).

@MarcoRossignoli
Collaborator

Closing thanks to the new collectors: https://github.com/tonerdo/coverlet#requirements
