Spurious 'Unable to read beyond end of stream' in CI while collecting results #210
Could it be that the files are cleaned up before Coverlet finishes doing its thing? That error usually occurs when the StreamReader opens a file, holds its Length in memory, and then starts to stream the data... but the file is either 0 bytes or, in very rare cases, in the process of being deleted. The Length being greater than the actual length of the file is not possible unless the file got modified, as mentioned. Because of this "randomness" it seems like a race condition somewhere. |
I'm also facing this, but consistently, when dealing with a huge number of test cases (~17k), i.e. in our nightly builds (where more complete testing is done). In regular use I don't see this happening. My use case: Error line: More detail:
|
Yeah, it looks like coverlet is trying to open the file... and it's not there any more. |
I ran it locally but was unable to recreate the situation. In CI it always happened, but not on my laptop. |
So, as I thought, it is a CI-related issue. It's not a build or coverlet problem; the CI starts to clean up too quickly. The next step of problem elimination is to run it manually on the CI agent. If it is Azure DevOps you should be able to get console access to your build agent (I am not sure about self-hosted agents though) and try running the pipeline commands manually to see if it breaks the same way. If it does, that means it's an environment problem, but I seriously doubt that. To consider: I think when the pipeline steps execute one after the other, there is a cleanup operation happening in between. Please check for any options like that and try doing cleanup at the end, or clean the sources as the first step and then make sure there are no other cleanup options enabled. Let us know what you find, please. |
That is still not clear to me. 19k total tests = 17k nightly + 2k regular baseline. We in fact have 19k tests, so the 17k tests I was able to run (until I got the exception) ended in the middle of the process, and therefore before any possible CI cleanup (which I'm not sure we do at the end; I believe we clean things up at the beginning of a CI run). Another hint: my exclusion lists were not working properly (I was passing separate exclusion lists via the command line, and that doesn't work well in PowerShell; there's an issue for that), which caused the extra 17k tests to also be "covered" by coverlet, generating a massive output (my guess, since I'm unable to see the result because of the exception). Cheers |
Yeah... I am not entirely sure. The best thing to do is try to recreate the problem somewhere outside of CI. |
i.e. $HOME/.dotnet/tools/coverlet bin/$CONFIG/netcoreapp2.1/Tgstation.Server.Api.Tests.dll --target "bash" --targetargs "-c \"dotnet test -c $CONFIG --no-build && sync\"" --format opencover --output "../../TestResults/api.xml" --include "[Tgstation.Server*]*" --exclude "[Tgstation.Server.Api.Tests*]*" |
Disregard, it still errored out |
I have the "feeling" that this has to do with some async (write) operation on some results: when coverlet tries to read the output, the write has not finished yet. But this is just a guess... |
Getting this too on AppVeyor lately (last 2 days). Nothing has changed on our side. |
Out of curiosity, is everyone having this problem using coverlet.console? |
That is what I'm using.
|
I am using dotnet test /p:CollectCoverage=true on a per project basis. Still on 2.3.0 here. |
" just calling dotnet test /p:CollectCoverage=true on a per project basis" Same here. Env: Windows + Teamcity + cake build script (which calls that, then). |
Quick follow up question, does this only happen on a CI server? Anyone experienced it locally? |
I ran my tests with coverlet in a Docker container for a few hours but never got the error.
|
Just wanted to add my support to this issue. We also have intermittent failures on our Azure DevOps pipeline. Here's a snippet from the last failure, using
|
I've started doing some testing on this one. I have a custom version of coverlet with a bunch of extra logging to hopefully extract exactly where and why this is exploding. It's quite tedious though, as a CI build needs to run and the hamsters in our build server are very tired. I'll update when I get some more info... |
And here is a better stack trace:
|
I know that error from working with files. It basically means the file has been deleted while it is being read. That log is cool... but it would be cooler if you knew what file it is trying to read. I suppose Coverlet is trying to instrument the DLL and then all of a sudden it disappears... This is where the error is propagating. And judging by https://docs.microsoft.com/en-us/dotnet/api/system.io.binaryreader.readint32?view=netframework-4.7.2 it seems like there may be a bug here where Coverlet expected more entries than there actually were in the loops doing the ReadInt32. Also, it is difficult to understand what that code is actually doing... |
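For illustration (this is not coverlet's code, just a minimal sketch of the failure mode being discussed): BinaryReader.ReadInt32 throws exactly this exception when the file holds fewer bytes than the reader expects, e.g. because the writer was terminated mid-flush.

```csharp
// Minimal sketch, not coverlet code: reading more Int32 values than a
// truncated file actually contains reproduces the reported exception.
using System;
using System.IO;

class TruncatedReadDemo
{
    static void Main()
    {
        string path = Path.GetTempFileName();
        // Pretend the writer was killed after flushing only one of two expected values.
        File.WriteAllBytes(path, BitConverter.GetBytes(42));

        using (var reader = new BinaryReader(File.OpenRead(path)))
        {
            Console.WriteLine(reader.ReadInt32()); // 42
            reader.ReadInt32(); // throws EndOfStreamException:
                                // "Unable to read beyond the end of the stream."
        }
    }
}
```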
@ppumkin Yeah, I'm going to dig a little further now I know which part of the app to instrument. It's difficult to test as it's a sporadic issue, but I can queue up multiple builds at the same time which helps. |
@ppumkin Regarding your edit, the error could also be here: Edit: |
Yes, true. That is why I wrote that it's difficult to understand the code: it has a few levels of nested loops and it's reading 4 bytes at a time from... What is the file? (the path to it and its contents) |
The file that it was trying to read from varies, but for me it is of the form (where xxx.yyy.zzz is a project name):
|
I'm now wondering if it's something to do with it being in the temp directory... |
Here's one I just triggered in travis https://travis-ci.org/tgstation/tgstation-server/jobs/455035648#L1719 |
Yes! Temp directory! I bet something on Azure DevOps (or another CI, apparently Travis too) is cleaning up the temp directory after the build completes. |
Possibly yeah, but this build server we have is local, not a hosted build. |
That's a potentially good spot there @Cyberboss. Question for all others: are your build servers under heavy load or severely underpowered by any chance? Mine absolutely is (currently running on a shared host with a bunch of other VMs) and the disk is being hammered, so there's every possibility that it's taking too long to execute that process. |
Travis CI is constantly under load, so I don't doubt it.
|
A way to avoid doing a lot of work in ProcessExit might be to use a memory-mapped file to keep track of hit counts, which would be pretty easy to implement after #181. Instead of allocating an int[], a memory-mapped file of size HitCandidates.Count * 4 would be allocated, and each hit would increment a counter at HitIndex * 4. This way there is no data to write to disk at the end of the test. It shouldn't be a significant performance hit to use the memory-mapped APIs instead of incrementing an int array directly, but that must be verified. It may not be necessary to use file-backed memory maps at all; they could just be used as a shared memory area between the coverlet process and the test process. To avoid threads locking each other there would have to be one file per thread (much like there's one array per thread in #181), and Coverage.CalculateCoverage() would have to read all those files and tally them. |
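A minimal sketch of that idea, with hypothetical names (HitMapName, HitCandidateCount and RecordHit are illustrative, not coverlet's actual API); note that named, non-file-backed maps are Windows-only in .NET Core, so a cross-platform version would fall back to a file-backed map.

```csharp
// Sketch only: hit counts kept in a shared memory-mapped region instead of an
// in-process int[] that must be flushed to disk in ProcessExit.
using System.IO.MemoryMappedFiles;

static class HitTrackerSketch
{
    const string HitMapName = "coverlet_hits_example"; // hypothetical name
    const int HitCandidateCount = 1024;                // would come from instrumentation

    // One map per thread would avoid cross-thread contention, as suggested above.
    static readonly MemoryMappedFile Map =
        MemoryMappedFile.CreateOrOpen(HitMapName, HitCandidateCount * sizeof(int));
    static readonly MemoryMappedViewAccessor View = Map.CreateViewAccessor();

    // Called by instrumented code for each hit candidate that executes.
    public static void RecordHit(int hitIndex)
    {
        long offset = (long)hitIndex * sizeof(int);
        View.Write(offset, View.ReadInt32(offset) + 1);
    }
}
```

The collector side (Coverage.CalculateCoverage) would then open the same map(s) and tally the counters instead of parsing a hits file.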
Is this a use case for the new Span<T>, maybe? I've never used it, but from listening to some conference talks it sounds like that is what it could be used for. Anyway, anything other than file system writes is going to be better... Sounds like a lot of work though. Is there a "temporary" workaround to increase this timeout, maybe? |
I can also confirm that this starts to become a problem on a CI server under load. |
We also have this problem in our build environment. |
@tonerdo Is there a workaround for this? I'm running into this error at the end of all test execution, after I get all code coverage results. It happens locally on my machine when I run a debug build but not for release. |
I narrowed this down with @codemzs today. The failure occurs when the hit file is truncated due to process termination while writing the file. After that point, subsequent loads of the file for coverage gathering fail because the file is either empty or contains fewer entries than claimed. I'm planning to address two parts of this and send a pull request:
There was some mention above of possible timeouts. I'm not planning to address that part specifically, but if the allowed data-flush duration can be extended, it would likely round things out for good overall reliability. |
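For what it's worth, a defensive check along these lines is one way to handle the truncated-file case described above; this is only an illustrative sketch, not the actual change in #276, and it assumes a simple layout where the file begins with an Int32 entry count.

```csharp
// Illustrative guard, assuming a hits file that starts with an Int32 entry count
// followed by one Int32 per entry; a truncated file fails this check instead of
// blowing up later with "Unable to read beyond end of stream".
using System.IO;

static class HitsFileGuard
{
    public static bool LooksComplete(string path)
    {
        using (var stream = File.OpenRead(path))
        {
            if (stream.Length < sizeof(int))
                return false; // empty file: the writer never got to flush anything

            using (var reader = new BinaryReader(stream))
            {
                int claimedEntries = reader.ReadInt32();
                // Header plus one Int32 per claimed entry must fit in the file.
                return stream.Length >= sizeof(int) * (1L + claimedEntries);
            }
        }
    }
}
```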
Can we please get #276 in to address the issue and get a new NuGet package out? @Cyberboss @tonerdo @petli |
@Cyberboss @codemzs New NuGet packages with the proposed fixes have been released |
I believe the fastest way to transfer data between the child (test process) and the host (coverlet) would be to just write the unique id + the hit count out to stdout and read it in the host. That would remove the complex references to MemoryMappedFile without any file I/O. The problem with that solution is that coverlet.msbuild.tasks allows instrumenting an assembly and collecting its coverage without controlling how the process is launched, therefore reading from stdout is not possible in the build task. cc @petli |
@ViktorHofer Do you think named pipes would be as fast as stdout? The pipe name could be passed into the instrumented code via ModuleTrackerTemplate much like the mmap name. |
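A rough sketch of that named-pipe variant, with illustrative names only (SendHits and ReceiveHits are not real coverlet methods); the pipe name would be baked into the instrumented module the same way the mmap name is passed via ModuleTrackerTemplate.

```csharp
// Sketch only: the test process pushes "hitIndex:count" pairs over a named pipe
// at shutdown and the coverlet host tallies them, avoiding file I/O entirely.
using System.IO;
using System.IO.Pipes;

static class PipeSketch
{
    // Test-process side: send the accumulated hit counts.
    public static void SendHits(string pipeName, int[] hitCounts)
    {
        using (var client = new NamedPipeClientStream(".", pipeName, PipeDirection.Out))
        {
            client.Connect(5000);
            using (var writer = new StreamWriter(client))
            {
                for (int i = 0; i < hitCounts.Length; i++)
                    if (hitCounts[i] > 0)
                        writer.WriteLine($"{i}:{hitCounts[i]}");
            }
        }
    }

    // Coverlet side: receive and tally.
    public static int[] ReceiveHits(string pipeName, int hitCandidateCount)
    {
        var totals = new int[hitCandidateCount];
        using (var server = new NamedPipeServerStream(pipeName, PipeDirection.In))
        {
            server.WaitForConnection();
            using (var reader = new StreamReader(server))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    var parts = line.Split(':');
                    totals[int.Parse(parts[0])] += int.Parse(parts[1]);
                }
            }
        }
        return totals;
    }
}
```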
I'm starting to think the cause of this issue lies in the test runner itself, as opposed to the CLR. However, I haven't been able to pinpoint the sequence in the vstest runner that leads to the forced termination of the test process. |
Correct, the shutdown is triggered by vstest. This can easily be tested with the following code. It happens with both MSTest and xUnit, which proves that it's not the test framework itself but the test platform (vstest) that is responsible for stopping the execution early.
Repro: ... Related code paths in vstest: ... VSTest, which is triggered by ... As a workaround you could invoke your test runner without dotnet test, i.e. via xunit.console.dll (unsupported). |
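Since the snippet referenced above isn't reproduced here, a hedged sketch of that kind of repro (a test that registers a deliberately slow ProcessExit handler, which the test host terminates before it can finish writing its file) might look like this; the xUnit shape and file name are illustrative.

```csharp
// Illustrative repro sketch: any ProcessExit work that takes "too long" is cut
// short when vstest terminates the test host, leaving a truncated file behind,
// which is the same failure mode as the coverage hits file.
using System;
using System.IO;
using System.Threading;
using Xunit;

public class ProcessExitRepro
{
    [Fact]
    public void RegistersSlowExitHandler()
    {
        AppDomain.CurrentDomain.ProcessExit += (sender, args) =>
        {
            string path = Path.Combine(Path.GetTempPath(), "exit-handler.bin");
            using (var writer = new BinaryWriter(File.OpenWrite(path)))
            {
                for (int i = 0; i < 1_000_000; i++)
                {
                    writer.Write(i);
                    Thread.Sleep(1); // slow enough that the host kills us mid-flush
                }
            }
        };

        Assert.True(true); // the test itself passes; the interesting part is the exit handler
    }
}
```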
Closing thanks to the new collectors: https://github.com/tonerdo/coverlet#requirements |
https://travis-ci.org/tgstation/tgstation-server/jobs/438449753#L1156
Not sure what causes this; it rarely happens and seemingly at random. Latest coverlet.console.