
BinaryFormatter Bitmap serialization test failed #28553

Closed
ViktorHofer opened this issue Jan 29, 2019 · 17 comments · Fixed by #34306
Labels
area-System.Drawing blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' test-run-core Test failures in .NET Core test runs
Milestone
3.0
Comments

@ViktorHofer
Member

https://mc.dot.net/#/user/dotnet-bot/pr~2Fdotnet~2Fcorefx~2Frefs~2Fpull~2F34871~2Fmerge/test~2Ffunctional~2Fcli~2F~2Fouterloop~2F/20190128.1/workItem/System.Runtime.Serialization.Formatters.Tests/analysis/xunit/System.Runtime.Serialization.Formatters.Tests.BinaryFormatterTests~2FValidateAgainstBlobs(obj:%20Bitmap%20%7B%20Flags%20=%2077840,%20FrameDimensionsList%20=%20%5B7462dc86-6180-4c7e-8e3f-ee7333a7a483%5D,%20Height%20=%20)

Assert.Equal() Failure
Expected: 77840
Actual: 73744

at System.Runtime.Serialization.Formatters.Tests.EqualityExtensions.IsEqual(Bitmap this, Bitmap other, Boolean isSamePlatform) in /__w/1/s/src/System.Runtime.Serialization.Formatters/tests/EqualityExtensions.cs:line 1233

Failing in Outerloop. @safern, can you please take a look?
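
For context, here is a minimal sketch of the kind of comparison that is failing. It is not the actual EqualityExtensions/ValidateAgainstBlobs code (that test deserializes pre-saved base64 blobs and compares them against a freshly constructed object); the sketch approximates it with a plain BinaryFormatter roundtrip:

```cs
// Sketch only: approximates the failing check with a simple roundtrip, not the real test code.
using System;
using System.Drawing;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

static class BitmapRoundtripSketch
{
    public static void RoundtripAndCompareFlags(Bitmap original)
    {
        var formatter = new BinaryFormatter();
        using var stream = new MemoryStream();
        formatter.Serialize(stream, original);          // Bitmap is serialized via its encoded image bytes
        stream.Position = 0;
        var roundtripped = (Bitmap)formatter.Deserialize(stream);

        // The assertion above compares the Flags (ImageFlags) of the two bitmaps.
        // The delta between 77840 and 73744 is 4096, i.e. the ImageFlags.HasRealDpi bit,
        // so the deserialized image appears to lose its "has real DPI" flag on this platform.
        if (original.Flags != roundtripped.Flags)
            throw new Exception($"Flags mismatch: expected {original.Flags}, actual {roundtripped.Flags}");
    }
}
```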

@stephentoub
Member

Hasn't this been failing for a long time in outer loop?

@ViktorHofer
Member Author

Right.

@stephentoub
Member

Ok. I was just confused by the "safern can you please take a look" part :)

@ViktorHofer
Member Author

I pinged him because he owns System.Drawing and also added some serialization tests for it. I hadn't noticed that failing test in Outerloop before, which is why I created this issue now.

@stephentoub
Member

Got it. :)

@stephentoub
Member

This is consistently failing on OpenSUSE.

@safern
Member

safern commented Feb 22, 2019

I’ll take a look.

@safern
Member

safern commented Feb 22, 2019

I ran this in an openSUSE 42.3 container that I have, and it didn't repro; I actually got the expected value from both the raw image and the blob-deserialized image. I will try to use the repro tool tomorrow to validate that the machines have the latest libgdiplus installed.

@stephentoub
Member

Thanks.

@safern
Member

safern commented Feb 22, 2019

I'm waiting for the engineering team to provide me with a machine with the Helix setup so I can validate this issue and understand whether it is caused by the libgdiplus version itself or not.

@safern
Member

safern commented Feb 26, 2019

I tried to repro this in two different openSUSE 42.3 environments, one a Docker container and the other one of the Helix machines, and I didn't have any luck reproducing the issue. I also downloaded the latest testResults.xml file from some of the latest Outerloop runs (official builds run Outerloop), and the test is passing:

<test name="System.Runtime.Serialization.Formatters.Tests.BinaryFormatterTests.ValidateAgainstBlobs(obj: Bitmap { Flags = 73744, FrameDimensionsList = [7462dc86-6180-4c7e-8e3f-ee7333a7a483], Height = 100, HorizontalResolution = 96, Palette = ColorPalette { Entries = [...], Flags = 0 }, ... }, blobs: [System.Runtime.Serialization.Formatters.Tests.TypeSerializableValue, System.Runtime.Serialization.Formatters.Tests.TypeSerializableValue])" type="System.Runtime.Serialization.Formatters.Tests.BinaryFormatterTests" method="ValidateAgainstBlobs" time="0.0115822" result="Pass" />

This is also the output, from multiple runs, on the Helix machine I got access to:

~/safern/tests> cat testResults.xml | grep 'ValidateAgainstBlobs' | grep 'Bitmap'
      <test name="System.Runtime.Serialization.Formatters.Tests.BinaryFormatterTests.ValidateAgainstBlobs(obj: Bitmap { Flags = 73744, FrameDimensionsList = [7462dc86-6180-4c7e-8e3f-ee7333a7a483], Height = 100, HorizontalResolution = 96, Palette = ColorPalette { Entries = [...], Flags = 0 }, ... }, blobs: [System.Runtime.Serialization.Formatters.Tests.TypeSerializableValue, System.Runtime.Serialization.Formatters.Tests.TypeSerializableValue])" type="System.Runtime.Serialization.Formatters.Tests.BinaryFormatterTests" method="ValidateAgainstBlobs" time="0.0143476" result="Pass" />

Also, I ran a Kusto query and the last failure was on January 27th. From digging into libgdiplus, it seems like there were some Bitmap fixes there, which could be the reason why it is not failing anymore.

Closing and if we see the failure again we can reopen.

@safern safern closed this as completed Feb 26, 2019
@msftgits msftgits transferred this issue from dotnet/corefx Feb 1, 2020
@msftgits msftgits added this to the 3.0 milestone Feb 1, 2020
@janvorli
Member

@safern, it now seems to be reproducing on the "Libraries Test Run release coreclr Linux x64 Debug" leg. It has failed on my PR with this exact issue on x64 Ubuntu 18.04 several times (I've tried restarting the leg and keep getting it):
https://dev.azure.com/dnceng/public/_build/results?buildId=579375&view=results

@janvorli janvorli reopened this Mar 30, 2020
@janvorli
Member

There are similar failures in other tests.
System.Runtime.Serialization.Formatters.Tests.BinaryFormatterTests.RoundtripManyObjectsInOneStream:

Assert.Equal() Failure
Expected: 19977332
Actual: 19978652

System.Runtime.Serialization.Formatters.Tests.BinaryFormatterTests.ValidateBasicObjectsRoundtrip:

Assert.Equal() Failure
Expected: 93535970
Actual: 93537180

@jaredpar
Member

jaredpar commented Mar 30, 2020

This hasn't hit a rolling build yet but it is starting to show up in a number of PRs.

System.Runtime.Serialization.Formatters.Tests Work Item

Console Log Summary

Builds

| Build | Pull Request | Test Failure Count |
| --- | --- | --- |
| #579294 | #32592 | 2 |
| #579319 | #34261 | 2 |
| #579375 | #34154 | 2 |
| #579412 | #34263 | 1 |
| #579428 | #32592 | 2 |
| #579508 | #34064 | 1 |
| #579536 | #34275 | 1 |
| #579539 | #34275 | 1 |
| #579587 | #34225 | 1 |
| #579589 | #34166 | 1 |
| #579596 | #34046 | 1 |
| #579629 | #34022 | 1 |
| #579675 | #34086 | 1 |
| #579684 | #32592 | 2 |
| #579750 | #34249 | 1 |

Configurations

  • netcoreapp5.0-Linux-Debug-x64-CoreCLR_release-Ubuntu.1804.Amd64.Open
  • netcoreapp5.0-Windows_NT-Debug-x64-Mono_release-(Windows.Nano.1809.Amd64.Open)windows.10.amd64.serverrs5.open@mcr.microsoft.com/dotnet-buildtools/prereqs:nanoserver-1809-helix-amd64-08e8e40-20200107182504

Helix Logs

| Build | Pull Request | Console | Core | Test Results | Run Client |
| --- | --- | --- | --- | --- | --- |
| #579294 | #32592 | console.log | | testResults.xml | run_client.py |
| #579294 | #32592 | console.log | | | run_client.py |
| #579319 | #34261 | console.log | | testResults.xml | run_client.py |
| #579319 | #34261 | console.log | | testResults.xml | run_client.py |
| #579375 | #34154 | console.log | | testResults.xml | run_client.py |
| #579375 | #34154 | console.log | | testResults.xml | run_client.py |
| #579412 | #34263 | console.log | | testResults.xml | run_client.py |
| #579428 | #32592 | console.log | | testResults.xml | run_client.py |
| #579428 | #32592 | console.log | | | run_client.py |
| #579508 | #34064 | console.log | | testResults.xml | run_client.py |
| #579536 | #34275 | console.log | | testResults.xml | run_client.py |
| #579539 | #34275 | console.log | | testResults.xml | run_client.py |
| #579587 | #34225 | console.log | | testResults.xml | run_client.py |
| #579589 | #34166 | console.log | | testResults.xml | run_client.py |
| #579596 | #34046 | console.log | | testResults.xml | run_client.py |
| #579629 | #34022 | console.log | | testResults.xml | run_client.py |
| #579675 | #34086 | console.log | | testResults.xml | run_client.py |
| #579684 | #32592 | console.log | | testResults.xml | run_client.py |
| #579684 | #32592 | console.log | | | run_client.py |
| #579750 | #34249 | console.log | | testResults.xml | run_client.py |

runfo tests -d runtime -c 100 -pr -n "System.Runtime.Serialization.Formatters.Tests Work Item" -m -e 579185

Excluded 579185 from data because it appears to be a legitimate failing PR

@jaredpar jaredpar added the blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' label Mar 30, 2020
@safern
Member

safern commented Mar 30, 2020

> This hasn't hit a rolling build yet but it is starting to show up in a number of PRs.

Some of the PRs in that data aren't hitting this issue. For example, in PR #34166, build https://dev.azure.com/dnceng/public/_build/results?buildId=579185&view=ms.vss-test-web.build-test-results-tab, all work items are crashing because of the changes in the PR itself.

> @safern it now seems to be reproducing on the "Libraries Test Run release coreclr Linux x64 Debug" leg.

@joperezr and I are looking at it at the moment, trying to repro locally.

@safern
Member

safern commented Mar 30, 2020

OK, so between @MattGal, @joperezr, and me we were able to root-cause the issue. Here's a summary of our investigation:

First, I thought it was a libgdiplus version issue because on my local Ubuntu 18.04 I was running against libgdiplus 6.0.4 (latest) and it didn't repro. When I rolled back to the inbox version, it started failing consistently. We then talked to @MattGal to see if something had changed on the machines, and it hasn't (however, they use libgdiplus 4.2, so they should definitely be updated to the latest). Since nothing changed in how the machines are set up, we looked at the build history and found this change: #34251. It seems suspicious because in this specific scenario we build a Bitmap from a Stream and save it into a Stream, which uses managed delegates passed down via P/Invoke to libgdiplus, and libgdiplus calls back into them.
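
To make that concrete, here is a generic sketch of the interop pattern described above. This is not the actual System.Drawing/libgdiplus code; the native export, delegate signature, and names below are hypothetical and only illustrate how a managed delegate handed to native code via P/Invoke gets called back while streaming image bytes:

```cs
// Generic illustration only: GdipLoadImageFromCallback and GetBytesCallback are
// hypothetical names, not the real GDI+ flat API or System.Drawing interop surface.
using System;
using System.Runtime.InteropServices;

internal static class NativeImageInteropSketch
{
    // Native code invokes this callback to pull bytes from the managed Stream.
    [UnmanagedFunctionPointer(CallingConvention.Cdecl)]
    internal delegate int GetBytesCallback(IntPtr buffer, int count);

    // Hypothetical native export standing in for the real libgdiplus entry point.
    [DllImport("libgdiplus", CallingConvention = CallingConvention.Cdecl)]
    internal static extern int GdipLoadImageFromCallback(GetBytesCallback getBytes, out IntPtr image);

    internal static IntPtr LoadImage(System.IO.Stream source)
    {
        // The delegate must stay alive (and its reverse-P/Invoke thunk valid)
        // for as long as native code may call back into it.
        GetBytesCallback callback = (buffer, count) =>
        {
            var managed = new byte[count];
            int read = source.Read(managed, 0, count);
            Marshal.Copy(managed, 0, buffer, read);
            return read;
        };

        int status = GdipLoadImageFromCallback(callback, out IntPtr image);
        GC.KeepAlive(callback); // keep the delegate reachable until the native call returns
        if (status != 0) throw new InvalidOperationException($"GDI+ status {status}");
        return image;
    }
}
```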

So I reverted the change, and after 10 runs on my local machine it didn't repro at all on either of the two libgdiplus versions above.

@jkotas, even though it only repros on an old libgdiplus, I think it is worth investigating why it started failing and whether there is a bug in the runtime.

We have two options to unblock PRs: either revert #34251, or condition this test data to run only on Windows or when the libgdiplus version is > 6. @jkotas, which one do you prefer?
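
A rough sketch of what the second option could look like follows; the class and the version probe are hypothetical (not the actual corefx test infrastructure), and the Windows check assumes .NET 5's OperatingSystem.IsWindows(). It only shows the idea of gating the Bitmap test data on platform and libgdiplus version:

```cs
// Hedged sketch: names and the version probe are hypothetical, not the real test code.
using System;
using System.Collections.Generic;
using System.Drawing;

public static class BinaryFormatterTestDataSketch
{
    // Hypothetical probe; a real fix would need a reliable way to query the installed libgdiplus version.
    private static int InstalledLibgdiplusMajorVersion() => 6;

    private static bool SupportsBitmapBlobValidation =>
        OperatingSystem.IsWindows() || InstalledLibgdiplusMajorVersion() >= 6;

    public static IEnumerable<object[]> SerializableObjects()
    {
        if (SupportsBitmapBlobValidation)
        {
            // Only include the Bitmap cases where the platform serializes them consistently.
            yield return new object[] { new Bitmap(100, 100) };
        }
        // ...other, platform-independent test objects would follow here.
    }
}
```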

I'll follow up on updating the machines to use the latest libgdiplus, as we should do that anyway, and once that happens, enable in other Linux distros the check we already have for macOS that verifies we run on libgdiplus > 6.

Note that this only repros on PRs because in CI we run these tests against a Release framework, while in PRs we run against Debug. Also, your PR didn't catch it because it only changed CoreCLR, so we only ran tests against a checked runtime, and this test is skipped on a checked runtime because it is very slow.

@safern
Member

safern commented Mar 30, 2020

I ended up putting up the revert PR in the meantime: #34306.
