
Test results depend on Mac device #109

Open
navartis opened this issue Oct 29, 2019 · 21 comments

@navartis

After updating macOS to Catalina 10.15 and Xcode to 11.1 (11A1027),

I found that tests run on another Mac device fail against reference images saved on my Mac device.

We're using the latest stable version of the library, 6.2.0.

I spent two days trying to find the cause and fix it, implemented a "Highlight Different Pixels" feature, and created request #108, which can help investigate situations like this.

But at this point I have no more ideas for how to fix this issue.

[Screenshots attached: "Diffed Image vs Highlighted Pixels", plus the failed image and the reference image.]

@MaikCL

MaikCL commented Nov 19, 2019

The same thing happens to me. I recorded all my images on Catalina, but when I run the same tests in a Travis environment with macOS 10.14, they fail on the diffs, as if they were shadows with different contrast.

@mime29

mime29 commented Dec 18, 2019

Same issue here. Any idea about the reason?

@navartis
Author

It seems to me the reason is that in Xcode 11.x Apple migrated simulator rendering from the CPU to the GPU.

GPU calculations are faster but less exact, and the result may depend on the GPU hardware vendor and drivers.

I spent a lot of time on this, but the only working workaround for me is passing perPixelTolerance: 6/256 to FBSnapshotVerifyView.

That, and this request: #108.
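
For anyone who wants to try this, here is roughly what the workaround looks like in a test case. This is a minimal sketch assuming the Swift `FBSnapshotVerifyView` API from iOSSnapshotTestCase 6.x; `MyView` is a hypothetical view under test.

```swift
import FBSnapshotTestCase
import UIKit

final class MyViewSnapshotTests: FBSnapshotTestCase {
    override func setUp() {
        super.setUp()
        recordMode = false // flip to true once to (re)record reference images
    }

    func testMyView() {
        let view = MyView() // hypothetical view under test
        view.frame = CGRect(x: 0, y: 0, width: 375, height: 200)
        // 6/256 ≈ 0.023: both literals are inferred as CGFloat here,
        // so this is floating-point division, not integer division.
        FBSnapshotVerifyView(view, perPixelTolerance: 6 / 256)
    }
}
```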

@luiz-brasil

luiz-brasil commented Mar 16, 2020

> It seems to me the reason is that in Xcode 11.x Apple migrated simulator rendering from the CPU to the GPU.
>
> GPU calculations are faster but less exact, and the result may depend on the GPU hardware vendor and drivers.
>
> I spent a lot of time on this, but the only working workaround for me is passing perPixelTolerance: 6/256 to FBSnapshotVerifyView.
>
> That, and this request: #108.

@navartis Why are you using the number 6/256 for perPixelTolerance?

@mime29

mime29 commented Mar 19, 2020

We haven't found a good solution on our side yet, but we noticed one thing: the random diff results seem to come from views containing transparent assets (images whose alpha channel is not set to 1).
Do you observe the same thing on your side?

@navartis
Author

> It seems to me the reason is that in Xcode 11.x Apple migrated simulator rendering from the CPU to the GPU. GPU calculations are faster but less exact, and the result may depend on the GPU hardware vendor and drivers. I spent a lot of time on this, but the only working workaround for me is passing perPixelTolerance: 6/256 to FBSnapshotVerifyView.

> @navartis Why are you using the number 6/256 for perPixelTolerance?

@luiz-brasil Unfortunately, 6/256 is a magic number. I hate magic numbers in code, but in this case I have no better ideas.

@samskiter

samskiter commented Apr 8, 2020

We're also seeing this issue. Interestingly, it seems to occur only when using usesDrawViewHierarchyInRect or when snapshotting during a UI test (not a unit test).

It seems it could be related to the view being rendered in a UIWindow.

I also got the same results between two Mac Pros (i.e. identical machines produce the same result).

Could this be graphics-card related, and is it time to raise a Radar?

@pigeon56

I'm so glad that I stumbled across this thread, because I think I'm seeing the same thing.

About a week ago, I started noticing failures in some of my iOS Snapshot tests. These tests passed locally - both when run via Xcode and Fastlane - but failed when run on CI. I'm still trying to figure out which OS our build machine is using (though I suspect it might be 10.14.x), but it just so happens that I updated my machine to Catalina last week, and now I'm getting these failures.

I was able to capture the diff screenshots after running the tests on CI, and they're gray. I asked a designer to take one and bump up the contrast, and only with a massive bump in contrast were we able to see the slightest traces of some screen elements. (And I really had to squint to see them, even after the adjustment.)

Seems like the workaround at the moment is to bump up the tolerance. Has anyone found a different solution?

@samskiter

Just to confirm - is everyone else seeing that this not only happens between 10.14 and 10.15 (which is normal for any upgrade) but also between machines that are running the same version of macOS Catalina (10.15)?

Finally, has anyone got a simple test case that could be sent to Apple?

@luiz-brasil

@samskiter unfortunately I don't have a specific test case, primarily because these errors occur randomly.

I can confirm that the error occurs between different machines running the same version of macOS Catalina.

@gspiers

gspiers commented May 5, 2020

@samskiter Results seem to be GPU dependent. In the simulator you can get different results by switching between the integrated and discrete GPUs via File -> GPU Selection. I've done a lot of investigation, and the only fix was to increase the tolerance of the snapshots.

One of the Apple engineers who works on the simulator said the way forward is to increase the pixel-matching tolerance (conversation on Twitter). Hope that helps.

@navartis
Author

navartis commented May 5, 2020

😢

@samskiter

samskiter commented May 5, 2020 via email

@JustinDSN

JustinDSN commented May 6, 2020

> @samskiter Results seem to be GPU dependent. In the simulator you can get different results by switching between the integrated and discrete GPUs via File -> GPU Selection. I've done a lot of investigation, and the only fix was to increase the tolerance of the snapshots.
>
> One of the Apple engineers who works on the simulator said the way forward is to increase the pixel-matching tolerance (conversation on Twitter). Hope that helps.

Because of this issue, what values are teams using for perPixelTolerance and overallTolerance?

@pigeon56

pigeon56 commented May 6, 2020

I'm running a little experiment on a few tests. I'll have to follow up tomorrow to let you know if it works for all of my tests, but so far I'm having luck with it.

For almost all screens:

    FBSnapshotVerifyView(UIImageView(image: croppedImage), identifier: identifier, perPixelTolerance: 6/256) // tried this number after seeing it higher up in this thread

This fixed most of my failures, but screens that use transparent elements still fail. For screens with transparent elements (where alpha < 1), this is actually passing for me:

    FBSnapshotVerifyView(UIImageView(image: croppedImage), identifier: identifier, overallTolerance: 0.02)

Again, these successes are based on a very small sample of tests that are currently passing. Nearly all screens are verified with the per-pixel value, as we don't use a whole lot of transparency. I'll update this post tomorrow when I know whether or not it's a success.
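
If it helps, the two strategies above can be wrapped in a single helper so the magic numbers live in one place. This is only a sketch: `verifySnapshot` and the `hasTransparency` flag are hypothetical names, and the tolerance values are the ones reported in this thread, not validated constants.

```swift
import FBSnapshotTestCase
import UIKit

extension FBSnapshotTestCase {
    /// Hypothetical helper wrapping the tolerances reported in this thread.
    /// Screens with alpha-blended content diff more between GPUs, so they get
    /// an overall tolerance; everything else uses the stricter per-pixel one.
    func verifySnapshot(
        _ view: UIView,
        identifier: String = "",
        hasTransparency: Bool = false,
        file: StaticString = #file,
        line: UInt = #line
    ) {
        if hasTransparency {
            FBSnapshotVerifyView(view, identifier: identifier, overallTolerance: 0.02, file: file, line: line)
        } else {
            FBSnapshotVerifyView(view, identifier: identifier, perPixelTolerance: 6 / 256, file: file, line: line)
        }
    }
}
```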

@pigeon56

pigeon56 commented May 6, 2020

> I'm running a little experiment on a few tests. [...] For almost all screens: FBSnapshotVerifyView(..., perPixelTolerance: 6/256). [...] For screens with transparent elements (where alpha < 1), this is actually passing for me: FBSnapshotVerifyView(..., overallTolerance: 0.02). [...]

@JustinDSN Unfortunately, I'm going to have to spend more time on this until I know exactly which values are going to work for us.

The per-pixel value of 6/256 is working for most screens, but the overall tolerance is giving me problems. It seems like screens that incorporate transparency or use the fullscreen modal that was new to iOS 13 must be compared using overall tolerance. I've seen the tests pass locally with an overall tolerance value as low as 0.02, but when I run the tests on CI, even 0.25 isn't enough for one particular screen. (For the others, it's fine.) I'm trying a value of 0.30 on CI right now to see what happens. Even if that passes, it's way too high a threshold to give me any sort of confidence that this is an effective and worthwhile comparison. (Update: it failed at 0.30 too. Just for that particular screen.)

To make things a little more fun, I do have a screen that definitely uses transparency and passes perfectly with a per pixel comparison. It's nearly identical to that one screen that I can't get a passing screenshot comparison on using per pixel or overall tolerance. (Mentioned in the previous paragraph.) Our CI build machine uses an older version of Xcode than I do - which isn't ideal, of course - so maybe that's part of it?

I'm still working with the same small sampling of tests. I'll scale this and update this thread again as soon as I figure out the floor of this overall tolerance threshold and apply it to more tests.

@samskiter

OOI, does going up from 6/256 also fix your issues without having to use an overall tolerance?

@pigeon56

pigeon56 commented May 13, 2020

@samskiter It fixed the problem in some places, but not all. Also, my results are based on images that I took from my machine as the reference images. I don't know if the results would change if a colleague had recently uploaded the reference image.

Despite some successes with the per-pixel screen comparison, I'm just not confident in these tests anymore. I've already spent too much time trying to salvage them, and it's just not sustainable. This is a huge letdown because they were really valuable to our organization. In fact, we caught a bug yesterday morning in a screen where the comparison is still working.

Short answer to your question: no. At the very least, screens where some form of transparency is used are definitely going to give you problems with a per-pixel tolerance measurement.

@samskiter

That's frustrating. We also depend heavily on these!

@tirodkar

Were folks able to find any solution or workaround for this?

@david6p2

david6p2 commented Aug 4, 2021

No solution to this problem yet? Just add perPixelTolerance: 6/256 wherever transparency, blur, etc. are used? That didn't work for me.

NickEntin added a commit to cashapp/AccessibilitySnapshot that referenced this issue Aug 15, 2023
This adds variants of the `SnapshotVerify*(...)` methods that allow for imprecise comparisons, i.e. using the `perPixelTolerance` and `overallTolerance` parameters.

 ## Why is this necessary?

Adding tolerances has been a highly requested feature (see #63) to work around some simulator changes introduced in iOS 13. Historically the simulator has supported CPU-based rendering, giving us very stable image representations of views that we can compare pixel-by-pixel. Unfortunately, with iOS 13, Apple changed the simulator to use exclusively GPU-based rendering, which means that the resulting snapshots may differ slightly across machines (see uber/ios-snapshot-test-case#109).

The negative effects of this were mitigated in iOSSnapshotTestCase by adding two tolerances to snapshot comparisons: a **per-pixel tolerance** that controls how close in color two pixels need to be to count as unchanged and an **overall tolerance** that controls what portion of pixels between two images need to be the same (based on the per-pixel calculation) for the images to be considered unchanged. Setting these tolerances to non-zero values enables engineers to record tests on one machine and run them on another (e.g. record new reference images on their laptop and then run tests on CI) without worrying about the tests failing due to differences in GPU rendering. This is great in theory, but from our testing we've found that even the lowest tolerance values that consistently handle GPU differences between machine types let through a significant number of visual regressions. In other words, there is no magic tolerance threshold that avoids false negatives based on GPU rendering and also avoids false positives based on minor visual regressions.

This is especially true for accessibility snapshots. To start, tolerances seem to be more reliable when applied to relatively small snapshot images, but accessibility snapshots tend to be fairly large since they include both the view and the legend. Additionally, the text in the legend can change meaningfully and reflect only a small number of pixel changes. For example, I ran a test of a full-screen snapshot on an iPhone 12 Pro with two columns of legend. Even an overall tolerance of only `0.0001` (0.01%) was enough to let through a regression where one of the elements lost its `.link` trait (represented by the text "Link." appended to the element's description in the snapshot). But this low a tolerance _wasn't_ enough to handle the GPU rendering differences between a MacBook Pro and a Mac Mini. This is a simplified example since it only uses `overallTolerance`, not `perPixelTolerance`, but we've found many similar situations arise even with the combination.

Some teams have developed infrastructure to allow snapshots to run on the same hardware consistently and have built a developer process around that infrastructure, but many others have accepted tolerances as a necessity today.

 ## Why create separate "imprecise" variants?

The simplest approach to adding tolerances would be adding the `perPixelTolerance` and `overallTolerance` parameters to the existing snapshot methods; however, I feel adding separate methods with an "imprecise" prefix is better in the long run. The naming is motivated by the idea that **it needs to be very obvious when what you're doing might result in unexpected/undesirable behavior**. In other words, when using one of the core snapshot methods, you should have extremely high confidence that a passing test means there are no regressions. When you use an "imprecise" variant, it's up to you to set your confidence levels according to your chosen tolerances. This is similar to the "unsafe" terminology around memory in the Swift API. You should generally feel very confident in the memory safety of your code, but any time you see "unsafe" it's a sign to be extra careful and not gather unwarranted confidence from the compiler.

Longer term, I'm hopeful we can find alternative comparison algorithms that allow for GPU rendering differences without opening the door to regressions. We can integrate these into the core snapshot methods as long as they do not introduce opportunities for regressions, or add additional comparison variants to iterate on different approaches.
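
As a mental model for what those two tolerances do, the comparison can be sketched roughly as follows. This is illustrative only, not the library's actual implementation; it assumes both images are the same size and given as interleaved 8-bit RGBA bytes, and `imagesMatch` is a hypothetical name.

```swift
import CoreGraphics

/// Illustrative two-tolerance comparison:
/// - perPixelTolerance: max per-channel difference (0...1) for a pixel to count as unchanged.
/// - overallTolerance: max fraction (0...1) of changed pixels for the images to count as matching.
func imagesMatch(reference: [UInt8], test: [UInt8],
                 perPixelTolerance: CGFloat, overallTolerance: CGFloat) -> Bool {
    precondition(reference.count == test.count && reference.count % 4 == 0)
    let pixelCount = reference.count / 4
    guard pixelCount > 0 else { return true }
    let channelTolerance = UInt8(perPixelTolerance * 255)
    var mismatched = 0
    for p in 0..<pixelCount {
        for c in 0..<4 {
            let i = p * 4 + c
            let delta = reference[i] > test[i] ? reference[i] - test[i] : test[i] - reference[i]
            if delta > channelTolerance {
                mismatched += 1
                break // this pixel already differs; move on to the next pixel
            }
        }
    }
    return CGFloat(mismatched) / CGFloat(pixelCount) <= overallTolerance
}
```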