
Question: how are people storing their snapshot images? #92

Closed
itaylor opened this issue Jul 26, 2018 · 18 comments

@itaylor

itaylor commented Jul 26, 2018

I've been using jest-image-snapshot for a while, and I started out storing my snapshots as checked in assets in git, just like I do with my JS jest snapshots. This works great.

However, now that I've got a few hundred snapshot images and they all get updated semi-frequently as UI changes in the app occur, my Git repo is starting to get pretty huge. The historic screenshot files that are in the git repo going back to my first commits aren't really very useful, but they're taking up a bunch of space.

I'm guessing other people must be dealing with this issue too. How are you solving it? Do you use some script to store the snapshots externally from your git repo and then .gitignore them? Do you use git filter-branch to rip out the old snapshots on a periodic schedule? Or do you have some other smart way to manage these large binary files?

@kgoedecke

kgoedecke commented Jul 27, 2018

I guess you could store them on S3 and set customSnapshotsDir to the S3 bucket URL. I haven't tried this, but maybe it actually works.

@itaylor
Author

itaylor commented Aug 3, 2018

I've not heard back from many of you, so I've been thinking about this some more, as it seems like it might be a problem that hasn't been solved yet. 😄

Here's an idea that would address this problem.
Right now we do a pixelmatch for each image, comparing the stored snapshot image which lives in the file system (presumably checked in to source control) with the new image.

What if, instead of storing the whole image in the snapshot, we stored a SHA hash of the image as the snapshot, and uploaded the actual snapshot image to some file store (like S3 or something like it) with the hash as the filename of the image?

When comparing images, we would take the hash of the new image. If it matches the stored hash, the image is the same, and the assertion passes. If the hashes don't match, we download the old image (we know its filename because we know its hash) and run pixelmatch to generate the difference image.

The functionality would be pretty much the same, but the snapshots we'd store in our source control systems would just be the hashes, taking up very little space, and the snapshot files themselves could then be managed and deleted however needed, outside of source control.

Any thoughts about this idea?

@kgoedecke

kgoedecke commented Aug 3, 2018

@itaylor I see a few problems with that idea. Currently you can set a threshold for the image comparison, which would not work with the hash solution. On top of that, even if you take a screenshot and the content is the same, I highly doubt the image will be exactly the same, and especially that it will have the same hash! But as I said before, I think we can totally integrate S3 as the primary store for the screenshots. I would definitely help implement it. What do you think?

@itaylor
Author

itaylor commented Aug 3, 2018

I was thinking you'd still be able to do the threshold check on the pixelmatch data; you'd just have to pull the image from the external file store first (and only when the new image doesn't match the stored hash). I checked, and for the way I'm using this library, with puppeteer running inside of Docker, the raw image data is exactly the same from one screenshot to another, so the hash would be the same if there were no changes.

@kgoedecke

@itaylor I’d be curious to get the opinion of some other folks here on that topic. To me it still sounds quite error prone to do this using a hash.

@itaylor
Author

itaylor commented Aug 6, 2018

@kgoedecke So, let me understand better what you're proposing. To me, it sounds like your proposal is that we'd add some config that allows the snapshot images to be persisted in an S3 bucket. When I've made some changes and want to run the tests, I'd run jest, and it would go out and download everything in that bucket and then use those as the snapshots, with the exact same diffing mechanism the current code uses. If I instead ran jest -u, when the run finished it would push the modified images to the S3 bucket, wiping out whatever was there previously.

Is that a mostly correct high-level summary of your proposal?

If it is, then I think it's missing some things that I'd need in order to use it. I'm using this in a large project, with many branches and many developers. I need multiple people to be able to run the tests against a shared source of truth for what the correct images are.

In my mind, this necessitates checking something into the filesystem alongside the code: some sort of manifest that describes which images are the "correct" snapshots to compare against. My idea is to use a hash of the image, as that could potentially avoid even needing to download the image (if it is the same), but the idea could work without a hash, so long as the checked-in part of the snapshot is the manifest of the images, and the images themselves live somewhere else that can be shared between a large number of developers.

@anescobar1991
Member

So this would require a user to be connected to the internet in order to use the matcher?

@itaylor
Author

itaylor commented Aug 7, 2018

@anescobar1991 Yes, any of the permutations I've described above would require an internet connection to compare against snapshots. That said, I wouldn't suggest this be the only option. For a smallish project without very many screenshots, just keeping the files in the file system and checking them in with the code works fine, and I think that should continue to work.

@bburns

bburns commented Aug 12, 2018

@itaylor I think using an image hash is a great idea - it would be nice to have it as an option anyway. It would flag anything that had changed so you could inspect the image manually, without needing to store images in .git or elsewhere.

@kgoedecke

@itaylor if your screenshots really do have the same hash, then it's probably not a bad idea. Do you mind sharing how you currently take the screenshots with Docker? I'm particularly interested in how you do it locally across different developers.

@itaylor
Author

itaylor commented Aug 17, 2018

@kgoedecke, Yes, I'd be glad to share.

We're using puppeteer and jest-image-snapshot in an integration testing workflow.
There are a few pieces to it:

  • A docker-compose file that runs the whole application and all its dependencies. This is the same setup that developers use to run the application for development purposes, and doesn't include the integration tests.
  • A separate repository for the integration tests, with a Dockerfile in its root. The Dockerfile started from the guide in the puppeteer docs, but I removed the puppeteer user-management part, as it caused file permission problems with mounted files; instead I launch puppeteer with the --no-sandbox flag. It also uses a different entrypoint, one that I've created, which runs yarn && yarn run test.
  • The tests themselves are just Jest tests, basically configured per the Jest docs, with one important change to how puppeteer is launched: it disables some Chrome features that don't play well with Docker and sets the window size to constant values:
browser = await puppeteer.launch({
  ignoreHTTPSErrors: true,
  args: [
    '--no-sandbox',
    '--disable-dev-shm-usage',
    '--disable-setuid-sandbox',
    `--window-size=${width},${height}`,
  ],
});
  • A script that removes any existing database containers (so they start from their clean container image), starts the application via docker-compose, waits for the started containers to be healthy, then does a docker run -it on the integration-test image with a volume mount to the local test directory. Once everything's started, this ends up running the yarn run test command, which starts Jest, which runs the tests inside the Docker container, using the headless Chrome running inside that same container.

We run this same setup locally and on our integration test server running in the cloud, and given the same input, it produces the same screenshot output consistently, whether running on developer's Macs using Docker for Mac, or native Linux on our integration test VMs.

Other considerations...

  • Dates/times in the application:
    If your application displays dates and times, the screenshots will be different on every run. We run all of our date/time formatting through a few utility functions. These functions have a mode, controlled when starting the application, that makes them print the date format string instead of formatting the date, allowing screenshots taken at different times to still match pixel for pixel.
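A minimal sketch of that trick might look like this; the function name, the SCREENSHOT_MODE environment variable, and the ISO fallback are all illustrative assumptions, not code from the project:

```javascript
// Hypothetical date-formatting utility with a screenshot mode. When
// SCREENSHOT_MODE is enabled, every date renders as its literal format
// string, so screenshots taken at different times stay pixel-identical.
function formatDate(date, formatString) {
  if (process.env.SCREENSHOT_MODE === 'true') {
    // Deterministic output: show the pattern itself, e.g. "YYYY-MM-DD".
    return formatString;
  }
  // Real formatting would go here (e.g. via date-fns or moment); a trivial
  // ISO-date fallback keeps this sketch self-contained.
  return date.toISOString().slice(0, 10);
}
```

The mode is a process-level switch, so the app under test can be started with it once and every rendered date becomes stable.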

@10xLaCroixDrinker
Member

@itaylor have you considered using LFS to manage your image snapshots?

@kgoedecke

@itaylor thanks a lot for sharing this, I actually ran into the same problem that you describe in your 2nd bullet point. Seems like I need to improve that step with your guidelines now 👍 🙇

@itaylor
Author

itaylor commented Aug 19, 2018

@10xLaCroixDrinker Thanks for the tip on LFS!

I've been using Git daily for about a decade, but somehow I hadn't heard of LFS. After a cursory look at the docs, it seems like it should completely solve my issues with effectively storing large numbers of large screenshot files in the Git repo!

It seems to be basically the same idea I suggested: using the hash as the filename, uploading the file to a file store, and then just storing the hash in the repo, except it's built into Git and has been fully supported by major Git hosts like GitHub/GitLab/Bitbucket for a few years now 😃. I'll give it a try sometime over the next week or so and report back if it works for me. If it does work well, then maybe we could just add a couple of lines to the Readme.md suggesting it and close this ticket.

@itaylor
Author

itaylor commented Aug 27, 2018

I've done a POC with Git LFS and it works well. Unfortunately, the company I work at uses GitLab, and we're not on the newest version, which has incompatibilities with the latest Git LFS releases. I was able to work around this by using an older Git LFS client (2.4.2 instead of the latest 2.5.1). After jumping that hurdle, using Git LFS was pretty much seamless. My screenshots are no longer stored inside the repository, but they're still tracked there, and when I git checkout a branch it fetches them on demand. When I get a chance I'll make a PR to this project's docs recommending Git LFS for those concerned with large repositories.

@mattrabe

mattrabe commented May 17, 2019

@itaylor Were you ever able to follow up with a documentation PR? I've just implemented this solution and it seems to be working (yay!). I don't see it in the docs, though... I'd be happy to create a doc PR unless it's already been done and I just can't find it.

Some useful information:

Important info for husky users trying to integrate Git LFS hooks: typicode/husky#108 (comment)

Note re: .gitattributes for binary image files (thx @anescobar1991) plus LFS integration: add this to your .gitattributes, replacing **/__image_snapshots__/*.* with a pattern that matches all of your image snapshots:

**/__image_snapshots__/*.* binary
**/__image_snapshots__/*.* filter=lfs diff=lfs merge=lfs -text

@anescobar1991 Would you be open to a doc PR that points people in the direction of using LFS if needed?

@10xLaCroixDrinker
Member

@mattrabe that would be great

@ifiokjr

ifiokjr commented Jul 4, 2019

In case anyone has set up LFS but is struggling to get the tests to pass on Azure Pipelines: you will need to add the following step (preferably at the beginning of your pipeline).

steps:
- checkout: self  # self represents the repo where the initial Pipelines YAML file was found
  lfs: true # This enables lfs for the job

See here for more information.

By default, LFS is switched off, so without this change you will see the following error messages:

Error running image diff.

and

Error: Invalid file signature

7 participants