Question: how are people storing their snapshot images? #92
Comments
I guess you could store them on S3 and set |
I've not heard back from many of you, so I've been thinking about this some more, as it seems like it might be a problem that hasn't been solved yet. 😄 Here's an idea that would address this problem. What if, instead of storing the whole image in the snapshot, we stored a SHA hash of the image as the snapshot, and uploaded the actual snapshot image to some file store (like S3 or something like it) with the hash as the filename of the image? Then, when comparing images, we would take the hash of the new image. If the hash matches, the image is the same and the assertion passes. If the hashes don't match, we go and download the old image (we know its file name because we know its hash) and run pixelmatch to generate the difference image. The functionality would be pretty much the same, but the snapshots we'd store in our source control systems would just be the hashes, taking up very little space, and the snapshot files themselves could then be managed and deleted however needed, outside of source control. Any thoughts about this idea? |
@itaylor I see a few problems with that idea. Currently you can set a threshold for the image comparison, which would not work with the hash solution. On top of that, even if you take a screenshot and the content is the same, I highly doubt the image will be exactly the same, and especially that it will have the same hash! But as I said before, I think we can totally integrate S3 as the primary store for the screenshots. I would definitely help to implement it. What do you think? |
I was thinking you'd still be able to do the threshold check on the pixelmatch data; you'd just have to pull the image from the external file store first (but only if the hashes don't match). I checked, and for the way I'm using this library, with puppeteer running inside of Docker, the raw image data is exactly the same from one screenshot to another, and the hash would be the same if there were no changes. |
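The hash-first comparison described above can be sketched roughly as follows. This is a minimal illustration, not jest-image-snapshot's actual behavior: `compareToSnapshot`, `fetchImage`, and `diffRatio` are hypothetical names, with `fetchImage` standing in for a download from a store such as S3 and `diffRatio` standing in for a pixelmatch-based comparison that returns the fraction of differing pixels.

```javascript
const { createHash } = require('node:crypto');

// Hex SHA-256 of the raw image bytes. In this sketch the hash doubles as
// the checked-in snapshot value and as the file name in the remote store.
function hashImage(imageBuffer) {
  return createHash('sha256').update(imageBuffer).digest('hex');
}

// Hypothetical, caller-supplied functions (assumptions, not a real API):
//   fetchImage(hash)           -> Promise<Buffer>, e.g. a download from S3
//   diffRatio(oldBuf, newBuf)  -> number in [0, 1], e.g. built on pixelmatch
async function compareToSnapshot({ newImage, storedHash, fetchImage, diffRatio, threshold = 0 }) {
  const newHash = hashImage(newImage);

  // Fast path: identical bytes, so no download and no pixel comparison.
  if (newHash === storedHash) {
    return { pass: true, downloaded: false, newHash };
  }

  // Slow path: fetch the old image by its hash and apply the threshold.
  const oldImage = await fetchImage(storedHash);
  const ratio = diffRatio(oldImage, newImage);
  return { pass: ratio <= threshold, downloaded: true, ratio, newHash };
}
```

Note that the fast path only helps when screenshots are byte-for-byte reproducible, which is the concern raised earlier in the thread; when they are not, every comparison falls through to the download and the threshold check.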
@itaylor I’d be curious to get the opinion of some other folks here on that topic. To me it still sounds quite error prone to do this using a hash. |
@kgoedecke So, let me understand better what you're proposing. To me, it sounds like your proposal is that we'd add some config that allows the snapshot images to be persisted in an S3 bucket. When I've made some changes and want to run the tests, I'd run Is that a mostly correct high-level summary of your proposal? If it is, then I think it's missing some things that I'd need to be able to use it. I'm using this in a large project, with many branches and many developers. I need to be able to have multiple people running the code and a shared source of truth for what the correct images are. In my mind, this necessitates there being something written into the filesystem and checked in along with the code: some sort of manifest that describes which images are the "correct" snapshots to compare against. My idea is to use a hash of the image, as that could potentially avoid even needing to download the image (if it is the same), but the idea could work without a hash so long as the checked-in part of the snapshot is the manifest of the images, and the images live somewhere else that can be shared between a large number of developers. |
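One way to picture the checked-in manifest mentioned above is a small mapping from snapshot name to image hash. The sketch below is only an illustration of that idea; `buildManifest`, `changedSnapshots`, and the manifest shape are assumptions made for this example and are not part of jest-image-snapshot.

```javascript
const { createHash } = require('node:crypto');

// Build a manifest mapping snapshot names to the SHA-256 of their image bytes.
// `images` is a plain object of { name: Buffer }. Keys are sorted so the
// serialized manifest is stable and diffs cleanly in source control.
function buildManifest(images) {
  const manifest = {};
  for (const name of Object.keys(images).sort()) {
    manifest[name] = createHash('sha256').update(images[name]).digest('hex');
  }
  return manifest;
}

// List snapshot names whose hash differs from, or is missing in,
// the manifest that is checked in alongside the code.
function changedSnapshots(checkedIn, current) {
  return Object.keys(current).filter((name) => checkedIn[name] !== current[name]);
}
```

With something like this, only the manifest (for example, a JSON file produced with `JSON.stringify(buildManifest(images), null, 2)`) would live in the repository, while the image files themselves live in shared storage.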
So this would require a user to be connected to the internet in order to use the matcher? |
@anescobar1991 Yes, any of the various permutations I've described above would require an internet connection to compare vs snapshots. That said, I wouldn't suggest that this would be the only option. For a smallish project with not very many screenshots, just keeping the files in the file system and checking them in with the code works fine, and I think that should continue to work. |
@itaylor I think using an image hash is a great idea - it would be nice to have it as an option anyway. It would flag anything that had changed so you could inspect the image manually, without needing to store images in .git or elsewhere. |
@itaylor if your screenshots really do have the same hash then it's probably not a bad idea. Do you mind sharing how you currently take the screenshots with Docker? I'm particularly interested in how you do it locally across different developers. |
@kgoedecke, Yes, I'd be glad to share. We're using puppeteer and Jest image Snapshot in an integration testing workflow.
browser = await puppeteer.launch({ ignoreHTTPSErrors: true, args: ['--no-sandbox', '--disable-dev-shm-usage', '--disable-setuid-sandbox', `--window-size=${width},${height}`] });
We run this same setup locally and on our integration test server running in the cloud, and given the same input, it produces the same screenshot output consistently, whether running on developers' Macs using Docker for Mac, or native Linux on our integration test VMs. Other considerations...
|
@itaylor thanks a lot for sharing this, I actually ran into the same problem that you describe in your 2nd bullet point. Seems like I need to improve that step with your guidelines now 👍 🙇 |
@10xLaCroixDrinker Thanks for the tip on LFS! I've been using Git daily for about a decade, but somehow I hadn't heard of LFS, and yes, after a cursory look at the docs, it seems like it should completely solve my issues with being able to effectively store large amounts of large screenshot files in the Git repo! It seems to be basically the same idea I suggested (using the hash as the filename, uploading the file to a file store, and then just storing the hash in the repo), except it's built in to Git and has been fully supported by major Git hosts like GitHub/GitLab/Bitbucket for a few years now 😃. I'll give it a try sometime over the next week or so and report back if it works for me. If it does work well, then maybe we could just add a couple of lines to the Readme.md file suggesting using it and we can close this ticket. |
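To make the similarity concrete: with Git LFS, the repository stores a small text pointer in place of each tracked file, and that pointer records the SHA-256 of the file's contents plus its size. The sketch below reproduces that pointer layout for illustration only. `lfsPointer` is a hypothetical helper, and in practice the Git LFS client generates these pointers itself, so nothing like this needs to be written by hand.

```javascript
const { createHash } = require('node:crypto');

// Produce the text of a Git LFS pointer for the given file contents.
// Illustrative only: the Git LFS client writes these automatically.
function lfsPointer(fileBuffer) {
  const oid = createHash('sha256').update(fileBuffer).digest('hex');
  return [
    'version https://git-lfs.github.com/spec/v1',
    `oid sha256:${oid}`,
    `size ${fileBuffer.length}`,
    '', // pointer files end with a trailing newline
  ].join('\n');
}
```

The pointer stays a few lines long no matter how large the screenshot is, which is why the repository stops growing with every updated image.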
I've done a POC with Git LFS and it works well. Unfortunately, the company I work at uses GitLab, and we're not on the newest version, so there are incompatibilities with the latest Git LFS versions. I was able to work around this issue by using an older version of the Git LFS client, 2.4.2, instead of the latest, 2.5.1. After jumping that hurdle, using Git LFS was pretty much seamless. My screenshots are no longer being stored inside of the repository, but they're still tracked there, and when I |
@itaylor Were you ever able to follow up with a documentation PR? I've just implemented this solution and it seems to be working (yay!). I don't see it in the docs, though... I'd be happy to create a doc PR unless it's already been done and I just can't find it. Some useful information:
Important info for husky users trying to integrate Git LFS hooks: typicode/husky#108 (comment)
Note RE .gitattributes for binary image files (thx @anescobar1991) plus LFS integration: Add this to your .gitattributes, replacing
@anescobar1991 Would you be open to a doc PR that points people in the direction of using LFS if needed? |
@mattrabe that would be great |
In case anyone has set up LFS but is struggling to get the tests to pass on Azure Pipelines, you will need to add the following step (preferably at the beginning of your pipeline).
steps:
- checkout: self # self represents the repo where the initial Pipelines YAML file was found
  lfs: true # This enables lfs for the job
See here for more information. By default, LFS is switched off, so without this update you will see the following error messages
and
|
I've been using jest-image-snapshot for a while, and I started out storing my snapshots as checked in assets in git, just like I do with my JS jest snapshots. This works great.
However, now that I've got a few hundred snapshot images and they all get updated semi-frequently as UI changes in the app occur, my Git repo is starting to get pretty huge. The historic screenshot files that are in the git repo going back to my first commits aren't really very useful, but they're taking up a bunch of space.
I'm guessing other people must be dealing with this issue too. How are you solving it? Do you use some script to store the snapshots externally from your git repo and then .gitignore them? Do you use git-filter-branch to rip out the old snapshots on a periodic schedule? Do you use some other smart way to manage these large binary files?