Added support for EPS, PDF, and SVG image comparison #194

Merged
merged 20 commits into from
Apr 1, 2023

astrofrog
Collaborator

@astrofrog astrofrog commented Mar 30, 2023

This makes it possible to compare EPS, PDF and SVG output. It doesn't add any new parameters; instead it relies on checking savefig_kwargs['format'], which I think is reasonable?

I haven't yet checked how this works with the hash library approach, will do so shortly.

@astrofrog
Collaborator Author

@ConorMacBride @Cadair - this mostly works except that I can't get the hash comparison to work because the hashes for the vector graphics files are not deterministic (e.g. in SVG files the date changes and also some of the hrefs). Should we convert to PNG and then compute a hash?

@astrofrog
Collaborator Author

Also the docs failure seems unrelated?

@ConorMacBride
Member

I pass metadata={"Creator": None, "Producer": None, "CreationDate": None} to savefig for reproducible PDFs. I'm not sure what works for EPS. The SVG metadata can be fixed also, but I remember there were some randomly generated ids (the hrefs I think?) which I couldn't find a way to fix. So we may need to convert SVG to PNG.

Also, PDF and EPS won't work with the HTML summaries. We could convert them to PNG after testing or just link to them instead. The diff image will always be PNG though, which might be enough to display.

@ConorMacBride
Member

Also, we should get the default format from savefig_kwargs['format'] but then get it from the current backend if not set. E.g. someone could pass backend='pdf' to the decorator instead of savefig_kwargs. I wonder if the backend object has an easy way to determine the extension it'll use.

@astrofrog
Collaborator Author

@ConorMacBride - for detecting the format, I wonder if it would make sense to restrict ourselves to supporting savefig_kwargs and no other mechanism. Just because someone's default backend/format (e.g. in matplotlibrc) is SVG doesn't mean that we should generate SVGs when they run e.g. the astropy test suite. I think pytest-mpl should always default to PNG unless overridden in savefig_kwargs (and we can document this).
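
A minimal sketch of the rule proposed here, assuming a hypothetical helper named `resolve_format` (it is not part of pytest-mpl, and raising on unknown formats is an assumption): the format comes only from `savefig_kwargs['format']` and falls back to PNG, regardless of the backend or rc settings.

```python
# Formats discussed in this PR; the set itself is an assumption of this sketch.
SUPPORTED_FORMATS = {"png", "eps", "pdf", "svg"}


def resolve_format(savefig_kwargs=None):
    """Return the comparison format, defaulting to PNG.

    Hypothetical helper: only savefig_kwargs is consulted, never the
    active backend or matplotlibrc.
    """
    fmt = ((savefig_kwargs or {}).get("format") or "png").lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return fmt
```

With this rule, a user whose matplotlibrc defaults to SVG would still get PNG comparisons unless a test explicitly asks for another format.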

For the hashes, I wonder if it is desirable to have a different behavior for e.g. SVG and PDF or if we should simply always convert to PNG?

@astrofrog
Collaborator Author

> Also, PDF and EPS won't work with the HTML summaries. We could convert them to PNG after testing or just link to them instead. The diff image will always be PNG though, which might be enough to display.

The HTML summary does seem to work without modifications - it seems to always show PNGs for the before/diff/images - could you double check to see if you agree?

@astrofrog
Collaborator Author

Ok so latest status:

  • For SVG, EPS and PDF we convert to PNG before computing hashes
  • I don't think we should look at the backend or rc parameters to determine the format to use, see here
  • The HTML summary seems to just work as I think it's using PNG versions of the figures even for vector graphics
  • Some of the hash tests fail on Mac and Windows, is it acceptable to just test this on Linux? I think the main reason for the difference is that on Mac the ghostscript version might be different, and on Windows we don't have a hash library for the right Matplotlib/Freetype combo

@astrofrog
Collaborator Author

@ConorMacBride @Cadair I think this is actually ready for a first review

@astrofrog astrofrog requested review from ConorMacBride and Cadair and removed request for ConorMacBride March 30, 2023 23:07
@astrofrog
Collaborator Author

I investigated a little more whether we could avoid converting the figures to PNG for the hashes, and here's what I've found:

  • For PDF files we can indeed get reproducible hashes as described here
  • For EPS files, the issue is that the Title of the EPS file is the filename. In principle we are OK as long as the filename of the files generated in the tests never changes, which I think is the case. It would be better if one could override this, but it's not a deal-breaker
  • For SVG files there are random hrefs that change each time the file is generated, but we can set the svg.hashsalt rcParam to make this reproducible.

I think we should therefore be able to compute the hashes from the native files which would be preferable - will try and push a commit shortly.
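
A rough sketch of the PDF and SVG cases above, hashing the native file bytes rather than a PNG conversion. The helper name `render_hash`, the salt value, and the plotted data are assumptions for illustration; removing `Date` from the SVG metadata is also an addition beyond what the list states. EPS is left out, and reproducibility is only expected within the same Matplotlib/FreeType environment.

```python
import hashlib
import io

import matplotlib
from matplotlib.figure import Figure

# Metadata entries that would otherwise embed timestamps or tool versions.
PDF_METADATA = {"Creator": None, "Producer": None, "CreationDate": None}
SVG_METADATA = {"Date": None}


def render_hash(fmt, metadata):
    """Render a small figure to `fmt` in memory and return its SHA-256 hex digest."""
    fig = Figure(figsize=(3, 2))
    ax = fig.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])
    buf = io.BytesIO()
    # svg.hashsalt replaces the random ids in SVG output with salted hashes;
    # it has no effect on PDF output.
    with matplotlib.rc_context({"svg.hashsalt": "example-salt"}):
        fig.savefig(buf, format=fmt, metadata=metadata)
    return hashlib.sha256(buf.getvalue()).hexdigest()
```

Calling `render_hash("pdf", PDF_METADATA)` or `render_hash("svg", SVG_METADATA)` twice should then give matching digests, which is the property the hash library relies on.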

@astrofrog
Collaborator Author

Ok well that escalated quickly but I think it's all done and ready for review!

Just to be clear, hashes are now computed using the raw files, not converted to PNG.

I wonder if we might want to add a deterministic=True kwarg for mpl_image_compare that would hide the need to have to set the savefig kwargs etc? I'd rather do it in a separate PR, but just raising the idea here.
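
To make the idea concrete, here is one way such an option might fill in the savefig kwargs. This is a hypothetical sketch, not the implementation from any PR: the helper name `make_deterministic` and the metadata table are assumptions, and the SVG case would additionally need the svg.hashsalt rcParam, which cannot be passed through savefig.

```python
import copy

# Per-format metadata overrides that strip run-dependent fields.
# The PDF entries are the ones quoted earlier in this thread.
DETERMINISTIC_METADATA = {
    "pdf": {"Creator": None, "Producer": None, "CreationDate": None},
    "svg": {"Date": None},
}


def make_deterministic(savefig_kwargs=None):
    """Return a copy of savefig_kwargs with deterministic metadata filled in."""
    kwargs = copy.deepcopy(savefig_kwargs or {})
    fmt = kwargs.get("format") or "png"
    defaults = DETERMINISTIC_METADATA.get(fmt)
    if defaults:
        # Entries supplied by the user take precedence over the defaults.
        kwargs["metadata"] = {**defaults, **kwargs.get("metadata", {})}
    return kwargs
```

Keeping user-supplied metadata ahead of the defaults means a test that deliberately sets, say, a Creator string is not silently overridden.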

The docs build is failing and that made me realise there are actual docs separate from the README which is what I updated. What is the plan going forward with this, will we just have the docs and not all the stuff in the README? (as it might get out of sync).

Member

@ConorMacBride ConorMacBride left a comment

This is brilliant! I am seeing issues with vector images in the HTML summaries. The diff image has the wrong filename and EPS files are not displayed. And deterministic=True would be useful but in another PR.

actual_path=test_image,
actual_shape=actual_shape)
summary['status_msg'] = error_message
return error_message

results = compare_images(str(baseline_image), str(test_image), tol=tolerance, in_decorator=True)
summary['tolerance'] = tolerance
Member

I ran a simple test with PDF, EPS, and SVG and generated the HTML summary. The baseline and result images are the original vector format but the diff image is a PNG. On my browser (Safari) PDF works inside an <img>, but EPS does not. SVG should be compatible with all browsers. I think we should show the PNG versions for the baseline and result images for PDF and EPS.

All these files are available for PDF for example: result.pdf baseline_pdf.png baseline.pdf result_pdf-failed-diff.png result_pdf.png

We could use something like this to get the file name for the images:

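def _filename(self, item, image_type):
    # image_type is one of 'baseline', 'result' or 'diff'
    ext = self._file_extension(item)
    if image_type == 'diff':
        # The diff image is always a PNG
        if ext == 'png':
            return 'result-failed-diff.png'
        return f'result_{ext}-failed-diff.png'
    if ext in ('png', 'svg'):
        # Browsers can display these formats directly
        return f'{image_type}.{ext}'
    # PDF and EPS: show the PNG conversion instead
    return f'{image_type}_{ext}.png'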

@astrofrog
Collaborator Author

astrofrog commented Mar 31, 2023

@ConorMacBride - I think I've fixed the summary. I originally started using the function you wrote, but then realised that we can simplify this by using the results object from compare_images to give us the PNG names when we want to use PNGs. This seems to work for me now - does it work for you?

EDIT: ok I broke the tests, will investigate more

Comment on lines 523 to 524
summary['result_image'] = test_image.relative_to(self.results_dir).as_posix()
summary['baseline_image'] = baseline_image.relative_to(self.results_dir).as_posix()
Member

Can you move these lines back to where they were above and just have an if ext not in ['png', 'svg'] for the two lines below? There are a couple of failure return points these lines have moved past. In SunPy, the baseline images are not committed when a new test is added (they are stored in a separate repo). So the result_image needs to be set so we can see an image for new tests in PRs.
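
The ordering being asked for could be sketched roughly as follows. Both function names are hypothetical, and deriving the PNG name from the native name (result.pdf to result_pdf.png, following the file list earlier in this review) is an assumption; the merged code may take the PNG names from the comparison results instead.

```python
from pathlib import Path


def record_image_paths(summary, results_dir, baseline_image, test_image):
    """Record the native-format paths early, before any failure return point."""
    summary["result_image"] = test_image.relative_to(results_dir).as_posix()
    summary["baseline_image"] = baseline_image.relative_to(results_dir).as_posix()


def prefer_png_previews(summary, ext):
    """After a comparison, point the summary at PNG versions where needed."""
    if ext not in ["png", "svg"]:
        for key in ("result_image", "baseline_image"):
            path = Path(summary[key])
            # Assumed naming scheme: result.pdf -> result_pdf.png
            summary[key] = path.with_name(f"{path.stem}_{ext}.png").as_posix()
```

Because the first step runs before the comparison, a new test with no committed baseline still has a result_image to show in the summary.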

Member

This should fix the tests

Collaborator Author

Haha I just did that and then read your comment :)

Member

@ConorMacBride ConorMacBride left a comment

I think this is ready. I can migrate the docs in a separate PR.

@astrofrog
Collaborator Author

Thanks! I'll go ahead and merge this and can try and open a PR later on for adding the 'deterministic' option

@astrofrog astrofrog merged commit df3a8fe into matplotlib:main Apr 1, 2023
This was linked to issues Apr 9, 2023
Development

Successfully merging this pull request may close these issues.

Allow SVG for image comparison Support non-PNG extensions
2 participants