Auto-configuration identification #2124

Merged 3 commits on Apr 10, 2023

Conversation

bmorris3
Contributor

Description

Standalone and MAST interfaces would benefit from automatically launching the "best" jdaviz helper/configuration for a given data file. This PR is one step down that path, providing a new function for suggesting a helper based on the contents of the file.

The function does the following:

  • if filename ends in asdf, give "imviz" (the only ASDF-supporting helper so far)
  • check WCS/gwcs for a spectral axis
  • check if the file is recognized by astropy.io.registry. If there's only one unique match, return the corresponding helper.
  • if the extension is a fits.BinTableHDU or fits.fitsrec.FITS_rec, look for standard spectral columns and return "specviz" if you find them
  • if there are multiple possible matches in the astropy.io.registry, use the number of world dimensions in the WCS/gwcs to break the ties.
  • raise an error if no unique helper matches

This workflow isn't intended to work for every file. This PR supports FITS files from JWST and HST (with FITS WCS), ASDF-in-FITS files from JWST, and files with paths ending in .asdf. Because data products don't follow a single, consistently applied standard, support for any given data product from one of these missions is not guaranteed. That said, the tests added in this PR cover many of the files we are likely to handle, and exercise every helper except Mosviz.
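
As a rough illustration of the decision order listed above (not the function added in this PR), here is a minimal sketch. The name `suggest_helper`, its arguments, the column-name sets, and the axes-to-helper mapping are all assumptions made for the example; the arguments stand in for information that would really be read from the file, and the WCS spectral-axis check is folded into the `n_world_axes` input.

```python
# Assumed column names; the PR hard-codes a similar list ('wave', 'flux', ...).
WAVELENGTH_COLUMNS = {'wavelength', 'wave'}
FLUX_COLUMNS = {'flux'}

# Assumed tie-break: number of WCS world axes -> helper name.
HELPER_BY_WORLD_AXES = {1: 'specviz', 2: 'imviz', 3: 'cubeviz'}


def suggest_helper(filename, registry_matches=(), table_columns=(),
                   n_world_axes=None):
    """Hypothetical sketch of the decision order, not the PR's function."""
    # ASDF files: Imviz is the only helper that supports them so far.
    if filename.endswith('asdf'):
        return 'imviz'

    # A single unique registry match settles it.
    unique = set(registry_matches)
    if len(unique) == 1:
        return next(iter(unique))

    # Binary tables: look for standard spectral columns.
    columns = {name.lower() for name in table_columns}
    if columns & WAVELENGTH_COLUMNS and columns & FLUX_COLUMNS:
        return 'specviz'

    # Several registry matches: break the tie with the WCS dimensionality.
    candidate = HELPER_BY_WORLD_AXES.get(n_world_axes)
    if candidate in unique:
        return candidate

    # Nothing matched uniquely.
    raise ValueError(f'No unique helper matches {filename!r}')
```

Raising when nothing matches uniquely, rather than guessing, leaves the decision about what to do next with the caller (for example a standalone or MAST interface).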

Change log entry

  • Is a change log needed? If yes, is it added to CHANGES.rst? If you want to avoid merge conflicts,
    list the proposed change log here for review and add to CHANGES.rst before merge. If no, maintainer
    should add a no-changelog-entry-needed label.

Checklist for package maintainer(s)

This checklist is meant to remind the package maintainer(s) who will review this pull request of some common things to look for. This list is not exhaustive.

  • Are two approvals required? Branch protection rule does not check for the second approval. If a second approval is not necessary, please apply the trivial label.
  • Do the proposed changes actually accomplish desired goals? Also manually run the affected example notebooks, if necessary.
  • Do the proposed changes follow the STScI Style Guides?
  • Are tests added/updated as required? If so, do they follow the STScI Style Guides?
  • Are docs added/updated as required? If so, do they follow the STScI Style Guides?
  • Did the CI pass? If not, are the failures related?
  • Is a milestone set? Set this to bugfix milestone if this is a bug fix and needs to be released ASAP; otherwise, set this to the next major release milestone.
  • After merge, any internal documentations need updating (e.g., JIRA, Innerspace)? 🐱

@bmorris3 bmorris3 added the MAST label Mar 31, 2023
@bmorris3 bmorris3 added this to the 3.5 milestone Mar 31, 2023
@github-actions

Have you ever questioned the nature of your reality?

(A special day message.)

@bmorris3 bmorris3 marked this pull request as ready for review March 31, 2023 19:55
@codecov

codecov bot commented Mar 31, 2023

Codecov Report

Patch coverage: 84.12% and project coverage change: -0.04 ⚠️

Comparison is base (e7330a0) 91.96% compared to head (7b8d713) 91.93%.

❗ Current head 7b8d713 differs from pull request most recent head 2027a5a. Consider uploading reports for the commit 2027a5a to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2124      +/-   ##
==========================================
- Coverage   91.96%   91.93%   -0.04%     
==========================================
  Files         146      146              
  Lines       15911    15972      +61     
==========================================
+ Hits        14633    14684      +51     
- Misses       1278     1288      +10     
Impacted Files Coverage Δ
jdaviz/core/data_formats.py 87.62% <84.12%> (-6.82%) ⬇️

Member

@kecnry kecnry left a comment

Planning ahead for eventual support of viz = jdaviz.open(filename/data)... should this handle accepting the data directly as well (a specutils object, for example)? That's fine if it's out of scope for now; I'm just wondering whether that would fit into this basic design or would require some refactoring/renaming.

Do we prefer the capitalized version of the config (Imviz vs imviz) to match the class names?

@bmorris3
Contributor Author

bmorris3 commented Apr 3, 2023

Planning ahead for eventual support of viz = jdaviz.open(filename/data)... should this handle accepting the data directly as well (a specutils object, for example)?

It very well could! There's only very minimal logic to extract the meaningful properties of the file in this snippet here. We could expand this to extract header from Spectrum1D.meta['header'], etc.

Do we prefer the capitalized version of the config (Imviz vs imviz) to match the class names?

Or even better, do we prefer to return cls (Imviz)?

Collaborator

@rosteen rosteen left a comment

I tried this with a couple of non-JWST files I have lying around from old testing. It successfully identified Specviz for a 1D spectrum, but failed to identify any config for the MaNGA cube that we used to use for testing Cubeviz. It might be worth taking a look at that and seeing if there's a simple way to get it to work that would also cover a decent number of other mission formats. I can send you that file if you don't have it.

Comment on lines +131 to +133
    try:
        with asdf_in_fits.open(filename) as af:
            wcs = af.tree['meta']['wcs']
Collaborator

This makes me vaguely unhappy, but I don't know of another way to check for this other than trying to open it, so...not a change request, more of a change wish 😆

Contributor Author

I feel this. I thought about something like

     with asdf_in_fits.open(filename) as af:
         meta = getattr(af.tree, 'meta', fits.getheader(filename))

but thought that was too obfuscated.

Member

@kecnry kecnry Apr 4, 2023

I actually like that idea, and since it's a key and not an attribute, it might be slightly less ugly (although on second thought you will need two layers of dictionaries, which makes it a bit ugly again 😬 ). This entire _get_wcs function could be removed and

header = fits.getheader(filename, ext=ext)
data = fits.getdata(filename, ext=ext)
wcs = _get_wcs(filename, header)

replaced with:

data = fits.getdata(filename, ext=ext)
with asdf_in_fits.open(filename) as af:
    wcs = af.tree.get('meta', {}).get('wcs', WCS(fits.getheader(filename, ext=ext)))

which also avoids having to parse the header just to have it on hand for the fallback.
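
One caveat with the one-liner above: the default passed to `dict.get` is evaluated before the call, so the header would still be parsed even when the ASDF tree already holds a WCS. A minimal sketch of a lazy variant follows, with a plain dict standing in for `af.tree` and a hypothetical `load_fallback` callable standing in for `WCS(fits.getheader(...))`; neither name comes from this PR.

```python
def wcs_with_fallback(tree, load_fallback):
    """Return tree['meta']['wcs'] if present, otherwise call load_fallback().

    `tree` is any nested mapping (a stand-in for an ASDF tree) and
    `load_fallback` is a zero-argument callable that builds the fallback WCS.
    """
    wcs = tree.get('meta', {}).get('wcs')
    if wcs is None:
        # Only pay for the fallback when the tree has no WCS.
        wcs = load_fallback()
    return wcs
```

With real objects, the call might look like `wcs_with_fallback(af.tree, lambda: WCS(fits.getheader(filename, ext=ext)))`, which defers the header parse until it is needed.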

Contributor Author

I agree @kecnry that this is a better way to do it, but even I have trouble reading it. 🤔

Contributor

@cshanahan1 cshanahan1 left a comment

i couldn't find anything to nitpick so i will just leave a 'looks good to me!'

@kecnry
Member

kecnry commented Apr 4, 2023

Let's consider jdaviz.open a follow-up effort for now. I do think it would be useful to support parsing data products directly here though if that isn't too much extra effort.

Re: returning the classes themselves, that might add overhead for the MAST use case, and jdaviz.open should be able to find the classes pretty easily. But we can always tweak that behavior, and who is responsible for finding the class, when we take on that effort.

Comment on lines +219 to +221
        'wave',
        'flux'
    ]
Collaborator

I worry about hard-coding these kinds of column names, but can't think of a more general way to do this. I suggested we could try to auto-identify wavelength or flux columns based on any attached units on the column. Maybe that's out of scope for this PR though.
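
To make the units idea concrete, here is a minimal sketch that classifies table columns by their unit strings instead of their names. The unit sets and the function names are illustrative assumptions, not anything in jdaviz; a real implementation would more likely lean on astropy.units to compare physical types rather than matching strings.

```python
# Illustrative, deliberately incomplete unit lists (assumptions for this sketch).
WAVELENGTH_UNITS = {'angstrom', 'nm', 'um', 'micron'}
FLUX_UNITS = {'jy', 'mjy', 'erg/s/cm2/angstrom'}


def classify_columns(column_units):
    """Map 'wavelength'/'flux' to the first column whose unit matches.

    `column_units` maps column name -> unit string (or None if unitless).
    """
    roles = {}
    for name, unit in column_units.items():
        key = (unit or '').strip().lower()
        if key in WAVELENGTH_UNITS:
            roles.setdefault('wavelength', name)
        elif key in FLUX_UNITS:
            roles.setdefault('flux', name)
    return roles


def looks_like_spectrum(column_units):
    """True if both a wavelength-like and a flux-like column were found."""
    roles = classify_columns(column_units)
    return 'wavelength' in roles and 'flux' in roles
```

This would recognize a table with columns such as `LAMBDA` (Angstrom) and `F` (Jy) even though neither name appears in a hard-coded list.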

Contributor Author

out of scope for this PR though.

I think so. Let's get v0.1 done and iterate.

Comment on lines +242 to +246
            recognized_spectrum_format.find('s3d') > -1):
        return 'cubeviz'
    elif (n_axes == 2 and
            recognized_spectrum_format.find('x1d') > -1):
        return 'specviz'
Collaborator

I wonder if it's worth generalizing these now to something that isn't JWST-specific. For example, this function already fails with MaNGA cube files, which is a tad bittersweet since Jdaviz was originally developed against those files. :( And for MAST to switch to this function, we'd need this generalized before we can start any work on our end.

Contributor Author

I agree that there's a better way to do this! Stepping away for a few days and seeing this comment makes it clearer.

This PR relies on astropy.io.registry to auto-identify file types, and then when that fails, uses heuristics to make a guess. The astropy registry knows some JWST files without a problem, so I didn't need to use something else.

But there is already a registry which identifies JWST, MaNGA, and many others: the specutils registry. The easy solution to all this is to also try the specutils registry. I'll do this today! 🤦🏻
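
The "try one registry, then the next, then fall back" idea can be sketched as a simple chain. Everything below is a stand-in written for illustration: `identify_with_fallbacks` is a hypothetical name, and the two toy identifier functions are NOT the astropy or specutils registry APIs; they only mimic "returns a list of candidate helpers for a path".

```python
def identify_with_fallbacks(path, identifiers):
    """Return the first unambiguous helper name from a chain of identifiers.

    Each identifier is a callable taking a path and returning an iterable of
    candidate helper names (possibly empty).
    """
    for identify in identifiers:
        candidates = set(identify(path))
        if len(candidates) == 1:
            return candidates.pop()
    raise ValueError(f'No unique helper matches {path!r}')


# Toy stand-ins for registry lookups, keyed on illustrative filename patterns.
def general_registry(path):
    return ['cubeviz'] if 's3d' in path else []


def spectral_registry(path):
    return ['cubeviz'] if 'LOGCUBE' in path else []
```

Calling `identify_with_fallbacks('manga-LOGCUBE.fits', [general_registry, spectral_registry])` falls through the first identifier and returns `'cubeviz'` from the second, which is the behavior the MaNGA case needs.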

Contributor Author

Support for MaNGA introduced in 7b8d713.

Collaborator

Thanks for the update. The MaNGA cube and RSS files do seem to now correctly identify as cubeviz and specviz, which is great. Since JWST formats are covered by the identify_data function, it's not clear to me why we're still explicitly checking the filename suffix at lines 241-246?

@bmorris3
Contributor Author

@kecnry – any thoughts on this option?

Or even better, do we prefer to return cls (Imviz)?

@kecnry
Member

kecnry commented Apr 10, 2023

Or even better, do we prefer to return cls (Imviz)?

Unless it would introduce unnecessary overhead, I'd vote to stick with the string for now and then have jdaviz.open handle retrieving the class... but not a strong preference either way.
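
For what it's worth, keeping the string return and resolving the class later could be as small as a dictionary lookup. The sketch below uses empty placeholder classes rather than the real jdaviz helpers, and `helper_class` is a hypothetical name; the case-insensitive lookup also sidesteps the Imviz vs imviz question.

```python
# Placeholder classes standing in for the real helper classes.
class Imviz:
    pass


class Specviz:
    pass


class Cubeviz:
    pass


# Lower-cased config name -> class.
_HELPERS = {cls.__name__.lower(): cls for cls in (Imviz, Specviz, Cubeviz)}


def helper_class(name):
    """Resolve a config name such as 'imviz' or 'Imviz' to its class."""
    try:
        return _HELPERS[name.lower()]
    except KeyError:
        raise ValueError(f'Unknown helper {name!r}') from None
```

A caller that only needs the name (the MAST use case mentioned above) never touches the classes, while something like jdaviz.open could call `helper_class(name)()` to instantiate the helper.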

@bmorris3 bmorris3 requested a review from havok2063 April 10, 2023 14:39
@bmorris3 bmorris3 force-pushed the auto-detect-config branch from 7b8d713 to 2027a5a Compare April 10, 2023 15:13
Collaborator

@havok2063 havok2063 left a comment

I approve this. I think it's in good shape now but there are some refinements that can be made to make this more robust and general.

@bmorris3
Contributor Author

bmorris3 commented Apr 10, 2023

Thanks all! I'll write up Jira tickets for the remaining improvements that we labeled out-of-scope for this PR.

@bmorris3 bmorris3 merged commit e653210 into spacetelescope:main Apr 10, 2023
@duytnguyendtn duytnguyendtn mentioned this pull request Jun 15, 2023
9 tasks