Auto-configuration identification #2124

bmorris3 · 2023-03-31T19:44:22Z

Description

Standalone and MAST interfaces would benefit from automatically launching the "best" jdaviz helper/configuration for a given data file. This PR is one step down that path, providing a new function for suggesting a helper based on the contents of the file.

The function does the following:

if filename ends in asdf, give "imviz" (the only ASDF-supporting helper so far)
check WCS/gwcs for a spectral axis
check if the file is recognized by astropy.io.registry. If there's only one unique match, return the corresponding helper.
if the extension is a fits.BinTableHDU or fits.fitsrec.FITS_rec, look for standard spectral columns and return "specviz" if you find them
if there are multiple possible matches in the astropy.io.registry, use the number of world dimensions in the WCS/gwcs to break the ties.
raise an error if no unique helper matches
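
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code added in this PR: the name `suggest_helper`, its parameters, and the dimension-to-helper mapping are hypothetical, and the file properties (registry matches, table columns, WCS dimensionality) are passed in as arguments instead of being read from disk.

```python
# Only the two column names visible in the review hunk; the real list may differ.
SPECTRAL_COLUMNS = {"wave", "flux"}

# Assumed tie-break mapping from number of world dimensions to helper name.
HELPER_BY_NDIM = {1: "specviz", 3: "cubeviz"}


def suggest_helper(filename, registry_helpers=(), table_columns=None,
                   n_world_dims=None):
    """Return a helper name for a file, or raise ValueError if none is unique."""
    # Files ending in .asdf go straight to imviz.
    if filename.lower().endswith(".asdf"):
        return "imviz"

    # A single unique registry match decides the helper.
    unique = sorted(set(registry_helpers))
    if len(unique) == 1:
        return unique[0]

    # A table carrying the standard spectral columns is treated as a spectrum.
    if table_columns is not None:
        names = {column.lower() for column in table_columns}
        if SPECTRAL_COLUMNS <= names:
            return "specviz"

    # Several registry matches: break the tie with the WCS dimensionality.
    if len(unique) > 1 and n_world_dims is not None:
        preferred = HELPER_BY_NDIM.get(n_world_dims)
        if preferred in unique:
            return preferred

    raise ValueError(f"no unique helper found for {filename!r}")
```

For example, `suggest_helper("cube.fits", registry_helpers=["specviz", "cubeviz"], n_world_dims=3)` falls through to the tie-break and picks `"cubeviz"` under the assumed mapping.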

This workflow isn't intended to work for arbitrary files. This PR supports FITS files from JWST and HST (with FITS WCS), ASDF-in-FITS from JWST, and files with paths ending in .asdf. Since data-product standards are not followed perfectly, support for any given data product from one of these missions is not guaranteed. That said, the tests added in this PR cover many of the files we are likely to handle, and exercise every helper except Mosviz.

github-actions · 2023-03-31T19:44:38Z

Have you ever questioned the nature of your reality?

(A special day message.)

codecov · 2023-03-31T19:58:01Z

Codecov Report

Patch coverage: 84.12% and project coverage change: -0.04 ⚠️

Comparison is base (e7330a0) 91.96% compared to head (7b8d713) 91.93%.

❗ Current head 7b8d713 differs from pull request most recent head 2027a5a. Consider uploading reports for the commit 2027a5a to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2124      +/-   ##
==========================================
- Coverage   91.96%   91.93%   -0.04%     
==========================================
  Files         146      146              
  Lines       15911    15972      +61     
==========================================
+ Hits        14633    14684      +51     
- Misses       1278     1288      +10

Impacted Files	Coverage Δ
jdaviz/core/data_formats.py	`87.62% <84.12%> (-6.82%)`	⬇️

kecnry

Planning ahead for eventual support of viz = jdaviz.open(filename/data)... should this handle accepting the data directly as well (a specutils object, for example)? That's fine if it's out of scope for now, I'm just wondering whether that would fit into this basic design or whether it would require some refactoring/renaming.

Do we prefer the capitalized version of the config (Imviz vs imviz) to match the class names?

bmorris3 · 2023-04-03T19:27:31Z

> Planning ahead for eventual support of viz = jdaviz.open(filename/data)... should this handle accepting the data directly as well (a specutils object, for example)?

It very well could! There's only very minimal logic to extract the meaningful properties of the file in this snippet here. We could expand this to extract header from Spectrum1D.meta['header'], etc.
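
A minimal sketch of what accepting either a path or an in-memory object could look like. The function `extract_header` and the `read_header_from_file` callable are hypothetical names; in practice the callable might be something like `astropy.io.fits.getheader`, and the object branch assumes only that the data carries a dict-like `meta` with a `'header'` key, as mentioned above.

```python
import os
from collections.abc import Mapping


def extract_header(source, read_header_from_file):
    """Return a header-like mapping from a file path or an in-memory object."""
    # Paths are delegated to the supplied reader (e.g. a FITS header reader).
    if isinstance(source, (str, os.PathLike)):
        return read_header_from_file(source)

    # In-memory objects are expected to expose meta['header'].
    meta = getattr(source, "meta", None)
    if isinstance(meta, Mapping) and "header" in meta:
        return meta["header"]

    raise TypeError(f"cannot extract a header from {type(source).__name__}")
```

Keeping the file reader as a parameter means the same identification logic can run on whatever header comes back, regardless of where it came from.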

> Do we prefer the capitalized version of the config (Imviz vs imviz) to match the class names?

Or even better, do we prefer to return cls (Imviz)?

rosteen

I tried this with a couple of non-JWST files I have lying around from old testing, and it successfully identified Specviz for a 1D spectrum, but failed to identify any config for the MaNGA cube that we used to use for testing Cubeviz. Might be worth taking a look at that and seeing if there's a simple way to get it to work that might cover a decent number of other mission formats. I can send you that file if you don't have it.

rosteen · 2023-04-04T13:21:27Z

jdaviz/core/data_formats.py

+    try:
+        with asdf_in_fits.open(filename) as af:
+            wcs = af.tree['meta']['wcs']


This makes me vaguely unhappy, but I don't know of another way to check for this other than trying to open it, so...not a change request, more of a change wish 😆

I feel this. I thought about something like

    with asdf_in_fits.open(filename) as af:
        meta = getattr(af.tree, 'meta', fits.getheader(filename))

but thought that was too obfuscated.

I actually like that idea, and since it's a key and not an attribute, it might be slightly less ugly (although on second thought you will need two layers of dictionaries, which makes it a bit ugly again 😬 ). This entire _get_wcs function could be removed and

    header = fits.getheader(filename, ext=ext)
    data = fits.getdata(filename, ext=ext)
    wcs = _get_wcs(filename, header)

replaced with:

    data = fits.getdata(filename, ext=ext)
    with asdf_in_fits.open(filename) as af:
        meta = af.tree.get('meta', {}).get('wcs', WCS(fits.getheader(filename, ext=ext)))

which also avoids having to parse the header just to have on hand for the fallback.

I agree @kecnry that this is a better way to do it, but even I have trouble reading it. 🤔
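
One detail worth keeping in mind with the chained `.get()` form: Python evaluates the default argument before `.get()` is called, so a fallback written inline is still built even when the tree already holds a WCS. A minimal sketch that defers the fallback until it is needed, with plain dictionaries standing in for `af.tree` (the names `wcs_from_tree` and `make_fallback` are hypothetical):

```python
def wcs_from_tree(tree, make_fallback):
    """Return tree['meta']['wcs'], calling make_fallback() only when it is missing."""
    # Two-level lookup that tolerates a missing 'meta' or 'wcs' key.
    wcs = tree.get("meta", {}).get("wcs")
    if wcs is None:
        # The fallback (e.g. building a FITS WCS from the header) runs lazily.
        wcs = make_fallback()
    return wcs


# Usage with stand-in values: the fallback is skipped when a WCS is present.
found = wcs_from_tree({"meta": {"wcs": "gwcs-object"}}, lambda: "fits-wcs")
missing = wcs_from_tree({"meta": {}}, lambda: "fits-wcs")
```

Passing the fallback as a callable keeps the lookup on one readable line while avoiding the header parse in the common case.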

cshanahan1

I couldn't find anything to nitpick, so I will just leave a 'looks good to me!'

kecnry · 2023-04-04T15:11:24Z

Let's consider jdaviz.open a follow-up effort for now. I do think it would be useful to support parsing data products directly here though if that isn't too much extra effort.

Re: returning the classes themselves, that might be overhead for the MAST use case, and jdaviz.open should pretty easily be able to find the classes. But we can always tweak that behavior, and who is responsible for finding the class, when doing that effort.

havok2063 · 2023-04-04T15:16:06Z

jdaviz/core/data_formats.py

+            'wave',
+            'flux'
+        ]


I worry about hard-coding these kinds of column names, but can't think of a more general way to do this. I suggested we could try to auto-identify wavelength or flux columns based on any attached units on the column. Maybe that's out of scope for this PR though.

> out of scope for this PR though.

I think so. Let's get v0.1 done and iterate.
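
For the follow-up, the unit-based idea could look something like the sketch below. It is deliberately crude: the unit tables are small hypothetical stand-ins, the function names are made up for illustration, and a real version would presumably parse units with a units library instead of comparing strings.

```python
# Hypothetical lookup tables of normalized unit strings.
WAVELENGTH_UNITS = {"angstrom", "nm", "um", "micron", "m"}
FLUX_UNITS = {"jy", "mjy", "ujy", "erg/s/cm2/angstrom", "w/m2/um"}


def classify_columns(units_by_column):
    """Map column name -> 'spectral_axis' or 'flux' based on its unit string."""
    roles = {}
    for name, unit in units_by_column.items():
        # Columns without a unit cannot be classified this way.
        key = (unit or "").strip().lower().replace(" ", "")
        if key in WAVELENGTH_UNITS:
            roles[name] = "spectral_axis"
        elif key in FLUX_UNITS:
            roles[name] = "flux"
    return roles


def looks_like_spectrum(units_by_column):
    """True when at least one spectral-axis column and one flux column exist."""
    roles = set(classify_columns(units_by_column).values())
    return {"spectral_axis", "flux"} <= roles
```

With this approach a table whose columns are named `LAMBDA` and `F` would still be recognized, as long as the units are attached.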

havok2063 · 2023-04-04T15:20:38Z

jdaviz/core/data_formats.py

+                    recognized_spectrum_format.find('s3d') > -1):
+                return 'cubeviz'
+            elif (n_axes == 2 and
+                  recognized_spectrum_format.find('x1d') > -1):
+                return 'specviz'


I wonder if it's worth generalizing these now to something not JWST-specific. For example, this function already fails with MaNGA cube files, which is maybe a tad bittersweet since Jdaviz was originally developed against those files. :( And for MAST to switch to this function, we'd need this generalized before we can start any work on our end.

I agree that there's a better way to do this! Stepping away for a few days and seeing this comment makes it clearer.

This PR relies on astropy.io.registry to auto-identify file types, and then when that fails, uses heuristics to make a guess. The astropy registry knows some JWST files without a problem, so I didn't need to use something else.

But there is already a registry which identifies JWST, MaNGA, and many others: the specutils registry. The easy solution to all this is to also try the specutils registry. I'll do this today! 🤦🏻

Support for MaNGA introduced in 7b8d713.
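
The "try one registry, then the other" idea can be sketched generically. The function `identify_formats` is a hypothetical name, and the two lambdas in the usage lines are stand-ins for the astropy and specutils registry lookups, not their real APIs; the format label is made up for the example.

```python
def identify_formats(path, identifiers):
    """Return the first non-empty list of format names from the identifiers."""
    for identify in identifiers:
        try:
            formats = list(identify(path))
        except Exception:
            # Treat a registry that cannot read the file as "no match".
            formats = []
        if formats:
            return formats
    return []


# Stand-in registries: the first knows nothing, the second recognizes one pattern.
first_registry = lambda path: []
second_registry = lambda path: ["example-cube"] if "cube" in path else []

matches = identify_formats("survey-cube.fits", [first_registry, second_registry])
```

Ordering the identifiers controls which registry wins when both recognize a file, which is one place the heuristics could later be tuned.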

bmorris3 · 2023-04-10T14:28:17Z

@kecnry – any thoughts on this option?

> Or even better, do we prefer to return cls (Imviz)?

kecnry · 2023-04-10T14:33:17Z

> Or even better, do we prefer to return cls (Imviz)?

Unless it would introduce unnecessary overhead, I'd vote to stick with the string for now and then have jdaviz.open handle retrieving the class... but not a strong preference either way.
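
If the function keeps returning a string, the lookup on the jdaviz.open side could be as small as the sketch below. The three classes are empty placeholders standing in for the real helper classes, and `resolve_helper` is a hypothetical name; matching case-insensitively would also sidestep the Imviz vs imviz question.

```python
class Imviz:
    """Placeholder for the real helper class."""


class Specviz:
    """Placeholder for the real helper class."""


class Cubeviz:
    """Placeholder for the real helper class."""


# Lower-cased class name -> class, so 'imviz' and 'Imviz' both resolve.
HELPERS = {cls.__name__.lower(): cls for cls in (Imviz, Specviz, Cubeviz)}


def resolve_helper(name):
    """Return the helper class for a configuration name, ignoring case."""
    try:
        return HELPERS[name.lower()]
    except KeyError:
        raise ValueError(f"unknown helper {name!r}") from None
```

Callers that only need the name (the MAST use case mentioned above) never have to import the classes, while jdaviz.open can resolve them on demand.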

havok2063

I approve this. I think it's in good shape now but there are some refinements that can be made to make this more robust and general.

havok2063 · 2023-04-10T15:44:37Z

jdaviz/core/data_formats.py

+                    recognized_spectrum_format.find('s3d') > -1):
+                return 'cubeviz'
+            elif (n_axes == 2 and
+                  recognized_spectrum_format.find('x1d') > -1):
+                return 'specviz'


Thanks for the update. The MaNGA cube and rss files do seem to now correctly identify as cubeviz and specviz, which is great. Since JWST formats are covered by the identify_data function, it's not clear to me why we're still explicitly checking on the filename suffix at lines 241-246?

bmorris3 · 2023-04-10T15:58:52Z

Thanks all! I'll write up Jira tickets for the remaining improvements that we labeled out-of-scope for this PR.

bmorris3 added the MAST label Mar 31, 2023

bmorris3 added this to the 3.5 milestone Mar 31, 2023

bmorris3 marked this pull request as ready for review March 31, 2023 19:55

bmorris3 requested review from duytnguyendtn, rosteen, javerbukh, pllim, kecnry, haticekaratay and cshanahan1 as code owners March 31, 2023 19:55

kecnry reviewed Apr 3, 2023

rosteen reviewed Apr 4, 2023

cshanahan1 approved these changes Apr 4, 2023

havok2063 reviewed Apr 4, 2023

bmorris3 requested a review from havok2063 April 10, 2023 14:39

bmorris3 added 3 commits April 10, 2023 11:13

tests for auto-config identification

a67ec38

adding changelog

fa65b0d

adding support for manga cubes

2027a5a

bmorris3 force-pushed the auto-detect-config branch from 7b8d713 to 2027a5a Compare April 10, 2023 15:13

havok2063 approved these changes Apr 10, 2023

bmorris3 merged commit e653210 into spacetelescope:main Apr 10, 2023

duytnguyendtn mentioned this pull request Jun 1, 2023

Utilize Jdaviz.open for the CLI notebook #2233

Closed

9 tasks

duytnguyendtn mentioned this pull request Jun 15, 2023

Jdaviz Launcher #2257

Merged

9 tasks
