Aperture photometry batch mode API #2401

Merged
merged 6 commits into from
Sep 15, 2023

Member

@kecnry kecnry commented Aug 29, 2023

Description

This pull request implements the (not-yet-public*) plugin API for batch processing of aperture photometry. The main method (plugin.batch_aper_phot) takes a list of dictionaries mapping traitlet names to values, loops over the list, sets the traitlets, and calls the existing internal method to compute aperture photometry. The results are then available in the results table (from the plugin UI or API, once it's public), and the state of the plugin, including the plot, is left as set by the last iteration of that list. Any value that is not provided in a dictionary is adopted from the "current" state at that time (either the original plugin state or the state left by the previous iteration). This is something we may want to reconsider, perhaps by restoring the original state after each iteration so that an override in the first entry isn't adopted in the second unless explicitly provided again.

* none of these methods are made public via this API since there is no exposed user API for aperture photometry yet and the UI implementation is scheduled for follow-up work. When we do expose the user API, we will need to finalize some traitlet/attribute names to avoid confusion and add these two methods to the list of exposed methods.

For example:

ap = imviz.plugins['Imviz Simple Aperture Photometry']
ap._obj.batch_aper_phot([{'dataset': 'image 1', 'subset': 'Subset 1'},
                         {'dataset': 'image 2', 'subset': 'Subset 2'}])
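
The carry-over behaviour described above can be illustrated with a toy model. This is only a minimal sketch, not the jdaviz implementation: `ToyPlugin`, its attributes, `_compute`, and the `restore` flag are hypothetical stand-ins for the plugin's traitlets, its internal photometry method, and the "restore after each iteration" alternative raised in the description.

```python
class ToyPlugin:
    """Hypothetical stand-in for a plugin whose inputs are stored as attributes."""

    _inputs = ("dataset", "subset", "flux_scaling")

    def __init__(self, dataset, subset, flux_scaling=1):
        self.dataset = dataset
        self.subset = subset
        self.flux_scaling = flux_scaling
        self.results = []

    def _compute(self):
        # Stand-in for the real photometry call: record the inputs that were used.
        self.results.append((self.dataset, self.subset, self.flux_scaling))

    def batch(self, options, restore=False):
        # Snapshot the inputs so they can optionally be restored before each entry.
        original = {name: getattr(self, name) for name in self._inputs}
        for option in options:
            if restore:
                for name, value in original.items():
                    setattr(self, name, value)
            # Only the keys present in this entry are overridden.
            for name, value in option.items():
                setattr(self, name, value)
            self._compute()


options = [{"flux_scaling": 3}, {"subset": "Subset 2"}]

plugin = ToyPlugin("image 1", "Subset 1")
plugin.batch(options)
carried = plugin.results[-1]   # flux_scaling from the first entry is still in effect

plugin2 = ToyPlugin("image 1", "Subset 1")
plugin2.batch(options, restore=True)
restored = plugin2.results[-1]  # flux_scaling is back to its original value
```

With carry-over, the second entry inherits `flux_scaling=3` from the first; with the hypothetical `restore=True`, it falls back to the value the plugin had before the batch started.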

This also implements a helper method (plugin.unpack_batch_options) which takes any number of keyword arguments, where each item can contain either a single value or a list. This then unpacks to create the "matrix" of all combinations that can then be used as input to the method described above. This will then become the hook-in point for the multi-select dropdowns to be implemented in the UI (which likely will only support this matrix mode for the input dataset and aperture subsets).

For example:

ap._obj.unpack_batch_options(dataset=['image1', 'image2'],
                             subset=['Subset 1', 'Subset 2'],
                             bg_subset=['Subset 3'],
                             flux_scaling=3)

returns:

[{'subset': 'Subset 1',
  'dataset': 'image1',
  'bg_subset': 'Subset 3',
  'flux_scaling': 3},
 {'subset': 'Subset 2',
  'dataset': 'image1',
  'bg_subset': 'Subset 3',
  'flux_scaling': 3},
 {'subset': 'Subset 1',
  'dataset': 'image2',
  'bg_subset': 'Subset 3',
  'flux_scaling': 3},
 {'subset': 'Subset 2',
  'dataset': 'image2',
  'bg_subset': 'Subset 3',
  'flux_scaling': 3}]

which could then be passed directly (or modified) as input to batch_aper_phot.
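
For readers who want to see the "matrix" expansion spelled out, here is a minimal sketch using `itertools.product`. It is an assumption about how such an expansion can be written, not the code in this PR; `unpack_options` is a hypothetical name, and the ordering of entries and keys in the real method may differ.

```python
from itertools import product


def unpack_options(**options):
    """Expand keyword arguments into a list of dicts, one per combination."""
    # Treat a bare value as a one-element list so it is repeated in every entry.
    as_lists = {key: value if isinstance(value, (list, tuple)) else [value]
                for key, value in options.items()}
    keys = list(as_lists)
    return [dict(zip(keys, combo)) for combo in product(*as_lists.values())]


combos = unpack_options(dataset=["image1", "image2"],
                        subset=["Subset 1", "Subset 2"],
                        bg_subset=["Subset 3"],
                        flux_scaling=3)
```

Here two datasets, two subsets, one background subset, and one scalar expand to 2 x 2 x 1 x 1 = 4 entries.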

Change log entry

  • Is a change log needed? If yes, is it added to CHANGES.rst? If you want to avoid merge conflicts,
    list the proposed change log here for review and add to CHANGES.rst before merge. If no, maintainer
    should add a no-changelog-entry-needed label.

Checklist for package maintainer(s)

This checklist is meant to remind the package maintainer(s) who will review this pull request of some common things to look for. This list is not exhaustive.

  • Are two approvals required? Branch protection rule does not check for the second approval. If a second approval is not necessary, please apply the trivial label.
  • Do the proposed changes actually accomplish desired goals? Also manually run the affected example notebooks, if necessary.
  • Do the proposed changes follow the STScI Style Guides?
  • Are tests added/updated as required? If so, do they follow the STScI Style Guides?
  • Are docs added/updated as required? If so, do they follow the STScI Style Guides?
  • Did the CI pass? If not, are the failures related?
  • Is a milestone set? Set this to bugfix milestone if this is a bug fix and needs to be released ASAP; otherwise, set this to the next major release milestone.
  • After merge, any internal documentations need updating (e.g., JIRA, Innerspace)?

codecov bot commented Aug 29, 2023

Codecov Report

Patch coverage is 91.78% of modified lines.

❗ Current head 70fd682 differs from pull request most recent head 2780e97. Consider uploading reports for the commit 2780e97 to get more accurate results

Files Changed                                            Coverage
...imviz/plugins/aper_phot_simple/aper_phot_simple.py    89.28%
...daviz/configs/imviz/tests/test_simple_aper_phot.py    100.00%
jdaviz/core/template_mixin.py                            100.00%

@pllim pllim added the api API change label Aug 29, 2023
Contributor

@pllim pllim left a comment

Should we aim to stay close to photutils API?

https://photutils.readthedocs.io/en/stable/aperture.html#aperture-photometry-with-multiple-apertures-at-each-position

https://photutils.readthedocs.io/en/stable/aperture.html#background-subtraction

phot_table = aperture_photometry(data, apertures)

So in Imviz, it can be like:

phot_table = aper_phot_plugin.aperture_photometry(list_of_data_labels, list_of_subset_apertures, local_bkg=only_one_subset_background)

cc @eteq and @larrybradley


unpack_batch_options(dataset=['image1', 'image2'],
                     subset=['Subset 1', 'Subset 2'],
                     bg_subset=['Subset 3'],
Contributor

So, if bg_subset has multiple elements, then it is unpacked too? Is that what people want though? Usually a given background is very specific to the chosen aperture?

Contributor

Also, this bypasses the UI checks, right? For example, now someone can define "Subset 1" to be both the aperture and background and we cannot stop them.

Member Author

So, if bg_subset has multiple elements, then it is unpacked too? Is that what people want though? Usually a given background is very specific to the chosen aperture?

Right. If you want specific combinations, then you call the other method (or call this first, and pass just the combinations you want).

Also, this bypasses the UI checks, right?

If there are checks that are only in the UI, then yes, those are bypassed. We should move (or at least copy) those to the python end. This PR doesn't yet validate the input values, so technically a batch could get halfway through before erroring out, in which case you'll get the results that completed before the error. We'll need to think of what we want to happen in a case like this. Full validation in advance would likely require a significant refactor of the plugin code and since computing isn't too expensive, I think this is probably okay for now.

Member Author

@kecnry kecnry Aug 29, 2023

And I'll also mention that the current plan for the UI is to implement multiselect for the dataset and aperture (subset) only (I just figured it was nice to keep the API that the UI will use as general as possible). Using batch mode on any other input parameter would require calling the API and I'm guessing most people would then choose to manually create their list of all inputs instead of generating one gigantic matrix with a bunch of nonsensical combinations.

Contributor

p.s. I thought maybe we could pass multiple apertures straight into photutils instead of looping through them, but it looks like that is only possible if the apertures are identical in shape and size and differ only in center position, so I guess that is not very useful for us... See astropy/photutils#1615

so technically a batch could get halfway through before erroring out, in which case you'll get the results that completed before the error

That would be annoying. I prefer returning all successful results and a list of combos that failed, after the loop completes.

Member Author

@kecnry kecnry Aug 29, 2023

That would work for me and is easier to implement than full input validation before running anything. I'm always in favor of raising exceptions, but I can at least wait until every entry that can pass has done so, and then raise the exception at the end so the user is aware that not all were successful.

EDIT: I added a commit which implements what I described above so that the loop does not quit at the first failure.
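
The "finish the loop, then raise" behaviour agreed on in this thread could look roughly like the sketch below. It is written under assumptions: `run_batch` and `fake_phot` are hypothetical, and only the `full_exceptions` name comes from the commit message of this PR (its exact semantics there are assumed, not confirmed).

```python
def run_batch(options, compute, full_exceptions=False):
    """Call compute(**option) for every entry; raise once at the end if any failed."""
    failures = {}
    for index, option in enumerate(options):
        try:
            compute(**option)
        except Exception as err:
            # Remember the failure but keep going so later entries still run.
            failures[index] = err
    if failures:
        message = f"batch failed for entries at indices {sorted(failures)}"
        if full_exceptions:
            message += ": " + "; ".join(f"{i}: {e!r}" for i, e in failures.items())
        raise RuntimeError(message)


table = []  # stand-in for the plugin's results table


def fake_phot(dataset, subset):
    # Hypothetical photometry call that rejects an unknown subset.
    if subset == "missing":
        raise ValueError(f"{subset!r} is not a valid subset")
    table.append((dataset, subset))


try:
    run_batch([{"dataset": "image 1", "subset": "Subset 1"},
               {"dataset": "image 1", "subset": "missing"},
               {"dataset": "image 2", "subset": "Subset 2"}],
              fake_phot)
except RuntimeError as err:
    message = str(err)
```

The third entry still runs even though the second one fails, so the table ends up with every successful result and the caller is told which entries did not complete.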

Member Author

kecnry commented Aug 29, 2023

Should we aim to stay close to photutils API?

In my opinion, accepting multiple lists requires more validation on our end (requiring each list to be the same length, etc.) and is more cumbersome to set up from the user's perspective. I think switching to that syntax would also make it less convenient to call the unpacking method and then filter out which entries you do or don't want to pass along. But it wouldn't be too difficult to change to that syntax if that is the consensus.

We also accept a lot more inputs than just those few, so positional arguments are not ideal (but I think individual lists as kwargs would still be reasonable).
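
To make the extra validation concrete, here is a minimal sketch of what accepting parallel lists as keyword arguments would involve. `lists_to_options` is a hypothetical helper, not part of this PR; it checks that every list has the same length and converts them to the list-of-dictionaries form that batch_aper_phot takes.

```python
def lists_to_options(**lists):
    """Turn equal-length keyword lists into one dict per position."""
    lengths = {name: len(values) for name, values in lists.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"all inputs must have the same length, got {lengths}")
    # zip(*values) walks the lists in parallel; the dict keys supply the names.
    return [dict(zip(lists, row)) for row in zip(*lists.values())]


options = lists_to_options(dataset=["image 1", "image 2"],
                           subset=["Subset 1", "Subset 2"])

try:
    lists_to_options(dataset=["image 1", "image 2"], subset=["Subset 1"])
    mismatch_error = None
except ValueError as err:
    mismatch_error = str(err)
```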

@kecnry kecnry force-pushed the ap-phot-batch-api branch from dbac15b to 456a310 Compare August 29, 2023 19:57
Contributor

@eteq eteq left a comment

I have some high-level questions which might become concerns depending on the answers 😉 :

  • I think I understood from the description that this is not meant to be public API yet, but would eventually become public once there's been time to implement it in the UI, is that right?
  • This is pretty deep in the weeds such that I'm honestly not sure which use case this is supposed to address. Is it just the use case of "do aperture photometry on multiple images with multiple different subsets for each image"? And is that intended to be the only case this supports, or is this meant to be the starting point of a larger way of expressing batch operations?

Now a concern:

  • I am afraid that this is becoming a sort of "mini-language" that expresses batch operations. I think we don't want to go down that road because we should instead be telling users "if you want anything more complex than the simplest of operations, use the tools from Python, because that's going to be an easier and more flexible approach than any mini-language we come up with". Now if we want to say that this API is for internal use - e.g., this is a way for jdaviz (and potentially, future viz tools that are using jdaviz but are not in jdaviz itself) to structure how they do batch operations, I can get behind that. But I think we want to be very careful to make it clear that this is meant mainly as a format to communicate from one UI to the python libraries, not an API that's intended for users to actually do their science on. The most extreme version of that would be to never expose this as a public API - not sure I want to go that far, but it is an option.

@camipacifici
Contributor

A couple of comments on @eteq's comments after a discussion at tag up:

  • We already have public access to the plugins from the notebook. I would not call this a mini language, but rather a way to reproduce a specific Jdaviz workflow without clicking.
  • I see this new API as another instance of that same access to the plugins.

Here is a workflow that made sense to us during tag up:

  • user loads 1 or 2 images and creates a handful of subsets
  • user goes to the aperture photometry plugin and with the layer and subset dropdowns selects all the layers and all the subsets (like the multiselect we have in plot options for example)
  • user runs aperture photometry on all combinations from the GUI (so far nothing has happened in the notebook)
  • now user wants to modify the list of combinations so they go to the notebook and use the API to get out the dictionary of combinations of layers and subsets (background would be another entry in the dictionary)
  • user re-loads from the notebook the desired combinations. This updates the GUI and reruns aperture photometry producing a new table

Another workflow:

  • user has a list of layers and subsets extracted from the GUI to the notebook with the get_whatever API
  • user creates the necessary dictionary (I am not opposed to this being lists) of combinations using the unpack_batch_options API
  • user modifies the dictionary (or lists) as they want
  • user runs aperture photometry from the API sending the dictionary (or lists) into the GUI.

Does this make sense @eteq? Am I missing something @kecnry, @orifox, @pllim?

Contributor

pllim commented Sep 6, 2023

Maybe related or not, but my remaining thoughts:

  • Even with this, don't expect to be able to create hundreds of Subsets and do batch photometry with them, because Glue in general is plagued by performance issues that grow exponentially with the number of links, on top of each Subset linking back to all the loaded data.
  • I added "simple" in the name for a reason but I don't think it is simple anymore. In the original design, I stressed that this plugin is not meant for batch photometry...
  • I still do not think radial profile plot makes sense in this plugin and now with the batch stuff, it is even more out of place.

Member Author

kecnry commented Sep 6, 2023

Sounds good to me. We can also continue/revisit the discussion about the list vs dictionary inputs anytime before we officially expose the plugin user API.

I made a note that when we do implement the UI for the "matrix" mode, we will want to be able to retrieve the inputs from those UI selections, not only by passing lists manually 🐱.

Contributor

eteq commented Sep 7, 2023

Thanks for the clarification @camipacifici , that makes a lot more sense. But that does lead to a few additional thoughts:

  • If the intent is for it to be a way to essentially "batchify" an existing viz workflow that makes sense. But I don't see anything in this PR that does the step you said of "get out the dictionary of combinations of layers and subsets". Is that a follow-on PR, or something already implemented that I missed?
  • I think it's still very important we make it clear to the users that they should not spend a lot of time trying to start from scratch this way. That is, we want to encourage them to go straight to photutils for a workflow that isn't clearly directly tied to visualization like this. This is probably just a matter of adding a few notes in the docs for this feature that say that explicitly, and maybe some runtime checks that generate warnings along the lines of "you are trying to do too much with this, try using photutils directly" (although even that might be going too far).
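
One way the runtime check floated in the last bullet could look is sketched below. This is purely illustrative: `check_batch_size`, the threshold value, and the warning text are all hypothetical, and nothing in this PR implements such a check.

```python
import warnings

MAX_BATCH_ENTRIES = 50  # hypothetical threshold, chosen only for illustration


def check_batch_size(options, limit=MAX_BATCH_ENTRIES):
    """Warn (without failing) when a batch looks too large for the plugin."""
    if len(options) > limit:
        warnings.warn(
            f"batch of {len(options)} entries exceeds {limit}; "
            "consider using photutils directly for large workflows",
            UserWarning,
            stacklevel=2,
        )
        return False
    return True


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    small_ok = check_batch_size([{}] * 3)
    large_ok = check_batch_size([{}] * 51)
```

A warning rather than an exception keeps the batch usable while still nudging the user toward photutils, which matches the hedged tone of the suggestion.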

Member Author

kecnry commented Sep 7, 2023

I don't see anything in this PR that does the step you said of "get out the dictionary of combinations of layers and subsets". Is that a follow-on PR, or something already implemented that I missed?

I think this falls under the follow-up UI work; I added a note to the ticket there to make sure we do this.

Contributor

@cshanahan1 cshanahan1 left a comment

I have a few comments after reviewing this:

  1. I think that 'unpack_batch_options' should check whether the dataset/subset names are valid and loaded; this would make it a more helpful helper function. Otherwise, at least in my opinion, making these combinations would be something I would be inclined to just do myself in a few lines.

  2. @pllim already commented on this, but another concern I have with 'unpack_batch_options' is that it assumes every aperture should be combined with every background subset, when I think most people would assume it should be 1:1. One solution I can think of is this: if background is a list, then it should be the same length as the photometric apertures and raise an error if not, and if it's a single subset then that should be combined with each. That way, if people have 2 background apertures they want to apply to each data and each subset, they can just run this twice and combine the results.

  3. Should this allow you to pass in background values rather than subsets to calculate the background in, like the GUI does?

  4. I am trying to understand the intended workflow here, versus what I expected when I see 'batch mode photometry'. It seems like the intent of this is to be able to reproduce/automate the process of loading data, making some clicks to create some subsets on sources and some subsets for background, and running the photometry tool on these one-by-one. I don't see how this implementation makes this process reproducible after doing it once and restarting your notebook: the batch option function just passes in references by name, so if you restart the notebook, wouldn't you have to click to create these subsets again, and the only utility these changes would have is that you could avoid having to click the photometry tool for each one-by-one? Is there a way to preserve subsets so they can be re-generated in a new notebook instance?
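
The 1:1 pairing rule proposed in item 2 can be sketched as follows. This is a hypothetical helper written only to illustrate the proposal (it is not what the PR implements): a single background is repeated for every aperture, a list must match the apertures in length, and each aperture/background pair is then combined with every dataset.

```python
from itertools import product


def pair_backgrounds(datasets, subsets, bg_subset):
    """Pair each aperture with one background, then cross the pairs with datasets."""
    if isinstance(bg_subset, str):
        # A single background is reused for every aperture.
        backgrounds = [bg_subset] * len(subsets)
    elif len(bg_subset) == len(subsets):
        backgrounds = list(bg_subset)
    else:
        raise ValueError(
            f"got {len(bg_subset)} backgrounds for {len(subsets)} apertures; "
            "pass one background or one per aperture")
    pairs = list(zip(subsets, backgrounds))
    return [{"dataset": d, "subset": s, "bg_subset": b}
            for d, (s, b) in product(datasets, pairs)]


paired = pair_backgrounds(["image1", "image2"],
                          ["Subset 1", "Subset 2"],
                          ["Subset 3", "Subset 4"])
shared = pair_backgrounds(["image1"], ["Subset 1", "Subset 2"], "Subset 3")
```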

I see now that the intent of this PR is not to replicate something that should be done in photutils like I originally envisioned (e.g., providing positions, aperture sizes, backgrounds, and getting back a table) but rather to aid in reproducing a workflow someone did by hand in the GUI. However, I think it makes sense to be able to go back and forth more seamlessly - for example, if you do some preliminary analysis in photutils to select the brightest sources, maybe you should be able to load those locations in to create subsets from a list/table without clicking (maybe this functionality is already available?).

Contributor

pllim commented Sep 7, 2023

To add to the fun, even without this PR, "straight to photutils batch call" is kinda possible already but it is not that convenient. For instance, you can already do the following workflow at a high level:

  1. Draw apertures on Imviz. Center them all you like or move them around or whatever.
  2. Use imviz.get_interactive_regions() to read them back out as regions. Now, this part can get annoying if you have multiple images and they are not aligned by pixels. As long as you know which WCS to use (usually the first loaded image until Brett's rotation PR goes in), you can still easily use Region.to_sky(wcs) to convert them back to sky regions.
  3. Use the regions2aperture function to convert a Region to a native photutils Aperture object (permalink below). I wanted to put this converter in photutils but Larry wanted it in regions, and we couldn't agree, so it is still in Jdaviz.
  4. Background boils down to a scalar number you pass into photometry, so you could grab that number out of the plugin API, no matter how you calculated it.
  5. You can get the data back out using one of the app API calls we already have (it just got refactored so I don't know what it is called now but it is the one that goes through glue-astronomy translator and gives you NDData).
  6. Now you can take all these pieces and pass them into photutils calls directly outside of Jdaviz.
  7. ???
  8. Profit!!!

https://github.com/spacetelescope/jdaviz/blob/6fb8ccccc26e5b1f954e3cf29acf14533328e216/jdaviz/core/region_translators.py#L139C5-L139C21

Member Author

kecnry commented Sep 8, 2023

I think that the 'unpack_batch_options' should check if the dataset/subset names are valid and loaded, this would make it a more helpful helper function. Otherwise, at least in my opinion, making these combinations would be something I would be inclined to just do myself in a few lines.

This can probably be added if we want - I originally didn't because (1) I wanted to keep this part of the logic general to potentially be shared with other plugins where we're planning to implement batch mode, (2) the error will be raised later anyways, (3) when using this from the UI all selections are guaranteed to be valid, and (4) perhaps a bit of laziness 😉 .

Honestly, I don't expect anyone to call this method by hand, but it is needed for the current plan for the UI.... but maybe the cleanest solution is to hide this as a private method and instead only expose the matrix from what has been set by the traitlets/UI which already include these checks natively?

@pllim already commented on this but another concern I have with the 'unpack_batch_options' is that it assumes every aperture should be combined with every background subset, when I think most people would assume it should be 1:1. One solution I can think of for this is if background is a list, then it should be the same length as the photometric apertures and raise an error if not, and if it's a single subset then that should be combined with each. That way, if people have 2 background apertures they want to apply to each data and each subset, they can just run this twice and combine the results.

Agreed, when a user is doing batch mode, I expect them to use batch_aper_phot directly or to maybe use unpack_batch_options (either through the API or by setting the multiselects in the UI and accessing the matrix that is generated there) and then manually cull that output into the set of options they want.

This expectation that an API call shouldn't create every combination is maybe another sign that this method should be hidden and only used by UI interactions.
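
As an illustration of "manually culling" the full matrix, the sketch below builds every combination and then keeps only the aperture/background pairings that make sense. The matrix is generated inline with `itertools.product` rather than by calling the plugin, and `wanted_background` is a hypothetical user-supplied mapping.

```python
from itertools import product

datasets = ["image1", "image2"]
subsets = ["Subset 1", "Subset 2"]
backgrounds = ["Subset 3", "Subset 4"]

# Full matrix: every dataset x aperture x background combination.
matrix = [{"dataset": d, "subset": s, "bg_subset": b}
          for d, s, b in product(datasets, subsets, backgrounds)]

# Hypothetical choice of which background belongs to which aperture.
wanted_background = {"Subset 1": "Subset 3", "Subset 2": "Subset 4"}

# Keep only the intended pairings, and never use an aperture as its own background.
culled = [entry for entry in matrix
          if entry["subset"] != entry["bg_subset"]
          and wanted_background[entry["subset"]] == entry["bg_subset"]]
```

Per the PR description, a list like `culled` is the kind of input that could then be handed to batch_aper_phot.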

Should this allow you to pass in background values rather than subsets to calculate the background in, like the GUI does?

Yes, this should work, as long as bg_subset = 'Manual' (I propose eventually renaming bg_subset to background), either as part of the passed dictionary or fixed in advance.

I am trying to understand the intended workflow here, versus what I expected when I see 'batch mode photometry'....

Right, I think there are lots of different opinions on what "batch mode" should and should not entail. The idea here (which can be debated if that is what/all we want) is to allow you to run the existing plugin over several combinations in one click/call rather than having to tediously click a bunch or write your own for-loop. This PR in particular aims to provide the API framework for a UI that will allow that multiselect capability for the apertures and backgrounds, while also providing more flexibility for custom combinations than a simple "matrix" operation between those multiselects would allow in the UI.

I think anything beyond that (better reproducibility of entire workflows, different interaction with photutils) would motivate more planning about a redesign of the existing plugin and/or require saving state.

even without this PR, "straight to photutils batch call" is kinda possible already but it is not that convenient.

Maybe (as a separate effort) we should try to create convenience functions to export the info needed to pass along to photutils directly if this is a use-case people want?

Contributor

pllim commented Sep 8, 2023

I fixed the dev job, so you should rebase.

Contributor

@pllim pllim left a comment

Cami seems happy enough with this, so I am approving by proxy, though it is hard for me to see how this will play out, but only one way to find out. Thanks!

* loops over options, sets traitlets, calls existing method
* see comments for refactoring that would be needed to NOT touch the traitlets
* in "other changes" for now since this technically isn't public facing until the UI is implemented or the user-API for aperture photometry is officially exposed
* any entry that fails is reported; pass full_exceptions to also report the individual exceptions
@kecnry kecnry enabled auto-merge (squash) September 15, 2023 15:31
@kecnry kecnry merged commit 03b68af into spacetelescope:main Sep 15, 2023
@kecnry kecnry deleted the ap-phot-batch-api branch September 15, 2023 16:03
Labels
api API change imviz
5 participants