Introducing Horne extraction #84
Conversation
Optimal extract works as expected in compare_extractions.ipynb. The normalization matches BoxcarExtract and the SNR is greatly improved for a wide extraction aperture. I think an essential addition would be to add the option of "weights" analogous to the weights used in BoxcarExtract, to enable the delineation of an extraction region and allow suppression of bad regions. These weights should be multiplied into the optimal weights.
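As a rough sketch of that suggestion (illustrative names only, not the actual specreduce API): multiply user-supplied aperture weights into the Horne-style optimal weights, then renormalize.

```python
import numpy as np

# Illustrative sketch only -- not specreduce code. Combine user-supplied
# aperture weights with Horne-style optimal weights for one column.
profile = np.array([0.05, 0.25, 0.40, 0.25, 0.05])   # fitted spatial profile P
variance = np.ones_like(profile)                     # per-pixel variance V
user_weights = np.array([0.0, 1.0, 1.0, 1.0, 0.0])   # delineates extraction region

optimal = profile / variance          # Horne weighting ~ P / V
combined = optimal * user_weights     # zero out pixels outside the region
combined /= combined.sum()            # renormalize so the weights sum to 1
```

Pixels outside the user-defined region then contribute nothing to the extracted flux, while the relative optimal weighting inside the region is preserved.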
One future enhancement (not for this PR) would be to allow the user to compute the kernel from a higher-SNR secondary source, such as a 2D spectrum of a star observed with the same instrumental setup. The 2D star spectrum would be input, the stellar trace and kernel computed, then shifted to align with the object trace. This would allow optimal extraction of extremely noisy spectra where the instrumental profile is difficult to measure.
specreduce/extract.py (Outdated)

```python
if np.ma.is_masked(g_x):
    continue
```
when the `continue` is triggered, won't the length of `pixels` and `extracted` be different when you create the `Spectrum1D`?
there has to be a more efficient way to vectorize this bit, but i'd have to give it a deeper think on how to do it...
@kecnry you're right. I was switching between approaches of appending to an empty list and filling in a pre-made array of size `pixels`. I'm still thinking about how both fare with vectorization.
i think using `np.fromfunction` is the approach we ultimately want to use here to leverage numpy broadcasting as much as possible. it'll require some refactoring/rearranging of the logic here, so that might be something to leave for a future optimization. `np.fromiter` is another option that's more efficient than using lists.
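A minimal comparison of the options mentioned here, with a stand-in computation (the generator below is not the actual extraction code):

```python
import numpy as np

# Stand-in per-column computation: np.fromiter builds a 1D array directly
# from a generator, while np.fromfunction calls a function on a broadcast
# index grid. Both avoid an intermediate Python list.
def per_column(n):
    for i in range(n):
        yield 1.0 / (i + 1)   # placeholder for a per-column normalization

via_list = np.array([1.0 / (i + 1) for i in range(5)])
via_fromiter = np.fromiter(per_column(5), dtype=float, count=5)
via_fromfunction = np.fromfunction(lambda i: 1.0 / (i + 1), (5,))
```

All three produce the same values; the difference is in how much intermediate allocation and Python-level looping each incurs.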
that all said, it's also fine to have the initial implementation be straightforward rather than optimized for performance. it provides a baseline for future improvements to compare to.
(Replied in the wrong thread; see above.)

I got rid of the `continue` in the latest commit. Since the arrays are masked, any NaNs can just be carried on to the final, extracted 1D spectrum.
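A tiny illustration of that point, assuming masked arrays per column:

```python
import numpy as np

# With masked arrays, invalid pixels are masked rather than skipped, so
# per-column results keep the same length as the spectral axis.
col = np.ma.masked_invalid(np.array([1.0, np.nan, 3.0]))
total = col.sum()   # the NaN is ignored via the mask, not dropped
```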
@tepickering, I didn't find a sensible way to build `kernel_vals` or `norms` with `np.fromiter` or `np.fromfunction`, so I still create them with a loop. Continuing to append to a list was a small percentage faster than filling arrays.

`kernel_vals` becomes a 2D array that needs the fit object's mean attribute in each column to be modified before any fit results are stored in it. If that pre-modification weren't necessary, I could see how `np.fromfunction` would be useful; plus, the array of index locations it passes to the inner function would eliminate the need for `xd_pixels`. `np.fromiter` can only return 1D arrays, so it seems like a no-go here.

Each entry in `norms` must be calculated immediately after the corresponding fit in `kernel_vals`. This means we'd need to get `norms` from the same call to either `numpy` function as `kernel_vals`, and I don't think either is set up to give two separate arrays as output.

An alternate approach I tried for `norms` was to turn `fit_ext_kernel` into an array of separate fit objects for each column. I could then call `np.fromiter` twice on an inner function that used `getattr()` -- one call for amplitude and the other for standard deviation. However, even creating the array of fit objects was much slower than using the original loop.
specreduce/extract.py (Outdated)

```
image : `~astropy.nddata.CCDData` or array-like, required
    The input 2D spectrum from which to extract a source.

variance : `~astropy.nddata.CCDData` or array-like, required
```
if we require inputs to be in `CCDData` format, then the image and variance data can be packaged into the same object (as well as masks). that's one of the main points of using `CCDData` and why it is a core component of `astropy`. so i strongly encourage its use here and elsewhere in `specreduce`.
also note that in DRAGONS, if no variance is available, a uniform variance is calculated based on pixel-to-pixel variations in the data. a similar fall-back could be implemented here. to wit:

```python
if var is None:
    var_model = models.Const1D(sigma_clipped_stats(data, mask=mask)[2] ** 2)
    var = np.full_like(data[ix1:ix2], var_model.amplitude)
```
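A self-contained version of that fall-back, sketched in plain numpy with a simple iterative sigma clip standing in for astropy's `sigma_clipped_stats` (so it runs without astropy; the function name is illustrative):

```python
import numpy as np

def uniform_variance(data, nsigma=3.0, iters=3):
    """Estimate one uniform variance from sigma-clipped pixel-to-pixel scatter."""
    clipped = data.ravel()
    for _ in range(iters):
        mu, sd = clipped.mean(), clipped.std()
        clipped = clipped[np.abs(clipped - mu) < nsigma * sd]
    # broadcast the single clipped variance over the whole frame
    return np.full_like(data, clipped.std() ** 2)

rng = np.random.default_rng(0)
data = rng.normal(100.0, 3.0, size=(64, 64))   # fake frame with scale-3 noise
var = uniform_variance(data)
```

The clipping keeps outliers (cosmic rays, bright sources) from inflating the scatter estimate.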
Understood on the benefits of `CCDData`. An option could be to include variance and mask keyword arguments that default to None for users who prefer to work with separate numpy arrays for whatever reason. Solely handling `CCDData` objects could raise the barrier to entry for users, depending on how familiar they are with them.

I am hesitant about calculating a variance for the user. My impression is that the calculation varies based on the instrument and what other data products you have available (weights, etc.). If that's the case, it could be hard to do in a general manner.
i guess an easy-ish way to do it would be to add a check if `image` is not `CCDData`, and if that's the case, make a `CCDData` instance from that plus `mask` and `variance`. and we'd need a way to input a `unit` if `image` is not a `Quantity`. still, that's extra code and logic for us to maintain that wouldn't be required if we just consistently use `CCDData` internally. users that have normal numpy arrays can always just pack them into `CCDData` at call time, e.g.:

```python
horne_extract(CCDData(image, variance=variance, mask=mask, unit=unit), trace)
```

users of ground-based data that do initial processing using `ccdproc` will already have `CCDData` instances ready to go.
hear you on making variance assumptions. requiring it to be specified is a valid approach.
If there is no variance data, I think setting it to unity (or any other constant) everywhere will work, without having to compute the variance from the science data.
While it would be good to support CCDData, I think we also need to keep the option to support less structured inputs, such as bare numpy arrays. Not everyone in the community uses the CCDData format. @eteq What do you think?
```python
# fit source profile, using Gaussian model as a template
# NOTE: could add argument for users to provide their own model
gauss_prof = models.Gaussian1D(amplitude=coadd.max(),
```
gaussian profiles are a very poor assumption for any kind of extended source. so i think it'd be worth coding in some generality to at least support a list of a few models that have similar-ish inputs. a `model_pars` input dict could be used to pass configuration for an input model.
Are there any that jump out at you as "must-include"? I mentioned Moffat earlier. Sérsic and Lorentz seem reasonable to me. I'm not sure which are seen as standard in this field.
moffat for sure since it's widely applicable for ground-based data. lorentz and voigt would be appropriate as well. sérsic is tricky because it's one-sided. definitely want to stick with symmetric profiles for now...
Optimal extraction doesn't work in principle for extended sources. You can't sensibly weight one part of an extended galaxy more than another. Optimal extraction is intended for point sources.
i was thinking more along the lines of spatially resolved, but compact, galaxies where the spectrum doesn't change significantly as a function of radius. that said, it's a pretty niche case and gaussian + moffat probably cover the vast majority of needs.
For point sources, getting the kernel shape exactly right is not too important and will only have a small effect on the S/N of the extracted spectrum. For extended sources, a point source (or any other centrally peaked) kernel will down-weight the extended emission and up-weight the core.
HorneExtract can take an image as a CCDData object or a numpy array. The `variance` argument now only applies in the latter case, along with the new `mask` and `unit` arguments. HorneExtract now takes better advantage of numpy broadcasting. OptimalExtract now recycles HorneExtract's docstring.
The latest commit...
The notebooks will need to be updated once we finalize
Just a quick comment here w.r.t. That said, two complexities:

Combining those two together, the assumption is that inside the blocks is "the data is an NDData or subclass" (or maybe

i agree with @eteq that
Added a few minor comments. Otherwise I think the code itself is looking pretty clean and ready to go (although I'll leave the science review of the actual results to others).
The Horne extraction works and gives the expected results for numpy array inputs. Please consider the following changes:

- Make `mask` and `unit` optional inputs with a default "zero" mask and unitless units. These inputs are optional for Boxcar and should be for Horne as well. However, it does make sense for `variance` to be required, since it is required for the Horne algorithm.
- Update the compare_extractions notebook to follow the HorneExtract required inputs, e.g.:

  ```python
  from astropy import units as u
  mask = img - img
  hrn = HorneExtract()
  hrn_result1d_whole = hrn(img, trace, variance=variance, mask=mask, unit=u.Jy)
  ```

- Also give an example in compare_extractions.ipynb of using a CCDData object as input. I did not try this mode of input.
```
# will also need to clone specreduce
# git clone git@github.com:astropy/specreduce.git OR git clone https://github.com/astropy/specreduce
# find and delete all occurrences of ".data" in kosmos/apextract.py
```
note that if `NDData` or `CCDData` is used as an input, as @eteq and i have requested, this wouldn't be required. the docstring in `kosmos` says `2d numpy array, or CCDData object`, but the use of the `.data` attribute means it actually needs something `NDData`-like.
I tried this, but it doesn't work because the `KOSMOS` trace also tries to call `img.shape` on the object, which isn't an `NDData` attribute. The best solution would be to update this notebook to use `KosmosTrace` once #85 is finished.
I was asked to collect outstanding issues in a single comment. Please share opinions on them and guidance on which of these must be decided before a merge. For a summary of the most recent changes, look here.
I wrote earlier that I will wait to update the notebooks until we've decided on the final set of arguments.
part of the point of using
i think it'd be enough to add a check for that type and set
i think it's good enough for now. significant gains would probably require significant work so best to see if we really need it first.
i'd say it's fine for now.
Tried HorneExtract on the fake dataset in compare_extractions. Tested input as a numpy array, NDData, or CCDData object, and all 3 worked. Horne extraction gives lower noise in a wide aperture than BoxcarExtract does, as expected.
The alternative validation may be necessary because we can't control whether users remember to wrap the variance array with a
A default mask of zeros seems to have broad support, so I'll implement that. I don't believe setting uncertainty to

Internally turning the
I'm not sure that's what it actually means and was hoping someone else would know. The docs page doesn't go into the math, but from the Wikipedia page it seems like it's not that simple.
Would you prefer I remove them as arguments like I did here with
I'm trying to thread a needle between your preference and other stakeholders' desires to work with arrays. No one has relented, so the compromise is to support both.
i think a reasonable approach would be to raise an exception if it's
the docs do give a few simple examples of the math, but they definitely could be clearer. the 1/variance trick works when the uncertainties are uncorrelated. full support for correlated uncertainties is a deeper, stickier problem we don't want to get into now, i think.
as i suggested in #86, we can keep them for now and clean them up later when the work is done to add those attributes to
we had a discussion about this during the astropy coordination meeting this week and how this relates to the future of
i just noticed that in the notebook for this PR, the VLT weights are actually inverse variance. they're then converted to variance in exactly the way i suggest converting variance back to inverse variance.
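The round trip is a one-liner each way, with a guard for zero (bad-pixel) weights; this is a generic sketch, not the notebook's exact code:

```python
import numpy as np

ivar = np.array([4.0, 1.0, 0.0, 0.25])   # inverse-variance weights; 0 = bad pixel

with np.errstate(divide='ignore'):
    # bad pixels get infinite variance, i.e. zero weight in the extraction
    var = np.where(ivar > 0, 1.0 / ivar, np.inf)
    # and the inverse conversion recovers the original weights
    ivar_back = np.where(np.isfinite(var), 1.0 / var, 0.0)
```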
i tried running through
everything ran fine up to that point and looks right.
Did you follow the instructions in the requirements file for removing instances of
d'oh! fixed that and now it works. however, the
It's a raw cell, so it's not supposed to execute. It's just there for demonstration. The real
gotcha. the wording is confusing, though. it implies that if you change the cell to a code cell and run it, it should work (modulo fixing kosmos). as a test, i added
and then everything after that was happy either way. it's up to you if you want to tweak that. i'm content to merge as-is since the runnable cells all seem to work as intended.
I'm also OK with leaving it for now and remembering to change it once we've merged #85. That will also allow us to remove the gymnastics of modifying the
ok, i'm going to go merge #90 and then merge this. last call for any changes before i do so...
I'm about to push a change for accepting
```python
if any(arg is None for arg in (variance, mask, unit)):
    raise ValueError('if image is a numpy array, the variance, '
                     'mask, and unit arguments must be specified. '
                     'consider wrapping that information into one '
                     'object by instead passing an NDData image.')

if image.shape != variance.shape:
    raise ValueError('image and variance shapes must match')

if image.shape != mask.shape:
    raise ValueError('image and mask shapes must match')
```
```python
# fill in non-required arguments if empty
if mask is None:
    mask = np.ma.masked_invalid(image)
```
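For what it's worth, a minimal illustration of that fallback: `np.ma.masked_invalid` returns a masked array whose `.mask` attribute is True wherever the image is NaN or inf, so it can double as a boolean mask.

```python
import numpy as np

image = np.array([[1.0, np.nan],
                  [np.inf, 4.0]])

# Mirrors the snippet above: invalid pixels become masked entries.
mask = np.ma.masked_invalid(image)
```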
@ojustino - this came up today when trying to use Horne extraction. The first if statement raises an error if mask is not passed, but later there is a check to fallback if mask is None. Can both mask and unit (default to unitless as the docstring says) be optional when passing image as an array instead of an NDData object? (If so, maybe this should be a follow-up issue since this PR has since been merged, I just wanted to attach it to the lines of code somewhere to have a breadcrumb).
Edit: the `if mask is None` check will also need to happen before comparing `image.shape` to `mask.shape`.
I believe we settled on `mask` and `unit` being optional for array-type images, so the `if any()` should be updated to reflect that. I can open a pull request for it if you'd like to copy over some of what you've written in a new issue.

I agree that the `mask` check should be moved earlier.
This pull request introduces Horne extraction classes, a Jupyter notebook adapted from spacetelescope/dat_pyinthesky#163 explaining how the classes were developed, and another Jupyter notebook comparing the results.
(I'm manually tagging @ibusko since I can't currently request him as a reviewer.)
extract.py
After some productive troubleshooting with @PatrickOgle, we have the `HorneExtract` and `OptimalExtract` classes. I made the latter as an alias since it seems many prefer that name. There are a few `NOTE` comments in the code. Here's some other food for thought:

- Should we require `CCDData` image objects? Most of our examples use numpy arrays.
- The `bkgrd_prof` argument allows usage of other models (Moffat, etc.) but presents difficulties because we need to use the trace to fix the mean, and not every `astropy.modeling` model has a "mean" parameter.

optimal_extract_VLT.ipynb
I recommend at least reading the introduction for a step-by-step description of which steps of the Horne extraction process I've covered in the classes and which are left to the user.
compare_extractions.ipynb
We can take or leave this notebook, which generates a fake image and then compares the results from the Horne and boxcar algorithms. It was helpful in diagnosing some issues with the Horne work, and the final result is a good visualization of the improvement in S/N gained through Horne extraction.