Faster physical coordinates #47

Merged: 73 commits into accelerated from faster_physical_coordinates, Aug 16, 2024

Conversation

HannoSpreeuw
Collaborator

Major speedup for processing large (4k*4k) images with > 1e5 sources by applying the guvectorize decorator. These images can now be processed - including loading of the FITS file - in less than a second on a single core of e.g. an AMD Ryzen 9 7900X.

A major overhaul of the data layout was needed to achieve this speedup. In summary, the speedup is achieved by doing all source processing in a vectorized way, which is much more efficient than any parallelization effort. Numba's guvectorize decorator not only implements vectorization of all the source processing in pixel coordinates, but also compiles the corresponding Python code. In addition, all the conversions from pixel coordinates to celestial coordinates are done in a vectorized way using Astropy routines.
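As a rough illustration of the pattern (a minimal sketch, not the actual moments_enhanced signature; all names below are illustrative), a single guvectorized kernel measures one island and NumPy broadcasts it over all islands at once:

```python
import numpy as np
from numba import guvectorize, float64

# Minimal sketch of the guvectorize pattern; island cut-outs are assumed to be
# zero-padded to a common (m, n) shape, chunkpos holds the pixel offset of each
# cut-out, and dummy only serves to fix the shape of the output.
@guvectorize([(float64[:, :], float64[:], float64[:], float64[:])],
             "(m,n),(p),(q)->(q)")
def island_moments(island, chunkpos, dummy, out):
    total = 0.0
    xmom = 0.0
    ymom = 0.0
    for i in range(island.shape[0]):
        for j in range(island.shape[1]):
            value = island[i, j]
            total += value
            xmom += i * value
            ymom += j * value
    out[0] = total                        # zeroth moment (sum of pixel values)
    out[1] = chunkpos[0] + xmom / total   # barycentre in full-image coordinates
    out[2] = chunkpos[1] + ymom / total

# One call measures every island:
# results = island_moments(islands, chunk_positions, np.zeros((len(islands), 3)))
```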

Note that you will see little of the claimed speedup in this branch, because all the measured source properties, which are stored in NumPy arrays, are finally passed into extract.ParamSet instances, which is the format that the unit tests accept. This is very time consuming. So to actually achieve the speedup, this conversion, which is the final step of source processing, has to be removed, and the unit tests will simultaneously have to accommodate whatever new format stores the source properties, e.g. xarray.

Also note that this guvectorized processing of sources only applies when the user opts for neither forced beam fits nor deblending. In that case the speedup can be achieved using moments only; no fitting is done yet. Hopefully this will be added later.
If the user does choose deblending or forced beam fits, analytical Jacobians and bounds enhance and stabilize the Gaussian fits.

…["semimajor"] * param["semiminor"] / beamsize). This is its equivalent, following the correct definition of the flux density (measured in Jy): i.e. the specific intensity a.k.a. spectral brightness integrated over the solid angle that a source subtends. An unresolved source subtends a solid angle equal to the beamsize, that is why we need the division by beamsize. The flux density is numerically ideally very close to the peak flux density in case of unresolved sources.
…cludes the transformation to celestial coordinates, for which we will use all_pix2world from Astropy.

Will raise a RuntimeError if any return value is a nan, similar to p2s. On the way, fixed an error in the description of p2s: it seems to use a list (of two items) as input, not a tuple.
…of the positions of the upper left corners of the chunks in the image and not correct for that later. This required setting up and filling an array chunk_positions containing those positions. chunk_positions is now the second argument of moments_enhanced. Within moments_enhanced it is called chunkpos.
…to all_pix2world from astropy.wcs works in this manner.
…the __init__ method from that class calling _physical_coordinates. Inspired by the success - a speedup factor > 30 - of applying Numba's guvectorize decorator to fitting.moments_enhanced (instead of njit), we wanted to do the same thing with _physical_coordinates. Now the problems come from the calls to wcs.wcs_pix2world from Astropy. One cannot @guvectorize that in nopython mode, nor njit it. There is wcs.all_pix2world, which will do a fast, vectorized call to convert a number of pixel positions to celestial coordinates. That will help us, but that routine needs to be called prior to the guvectorized call to the equivalent of the first part of _physical_coordinates, which we will conveniently call first_part_of_celestial_coordinates.

first_part_of_celestial_coordinates gets two input arrays of celestial coordinates, namely the celestial coordinates of the barycentric positions (xbar, ybar) as well as the celestial coordinates of (xbar, ybar + 1), in order to determine the orientation of the y-axis relative to local north. Another input array contains the errors on xbar and ybar. dummy is an input array with the same shape as the output array, to help Numba infer the shape of the output array. That output array will contain the values of errorx_proj, errory_proj and yoffset_angle, which will be needed to run the second part of _physical_coordinates, which is the next function to be written.

So why not a single guvectorized routine doing the same thing as _physical_coordinates? The problem is that _physical_coordinates contains a number of calls to wcs.wcs_pix2world, some of which can be done upfront using wcs.all_pix2world, but others need intermediate output from _physical_coordinates as input. So splitting into (at least) two separate routines replacing _physical_coordinates is unfortunately needed. A sketch of this two-stage pattern follows below.

On a completely different note, a few lines of comments in moments and moments_enhanced were moved, since they should not appear after the descriptions of the arguments.
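A minimal sketch of that two-stage pattern (the file name, array names and the helper call are illustrative assumptions, not the actual PySE code):

```python
import numpy as np
from astropy.io import fits
from astropy.wcs import WCS

# Assumed inputs: xbar, ybar and their errors as 1-d arrays, one entry per source,
# and a purely celestial two-axis WCS read from an example FITS file.
header = fits.getheader("image.fits")
wcs = WCS(header)

pix = np.column_stack([xbar, ybar])          # barycentres, shape (n_sources, 2)
pix_plus_y = pix + np.array([0.0, 1.0])      # (xbar, ybar + 1): y-axis vs. local north

# Stage 1: one vectorized Astropy call per set of positions, done upfront.
sky = wcs.all_pix2world(pix, 0)
sky_plus_y = wcs.all_pix2world(pix_plus_y, 0)

# Stage 2: the guvectorized part works purely on these arrays; dummy fixes the
# output shape (errorx_proj, errory_proj, yoffset_angle per source).
# dummy = np.zeros((len(xbar), 3))
# out = first_part_of_celestial_coordinates(sky, sky_plus_y, position_errors, dummy)
```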
… Chose a shorter variable name in order to stay within an 80 character line length.
…to extract instead of fitting, since it mimics code from extract.Detection._physical_coordinates.
…code in computing error_radius for all sources in a single instruction.
…. Input would be five arrays, each with the number of rows equal to the number of islands and two columns corresponding to ra and dec. The first array corresponds to (xbar, ybar) and the next four to (xbar ± x-error, ybar ± y-error). At this point I am not sure we want to mimic get_error_radius to this level, since there should be more elegant ways to derive an absolute angular error on position. Will probably choose a different approach.
…too involved. One of the four corners of [xbar ± xbar_error, ybar ± ybar_error] should suffice to compute an absolute angular error on position. With this simplified approach, angsep was guvectorized instead of get_error_radius. This also required njitting cmp.
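For reference, a plain NumPy, all-sources-at-once angular separation looks roughly like the sketch below (a Vincenty-style formula as an illustrative stand-in; the actual code guvectorizes PySE's own angsep):

```python
import numpy as np

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Angular separation in degrees between two sets of positions (all in degrees)."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    dra = ra2 - ra1
    num = np.hypot(np.cos(dec2) * np.sin(dra),
                   np.cos(dec1) * np.sin(dec2)
                   - np.sin(dec1) * np.cos(dec2) * np.cos(dra))
    den = np.sin(dec1) * np.sin(dec2) + np.cos(dec1) * np.cos(dec2) * np.cos(dra)
    return np.degrees(np.arctan2(num, den))

# error_radius for all sources in a single instruction, e.g. against one shifted corner:
# error_radius = angular_separation_deg(ra, dec, ra_corner, dec_corner)
```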
…ze decorator), while trying to mimic the original code as accurately as possible, including all warnings. That is hard, because the warning will be raised for all objects even if only one object meets the condition. By the way, there seems to be a redundant isnan conditional - if numpy.isnan(self.dec.value) - in line 851 of extract.py, i.e. in extract.Detection._physical_coordinates. That should already be covered by the conditional in WCS.p2s.
…maj_y, start_smaj_y, end_smin_x, start_smin_x, end_smin_y and start_smin_y can also be computed within first_part_of_celestial_coordinates. Some explanation: when designing the first_part_of_celestial_coordinates routine, the idea was to mimic as much of extract.Detection._physical_coordinates as we could cover until a call to WCS.p2s was needed, which Numba cannot compile. It turns out that after that call to WCS.p2s there are these eight quantities that do not depend on any sky (celestial) quantity, so we might as well include their computation in first_part_of_celestial_coordinates.
…extract.Detection._physical_coordinates. Slightly more involved than I was hoping for, because one has to first extract the column containing errorx_proj from input_for_second_part, append a column of zeros to it, add these two columns to the barycentric pixel positions and derive the corresponding sky coordinates. Likewise for errory_proj, which is the second column of input_for_second_part, except that one now has to prepend a column of zeros before adding it to the barycentric pixel positions and deriving the corresponding sky coordinates. As in extract.Detection._physical_coordinates, the maximum differences, in both RA and dec, resulting from these two additions are taken as the errors in RA and dec. In extract.Detection._physical_coordinates, eps_ra and eps_dec, i.e. the calibration errors affecting the accuracy of the positions in the image, are keyword arguments to the Detection class. Since that class is bypassed, I added eps_ra and eps_dec to the _pyse call as keyword arguments. A sketch of the column handling is shown below.
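A sketch of that column handling (continuing the naming above; wcs, xbar, ybar and input_for_second_part are assumed to exist, and the folding-in of eps_ra/eps_dec is omitted):

```python
import numpy as np

errorx_proj = input_for_second_part[:, 0]
errory_proj = input_for_second_part[:, 1]

pix = np.column_stack([xbar, ybar])
# Shift by the projected x-error (zeros appended) and y-error (zeros prepended).
shift_x = pix + np.column_stack([errorx_proj, np.zeros_like(errorx_proj)])
shift_y = pix + np.column_stack([np.zeros_like(errory_proj), errory_proj])

sky = wcs.all_pix2world(pix, 0)
sky_x = wcs.all_pix2world(shift_x, 0)
sky_y = wcs.all_pix2world(shift_y, 0)

# Maximum offset in RA and dec over the two shifts, as in _physical_coordinates.
errorra = np.maximum(np.abs(sky_x[:, 0] - sky[:, 0]), np.abs(sky_y[:, 0] - sky[:, 0]))
errordec = np.maximum(np.abs(sky_x[:, 1] - sky[:, 1]), np.abs(sky_y[:, 1] - sky[:, 1]))
```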
…is time we are computing the celestial position angles of the sources and their deconvolved equivalents. For both we need to add yoffset_angle, which is the third column of input_for_second_part. In extract.Detection._physical_coordinates there is a nan check: if not numpy.isnan(self.theta_dc.value). To me, this nan check seems redundant, since the addition and modulo operations on a nan will result in another nan. Also, if one of the deconvolved celestial position angles is a nan, I presume its uncertainty will also be a nan. It should be, at least.
…ms originating from calculations within fitting.moments_enhanced. An analysis showed that this came from thresholds larger than the peak values. How can this occur? More accurately, why can we have self.data_bgsubbed[chunk].data[peak_position] < analysisthresholdmap[peak_position]? Most likely it has to do with the subthresholds used for deblending: thresholds[count] = analysisthresholdmap[peak_position] is not correct when deblending is performed. My hunch is that if the subisland comprises multiple pixels, the subthreshold at which the subisland was found could be lower or higher than analysisthresholdmap[peak_position].
…sical_coordinates. All aspects of source measurements have now been vectorized, which means that a 2300² pixel image with 2025 sources is processed in less than a second on my laptop, once the now redundant loop starting at line 1265 in image.py, i.e. 'for count, label in enumerate(labels):....', has been commented out. That loop is still needed to make an object that can be serialized, which is needed for at least one unit test. So I need to think about how to accommodate that, because a unit test will break if I remove that loop. Another remaining problem is the line 'thresholds[count] = measurement["thresh"]', which I recently uncommented. The problem is that sep.extract can return a threshold of zero for deblended sources. This will cause zero division errors in fitting.moments_enhanced.
…resh"] instead of analysisthresholdmap[peak_position] has drawbacks in case of deblended sources. In that case sep.extract gives measurement["thresh"]=0 (with measurement["flag"] = 2) for unknown reasons - not sure if SExtractor behaves the same - and such a threshold of zero gives division by zero errors in fitting.moments_enhanced, obviously. Actually a pity we cannot propagate subthresholds, since they are more accurate than the analysis threshold at which the composite island was detected. This would be a reason to revert to PySE's own deblending algorithm instead of the one from sep.extract, but speeding that up will be hard, since it used ndimage.label which Numba cannot compile. Anyway, the error I had tried to fix had a different cause that I thought. The (x, y) pixel coordinates from sep.extract are switched wrt the native coordinates in PySE (which follow ds9, except from an offset of 1). This has now been fixed. A TBB 2300² pixels all-sky image is now processed within 0.5s on my pc. And a 2300² pixels 'standard' image with 2025 sources within 1s, once the redundant 'for count, label in enumerate(labels):' in line 1264 of image.py has been removed. This loop cannot be removed as yet, since the unit tests require a results object that has a 'serialize' attribute.
…to uncomment it, because it will make find_objects somewhat faster. But that speed gain is more than offset by the expense of labelled_data[numpy.isin(labelled_data, labels_above_det_thr, invert=True)] = 0. So no speed gain there.

Besides that, it will lead to incorrect results, because the label numbers will no longer be in sync with the slices from slices = ndimage.find_objects(labelled_data).
So better to remove it.
…op has been commented out. In terms of computing quantities this loop is completely redundant, but the output - i.e. the measured source properties - has to be in the results = containers.ExtractionResults() format that the unit tests and the rest of the pipeline need. We will have to accommodate unit tests that accept a more compute-efficient format, i.e. one closer to NumPy arrays, and the rest of the pipeline will have to deal with this as well.

I added some timing that I will explain in the next commit.
…eblending. In that case one can replace the call to sep.extract with scipy.ndimage.label, which is much faster, because it does no source measurements, only connected-component labelling. (The source measurements from sep.extract are inadequate for what we need in the TKP, but do contain a few handy quantities, such as the number of pixels in the island and the peak position, which we should be able to provide in a different manner.) When either forced_beam or deblending is used, PySE will use the old code. A sketch of this labelling path is shown below.
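The labelling path then boils down to something like this (a hedged sketch; the variable names and the 8-connectivity structure are assumptions):

```python
import numpy as np
from scipy import ndimage

# Pure connected-component labelling above the analysis threshold; unlike
# sep.extract, no per-source measurements are made here.
above_analysis = np.asarray(data_bgsubbed > analysisthresholdmap)
labelled_data, num_labels = ndimage.label(above_analysis, structure=np.ones((3, 3)))
slices = ndimage.find_objects(labelled_data)
```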
…tests, because the unit tests want the source measurements in the containers.ExtractionResults format. What has been modified here was inspired by runs on a large (4096*4096) image with 167281 sources. These runs reveal that sep.extract takes considerable time, most likely not because of connected-component labelling (ccl), but because of the source measurements. This was confirmed by the reintroduction of scipy.ndimage.label for ccl. The only downside of this approach is that for deblending one needs to revert to a slower way of processing in _pyse, see the condition on line 951 in image.py. If the user requires neither forced beam fits nor deblending of sources, 4K*4K images with 167281 sources can be processed within 2.0 s, if the image does not have to be loaded from disk (AMD Ryzen 9 7900X). Another downside of the removal of sep.extract is that we no longer have the number of pixels per island, nor the maximum values and maximum positions, which are needed to filter out islands above the analysis threshold with peaks below the detection threshold. Also, the maximum positions are needed to determine the local noise levels. For now, they are set to the upper left corner of the image chunk, i.e. chunk[0].start, chunk[1].start. The new function slices_to_indices is not yet deployed here, but will be deployed as a helper function in the next commit to find the maximum values and maximum positions of the islands. Something like max_inds = numpy.unravel_index(image_chunk.argmax(), image_chunk.shape) within the for count, label in enumerate(labels) loop is definitely too slow for images with over 100,000 sources.
… pixel values, maximum positions and number of pixels for each island. This was needed, since we eliminated sep.extract, which turned out to be too slow for more than 100,000 sources. The maximum values per island are needed to filter out the islands above the analysis thresholds with peak values below the detection threshold. The new guvectorized function find_max_and_argmax_image_slice takes the image as float64 data, while float32 should have been sufficient, but that breaks some unit tests, unfortunately. We will stick with float64 for now, since there seems to be no speed penalty. Also, the loop at line 1362 in image.py is still needed to return a results = containers.ExtractionResults() format, which the unit tests need. This makes runs a lot slower.
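For comparison, a purely scipy.ndimage way to get the same per-island peak values and positions would look like the sketch below (illustrative only; the commit instead uses the guvectorized find_max_and_argmax_image_slice fed by slices_to_indices, which avoids the per-island Python overhead):

```python
import numpy as np
from scipy import ndimage

# Assumes labelled_data/num_labels from the labelling step, the background-
# subtracted image, and the detection threshold map.
labels = np.arange(1, num_labels + 1)
peak_values = np.asarray(ndimage.maximum(data_bgsubbed, labelled_data, labels))
peak_positions = np.array(ndimage.maximum_position(data_bgsubbed, labelled_data, labels))

# Keep only islands whose peak also exceeds the detection threshold at the peak.
rows, cols = peak_positions[:, 0], peak_positions[:, 1]
keep = peak_values >= detectionthresholdmap[rows, cols]
```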
…an analytical Jacobian to the fitting algorithm, since it will result in more stable fits, i.e. a smaller chance of runaway fits. The six partial derivatives of an anisotropic 2D Gaussian are provided here, as a list of six functions.
…s preferred. leastsq is legacy. This results in a different default value of maxfev (copied to the keyword argument max_nfev in least_squares). Make use of the analytical Jacobian for more stable fits. This requires the six partial derivatives, which constitute the Jacobian, to be evaluated over the positions that the island covers. For parameters that are fixed, the Jacobian derivatives should be wiped out. Residuals seemed to have an improperly formatted docstring. paramlist was an ndarray, which is confusing, so I renamed paramlist to params. Residuals returns a 1-d array, not 2-d. Finally, I followed up on recommendations from PyLint. A sketch of this kind of fit follows below.
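A minimal sketch of fitting with an analytical Jacobian (illustrative parameterisation and names; the wiping-out of derivatives for fixed parameters and the actual fitting.fitgaussian interface are not shown):

```python
import numpy as np
from scipy.optimize import least_squares

# Illustrative parameter order: (peak, x0, y0, smaj, smin, theta).
def gaussian(params, x, y):
    peak, x0, y0, smaj, smin, theta = params
    xr = (x - x0) * np.cos(theta) + (y - y0) * np.sin(theta)
    yr = -(x - x0) * np.sin(theta) + (y - y0) * np.cos(theta)
    return peak * np.exp(-0.5 * ((xr / smaj) ** 2 + (yr / smin) ** 2))

def residuals(params, x, y, data):
    # 1-d array of residuals over the island pixels.
    return gaussian(params, x, y) - data

def jacobian(params, x, y, data):
    # Six analytical partial derivatives of the Gaussian, one column per parameter.
    peak, x0, y0, smaj, smin, theta = params
    xr = (x - x0) * np.cos(theta) + (y - y0) * np.sin(theta)
    yr = -(x - x0) * np.sin(theta) + (y - y0) * np.cos(theta)
    g = np.exp(-0.5 * ((xr / smaj) ** 2 + (yr / smin) ** 2))
    dg_dpeak = g
    dg_dx0 = peak * g * (xr * np.cos(theta) / smaj ** 2 - yr * np.sin(theta) / smin ** 2)
    dg_dy0 = peak * g * (xr * np.sin(theta) / smaj ** 2 + yr * np.cos(theta) / smin ** 2)
    dg_dsmaj = peak * g * xr ** 2 / smaj ** 3
    dg_dsmin = peak * g * yr ** 2 / smin ** 3
    dg_dtheta = peak * g * xr * yr * (1.0 / smin ** 2 - 1.0 / smaj ** 2)
    return np.column_stack([dg_dpeak, dg_dx0, dg_dy0, dg_dsmaj, dg_dsmin, dg_dtheta])

# fit = least_squares(residuals, initial, jac=jacobian, args=(x, y, data))
```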
…functions rather than a list. The reason is that the fixed parameters for Gauss fitting will have to be wiped out and it is safer to do so on the basis of a dict key than on the basis of a list index.
…n a dict instead of a list. This commit accommodates that. It also affects the calculation of jacobian_values. Some work on setting the bounds for Gauss fitting has been implemented here, but it will move to a property of ParamSet instances in a subsequent commit. Bounds for Gauss fitting have not been applied yet.
…ams' argument of fitting.fitgaussian is not necessarily a ParamSet instance, so it may not have a bounds attribute. At the same time it may be useful to store these bounds for future reference, as they are an essential part of the fitting process. Fixed by adding an extra 'bounds' argument to fitting.fitgaussian. Lower and upper bounds of 0.5 and 1.5 times the moments values seem reasonable for the peak value and for the semi-minor and semi-major axes. For xbar and ybar the fitted values should always be within the rectangular slice encompassing the island. For the position angle the fit should be between ±numpy.pi/2, but that will not work when the major or minor axis is aligned with one of the coordinate axes. That is why the factor 1.05 was added, together with keep_feasible=True, which keeps the position angle constraint feasible throughout the iterations. numpy.seterr(divide='raise', invalid='raise') was added as a handle on invalid divisions and logarithms.
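In that spirit, the bounds could be assembled roughly as below (a hedged sketch with illustrative names, in the same (peak, x0, y0, smaj, smin, theta) order as the fitting sketch above; the keep_feasible handling mentioned in the commit is not shown):

```python
import numpy as np

# moments: initial estimates from the moments step; xslice/yslice: the rectangular
# slice encompassing the island (illustrative names, not the actual code).
lower = [0.5 * moments["peak"], xslice.start, yslice.start,
         0.5 * moments["semimajor"], 0.5 * moments["semiminor"], -1.05 * np.pi / 2]
upper = [1.5 * moments["peak"], xslice.stop, yslice.stop,
         1.5 * moments["semimajor"], 1.5 * moments["semiminor"], 1.05 * np.pi / 2]

# fit = least_squares(residuals, initial, jac=jacobian, bounds=(lower, upper),
#                     args=(x, y, data))
```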
…a keyword argument, instead of deriving them within. The fixed parameters are wiped out, because bounds have to be in sync with initial.
…xis is aligned with one of the coordinate axes, since keep_feasible=True already allows for a smooth fitting process.
…the docstring. These colons apparently stop the linter from complaining.
…s used by scipy.optimize.least_squares, which we are currently using.
…d has sufficient width for Gauss fitting. However, a thin strip from the upper left corner to the bottom right corner of the rectangular slice encompassing the island would pass this test, although its width could be as small as a single pixel. This new test should catch this case. Not sure yet how to deal with masked pixels. numpy.ma.where(data != 0, 1, 0) is implemented here, but numpy.where(data != 0, 1, 0) may suffice.
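One possible form of such a width test (a sketch under the assumption that counting non-zero pixels per occupied row and column is acceptable; the handling of masked pixels may differ):

```python
import numpy as np

support = np.where(island_data != 0, 1, 0)   # or numpy.ma.where for masked data
row_counts = support.sum(axis=1)
col_counts = support.sum(axis=0)

# A thin diagonal strip has only one or two pixels per occupied row/column and
# is rejected; min_width would be 3 per the follow-up commits.
wide_enough = (row_counts[row_counts > 0].min() >= min_width and
               col_counts[col_counts > 0].min() >= min_width)
```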
@HannoSpreeuw HannoSpreeuw requested a review from suvayu May 10, 2024 10:01
@HannoSpreeuw HannoSpreeuw self-assigned this May 10, 2024
…ameters when the island has a width of at least 3 pixels in both dimensions; 2 pixels would not be enough to determine an axis length in a meaningful way, I reckon. This requires an extra input argument to fitting.moments_enhanced: the minimum width. The most natural function to derive the minimum width for every island is insert_island_data. That is implemented here, but the minimum widths are not yet passed on to fitting.moments_enhanced. This will be done in a subsequent commit.
…hanced: in all cases with insufficient width to derive Gaussian shape parameters, use the clean beam parameters. This requires an extra argument to fitting.moments_enhanced. Next, the no_pixels == 1 case can no longer occur, so it can be removed. Perhaps, even with this filtering out of "thin" detections, we can still run into math domain errors, with working2 slightly larger than working1, so a try/except has been added.
… i.e. in the case that no Gauss fitting was performed or could be performed.
…ot apply vectorization - should make it slightly clearer what chunkpos is about.
… The idea is that we should apply a function that computes the residual from one set of Gaussian parameters for one island to that island, and then apply Numba's guvectorize decorator to apply all sets of Gaussian parameters to all corresponding islands. That should be really fast. The problem is most likely that the self.Gaussian_residuals output image, containing all zeros before the call to extract.calculate_and_insert_residuals and all the residuals after the call, does not change size with the number of islands, and somehow Numba does not know how to make sense of that. I tried to remove the '->', i.e. no output, in the guvectorize decorator spec and add writable_args=('residual_map',), but that did not help Numba in any way; it still fails with the same 'ValueError: output operand requires a reduction along dimension -1, but the reduction is not enabled. The dimension size of 1 does not match the expected output shape.'
…t_residuals - for calculating a residual map fast, i.e. in a guvectorized way - for now. With this commented out, the unit tests pass and we can proceed with merging into the accelerated branch, i.e. merging PR #47.
@HannoSpreeuw HannoSpreeuw merged commit 3650d85 into accelerated Aug 16, 2024
@HannoSpreeuw HannoSpreeuw deleted the faster_physical_coordinates branch August 16, 2024 09:12
HannoSpreeuw added a commit that referenced this pull request Aug 23, 2024
…t_residuals - for calculating a residual map fast, i.e. in a guvectorized way - for now. With this commented out, the unit tests pass and we can proceed with merging into the accelerated branch, i.e. merging PR #47.