Add Support for Sorting Lightcurves by Time #353
Codecov Report
All modified and coverable lines are covered by tests ✅
Coverage diff, main vs. #353:
Coverage: 95.53% -> 95.55% (+0.02%)
Files: 25 -> 25
Lines: 1702 -> 1710 (+8)
Hits: 1626 -> 1634 (+8)
Misses: 76 -> 76
Looks great!
What would actually happen if the lightcurve cohesion condition is not met? Do we need a note in the docstring about that?
Added a note in the docstring that, in that case, this would fail to globally sort the table.
Introduces `Ensemble.sort_lightcurves`, which sorts lightcurves by their timestamp in ascending order.
This PR closes #316, though we opt to provide a separate function for sorting lightcurves by their timestamp rather than adding functionality to the batch function. This also lets the user sort only when needed rather than worrying about the arguments specified over repeated batch calls.
Solution Description
Ideally we would use a multi-index of Object ID and timestamp for the Source table, but Dask lacks support for this. Instead, we use the Object ID as the sorted index for both the Source and Object tables, and a call to `Ensemble.sort_lightcurves` performs a per-partition Pandas `sort_values`, sorting the underlying Pandas dataframe by {Object ID, timestamp}. (Note that Dask [sort_values](https://docs.dask.org/en/latest/generated/dask.dataframe.DataFrame.sort_values.html) supports sorting by only a single column.)
Because we aim for lightcurve cohesion, where the rows for each lightcurve are on only a single partition in the Source table, this per-partition sorting is all we need. It lets us escape some of the constraints of sorting entire Dask dataframes and, in particular, sort lightcurves lazily for only the partitions that we need.
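To make the per-partition idea concrete, here is a minimal sketch rather than the code from this PR. It uses plain Pandas dataframes to stand in for Dask partitions, and the column names `object_id` and `time` and all helper names are assumptions chosen for illustration. It also shows why cohesion matters: when one lightcurve is split across partitions, sorting each partition separately does not leave that lightcurve time-ordered in the table as a whole.

```python
import pandas as pd

# Hypothetical column names, chosen only for this illustration.
ID_COL = "object_id"
TIME_COL = "time"


def sort_partition(part: pd.DataFrame) -> pd.DataFrame:
    """Sort a single partition by (Object ID index, timestamp)."""
    # `by` may mix index level names and column labels, so the
    # Object ID can stay as the index while we sort on it.
    return part.sort_values(by=[ID_COL, TIME_COL])


def make_partition(ids, times) -> pd.DataFrame:
    """Build a toy Source partition indexed by Object ID."""
    frame = pd.DataFrame({ID_COL: ids, TIME_COL: times})
    return frame.set_index(ID_COL)


def lightcurves_sorted(parts) -> bool:
    """Check that each lightcurve is time-ordered across all partitions."""
    table = pd.concat(parts)
    per_object = table.groupby(level=ID_COL)[TIME_COL].apply(
        lambda s: s.is_monotonic_increasing
    )
    return bool(per_object.all())


# Cohesive layout: every Object ID lives on exactly one partition.
cohesive = [
    make_partition([1, 1, 1, 2], [3.0, 1.0, 2.0, 5.0]),
    make_partition([3, 3], [9.0, 4.0]),
]
cohesive_sorted = [sort_partition(p) for p in cohesive]

# Non-cohesive layout: Object ID 2 is split across both partitions.
split = [
    make_partition([1, 2, 2], [1.0, 8.0, 7.0]),
    make_partition([2, 3], [2.0, 4.0]),
]
split_sorted = [sort_partition(p) for p in split]

print(lightcurves_sorted(cohesive_sorted))  # cohesion holds
print(lightcurves_sorted(split_sorted))     # cohesion violated
```

With a Dask dataframe, the same per-partition function could be applied lazily through `map_partitions`, for example `source.map_partitions(sort_partition)`, where `source` is a hypothetical Dask dataframe indexed by Object ID.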
`Ensemble.sort_lightcurves` also has an optional parameter for whether to sort the lightcurves by band as well as time.
Code Quality
Project-Specific Pull Request Checklists
New Feature Checklist