Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Support for Sorting Lightcurves by Time #353

Merged
merged 13 commits into from
Mar 19, 2024
Merged

Add Support for Sorting Lightcurves by Time #353

merged 13 commits into from
Mar 19, 2024

Conversation

wilsonbb
Copy link
Collaborator

@wilsonbb wilsonbb commented Jan 19, 2024

  • My PR includes a link to the issue that I am addressing

Introduces Ensemble.sort_lightcurves which sorts lightcurves by their timestamp in ascending order.

This PR closes #316 though we opt to provide a separate function for sorting lightcurves by their timestamp rather than adding additional functionality to the batch function. This also makes it easier for the user to only sort when needed rather than worrying about the arguments specified over repeated batch calls.

Solution Description

Ideally we would use a multi-index of Object ID and timestamp for the Source table, but Dask lacks support for this. Instead, we use the Object ID as the sorted index for both Source and Object tables, and a call to Ensemble.sort_lightcurves perform a per-partition Pandas sort_values for sorting the underlying Pandas dataframe by {Object ID, timestamp} (Note that Dask [sort_values](https://docs.dask.org/en/latest/generated/dask.dataframe.DataFrame.sort_values.html) supports only sorting a single column).

Because we aim for lightcurve cohesion, where the rows for each lightcurve are only on a single partition in the Source table, this per-partition sorting is all we need allowing us to escape some of the constraints of sorting entire Dask dataframes, especially allowing us to sort lightcurves on a lazy basis for only the partitions that we need. Ensemble.sort_lightcurves also has an optional parameter for whether to sort the lightcurves by band as well as time.

Code Quality

  • My code builds (or compiles) cleanly without any errors or warnings
  • My code contains relevant comments and necessary documentation

Project-Specific Pull Request Checklists

  • I have added a function that requires a sync_tables command, and have added the necessary sync_tables call

New Feature Checklist

  • I have added or updated the docstrings associated with my feature using the NumPy docstring format
  • I have updated the tutorial to highlight my new feature (if appropriate)
  • I have added unit/End-to-End (E2E) test cases to cover my new feature
  • My change includes a breaking change
    • My change includes backwards compatibility and deprecation warnings (if possible)

Copy link

github-actions bot commented Jan 19, 2024

Before [8f0c7b7] After [6e6d8ff] Ratio Benchmark (Parameter)
53.1±1ms 52.3±0.8ms 0.98 benchmarks.time_prune_sync_workflow
51.5±1ms 50.0±2ms 0.97 benchmarks.time_batch

Click here to view all benchmarks.

Copy link

codecov bot commented Jan 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.55%. Comparing base (8f0c7b7) to head (6cd113c).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #353      +/-   ##
==========================================
+ Coverage   95.53%   95.55%   +0.02%     
==========================================
  Files          25       25              
  Lines        1702     1710       +8     
==========================================
+ Hits         1626     1634       +8     
  Misses         76       76              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@wilsonbb wilsonbb marked this pull request as ready for review January 22, 2024 19:41
@wilsonbb wilsonbb requested a review from hombit January 22, 2024 19:41
Copy link
Contributor

@hombit hombit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

What would actually happen if the condition of lightcurve cohesion is not met? Do we need a docstring telling something about that?

src/tape/ensemble.py Outdated Show resolved Hide resolved
@wilsonbb
Copy link
Collaborator Author

Looks great!

What would actually happen if the condition of lightcurve cohesion is not met? Do we need a docstring telling something about that?

Added a note that this would fail to globally sort the table in the docstring.

@hombit hombit self-requested a review March 19, 2024 07:54
@wilsonbb wilsonbb merged commit 86a0c45 into main Mar 19, 2024
10 checks passed
@dougbrn dougbrn deleted the timing branch April 4, 2024 17:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ensemble.batch should support sorting by timestamp
2 participants