Avoid loops in `s2fft.sampling.reindex` functions to reduce compile and run times #245

matt-graham · 2024-11-19T11:23:54Z

The index conversion functions in `s2fft.sampling.reindex` currently use Python loops (with `L` iterations) to perform the reindexing operations slice by slice. Because these loops are unrolled during JIT compilation, compile times become long as `L` gets bigger.
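To illustrate why an unrolled loop inflates the compiled program, here is a minimal sketch that counts the operations JAX records when tracing a loop-based conversion. It is not the s2fft code: `flm_2d_to_1d_loop` and `n_traced_ops` are hypothetical names, and the 2D layout `flm_2d[el, L - 1 + m]` is an assumption made for the example.

```python
from functools import partial

import jax
import jax.numpy as jnp


def flm_2d_to_1d_loop(flm_2d, L):
    """Loop-based sketch (hypothetical, not the s2fft implementation)."""
    flm_1d = jnp.zeros(L**2, dtype=flm_2d.dtype)
    # A plain Python loop runs during tracing, so every iteration
    # appends its own slice and update operations to the traced program.
    for el in range(L):
        row = flm_2d[el, L - 1 - el : L + el]
        flm_1d = flm_1d.at[el**2 : (el + 1) ** 2].set(row)
    return flm_1d


def n_traced_ops(L):
    """Number of top-level equations in the jaxpr for a given static L."""
    flm_2d = jnp.zeros((L, 2 * L - 1))
    jaxpr = jax.make_jaxpr(partial(flm_2d_to_1d_loop, L=L))(flm_2d)
    return len(jaxpr.jaxpr.eqns)


for L in (4, 8, 16):
    print(L, n_traced_ops(L))
```

The printed counts should increase with `L`, which is the growth in program size that the compiler then has to process.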

This PR refactors the reindexing functions to instead construct arrays of indices and use them to gather or scatter the relevant entries in a single indexing operation, thus avoiding the loops. For the functions converting between HEALPix and 2D harmonic coefficient indexing, we can use the `numpy.triu_indices` function to construct the relevant index arrays. For the maps between 1D and 2D layouts, we can use `numpy.where` on a boolean array constructed so that the relevant array elements are `True`.

In both cases the explicitly constructed index arrays add some memory overhead: as they depend only on `L`, which is static, they will be constants at compile time, and in all cases are, I believe, $\mathcal{O}(L^2)$ in size. However, this is no larger than the memory requirements of the harmonic coefficient arrays themselves, and the computational graphs of the resulting compiled functions are much smaller (no longer scaling with `L`), so it's unclear whether the overall memory requirements are much larger.
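The two index constructions described above can be sketched in plain NumPy as follows. This is an illustration under stated assumptions rather than the code in this PR: the function names are hypothetical, the 2D layout is assumed to be `flm_2d[el, L - 1 + m]`, the HEALPix layout is assumed to store only `m >= 0` ordered by `m` and then `el`, and the 1D layout is assumed to use index `el**2 + el + m`.

```python
import numpy as np


def flm_2d_to_hp_indexed(flm_2d, L):
    """Gather the m >= 0 coefficients in the assumed HEALPix order."""
    # triu_indices walks the upper triangle row by row, so treating
    # rows as m and columns as el yields every pair with el >= m,
    # ordered by m first and then el.
    m_idx, el_idx = np.triu_indices(L)
    return flm_2d[el_idx, L - 1 + m_idx]


def flm_2d_to_hp_loop(flm_2d, L):
    """Loop-based reference: copy one m-slice per iteration."""
    flm_hp = np.zeros(L * (L + 1) // 2, dtype=flm_2d.dtype)
    start = 0
    for m in range(L):
        flm_hp[start : start + L - m] = flm_2d[m:, L - 1 + m]
        start += L - m
    return flm_hp


def valid_2d_indices(L):
    """Row-major indices of the 2D entries with |m| <= el."""
    el = np.arange(L)[:, None]
    m = np.arange(-(L - 1), L)[None, :]
    return np.where(np.abs(m) <= el)


def flm_2d_to_1d_indexed(flm_2d, L):
    """Gather all valid entries with a single indexing operation."""
    return flm_2d[valid_2d_indices(L)]


def flm_1d_to_2d_indexed(flm_1d, L):
    """Scatter the 1D coefficients into a zero-initialised 2D array."""
    flm_2d = np.zeros((L, 2 * L - 1), dtype=flm_1d.dtype)
    flm_2d[valid_2d_indices(L)] = flm_1d
    return flm_2d


L = 5
rng = np.random.default_rng(0)
flm_2d = rng.standard_normal((L, 2 * L - 1)) + 1j * rng.standard_normal((L, 2 * L - 1))
# Zero the entries with |m| > el so the 2D array is a valid coefficient array.
valid = np.zeros((L, 2 * L - 1), dtype=bool)
valid[valid_2d_indices(L)] = True
flm_2d = flm_2d * valid

flm_hp = flm_2d_to_hp_indexed(flm_2d, L)
flm_1d = flm_2d_to_1d_indexed(flm_2d, L)
print(flm_hp.shape, flm_1d.shape)
```

Because the index arrays depend only on `L`, computing them with NumPy inside a jitted JAX function would make them compile-time constants, which is the memory trade-off discussed above.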

I've benchmarked the new implementations against the current implementations and, in all cases tested, the compile times are significantly lower and the run times comparable or significantly better:

https://gist.github.com/matt-graham/7a2b5b77f51b5301b8910ced176d8a8a
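For readers who want to reproduce this kind of comparison, the following is a minimal sketch of timing compilation and execution separately using JAX's ahead-of-time lowering. It is not the benchmarking code from the gist: `time_compile_and_run` and `flm_2d_to_hp` are hypothetical names, and the gather shown assumes the 2D layout `flm_2d[el, L - 1 + m]`.

```python
import time

import jax
import jax.numpy as jnp
import numpy as np


def time_compile_and_run(fn, *args, repeats=10):
    """Return (compile time, mean run time) in seconds for fn(*args)."""
    start = time.perf_counter()
    compiled = jax.jit(fn).lower(*args).compile()
    compile_time = time.perf_counter() - start

    # Warm-up call, then time repeated executions; block_until_ready
    # waits for asynchronous dispatch to finish before the clock stops.
    compiled(*args).block_until_ready()
    start = time.perf_counter()
    for _ in range(repeats):
        compiled(*args).block_until_ready()
    run_time = (time.perf_counter() - start) / repeats
    return compile_time, run_time


L = 16
# Index arrays built with NumPy from the static L (assumed layout).
m_idx, el_idx = np.triu_indices(L)


def flm_2d_to_hp(flm_2d):
    return flm_2d[el_idx, L - 1 + m_idx]


compile_time, run_time = time_compile_and_run(flm_2d_to_hp, jnp.ones((L, 2 * L - 1)))
print(f"compile: {compile_time:.4f} s, run: {run_time:.6f} s")
```

Timings depend on hardware and JAX version, so absolute numbers from a sketch like this are only meaningful relative to other implementations measured the same way.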

For example, for the `flm_2d_to_hp_fast` function, the following plot shows the compile times of the current implementation ("Current"), the proposed implementation ("triu_indices based") and two other alternatives I tried out ("Slice based" and "Nested fori_loop") for various `L` values:

and the corresponding plot for run times of the compiled functions:

On both compile and run times the proposed "triu_indices based" implementation is significantly quicker than both the current implementation and the other alternatives considered.

EDIT: There was a slight bug in the `flm_hp_to_2d_fast` reimplementation, which is now hopefully fixed. The run times for the new function are now a little slower, but the compile time is still much better than that of the current implementation and, importantly, does not grow quickly with `L`.

codecov · 2024-11-19T12:46:16Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.06%. Comparing base (909e6f1) to head (d2c8f54).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #245      +/-   ##
==========================================
- Coverage   96.07%   96.06%   -0.02%     
==========================================
  Files          31       31              
  Lines        3567     3555      -12     
==========================================
- Hits         3427     3415      -12     
  Misses        140      140

jasonmcewen

LGTM!

CosmoMatt · 2024-11-26T10:19:09Z

@matt-graham nice, this was a temporary loop that I never got around to removing!

CosmoMatt

Nice!

matt-graham added 2 commits November 18, 2024 18:44

Avoid unrolled for loops in reindexing functions

6b0ee97

Propagate type and fix typo in docstring

89f793b

matt-graham requested a review from CosmoMatt November 19, 2024 11:23

Fix bug in flm_hp_to_2d_fast implementation

d2c8f54

jasonmcewen self-requested a review November 26, 2024 09:53

jasonmcewen approved these changes Nov 26, 2024

CosmoMatt approved these changes Nov 26, 2024
