Faster unstacking to sparse #5577

dcherian · 2021-07-05T17:20:59Z

Tests added
Passes pre-commit run --all-files
User visible changes (including notable bug fixes) are documented in whats-new.rst

Unstacking to sparse goes from 7 s to 25 ms, and peak memory usage from 3.5 GB to 850 MB =) by passing the coordinate locations directly to the sparse constructor.
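A minimal sketch of the idea, not the code in this PR: for a 1-D array indexed by a `pandas.MultiIndex`, the integer codes of each level already are the coordinate locations, so they can be handed to `sparse.COO` without building a dense intermediate. The helper names `multiindex_to_coords` and `unstack_1d_to_sparse` are hypothetical, and the sketch assumes the index has no missing entries (no `-1` codes).

```python
import numpy as np
import pandas as pd


def multiindex_to_coords(index: pd.MultiIndex) -> np.ndarray:
    """Return an (nlevels, npoints) array of integer positions (hypothetical helper)."""
    # index.codes holds, per level, the position of every element within that
    # level. Stacked, they form the (ndim, nnz) layout a COO constructor expects.
    return np.stack([np.asarray(codes) for codes in index.codes]).astype(np.intp)


def unstack_1d_to_sparse(values, index: pd.MultiIndex, fill_value=np.nan):
    """Build a sparse N-D array from 1-D values plus a MultiIndex (hypothetical helper)."""
    import sparse  # optional dependency (pydata/sparse), imported lazily

    coords = multiindex_to_coords(index)
    # One output dimension per index level, sized by the number of unique labels.
    shape = tuple(len(level) for level in index.levels)
    return sparse.COO(coords, np.asarray(values), shape=shape, fill_value=fill_value)


# Three points on the diagonal of a 3x3 grid.
index = pd.MultiIndex.from_arrays([["a", "b", "c"], [10, 20, 30]], names=["x", "y"])
coords = multiindex_to_coords(index)
```

Calling `unstack_1d_to_sparse([1.0, 2.0, 3.0], index)` would then store only the three given values; every other cell of the 3x3 result is represented implicitly by `fill_value`.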

asv run -e --bench unstacking.UnstackingSparse.time_unstack_to_sparse  --cpu-affinity=3 HEAD
[  0.00%] · For xarray commit c9251e1c <sparse-unstack>:
[  0.00%] ·· Building for conda-py3.8-bottleneck-dask-distributed-netcdf4-numpy-pandas-scipy-sparse
[  0.00%] ·· Benchmarking conda-py3.8-bottleneck-dask-distributed-netcdf4-numpy-pandas-scipy-sparse
[  0.01%] ··· Running (unstacking.UnstackingSparse.time_unstack_to_sparse_2d--)..
[  0.02%] ··· unstacking.UnstackingSparse.time_unstack_to_sparse_2d    623±30μs
[  0.02%] ··· unstacking.UnstackingSparse.time_unstack_to_sparse_3d    22.8±2ms
[  0.06%] ··· unstacking.UnstackingSparse.peakmem_unstack_to_sparse_2d    793M
[  0.06%] ··· unstacking.UnstackingSparse.peakmem_unstack_to_sparse_3d    794M


[  0.04%] · For xarray commit 80905135 <main>:
[  0.04%] ·· Building for conda-py3.8-bottleneck-dask-distributed-netcdf4-numpy-pandas-scipy-sparse..
[  0.04%] ·· Benchmarking conda-py3.8-bottleneck-dask-distributed-netcdf4-numpy-pandas-scipy-sparse
[  0.05%] ··· Running (unstacking.UnstackingSparse.time_unstack_to_sparse_2d--)..
[  0.06%] ··· unstacking.UnstackingSparse.time_unstack_to_sparse_2d    596±30ms
[  0.06%] ··· unstacking.UnstackingSparse.time_unstack_to_sparse_3d    7.72±0.1s
[  0.02%] ··· unstacking.UnstackingSparse.peakmem_unstack_to_sparse_2d    867M
[  0.02%] ··· unstacking.UnstackingSparse.peakmem_unstack_to_sparse_3d    3.56G

cc @bonnland

github-actions · 2021-07-05T17:28:09Z

Unit Test Results

        6 files         6 suites 53m 48s ⏱️
16 281 tests 14 545 ✔️ 1 736 💤 0 ❌
90 882 runs 82 702 ✔️ 8 180 💤 0 ❌

Results for commit 267a14f.

♻️ This comment has been updated with latest results.

max-sixty · 2021-07-05T18:37:06Z

> From 7s to 25 ms

Casual!

* upstream/main: (34 commits)
  Use same bool validator as other inputs (pydata#5703)
  conditionally disable bottleneck (pydata#5560)
  Refactor index vs. coordinate variable(s) (pydata#5636)
  pre-commit: autoupdate hook versions (pydata#5685)
  Flexible Indexes: Avoid len(index) in map_blocks (pydata#5670)
  Speed up _mapping_repr (pydata#5661)
  update the link to `scipy`'s intersphinx file (pydata#5665)
  Bump styfle/cancel-workflow-action from 0.9.0 to 0.9.1 (pydata#5663)
  pre-commit: autoupdate hook versions (pydata#5660)
  fix the binder environment (pydata#5650)
  Update api.rst (pydata#5639)
  Kwargs to rasterio open (pydata#5609)
  Bump codecov/codecov-action from 1 to 2.0.2 (pydata#5633)
  new blank whats-new for v0.19.1
  v0.19.0 release notes (pydata#5632)
  remove deprecations scheduled for 0.19 (pydata#5630)
  Make typing-extensions optional (pydata#5624)
  Plots get labels from pint arrays (pydata#5561)
  Add to_numpy() and as_numpy() methods (pydata#5568)
  pin fsspec (pydata#5627)
  ...

Illviljan · 2021-10-29T23:16:53Z

       before           after         ratio
     [36f05d70]       [0310ebec]
-           2.98G             204M     0.07  unstacking.UnstackingSparse.peakmem_unstack_to_sparse_3d [fv-az292-755/conda-py3.8-bottleneck-dask-distributed-netcdf4-numpy-pandas-scipy-sparse]
-              3G             204M     0.07  unstacking.UnstackingSparse.peakmem_unstack_to_sparse_3d [fv-az292-755/conda-py3.8-dask-distributed-netcdf4-numpy-pandas-scipy-sparse]
-      10.2±0.02s         29.7±2ms     0.00  unstacking.UnstackingSparse.time_unstack_to_sparse_3d [fv-az292-755/conda-py3.8-bottleneck-dask-distributed-netcdf4-numpy-pandas-scipy-sparse]
-      10.1±0.05s       27.4±0.6ms     0.00  unstacking.UnstackingSparse.time_unstack_to_sparse_3d [fv-az292-755/conda-py3.8-dask-distributed-netcdf4-numpy-pandas-scipy-sparse]
-        714±20ms         945±30μs     0.00  unstacking.UnstackingSparse.time_unstack_to_sparse_2d [fv-az292-755/conda-py3.8-bottleneck-dask-distributed-netcdf4-numpy-pandas-scipy-sparse]
-         721±8ms         923±30μs     0.00  unstacking.UnstackingSparse.time_unstack_to_sparse_2d [fv-az292-755/conda-py3.8-dask-distributed-netcdf4-numpy-pandas-scipy-sparse]

Quite the improvement indeed. :)
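As a rough check on the table above, the `ratio` column is simply after divided by before, which is why the timing rows print as 0.00 after rounding to two decimals. The sketch below recomputes it from a few of the reported values; treating `G` and `M` as decimal units is an assumption, and the dictionary layout is illustrative only.

```python
# (before, after) timings in seconds, copied from the comparison table.
timings = {
    "time_unstack_to_sparse_3d": (10.2, 29.7e-3),
    "time_unstack_to_sparse_2d": (714e-3, 945e-6),
}
# Peak memory in bytes, assuming decimal G and M (an assumption, see above).
peakmem = {"peakmem_unstack_to_sparse_3d": (2.98e9, 204e6)}


def ratio(before: float, after: float) -> float:
    """Ratio as reported in the table: new value divided by old value."""
    return after / before


for name, (before, after) in {**timings, **peakmem}.items():
    r = ratio(before, after)
    # 1 / r is the improvement factor that the two-decimal ratio hides.
    print(f"{name}: ratio={r:.2f}, improvement={1 / r:.0f}x")
```

With these inputs the improvement factors come out at roughly 340x for the 3-D timing, 760x for the 2-D timing, and 15x for 3-D peak memory.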

* upstream/main: (39 commits)
  Fixed a mispelling of dimension in dataarray documentation for from_dict (pydata#6020)
  [pre-commit.ci] pre-commit autoupdate (pydata#6014)
  [pre-commit.ci] pre-commit autoupdate (pydata#5990)
  Use set_options for asv bottleneck tests (pydata#5986)
  Fix module name retrieval in `backend.plugins.remove_duplicates()`, plugin tests (pydata#5959)
  Check for py version instead of try/except when importing entry_points (pydata#5988)
  Add "see also" in to_dataframe docs (pydata#5978)
  Alternate method using inline css to hide regular html output in an untrusted notebook (pydata#5880)
  Fix mypy issue with entry_points (pydata#5979)
  Remove pre-commit auto update (pydata#5958)
  Do not change coordinate inplace when throwing error (pydata#5957)
  Create CITATION.cff (pydata#5956)
  Add groupby & resample benchmarks (pydata#5922)
  Fix plot.line crash for data of shape (1, N) in _title_for_slice on format_item (pydata#5948)
  Disable unit test comments (pydata#5946)
  Publish test results from workflow_run only (pydata#5947)
  Generator for groupby reductions (pydata#5871)
  whats-new dev
  whats-new for 0.20.1 (pydata#5943)
  Docs: fix URL for PTSA (pydata#5935)
  ...

dcherian · 2021-12-02T01:50:12Z

@pydata/xarray I'm planning to merge on Friday. It's been sitting around for a while and is a giant improvement.

* upstream/main:
  fix grammatical typo in docs (pydata#6034)
  Use condas dask-core in ci instead of dask to speedup ci and reduce dependencies (pydata#6007)
  Use complex nan by default when interpolating out of bounds (pydata#6019)
  Simplify missing value handling in xarray.corr (pydata#6025)
  Add pyXpcm to Related Projects doc page (pydata#6031)
  Make xr.corr and xr.map_blocks work without dask (pydata#5731)

Faster unstacking to sparse

9ac1e07

dcherian commented Jul 5, 2021

xarray/core/variable.py (outdated, resolved)

Update xarray/core/variable.py

6bd0fe7

dcherian added the needs review label Jul 5, 2021

[skip-ci] Add memory benchmarks

e976ada

dcherian added the topic-arrays related to flexible array support label Jul 5, 2021

max-sixty reviewed Jul 5, 2021

xarray/core/variable.py (outdated, resolved)

max-sixty reviewed Jul 5, 2021

xarray/core/variable.py (outdated, resolved)

cleanups + add comments

e4a6ec2

dcherian force-pushed the sparse-unstack branch from fa201bd to e4a6ec2 Compare July 5, 2021 20:45

optimize.

0c6f22f

dcherian mentioned this pull request Jul 6, 2021

Faster unstacking of dask arrays #5582 (Open)

bugfix

6e12955

dcherian commented Jul 7, 2021

doc/whats-new.rst (outdated, resolved)

dcherian and others added 5 commits July 7, 2021 09:21

[skip-ci] Update doc/whats-new.rst

8e6c548

clean up comments

637421d

FIx whats-new

58aa601

Merge branch 'main' into sparse-unstack

267a14f

dcherian added the run-benchmark Run the ASV benchmark workflow label Oct 28, 2021

Illviljan reviewed Nov 8, 2021

xarray/core/variable.py (resolved)

dcherian added 2 commits November 23, 2021 19:52

faster benchmarks

ea22454

dcherian added the plan to merge Final call for comments label Nov 24, 2021

make fewer assumptions

97e6915

Fix whats-new

b7017af

dcherian commented Dec 2, 2021

doc/whats-new.rst (outdated, resolved)

Update doc/whats-new.rst

1532c5e

dcherian force-pushed the sparse-unstack branch from 9adc72c to 1532c5e Compare December 2, 2021 02:16

dcherian merged commit cdfcf37 into pydata:main Dec 3, 2021

dcherian deleted the sparse-unstack branch December 3, 2021 16:38
