Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up _mapping_repr #5661

Merged
merged 3 commits into from
Aug 2, 2021

Conversation

Illviljan
Copy link
Contributor

@Illviljan Illviljan commented Aug 1, 2021

Creating a ordered list for filtering purposes using .items() turns out being rather slow. Use .keys() instead as that doesn't trigger a bunch of dataarray initializations.

  • Passes pre-commit run --all-files

Test case:

import numpy as np
import xarray as xr

a = np.arange(0, 2000)
data_vars = dict()
for i in a:
    data_vars[f"long_variable_name_{i}"] = xr.DataArray(
        name=f"long_variable_name_{i}",
        data=np.arange(0, 20),
        dims=[f"long_coord_name_{i}_x"],
        coords={f"long_coord_name_{i}_x": np.arange(0, 20) * 2},
    )
ds0 = xr.Dataset(data_vars)
ds0.attrs = {f"attr_{k}": 2 for k in a}

Before:

%timeit print(ds0)
14.6 s ± 215 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

After:

%timeit print(ds0)
120 ms ± 2.06 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

@github-actions
Copy link
Contributor

github-actions bot commented Aug 1, 2021

Unit Test Results

         6 files  ±0           6 suites  ±0   51m 27s ⏱️ ±0s
16 200 tests ±0  14 468 ✔️ ±0  1 732 💤 ±0  0 ❌ ±0 
90 396 runs  ±0  82 224 ✔️ ±0  8 172 💤 ±0  0 ❌ ±0 

Results for commit 8f5b4a1. ± Comparison against base commit 8f5b4a1.

♻️ This comment has been updated with latest results.

@max-sixty
Copy link
Collaborator

Wow, that's quite the speed up! I haven't had time to understand why it's so dramatic, but let's merge unless there are any objections. Feel free to add a whatsnew @Illviljan

@max-sixty max-sixty added the plan to merge Final call for comments label Aug 2, 2021
@dcherian
Copy link
Contributor

dcherian commented Aug 2, 2021

I haven't had time to understand why it's so dramatic

The old way creates all dataarrays (2000 in this case). This version creates first_rows+last_rows dataarrays. Very nice idea @Illviljan !

this branch:
[ 31.25%] ··· repr.Repr.time_repr 14.8±0.3ms
[ 37.50%] ··· repr.Repr.time_repr_html 164±1ms

main:
[ 31.25%] ··· repr.Repr.time_repr 41.6±0.5ms
[ 37.50%] ··· repr.Repr.time_repr_html 188±1ms
@dcherian
Copy link
Contributor

dcherian commented Aug 2, 2021

I pushed a benchmark...

this branch:
[ 31.25%] ··· repr.Repr.time_repr 14.8±0.3ms
[ 37.50%] ··· repr.Repr.time_repr_html 164±1ms

main:
[ 31.25%] ··· repr.Repr.time_repr 41.6±0.5ms
[ 37.50%] ··· repr.Repr.time_repr_html 188±1ms

@Illviljan
Copy link
Contributor Author

The benchmark uses (only) 100 arrays so I suppose there isn't as big of difference at that point. Did you get similar values as me with 2000, @dcherian ?

@dcherian
Copy link
Contributor

dcherian commented Aug 2, 2021

3x is good! I reduced the size so it would run faster, we can't run a 15s benchmark every time we benchmark an older commit...

Can you add a whats-new note please?

@Illviljan
Copy link
Contributor Author

Illviljan commented Aug 2, 2021

Ok, if you got 15 seconds as well I'm satisfied. I was just worried I had something else affecting my results.

@max-sixty max-sixty merged commit 8f5b4a1 into pydata:main Aug 2, 2021
@max-sixty
Copy link
Collaborator

Another great one @Illviljan ! Thank you!

st-bender added a commit to st-bender/xarray that referenced this pull request Aug 10, 2021
* main: (31 commits)
  Refactor index vs. coordinate variable(s) (pydata#5636)
  pre-commit: autoupdate hook versions (pydata#5685)
  Flexible Indexes: Avoid len(index) in map_blocks (pydata#5670)
  Speed up _mapping_repr (pydata#5661)
  update the link to `scipy`'s intersphinx file (pydata#5665)
  Bump styfle/cancel-workflow-action from 0.9.0 to 0.9.1 (pydata#5663)
  pre-commit: autoupdate hook versions (pydata#5660)
  fix the binder environment (pydata#5650)
  Update api.rst (pydata#5639)
  Kwargs to rasterio open (pydata#5609)
  Bump codecov/codecov-action from 1 to 2.0.2 (pydata#5633)
  new blank whats-new for v0.19.1
  v0.19.0 release notes (pydata#5632)
  remove deprecations scheduled for 0.19 (pydata#5630)
  Make typing-extensions optional (pydata#5624)
  Plots get labels from pint arrays (pydata#5561)
  Add to_numpy() and as_numpy() methods (pydata#5568)
  pin fsspec (pydata#5627)
  pre-commit: autoupdate hook versions (pydata#5617)
  Add dataarray scatter with 3d support (pydata#4909)
  ...
dcherian added a commit to kmsquire/xarray that referenced this pull request Aug 11, 2021
* upstream/main: (31 commits)
  Refactor index vs. coordinate variable(s) (pydata#5636)
  pre-commit: autoupdate hook versions (pydata#5685)
  Flexible Indexes: Avoid len(index) in map_blocks (pydata#5670)
  Speed up _mapping_repr (pydata#5661)
  update the link to `scipy`'s intersphinx file (pydata#5665)
  Bump styfle/cancel-workflow-action from 0.9.0 to 0.9.1 (pydata#5663)
  pre-commit: autoupdate hook versions (pydata#5660)
  fix the binder environment (pydata#5650)
  Update api.rst (pydata#5639)
  Kwargs to rasterio open (pydata#5609)
  Bump codecov/codecov-action from 1 to 2.0.2 (pydata#5633)
  new blank whats-new for v0.19.1
  v0.19.0 release notes (pydata#5632)
  remove deprecations scheduled for 0.19 (pydata#5630)
  Make typing-extensions optional (pydata#5624)
  Plots get labels from pint arrays (pydata#5561)
  Add to_numpy() and as_numpy() methods (pydata#5568)
  pin fsspec (pydata#5627)
  pre-commit: autoupdate hook versions (pydata#5617)
  Add dataarray scatter with 3d support (pydata#4909)
  ...
dcherian added a commit to dcherian/xarray that referenced this pull request Aug 13, 2021
* upstream/main: (34 commits)
  Use same bool validator as other inputs (pydata#5703)
  conditionally disable bottleneck (pydata#5560)
  Refactor index vs. coordinate variable(s) (pydata#5636)
  pre-commit: autoupdate hook versions (pydata#5685)
  Flexible Indexes: Avoid len(index) in map_blocks (pydata#5670)
  Speed up _mapping_repr (pydata#5661)
  update the link to `scipy`'s intersphinx file (pydata#5665)
  Bump styfle/cancel-workflow-action from 0.9.0 to 0.9.1 (pydata#5663)
  pre-commit: autoupdate hook versions (pydata#5660)
  fix the binder environment (pydata#5650)
  Update api.rst (pydata#5639)
  Kwargs to rasterio open (pydata#5609)
  Bump codecov/codecov-action from 1 to 2.0.2 (pydata#5633)
  new blank whats-new for v0.19.1
  v0.19.0 release notes (pydata#5632)
  remove deprecations scheduled for 0.19 (pydata#5630)
  Make typing-extensions optional (pydata#5624)
  Plots get labels from pint arrays (pydata#5561)
  Add to_numpy() and as_numpy() methods (pydata#5568)
  pin fsspec (pydata#5627)
  ...
dcherian added a commit to dcherian/xarray that referenced this pull request Aug 13, 2021
* upstream/main: (307 commits)
  Use same bool validator as other inputs (pydata#5703)
  conditionally disable bottleneck (pydata#5560)
  Refactor index vs. coordinate variable(s) (pydata#5636)
  pre-commit: autoupdate hook versions (pydata#5685)
  Flexible Indexes: Avoid len(index) in map_blocks (pydata#5670)
  Speed up _mapping_repr (pydata#5661)
  update the link to `scipy`'s intersphinx file (pydata#5665)
  Bump styfle/cancel-workflow-action from 0.9.0 to 0.9.1 (pydata#5663)
  pre-commit: autoupdate hook versions (pydata#5660)
  fix the binder environment (pydata#5650)
  Update api.rst (pydata#5639)
  Kwargs to rasterio open (pydata#5609)
  Bump codecov/codecov-action from 1 to 2.0.2 (pydata#5633)
  new blank whats-new for v0.19.1
  v0.19.0 release notes (pydata#5632)
  remove deprecations scheduled for 0.19 (pydata#5630)
  Make typing-extensions optional (pydata#5624)
  Plots get labels from pint arrays (pydata#5561)
  Add to_numpy() and as_numpy() methods (pydata#5568)
  pin fsspec (pydata#5627)
  ...
@Illviljan Illviljan deleted the Illviljan-speed_up_mapping_repr branch August 12, 2022 09:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to merge Final call for comments
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants