calculating climatologies efficiently #271
Comments
Worker nodes are allocated 100 GB of persistent disk space by default. From what I understand, on GCE you can only adjust this at node-pool creation. |
I've been working today on a similar problem for a different group of people. It has resulted in a number of small fixes that are slowly adding up to significantly improved performance, both for their use case and more generally (hopefully some of these fixes carry over to our problem here as well). What I find very helpful when going through this process is a reproducible example that I can scale down to run on my laptop in less than a minute, and that also scales down to a manageable number of tasks, say 1000, so that I can inspect the graph while it runs. Creating a reproducible example of the full-scale problem that scales down to a laptop can be a challenge, especially as we get further out into particular scientific analyses. It is, however, also extremely helpful. |
As an example of what a minimal example often looks like, here is the problem that I've been working on recently: https://gist.github.com/mrocklin/0eaf8ae2d636915b3de72814068492c2 I've found that the simplicity of the example problem corresponds very strongly with the speed at which progress is made on that problem. |
I apologize for not making my reproducible example minimal enough. |
Ah, sorry, didn't mean to sound negative. Just wanted to set this up as a possible path for people to be able to push forward on this topic without necessarily understanding the depths of dynamic task scheduling. Arguably most of the work here is distilling the problem down to something on which rapid iteration can occur. I've probably run the notebook above a few hundred times in the last few days (to say nothing of smaller versions in single-threaded situations where I can inspect all of the workers directly). |
I understand and appreciate the value of the approach you described. I didn't post this issue on dask or xarray--I posted it on pangeo, the place where we discuss the actual real-world use cases. This example is already heavily, heavily simplified from the type of analysis that one has to do to achieve actual publishable scientific results. Given the complexity of this stack, I think such "intermediate" examples have value. This issue represents a canonical operation common to all of climate science. I will continue to work to simplify it further by eliminating the "real data" part. Do you think that the 'store' aspect is important here? Once I have achieved a more minimal example, would you like it posted here or as a new dask issue? |
Here is a dask-only version of this problem. I am not sure to what extent it reproduces the pathologies of my real use case, but it does appear to have the basic problem of the data generation outpacing the store operation, resulting in a buildup of in-memory data.

import dask.array as dsa
import numpy as np

# create random test data--adjust size at will
ntime = 512
shape = (ntime, 50, 100, 200)
chunkshape = (1, 1) + shape[2:]
data = dsa.random.random(shape, chunks=chunkshape)

# simulate the effects of an xarray groupby operation
n_seasons = 4
sampling = 8
season_index = np.round((np.arange(ntime)/sampling) % n_seasons).astype('int')

# calculate a "climatology"
arrays = [data[season_index==n].mean(axis=0)[None] for n in range(n_seasons)]
result = dsa.concatenate(arrays, axis=0)

# store it in a fake store
class mock_target(object):
    def __init__(self):
        pass

    def __setitem__(self, *args, **kwargs):
        pass

target = mock_target()
result.store(target, lock=False)
|
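As a side check on the example above (not part of the original post), here is a minimal sketch of how the `season_index` formula distributes the 512 time steps. It uses Python's built-in `round()` in place of `np.round`; both round exact halves to the nearest even integer, so the grouping should match. The `floor_index` variant is a hypothetical alternative shown only for comparison.

```python
from collections import Counter

ntime = 512
n_seasons = 4
sampling = 8

# Same formula as the example above, using built-in round() instead of
# np.round (both round exact halves to the nearest even integer).
season_index = [round((t / sampling) % n_seasons) for t in range(ntime)]
counts = Counter(season_index)

# Hypothetical alternative (not what the example uses): a floor-based
# index, which yields n_seasons groups of equal size.
floor_index = [(t // sampling) % n_seasons for t in range(ntime)]
floor_counts = Counter(floor_index)

print(sorted(counts.items()))
print(sorted(floor_counts.items()))
```

With rounding, the groups come out uneven (80, 112, 144 and 112 steps for indices 0 to 3), and 64 steps receive index 4, which `range(n_seasons)` never selects. This probably does not change the memory behavior under discussion, but it is worth knowing when comparing task counts.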
Thanks @rabernat! Here are some initial results.

I reduced the dimensions to (512, 2, 100, 200) to reduce the number of tasks from 50k to around 2k. I then ran visualize on it in the following way:

out = result.store(target, lock=False, compute=False)  # get a lazy delayed value back
out.visualize('dask.pdf', color='order', node_attr={'penwidth': '6'})

This gives you an image that color-codes nodes by their priority. The color varied smoothly across the image, which shows that task prioritization was doing a good job. Being able to just look at the graph also gave me good confidence that this is a problem that should be able to run in low memory.

I then ran this with the distributed scheduler and it ran too fast to really see what was going on, so I increased the latter two dimensions. I hope that this stays true to the problem at hand. I watched the status page and saw reasonable-ish looking progress. I watched the graph page and again saw that things looked pretty good. The computation proceeded in a linear fashion from bottom to top when possible.

I happened to have some diagnostic code up that showed the difference between how much memory Dask thinks it should be using and how much it is actually using, and saw that there was a significant difference, something like 1.5GB expected vs 3-4GB real. This is consistent with dask/dask#3530, which is a deep issue facing the whole stack that we will need to think hard about in the future.

Now that I had looked at a smaller problem that I was able to inspect directly (a lot of the diagnostics fail pretty hard above a few thousand tasks), I switched back to the original array shape and ran the computation again. The tasks here are so small and numerous that the dashboard pretty much choked (the scheduler is probably running at 100% trying to keep up with execution). I increased the size to

Probably the next thing for me to do, if there is no response on the above, is cripple my local network to see if it causes run-away data generation (see also dask/distributed#1989).

@rabernat is this consistent with what you see? Any thoughts on the above? |
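To pick dimensions that keep the graph small enough to inspect, it can help to count chunks before building anything. The following is a rough sketch using only the standard library; `chunk_stats` is a hypothetical helper, and the 8-byte item size assumes float64 data as produced by `dsa.random.random`.

```python
import math


def chunk_stats(shape, chunkshape, itemsize=8):
    """Return (number of chunks, bytes per chunk, total bytes) for a chunked array.

    itemsize=8 assumes float64 elements.
    """
    n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunkshape))
    chunk_bytes = math.prod(chunkshape) * itemsize
    total_bytes = math.prod(shape) * itemsize
    return n_chunks, chunk_bytes, total_bytes


# Original example shape and the reduced shape discussed above.
full = chunk_stats((512, 50, 100, 200), (1, 1, 100, 200))
small = chunk_stats((512, 2, 100, 200), (1, 1, 100, 200))

print(full)
print(small)
```

The full problem has 25600 chunks of 160 kB each (about 4.1 GB in total), and the reduced one has 1024 chunks. That would be consistent with roughly two tasks per chunk given the 50k and 2k task counts quoted above, though the exact graph size depends on the dask version.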
I'll just chime in with a side note to say that it is very helpful to learn about one of your general procedures for exploring this sort of issue. Thank you @mrocklin! |
@mrocklin - Thanks a lot for this helpful analysis.

I tried adding the following step to the computation to represent the load-from-disk step of my original example:

from time import sleep

def slow_down(a, overhead=0.04):
    sleep(overhead)
    return a

data = data.map_blocks(slow_down)

This makes the computation easier to watch from the dashboard (without blowing up memory).

Basically, it doesn't look like this example reproduces the fundamental problem from the original whereby the loading of data outpaces the storing. 😐 This suggests that the problem is more subtle.

One interesting thing to note is that, for the original example on pangeo.pydata.org, things improve if I give it more workers. I re-tried that calculation with 160 workers instead of 80, and it actually ended up keeping much less data in memory (or spilled to disk) than the example with fewer workers. This must be telling us something, but I'm not sure what. |
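For readers unfamiliar with the failure mode being hunted here, the sketch below is a toy model of "loading outpaces storing". It is not a model of dask's scheduler; `peak_backlog` and its per-step rates are hypothetical names and numbers chosen only to show how an in-memory backlog grows when chunks are produced faster than they are written out.

```python
def peak_backlog(n_chunks, produce_per_step, store_per_step):
    """Toy model: return the peak number of chunks held in memory.

    Each step produces up to produce_per_step chunks, then stores up to
    store_per_step of the chunks produced so far. store_per_step must be > 0.
    """
    produced = stored = peak = 0
    while stored < n_chunks:
        produced = min(n_chunks, produced + produce_per_step)
        peak = max(peak, produced - stored)
        stored = min(produced, stored + store_per_step)
    return peak


# Storing keeps up with production: the backlog stays at one step's worth.
balanced = peak_backlog(1000, produce_per_step=10, store_per_step=10)

# Production runs twice as fast as storing: about half the data piles up.
lagging = peak_backlog(1000, produce_per_step=10, store_per_step=5)

print(balanced, lagging)
```

In this toy setting the balanced case peaks at 10 chunks, while the lagging case peaks at 505 of 1000. The real behavior reported in this thread is evidently more subtle, since adding `slow_down` did not reproduce it.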
It was still a helpful problem to go through. I suspect that if I'm able
to simulate a slower network then things will become worse. I'll try to
give this a shot some time in the next week (also, disclaimer, I'll be
mostly focused on other topics over the next couple weeks)
|
I guess it's important to emphasize that, despite what I perceived as an inefficiency, the calculation did work. It basically filled up all of the scratch space, but it didn't crash, and it eventually got through the computation and produced the desired result. |
I guess that's good to hear :)
Also, when creating minimal examples I'm quite happy to work with
xarray if that's easier. I wonder if it might be worth investing in a
generator for random datasets within xarray.
|
@rabernat can this be closed? |
I would say it can be closed once a version of dask with
dask/dask#3648 in it has been deployed to
pangeo.pydata.org. Until then, users will continue to run out of memory.
|
cc @martindurant let me know if you want to walk through doing this and
have someone nearby just in case
|
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This issue has been automatically closed because it had not seen recent activity. The issue can always be reopened at a later date. |
I am trying a new approach to this problem using xarray's new |
That's exciting! And it's mostly a result of the Pangeo Seattle meeting! We should add some version of this notebook to the xarray docs |
Closes #5734 Closes #4473 Closes #4498 Closes #659 Closes #2237 xref https://github.com/pangeo-data/pangeo/issues/271 Co-authored-by: Anderson Banihirwe <axbanihirwe@ualr.edu> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com> Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com> Co-authored-by: Stephan Hoyer <shoyer@google.com>
Date: Thu Nov 25 21:52:06 2021 -0700 Annotate some reduction tests. commit 638d98a703ab53ab1d1e0ead58a425b30c712e73 Merge: 3c51b1a3e 5db404659 Author: Deepak Cherian <deepak@cherian.net> Date: Thu Nov 25 21:42:56 2021 -0700 Merge remote-tracking branch 'upstream/main' into groupby-aggs-using-numpy-groupies * upstream/main: Fixed a mispelling of dimension in dataarray documentation for from_dict (#6020) [pre-commit.ci] pre-commit autoupdate (#6014) [pre-commit.ci] pre-commit autoupdate (#5990) Use set_options for asv bottleneck tests (#5986) Fix module name retrieval in `backend.plugins.remove_duplicates()`, plugin tests (#5959) commit 2a1b12faf658bcd0079a33fe2d8bac4bc910d8e7 Author: Deepak Cherian <dcherian@users.noreply.github.com> Date: Thu Nov 25 21:41:21 2021 -0700 Update xarray/util/generate_reductions.py Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com> commit 3c51b1a3ef4f6ab285886c478070334284905353 Author: Deepak Cherian <deepak@cherian.net> Date: Thu Nov 25 21:35:59 2021 -0700 Squash merge #5950 Squashed commit of the following: commit 6916fa7debfe4ca5c5ce9796fe5fe3243d6c4d2a Author: Deepak Cherian <dcherian@users.noreply.github.com> Date: Mon Nov 22 11:16:43 2021 -0700 Update xarray/util/generate_reductions.py Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com> commit cd8a898d5b003ea28ec8f3feacb56d76b6dc1096 Author: dcherian <deepak@cherian.net> Date: Sat Nov 20 14:37:17 2021 -0700 add doctests commit 19d82cddf0a4ac30811803de9f7e70a881d52ea0 Author: Illviljan <14371165+Illviljan@users.noreply.github.com> Date: Sat Nov 20 22:00:29 2021 +0100 more reduce commit 0f94bec2953aa3e7eadd0c4efc25cd6111b7e663 Author: Illviljan <14371165+Illviljan@users.noreply.github.com> Date: Sat Nov 20 20:48:27 2021 +0100 another reduce commit be33560a14ac4b9379e5d0ff4f340cfbd6d552f1 Author: Illviljan <14371165+Illviljan@users.noreply.github.com> Date: Sat Nov 20 20:28:39 2021 +0100 one more reduce commit 
3d854e52055de0f53f4ba16b0713ac581611ef94 Author: Illviljan <14371165+Illviljan@users.noreply.github.com> Date: Sat Nov 20 20:21:26 2021 +0100 more reduce edits commit 2bbddafaacf46f65c950738a9db3cbaddb198763 Author: Illviljan <14371165+Illviljan@users.noreply.github.com> Date: Sat Nov 20 20:12:31 2021 +0100 make reduce args consistent commit dfbe103c3425ba4d8aa91095ec5b7386fb785225 Merge: f03b67592 dd28a57f6 Author: Illviljan <14371165+Illviljan@users.noreply.github.com> Date: Sat Nov 20 19:01:59 2021 +0100 Merge branch 'generate-reductions-class' of https://github.com/dcherian/xarray into pr/5950 commit f03b67592cbff91172a50213cf0f9621062114cb Merge: 411d75d5c 7a201de64 Author: Illviljan <14371165+Illviljan@users.noreply.github.com> Date: Sat Nov 20 19:01:42 2021 +0100 Merge branch 'main' into pr/5950 commit dd28a57f66188db47a05fad184519295d688213d Author: dcherian <deepak@cherian.net> Date: Sat Nov 20 10:57:22 2021 -0700 updates commit 6a9a1240aa95d71ef2081f6e98152981f9db336d Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Sat Nov 20 17:02:07 2021 +0000 [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci commit 411d75d5ced18349968c060918e9bdcd4be04537 Author: Illviljan <14371165+Illviljan@users.noreply.github.com> Date: Sat Nov 20 18:00:08 2021 +0100 Now get normal code running as well Protocols are not needed anymore when subclassing/defining directly in the class. When adding a dummy method in DatasetResampleReductions the order of subclassing had to be changed so the correct reduce was used. 
commit 5dcb5bfebe04302beb732259f3e805bd31691ed8 Author: Illviljan <14371165+Illviljan@users.noreply.github.com> Date: Sat Nov 20 12:30:50 2021 +0100 Attempt fixing typing errors Mixing in DatasetReduce fixes: xarray/tests/test_groupby.py:460: error: Invalid self argument "Dataset" to attribute function "mean" with type "Callable[[DatasetReduce, Optional[Hashable], Optional[bool], Optional[bool], KwArg(Any)], T_Dataset]" [misc] Switching to "Dateset" as returned type fixes: xarray/tests/test_groupby.py:77: error: Need type annotation for "expected" [var-annotated] commit 7a201de643515aec9a0c88dc78253499528fa99f Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Fri Nov 19 11:37:20 2021 -0700 [pre-commit.ci] pre-commit autoupdate (#5990) commit 95394d5bcbd7d73bae34c091a080c42bcfc9f07d Author: Illviljan <14371165+Illviljan@users.noreply.github.com> Date: Mon Nov 15 21:40:37 2021 +0100 Use set_options for asv bottleneck tests (#5986) * Use set_options for bottleneck tests * Use set_options in rolling * Update rolling.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update rolling.py * Update rolling.py * set_options not needed. 
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> commit b2d7cd8837ea9b3e7e0eb0390479a1986f62d4b4 Author: Kai Mühlbauer <kmuehlbauer@users.noreply.github.com> Date: Mon Nov 15 18:33:43 2021 +0100 Fix module name retrieval in `backend.plugins.remove_duplicates()`, plugin tests (#5959) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> commit c7e9d9647f2c7df7e5b14644926dc2126ad4318f Author: dcherian <deepak@cherian.net> Date: Wed Nov 10 11:49:47 2021 -0700 Minor improvement commit dea8fd9f326a543807c26b6a62e84b28b5cb4cc3 Author: dcherian <deepak@cherian.net> Date: Mon Nov 8 16:18:07 2021 -0700 REfactor commit 9bb2c321e8df2f5978c40a1c3c8f891c77e847ff Author: dcherian <deepak@cherian.net> Date: Mon Nov 8 13:56:53 2021 -0700 Reorder docstring to match numpy commit 99bfe128066ec3ef1b297650a47e2dd0a45801a8 Author: dcherian <deepak@cherian.net> Date: Mon Nov 8 12:44:23 2021 -0700 Fixes #5898 commit 7f39cc0d8c664e3fcf354536ed3a95882064b4b6 Author: dcherian <deepak@cherian.net> Date: Mon Nov 8 12:39:00 2021 -0700 Minor docstring improvements. commit a04ed824a55b757937a4db4aa65729dccf62c1a7 Author: dcherian <deepak@cherian.net> Date: Mon Nov 8 12:35:48 2021 -0700 Small changes commit 816e7941e47b14280103d5d10da94139f394c0cd Author: dcherian <deepak@cherian.net> Date: Sun Nov 7 20:56:37 2021 -0700 Generate DataArray, Dataset reductions too. commit 569c67f28b3e7ff4c475793325a4388220932d02 Author: dcherian <deepak@cherian.net> Date: Sun Nov 7 20:54:42 2021 -0700 Add ddof for var, std commit 6b9a81a6fbe3ba460d905f0e92105d8e25af3ebb Author: dcherian <deepak@cherian.net> Date: Sun Nov 7 20:35:52 2021 -0700 Better generator for reductions. 
commit cfd2c071cf8df984e5bc4c673abc84e17882d323 Author: Deepak Cherian <deepak@cherian.net> Date: Thu Nov 25 21:02:44 2021 -0700 minimize conflicts commit af03ca45ed4c54b867bacf4359df565b3878c220 Author: Deepak Cherian <deepak@cherian.net> Date: Thu Nov 25 20:52:03 2021 -0700 Small improvement to resampling commit 6916fa7debfe4ca5c5ce9796fe5fe3243d6c4d2a Author: Deepak Cherian <dcherian@users.noreply.github.com> Date: Mon Nov 22 11:16:43 2021 -0700 Update xarray/util/generate_reductions.py Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com> commit 4f378a3fcbcda32d62c401f6f51065381b07074d Author: dcherian <deepak@cherian.net> Date: Mon Nov 22 11:09:04 2021 -0700 Bugfix DataArray resampling. commit cd8a898d5b003ea28ec8f3feacb56d76b6dc1096 Author: dcherian <deepak@cherian.net> Date: Sat Nov 20 14:37:17 2021 -0700 add doctests commit 19d82cddf0a4ac30811803de9f7e70a881d52ea0 Author: Illviljan <14371165+Illviljan@users.noreply.github.com> Date: Sat Nov 20 22:00:29 2021 +0100 more reduce commit 0f94bec2953aa3e7eadd0c4efc25cd6111b7e663 Author: Illviljan <14371165+Illviljan@users.noreply.github.com> Date: Sat Nov 20 20:48:27 2021 +0100 another reduce commit be33560a14ac4b9379e5d0ff4f340cfbd6d552f1 Author: Illviljan <14371165+Illviljan@users.noreply.github.com> Date: Sat Nov 20 20:28:39 2021 +0100 one more reduce commit 3d854e52055de0f53f4ba16b0713ac581611ef94 Author: Illviljan <14371165+Illviljan@users.noreply.github.com> Date: Sat Nov 20 20:21:26 2021 +0100 more reduce edits commit 2bbddafaacf46f65c950738a9db3cbaddb198763 Author: Illviljan <14371165+Illviljan@users.noreply.github.com> Date: Sat Nov 20 20:12:31 2021 +0100 make reduce args consistent commit dfbe103c3425ba4d8aa91095ec5b7386fb785225 Merge: f03b67592 dd28a57f6 Author: Illviljan <14371165+Illviljan@users.noreply.github.com> Date: Sat Nov 20 19:01:59 2021 +0100 Merge branch 'generate-reductions-class' of https://github.com/dcherian/xarray into pr/5950 commit 
f03b67592cbff91172a50213cf0f9621062114cb Merge: 411d75d5c 7a201de64 Author: Illviljan <14371165+Illviljan@users.noreply.github.com> Date: Sat Nov 20 19:01:42 2021 +0100 Merge branch 'main' into pr/5950 commit dd28a57f66188db47a05fad184519295d688213d Author: dcherian <deepak@cherian.net> Date: Sat Nov 20 10:57:22 2021 -0700 updates commit 6a9a1240aa95d71ef2081f6e98152981f9db336d Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Sat Nov 20 17:02:07 2021 +0000 [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci commit 411d75d5ced18349968c060918e9bdcd4be04537 Author: Illviljan <14371165+Illviljan@users.noreply.github.com> Date: Sat Nov 20 18:00:08 2021 +0100 Now get normal code running as well Protocols are not needed anymore when subclassing/defining directly in the class. When adding a dummy method in DatasetResampleReductions the order of subclassing had to be changed so the correct reduce was used. commit 5dcb5bfebe04302beb732259f3e805bd31691ed8 Author: Illviljan <14371165+Illviljan@users.noreply.github.com> Date: Sat Nov 20 12:30:50 2021 +0100 Attempt fixing typing errors Mixing in DatasetReduce fixes: xarray/tests/test_groupby.py:460: error: Invalid self argument "Dataset" to attribute function "mean" with type "Callable[[DatasetReduce, Optional[Hashable], Optional[bool], Optional[bool], KwArg(Any)], T_Dataset]" [misc] Switching to "Dateset" as returned type fixes: xarray/tests/test_groupby.py:77: error: Need type annotation for "expected" [var-annotated] commit 8f23310e68c1a08c318471cdcd9d9b5b61d236a4 Author: dcherian <deepak@cherian.net> Date: Fri Nov 19 14:50:01 2021 -0700 Revert "Force failure to make sure CI is working." This reverts commit 098467d3406c3de0cb7e2aaf38639f7ea74800ed. 
commit a282ad47362111da4efc637ccdaed01824a2e39c Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Thu Nov 18 14:11:30 2021 +0000 [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci commit 098467d3406c3de0cb7e2aaf38639f7ea74800ed Author: dcherian <deepak@cherian.net> Date: Thu Nov 18 07:09:17 2021 -0700 Force failure to make sure CI is working. commit bd24db4881646aba2c0466e93d6989c187aaae98 Author: dcherian <deepak@cherian.net> Date: Wed Nov 17 20:24:48 2021 -0700 Add to all-but-dask commit 033f5b590fb4df4acadff70515fd3c14b0f10935 Author: dcherian <deepak@cherian.net> Date: Wed Nov 17 20:24:10 2021 -0700 Add to print_versions commit cc8abfe28a426afe56d289307b7a9818d52c7d40 Author: dcherian <deepak@cherian.net> Date: Tue Nov 16 10:13:05 2021 -0700 [test-upstream] Rename to flox commit b269439cac3f58be18c02007b53d2cfbddd77cd8 Author: dcherian <deepak@cherian.net> Date: Mon Nov 15 17:54:27 2021 -0700 [test-upstream] Revert setting npg option in benchmarks commit ced9034d99e618c70ba397e3aee5f53a2e475fd0 Author: dcherian <deepak@cherian.net> Date: Mon Nov 15 15:29:38 2021 -0700 Update upstream-dev env commit 860f7be3cc8af1cfdc586dc6d466399ed56e4d65 Author: dcherian <deepak@cherian.net> Date: Mon Nov 15 15:22:02 2021 -0700 add extra test commit 7375dd4d6c6f8fafb771e99e0c9f27c79aad0d9b Author: dcherian <deepak@cherian.net> Date: Mon Nov 15 14:54:23 2021 -0700 fix test. 
commit 1f370f65f394256b16841d812557e8867ae43cbe Author: dcherian <deepak@cherian.net> Date: Mon Nov 15 14:30:03 2021 -0700 silence warning commit e038cc7a27449b693e19b2bac1003fb368466221 Author: dcherian <deepak@cherian.net> Date: Mon Nov 15 10:10:53 2021 -0700 Fix dimension order when binning a dimension coordinate commit 03b7b31da98842a216801f422151bd4ab130d9cf Author: dcherian <deepak@cherian.net> Date: Sun Nov 14 20:39:48 2021 -0700 one more bugfix commit 553735eb4dbe98d23f3a43db90f134a3cabd642a Merge: 43ade8cbd a883ed022 Author: dcherian <deepak@cherian.net> Date: Sun Nov 14 20:34:01 2021 -0700 Merge remote-tracking branch 'upstream/main' into groupby-aggs-using-numpy-groupies * upstream/main: Check for py version instead of try/except when importing entry_points (#5988) Add "see also" in to_dataframe docs (#5978) Alternate method using inline css to hide regular html output in an untrusted notebook (#5880) Fix mypy issue with entry_points (#5979) Remove pre-commit auto update (#5958) Do not change coordinate inplace when throwing error (#5957) Create CITATION.cff (#5956) commit 43ade8cbdbcd2bf9fc950a6fc5f5d2078b56e86f Author: dcherian <deepak@cherian.net> Date: Sun Nov 14 20:32:06 2021 -0700 "blockwise" need not be the best strategy for resample.. commit c189eea36b3bb993451c2cb6765be4cd07449b67 Author: dcherian <deepak@cherian.net> Date: Sun Nov 14 09:58:40 2021 -0700 Fix upsampling with resample (these have "empty groups") commit edbd376c792f54ade6a9447674b15a79f44c983b Author: dcherian <deepak@cherian.net> Date: Sun Nov 14 09:57:32 2021 -0700 Fix binning and weird issues with precision and pd.cut commit 415eb294af77b36f481f3c934baf617d1f55bb1e Author: dcherian <deepak@cherian.net> Date: Fri Nov 12 15:47:10 2021 -0700 Fix bug when binning by nD variable. 
commit e9af57c28638288532b470334547223221e1a397 Author: dcherian <deepak@cherian.net> Date: Thu Nov 11 13:36:07 2021 -0700 Try fixing mypy commit 9c2cbb8df424ff8d6e6c6faae5abc24c15a4e84b Author: dcherian <deepak@cherian.net> Date: Thu Nov 11 13:25:50 2021 -0700 Ppass through objects with only numpy or dask arrays commit 35908b590d04f7c66508d6dcea30e6ed8351ac64 Author: dcherian <deepak@cherian.net> Date: Wed Nov 10 14:33:44 2021 -0700 Revert "See if its an import error" This reverts commit be53f13b1dd832a4ba221458f64f6c277998e9da. commit be53f13b1dd832a4ba221458f64f6c277998e9da Author: dcherian <deepak@cherian.net> Date: Wed Nov 10 12:55:58 2021 -0700 See if its an import error commit 11c3d3398a12b2abcd6fd1de0af689a5d673730f Author: dcherian <deepak@cherian.net> Date: Wed Nov 10 12:37:24 2021 -0700 Fixed doctests in dask_groupby commit 77d2665458db1eec682c725e4dcdfef78454fc30 Author: dcherian <deepak@cherian.net> Date: Wed Nov 10 12:12:39 2021 -0700 Revert "Force test failure to check CI env" This reverts commit 31e1fd2419f54205616d59eb1ee176e7a61e07b2. commit c7e9d9647f2c7df7e5b14644926dc2126ad4318f Author: dcherian <deepak@cherian.net> Date: Wed Nov 10 11:49:47 2021 -0700 Minor improvement commit 47b593c481ecb93fe83dd6b9458d5683cc93119f Author: dcherian <deepak@cherian.net> Date: Wed Nov 10 09:10:39 2021 -0700 Use conda-forge numpy_groupies in CI commit 31e1fd2419f54205616d59eb1ee176e7a61e07b2 Author: dcherian <deepak@cherian.net> Date: Wed Nov 10 09:06:36 2021 -0700 Force test failure to check CI env commit 0559ee151420c31ac713829b74c3148fdfd46fa5 Author: dcherian <deepak@cherian.net> Date: Tue Nov 9 21:33:56 2021 -0700 Fix var, std doctests commit 41f0aa5aaf89b83bee6eb8f9f87f497cb4efd31e Author: dcherian <deepak@cherian.net> Date: Tue Nov 9 19:27:57 2021 -0700 fix test commit bece14e22794e5fadb9e3b3b34af920fcc2be77f Author: dcherian <deepak@cherian.net> Date: Tue Nov 9 17:41:13 2021 -0700 Fix median and add test. 
commit b9bc1dd0dd6e73793315457e9c9e23cf52428b4c Author: dcherian <deepak@cherian.net> Date: Tue Nov 9 17:01:27 2021 -0700 Update reductions commit 0ac5498c87492d595a6860d1bb2d1f4183b329e3 Author: dcherian <deepak@cherian.net> Date: Tue Nov 9 10:12:40 2021 -0700 Avoid stacking by default commit c9a82b3fe9d2dcc0ddee6015595be01a01d5ad00 Author: dcherian <deepak@cherian.net> Date: Tue Nov 9 10:09:29 2021 -0700 Revert "Start supporting ndim groups" This reverts commit 4ef53dbc9f84dc0a1a19f3b8cd17a3b4fd044296. commit 35af40ad26ca9e2193313d7794789ec96cd7f0ba Author: dcherian <deepak@cherian.net> Date: Mon Nov 8 21:58:02 2021 -0700 Revert "WIP refactor init" This reverts commit 6afb3bf82fbbc3a46a70d48eb74c0c9224d05bce. commit 6afb3bf82fbbc3a46a70d48eb74c0c9224d05bce Author: dcherian <deepak@cherian.net> Date: Mon Nov 8 21:57:55 2021 -0700 WIP refactor init commit 4ef53dbc9f84dc0a1a19f3b8cd17a3b4fd044296 Author: dcherian <deepak@cherian.net> Date: Mon Nov 8 21:57:21 2021 -0700 Start supporting ndim groups commit 583187a4118613f98d8a7082790724070a3df3ea Author: dcherian <deepak@cherian.net> Date: Mon Nov 8 21:08:53 2021 -0700 Fix benchmark to not groupby chunked variables. commit 3e08964dca9371ff93139f67bbc6a0170c48c28d Author: dcherian <deepak@cherian.net> Date: Mon Nov 8 16:54:52 2021 -0700 Add benchmarks commit 0c35c0c67dcd0852b08d4408c45e3790fc9b000e Author: dcherian <deepak@cherian.net> Date: Mon Nov 8 16:41:26 2021 -0700 Reimplemented commit 0661c1b667436ca5fde59cd987a1a2db94d9a75e Author: dcherian <deepak@cherian.net> Date: Mon Nov 8 16:40:20 2021 -0700 Refactored generator commit f06e6a7d5ec6575df9ac6772c6092dfa81980ae8 Author: dcherian <deepak@cherian.net> Date: Mon Nov 8 16:24:11 2021 -0700 Revert "Separate out median" This reverts commit 932b9a5d668278019bbd75ea23582ff28a463b91. 
commit dea8fd9f326a543807c26b6a62e84b28b5cb4cc3 Author: dcherian <deepak@cherian.net> Date: Mon Nov 8 16:18:07 2021 -0700 REfactor commit 9bb2c321e8df2f5978c40a1c3c8f891c77e847ff Author: dcherian <deepak@cherian.net> Date: Mon Nov 8 13:56:53 2021 -0700 Reorder docstring to match numpy commit 08911b9a155ca7a2f95b466fba1f97538d45f1a6 Merge: a2168df2d 20fddb7e3 Author: dcherian <deepak@cherian.net> Date: Mon Nov 8 13:58:11 2021 -0700 Merge remote-tracking branch 'upstream/main' into groupby-aggs-using-numpy-groupies * upstream/main: Add groupby & resample benchmarks (#5922) Fix plot.line crash for data of shape (1, N) in _title_for_slice on format_item (#5948) Disable unit test comments (#5946) Publish test results from workflow_run only (#5947) commit 99bfe128066ec3ef1b297650a47e2dd0a45801a8 Author: dcherian <deepak@cherian.net> Date: Mon Nov 8 12:44:23 2021 -0700 Fixes #5898 commit 7f39cc0d8c664e3fcf354536ed3a95882064b4b6 Author: dcherian <deepak@cherian.net> Date: Mon Nov 8 12:39:00 2021 -0700 Minor docstring improvements. commit a04ed824a55b757937a4db4aa65729dccf62c1a7 Author: dcherian <deepak@cherian.net> Date: Mon Nov 8 12:35:48 2021 -0700 Small changes commit 816e7941e47b14280103d5d10da94139f394c0cd Author: dcherian <deepak@cherian.net> Date: Sun Nov 7 20:56:37 2021 -0700 Generate DataArray, Dataset reductions too. commit 569c67f28b3e7ff4c475793325a4388220932d02 Author: dcherian <deepak@cherian.net> Date: Sun Nov 7 20:54:42 2021 -0700 Add ddof for var, std commit 6b9a81a6fbe3ba460d905f0e92105d8e25af3ebb Author: dcherian <deepak@cherian.net> Date: Sun Nov 7 20:35:52 2021 -0700 Better generator for reductions. 
commit a2168df2d595c4220a989fb99569196e0951d92f Author: dcherian <deepak@cherian.net> Date: Fri Nov 5 22:09:22 2021 -0600 typo again commit d238459ae2bbe2a38f855e12ca4752c2fd28051c Author: dcherian <deepak@cherian.net> Date: Fri Nov 5 21:50:30 2021 -0600 any,all commit ac85e72607594daf014e6afa50966289d5c1cc7f Author: dcherian <deepak@cherian.net> Date: Fri Nov 5 21:31:22 2021 -0600 make dask_groupby actually optional commit 932b9a5d668278019bbd75ea23582ff28a463b91 Author: dcherian <deepak@cherian.net> Date: Fri Nov 5 21:27:23 2021 -0600 Separate out median commit ad25f78be0250de7cc574c7941b2149015dde26c Author: dcherian <deepak@cherian.net> Date: Fri Nov 5 19:52:36 2021 -0600 Add to asv env commit 3608e9fa135ac93231cbf4f8bff007887a10e19d Author: dcherian <deepak@cherian.net> Date: Fri Nov 5 15:26:28 2021 -0600 get working again commit faee02c7cb95dcc97a71e2fb0b2627c0ca49f2ed Author: dcherian <deepak@cherian.net> Date: Fri Nov 5 12:37:36 2021 -0600 fix env stuff + remove env var commit 77f0e0e0765bcf335222ea0aad58b3700a76ea29 Merge: 0f2c59f7c 1ecf91a4d Author: dcherian <deepak@cherian.net> Date: Fri Nov 5 12:33:04 2021 -0600 Merge remote-tracking branch 'upstream/main' into groupby-aggs-using-numpy-groupies * upstream/main: (27 commits) Generator for groupby reductions (#5871) whats-new dev whats-new for 0.20.1 (#5943) Docs: fix URL for PTSA (#5935) Fix a missing @requires_zarr in tests (#5936) fix the detection of backend entrypoints (#5931) Explicitly list all reductions in api.rst (#5903) DOC: add names of missing contributors to 0.20.0 (#5932) new whats-new.rst section Update open_rasterio deprecation version number (#5916) v0.20 Release notes (#5924) [skip-ci] v0.20.0: whats-new for release (#5905) Update minimum dependencies for 0.20 (#5917) Bump actions/github-script from 4.1 to 5 (#5826) remove requirement for setuptools.pkg_resources (#5845) Update docstring for apply_ufunc, set_options (#5904) Display coords' units for slice plots (#5847) Combine by coords 
dataarray bugfix (#5834) Add .chunksizes property (#5900) Add typing_extensions as a required dependency (#5911) ... commit 0f2c59f7c31f8d2cd06d64ec3f4d69d87d2eeb85 Merge: 4b25db574 df7646182 Author: Anderson Banihirwe <axbanihirwe@ualr.edu> Date: Tue Oct 26 17:01:49 2021 -0600 Merge branch 'main' of github.com:pydata/xarray into groupby-aggs-using-numpy-groupies commit 4b25db574d9f13059d3b0d6319ae18f9e534b6d6 Merge: e3b3a00c6 262a3f5dd Author: dcherian <deepak@cherian.net> Date: Tue Oct 5 14:49:30 2021 +0530 Merge branch 'groupby-aggs-using-numpy-groupies' of github.com:andersy005/xarray into groupby-aggs-using-numpy-groupies * 'groupby-aggs-using-numpy-groupies' of github.com:andersy005/xarray: Update ci/requirements/environment-windows.yml commit e3b3a00c6a1e1e8cc8e20c1f2a1051efaafe9297 Author: dcherian <deepak@cherian.net> Date: Tue Oct 5 14:48:25 2021 +0530 Fix resampling commit 262a3f5dd3a22f82a719d61ac7a95758fa1d2e9b Author: Deepak Cherian <dcherian@users.noreply.github.com> Date: Mon Oct 4 11:10:44 2021 +0530 Update ci/requirements/environment-windows.yml commit 9b44db9e6d3db00ab8301fe426768925e8a2ff2f Author: dcherian <deepak@cherian.net> Date: Mon Oct 4 10:45:39 2021 +0530 Fix keep_attrs test commit b97ffcb37b416e93c5481d6ce363f74e61e88dbc Author: dcherian <deepak@cherian.net> Date: Mon Oct 4 10:44:04 2021 +0530 Fix windows env commit f4748ee80fc993f9fd9a8b4dd76d242c778b64ae Author: dcherian <deepak@cherian.net> Date: Mon Oct 4 10:28:08 2021 +0530 typo commit 1d9a36053181aeedbc90c3eec5a3457327630771 Author: dcherian <deepak@cherian.net> Date: Mon Oct 4 10:21:33 2021 +0530 Add CI for now commit 462e61b79f6ecfef24275622e84378673f9e9950 Author: dcherian <deepak@cherian.net> Date: Mon Oct 4 10:16:15 2021 +0530 Don't pass numeric_only to DataArray.reduce Tests pass! 
commit b1e3ab25e85ad3df302209c22639e6af11b1d443 Author: dcherian <deepak@cherian.net> Date: Sun Oct 3 17:37:07 2021 +0530 Raise error when reducing along indexed dimensions with squeeze=True commit 69fd5638c45fbfe374aba6494586bd8b48aee652 Author: dcherian <deepak@cherian.net> Date: Sun Oct 3 17:30:10 2021 +0530 Avoid forwarding DummyGroup objects commit 58c1c6b7f6158af51553249389f6da1651c97b08 Author: dcherian <deepak@cherian.net> Date: Sun Oct 3 16:06:57 2021 +0530 Add _dask_groupby_kwargs commit af4cc5d44ff95e0a4aace6e867de26b1f94e63fd Author: dcherian <deepak@cherian.net> Date: Sun Oct 3 16:05:31 2021 +0530 Fix reduce methods commit cdf7612b6deac9f4b1b793d2f0014507c6e289f9 Author: dcherian <deepak@cherian.net> Date: Sun Oct 3 12:50:27 2021 +0530 Fix resample test commit 489b2ffbc10eb62e94beee4b6fb3e2e55872d4b4 Merge: 4702c9d85 54dba584d Author: Anderson Banihirwe <axbanihirwe@ualr.edu> Date: Sun Sep 26 15:47:30 2021 -0600 Merge branch 'main' into groupby-aggs-using-numpy-groupies commit 4702c9d85ccfd835c1e59073a9588b8a7277f8a5 Merge: e6bcce903 9be0228af Author: dcherian <deepak@cherian.net> Date: Tue Aug 24 12:37:14 2021 -0600 Merge branch 'groupby-aggs-using-numpy-groupies' of github.com:andersy005/xarray into groupby_npg * 'groupby-aggs-using-numpy-groupies' of github.com:andersy005/xarray: Bump actions/github-script from 4.0.2 to 4.1 (#5730) Set coord name concat when `concat`ing along a DataArray (#5611) Add .git-blame-ignore-revs (#5708) Type annotate tests (#5728) Consolidate TypeVars in a single place (#5569) add storage_options arg to to_zarr (#5615) dataset `__repr__` updates (#5580) Xfail failing test on main (#5729) Add xarray-dataclasses to ecosystem in docs (#5725) extend show_versions (#5724) Move docstring for xr.set_options to numpy style (#5702) Refactor more groupby and resample tests (#5707) Remove suggestion to install pytest-xdist in docs (#5713) Add typing to the OPTIONS dict (#5678) Change annotations to allow str keys (#5690) Whatsnew for 
float-to-top (#5714) Use isort's float-to-top (#5695) Fix errors in test_latex_name_isnt_split for min environments (#5710) Improves rendering of complex LaTeX expressions as `long_name`s when plotting (#5682) Use same bool validator as other inputs (#5703) commit e6bcce9033db6da748f4b89057ef165cd6fcb6f7 Author: dcherian <deepak@cherian.net> Date: Tue Aug 24 12:35:51 2021 -0600 some fixes commit 9be0228afe1e625c707215a6a5d29c993ebdd90b Merge: 35944e434 a6b44d720 Author: Anderson Banihirwe <axbanihirwe@ualr.edu> Date: Tue Aug 24 12:06:03 2021 -0600 Merge branch 'pydata:main' into groupby-aggs-using-numpy-groupies commit 35944e434d30fdc9d4094248d0898ae8a249726c Author: Anderson Banihirwe <axbanihirwe@ualr.edu> Date: Fri Aug 13 15:23:09 2021 -0600 Remove `_numpy_groupies.py` module commit f0883921adf57d5c21ddb17ff9001ba8dba99514 Author: Anderson Banihirwe <axbanihirwe@ualr.edu> Date: Thu Aug 12 18:49:48 2021 -0600 Fix position keyword arguments commit 3ee620018e7e6a742a4c6d107911196b22282332 Author: Anderson Banihirwe <axbanihirwe@ualr.edu> Date: Thu Aug 12 18:47:32 2021 -0600 Remove comments commit 511dd448de6877f8af0d1356bf16bb20d481c209 Author: Anderson Banihirwe <axbanihirwe@ualr.edu> Date: Thu Aug 12 18:43:13 2021 -0600 Add more aggregations commit ef91e6e24eb13a2974268be2c7e0479dbd219e5e Author: Anderson Banihirwe <axbanihirwe@ualr.edu> Date: Thu Aug 12 18:24:03 2021 -0600 Add _numpy_groupies module commit c486df7720edf547e336112a376a7d444f2f8778 Author: Anderson Banihirwe <axbanihirwe@ualr.edu> Date: Thu Aug 12 16:09:54 2021 -0600 Move `_reduce_method` classmethod to `groupby.py` module
Closes pydata#5734 Closes pydata#4473 Closes pydata#4498 Closes pydata#659 Closes pydata#2237 xref pangeo-data/pangeo#271 Co-authored-by: Anderson Banihirwe <axbanihirwe@ualr.edu> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com> Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com> Co-authored-by: Stephan Hoyer <shoyer@google.com>
An extremely common task in climate science is to calculate a climatology (seasonal average) of some statistic from a long spatio-temporal dataset. This is exactly what I am trying to do in my pangeo use case. To make this concrete, here is an example reproducible on pangeo.pydata.org:
the dataset looks like this
Its size is close to 12 TB uncompressed.
Calculating the climatology is trivial
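As a minimal sketch on a small synthetic stand-in for the real dataset, the calculation is a group-by over calendar month followed by a mean over time. The variable name `salt` and the `st_ocean` dimension come from this issue; the horizontal dimension names `y` and `x`, the array sizes, the dates, and the random values are assumptions made only for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in (assumption): 3 years of monthly fields on a tiny grid.
time = pd.date_range("2000-01-01", periods=36, freq="MS")
salt = xr.DataArray(
    np.random.default_rng(0).random((36, 2, 4, 5)),
    dims=("time", "st_ocean", "y", "x"),  # "y" and "x" are hypothetical names
    coords={"time": time},
    name="salt",
)
ds = salt.to_dataset()

# Monthly climatology: group by calendar month, then average over time.
clim = ds.groupby("time.month").mean(dim="time")
print(clim)
```

The result replaces the `time` dimension with a `month` dimension of length 12. On a dask-backed dataset the same line only builds a lazy task graph, and nothing is read until the result is computed or written.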
giving
Now I want to either persist this or, even better, save it as a new dataset
In my head, this should be a pretty "streamable" operation: load and aggregate all values of variable `salt` for `month==1` (January), `st_ocean==0` (the vertical level); store; and move on to the next variable / month / level.
However, these computations do not run very well on the current stack. The rate of reading data outpaces the rate of writing data, leading to huge memory consumption. This is evident in the dashboard screenshot below:
Here I have a cluster of 80 high-memory workers: 22GB RAM each, a total of 1.76 TB. Yet the cluster has nearly 7TB in memory. The workers are spilling lots of data to disk (I didn't even realize that the workers had significant hard drive space). This seems inefficient, although perhaps there is some logic to it that I don't grasp.
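For comparison, here is a rough sketch of the "streamable" pattern described above, written in plain Python rather than dask. It keeps only a running sum and count per (month, level) group, so memory stays bounded no matter how many records pass through. The function name `streaming_climatology` and the toy records are hypothetical, and this is not a description of how dask actually schedules the computation.

```python
from collections import defaultdict


def streaming_climatology(records):
    """Accumulate per-(month, level) means one record at a time.

    `records` is any iterable of (month, level, value) tuples. Only a running
    sum and a count are kept per group, so memory does not grow with the
    number of records consumed.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for month, level, value in records:
        key = (month, level)
        sums[key] += value
        counts[key] += 1
    # Finalize: divide each running sum by its count.
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical toy input: two Januaries and one February at level 0.
records = [(1, 0, 34.0), (2, 0, 35.0), (1, 0, 36.0)]
clim = streaming_climatology(records)
print(clim[(1, 0)])
```

In the real problem each `value` would be a 2D horizontal slice rather than a scalar, but the bookkeeping is the same: one accumulator per output group, updated as input chunks arrive and written out once the group is complete.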
This has been discussed in numerous previous issues.
In many ways, this is a duplicate of those issues. However, those issues are also muddled up with problems with worker / scheduler config settings that we have mostly overcome on pangeo.pydata.org. Here workers are not dying--they are just operating in what appears to be a sub-optimal way.
Perhaps now is the time to tackle how this operation is scheduled at the dask level? Or perhaps there is not actually a problem at all? The calculation is slowly ticking forward in an evidently stable way.