Undesired chunking of 3d input when using linear detrend with polyfit()
#116
This is a good hunch to pursue. My assumption would be that `polyfit` would not alter the chunk structure at all, particularly if you are applying `polyfit` over a contiguous (non-chunked) dimension. But that appears not to be the case. I tried to make a very simple reproducible example, and I actually got the opposite behavior: chunks get consolidated rather than decimated! (This is still a problem.)

```python
import dask.array as dsa
import xarray as xr

nz, ny, nx = (10, 20, 30)
data = dsa.ones((nz, ny, nx), chunks=(1, 5, nx))
da = xr.DataArray(data, dims=['z', 'y', 'x'])
da.chunks
# -> ((1, 1, 1, 1, 1, 1, 1, 1, 1, 1), (5, 5, 5, 5), (30,))

pf = da.polyfit('x', 1)
pf.polyfit_coefficients.chunks
# -> ((1, 1, 1, 1, 1, 1, 1, 1, 1, 1), (20,), (30,))
# chunks on the y dimension have been consolidated!

pv = xr.polyval(da.x, pf.polyfit_coefficients).transpose('z', 'y', 'x')
pv.chunks
# -> ((1, 1, 1, 1, 1, 1, 1, 1, 1, 1), (20,), (30,))
# and this propagates to polyval

# align back against the original data
(da - pv).chunks
# -> ((1, 1, 1, 1, 1, 1, 1, 1, 1, 1), (5, 5, 5, 5), (30,))
# hides the fact that we have chunk consolidation happening upstream
```

Repeating this with your array sizes gives results similar to yours:

```python
nz, ny, nx = (6, 10, 4)
data = dsa.ones((nz, ny, nx), chunks=(2, 10, 2))
da = xr.DataArray(data, dims=['z', 'y', 'x'])
da.chunks
# -> ((2, 2, 2), (10,), (2, 2))

pf = da.polyfit('y', 1)
pf.polyfit_coefficients.chunks
# -> ((2,), (1, 1, 1, 1, 1, 1), (4,))

pv = xr.polyval(da.y, pf.polyfit_coefficients).transpose('z', 'y', 'x')
pv.chunks
# -> ((1, 1, 1, 1, 1, 1), (10,), (4,))

(da - pv).chunks
# -> ((1, 1, 1, 1, 1, 1), (10,), (2, 2))
```

I think this is an xarray problem; we should probably open an issue in xarray: https://github.com/pydata/xarray/issues. However, I'm not optimistic it can be fixed soon, since xarray itself relies on dask's implementation of polyfit. So yes, in the meantime, we will need a workaround within xrft. Unfortunately this means revisiting #112. 😭 I think we can simplify things by just acknowledging that we will never detrend over a chunked dimension and just using […].

I'm sorry that you're still bogged down in this technical detail. On the other hand, you are getting a valuable crash course in chunking and the pitfalls around it.
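A workaround along these lines can be illustrated with plain dask arrays: whatever consolidation (or splitting) an operation introduces, the result can be forced back onto the input's chunk structure with `rechunk`. This is a minimal sketch with no xarray involved; the `result` operation below is just a stand-in for what `polyfit` does in the example above.

```python
import dask.array as dsa

# Same toy shape as above: 4 chunks of length 5 along the y axis
data = dsa.ones((10, 20, 30), chunks=(1, 5, 30))

# Stand-in for an operation that consolidates chunks along axis 1,
# as polyfit does in the example above
result = (data * 2).rechunk({1: 20})
assert result.chunks[1] == (20,)

# Workaround: force the result back onto the original chunk structure
restored = result.rechunk(data.chunks)
assert restored.chunks == data.chunks
```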
This plan sounds just fine to me, though I might need some help implementing `map_blocks`. The other option at the moment (i.e. without needing to update the xrft code) is that I can just run […].

Happy to open an issue about this in xarray, once I get back from lunch!

At least for now, I don't mind the technical details! I'm learning a lot along the way! 😄
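For the record, the `map_blocks` route mentioned here can be sketched at the dask level: since the function is applied block by block, the chunk structure is preserved by construction. This is only an illustrative sketch (not xrft's actual implementation), and it assumes the detrend dimension is contiguous (unchunked) within each block; `_detrend_block` is a hypothetical helper.

```python
import numpy as np
import dask.array as dsa

def _detrend_block(block, axis=-1):
    # Remove a linear fit along `axis` of a single in-memory block
    x = np.arange(block.shape[axis])
    # Move the detrend axis last and flatten the rest for a batched fit
    b = np.moveaxis(block, axis, -1)
    flat = b.reshape(-1, b.shape[-1])
    coeffs = np.polynomial.polynomial.polyfit(x, flat.T, 1)  # shape (2, K)
    trend = np.polynomial.polynomial.polyval(x, coeffs)      # shape (K, n)
    out = (flat - trend).reshape(b.shape)
    return np.moveaxis(out, -1, axis)

data = dsa.ones((10, 20, 30), chunks=(1, 5, 30))  # last axis unchunked
detrended = data.map_blocks(_detrend_block, axis=-1, dtype=data.dtype)
assert detrended.chunks == data.chunks  # chunk structure preserved
```

Detrending a constant array removes everything, so `detrended` computes to (numerically) zero, while the chunks match the input exactly.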
Just a note on this: we never want to detrend over a chunked dimension anyway. I.e., I'm suggesting simply replacing lines 143 to 144 in 3662a85 with

```python
linear_fit = xr.polyval(da[dim[0]], p.polyfit_coefficients)
linear_fit = linear_fit.chunk(da.chunks)  # see discussion in https://github.com/xgcm/xrft/issues/116
return da - linear_fit
```

or

```python
linear_fit = xr.polyval(da[dim[0]], p.polyfit_coefficients)
return (da - linear_fit).chunk(da.chunks)  # see discussion in https://github.com/xgcm/xrft/issues/116
```
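As a quick sanity check of this suggestion on the toy array from the earlier comment (a sketch, not xrft's code; an explicit `x` coordinate is added here so that `da.x` resolves, and a dim-to-chunks dict is used for explicitness):

```python
import numpy as np
import dask.array as dsa
import xarray as xr

nz, ny, nx = (10, 20, 30)
data = dsa.ones((nz, ny, nx), chunks=(1, 5, nx))
da = xr.DataArray(data, dims=['z', 'y', 'x'], coords={'x': np.arange(nx)})

p = da.polyfit('x', 1)
linear_fit = xr.polyval(da.x, p.polyfit_coefficients)

# Rechunk the detrended result back to the input's chunk structure,
# as suggested above
detrended = (da - linear_fit).transpose(*da.dims)
detrended = detrended.chunk(dict(zip(da.dims, da.chunks)))
assert detrended.chunks == da.chunks
```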
@navidcy Yes! I agree! Your suggested fix is exactly what I am currently trying to test (to see if it runs without error on the global dataset), but my dask computation seems to have stalled. Not sure yet if that's because of the polyfit rechunking or an error on my end (likely the latter). But yes, thank you for clarifying this point!
I opened a PR. It's marked as WIP since the discussion here might conclude that the solution I suggested is not ideal. Feel free to close the PR in that case ;)
With @navidcy's suggested fix, I am still unable to run xrft linear detrending using […]. This is likely only a problem for very large datasets, since it works on a subset of the global CESM data.
Is this closed via #118?

Yes.
I am running the most up-to-date version of xrft, which uses xarray's `polyfit()` for linear detrending. When I run `xrft.dft()` on one dimension with linear detrending on a 3d input, I am getting an output that has different chunking than the original input. Specifically, the first axis that is not the fft dimension appears to be output with smaller chunks. For example, if I construct a simple 3d input:

![Screen Shot 2020-10-30 at 11 38 11](https://user-images.githubusercontent.com/26591824/97646703-708a4280-1aa4-11eb-8419-5c02ee34a794.png)

Then the output is as follows. Note the different chunking along the 'lon' axis.

![Screen Shot 2020-10-30 at 11 39 39](https://user-images.githubusercontent.com/26591824/97646766-a0d1e100-1aa4-11eb-8d3a-a5da4eca1765.png)

In the above image, I have printed out the original input array chunks, the chunks that result from applying `polyfit()`, and the resulting chunks after all detrending. It thus appears that `polyfit()` is rechunking the 'lon' dimension from chunks of length 2 to chunks of length 1, and that this persists after the detrending function.

Any idea if this is normal behavior? I wonder if it might be a bug in xarray's `polyfit()` implementation, but perhaps I have misinterpreted how it is supposed to work. I'm currently testing whether rechunking back to the original chunk sizes after detrending would be a workaround for xrft's `polyfit()` implementation.

This rechunking doesn't cause problems if the data is not very large, but when I try to run this on, e.g., the global CESM ocean data, dask can't handle it (presumably due to very small chunk sizes/large number of tasks).
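The scaling problem can be made concrete by counting dask blocks: splitting chunks along one axis multiplies the number of blocks, and every downstream elementwise operation contributes at least one task per block. A small dask-only illustration (the shape below is a made-up stand-in for a CESM-sized grid, not the actual dataset):

```python
import dask.array as dsa

# Hypothetical CESM-like shape, chunked in length-2 pieces along the last axis
coarse = dsa.ones((60, 384, 320), chunks=(60, 384, 2))

# The kind of chunk splitting observed after polyfit: length 2 -> length 1
fine = coarse.rechunk({2: 1})

# The block count doubles, so the task graph roughly doubles for every
# subsequent operation on the array
assert fine.npartitions == 2 * coarse.npartitions
```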
Discussed this issue with @navidcy