Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-dimensional linear trend #1

Open
serazing opened this issue Feb 24, 2017 · 8 comments
Open

Multi-dimensional linear trend #1

serazing opened this issue Feb 24, 2017 · 8 comments

Comments

@serazing
Copy link
Owner

I am looking for a multi-dimensional linear fit method based on dask in order to efficiently remove linear trends before computing fft in xscale.spectral.fft.ftt.

@kuchaale
Copy link

kuchaale commented Mar 2, 2017

@serazing Would this be suitable for your case?

@serazing
Copy link
Owner Author

serazing commented Mar 2, 2017

@kuchaale Thanks but your solution only works because time is assigned to the first dimension. If you try to apply this function to the x dimension, it fails. One solution could be to transpose the array before computing the linear trend.

Even if one manages to do that, da.linalg.lststq cannot deal with more than two dimensions. It will fail for three dimensional arrays. Again one solution would be to stack the rightmost dimensions in one big dimension, then do the fit and finally unstack the array.

What I want to do is to make a robust and flexible function that fit a linear trend and optionally compute some confidence intervals relative to this trend. I think there is some serious work to do to achieve that since it is not implemented in numpy functions for ndarrays.

@kuchaale
Copy link

kuchaale commented Mar 3, 2017

@serazing I did not know that da.linalg.lststq cannot deal with more than two dimensions. I ussumed that it could since scipy.signal.detrend can deal with multi-dimensinal array and uses scipy.linalg.lstsq function which I ussumed as an analogue to da.linalg.lststq.

In my X-regression project I used statsmodels project to get more sophisticated regression functions, e.g. confidence intervals or p-values. I achieved desirable results using xarray functions such as stack, groupby and apply. Woud this approach be suitable for you?

@serazing
Copy link
Owner Author

I recently added something similar using da.vstack and da.linalg.lststq for sinusoidal and polynomial fitting in xscale.signal.fitting. Feel free to check the code.

@ShashankBice
Copy link

Hi @serazing , Great library!
I am running into trouble with the linear trend function along time axis and the code is giving an error message "'NoneType' object is not subscriptable"

My xrarray(elevation_xarray):-
elevation_xarray

<xarray.DataArray 'elevation' (time: 29, lat: 556, lon: 743)>
array([[[ nan, nan, ..., nan, nan],
[ nan, nan, ..., nan, nan],
...,
[ nan, nan, ..., nan, nan],
[ nan, nan, ..., nan, nan]],

   [[      nan,       nan, ...,       nan,       nan],
    [      nan,       nan, ...,       nan,       nan],
    ...,
    [4482.1846, 4487.041 , ...,       nan,       nan],
    [      nan, 4479.762 , ...,       nan,       nan]],

   ...,

   [[      nan,       nan, ...,       nan,       nan],
    [      nan,       nan, ...,       nan,       nan],
    ...,
    [      nan,       nan, ...,       nan,       nan],
    [      nan,       nan, ...,       nan,       nan]],

   [[      nan,       nan, ...,       nan,       nan],
    [      nan,       nan, ...,       nan,       nan],
    ...,
    [      nan,       nan, ..., 6229.428 ,       nan],
    [      nan,       nan, ..., 6229.306 ,       nan]]], dtype=float32)

Coordinates:

  • lat (lat) float64 3.104e+06 3.103e+06 3.103e+06 3.103e+06 3.103e+06 ...
  • lon (lon) float64 4.732e+05 4.733e+05 4.733e+05 4.733e+05 4.734e+05 ...
  • time (time) float64 2.015e+03 2.015e+03 2.013e+03 2.003e+03 ...

and the function I call is:- elevation_trend=xfit.linreg(elevation_xarray,dim='time')
Corresponding error message:-
TypeError Traceback (most recent call last)
in ()
----> 1 elevation_trend=xfit.linreg(elevation_xarray,dim='time')

~/glacierhack_2018/xscale/xscale/signal/fitting.py in linreg(array, dim, coord)
108 A Dataset containing the slope and the offset of the linear regression
109 """
--> 110 linfit = polyfit(array, dim=dim, coord=coord, deg=1)
111 offset = linfit.sel(degree=0, drop=True)
112 slope = linfit.sel(degree=1, drop=True)

~/glacierhack_2018/xscale/xscale/signal/fitting.py in polyfit(array, deg, dim, coord)
42 # + stack the other dimensions
43 array_stacked = _order_and_stack(array, dim)
---> 44 dim_chunk = array.chunks[array.get_axis_num(dim)][0]
45
46 if coord is None:

TypeError: 'NoneType' object is not subscriptable

@serazing
Copy link
Owner Author

Thanks @ShashankBice for this feedback.
Hum, I have got the impression thatdrop=True at lines 111 and 112 is the source of the error because you have a DataArray containing nans. I am not really sure why I decided to use the option drop=True, feel free to modify the code and test it if you have some time to do it.
Otherwise, I will try to do ti myself when I get more time.

Cheers

@ShashankBice
Copy link

Hi @serazing ,
I played with the module a bit, I do not get this error if I pass a dask array instead of an xarray.
This works even if I have nans with data, only thing is that in case of nans, I get an output full of nans while in case of all valid data, I get a somewhat valid result.

I was just wondering if there is an intelligent way to ignore nans by the fitter function, like suppose if I have 15 values along a dimension and one of them has nan, can I use the other 14 to compute linear regression and ignore the one data with nan ?

Thanks!

@serazing
Copy link
Owner Author

serazing commented Oct 2, 2018

Hi @ShashankBice,
Sorry for the late response. A fitting function that would deal with nan values would be really nice indeed. However, the current function is based on the least square method da.linalg.lstsq (have a look also inside the function xscale.fitting.polyval), which cannot deal with missing values or NaNs (to be checked).
To fix the problem with array data containing numpy array instead of task array, we should force the creation of dask array using the method xarray.DataArray.chunk(chunks=None).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants