Is there a reason for why DataArray.swap_dims() cannot be done in place like Dataset.swap_dims? #1755

Closed
leeviannala opened this issue Dec 1, 2017 · 8 comments

Comments

@leeviannala
Contributor

leeviannala commented Dec 1, 2017

Problem description

This is a problem if I want to call swap_dims inside a DataArray accessor.

Code Sample

This is what I'm forced to do:

import xarray as xr
import numpy as np
@xr.register_dataarray_accessor('testing')
class TestAccessor(object):
    def __init__(self, xarray_obj):
        self._obj = xarray_obj
    def the_problem(self):
        # swap_dims returns a new DataArray, so this only rebinds the accessor's own reference
        self._obj = self._obj.swap_dims({'x':'x2'})
        print(self._obj)
        
arr = np.random.rand(4,3,2)
cube = xr.DataArray(arr, dims=['ya', 'x', 'y'], coords={'y':[1,3], 'ya':[1,2,3,6], 'x':[1,2,5]})
cube.coords['x2'] = ('x', [1,2,3])
cube.testing.the_problem()
print(cube)

this prints:

<xarray.DataArray (ya: 4, x2: 3, y: 2)>
array([[[ 0.659583,  0.167555],
        [ 0.357974,  0.46081 ],
        [ 0.85115 ,  0.845257]],

       [[ 0.280308,  0.777399],
        [ 0.512527,  0.542036],
        [ 0.838603,  0.799414]],

       [[ 0.572031,  0.350464],
        [ 0.205219,  0.812232],
        [ 0.687778,  0.984928]],

       [[ 0.803385,  0.63981 ],
        [ 0.089909,  0.499857],
        [ 0.25266 ,  0.967909]]])
Coordinates:
  * y        (y) int32 1 3
  * ya       (ya) int32 1 2 3 6
    x        (x2) int32 1 2 5
  * x2       (x2) int32 1 2 3
<xarray.DataArray (ya: 4, x: 3, y: 2)>
array([[[ 0.659583,  0.167555],
        [ 0.357974,  0.46081 ],
        [ 0.85115 ,  0.845257]],

       [[ 0.280308,  0.777399],
        [ 0.512527,  0.542036],
        [ 0.838603,  0.799414]],

       [[ 0.572031,  0.350464],
        [ 0.205219,  0.812232],
        [ 0.687778,  0.984928]],

       [[ 0.803385,  0.63981 ],
        [ 0.089909,  0.499857],
        [ 0.25266 ,  0.967909]]])
Coordinates:
  * y        (y) int32 1 3
  * ya       (ya) int32 1 2 3 6
  * x        (x) int32 1 2 5
    x2       (x) int32 1 2 3

where the two xarrays are clearly different.

I would want to do:

import xarray as xr
import numpy as np
@xr.register_dataarray_accessor('testing')
class TestAccessor(object):
    def __init__(self, xarray_obj):
        self._obj = xarray_obj
    def the_problem(self):
        # desired API: DataArray.swap_dims currently has no inplace keyword
        self._obj.swap_dims({'x':'x2'}, inplace=True)
        print(self._obj)
        
arr = np.random.rand(4,3,2)
cube = xr.DataArray(arr, dims=['ya', 'x', 'y'], coords={'y':[1,3], 'ya':[1,2,3,6], 'x':[1,2,5]})
cube.coords['x2'] = ('x', [1,2,3])
cube.testing.the_problem()
print(cube)

this would keep the two DataArrays the same, as they should be:

<xarray.DataArray (ya: 4, x2: 3, y: 2)>
array([[[ 0.659583,  0.167555],
        [ 0.357974,  0.46081 ],
        [ 0.85115 ,  0.845257]],

       [[ 0.280308,  0.777399],
        [ 0.512527,  0.542036],
        [ 0.838603,  0.799414]],

       [[ 0.572031,  0.350464],
        [ 0.205219,  0.812232],
        [ 0.687778,  0.984928]],

       [[ 0.803385,  0.63981 ],
        [ 0.089909,  0.499857],
        [ 0.25266 ,  0.967909]]])
Coordinates:
  * y        (y) int32 1 3
  * ya       (ya) int32 1 2 3 6
    x        (x2) int32 1 2 5
  * x2       (x2) int32 1 2 3
<xarray.DataArray (ya: 4, x2: 3, y: 2)>
array([[[ 0.659583,  0.167555],
        [ 0.357974,  0.46081 ],
        [ 0.85115 ,  0.845257]],

       [[ 0.280308,  0.777399],
        [ 0.512527,  0.542036],
        [ 0.838603,  0.799414]],

       [[ 0.572031,  0.350464],
        [ 0.205219,  0.812232],
        [ 0.687778,  0.984928]],

       [[ 0.803385,  0.63981 ],
        [ 0.089909,  0.499857],
        [ 0.25266 ,  0.967909]]])
Coordinates:
  * y        (y) int32 1 3
  * ya       (ya) int32 1 2 3 6
    x        (x2) int32 1 2 5
  * x2       (x2) int32 1 2 3

I have version 0.10.0, the newest on conda-forge.

@rabernat
Contributor

rabernat commented Dec 1, 2017 via email

@leeviannala
Contributor Author

@rabernat, please note that I'm working with DataArrays. There is no inplace keyword.

@rabernat
Contributor

rabernat commented Dec 1, 2017

Ah right. So you would do

cube = cube.swap_dims({'x':'x2'})

@leeviannala
Contributor Author

It seems I left a confusing line in my example. In a DataArray accessor, for example, we need an in-place swap.

cube = xr.DataArray(arr, dims=['ya', 'x', 'y'], coords={'y':[1,3], 'ya':[1,2,3,6], 'x':[1,2,5]})  # First DataArray object; cube now refers to it.
cube.coords['x2'] = ('x', [1,2,3])
cube = cube.swap_dims({'x':'x2'})  # Second DataArray object; cube now refers to it and the first object is forgotten.

vs:

cube = xr.DataArray(arr, dims=['ya', 'x', 'y'], coords={'y':[1,3], 'ya':[1,2,3,6], 'x':[1,2,5]})  # First DataArray object; cube now refers to it.
cube.coords['x2'] = ('x', [1,2,3])
cube.swap_dims({'x':'x2'}, inplace=True)  # This would just change the first DataArray object; cube would still refer to it.

So the question, again, is: is there a reason why the inplace keyword is implemented on Dataset but not on DataArray?

@shoyer
Member

shoyer commented Dec 1, 2017

If I did it again, I would remove the inplace keyword argument from even Dataset methods. Almost every method in xarray creates new underlying objects, so it's misleading to call them "inplace". (Most of our inplace methods actually create a new object and then assign its properties to an existing object.)
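As a rough illustration of that pattern, here is a toy sketch. `Box` and `renamed` are hypothetical names invented for this example and are not xarray code. The method always builds a new object, and `inplace=True` merely copies the new object's state onto the existing one.

```python
class Box:
    """Toy container standing in for a Dataset-like object (hypothetical)."""

    def __init__(self, dims, values):
        self.dims = tuple(dims)
        self.values = list(values)

    def renamed(self, mapping, inplace=False):
        # The real work always builds a brand-new object.
        new = Box([mapping.get(d, d) for d in self.dims], self.values)
        if not inplace:
            return new
        # "inplace" only copies the new object's state onto the existing one.
        self.__dict__.update(new.__dict__)
        return self


box = Box(["ya", "x", "y"], [1, 2, 3])
alias = box  # second name for the same object
box.renamed({"x": "x2"}, inplace=True)
print(alias.dims)  # ('ya', 'x2', 'y')
```

Every name bound to the object sees the change, but no allocation is saved, which is why calling such a method "inplace" can be misleading.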

Why not make your accessor method return a new DataArray object? That would be more in line with how most xarray methods work. If you have any special attributes you've set on the accessor you can copy those over to the new result.
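A minimal sketch of that suggestion, reusing the array from the original example. The accessor name `swapper` and the method name `swapped` are assumptions made up for this illustration. The method returns the result of `swap_dims` and leaves the wrapped DataArray untouched.

```python
import numpy as np
import xarray as xr


# "swapper" and "swapped" are hypothetical names chosen for this sketch.
@xr.register_dataarray_accessor("swapper")
class SwapAccessor:
    def __init__(self, xarray_obj):
        self._obj = xarray_obj

    def swapped(self):
        # Return a new DataArray instead of trying to mutate the wrapped one.
        return self._obj.swap_dims({"x": "x2"})


arr = np.random.rand(4, 3, 2)
cube = xr.DataArray(
    arr,
    dims=["ya", "x", "y"],
    coords={"y": [1, 3], "ya": [1, 2, 3, 6], "x": [1, 2, 5]},
)
cube.coords["x2"] = ("x", [1, 2, 3])

new_cube = cube.swapper.swapped()
print(new_cube.dims)  # ('ya', 'x2', 'y')
print(cube.dims)      # ('ya', 'x', 'y')
```

The caller decides whether to keep the original, for example by rebinding with `cube = cube.swapper.swapped()`.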

@leeviannala
Contributor Author

Thank you for your explanation. I will look at my problem from another angle.

@fmaussion
Member

If I did it again, I would remove the inplace keyword argument from even Dataset methods.

How about deprecating these? Would that be too much of a change, do you think? (I never use them.)

@shoyer
Member

shoyer commented Dec 2, 2017

How about deprecating these? Would that be too much of a change you think? (I never use them)

Yes, this is a good idea! Opened #1756 to keep track.
