Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pad and unpad functions #158

Merged
merged 18 commits into from
Nov 6, 2021
Merged

Add pad and unpad functions #158

merged 18 commits into from
Nov 6, 2021

Conversation

santisoler
Copy link
Contributor

@santisoler santisoler commented Jun 28, 2021

Add a new pad function that wraps the xarray.DataArray.pad method
extrapolating evenly spaced coordinates. The pad_width argument is stored
within the attributes of each coordinate of the padded array. Add a new unpad
function that undoes the padding process by slicing it. By default it reads the
pad_width from the coordinate attributes of the input array. Alternatively,
we can pass a custom pad_width as argument to the function. List the padding
module in the API reference. Add test functions for the new features with 100%
coverage of the new padding module.

Fixes #146

TODO (on a different PR):

  • Add tests for cases when the coordinates are dates and timestamps
  • Make sure that fft and ifft keep the pad_width attribute of the coordinates.

@santisoler
Copy link
Contributor Author

I started drafting a pad function after the invitation of @roxyboy . Although I still think that it might be better that xarray.DataArray.pad handles the padding of evenly spaced coordinates, we could have this wrapper in xrft for a in-house implementation and for maintaining backward compatibility in case xarray changes the behaviour of the pad method.

If we want to add an optional padding inside the fft function, I think we should do that on a separate PR when this is already merged. The same goes with a unpad or trim function.

I haven't added any test in case the coordinates are dates or timestamps. I'm not used to work with such datatype and found that there are a lot of ways of defining date ranges (pandas.date_range, xarray.cftime_range, cftime, numpy.datetime64). I completely ignore which are the best practices to guarantee support for all these datatypes without having to write one if statement for each. Help needed!

@bennyrowland @roxyboy @rabernat I would like to hear your thoughts on this.

xrft/padding.py Outdated
return padded_da


def pad_coordinates(coords, pad_width):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@santisoler this looks really cool. Can we upstream this to xarray?

It would depending on finishing this PR first: https://github.com/pydata/xarray/pull/4974/files but that one is really close

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely. Like I said to @shoyer in #146, I'll be happy to implement the padding of evenly spaced coordinates into xarray. I would just need to have little discussions about the design of the feature and which should be the default behavior of the pad method: Should it pad the coordinates if they are evenly spaced by default? Should the coordinates be padded only if an optional argument is passed?

Feel free to ping me after pydata/xarray#4974 is merged and we can start working on it.

# Start padding the coordinates that appear in pad_width
for dim in pad_width:
# Get the spacing of the selected coordinate
# (raises an error if not evenly spaced)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if the correct behaviour is to error for unevenly spaced coordinates, maybe it is better to fall back on the existing behaviour of pad (particularly for applications in xarray outside of FTs)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that for a general case it should fall back to the default behaviour of the xarray pad method. Nevertheless, for xrft I think we should raise the error. If the coordinates are not evenly spaced, you won't be able to pass the padded grid to fft (or get awkward errors related to the presence of NaNs in the coordinates.

In summary:

  • I would leave the error raising inside xrft
  • I wouldn't add the error on an xarray implementation of this feature.

@bennyrowland
Copy link

@santisoler great work, I really like what you have done so far. I agree with @dcherian that it should be pretty straightforward to move this into xarray, although there are bound to be some advanced cases I haven't considered. Personally I am happy with just padding float coordinates, although I am sure some people will want to use date/timestamps. Maybe we need to start by putting in some more checks on datatypes for the coordinates, and falling back on existing behaviour for datatypes we don't currently want to handle?

@santisoler
Copy link
Contributor Author

Thanks @bennyrowland! For my own applications, I'm totally ok with just supporting padding of floats, nevertheless I think it's important to also support datetimes. If we merge this only supporting floats, it's very likely we will get Issues asking for extending the support for datetimes.

Like I said before, I'm not familarized with the different ways of defining datetimes, so I cannot come up with a good solution on how to support each one of them. That's why I would love to get some help on it.

@lanougue
Copy link
Contributor

lanougue commented Nov 3, 2021

Hi everyone,

First : thanks for the effort !
I am really interested in this PR and checking what is still missing now.

I have some comments/questions (probably more related to my applications):

  • Could it be possible to transfer the attributes of the coords to be padded to the new padded ones ? It is, in fact, mandatory to have the xrft.fft and xrft.ifft to work properly with true_phase=True option. (cf "direct_lag" attribute)

Thanks !

@santisoler
Copy link
Contributor Author

Hi @lanougue,

I am really interested in this PR and checking what is still missing now.

I do think that we should support other types of variables for the coordinates (e.g. dates), but this is probably not going to happen unless someone with a specific use case (and testing cases) wants to tackle it.
Since my own applications for the padding will be restricted only to floats (at least for now), I think we can merge this as it is for now, and maybe open an issue later for supporting the datetimes.

And regarding moving this feature upstream to xarray: pydata/xarray#4974 hasn't been merged yet, so we probably should merge this PR here and wait until it can be a source of inspiration for the upstream version.
Then we can start thinking of using the upstream feature instead of the one here.

Could it be possible to transfer the attributes of the coords to be padded to the new padded ones ? It is, in fact, mandatory to have the xrft.fft and xrft.ifft to work properly with true_phase=True option. (cf "direct_lag" attribute)

Sure, I don't see a problem with that. In fact this is something that should be than: coords attributes should be in agreement with the actual coordinates arrays present in the DataArray.
Could you prove a list of every attribute that should be also "padded"?
I'll be glad to implement it!

@lanougue
Copy link
Contributor

lanougue commented Nov 3, 2021

Hi @santisoler ,

I agree with you with this xarray upstream issue. Personally, I do not know application of padding other than Fourier transforms or convolution problem at the boundaries... I thus imagine it can take a while to the xarray community to care about padding. But who knows ?! Meanwhile, I am in favor of merging this PR too !

As far as I know, xrft plays with only two (coordinate's) attributes which are "spacing" and "direct_lag". These two arguments should in fact stay unchanged through the padding procedure. (After verifying that coordinate is indeed evenly spaced).
Just keeping them during the padding should thus be sufficient.
Thanks again for the work you have done !

@santisoler
Copy link
Contributor Author

santisoler commented Nov 3, 2021

@lanougue I've just updated this branch with master and added support for keeping the attrs of the original coordinates after padding them. Do you think this should be enough? Should we add something else (feature/tests)?

@lanougue
Copy link
Contributor

lanougue commented Nov 3, 2021

@santisoler

Changes looks nice. I did not played with your function for now but one interesting option, in my point of view, would be to keep or not the original name of the coordinates. I think it could be safer to give a new name for the padded coordinates. The default behavior would be (x -> padded_x) unless some keyword argument like "keep_coords_name" is set to True.

Other thing, I would rather writes:
padded_da.coords[dim].attrs.update(da.coords[dim].attrs)
because, we could eventually add a new attribute in the padded coordinate (maybe later) that would otherwise be erased.

This two comments are only suggestions and may be discussed (and maybe passed to upstream discussions concerning the first suggestion).

@santisoler
Copy link
Contributor Author

Other thing, I would rather writes:
padded_da.coords[dim].attrs.update(da.coords[dim].attrs)
because, we could eventually add a new attribute in the padded coordinate (maybe later) that would otherwise be erased.

You're absolutely right! That's not only better in terms on sanity, but also is way more readable.

Changes looks nice. I did not played with your function for now but one interesting option, in my point of view, would be to keep or not the original name of the coordinates. I think it could be safer to give a new name for the padded coordinates. The default behavior would be (x -> padded_x) unless some keyword argument like "keep_coords_name" is set to True.

I think I disagree here. One of the advantages of xarray is to have every data array and its corresponding coordinate under a single object. This removes the need to rename every inner variable (like the coordinates) in order to prevent overwriting the original ones or just to differentiate them: this is the way we used to do things with Numpy arrays before xarray. I think returning a different xr.DataArray after the padding is sufficient to keep things separated: there's the x coordinate for the original array and the x coordinate for the padded array, they are different arrays, but they refer to the same dimension.
Besides, renaming dims and coords might be tricky to generalize: we would need to check if a padded_x variable already exists or not, and decide what to do then. On top of that we would need to write tests to cover all these scenarios.
So, IMHO I think it's better to leave padded coordinates with the same name as the original ones. Feel free to disagree!

@lanougue
Copy link
Contributor

lanougue commented Nov 4, 2021

Ok, it makes sense. It would be nice if other contributors could give their opinion about that.
We cannot cover all scenarii but maybe, we should thus keep some track of the padding procedure somewhere (or in some way) because, we may need to recover the original shape at some point.
We can discuss about it but storing the original number of points (in addition with the already stored spacing and direct_lag) should be enough to do it. Some attribute like "initial_size" would inform you that 1) the size has been changed 2) You can easily recover it.
In my applications, recovering the initial shape is mandatory, so I keep track of it manually but it could easily be done with xrft. Yet, it is still just a suggestion to discuss !

@santisoler
Copy link
Contributor Author

I get it and I agree that would be great to have a way to recover the unpadded array.
On #146 I proposed an unpad function that could be used like this:

# Pad an array
path_width = {
    "x": (10, 10),
    "y": (5, 5),
}  # define amount of padding
grid_padded = xrft.pad(grid, pad_width)

# Perform some actions on the padded array
# ...

# Unpad it
grid = xrft.unpad(grid_padded, pad_width)

It would take the same pad_width argument, so in order to unpad it you would only need to keep the original pad_width variable. If this is something that we would like to have, I'll be glad to add it in this PR. If not, I would leave it for another PR with further discussion.

I'm more prone to tell users to keep track of the pad_width rather than storing it as attribute inside the DataArray. Mainly because the attributes are static: if I manually remove some rows/cols from the array with simple xarray slicing, then the initial_size or pad_width would be old-dated. The second reason is because from a theoretical point of view there's no difference between the padded and the original DataArray: the FFT doesn't care if the values in the array are observed data or results of a padding process.

Besides, any user could store the pad_width as attribute if they don't want to keep track of it manually (e.g. if they want to store the padded array in a file). For example:

# Pad an array
path_width = {
    "x": (10, 10),
    "y": (5, 5),
}  # define amount of padding
grid_padded = xrft.pad(grid, pad_width)
grid_padded.attrs["pad_width"] = pad_width

# Perform some actions on the padded array
# ...

# Unpad it
grid = xrft.unpad(grid_padded, grid_padded.pad_width)

What do you think?

@lanougue
Copy link
Contributor

lanougue commented Nov 4, 2021

It exactly remembers me the discussion I had with @roxyboy and @rabernat when I was working on the xrft.ifft function.
We all liked the round trip functionality ifft(fft(A)) == A which finally led at storing the direct_lag value in the attributes of A's coordinates.
I like your idea of "pad_width", for which I would vote to also store, since we could have the same round-trip functionality : unpad(pad(B)) = B.

It is the same logic for both functions. We can let the user manually store the required information to recover the initial state or we can automatically do it.
Comments from @rabernat and @roxyboy are welcome

@roxyboy
Copy link
Member

roxyboy commented Nov 4, 2021

@santisoler @lanougue Thanks for your work and input. I agree that the pad and unpad will likely be a valuable asset for xrft. I think the two should come as a set within the same PR :)

@santisoler
Copy link
Contributor Author

It exactly remembers me the discussion I had with @roxyboy and @rabernat when I was working on the xrft.ifft function. We all liked the round trip functionality ifft(fft(A)) == A which finally led at storing the direct_lag value in the attributes of A's coordinates. I like your idea of "pad_width", for which I would vote to also store, since we could have the same round-trip functionality : unpad(pad(B)) = B.

It is the same logic for both functions. We can let the user manually store the required information to recover the initial state or we can automatically do it. Comments from @rabernat and @roxyboy are welcome

You're right! Thanks for helping me to remember what was the issue behind it.

Since we want to support a very straight forward round trip like xrft.ifft(xrft.fft(array)), then we surely need to store variables inside the array, as you pointed out.

One alternative is to keep the fft and the ifft much simpler: leave the detrending and the padding beyond their scope and make the user to be the one to apply them. I'm just mentioning it as an alternative, but I know that there's a desire to make fft and ifft much more powerful including processing steps like detrending, padding, etc.

In conclusion, I think we actually need to store the padding on each padded coordinate inside the DataArray.
So, the pad, unpad workflow could look like:

# Pad an array
path_width = {
    "x": (10, 10),
    "y": (5, 5),
}  # define amount of padding
grid_padded = xrft.pad(grid, pad_width)

# Perform some actions on the padded array
# ...

# Unpad it
grid = xrft.unpad(grid_padded)

A part of me wants to support custom pad_widths, like:

grid = xrft.unpad(grid_padded, pad_width={"x": ..., "y": ...})

But if no one will use it, there's no point in adding that feature.

I'll try to implement the unpad function with a single argument (the unpadded grid) and will ping you @lanougue @roxyboy for review and comments.

@lanougue
Copy link
Contributor

lanougue commented Nov 4, 2021

numpy logic would be to specify a "shape" parameter in xrft.fft but I think that xrft logic should depart from this behaviour.

I prefer your approach where you let the user do the padding/unpadding procedure. It, as you said, keeps fft and ifft much simpler (and do not limit their possibilities).
In my opinion, changing the shape parameter in numpy.fft requires the same level of understanding than applying a padding procedure. I thus vote for your approach where you let the user do manually the padding.

I would also like to get the complete round trip:
unpad( ifft( fft( pad(A)))) == A
done automatically without having to store any additional information ! :)

I think that your feature (unpadding with a custom pad_width) can easily be resolved by setting the default pad_width (=None?) being the normal one. Meaning the same used in forward padding (that would be thus stored somewhere)

Waiting for your ping !

Extend the examples of pad() and test the new feature.
The function reads the pad_width attribute from the array coordinates in
order to slice it to its unpadded version.
Add test functions for the new feature.
@santisoler
Copy link
Contributor Author

I prefer your approach where you let the user do the padding/unpadding procedure. It, as you said, keeps fft and ifft much simpler (and do not limit their possibilities). In my opinion, changing the shape parameter in numpy.fft requires the same level of understanding than applying a padding procedure. I thus vote for your approach where you let the user do manually the padding.

I would gladly participate on discussions around this design decision. My idea of making fft and ifft simpler comes both from UNIX philosophy ("Make each program do one thing well") and from personal experience.

I would also like to get the complete round trip: unpad( ifft( fft( pad(A)))) == A done automatically without having to store any additional information ! :)

I've just pushed the implementation of unpad. It takes the pad_width from attributes of the coordinates.
But now the problem is that the fft removes that coordinates, and with them, the pad_width attributes.
So for now the unpad(ifft(fft(pad(da, pad_width)))) round trip doesn't work 😞.
Probably by adding the pad_width to the entire array might help.

I think that your feature (unpadding with a custom pad_width) can easily be resolved by setting the default pad_width (=None?) being the normal one. Meaning the same used in forward padding (that would be thus stored somewhere)

Sure! We can set pad_width=None by default and read it from the array in that case.

@lanougue
Copy link
Contributor

lanougue commented Nov 5, 2021

I think that adding the pad_width attribute for the entire array is a bad idea for 2 reasons:

  • As for spacing and direct_lag, pad_witdth pertains, by definition, to the coordinate itself. As it, you don't need to have a different name for each padded coordinate. I think you don't want to have pad_witdh_x or pad_width_time as your entire array's attributes. I prefer a single attribute name for all coordinate because it is also more simpler to call in code. It is just
    A[coord].pad_width instead of A.attrs['pad_width_'+coord.name]

  • After Fourier Transform, we usually and manually do some operations on the entire array (filtering, square, ...) and not on its coordinates. These operations automatically removes the main array attributes but not the coordinates one. Which is what we want.

Ok for the round trip issue. We have to transfer the pad_width attribute through the fft/ifft operations.

@santisoler
Copy link
Contributor Author

I agree: bad idea to store the pad_width within the array attributes.

So, let me add support for passing a custom pad_width to unpad (None by default) and I think this PR would be ready to be merged. I would say that keeping the pad_width argument through the fft and ifft functions should be done on a separate PR, otherwise I think that this one will grow too much.

The pad_width can be optionally passed to unpad as a single pad_width
argument or as kwargs for each coordinate. By default pad_width is None,
therefore the pad_width is read from the attributes of the array
coordinates.
Add test functions for the new argument.
By making it a private function we avoid listing it in the API
reference.
@santisoler santisoler changed the title Add pad function Add pad and unpad functions Nov 5, 2021
@santisoler
Copy link
Contributor Author

@roxyboy @lanougue I think this is done. Would you like to take a look at the current status of the code? I've updated the PR description and its title.

@bennyrowland
Copy link

@santisoler, thanks for the great work on this. I think that if I wanted to do a round trip pad()/unpad() then I would be more likely to do it the way you initially proposed, keeping the pad_width as a separate parameter and passing it in on both sides, rather than keeping it in the coords. I think this is because the idea of an attribute on the coords is a bit too magic. And what if I want to call unpad() first, and then pad() afterwards? I don't think having the attribute will affect me since it presumably only gets checked for if you don't pass in a pad_width. But I will definitely want to call unpad() only, so having the argument is essential.

I have also previously suggested that alternative forms of the parameter currently called pad_width could be appropriate, for example specifying a final length for a dimension instead of a number of points to pad. I would therefore only make the suggestion that perhaps a more generic name like pad_params could be used, with the current implementation remaining as it is, but making it more natural to implement alternative padding approaches in the future? The alternative is to prefer introducing an alternative pair of functions to do this other kind of operation (e.g. grow() and trim() or something), but that might also introduce confusion about which functions are which. Other than that, I am looking forward to using these new functions soon.

@lanougue
Copy link
Contributor

lanougue commented Nov 5, 2021

@santisoler, thanks for the great work on this. I think that if I wanted to do a round trip pad()/unpad() then I would be more likely to do it the way you initially proposed, keeping the pad_width as a separate parameter and passing it in on both sides, rather than keeping it in the coords. I think this is because the idea of an attribute on the coords is a bit too magic. And what if I want to call unpad() first, and then pad() afterwards? I don't think having the attribute will affect me since it presumably only gets checked for if you don't pass in a pad_width. But I will definitely want to call unpad() only, so having the argument is essential.

Hi @bennyrowland. I think the current behavior proposed here do not have any problem with what you want to do. You need to provide pad_width whatever pad or unpad function you are calling first. In the same spirit, pad function could also look at some already existent pad_width (or unpad_width) that would come from a previous unpad call.
The default behavior of the functions is to check the existence of some pad-related attributes but you can always specify manually whatever you want. I think that storing it will help the large majority of people that will probably pad and unpad with the same number of points.

@santisoler
Copy link
Contributor Author

santisoler commented Nov 5, 2021

@santisoler, thanks for the great work on this. I think that if I wanted to do a round trip pad()/unpad() then I would be more likely to do it the way you initially proposed, keeping the pad_width as a separate parameter and passing it in on both sides, rather than keeping it in the coords. I think this is because the idea of an attribute on the coords is a bit too magic. And what if I want to call unpad() first, and then pad() afterwards? I don't think having the attribute will affect me since it presumably only gets checked for if you don't pass in a pad_width. But I will definitely want to call unpad() only, so having the argument is essential.

I understand. I personally agree with your workflow: for me it's easier to ask for pad_width on both pad and unpad functions. The good part of the current implementation is that you can actually use it as it is right now: when you pass a pad_width to unpad, then the pad_width inside the coordinates array is completely ignored.

I have also previously suggested that alternative forms of the parameter currently called pad_width could be appropriate, for example specifying a final length for a dimension instead of a number of points to pad. I would therefore only make the suggestion that perhaps a more generic name like pad_params could be used, with the current implementation remaining as it is, but making it more natural to implement alternative padding approaches in the future? The alternative is to prefer introducing an alternative pair of functions to do this other kind of operation (e.g. grow() and trim() or something), but that might also introduce confusion about which functions are which. Other than that, I am looking forward to using these new functions soon.

I remembered you mentioning that you usually want to pad your arrays by specifying the final size instead of the width. I could argue that you can do that with the current pad function, for example:

new_size_x, new_size_y = 256, 256
pad_width = {"x": (new_size_x - grid.x.size) / 2, "y": (new_size_y - grid.y.size) / 2} 
xrft.pad(grid, pad_width)

But I know it can be tedious to do every time.

I decided to go with pad_width mainly because of compatibility with xarray.pad and also because it's more powerful than just the final size: you can configure asymmetric padding.
I would avoid adding more parameters to the pad function: it already has quite a few arguments and it demands some reading in order to understand what the pad_width means.

Option 1

My first suggestion would be creating different functions for you case, which can wrap the pad/unpad functions so we don't repeat ourselves.
For example:

def pad_to(
    da,
    size,
    mode="constant",
    stat_length=None, 
    constant_values=None,
    end_values=None,
    reflect_type=None,
    **size_kwargs,
):
    # Redefine size if size_kwargs were passed
    size = either_dict_or_kwargs(size, size_kwargs, "pad")
    # Build pad_width from the size
    pad_width = {dim: int(0.5 * (pad_size[dim] - da[dim].size)) for dim in size}
    # Call pad
    return xrft.pad(da, pad_width, ...)

(we can work on the names).

So you could pad by passing final sizes, like this:

size = {"x": 256, "y": 256}
grid_padded = xrft.pad_to(grid, size)

Option 2

Another option would be to have a utils function to convert from size to pad_width. For example:

def size_to_pad_width(da, size):
    pad_width = {dim: int(0.5 * (pad_size[dim] - da[dim].size)) for dim in size}
    return pad_width

So you could pad using the sizes, like this:

size = {"x": 256, "y": 256}
pad_width = size_to_pad_width(grid, size)
grid_padded = xrft.pad(grid, pad_width)

The benefit of this is we will have a single function for padding and a single function for unpadding. The drawback is that we would need to write one more line of code when using it.


That being said, I would leave these ideas for a new PR.

@roxyboy
Copy link
Member

roxyboy commented Nov 5, 2021

So, let me add support for passing a custom pad_width to unpad (None by default) and I think this PR would be ready to be merged. I would say that keeping the pad_width argument through the fft and ifft functions should be done on a separate PR, otherwise I think that this one will grow too much.

Am I to understand that now the unpad(ifft(fft(pad(da, pad_width))),pad_width) round trip works? If so, could we add one test for this?
Other than that, I agree we can merge this.

@bennyrowland
Copy link

@santisoler, yes but it is (of course) more complicated than that. In my case, I do have asymmetric data which I want to pad to be symmetric, as well as to a specific length, so it definitely isn't a simple enough calculation to do manually every time. I acknowledge that using pad_width to align with xarray is a powerful argument in its favour. Using a helper function to convert to pad_width is unappealing because it will get complicated when padding some axes and not others, to the extent that the helper function will be just as complicated as a new function (like pad_to()) that wraps pad(). Once this PR is merged, perhaps I will raise a new issue to gauge interest in adding such a pad_to() function to xrft.

@santisoler
Copy link
Contributor Author

santisoler commented Nov 6, 2021

Am I to understand that now the unpad(ifft(fft(pad(da, pad_width))),pad_width) round trip works? If so, could we add one test for this? Other than that, I agree we can merge this.

Done @roxyboy! I've found that the line you wrote above works well for the array, but not for the coordinates (they get shifted around the origin), so we need to pass true_phase=True on both fft and ifft.

So the roundtrip looks like:

da_padded = pad(da, pad_width, constant_values=0)
da_fft = fft(da_padded, true_phase=True)
da_ifft = ifft(da_fft, true_phase=True)
da_unpadded = unpad(da_ifft, pad_width=pad_width)

On a side note, I remembered that the current pad function pads with nans (as the xarray.pad function does). This is awkward in xrft, since a single nan in the input array for fft generates garbage. I think that we should set constant_values=0 as default, do you agree?

@roxyboy
Copy link
Member

roxyboy commented Nov 6, 2021

So the roundtrip looks like:

da_padded = pad(da, pad_width, constant_values=0)
da_fft = fft(da_padded, true_phase=True)
da_ifft = ifft(da_fft, true_phase=True)
da_unpadded = unpad(da_ifft, pad_width=pad_width)

On a side note, I remembered that the current pad function pads with nans (as the xarray.pad function does). This is awkward in xrft, since a single nan in the input array for fft generates garbage. I think that we should set constant_values=0 as default, do you agree?

Awesome! Thanks @santisoler . We're almost at the finish line. I agree with constant_values=0 as default and could you make xrft raise a warning when pad_width is specified with true_phase=False?

@santisoler
Copy link
Contributor Author

I agree with constant_values=0 as default

Done! The docstrings were specifying that constant_values would be zero by default, so at some point I had the thought of doing that but I forgot to set it in the arguments.

and could you make xrft raise a warning when pad_width is specified with true_phase=False?

You mean that fft should raise a warning if pad_width exists in the coordinates attrs?
If that is what you meant, I would leave that to a different PR, maybe in the one that should make fft and ifft to support pad_width.
If that's not what you meant, please elaborate.

Thank you for all the feedback! This PR wouldn't have been possible without it!

@roxyboy
Copy link
Member

roxyboy commented Nov 6, 2021

You mean that fft should raise a warning if pad_width exists in the coordinates attrs? If that is what you meant, I would leave that to a different PR, maybe in the one that should make fft and ifft to support pad_width.

Yes. I'll merge this and make a new release.

@roxyboy roxyboy merged commit 5b18b88 into xgcm:master Nov 6, 2021
@lanougue
Copy link
Contributor

lanougue commented Nov 6, 2021

Am I to understand that now the unpad(ifft(fft(pad(da, pad_width))),pad_width) round trip works? If so, could we add one test for this? Other than that, I agree we can merge this.

Done @roxyboy! I've found that the line you wrote above works well for the array, but not for the coordinates (they get shifted around the origin), so we need to pass true_phase=True on both fft and ifft.

I didn't get why true_phase =True/False interferes with the padding ? Whatever true_phase value is should not change the coordinates...
@santisoler , can you explain the problem ?

@santisoler
Copy link
Contributor Author

I didn't get why true_phase =True/False interferes with the padding ? Whatever true_phase value is should not change the coordinates... @santisoler , can you explain the problem ?

When setting true_phase=False, the recovered coordinates are shifted around zero: for example, if x where between 10 and 30, the output of ifft(fft(...)) has the x around -10 and 10. The padding works well in terms of slicing the right amount of elements of the transformed array, but the coordinates are not the same. Therefore, after the round trip, the xarray.DataArray is not the same (their coordinates differ).

Let me know if I explained myself, otherwise I can provide an example.

@lanougue
Copy link
Contributor

lanougue commented Nov 8, 2021

Ok I understand. thanks! When true_phase == False, the coordinate range is indeed lost during the fft/ifft operations and the unpadding can not indeed do the job correctly. You are right, the round trip can be done efficiently only when true_phase is set to True during both fft/ifft operations.
It thus has to be correctly explained in the doc !
@roxyboy, one more unexpected advantage of true_phase ! :), When is it planned to go to default true_phase/amplitude = True ?

@santisoler santisoler deleted the pad branch April 18, 2022 21:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Proposal: Output length parameter for FTs
5 participants