Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implemented pad with new-indexes #4974

Closed
wants to merge 16 commits into from
Closed

Conversation

fujiisoup
Copy link
Member

@fujiisoup fujiisoup commented Mar 1, 2021

Now we use a tuple of indexes for DataArray.pad and Dataset.pad.

@pep8speaks
Copy link

pep8speaks commented Mar 1, 2021

Hello @fujiisoup! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 3954:1: W293 blank line contains whitespace

Comment last updated at 2021-06-29 11:56:41 UTC

@@ -6638,8 +6653,23 @@ def pad(
coord_pad_options = {}

variables = {}

# standarize pad_width
pad_width_standarized = {} # type: Mapping[Hashable, Tuple[int, int]]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs help for correcting the typing.
Now, pad_width_standarized should be Mapping[Hashable, Tuple[int, int]] instead of Mapping[ Hashable, Union[int, Tuple[Union[int, Iterable], Union[int, Iterable]]] ]
mypy is complaining...

Copy link
Collaborator

@mathause mathause left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not trivial to correctly type... The main change I did was to use Sequence (https://docs.python.org/3/library/collections.abc.html) as Iterable has no __len__. E.g. numpy uses Sequence for ArrayLike (https://github.com/numpy/numpy/blob/master/numpy/typing/_array_like.py).

Disclaimer: I am by no means a specialist here - I am making this up as I go.

xarray/core/dataarray.py Outdated Show resolved Hide resolved
xarray/core/dataset.py Outdated Show resolved Hide resolved
xarray/core/dataset.py Outdated Show resolved Hide resolved
xarray/core/dataset.py Outdated Show resolved Hide resolved
xarray/core/dataset.py Outdated Show resolved Hide resolved
xarray/core/dataset.py Outdated Show resolved Hide resolved
xarray/core/dataset.py Outdated Show resolved Hide resolved
xarray/core/dataset.py Outdated Show resolved Hide resolved
xarray/core/dataset.py Outdated Show resolved Hide resolved
if not isinstance(v, int):
# if pad_width is a tuple of iterable, we use its length for
# pad_width_standarized
pad_width_standarized[k] = [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pad_width_standarized[k] = [
pad_width_standardized[k] = [

fujiisoup and others added 3 commits March 2, 2021 17:01
Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
xarray/core/dataset.py Outdated Show resolved Hide resolved
xarray/core/dataset.py Outdated Show resolved Hide resolved
xarray/core/dataset.py Outdated Show resolved Hide resolved
xarray/core/dataset.py Outdated Show resolved Hide resolved
xarray/core/dataset.py Outdated Show resolved Hide resolved
xarray/core/dataset.py Outdated Show resolved Hide resolved
@mathause
Copy link
Collaborator

mathause commented Mar 2, 2021

I fixed the typo (pad_width_standardized) but I haven't really reviewed the code... There is a doctest that fails - da.pad(dim=1) that should be added as test & fixed.

doc/whats-new.rst Outdated Show resolved Hide resolved
fujiisoup and others added 4 commits March 3, 2021 16:13
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
@fujiisoup
Copy link
Member Author

Not sure why the doctest is failing. The same tests in test_dataset.py do not fail...

@mathause
Copy link
Collaborator

mathause commented Mar 3, 2021

I think only the indentation is wrong now - e.g. https://github.com/pydata/xarray/pull/4974/checks?check_run_id=2020140730#step:8:82

@fujiisoup
Copy link
Member Author

Thank you @mathause for your suggestion.
This looks all the tests are passing now.

w1_ = IndexVariable(name, [fill_value_ind] * w1)
else:
w1_ = IndexVariable(name, w1)
variables[name] = var.concat([w0_, var, w1_], dim=name)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit wondering if we put this series of logic should be moved into IndexVariable.pad.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think that would make sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Done.

if isinstance(pad_start, int) or isinstance(pad_end, int):
# do not allow [Sequence, int] as pad_width
raise TypeError(
"({}, {}) is used for pad_width[{}]. Must be either (int, int) or (Sequence, Sequence).".format(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed that a mixture of sequence and int for pad_width, like

da.pad(dim2=([0, 1, 2], 3))

makes implementation very complex (as we may need to support many options, such as pad_mode).
I just disallowed the mixed argument.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

xarray/core/dataarray.py Show resolved Hide resolved
@@ -1355,6 +1355,15 @@ def pad(

return type(self)(self.dims, array)

def pad_indexes(self, pad_start: Sequence, pad_end: Sequence):
Copy link
Contributor

@dcherian dcherian Mar 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this IndexVariable.pad? and add a test in test_variable.py?

xarray/tests/test_dataset.py Show resolved Hide resolved
if isinstance(pad_start, int) or isinstance(pad_end, int):
# do not allow [Sequence, int] as pad_width
raise TypeError(
"({}, {}) is used for pad_width[{}]. Must be either (int, int) or (Sequence, Sequence).".format(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

xarray/tests/test_dataset.py Show resolved Hide resolved
@benbovy
Copy link
Member

benbovy commented Feb 22, 2022

I was wondering what is the status of this PR?

@fujiisoup do you still plan to work on it? Or maybe is it wise to wait for the indexes refactoring (#5692), i.e., #3868 (comment) and revisit a bit the approach?

More specifically, the approach here (i.e., passing indexes via the pad_width argument) may be tricky in the context of flexible indexes where multiple indexes/coordinates are allowed for one dimension.

What about another parameter data_values to explicitly pass values to certain (coordinate) variables? It would accept something like {var_name: (before_values, after_values)} and would be passed to an Index.pad new method (if var_name is a coordinate with an index) or to Variable.pad as a fallback. This would keep pad_width simple. We'd just need to check that the arrays passed in data_values have dimensions matching with the dimensions of the corresponding variables and have sizes or shapes matching with the sizes specified in pad_width.

@fujiisoup
Copy link
Member Author

Hi.
Sorry for my late reply.
Well, I've just left this PR untouched.

More specifically, the approach here (i.e., passing indexes via the pad_width argument) may be tricky in the context of flexible indexes where multiple indexes/coordinates are allowed for one dimension.

I think we can just discard this PR if this does not fit with the index refactoring.
This PR is not big anyway and maybe rewriting this functionality is faster.

@headtr1ck headtr1ck added topic-indexing plan to close May be closeable, needs more eyeballs labels Nov 21, 2022
@jhamman
Copy link
Member

jhamman commented Sep 14, 2023

Closing this per the last comments and label added to this PR last year.

@jhamman jhamman closed this Sep 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to close May be closeable, needs more eyeballs topic-indexing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

What should pad do about IndexVariables?
7 participants