-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
data_vars option added to open_mfdataset #1580
Conversation
These will definitely be needed. Existing tests for
A brief note in What's New under the Enhancements section would actually be appropriate. Just follow the example of the others in that section. Thanks for taking this on! We actually just bumped across this (here), so your fix will immediately benefit more than just you. |
Thanks @spencerahill: I have run the flake8 test and modified the whats-new.rst.
Cheers |
Try running them via pytest instead ( Note that you'll need to add new test(s) that cover the modifications you have made -- not just run the existing tests. |
Thanks @spencerahill py.test seems to be working Cheers |
…s important for Windows
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting this together!
xarray/tests/test_backends.py
Outdated
coord = ds[self.coord_name][:] | ||
coord_expect = ds_expect[self.coord_name][:] | ||
|
||
self.assertArrayEqual(data, data_expect) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make use of self.assertDatasetIdentical()
to shorten up these tests a bit?
xarray/tests/test_backends.py
Outdated
var_shape1[0] + var_shape2[0]) | ||
|
||
def test_invalid_data_vars_value_should_fail(self): | ||
with self.assertRaises(ValueError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this to only go around the line where you expect the error
xarray/tests/test_backends.py
Outdated
|
||
self.assertEqual(var_shape, coord_shape) | ||
|
||
def test_common_coord_dims_should_not_change_when_datavars_minimal(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks very similar to the last test -- can you maybe consolidate it?
Or you could even potentially drop some of these tests. We have unit tests for concat
and open_mfdataset
already, so the main thing we need to verify is that the keyword argument gets properly passed on. We don't need to check here that every possible way to use it is handled correctly.
xarray/backends/api.py
Outdated
compat=compat, data_vars=data_vars) | ||
combined._file_obj = _MultiFileCloser(file_objs) | ||
combined.attrs = datasets[0].attrs | ||
except ValueError as ve: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's only wrap the lines where this could fail -- so this should be moved up two lines, before combined._file_obj
is assigned.
xarray/backends/api.py
Outdated
compat=compat, data_vars=data_vars) | ||
combined._file_obj = _MultiFileCloser(file_objs) | ||
combined.attrs = datasets[0].attrs | ||
except ValueError as ve: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just use except ValueError:
here and a plain raise
below.
xarray/backends/api.py
Outdated
@@ -431,7 +431,7 @@ def close(self): | |||
|
|||
def open_mfdataset(paths, chunks=None, concat_dim=_CONCAT_DIM_DEFAULT, | |||
compat='no_conflicts', preprocess=None, engine=None, | |||
lock=None, **kwargs): | |||
lock=None, data_vars='all', **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For completeness, would it also make sense to pass on the coords
option at this time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @shoyer:
I have added the coords keyword in a similar manner as data_vars.
I'll probably have to add a test for it as well.
Cheers
…d modify code snippets to use single quotes for consistency.
…milar tests for the data_vars keyword.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a couple more minor cleanup suggestions, but this is looking great, thank you!
xarray/tests/test_backends.py
Outdated
# tests to be applied to respective pairs | ||
if opt == 'all': | ||
tests = [self.assertEqual, | ||
self.assertNotEqual, self.assertNotEqual] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it's best to avoid putting complex control flow in tests -- it makes them harder to debug. I would actually prefer if you wrote two different tests here with a bit more copied code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've split the test into 2, let me know if this is what you meant.
Cheers
xarray/tests/test_backends.py
Outdated
ds1.to_netcdf(tmpfile1) | ||
ds2.to_netcdf(tmpfile2) | ||
|
||
files = [tmpfile1, tmpfile2] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put this shared logic in a context manager?
e.g.,
@contextlib.contextmanager
def setup_files(self):
with create_tmp_file() as tmpfile1:
with create_tmp_file() as tmpfile2:
ds1, ds2 = self.gen_datasets_with_common_coord_and_time()
# save data to the temporary files
ds1.to_netcdf(tmpfile1)
ds2.to_netcdf(tmpfile2)
yield [tmpfile1, tmpfile2]
def test_open_mfdataset_does_same_as_concat(self):
with self.setup_files() as files:
...
setUp
/tearDown
methods would also work, with ExitStack.enter_context()
and .close()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @shoyer:
I like the contextmanager trick a lot. I did feel like there should be a better way to set up tests. Actually, I have never used it before.
Cheers
…g up test inputs in OpenMFDatasetWithDataVarsAndCoordsKwTest
LGTM. @crusaderky - I think you should look at this in conjunction with #1551. |
I think this is ready to go, any final objections? |
Thanks @guziy! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delayed review here, but I'm still not quite happy with the tests here. @guziy would you mind fixing this up in a follow-up? Thanks!
self.assertNotEqual, self.assertNotEqual] | ||
|
||
for a_test, a_shape_pair in zip(tests, shape_pairs): | ||
a_test(*a_shape_pair) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole section should be more explicit, e.g.,
self.assertEqual(var_shape, coord_shape)
self.assertNotEqual(coord_shape1, coord_shape)
self.assertNotEqual(coord_shape2, coord_shape)
That way, we avoid the confusing loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* but assert_equal(var_shape, coord_shape)
, in line with the updated test framework!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MaximilianR :
I am comparing shapes there, not dataarrays.
Cheers
var_shape = ds[self.var_name].shape | ||
|
||
# shape pairs to be compared | ||
shape_pairs = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @shoyer :
It'll actually be less code this way)
Cheers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have committed the changes, should I open another pull request?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, please. Once a PR is merged you need to open another one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Closes
xray.open_mfdataset
concatenates also variables without time dimension #438Added tests (they are passing currently)
Passes
git diff upstream/master | flake8 --diff
There were some long lines that in my opinion look better when not broken, but I complied with flake8 request to shorten them.
Fully documented, including
whats-new.rst
for all changes andapi.rst
for new API (Do not think the change is big enough to be added to the files)