data_vars option added to open_mfdataset #1580

Merged (29 commits) on Oct 10, 2017

Conversation

@guziy (Contributor) commented Sep 19, 2017

  • Closes xray.open_mfdataset concatenates also variables without time dimension #438

  • Added tests (they are passing currently)

  • Passes git diff upstream/master | flake8 --diff
    There were some long lines that, in my opinion, looked better unbroken, but I complied with flake8's request and shortened them.

  • Fully documented, including whats-new.rst for all changes and api.rst for new API (I do not think the change is big enough to be added to those files)

@spencerahill (Contributor) commented Sep 19, 2017

> have not run tests yet

These will definitely be needed. Existing tests for open_mfdataset are in xarray/tests/test_backends.py. @shoyer and/or @jhamman can hopefully help you out if you need more guidance.

> Not sure how to install flake8

pip install flake8 should work. Alternatively, if you use conda: conda install -c conda-forge flake8.

> Do not think the change is big enough to be added to the files

A brief note in What's New under the Enhancements section would actually be appropriate. Just follow the example of the others in that section.

Thanks for taking this on! We actually just bumped across this (here), so your fix will immediately benefit more than just you.

@guziy (Contributor, Author) commented Sep 19, 2017

Thanks @spencerahill:

I have run the flake8 test and modified the whats-new.rst.
I think I need more guidance with the tests.
I tried using nosetests and got the following exception (I am sure I am doing something wrong here):

$ nosetests
/Users/huziy/anaconda/envs/py3.6/lib/python3.6/site-packages/matplotlib/__init__.py:917: UserWarning: axes.hold is deprecated. Please remove it from your matplotlibrc and/or style files.
  warnings.warn(self.msg_depr_set % key)
/Users/huziy/anaconda/envs/py3.6/lib/python3.6/site-packages/matplotlib/rcsetup.py:152: UserWarning: axes.hold is deprecated, will be removed in 3.0
  warnings.warn("axes.hold is deprecated, will be removed in 3.0")
E
======================================================================
ERROR: Failure: AttributeError (module 'pytest' has no attribute 'config')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/huziy/anaconda/envs/py3.6/lib/python3.6/site-packages/nose/failure.py", line 39, in runTest
    raise self.exc_val.with_traceback(self.tb)
  File "/Users/huziy/anaconda/envs/py3.6/lib/python3.6/site-packages/nose/loader.py", line 417, in loadTestsFromName
    addr.filename, addr.module)
  File "/Users/huziy/anaconda/envs/py3.6/lib/python3.6/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/Users/huziy/anaconda/envs/py3.6/lib/python3.6/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/Users/huziy/anaconda/envs/py3.6/lib/python3.6/imp.py", line 244, in load_module
    return load_package(name, filename)
  File "/Users/huziy/anaconda/envs/py3.6/lib/python3.6/imp.py", line 216, in load_package
    return _load(spec)
  File "<frozen importlib._bootstrap>", line 675, in _load
  File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 205, in _call_with_frames_removed
  File "/Users/huziy/PythonProjects/xarray/xarray/tests/__init__.py", line 130, in <module>
    _SKIP_FLAKY = not pytest.config.getoption("--run-flaky")
AttributeError: module 'pytest' has no attribute 'config'

----------------------------------------------------------------------
Ran 1 test in 0.582s

FAILED (errors=1)

Cheers

@spencerahill (Contributor):

Try running them via pytest instead (pip install pytest or conda install pytest if you don't have it already).

Note that you'll need to add new test(s) that cover the modifications you have made -- not just run the existing tests.

@guziy (Contributor, Author) commented Sep 19, 2017

Thanks @spencerahill

py.test seems to be working

Cheers

@shoyer (Member) left a comment

Thanks for putting this together!

coord = ds[self.coord_name][:]
coord_expect = ds_expect[self.coord_name][:]

self.assertArrayEqual(data, data_expect)
@shoyer (Member):

Can you make use of self.assertDatasetIdentical() to shorten up these tests a bit?
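(For illustration, a minimal sketch of that consolidation; ds1, ds2, files, opt, and the 't' concat dimension are placeholders for the fixtures the test already defines, and xr/open_mfdataset are assumed to be imported in the test module:)

# build the expected result with concat, then compare whole datasets at once
ds_expect = xr.concat([ds1, ds2], data_vars=opt, dim='t')
with open_mfdataset(files, data_vars=opt, concat_dim='t') as ds_actual:
    self.assertDatasetIdentical(ds_expect, ds_actual)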

var_shape1[0] + var_shape2[0])

def test_invalid_data_vars_value_should_fail(self):
with self.assertRaises(ValueError):
@shoyer (Member):

move this to only go around the line where you expect the error
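(A sketch of the narrowed version within the existing test class; setup_files is a hypothetical fixture helper and 'minimum' is just an example of an invalid value:)

def test_invalid_data_vars_value_should_fail(self):
    with self.setup_files() as files:
        # only the call that is expected to raise goes inside assertRaises
        with self.assertRaises(ValueError):
            open_mfdataset(files, data_vars='minimum')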


self.assertEqual(var_shape, coord_shape)

def test_common_coord_dims_should_not_change_when_datavars_minimal(self):
@shoyer (Member):

This looks very similar to the last test -- can you maybe consolidate it?

Or you could even potentially drop some of these tests. We have unit tests for concat and open_mfdataset already, so the main thing we need to verify is that the keyword argument gets properly passed on. We don't need to check here that every possible way to use it is handled correctly.
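(For example, a pass-through check could be this small; a sketch only, with setup_files, 'common_var', and the 't' dimension as hypothetical stand-ins for the test fixtures:)

def test_data_vars_kwarg_is_passed_through(self):
    with self.setup_files() as files:
        # with data_vars='minimal', a variable lacking the concat dimension
        # is not concatenated and so does not gain a 't' dimension
        with open_mfdataset(files, data_vars='minimal', concat_dim='t') as ds:
            self.assertNotIn('t', ds['common_var'].dims)
        # with data_vars='all' (the default) it is concatenated along 't'
        with open_mfdataset(files, data_vars='all', concat_dim='t') as ds:
            self.assertIn('t', ds['common_var'].dims)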

compat=compat, data_vars=data_vars)
combined._file_obj = _MultiFileCloser(file_objs)
combined.attrs = datasets[0].attrs
except ValueError as ve:
@shoyer (Member):

Let's only wrap the lines where this could fail -- so this should be moved up two lines, before combined._file_obj is assigned.

compat=compat, data_vars=data_vars)
combined._file_obj = _MultiFileCloser(file_objs)
combined.attrs = datasets[0].attrs
except ValueError as ve:
@shoyer (Member):

You can just use except ValueError: here and a plain raise below.
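(Putting both suggestions together, the block might read roughly as follows; this is a sketch, assuming auto_combine is the combine call used here and that the except block closes the opened datasets before re-raising:)

try:
    combined = auto_combine(datasets, concat_dim=concat_dim,
                            compat=compat, data_vars=data_vars)
except ValueError:
    for ds in datasets:
        ds.close()
    raise  # a plain raise preserves the original traceback
combined._file_obj = _MultiFileCloser(file_objs)
combined.attrs = datasets[0].attrs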

@@ -431,7 +431,7 @@ def close(self):

 def open_mfdataset(paths, chunks=None, concat_dim=_CONCAT_DIM_DEFAULT,
                    compat='no_conflicts', preprocess=None, engine=None,
-                   lock=None, **kwargs):
+                   lock=None, data_vars='all', **kwargs):
@shoyer (Member):

For completeness, would it also make sense to pass on the coords option at this time?
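(A sketch of what forwarding coords alongside data_vars could look like; the 'different' default shown here is an assumption that mirrors xarray.concat, and the body is elided:)

def open_mfdataset(paths, chunks=None, concat_dim=_CONCAT_DIM_DEFAULT,
                   compat='no_conflicts', preprocess=None, engine=None,
                   lock=None, data_vars='all', coords='different', **kwargs):
    ...
    combined = auto_combine(datasets, concat_dim=concat_dim, compat=compat,
                            data_vars=data_vars, coords=coords)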

@guziy (Contributor, Author):

Thanks, @shoyer:

I have added the coords keyword in a similar manner as data_vars.

I'll probably have to add a test for it as well.

Cheers

@shoyer (Member) left a comment

I have a couple more minor cleanup suggestions, but this is looking great, thank you!

# tests to be applied to respective pairs
if opt == 'all':
tests = [self.assertEqual,
self.assertNotEqual, self.assertNotEqual]
@shoyer (Member):

nit: it's best to avoid putting complex control flow in tests -- it makes them harder to debug. I would actually prefer if you wrote two different tests here with a bit more copied code.

@guziy (Contributor, Author):

I've split the test into 2, let me know if this is what you meant.

Cheers

ds1.to_netcdf(tmpfile1)
ds2.to_netcdf(tmpfile2)

files = [tmpfile1, tmpfile2]
@shoyer (Member):

Put this shared logic in a context manager?

e.g.,

@contextlib.contextmanager
def setup_files(self):
    with create_tmp_file() as tmpfile1:
        with create_tmp_file() as tmpfile2:
            ds1, ds2 = self.gen_datasets_with_common_coord_and_time()

            # save data to the temporary files
            ds1.to_netcdf(tmpfile1)
            ds2.to_netcdf(tmpfile2)

            yield [tmpfile1, tmpfile2]

def test_open_mfdataset_does_same_as_concat(self):
    with self.setup_files() as files:
        ...

setUp/tearDown methods would also work, with ExitStack.enter_context() and .close().
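(For reference, a minimal sketch of that setUp/tearDown variant, assuming the same create_tmp_file and dataset-generation helpers as above:)

import contextlib

def setUp(self):
    self._stack = contextlib.ExitStack()
    tmpfile1 = self._stack.enter_context(create_tmp_file())
    tmpfile2 = self._stack.enter_context(create_tmp_file())
    ds1, ds2 = self.gen_datasets_with_common_coord_and_time()
    ds1.to_netcdf(tmpfile1)
    ds2.to_netcdf(tmpfile2)
    self.files = [tmpfile1, tmpfile2]

def tearDown(self):
    # closing the stack exits every context entered in setUp,
    # which removes the temporary files
    self._stack.close()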

@guziy (Contributor, Author):

Thanks @shoyer:

I like the contextmanager trick a lot. I did feel like there should be a better way to set up tests. Actually, I have never used it before.

Cheers

@jhamman (Member) commented Sep 28, 2017

LGTM. @crusaderky - I think you should look at this in conjunction with #1551.

@jhamman (Member) commented Oct 10, 2017

I think this is ready to go, any final objections?

@jhamman merged commit 27132fb into pydata:master on Oct 10, 2017
@jhamman (Member) commented Oct 10, 2017

Thanks @guziy!

@shoyer (Member) left a comment

Sorry for the delayed review, but I'm still not quite happy with the tests here. @guziy would you mind fixing this up in a follow-up? Thanks!

self.assertNotEqual, self.assertNotEqual]

for a_test, a_shape_pair in zip(tests, shape_pairs):
a_test(*a_shape_pair)
@shoyer (Member):

This whole section should be more explicit, e.g.,

self.assertEqual(var_shape, coord_shape)
self.assertNotEqual(coord_shape1, coord_shape)
self.assertNotEqual(coord_shape2, coord_shape)

That way, we avoid the confusing loop.

@max-sixty (Collaborator) commented Oct 10, 2017

* but assert_equal(var_shape, coord_shape), in line with the updated test framework!

@guziy (Contributor, Author):

@MaximilianR:
I am comparing shapes there, not DataArrays.
Cheers

var_shape = ds[self.var_name].shape

# shape pairs to be compared
shape_pairs = [
@shoyer (Member):

Same thing here.

@guziy (Contributor, Author):

Thanks @shoyer:
It'll actually be less code this way.
Cheers

@guziy (Contributor, Author):

I have committed the changes; should I open another pull request?

Member:

Yes, please. Once a PR is merged you need to open another one.

Member:

@shoyer and @guziy - my apologies for the premature merge.

@guziy mentioned this pull request on Oct 11, 2017