data_vars option added to open_mfdataset #1580

Merged (29 commits) on Oct 10, 2017

Conversation

@guziy (Contributor) commented Sep 19, 2017

  • Closes xray.open_mfdataset concatenates also variables without time dimension #438

  • Added tests (they are passing currently)

  • Passes git diff upstream/master | flake8 --diff
    There were some long lines that, in my opinion, looked better unbroken, but I complied with flake8's request and shortened them.

  • Fully documented, including whats-new.rst for all changes and api.rst for new API (I do not think the change is big enough to be added to those files)

@spencerahill (Contributor) commented Sep 19, 2017

> have not run tests yet

These will definitely be needed. Existing tests for open_mfdataset are in xarray/tests/test_backends.py. @shoyer and/or @jhamman can hopefully help you out if you need more guidance.

> Not sure how to install flake8

pip install flake8 should work. Alternatively, if you use conda: conda install -c conda-forge flake8.

> Do not think the change is big enough to be added to the files

A brief note in What's New under the Enhancements section would actually be appropriate. Just follow the example of the others in that section.

Thanks for taking this on! We actually just bumped across this (here), so your fix will immediately benefit more than just you.

@guziy (Contributor, Author) commented Sep 19, 2017

Thanks @spencerahill:

I have run the flake8 test and modified the whats-new.rst.
I think I need more guidance with the tests.
I tried using nosetests and got the following exception (I am sure I am doing something wrong here):

$ nosetests
/Users/huziy/anaconda/envs/py3.6/lib/python3.6/site-packages/matplotlib/__init__.py:917: UserWarning: axes.hold is deprecated. Please remove it from your matplotlibrc and/or style files.
  warnings.warn(self.msg_depr_set % key)
/Users/huziy/anaconda/envs/py3.6/lib/python3.6/site-packages/matplotlib/rcsetup.py:152: UserWarning: axes.hold is deprecated, will be removed in 3.0
  warnings.warn("axes.hold is deprecated, will be removed in 3.0")
E
======================================================================
ERROR: Failure: AttributeError (module 'pytest' has no attribute 'config')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/huziy/anaconda/envs/py3.6/lib/python3.6/site-packages/nose/failure.py", line 39, in runTest
    raise self.exc_val.with_traceback(self.tb)
  File "/Users/huziy/anaconda/envs/py3.6/lib/python3.6/site-packages/nose/loader.py", line 417, in loadTestsFromName
    addr.filename, addr.module)
  File "/Users/huziy/anaconda/envs/py3.6/lib/python3.6/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/Users/huziy/anaconda/envs/py3.6/lib/python3.6/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/Users/huziy/anaconda/envs/py3.6/lib/python3.6/imp.py", line 244, in load_module
    return load_package(name, filename)
  File "/Users/huziy/anaconda/envs/py3.6/lib/python3.6/imp.py", line 216, in load_package
    return _load(spec)
  File "<frozen importlib._bootstrap>", line 675, in _load
  File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 205, in _call_with_frames_removed
  File "/Users/huziy/PythonProjects/xarray/xarray/tests/__init__.py", line 130, in <module>
    _SKIP_FLAKY = not pytest.config.getoption("--run-flaky")
AttributeError: module 'pytest' has no attribute 'config'

----------------------------------------------------------------------
Ran 1 test in 0.582s

FAILED (errors=1)

Cheers

@spencerahill (Contributor):

Try running them via pytest instead (pip install pytest or conda install pytest if you don't have it already).

Note that you'll need to add new test(s) that cover the modifications you have made -- not just run the existing tests.

@guziy (Contributor, Author) commented Sep 19, 2017

Thanks @spencerahill

py.test seems to be working

Cheers

@shoyer (Member) left a comment

Thanks for putting this together!

coord = ds[self.coord_name][:]
coord_expect = ds_expect[self.coord_name][:]

self.assertArrayEqual(data, data_expect)
@shoyer (Member):

Can you make use of self.assertDatasetIdentical() to shorten up these tests a bit?
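(For illustration, a minimal sketch of that consolidation; ds1, ds2, files, opt, and the 't' concat dimension are placeholders for the fixtures the test already defines, and xr/open_mfdataset are assumed to be imported in the test module:)

# build the expected result with concat, then compare whole datasets at once
ds_expect = xr.concat([ds1, ds2], data_vars=opt, dim='t')
with open_mfdataset(files, data_vars=opt, concat_dim='t') as ds_actual:
    self.assertDatasetIdentical(ds_expect, ds_actual)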

var_shape1[0] + var_shape2[0])

def test_invalid_data_vars_value_should_fail(self):
with self.assertRaises(ValueError):
@shoyer (Member):

move this to only go around the line where you expect the error
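(A sketch of the narrowed version within the existing test class; setup_files is a hypothetical fixture helper and 'minimum' is just an example of an invalid value:)

def test_invalid_data_vars_value_should_fail(self):
    with self.setup_files() as files:
        # only the call that is expected to raise goes inside assertRaises
        with self.assertRaises(ValueError):
            open_mfdataset(files, data_vars='minimum')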


self.assertEqual(var_shape, coord_shape)

def test_common_coord_dims_should_not_change_when_datavars_minimal(self):
@shoyer (Member):

This looks very similar to the last test -- can you maybe consolidate it?

Or you could even potentially drop some of these tests. We have unit tests for concat and open_mfdataset already, so the main thing we need to verify is that the keyword argument gets properly passed on. We don't need to check here that every possible way to use it is handled correctly.
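(For example, a pass-through check could be this small; a sketch only, with setup_files, 'common_var', and the 't' dimension as hypothetical stand-ins for the test fixtures:)

def test_data_vars_kwarg_is_passed_through(self):
    with self.setup_files() as files:
        # with data_vars='minimal', a variable lacking the concat dimension
        # is not concatenated and so does not gain a 't' dimension
        with open_mfdataset(files, data_vars='minimal', concat_dim='t') as ds:
            self.assertNotIn('t', ds['common_var'].dims)
        # with data_vars='all' (the default) it is concatenated along 't'
        with open_mfdataset(files, data_vars='all', concat_dim='t') as ds:
            self.assertIn('t', ds['common_var'].dims)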

compat=compat, data_vars=data_vars)
combined._file_obj = _MultiFileCloser(file_objs)
combined.attrs = datasets[0].attrs
except ValueError as ve:
@shoyer (Member):

Let's only wrap the lines where this could fail -- so this should be moved up two lines, before combined._file_obj is assigned.

compat=compat, data_vars=data_vars)
combined._file_obj = _MultiFileCloser(file_objs)
combined.attrs = datasets[0].attrs
except ValueError as ve:
@shoyer (Member):

You can just use except ValueError: here and a plain raise below.
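(Putting both suggestions together, the block might read roughly as follows; this is a sketch, assuming auto_combine is the combine call used here and that the except block closes the opened datasets before re-raising:)

try:
    combined = auto_combine(datasets, concat_dim=concat_dim,
                            compat=compat, data_vars=data_vars)
except ValueError:
    for ds in datasets:
        ds.close()
    raise  # a plain raise preserves the original traceback
combined._file_obj = _MultiFileCloser(file_objs)
combined.attrs = datasets[0].attrs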

@@ -431,7 +431,7 @@ def close(self):

 def open_mfdataset(paths, chunks=None, concat_dim=_CONCAT_DIM_DEFAULT,
                    compat='no_conflicts', preprocess=None, engine=None,
-                   lock=None, **kwargs):
+                   lock=None, data_vars='all', **kwargs):
@shoyer (Member):

For completeness, would it also make sense to pass on the coords option at this time?
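(A sketch of what forwarding coords alongside data_vars could look like; the 'different' default shown here is an assumption that mirrors xarray.concat, and the body is elided:)

def open_mfdataset(paths, chunks=None, concat_dim=_CONCAT_DIM_DEFAULT,
                   compat='no_conflicts', preprocess=None, engine=None,
                   lock=None, data_vars='all', coords='different', **kwargs):
    ...
    combined = auto_combine(datasets, concat_dim=concat_dim, compat=compat,
                            data_vars=data_vars, coords=coords)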

@guziy (Contributor, Author):

Thanks, @shoyer:

I have added the coords keyword in a similar manner as data_vars.

I'll probably have to add a test for it as well.

Cheers

@shoyer (Member) left a comment

I have a couple more minor cleanup suggestions, but this is looking great, thank you!

# tests to be applied to respective pairs
if opt == 'all':
tests = [self.assertEqual,
self.assertNotEqual, self.assertNotEqual]
@shoyer (Member):

nit: it's best to avoid putting complex control flow in tests -- it makes them harder to debug. I would actually prefer if you wrote two different tests here with a bit more copied code.

@guziy (Contributor, Author):

I've split the test into 2, let me know if this is what you meant.

Cheers

ds1.to_netcdf(tmpfile1)
ds2.to_netcdf(tmpfile2)

files = [tmpfile1, tmpfile2]
@shoyer (Member):

Put this shared logic in a context manager?

e.g.,

@contextlib.contextmanager
def setup_files(self):
    with create_tmp_file() as tmpfile1:
        with create_tmp_file() as tmpfile2:
            ds1, ds2 = self.gen_datasets_with_common_coord_and_time()

            # save data to the temporary files
            ds1.to_netcdf(tmpfile1)
            ds2.to_netcdf(tmpfile2)

            yield [tmpfile1, tmpfile2]

def test_open_mfdataset_does_same_as_concat(self):
    with self.setup_files() as files:
        ...

setUp/tearDown methods would also work, with ExitStack.enter_context() and .close().
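(For reference, a minimal sketch of that setUp/tearDown variant, assuming the same create_tmp_file and dataset-generation helpers as above:)

import contextlib

def setUp(self):
    self._stack = contextlib.ExitStack()
    tmpfile1 = self._stack.enter_context(create_tmp_file())
    tmpfile2 = self._stack.enter_context(create_tmp_file())
    ds1, ds2 = self.gen_datasets_with_common_coord_and_time()
    ds1.to_netcdf(tmpfile1)
    ds2.to_netcdf(tmpfile2)
    self.files = [tmpfile1, tmpfile2]

def tearDown(self):
    # closing the stack exits every context entered in setUp,
    # which removes the temporary files
    self._stack.close()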

@guziy (Contributor, Author):

Thanks @shoyer:

I like the contextmanager trick a lot. I did feel like there should be a better way to set up tests. Actually, I have never used it before.

Cheers

@jhamman (Member) commented Sep 28, 2017

LGTM. @crusaderky - I think you should look at this in conjunction with #1551.

@jhamman (Member) commented Oct 10, 2017

I think this is ready to go, any final objections?

@jhamman merged commit 27132fb into pydata:master on Oct 10, 2017
@jhamman (Member) commented Oct 10, 2017

Thanks @guziy!

@shoyer (Member) left a comment

Sorry for the delayed review, but I'm still not quite happy with the tests here. @guziy would you mind fixing this up in a follow-up? Thanks!

self.assertNotEqual, self.assertNotEqual]

for a_test, a_shape_pair in zip(tests, shape_pairs):
a_test(*a_shape_pair)
@shoyer (Member):

This whole section should be more explicit, e.g.,

self.assertEqual(var_shape, coord_shape)
self.assertNotEqual(coord_shape1, coord_shape)
self.assertNotEqual(coord_shape2, coord_shape)

That way, we avoid the confusing loop.

@max-sixty (Collaborator) commented Oct 10, 2017

* but assert_equal(var_shape, coord_shape), in line with the updated test framework!

@guziy (Contributor, Author):

@MaximilianR:
I am comparing shapes there, not DataArrays.
Cheers

var_shape = ds[self.var_name].shape

# shape pairs to be compared
shape_pairs = [
@shoyer (Member):

Same thing here.

@guziy (Contributor, Author):

Thanks @shoyer:
It'll actually be less code this way.
Cheers

@guziy (Contributor, Author):

I have committed the changes; should I open another pull request?

Member:

Yes, please. Once a PR is merged you need to open another one.

Member:

@shoyer and @guziy - my apologies for the premature merge.

@guziy mentioned this pull request on Oct 11, 2017