AttributeError when saving a Pandas object with Pandas 0.24.0 or greater #36

Closed
slwatkins opened this issue Apr 19, 2019 · 4 comments · Fixed by #37

Comments

@slwatkins

slwatkins commented Apr 19, 2019

When saving a pd.Series, pd.DataFrame, or pd.Panel to HDF5 using deepdish, an AttributeError is raised, and I cannot save the file. I've tracked down the issue, and it's due to a change in Pandas version 0.24.0.

Here is how I've been able to reproduce the error, with Pandas 0.24.2, NumPy 1.15.4, deepdish 0.3.6, and PyTables 3.5.1 installed.

import pandas as pd
import numpy as np
import deepdish as dd

dd.io.save("test.h5", {"test": pd.Series(data=np.random.rand(1))})

The error returned is:

---------------------------------------------------------------------------
NoSuchNodeError                           Traceback (most recent call last)
/galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/pandas/io/pytables.py in get_node(self, key)
   1159                 key = '/' + key
-> 1160             return self._handle.get_node(self.root, key)
   1161         except _table_mod.exceptions.NoSuchNodeError:

/galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/tables/file.py in get_node(self, where, name, classname)
   1643             nodepath = join_path(basepath, name or '') or '/'
-> 1644             node = where._v_file._get_node(nodepath)
   1645         elif isinstance(where, (six.string_types, numpy.str_)):

/galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/tables/file.py in _get_node(self, nodepath)
   1598 
-> 1599         node = self._node_manager.get_node(nodepath)
   1600         assert node is not None, "unable to instantiate node ``%s``" % nodepath

/galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/tables/file.py in get_node(self, key)
    436         if self.node_factory:
--> 437             node = self.node_factory(key)
    438             self.cache_node(node, key)

/galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/tables/group.py in _g_load_child(self, childname)
   1180         # Is the node a group or a leaf?
-> 1181         node_type = self._g_check_has_child(childname)
   1182 

/galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/tables/group.py in _g_check_has_child(self, name)
    397                 "group ``%s`` does not have a child named ``%s``"
--> 398                 % (self._v_pathname, name))
    399         return node_type

NoSuchNodeError: group ``/`` does not have a child named ``//test``

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
<ipython-input-2-60c97adec230> in <module>
----> 1 dd.io.save("test4.h5", {"test" : pd.Series(data=np.random.rand(1))}, )

~/.local/lib/python3.7/site-packages/deepdish-0.3.4-py3.7.egg/deepdish/io/hdf5io.py in save(path, data, compression)
    587             for key, value in data.items():
    588                 _save_level(h5file, group, value, name=key,
--> 589                             filters=filters, idtable=idtable)
    590 
    591         elif (_sns and isinstance(data, SimpleNamespace) and

~/.local/lib/python3.7/site-packages/deepdish-0.3.4-py3.7.egg/deepdish/io/hdf5io.py in _save_level(handler, group, level, name, filters, idtable)
    256         store = _HDFStoreWithHandle(handler)
    257 #         print(store.get_node(group._v_pathname))
--> 258         store.append(group._v_pathname + '/' + name, level)
    259 
    260     elif isinstance(level, (sparse.dok_matrix,

/galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/pandas/io/pytables.py in append(self, key, value, format, append, columns, dropna, **kwargs)
    984         kwargs = self._validate_format(format, kwargs)
    985         self._write_to_group(key, value, append=append, dropna=dropna,
--> 986                              **kwargs)
    987 
    988     def append_to_multiple(self, d, value, selector, data_columns=None,

/galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/pandas/io/pytables.py in _write_to_group(self, key, value, format, index, append, complib, encoding, **kwargs)
   1365     def _write_to_group(self, key, value, format, index=True, append=False,
   1366                         complib=None, encoding=None, **kwargs):
-> 1367         group = self.get_node(key)
   1368 
   1369         # remove the node if we are not appending

/galbascratch/samwatkins/anaconda3/lib/python3.7/site-packages/pandas/io/pytables.py in get_node(self, key)
   1159                 key = '/' + key
   1160             return self._handle.get_node(self.root, key)
-> 1161         except _table_mod.exceptions.NoSuchNodeError:
   1162             return None
   1163 

AttributeError: 'NoneType' object has no attribute 'exceptions'

From the above, we see that the _table_mod variable is None, which is what raises the error. This became an error because of pandas-dev/pandas#22919, where the handler in HDFStore.get_node was changed from a bare except clause to one that catches a specific exception.

Before: https://github.com/pandas-dev/pandas/blob/2d0c96119391c85bd4f7ffbb847759ee3777162a/pandas/io/pytables.py#L1157-L1165

After: https://github.com/pandas-dev/pandas/blob/master/pandas/io/pytables.py#L1141-L1149

So the _table_mod variable is now needed to name the exception: get_node returns None only when a NoSuchNodeError is raised, rather than on any error. However, _table_mod is set only by running the function pandas.io.pytables._tables, which imports PyTables and stores it in the module-level variable _table_mod. If this function has not been run, _table_mod is left as None, and the above AttributeError occurs.

The problem is that deepdish uses pandas.io.pytables.HDFStore through a wrapper class called _HDFStoreWithHandle, and none of the methods it invokes call the _tables function, so _table_mod is left as None, which gives us the AttributeError.

My proposed solution is to add one line near the beginning of the hdf5io.py file in deepdish that calls pandas.io.pytables._tables().

Before:

from __future__ import division, print_function, absolute_import
import numpy as np
import tables
import warnings
from scipy import sparse
from deepdish import conf
try:
    import pandas as pd
    _pandas = True
except ImportError:
    _pandas = False

After:

from __future__ import division, print_function, absolute_import

import numpy as np
import tables
import warnings
from scipy import sparse
from deepdish import conf
try:
    import pandas as pd
    pd.io.pytables._tables()
    _pandas = True
except ImportError:
    _pandas = False

After making this change, I no longer get the AttributeError and the saving of Pandas data types works seamlessly.
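
Since _tables is a private pandas function, it might not exist under that name in other pandas versions. A slightly more defensive variant of the one-line fix could look like the sketch below. The helper name `ensure_pytables_loaded` is hypothetical (it is not part of deepdish or pandas), and the demo uses fake module objects so it runs without pandas installed.

```python
import types


def ensure_pytables_loaded(pytables_module):
    """Call the module's private `_tables` loader if it exists.

    Returns True if a loader was found and called, False otherwise.
    An ImportError raised by the loader is deliberately not caught here,
    so a surrounding `try: ... except ImportError:` still sees it.
    """
    loader = getattr(pytables_module, "_tables", None)
    if not callable(loader):
        return False
    loader()
    return True


# Demo with fake modules (stand-ins for pandas.io.pytables).
calls = []
fake_with_loader = types.SimpleNamespace(_tables=lambda: calls.append("loaded"))
fake_without_loader = types.SimpleNamespace()

found = ensure_pytables_loaded(fake_with_loader)
missing = ensure_pytables_loaded(fake_without_loader)
```

In hdf5io.py this would presumably be called as ensure_pytables_loaded(pd.io.pytables) inside the existing try block, in place of the direct pd.io.pytables._tables() call.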

@basnijholt

@slwatkins, unfortunately, I think deepdish is unmaintained.

@slwatkins
Author

Yeah, I kind of figured based on the commit history... Mostly wanted to put this here in case anyone else runs into the same issue, and so the time I spent on this bug didn't go completely to waste!

@vigji

vigji commented Oct 12, 2019

We forked the project and made a minimal version of the library for file loading/saving only, where we fixed this bug. You probably already have your own fork, but just in case: https://github.com/portugueslab/flammkuchen

@herrlich10

Adding the one line of code suggested by @slwatkins solves the same issue for me.
I want to say thank you for your effort and sharing.

It is sad to hear that deepdish is no longer maintained. This is one of my favorite packages, and it seems it would be best for it to be merged into pandas itself.

m4ce pushed a commit to m4ce/deepdish that referenced this issue Apr 18, 2020