HDFStore silently coerces number to string in 'where' clause but not in put #512

Closed
bshanks opened this issue Dec 20, 2011 · 1 comment
bshanks commented Dec 20, 2011

Thanks for pandas! Once written, tables with numerical values in the index cannot be selected. Perhaps the values should either be coerced to strings in store.put, or not be coerced in store.select:

from pandas import DataFrame, HDFStore

store = HDFStore('test.h5')
store.put('test', DataFrame([0, 1, 2], [10, 11, 12], ['col1']), table=True)
store.select('test', where=[{'field': 'index', 'op': '>=', 'value': 11}])

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)

/home/bshanks/prog/atr/<ipython console> in <module>()

/usr/local/lib/python2.7/dist-packages/pandas/io/pytables.pyc in select(self, key, where)
    235             raise Exception('can only select on objects written as tables')
    236         if group is not None:
--> 237             return self._read_group(group, where)
    238 
    239     def put(self, key, value, table=False, append=False,

/usr/local/lib/python2.7/dist-packages/pandas/io/pytables.pyc in _read_group(self, group, where)
    619         kind = _LEGACY_MAP.get(kind, kind)
    620         handler = self._get_handler(op='read', kind=kind)
--> 621         return handler(group, where)
    622 
    623     def _read_series(self, group, where=None):

/usr/local/lib/python2.7/dist-packages/pandas/io/pytables.pyc in _read_frame_table(self, group, where)
    646 
    647     def _read_frame_table(self, group, where=None):
--> 648         return self._read_panel_table(group, where)['value']
    649 
    650     def _read_panel_table(self, group, where=None):

/usr/local/lib/python2.7/dist-packages/pandas/io/pytables.pyc in _read_panel_table(self, group, where)
    655         # create the selection

    656         sel = Selection(table, where)
--> 657         sel.select()
    658         fields = table._v_attrs.fields
    659 

/usr/local/lib/python2.7/dist-packages/pandas/io/pytables.pyc in select(self)
    861         """
    862         if self.the_condition:
--> 863             self.values = self.table.readWhere(self.the_condition)
    864 
    865         else:

/usr/lib/python2.7/dist-packages/tables/table.pyc in readWhere(self, condition, condvars, field, start, stop, step)
   1271 
   1272         coords = [ p.nrow for p in
-> 1273                    self._where(condition, condvars, start, stop, step) ]
   1274         self._whereCondition = None  # reset the conditions
   1275         return self.readCoordinates(coords, field)

/usr/lib/python2.7/dist-packages/tables/table.pyc in _where(self, condition, condvars, start, stop, step)
   1225         # Compile the condition and extract usable index conditions.

   1226         condvars = self._requiredExprVars(condition, condvars, depth=3)
-> 1227         compiled = self._compileCondition(condition, condvars)
   1228 
   1229         # Can we use indexes?


/usr/lib/python2.7/dist-packages/tables/table.pyc in _compileCondition(self, condition, condvars)
   1101         indexedcols = frozenset(indexedcols)
   1102         # Now let ``compile_condition()`` do the Numexpr-related job.

-> 1103         compiled = compile_condition(condition, typemap, indexedcols, copycols)
   1104 
   1105         # Check that there actually are columns in the condition.


/usr/lib/python2.7/dist-packages/tables/conditions.pyc in compile_condition(condition, typemap, indexedcols, copycols)
    154     except NotImplementedError, nie:
    155         # Try to make this Numexpr error less cryptic.

--> 156         raise _unsupported_operation_error(nie)
    157     params = varnames
    158 

NotImplementedError: unsupported operand types for *ge*: long, str
jreback commented Jan 27, 2012

This is actually unsupported in the current implementation of table read/write. The table definition requires the index to be a Time64Col, so you currently must pass a datetime object (which is converted via mktime(value.timetuple()) to the value required for comparison). I believe your example fails because the passed value is not a datetime, so it generates a string comparison on the 'index' column, and that raises the error.
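
To illustrate the conversion mentioned above, here is a minimal sketch using only the standard library. The helper name `to_index_value` is hypothetical and not part of pandas; it only shows the mktime(value.timetuple()) step and why a non-datetime value such as 11 has no numeric equivalent under this scheme.

```python
import time
from datetime import datetime


def to_index_value(value):
    """Hypothetical helper: turn a 'where' value into the float timestamp
    that a time-typed index column would be compared against."""
    if isinstance(value, datetime):
        # Same conversion the comment describes: local time -> seconds since epoch.
        return time.mktime(value.timetuple())
    # Anything else (e.g. the integer 11 from the report) cannot be converted.
    raise TypeError("expected a datetime for the 'index' field, got %s"
                    % type(value).__name__)


stamp = to_index_value(datetime(2011, 12, 20))
print(stamp)  # value depends on the local timezone
```

Rejecting non-datetime values up front, rather than letting them fall through to a string comparison, is one way the error could surface earlier with a clearer message.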

It would be possible to add another table format with, say, a float index. Figuring out what was written is easy via the pandas attribute recorded in the table, but specifying which table format you want would require an option passed via append.
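
A rough sketch of how a recorded index kind could drive the value conversion when building a condition. Everything here is an assumption for illustration: `INDEX_CONVERTERS`, `build_condition`, and the `index_kind` labels are hypothetical names, not existing pandas or PyTables API.

```python
import time
from datetime import datetime

# Hypothetical registry: index kind -> function that turns a 'where' value
# into the number stored in the index column.
INDEX_CONVERTERS = {
    "datetime": lambda v: time.mktime(v.timetuple()),
    "float": float,
}


def build_condition(field, op, value, index_kind):
    """Build a condition string such as 'index >= 11.0' for the given kind."""
    try:
        convert = INDEX_CONVERTERS[index_kind]
    except KeyError:
        raise ValueError("unknown index kind: %r" % (index_kind,))
    # The converted value is numeric, so it is written unquoted.
    return "%s %s %r" % (field, op, convert(value))


condition = build_condition("index", ">=", 11, index_kind="float")
print(condition)  # index >= 11.0
```

With a float kind, the value from the original report (11) would be compared numerically instead of as a string.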

I tend to use columns when I have a panel of the form items x time x tickers, which lends itself to the current format.
