Fix bug in converting a DH table with all columns of the same type #4566

alexpeters1208 · 2023-09-28T16:55:26Z

There is a bug in the table.to_pandas() function that causes tables that have columns of all the same data type to not be properly converted. Here is an example of the bug:

from deephaven import empty_table
from deephaven import pandas as dhpd

test_table1 = empty_table(10).update(["X = i", "Y = Math.sin(X)"])
test_df1 = dhpd.to_pandas(test_table1.view(["X", "Y"]))

test_table2 = empty_table(10).update(["X = (float)i", "Y = (float)Math.sin(X)"])
test_df2 = dhpd.to_pandas(test_table2.view(["X", "Y"]))

Here are the resulting tables and metadata:

This is due to a small block of code that creates a set containing all of the column datatypes, checks to see if this set is singular, and makes a call to np.stack() to collect the data into a single nd-array before calling pd.DataFrame. The problem is that np.stack() is not made aware of the desired datatypes of its elements, and they are not type-aware automatically, so the resulting nd-array has type object. This entire process can be bypassed, as pandas can look at the data and figure out the types correctly, regardless of whether or not all columns are the same.

I suspect stacking things into a single nd-array if appropriate before calling to_pandas() was done for efficiency. I am curious what the set creation, checking length of the set, creating the list to call np.stack() on, and calling np.stack() costs in terms of efficiency. This whole process adds at least two for-loops through the columns. It is not clear to me that this beats just calling pd.DataFrame() on the dict of data without worrying about what the types are. Someone can probably tell me more about this.

Here are the results of the above code with my fix:

jmao-denver

I think the changes make sense:

np.stack actually make copies of the input arrays
pd.DataFrame will not make a copy of the data if the values of the input dict are all pd.Series and the copy=Flase.

Fix bug in converting a DH table with all columns of the same type

b13411b

alexpeters1208 added bug Something isn't working triage NoDocumentationNeeded python-server-side NoReleaseNotesNeeded No release notes are needed. labels Sep 28, 2023

alexpeters1208 requested a review from chipkent as a code owner September 28, 2023 16:55

alexpeters1208 self-assigned this Sep 28, 2023

alexpeters1208 requested review from jmao-denver and rcaudy as code owners September 28, 2023 16:55

jmao-denver approved these changes Sep 28, 2023

View reviewed changes

alexpeters1208 merged commit cca5058 into main Sep 28, 2023
20 checks passed

alexpeters1208 deleted the fix-to-pandas-bug branch September 28, 2023 18:26

github-actions bot locked and limited conversation to collaborators Sep 28, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix bug in converting a DH table with all columns of the same type #4566

Fix bug in converting a DH table with all columns of the same type #4566

alexpeters1208 commented Sep 28, 2023

jmao-denver left a comment

Fix bug in converting a DH table with all columns of the same type #4566

Fix bug in converting a DH table with all columns of the same type #4566

Conversation

alexpeters1208 commented Sep 28, 2023

jmao-denver left a comment

Choose a reason for hiding this comment