REGR: pd.to_hdf(..., dropna=True) not dropping missing rows #37564

Merged: 9 commits, merged on Nov 4, 2020
doc/source/whatsnew/v1.2.0.rst: 1 addition, 0 deletions

```diff
@@ -492,6 +492,7 @@ I/O
 - Bug in output rendering of complex numbers showing too many trailing zeros (:issue:`36799`)
 - Bug in :class:`HDFStore` threw a ``TypeError`` when exporting an empty :class:`DataFrame` with ``datetime64[ns, tz]`` dtypes with a fixed HDF5 store (:issue:`20594`)
 - Bug in :class:`HDFStore` was dropping timezone information when exporting :class:`Series` with ``datetime64[ns, tz]`` dtypes with a fixed HDF5 store (:issue:`20594`)
+- Bug in :meth:`DataFrame.to_hdf` was not dropping missing rows with ``dropna=True`` (:issue:`35719`)
 
 Plotting
 ^^^^^^^^
```
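For context on what "dropping missing rows" means here: the test updated in this PR expects `dropna=True` with `format="table"` to skip only rows in which every value is NaN. Row `"b"` (both columns NaN) disappears, while row `"c"` (only `col2` is NaN) is kept, so the expected frame equals `df_with_missing.dropna(how="all")`. The following is a minimal stdlib sketch of that row filter, assuming rows are plain lists of floats; `drop_all_nan_rows` is a hypothetical helper, not a pandas function.

```python
import math
from typing import Dict, List, Sequence


def drop_all_nan_rows(
    index: Sequence[str], rows: Sequence[Sequence[float]]
) -> Dict[str, List[float]]:
    """Hypothetical model of the dropna=True filter: drop a row only if ALL values are NaN."""
    kept: Dict[str, List[float]] = {}
    for label, values in zip(index, rows):
        if all(math.isnan(v) for v in values):
            continue  # every column is missing, so the row is skipped
        kept[label] = list(values)  # rows with at least one real value survive
    return kept


nan = float("nan")
# Same data as df_with_missing in the test: (col1, col2) for each row label.
index = ["a", "b", "c"]
rows = [[0.0, 1.0], [nan, nan], [2.0, nan]]

result = drop_all_nan_rows(index, rows)
print(list(result))  # ['a', 'c']
```

Row `"c"` survives because the rule is "all missing", not "any missing", which is exactly the difference between `how="all"` and `how="any"` in `DataFrame.dropna`.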
pandas/io/pytables.py: 3 additions, 0 deletions

```diff
@@ -268,6 +268,7 @@ def to_hdf(
             data_columns=data_columns,
             errors=errors,
             encoding=encoding,
+            dropna=dropna,
         )
 
     path_or_buf = stringify_path(path_or_buf)
@@ -1051,6 +1052,7 @@ def put(
         encoding=None,
         errors: str = "strict",
         track_times: bool = True,
+        dropna: bool = False,
     ):
         """
         Store object in HDFStore.
@@ -1100,6 +1102,7 @@ def put(
             encoding=encoding,
             errors=errors,
             track_times=track_times,
+            dropna=dropna,
         )
 
     def remove(self, key: str, where=None, start=None, stop=None):
```
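The fix itself is a keyword-forwarding change: the `to_hdf` hunk adds `dropna=dropna` to the arguments it passes on, and `put` gains a `dropna` parameter that it forwards in turn. Before that, on the path that goes through `put`, `dropna` was accepted by `to_hdf` but never reached the writer, so it was silently ignored. The sketch below reproduces that failure mode in plain Python; `MiniStore`, `write_buggy`, and `write_fixed` are hypothetical names, and the all-NaN filter inside `put` is a simplified model, not the pandas implementation.

```python
import math
from typing import Dict, List

Rows = Dict[str, List[float]]


class MiniStore:
    """Hypothetical in-memory store with a put() method (not pandas.HDFStore)."""

    def __init__(self) -> None:
        self.data: Dict[str, Rows] = {}

    def put(self, key: str, rows: Rows, dropna: bool = False) -> None:
        if dropna:
            # Simplified model: drop a row only if all of its values are NaN.
            rows = {
                label: values
                for label, values in rows.items()
                if not all(math.isnan(v) for v in values)
            }
        self.data[key] = dict(rows)


def write_buggy(store: MiniStore, key: str, rows: Rows, dropna: bool = False) -> None:
    # Bug pattern: dropna is accepted here but never forwarded,
    # so put() always falls back to its default (dropna=False).
    def write(s: MiniStore) -> None:
        s.put(key, rows)

    write(store)


def write_fixed(store: MiniStore, key: str, rows: Rows, dropna: bool = False) -> None:
    # Fix pattern: thread the keyword all the way through to put().
    def write(s: MiniStore) -> None:
        s.put(key, rows, dropna=dropna)

    write(store)


nan = float("nan")
rows = {"a": [0.0, 1.0], "b": [nan, nan], "c": [2.0, nan]}

buggy, fixed = MiniStore(), MiniStore()
write_buggy(buggy, "df", rows, dropna=True)
write_fixed(fixed, "df", rows, dropna=True)
print(sorted(buggy.data["df"]))  # ['a', 'b', 'c']  (dropna was ignored)
print(sorted(fixed.data["df"]))  # ['a', 'c']
```

The only difference between the two wrappers is whether `dropna` is passed explicitly. Because a dropped keyword produces no error, only a behavioral check catches it, which is why the updated test exercises the default, `dropna=False`, and `dropna=True` cases separately.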
pandas/tests/io/pytables/test_store.py: 20 additions, 5 deletions

```diff
@@ -1253,17 +1253,32 @@ def test_append_all_nans(self, setup_path):
             store.append("df2", df[10:], dropna=False)
             tm.assert_frame_equal(store["df2"], df)
 
-    # Test to make sure defaults are to not drop.
-    # Corresponding to Issue 9382
     def test_store_dropna(self, setup_path):
         df_with_missing = DataFrame(
-            {"col1": [0, np.nan, 2], "col2": [1, np.nan, np.nan]}
+            {"col1": [0.0, np.nan, 2.0], "col2": [1.0, np.nan, np.nan]},
+            index=list("abc"),
         )
+        df_without_missing = DataFrame(
+            {"col1": [0.0, 2.0], "col2": [1.0, np.nan]}, index=list("ac")
+        )
+
+        # # Test to make sure defaults are to not drop.
+        # # Corresponding to Issue 9382
+        with ensure_clean_path(setup_path) as path:
+            df_with_missing.to_hdf(path, "df", format="table")
+            reloaded = read_hdf(path, "df")
+            tm.assert_frame_equal(df_with_missing, reloaded)
+
         with ensure_clean_path(setup_path) as path:
-            df_with_missing.to_hdf(path, "df_with_missing", format="table")
-            reloaded = read_hdf(path, "df_with_missing")
+            df_with_missing.to_hdf(path, "df", format="table", dropna=False)
+            reloaded = read_hdf(path, "df")
             tm.assert_frame_equal(df_with_missing, reloaded)
+
+        with ensure_clean_path(setup_path) as path:
+            df_with_missing.to_hdf(path, "df", format="table", dropna=True)
+            reloaded = read_hdf(path, "df")
```
Review comment (Contributor):

hmm, do we recreate this exactly? I think so, but I don't really remember (is `expected` the same?)

Reply (Member Author):

I cherry-picked the example so it roundtrips.

There are some dtype issues; for example, a RangeIndex comes back as an Int64Index. I'll look in the issue tracker and open a ticket if there isn't one already.

```diff
+            tm.assert_frame_equal(df_without_missing, reloaded)
 
     def test_read_missing_key_close_store(self, setup_path):
         # GH 25766
         with ensure_clean_path(setup_path) as path:
```