BUG: Categorical data fails to load from hdf when all columns are NaN #18652
* Handle all-NaN columns differently when building metadata for categorical axes on saving hdf5 file
* Categorical axes fail test case comparison due to type difference (even though there isn't a visible type difference)

* Change empty category to `Index([], dtype=np.float64)` instead of `[]`.
* Remove printouts in test case.
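For context on why an all-NaN column is a special case: casting it to `category` produces a categorical with no categories at all, and every value is encoded with the missing-value code -1. The snippet below is only an illustration of that in-memory state using public pandas attributes; it does not reproduce the HDF5 metadata code touched by these commits, and the variable names are made up for the example.

```python
import numpy as np
import pandas as pd

# One column with real values plus a NaN, and one column that is entirely NaN.
mixed = pd.Series(['a', 'b', 'c', np.nan]).astype('category')
all_nan = pd.Series([np.nan] * 4).astype('category')

# NaN is never a category; it is represented by the code -1.
print(mixed.cat.categories.tolist())  # ['a', 'b', 'c']
print(mixed.cat.codes.tolist())       # [0, 1, 2, -1]

# The all-NaN column therefore has an empty set of categories.
print(len(all_nan.cat.categories))    # 0
print(all_nan.cat.codes.tolist())     # [-1, -1, -1, -1]
```

An empty category index is the value that has to be written to, and read back from, the store for such a column.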
Hello @ssche! Thanks for updating the PR. Cheers! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on December 10, 2017 at 15:54 Hours UTC
Thanks. The whatsnew entry should be added to
lgtm. you could add this to 0.21.1 bug fixes in IO section.
pandas/tests/io/test_pytables.py
})
df['a'] = df.a.astype('category')
df['b'] = df.b.astype('category')
expected = df.copy()
you don't need to copy here
true
v0.22.0.txt or 0.21.1? - I added it to 0.22.0 as per @jschendel's request.
Go with what @jreback said, he'd know better than me. I was just basing 0.22.0 off the issue originally having the "Next Major Release" milestone, but that can change.
done
I’m keen to get this accepted. What’s the next step?
doc/source/whatsnew/v0.21.1.txt
@@ -91,6 +91,7 @@ I/O
- Bug in :meth:`DataFrame.to_msgpack` when serializing data of the numpy.bool_ datatype (:issue:`18390`)
- Bug in :func:`read_json` not decoding when reading line deliminted JSON from S3 (:issue:`17200`)
- Bug in :func:`pandas.io.json.json_normalize` to avoid modification of ``meta`` (:issue:`18610`)
- Bug when storing NaN-only categorical columns in hdf5 store (:issue:`18413`)
Bug when reading ......in a :class:`HDFStore`
ok, done.
# be read back.
df = pd.DataFrame({
'a': ['a', 'b', 'c', np.nan],
'b': [np.nan, np.nan, np.nan, np.nan],
can you add another column with an array like `pd.Series([None] * 3, dtype=object)`. this might fail your test because the original array was an all-null object type (and not float). but let's see.
done, seems to work fine.
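The round trip this test exercises can be sketched roughly as follows. This is a minimal sketch, not the test from the PR: the helper names `make_frame` and `roundtrip_hdf` are hypothetical, it assumes the optional PyTables dependency is installed, and it uses `format='table'` because categorical columns are only supported in that format. It covers the string-plus-NaN and all-NaN columns from the snippet above and leaves out the all-None object column discussed in this thread.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def make_frame():
    """Build a frame with one ordinary and one all-NaN categorical column."""
    df = pd.DataFrame({
        'a': ['a', 'b', 'c', np.nan],
        'b': [np.nan, np.nan, np.nan, np.nan],
    })
    df['a'] = df['a'].astype('category')
    df['b'] = df['b'].astype('category')
    return df


def roundtrip_hdf(df, key='df'):
    """Write df to a temporary HDF5 file in table format and read it back.

    Hypothetical helper; requires PyTables (the `tables` package).
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'roundtrip.h5')
        df.to_hdf(path, key=key, format='table')
        return pd.read_hdf(path, key=key)
```

With PyTables available, `pd.testing.assert_frame_equal(roundtrip_hdf(make_frame()), make_frame())` is the kind of comparison the test in this PR relies on.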
* Added additional all-None Series
* Provided more detail in whatsnew description
Codecov Report
@@ Coverage Diff @@
## master #18652 +/- ##
==========================================
- Coverage 91.59% 91.57% -0.02%
==========================================
Files 153 153
Lines 51221 51223 +2
==========================================
- Hits 46917 46910 -7
- Misses 4304 4313 +9
Continue to review full report at Codecov.
Codecov Report
@@ Coverage Diff @@
## master #18652 +/- ##
==========================================
- Coverage 91.61% 91.57% -0.05%
==========================================
Files 153 153
Lines 51305 51307 +2
==========================================
- Hits 47001 46982 -19
- Misses 4304 4325 +21
Continue to review full report at Codecov.
@ssche had a linting error. this should be ok. ping on green.
thanks @ssche
…pandas-dev#18652) (cherry picked from commit 2db1cc0)
Version 0.22.0

* tag 'v0.22.0': (777 commits)
  RLS: v0.22.0
  DOC: Fix min_count docstring (pandas-dev#19005)
  DOC: More 0.22.0 updates (pandas-dev#19002)
  TST: Remove pow test in expressions
  COMPAT: Avoid td.skip decorator
  DOC: 0.22.0 release docs (pandas-dev#18983)
  DOC: Include 0.22.0 whatsnew
  Breaking changes for sum / prod of empty / all-NA (pandas-dev#18921)
  ENH: Added a min_count keyword to stat funcs (pandas-dev#18876)
  RLS: v0.21.1
  DOC: Add date to whatsnew (pandas-dev#18740)
  DOC: Include 0.21.1 whatsnew
  DOC: Update relase notes (pandas-dev#18739)
  CFG: Ignore W503
  DOC: fix options table (pandas-dev#18730)
  ENH: support non default indexes in writing to Parquet (pandas-dev#18629)
  BUG: Fix to_latex with longtable (pandas-dev#17959) (pandas-dev#17960)
  Parquet: Add error message for no engine found (pandas-dev#18717)
  BUG: Categorical data fails to load from hdf when all columns are NaN (pandas-dev#18652)
  DOC: clean-up whatsnew file for 0.21.1 (pandas-dev#18690)
  ...
* releases: (777 commits)
  RLS: v0.22.0
  DOC: Fix min_count docstring (pandas-dev#19005)
  DOC: More 0.22.0 updates (pandas-dev#19002)
  TST: Remove pow test in expressions
  COMPAT: Avoid td.skip decorator
  DOC: 0.22.0 release docs (pandas-dev#18983)
  DOC: Include 0.22.0 whatsnew
  Breaking changes for sum / prod of empty / all-NA (pandas-dev#18921)
  ENH: Added a min_count keyword to stat funcs (pandas-dev#18876)
  RLS: v0.21.1
  DOC: Add date to whatsnew (pandas-dev#18740)
  DOC: Include 0.21.1 whatsnew
  DOC: Update relase notes (pandas-dev#18739)
  CFG: Ignore W503
  DOC: fix options table (pandas-dev#18730)
  ENH: support non default indexes in writing to Parquet (pandas-dev#18629)
  BUG: Fix to_latex with longtable (pandas-dev#17959) (pandas-dev#17960)
  Parquet: Add error message for no engine found (pandas-dev#18717)
  BUG: Categorical data fails to load from hdf when all columns are NaN (pandas-dev#18652)
  DOC: clean-up whatsnew file for 0.21.1 (pandas-dev#18690)
  ...
git diff upstream/master -u -- "*.py" | flake8 --diff