BUG: Categorical data fails to load from hdf when all columns are NaN #18652

ssche · 2017-12-06T00:46:31Z

closes BUG: Categorical data fails to load from hdf when all columns are NaN #18413
tests added / passed
passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
whatsnew entry: Allow storing NaN-only categorical columns in hdf5 store
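
For readers who want to try the scenario described above, here is a minimal sketch of the round trip. It is not the test added in this PR: the helper name `roundtrip_nan_only_categorical`, the temporary file name, and the choice of `format="table"` with `data_columns=True` are assumptions made for illustration. PyTables is an optional dependency, so the sketch returns `None` when it is not installed.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def roundtrip_nan_only_categorical():
    """Write a frame with an all-NaN categorical column to HDF5 and read it back.

    Hypothetical helper for illustration only. Returns the frame that was
    read back, or None when PyTables is not available.
    """
    df = pd.DataFrame({
        "a": ["a", "b", "c", np.nan],
        "b": [np.nan, np.nan, np.nan, np.nan],  # the NaN-only column
    })
    df["a"] = df["a"].astype("category")
    df["b"] = df["b"].astype("category")

    try:
        import tables  # noqa: F401  (optional dependency used by HDFStore)
    except ImportError:
        return None

    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "store.h5")  # assumed file name
        df.to_hdf(path, key="df", format="table", data_columns=True)
        result = pd.read_hdf(path, key="df")
    return result


result = roundtrip_nan_only_categorical()
if result is None:
    print("PyTables not installed; round trip skipped")
else:
    print(result.dtypes)
```

On a pandas version that includes this fix, the read is expected to succeed and column `b` should come back as an all-NaN categorical. Comparing the result against the original frame with `pd.testing.assert_frame_equal` is one way to check that.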

pep8speaks · 2017-12-06T00:46:40Z

Hello @ssche! Thanks for updating the PR.

Cheers! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on December 10, 2017 at 15:54 Hours UTC

jschendel · 2017-12-06T01:07:25Z

Thanks. The whatsnew entry should be added to pandas/doc/source/whatsnew/v0.22.0.txt under the I/O subsection within the Bug Fixes section.

jreback

lgtm. you could add this to 0.21.1 bug fixes in IO section.

jreback · 2017-12-06T01:34:29Z

pandas/tests/io/test_pytables.py

+        })
+        df['a'] = df.a.astype('category')
+        df['b'] = df.b.astype('category')
+        expected = df.copy()


you don't need to copy here

v0.22.0.txt or 0.21.1? - I added it to 0.22.0 as per @jschendel's request.

Go with what @jreback said, he'd know better than me. I was just basing 0.22.0 off the issue originally having the "Next Major Release" milestone, but that can change.

ssche · 2017-12-06T12:09:43Z

I’m keen to get this accepted. What’s the next step?

jreback · 2017-12-07T11:30:31Z

doc/source/whatsnew/v0.21.1.txt

@@ -91,6 +91,7 @@ I/O
 - Bug in :meth:`DataFrame.to_msgpack` when serializing data of the numpy.bool_ datatype (:issue:`18390`)
 - Bug in :func:`read_json` not decoding when reading line deliminted JSON from S3 (:issue:`17200`)
 - Bug in :func:`pandas.io.json.json_normalize` to avoid modification of ``meta`` (:issue:`18610`)
+- Bug when storing NaN-only categorical columns in hdf5 store (:issue:`18413`)


Bug when reading ......in a :class:`HDFStore`

jreback · 2017-12-07T11:32:47Z

pandas/tests/io/test_pytables.py

+        # be read back.
+        df = pd.DataFrame({
+            'a': ['a', 'b', 'c', np.nan],
+            'b': [np.nan, np.nan, np.nan, np.nan],


can you add another column with an array like `pd.Series([None] * 3, dtype=object)`? this might fail your test because the original array would be an all-null object type (and not float). but let's see.

done, seems to work fine.
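
The distinction raised in this thread can be sketched without touching HDF5 at all. The snippet below is an illustration rather than code from the PR; the variable names and the four-row length are assumptions. It builds one all-null column from floats and one from an object-dtype Series, converts both to categorical, and prints what the resulting categories look like.

```python
import numpy as np
import pandas as pd

n_rows = 4  # assumption: same length as the other columns in the test frame

# All-null column that starts out as float64 (NaN values).
float_nulls = pd.Series([np.nan] * n_rows).astype("category")

# All-null column that starts out as object dtype (None values).
object_nulls = pd.Series([None] * n_rows, dtype=object).astype("category")

for name, col in [("float", float_nulls), ("object", object_nulls)]:
    cats = col.cat.categories
    # Both columns have zero categories; the dtype of the empty categories
    # index is the part that may differ between the two.
    print(name, len(cats), cats.dtype, col.cat.codes.tolist())
```

Both columns end up with no categories and every value encoded as `-1`, so the only thing that can differ is the dtype carried by the empty categories index, which is what the reviewer's extra column is meant to exercise.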

codecov · 2017-12-08T10:11:39Z

Codecov Report

Merging #18652 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #18652      +/-   ##
==========================================
- Coverage   91.59%   91.57%   -0.02%     
==========================================
  Files         153      153              
  Lines       51221    51223       +2     
==========================================
- Hits        46917    46910       -7     
- Misses       4304     4313       +9

Flag	Coverage Δ
#multiple	`89.43% <0%> (-0.01%)`	⬇️
#single	`40.68% <100%> (-0.1%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/pytables.py	`92.84% <100%> (ø)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.81% <0%> (-0.1%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 13f6267...e6aad40. Read the comment docs.

codecov · 2017-12-08T10:11:52Z

Codecov Report

Merging #18652 into master will decrease coverage by 0.04%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #18652      +/-   ##
==========================================
- Coverage   91.61%   91.57%   -0.05%     
==========================================
  Files         153      153              
  Lines       51305    51307       +2     
==========================================
- Hits        47001    46982      -19     
- Misses       4304     4325      +21

Flag	Coverage Δ
#multiple	`89.43% <0%> (-0.03%)`	⬇️
#single	`40.71% <100%> (-0.1%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/pytables.py	`92.84% <100%> (ø)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/plotting/_converter.py	`64.78% <0%> (-1.74%)`	⬇️
pandas/core/frame.py	`97.81% <0%> (-0.1%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1355df6...b2ac7c4. Read the comment docs.

jreback · 2017-12-10T15:55:09Z

@ssche had a linting error. this should be ok. ping on green.

jreback · 2017-12-10T18:28:12Z

thanks @ssche

Version 0.22.0

* tag 'v0.22.0': (777 commits)
  RLS: v0.22.0
  DOC: Fix min_count docstring (pandas-dev#19005)
  DOC: More 0.22.0 updates (pandas-dev#19002)
  TST: Remove pow test in expressions
  COMPAT: Avoid td.skip decorator
  DOC: 0.22.0 release docs (pandas-dev#18983)
  DOC: Include 0.22.0 whatsnew
  Breaking changes for sum / prod of empty / all-NA (pandas-dev#18921)
  ENH: Added a min_count keyword to stat funcs (pandas-dev#18876)
  RLS: v0.21.1
  DOC: Add date to whatsnew (pandas-dev#18740)
  DOC: Include 0.21.1 whatsnew
  DOC: Update relase notes (pandas-dev#18739)
  CFG: Ignore W503
  DOC: fix options table (pandas-dev#18730)
  ENH: support non default indexes in writing to Parquet (pandas-dev#18629)
  BUG: Fix to_latex with longtable (pandas-dev#17959) (pandas-dev#17960)
  Parquet: Add error message for no engine found (pandas-dev#18717)
  BUG: Categorical data fails to load from hdf when all columns are NaN (pandas-dev#18652)
  DOC: clean-up whatsnew file for 0.21.1 (pandas-dev#18690)
  ...

ssche and others added 2 commits December 5, 2017 13:47

Fixed #18413, but test case not passing

3f07908

* Handle all-NaN columns differently when building metadata for categorical axes on saving hdf5 file
* Categorical axes fail test case comparison due to type difference (even though there isn't a visible type difference)

Fixed test case (#18413)

7a4f93a

* Change empty category to `Index([], dtype=np.float64)` instead of `[]`.
* Remove printouts in test case.
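
To illustrate why the dtype of the empty categories matters for the comparison mentioned in these commit messages, here is a small sketch. It is not taken from the PR diff; the variable names are assumptions. It inspects the categories of a NaN-only categorical column and compares a plain empty index with one built with an explicit `np.float64` dtype.

```python
import numpy as np
import pandas as pd

# A float column containing only NaN has no categories once converted;
# each value is stored with the sentinel code -1.
nan_only = pd.Series([np.nan] * 4).astype("category")
categories = nan_only.cat.categories
codes = nan_only.cat.codes
print(len(categories), categories.dtype, codes.tolist())

# An index built from a bare empty list carries no float dtype information,
# while an explicitly typed empty index matches the in-memory categories.
from_list = pd.Index([])
from_typed = pd.Index([], dtype=np.float64)
print(from_list.dtype, from_typed.dtype)
print(from_typed.dtype == categories.dtype)
```

Under this reading, restoring the categories from a bare `[]` would produce an index whose dtype differs from the original even though both print as empty, which is consistent with the "type difference" noted in the first commit.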

Removed trailing whitespace

8540a4a

jreback changed the title ~~Gh18413~~ BUG: Categorical data fails to load from hdf when all columns are NaN Dec 6, 2017

jreback added the Categorical (Categorical Data Type), IO HDF5 (read_hdf, HDFStore), and Bug labels Dec 6, 2017

jreback requested changes Dec 6, 2017

ssche added 5 commits December 6, 2017 12:39

Removed unnecessary dataframe copy

928c258

Update whatsnew section

b3766f7

Moved whatsnew entry to 0.21.1

52dc141

Merge branch 'master' into gh18413

ea89a7c

Merge branch 'master' into gh18413

551f0a2

jreback requested changes Dec 7, 2017

Addressed requested changes

e6aad40

* Added additional all-None Series
* Provided more detail in whatsnew description

ssche added 2 commits December 10, 2017 15:27

Merge branch 'master' into gh18413

7cf7336

Merge branch 'master' into gh18413

a69ea02

jreback added this to the 0.21.1 milestone Dec 10, 2017

jreback added the Needs Backport label Dec 10, 2017

jreback added 2 commits December 10, 2017 10:53

Merge branch 'master' into PR_TOOL_MERGE_PR_18652

538e7fe

lint fixes

b2ac7c4

jreback approved these changes Dec 10, 2017

jreback merged commit 2db1cc0 into pandas-dev:master Dec 10, 2017

ssche deleted the gh18413 branch December 10, 2017 20:42

TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Dec 11, 2017

BUG: Categorical data fails to load from hdf when all columns are NaN (pandas-dev#18652) (cherry picked from commit 2db1cc0)

13f8ffa

TomAugspurger pushed a commit that referenced this pull request Dec 12, 2017

BUG: Categorical data fails to load from hdf when all columns are NaN (#18652) (cherry picked from commit 2db1cc0)

95735e0

TomAugspurger removed the Needs Backport label Dec 12, 2017
