-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
json_normalize does not normalize subrecords properly if any subrecords values are NoneType #20030
Labels
Bug
IO JSON
read_json, to_json, json_normalize
Missing-data
np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Milestone
Comments
I would personally prefer the second option. Feel free to post the PR! |
gfyoung
added
Bug
Missing-data
np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
IO JSON
read_json, to_json, json_normalize
and removed
IO JSON
read_json, to_json, json_normalize
labels
Mar 8, 2018
@gfyoung Were you referring to opening this as two separate issues? |
@aerymilts : I was referring to the second output being preferable to the first. |
aerymilts
added a commit
to aerymilts/pandas
that referenced
this issue
Mar 18, 2018
…0030) TST: additional coverage for the test cases from (pandas-dev#20030) DOC: added changes to whatsnew/v0.23.0.txt (pandas-dev#20030)
4 tasks
aerymilts
added a commit
to aerymilts/pandas
that referenced
this issue
Mar 20, 2018
jreback
pushed a commit
that referenced
this issue
Mar 20, 2018
nehiljain
added a commit
to nehiljain/pandas
that referenced
this issue
Mar 21, 2018
…ame_describe * upstream/master: (158 commits) Add link to "Craft Minimal Bug Report" blogpost (pandas-dev#20431) BUG: fixed json_normalize for subrecords with NoneTypes (pandas-dev#20030) (pandas-dev#20399) BUG: ExtensionArray.fillna for scalar values (pandas-dev#20412) DOC" update the Pandas core window rolling count docstring" (pandas-dev#20264) DOC: update the pandas.DataFrame.plot.hist docstring (pandas-dev#20155) DOC: Only use ~ in class links to hide prefixes. (pandas-dev#20402) Bug: Allow np.timedelta64 objects to index TimedeltaIndex (pandas-dev#20408) DOC: add disallowing of Series construction of len-1 list with index to whatsnew (pandas-dev#20392) MAINT: Remove weird pd file DOC: update the Index.isin docstring (pandas-dev#20249) BUG: Handle all-NA blocks in concat (pandas-dev#20382) DOC: update the pandas.core.resample.Resampler.fillna docstring (pandas-dev#20379) BUG: Don't raise exceptions splitting a blank string (pandas-dev#20067) DOC: update the pandas.DataFrame.cummax docstring (pandas-dev#20336) DOC: update the pandas.core.window.x.mean docstring (pandas-dev#20265) DOC: update the api.types.is_number docstring (pandas-dev#20196) Fix linter (pandas-dev#20389) DOC: Improved the docstring of pandas.Series.dt.to_pytimedelta (pandas-dev#20142) DOC: update the pandas.Series.dt.is_month_end docstring (pandas-dev#20181) DOC: update the window.Rolling.min docstring (pandas-dev#20263) ...
dworvos
pushed a commit
to dworvos/pandas
that referenced
this issue
Apr 2, 2018
This was referenced May 24, 2018
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Bug
IO JSON
read_json, to_json, json_normalize
Missing-data
np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Code Sample, a copy-pastable example if possible
Output 1
Output 2
Problem description
I expected that the json_normalize function takes into account the presence of NoneTypes in the dictionaries. This leads to 2 separate issues (If I should open this as 2 separate issues, let me know).
I have already written a fix that solves this issue - if anyone else can validate that this is not working as intended, I can set up a PR.
Output 1
Does not unnest json after encountering NoneType at first instance of subrecord, line 192 of
pandas/io/json/normalize.py
Output 2
Keeps the None value when encountered, [{k: {'alpha': 'foo', 'beta': 'bar'}}, {k: None}], see
nested_to_record
function. Creates additional column ofnan
s which would not otherwise occur if that particular key was removed.Expected Output
Output 1
Output 2
Output of
pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-104-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 38.4.0
Cython: None
numpy: 1.13.1
scipy: None
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
The text was updated successfully, but these errors were encountered: