ENH: Optionally pass dtypes as a dict into json_normalize #33414
Comments
I attempted to write code for this and referenced this issue in a commit.

Is there a workaround for this? I would prefer to just have everything stay as a string. This causes issues when trying to interpret data where leading zeroes are important.
We've encountered this issue too: reading in JSON that contains integer values, with some missing, results in the ints being forced to floats (since there is no NaN for ints) and pandas rendering them like 5.0 instead of 5. Usually, when creating a DataFrame, this can be prevented by setting an explicit dtype. Interestingly, this issue is visible in the very first example in the json_normalize() docs at https://pandas.pydata.org/docs/reference/api/pandas.json_normalize.html, in the little example data with family names.
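The int-to-float upcast described above, and the workaround via pandas' nullable integer extension dtype, can be sketched like this (record contents are illustrative):

```python
import pandas as pd

# One record is missing "score", so normalization produces a NaN in that column
records = [{"id": 1, "score": 5}, {"id": 2}]
df = pd.json_normalize(records)

# With a missing value present, the integer column is upcast to float64,
# so 5 renders as 5.0
assert df["score"].dtype == "float64"

# The nullable "Int64" extension dtype keeps integers and shows the gap as <NA>
df["score"] = df["score"].astype("Int64")
assert df["score"].dtype == "Int64"
assert df["score"].iloc[0] == 5
```

This cast has to happen after normalization today, which is exactly the extra step the feature request below would fold into `json_normalize` itself.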
Is your feature request related to a problem?
In some of my projects, the data I need to load as a DataFrame comes in a JSON list.
We switched to `json_normalize` for its ease of use, but some memory/performance issues arose because the default typing within this method outputs mostly `object`, `int64` and `float64` column dtypes for us, all of which are the most memory-demanding of their categories.

Describe the solution you'd like
Depends on #4464.
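For context on that dependency: the `DataFrame` constructor today accepts only a single `dtype` applied to every column, not a per-column mapping. A minimal illustration:

```python
import pandas as pd

# A single dtype= applies uniformly to all columns
df = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]}, dtype="float32")
assert df["a"].dtype == "float32"
assert df["b"].dtype == "float32"

# Passing a dict here raises; per-column dtypes require a separate astype() call
df2 = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]}).astype({"a": "int16", "b": "float32"})
assert df2["a"].dtype == "int16"
```

Accepting a dict in the constructor is what would let `json_normalize` forward a per-column `dtype` mapping directly.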
I envision passing `dtype` as a dict, so that for a given JSON format, the normalizing call could specify the target dtype of each column directly.
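A hedged sketch of what the envisioned call could look like. The field names are illustrative, and the `dtype` argument to `json_normalize` is hypothetical (it does not exist today); the runnable part shows the equivalent achieved with a post-hoc `astype`:

```python
import pandas as pd

# Illustrative input format
records = [
    {"id": "001", "value": 3, "meta": {"tag": "a"}},
    {"id": "002", "value": 4, "meta": {"tag": "b"}},
]

# Envisioned API (hypothetical -- json_normalize has no dtype parameter today):
# df = pd.json_normalize(records, dtype={"id": "string", "value": "int32"})

# Equivalent today: cast after normalization
df = pd.json_normalize(records).astype({"id": "string", "value": "int32"})
assert df["id"].dtype == "string"
assert df["value"].dtype == "int32"
```

The proposal is essentially to fold that trailing `astype` into the normalization step, so the intermediate wide-typed frame is never materialized.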
Following the conclusion of #4464, the `dtype` arg would be passed to the DataFrame constructor within `json_normalize`.

API breaking implications
As of now, the code itself mentions a problem regarding metadata field typing.
I didn't dive in enough to determine how to deal with this. I see no breaking change coming from this feature, but if I understand correctly, that will always override metadata field types with `object`. If so, the condition could be changed accordingly.
Describe alternatives you've considered

I've made a separate module which runs `json_normalize` and then overrides the resulting DataFrame's dtypes dynamically through `astype` and `apply`.
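A minimal sketch of such a wrapper module (the helper name and its mapping convention are assumptions, not the author's actual code): plain dtype strings go through `astype`, while callables go through `Series.apply` for conversions `astype` cannot express, such as restoring zero-padded strings:

```python
import pandas as pd

def normalize_with_dtypes(records, dtypes):
    """Hypothetical helper: run json_normalize, then coerce column dtypes.

    `dtypes` maps column name -> dtype string (cast via astype), or
    column name -> callable (applied element-wise via Series.apply).
    """
    df = pd.json_normalize(records)
    for col, dt in dtypes.items():
        if callable(dt):
            df[col] = df[col].apply(dt)
        else:
            df[col] = df[col].astype(dt)
    return df

records = [{"code": 7, "n": 1}, {"code": 42, "n": 2}]
df = normalize_with_dtypes(
    records,
    {"code": lambda v: str(v).zfill(3),  # zero-padded string, e.g. "007"
     "n": "int16"},
)
assert list(df["code"]) == ["007", "042"]
assert df["n"].dtype == "int16"
```

This keeps the per-column typing in one place, at the cost of materializing the default wide-typed frame before the casts run.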