ER: compound dtypes - DataFrame constructor/astype #4464
Comments
this will be fixed, but is there a case where you need to pass the
actually, this is not currently implemented
this should raise now, marking as a bug for that (it expects a single dtype, not a compound one)
@jreback so it should raise? what's wrong with accepting a compound
@cpcloud nothing wrong with accepting it, but it's NotImplementedError (until it is)
OK, thanks, guys.
@mamikony going to reopen...thanks for noticing this....don't really have any test for this
So, what do you think about the second issue of astype() throwing?
same issue, it's set up to deal with a single dtype (not compound). the purpose is to coerce your data. What is your goal here?
I had a series where the elements were compound strings,
So I wanted to split them into a data frame with appropriate types, but I got this garbage:
I'll just have to do it column by column.
The split creates a series with elements that are lists
Fair enough, thanks. You can't set the dtype, but at least it guesses correctly. I need precise control of the dtype because I write it out to an HDF5 file (with PyTables and df.to_records()) and I'd like to have the proper dtype at the start. Also, I didn't know about the str namespace. Thanks.
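For readers following along, here is a minimal sketch of the column-by-column route discussed above. The sample strings and the column names `name`, `count` and `value` are made up for illustration, and it assumes a pandas version where `str.split` accepts `expand=True`; the original poster's data is not shown in the thread.

```python
import numpy as np
import pandas as pd

# Hypothetical data: each element packs a label, an integer and a float.
s = pd.Series(["a,1,1.5", "b,2,2.5"])

# expand=True yields a DataFrame of strings rather than a Series of lists.
df = s.str.split(",", expand=True)
df.columns = ["name", "count", "value"]  # hypothetical column names

# Coerce one column at a time to the exact dtype wanted on disk.
df["count"] = df["count"].astype(np.int32)
df["value"] = df["value"].astype(np.float64)

# Structured (record) array without the index, e.g. for writing with PyTables.
rec = df.to_records(index=False)
```

The numeric fields of `rec` then carry the dtypes set above; the string column may still need an explicit fixed-width dtype before it goes into a plain HDF5 table.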
@mamikony you can certainly change it if you like (just do it one-by-one) are you not using
I understand about the HDFStore, and I use it to read files. But unfortunately, I don't like to use it for output because it creates pandas-specific HDF5 files that look quite incomprehensible to standard tools, e.g., h5ls(1). Also, I usually don't want to save details like the index. At least I remember this being the case a few months ago; I'm not sure if there is a new way of creating clean HDF5 files.
Up to you. They actually are fully compatible HDF5 files, just with extra meta-data (and the indices are just columns).
Sure, but if you're sharing data with people who use other tools and languages to read your HDF5 files, then your meta-data becomes extraneous garbage. Actually, if I could put in a feature request, I'd like to ask you guys to include an option to create plain HDF5 files. I think there is an option (don't remember right now) that makes slightly cleaner files, but even that puts extra stuff in. In any case, I appreciate your help.
Neither of those options produces the correct output, see below. The internal lists are being stored in one cell instead of being exploded into rows. I'd be happy to know if there is an idiomatic way to load this in one step.
you are doing things in a very inefficient manner
I don't want to store lists. I want the DataFrame structure from the previous post. Do you know of a fast, idiomatic way to create that structure from the json I provided?
use read_json directly
Like this?
It doesn't work because in my example code, Please give me a clear example of what you mean. Again, I want a fast, idiomatic way to go from this:
to this:
Right, it's the creation of DataFrames in a tight loop that causes the overhead, and unfortunately it looks like this can't be avoided without changing my input data structure. My original question was whether passing a compound
@ehein6 you don't need to create a dataframe in each part of the loop, just do it once at the end.
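A rough sketch of what "do it once at the end" could look like. The `chunks` structure and the `key`/`value` names are hypothetical stand-ins, since the actual JSON input is not reproduced in the thread.

```python
import pandas as pd

# Hypothetical input: each chunk pairs a key with a list of values.
chunks = [("x", [1, 2]), ("y", [3])]

# Collect plain Python rows inside the loop; no DataFrame is built here.
rows = []
for key, values in chunks:
    for v in values:
        rows.append({"key": key, "value": v})

# Build the DataFrame a single time, after the loop has finished.
df = pd.DataFrame(rows)
```

Appending dicts to a list is cheap, so the constructor cost is paid once rather than per iteration.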
Your latest example is calling
my example is not the same as yours
In this case they are identical. Using just Best of 20 runs: Either way, both examples are still creating a DataFrame in a loop.
Got any ideas on how one can fix this? I have a list of dictionaries, and I'm just doing Also, consider changing the title of this GitHub issue? Most people in here and in the other mentioned issues are talking about passing a dictionary of dtypes either during construction or after the frame is already defined. Kind of a cryptic title.
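One workaround that does not depend on the constructor's `dtype` argument is to build the frame first and then pass a column-to-dtype mapping to `astype`, which current pandas releases accept. A minimal sketch, with made-up records and target dtypes:

```python
import pandas as pd

# Hypothetical records standing in for the list of dictionaries.
records = [{"id": 1, "score": 0.5}, {"id": 2, "score": 1.5}]

# Hypothetical target dtypes, one entry per column.
dtypes = {"id": "int32", "score": "float32"}

# Construct with inferred dtypes, then coerce each column via the mapping.
df = pd.DataFrame(records).astype(dtypes)
```

This costs an extra copy compared with setting dtypes at construction, which is the part this issue asks for.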
Any news or recommendations on this subject? This has been open for 10 years.
xref #9133, maybe allow a dict of dtypes to be passed as well
I am trying to use the dtype argument in the DataFrame constructor to set the types of several columns, and I'm getting incorrect types. Everything works well, however, when the dtypes come from the recarray itself.
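The original snippet did not survive on this page. As an illustration only, the working case (dtypes carried by the structured array itself) might look like the following, with hypothetical field names `a` and `b`:

```python
import numpy as np
import pandas as pd

# Hypothetical compound dtype with two fields.
dt = np.dtype([("a", np.int32), ("b", np.float64)])

# Structured array whose fields already carry the desired dtypes.
arr = np.array([(1, 2.5), (3, 4.5)], dtype=dt)

# The constructor maps each field to a column and keeps its dtype.
df = pd.DataFrame(arr)
```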
But if I use the dtype constructor argument, I get incorrect types:
The astype() member function doesn't work either:
I'm using Pandas 0.11.0 from Anaconda.
Thanks in advance.