concat produces incorrect output #3602
@rhstanton It's helpful if you can put your output in "``" so that it prints in a monospaced font. It's easier on the eyes. :) I can reproduce this on git master. What is the expected output?
Ah. Looks like there's a sorting problem here...
I agree it looks terrible! Does the output go in quotes in my notebook or when I upload to GitHub? If you could give me a quick example of how to do this, I'd be more than happy to help others' eyesight in future.
Surround anything you want in monospaced type with backquotes, i.e., the ` character.
@cpcloud any luck with this?
Nah, not yet, but I haven't given it more than a cursory glance. I will look
@jreback @rhstanton What is the expected output? 2 prc columns with the same values? One of the merge behaviors? Is this the expected output? (I will assume it is since it is the least magical thing
(Heh, I will try to parse this with
Yes, I'd expect the output you show below, just with the right column headings (2 prc columns with the same values, but only because they were passed in with the same values; if they'd had different values in df1 and df2, I'd expect two prc columns with different contents). Best, Richard

> Phillip Cloud wrote:
> @jreback @rhstanton What is the expected output? 2 prc columns with the same values? One of the merge behaviors? If you use the dfs from above:
>
>     df = concat([df1, df2], axis=1, ignore_index=True)
>     print(df)
>
>        0  1    2   3  4  5
>     0  0  6  rrr   9  1  6
>     1  0  6  rrr  10  2  6
>     2  0  6  rrr  11  3  6
>     3  0  6  rrr  12  4  6
>
> Is this what you want, except with the original column indices?
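A quick way to check the behaviour Richard describes, on a pandas version that includes the fix, is to give the two prc columns different values and confirm that each keeps its own contents. This is a minimal sketch assuming a current pandas; the [7, 7, 7, 7] values for df2's prc are a hypothetical change from the frames in the report.

```python
import pandas as pd

df1 = pd.DataFrame({'firmNo': [0, 0, 0, 0],
                    'stringvar': ['rrr'] * 4,
                    'prc': [6, 6, 6, 6]})
# Hypothetical variation on the report: df2's 'prc' differs from df1's.
df2 = pd.DataFrame({'misc': [1, 2, 3, 4],
                    'prc': [7, 7, 7, 7],
                    'C': [9, 10, 11, 12]})

result = pd.concat([df1, df2], axis=1)

# Selecting a duplicated label returns a DataFrame with one column per match,
# in the order those columns appear in the result.
prc = result['prc']
print(prc.shape)                # (4, 2)
print(prc.iloc[:, 0].tolist())  # [6, 6, 6, 6], from df1
print(prc.iloc[:, 1].tolist())  # [7, 7, 7, 7], from df2
```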
@jreback This is a strange beast. I went all the way into NDFrame.__init__ and back up to concat. The values attributes test the same, so it's probably a repr bug now.
Never mind, it's something else...
@jreback AH HA! The bug is that the
828f9f9 fixed the Series version of this by just assigning the columns after the concat. Is that the correct fix here? I don't think so; maybe we can ignore the index when ignore_index is False, there are duplicate columns, and axis is 1.
Looks like that would work, given the results of your earlier concat without column names.
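A minimal sketch of the workaround discussed here (concatenate without column labels, then assign the labels afterwards), assuming a current pandas in which `ignore_index=True` together with `axis=1` relabels the result's columns 0..n-1. This is not the change in #3647; it only keeps duplicate names out of the concat itself.

```python
import pandas as pd

df1 = pd.DataFrame({'firmNo': [0, 0, 0, 0],
                    'stringvar': ['rrr'] * 4,
                    'prc': [6, 6, 6, 6]})
df2 = pd.DataFrame({'misc': [1, 2, 3, 4],
                    'prc': [6, 6, 6, 6],
                    'C': [9, 10, 11, 12]})

# Concatenate purely by position: with axis=1, ignore_index=True gives the
# result columns labelled 0..5, so no duplicate names are involved.
result = pd.concat([df1, df2], axis=1, ignore_index=True)

# Restore the original labels afterwards, duplicates included.
result.columns = list(df1.columns) + list(df2.columns)
print(result)
```

Assigning the labels last mirrors what the Series fix did: the positions are already correct, so only the names need to be put back.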
You might be right; I need to see how to do this in a sane way...
I wonder if an
Let me take a look.
Should be fixed by #3647. Once I figured out what was going on, the fix was trivial. @cpcloud you were basically right: the newly created block has a non-unique index, so the block manager tries to create _ref_locs on each block. That is wrong, because at that point it doesn't have an indexer map from the axes to the block locations (but of course we do have one when we are creating the blocks in the first place, so we just set it there). This worked in <= 0.11, but not in master, because of the changes to non-unique indexes. Non-unique indexes are a bit of an animal!
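The explanation above can be illustrated with a toy analogy: when column labels repeat, recovering each column's position from its label after the fact is ambiguous, whereas positions recorded while the columns are being placed are not. This is a hypothetical model only; `locs_by_label`, `concat_columns` and the (labels, columns) "frames" are made up for illustration and are not how pandas' BlockManager or `_ref_locs` are implemented.

```python
# Toy model only: these names are hypothetical, not pandas internals.
# A "frame" here is a (labels, list_of_columns) pair.

def locs_by_label(axis_labels, block_labels):
    """Recover each column's position from its label after the fact.
    With duplicate labels, the dict keeps only the last position."""
    position = {label: i for i, label in enumerate(axis_labels)}
    return [position[label] for label in block_labels]


def concat_columns(frames):
    """Concatenate frames along the column axis, recording each column's
    position at the moment it is placed, when it is known for certain."""
    labels, columns, locs = [], [], []
    for frame_labels, frame_columns in frames:
        for label, column in zip(frame_labels, frame_columns):
            locs.append(len(labels))  # position known at creation time
            labels.append(label)
            columns.append(column)
    return labels, columns, locs


df1 = (['firmNo', 'prc', 'stringvar'], [[0] * 4, [6] * 4, ['rrr'] * 4])
df2 = (['C', 'misc', 'prc'], [[9, 10, 11, 12], [1, 2, 3, 4], [6] * 4])

labels, columns, locs = concat_columns([df1, df2])
print(locs)                          # [0, 1, 2, 3, 4, 5]
print(locs_by_label(labels, labels)) # [0, 5, 2, 3, 4, 5]
```

In the label-based lookup, the first 'prc' column is mapped to position 5, the location of the second 'prc', which is the kind of mix-up that duplicate labels invite.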
|
That looks a lot better. Thanks.

> jreback wrote:
>
>     In [3]: df1 = DataFrame({'firmNo' : [0,0,0,0], 'stringvar' : ['rrr', 'rrr', 'rrr', 'rrr'], 'prc' : [6,6,6,6] })
>     In [4]: df2 = DataFrame({'misc' : [1,2,3,4], 'prc' : [6,6,6,6], 'C' : [9,10,11,12]})
>     In [5]: df1
>     Out[5]:
>        firmNo  prc stringvar
>     0       0    6       rrr
>     1       0    6       rrr
>     2       0    6       rrr
>     3       0    6       rrr
>     In [6]: df2
>     Out[6]:
>         C  misc  prc
>     0   9     1    6
>     1  10     2    6
>     2  11     3    6
>     3  12     4    6
>     In [7]: pd.concat([df1,df2],axis=1)
>     Out[7]:
>        firmNo  prc stringvar   C  misc  prc
>     0       0    6       rrr   9     1    6
>     1       0    6       rrr  10     2    6
>     2       0    6       rrr  11     3    6
>     3       0    6       rrr  12     4    6
>     In [8]: pd.concat([df1,df2],axis=1).dtypes
>     Out[8]:
>     firmNo        int64
>     prc           int64
>     stringvar    object
>     C             int64
>     misc          int64
>     prc           int64
>     dtype: object
This is the part I missed: "but of course have one when we are creating the blocks in the first place, so just set it there". Argh :) @jreback, thanks.
Under certain circumstances, concat seems to produce erroneous results. I haven't worked out what causes the problems to arise, but here's an example:
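from pandas import DataFrame, concat

df1 = DataFrame({'firmNo': [0, 0, 0, 0], 'stringvar': ['rrr', 'rrr', 'rrr', 'rrr'], 'prc': [6, 6, 6, 6]})
df2 = DataFrame({'misc': [1, 2, 3, 4], 'prc': [6, 6, 6, 6], 'C': [9, 10, 11, 12]})
# Both frames have a 'prc' column, so the result has duplicate column labels
concat([df1, df2], axis=1)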
produces as output:
   firmNo  prc stringvar   C  misc  prc
0     rrr    0         6   9     1    6
1     rrr    0         6  10     2    6
2     rrr    0         6  11     3    6
3     rrr    0         6  12     4    6
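To confirm that a given pandas version no longer shows this misalignment, one option is to compare every column of the result, by position, with the column it came from. This is a sketch assuming a current pandas; `columns_line_up` is a hypothetical helper, not a pandas function.

```python
import pandas as pd

# The frames from the report; both contain a column labelled 'prc'.
df1 = pd.DataFrame({'firmNo': [0, 0, 0, 0],
                    'stringvar': ['rrr', 'rrr', 'rrr', 'rrr'],
                    'prc': [6, 6, 6, 6]})
df2 = pd.DataFrame({'misc': [1, 2, 3, 4],
                    'prc': [6, 6, 6, 6],
                    'C': [9, 10, 11, 12]})

result = pd.concat([df1, df2], axis=1)


def columns_line_up(result, sources):
    """Check each result column, by position, against the source frames."""
    pos = 0
    for src in sources:
        for j in range(src.shape[1]):
            # iloc selects by position, so duplicate labels do not matter here
            if result.columns[pos] != src.columns[j]:
                return False
            if result.iloc[:, pos].tolist() != src.iloc[:, j].tolist():
                return False
            pos += 1
    return pos == result.shape[1]


print(columns_line_up(result, [df1, df2]))  # True on a version with the fix
```

Recent pandas keeps dict insertion order, so df1's columns come out as firmNo, stringvar, prc rather than the sorted order shown in the report; the helper compares against each source frame's own column order, so it does not depend on that.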