BUG: Series dtype casting to platform numeric (GH #2751) #2838
just curious, does this still work without overflow if the list explicitly contains longs? (i.e. |
that first commit blew up....was always using platform int, but that's not exactly right, trying asarray (which is what DataFrame does)... |
can you take a look with either of my commits? these should work |
sure. i guess you're not developing on 32-bit so you don't have an easy way to test this? i'll look into it after i finish with something else... |
nope....64-bit...linux....no easy way to even put a dev build (i did it on windows, but then all kind of things are hard :) I guess it comes down to: should @wesm, @changhiskhan any thoughts? |
latest commit only breaks 4 tests on 32-bit! |
I need to rebase this...later.. |
Yeah, I agree that it's better to use the same code path throughout for consistency. Not entirely sure that upcasting to I found some other issues in |
ok..thanks....let me know if you have any success with this branch.....I cannot easily debug this....but think that unless explicitly typed, ints get |
@jreback, you can setup a VM for 32 bit testing with |
oh...this looks nice....any particular box you think? |
@y-p ignore my last - didn't read it |
http://www.dejonghenico.be/unix/setup-vagrant-and-small-quick-start setup was pretty easy |
@y-p any chance u have a chef recipe laying around? |
sorry. I think once you create the box, you can edit the shared folders |
it's more like a Travis install script to setup the base environment |
I wouldn't bother with chef, just use virtualenv for 3.x, like you would on your local box. |
I just did manual installation now to debug! |
…eric when a list is specified; use the Series codepath for initial list conversion (change from using DataFrame) TST: added test for overflow in df creation
@stephenwlin ok I fixed this up API change on 32-bit is that {'a' : [1,2] } will still be int64 |
Looks good but I'm curious why you're calling Also, from looking at the test it seems that now |
also "platform independent manor" typo in doc/source/v0.11.0.txt |
did all of these 3 cases yield int64 in 0.10.1 on 32-bit? |
what's wrong with 'platform independent manor'? |
getting closer! @stephenwlin FYI I moved _dtype_from_scalar to common |
platform independent "manner", right? |
0.10.0 good enough?
In [1]: import pandas as p
In [2]: p.__version__
Out[2]: '0.10.0'
In [3]: p.DataFrame({'a': 1}, index=[0]).dtypes
Out[3]: a int64
In [4]: p.DataFrame({'a': [1, 2]}).dtypes
Out[4]: a int64
In [5]: p.DataFrame([1, 2]).dtypes
Out[5]: 0 int64 |
yes ok so the behavior I am suggesting will not actually change the API - that is good |
duh! thxs On Feb 13, 2013, at 11:41 AM, stephenwlin notifications@github.com wrote:
|
finally all tests pass! @stephenwlin pls take a look when you have a chance.... |
yield _infer_dtype_from_scalar
in reshape.py - removed block2d_to_block3d in favor of block2d_to_blocknd
It looks great! But I changed the signature of If you want to proactively fix it, I've already resolved the differences in "stephenwlin/dtypes_bug" (currently pointing to f74571f7a613d1f971c99cac7a53ee077b1582f6), so if you "git reset" to that you'll have all your changes merged with the If you want to look at the differences to make sure they're ok, just take a look at the commit 05a4991f014f7ed55b6d8270f06c3c554f05189b, which shows only the differences between the current state of your branch (at 3cb91f0) and "stephenwlin/dtypes_bug", without any rebasing. If you'd rather not bother, that's ok too...I can always do the conflict resolution again later after one of our PRs gets merged. |
actually with that change I think I can get rid of _maybe_upcast I think we are close to merging - what do u think of this order maybe_convert_objects (you) we each can rebase after each step |
@wesm any comments on these 4 branches |
@stephenwlin also should either reverse return values from _maybe_promote or _infer_dtype_fr_scalar value, dtype vs dtype,value I don't have a preference |
i prefer "dtype, value" since both of their names refer primarily to the dtype (modifying the value appropriately is just an extra benefit) |
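The "dtype, value" ordering can be illustrated with a minimal, pure-Python sketch. The two functions below are hypothetical stand-ins, not the pandas helpers themselves: they use plain strings in place of NumPy dtypes and only assume that both helpers return the dtype first and the (possibly adjusted) value second.

```python
import math


def infer_dtype_from_scalar(value):
    """Hypothetical stand-in: return (dtype_name, value), dtype first."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "bool", value
    if isinstance(value, int):
        return "int64", value
    if isinstance(value, float):
        return "float64", value
    return "object", value


def maybe_promote(dtype_name, fill_value):
    """Hypothetical stand-in: return (dtype_name, fill_value), dtype first.

    Assumption for illustration: an integer dtype cannot hold NaN, so a NaN
    fill value promotes it to float64; a bool dtype is promoted to object.
    """
    if isinstance(fill_value, float) and math.isnan(fill_value):
        if dtype_name == "int64":
            return "float64", fill_value
        if dtype_name == "bool":
            return "object", fill_value
    return dtype_name, fill_value


# Both helpers unpack the same way, so call sites stay uniform.
dtype, value = infer_dtype_from_scalar(3)
promoted, fill = maybe_promote(dtype, float("nan"))
```

Keeping the same return order in both functions means a caller never has to remember which helper puts the dtype first.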
great I will change btw - how can I pull in your change commit for _maybe_promote to mine |
hmm, at this point, probably just do
|
I don't think it's that necessary to get rid of
is pretty succinct and might be useful in more places than what I used it for so far. (it guarantees either a straight copy or an upcasted copy, but won't copy twice in the latter case as was being done previously where I added it) |
and _infer_dtype_from_scalar to match (both return dtype, fill_value) Diff between 'jreback/dtypes_bug' and 'stephenwlin/dtypes_bug' Conflicts: pandas/core/common.py
all fixed up...thanks...that was a neat trick! and I changed _infer_dtype_from_scalar return signature to match once travis finishes....can prob start merging and issue with merge order? |
The merge order is ok in theory but I tested it and the conflicts are kind of nasty right now because git isn't smart enough to figure out that we made more-or-less the same changes twice. Can you do:
git fetch stephenwlin
git checkout dtypes_bug
# just in case
git branch dtypes_bug_backup dtypes_bug
git reset stephenwlin/dtypes_bug
# compare against dtypes_bug_backup (should be no change)
git diff dtypes_bug_backup
git branch -d dtypes_bug_backup
"stephenwlin/dtypes_bug" should be identical to your "dtypes_bug" except rebased against b3202ebc282eace11de3089498a5a5ea3689f9e4 EDIT: sorry, actually that doesn't help that much (most of the conflicts are still there)...never mind for now... |
the branches are identical... what is conflicting? if i merged dtypes_bug to master...then you can rebase....if i have the changes (and you do u) |
lots of stuff in common.py conflicts if you merge "stephenwlin/opt-take-2" on top of "jreback/dtypes_bug" but I can't really figure out why. it's not a big deal because I know how they resolve but I won't be doing the actual merge into master so I figure I might as well try to proactively arrange things so that the conflicts go away, helping out whoever does do it. I just can't seem to figure out how, though: not enough git-fu yet. it'll be fine if I rebase "stephenwlin/opt-take-2" after "jreback/dtypes_bug" is in master and resolve things then before it's merged. if you're going to do it yourself, just let me know and I'll rebase after when I get a chance. I just figured I could fix it proactively, but I guess not. |
(i went through the commits one by one and the two branches are basically independent after rebasing against b3202ebc282eace11de3089498a5a5ea3689f9e4 ... it's really odd that they conflict so much if you try to merge them...oh well, I'm not going to try to spend more time fixing this) |
ok you can then rebase ok? |
yeah, there's no real conflicts it just looks like there are. |
ok merged your convert branch |
just waiting for travis to finish...then will merge...i had no problems rebasing (to be fair rebase only on top of convert_objects)! |
@stephenwlin you are up....rebase and let me know.... |
all fixed up |
use int64/float64 defaults in construction when dtype is not specified
this removes a platform dependency issue that caused dataframe and series to
have different dtypes on 32-bit
fixes issues raised in PR #2837
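
The platform dependency mentioned in this commit message can be sketched with the standard-library array module, used here only as a stand-in for NumPy dtypes. The sketch assumes the platform integer behaves like a C long (typecode 'l'), whose width varies by platform, while typecode 'q' is a 64-bit integer; the helper name fits is hypothetical.

```python
from array import array


def fits(typecode, value):
    """Hypothetical helper: True if value can be stored under typecode."""
    try:
        array(typecode, [value])
        return True
    except OverflowError:
        return False


# A value that needs more than 32 bits.
big = 2**40

# Width of the platform C long: commonly 4 bytes on 32-bit systems and
# on Windows, 8 bytes on 64-bit Linux.
platform_long_bytes = array("l").itemsize

# Width of the fixed-size 64-bit integer type.
fixed_64_bytes = array("q").itemsize

print("platform long:", platform_long_bytes, "bytes, holds 2**40:", fits("l", big))
print("64-bit int:", fixed_64_bytes, "bytes, holds 2**40:", fits("q", big))
```

Whether the first line reports an overflow depends on the machine, which is the kind of platform-specific result that an explicit 64-bit default avoids.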