ENH/BUG/DOC: allow propagation and coexistence of numeric dtypes #2708
This is pretty great. I'm going to delay merging until post 0.10.1 (which we're sprinting on now, critical bug fixes only), but only to have a chance to beat on it some.
@wesm agreed...even though the change is not that big, this touches like everything indirectly. There might be some weird corner cases.
Travis all green - ready 4 merging!
…ndas-dev#622)
- construction of multi numeric dtypes with other types in a dict
- validated get_numeric_data returns correct dtypes
- added blocks attribute (and as_blocks()) method that returns a dict of dtype -> homogeneous Frame to DataFrame
- added keyword 'raise_on_error' to astype, which can be set to false to exclude non-numeric columns
- fixed merging to correctly merge on multiple dtypes with blocks (e.g. float64 and float32 in other merger)
- changed implementation of get_dtype_counts() to use .blocks
- revised DataFrame.convert_objects to use blocks to be more efficient
- added Dtype printing to show on default with a Series
- added convert_dates='coerce' option to convert_objects, to force conversions to datetime64[ns]
- where can upcast integer to float as needed (on inplace ops pandas-dev#2793)
- added fully cythonized support for int8/int16
- no support for float16 (it can exist, but no cython methods for it)

TST:
- fixed test in test_from_records_sequencelike (dict orders can be different on different arch!)
  NOTE: using tuples will remove dtype info from the input stream (using a record array is ok though!)
- test updates for merging (multi-dtypes)
- added tests for replace (but skipped for now, algos not set for float32/16)
- tests for astype and convert in internals
- fixes for test_excel on 32-bit
- fixed test_resample_median_bug_1688 I believe
- separated out test_from_records_dictlike
- testing of panel constructors (GH pandas-dev#797)
- where ops now have a full test suite
- allow slightly less sensitive decimal tests for less precise dtypes

BUG:
- fixed GH pandas-dev#2778, fillna on empty frame causes seg fault
- fixed bug in groupby where types were not being cast to original dtype
- respect the dtype of non-natural numeric (Decimal)
- don't upcast ints/bools to floats (if you say we're agging on len, you can get an int)

DOC:
- added astype conversion examples to whatsnew and docs (dsintro)
- updated RELEASE notes
- whatsnew for 0.10.2
- added upcasting gotchas docs

CLN:
- updated convert_objects to be more consistent across frame/series
- moved most groupby functions out of algos.pyx to generated.pyx
- fully support cython functions for pad/bfill/take/diff/groupby for float32
- moved more block-like conversion loops from frame.py to internals.py (created apply method) (e.g. diff, fillna, where, shift, replace, interpolate, combining), to top-level methods in BlockManager
Just merged this and next release will be 0.11. Will start looking through PRs that depend on it.
Support for numeric dtype propagation and coexistence in DataFrames. Prior to 0.10.2, numeric dtypes passed to DataFrames were always cast to `int64` or `float64`. Now, if a dtype is passed (either directly via the `dtype` keyword, a passed `ndarray`, or a passed `Series`), then it will be preserved in DataFrame operations. Furthermore, different numeric dtypes will NOT be combined. The following example will give you a taste. This closes GH #622

other changes introduced in this PR (I removed all datetime-like issues to PR #2752 - should be merged first)
ENH:
- added `blocks` attribute (and as_blocks()) method that returns a dict of dtype -> homogeneous dtyped DataFrame, analogous to `values` attribute
- changed `get_dtype_counts()` to use `blocks` attribute
- revised `convert_objects()` to use the internals method convert (which is block operated)
- added `convert_numeric=False` to `convert_objects` to force numeric conversion (or set to `np.nan`, turned off by default)
- added `convert_dates='coerce'` to `convert_objects` to force datetimelike conversions (or set to `NaT`) for invalid values, turned off by default, returns datetime64[ns] dtype
- all cython functions are implemented in `generated_code.py`, e.g. (group_add, group_mean)
- `float32/int16/int8` support for all numeric operations, including (diff, backfill, pad, take)
- added `dtype` display to show on Series as a default

BUG:
- NOTE: using tuples will remove dtype info from the input stream (using a record array is ok though!)
- `where` when using inplace ops (BUG: DataFrame inplace where doesn't work for mixed datatype frames #2793)

TST:
- `where`

DOC:
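
For anyone trying the `blocks` and `get_dtype_counts()` items from the ENH list on a recent pandas, where (as far as I know) `as_blocks()` and `get_dtype_counts()` are no longer available, here is a rough sketch of the same idea using only `dtypes`. The frame and the names `dtype_counts` and `by_dtype` are made up for illustration; this is not the implementation in this PR.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with several numeric dtypes living side by side.
df = pd.DataFrame({
    "a": np.array([1.0, 2.0, 3.0], dtype="float32"),
    "b": np.array([4.0, 5.0, 6.0], dtype="float32"),
    "c": np.array([1, 2, 3], dtype="int8"),
    "d": np.array([7.0, 8.0, 9.0], dtype="float64"),
})

# Stand-in for get_dtype_counts(): number of columns per dtype.
dtype_counts = {str(dt): int(n) for dt, n in df.dtypes.value_counts().items()}

# Stand-in for the blocks attribute: dtype name -> homogeneous sub-frame.
columns_by_dtype = {}
for col, dt in df.dtypes.items():
    columns_by_dtype.setdefault(str(dt), []).append(col)
by_dtype = {name: df[cols] for name, cols in columns_by_dtype.items()}

print(dtype_counts)
print({name: list(sub.columns) for name, sub in by_dtype.items()})
```

Each value in `by_dtype` is a DataFrame whose columns all share one dtype, which is the shape the `blocks` description above refers to.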
It would be really helpful if some users could give this a test run before merging. I have put in test cases for numeric operations, combining with DataFrame and Series, but I am sure there are some corner cases that were missed.
the example from #622
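
The original snippet is not reproduced on this page, so here is a minimal sketch of the behaviour described above: dtypes passed via a `Series`, an `ndarray`, or the `dtype` keyword are kept rather than cast to `int64`/`float64`. The column names and values are invented for illustration, and the code assumes a reasonably recent pandas.

```python
import numpy as np
import pandas as pd

# dtype supplied through a Series, through an ndarray, and left at the default
df = pd.DataFrame({
    "A": pd.Series([0.5, 1.5, 2.5], dtype="float32"),
    "B": np.array([1, 2, 3], dtype="int16"),
    "C": [1.0, 2.0, 3.0],
})

# dtype supplied through the dtype keyword
df_kw = pd.DataFrame([[1, 2], [3, 4]], columns=["x", "y"], dtype="float32")

# Different numeric dtypes coexist in one frame instead of being combined.
print(df.dtypes)
print(df_kw.dtypes)

# An operation on a float32 column keeps float32.
doubled = df["A"] * 2
print(doubled.dtype)
```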
Conversion examples
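
The conversion snippets are also missing from this page. A small sketch of what `astype` conversions and the upcasting gotcha look like, assuming a recent pandas; the frame and variable names are hypothetical, and `raise_on_error` is left out on purpose since it may not exist in current versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "i8": np.array([1, 2, 3], dtype="int8"),
    "f32": np.array([0.5, 1.5, 2.5], dtype="float32"),
})

# Convert the whole frame to a single dtype.
all_f32 = df.astype("float32")

# Convert a single column.
i8_as_i64 = df["i8"].astype("int64")

# Upcasting gotcha: masking an integer column introduces NaN,
# so that column can no longer stay an integer dtype.
masked = df.where(df > 1)

print(all_f32.dtypes)
print(i8_as_i64.dtype)
print(masked.dtypes)
```

The `where` call is the case the inplace-ops fix (#2793) is about: integer columns have to be upcast to float once missing values appear.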