API: Series[bool][key] = np.nan -> cast to object #38709

Merged
jreback merged 23 commits into pandas-dev:master from bug-block-setitem on Jan 28, 2021

jbrockmendel
Member

sits on top of #38688

@jreback jreback added the Indexing Related to indexing on series/frames, not to indexes themselves label Dec 27, 2020
@jbrockmendel
Member Author

Yikes, the existing behavior is even worse than I thought:

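import numpy as np
import pandas as pd

ser = pd.Series([True, False])
ser2 = ser.copy()

# Setting the Python-float np.nan casts the whole Series to float64
ser.iloc[1] = np.nan

# Setting a np.float64 NaN casts to object instead
ser2.iloc[1] = np.float64("NAN")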

>>> ser
0    1.0
1    NaN
dtype: float64

>>> ser2
0    True
1     NaN
dtype: object

This fixes that; I'll add a test. Gentle ping @jreback: this is probably the toughest of the outstanding setitem-related PRs.

Contributor

@jreback jreback left a comment

I am ok with this change as long as we have a sub-section in api breaking.

@@ -106,8 +106,11 @@ def putmask_smart(values: np.ndarray, mask: np.ndarray, new) -> np.ndarray:
# preserves dtype if possible
return _putmask_preserve(values, new, mask)

# change the dtype if needed
dtype, _ = maybe_promote(new.dtype)
if values.dtype == bool and new.dtype.kind == "f":
Contributor

hmm, shouldn't maybe_promote handle this? (`maybe_promote(new.dtype, values.dtype)`?)

Member Author

wouldn't find_common_type make more sense?

Contributor

yes!

Member Author

yah this is nicer, updated
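
For readers following the `maybe_promote` vs. `find_common_type` exchange, here is a minimal sketch of the promotion rule being discussed. It assumes, as I understand the pandas internals, that the common dtype of bool and a numeric dtype is object, while NumPy's own promotion gives float64. The helper `common_dtype_sketch` is hypothetical and is not the pandas function.

```python
import numpy as np


def common_dtype_sketch(*dtypes):
    """Hypothetical helper: NumPy promotion, except bool mixed with numbers -> object."""
    dtypes = [np.dtype(d) for d in dtypes]
    has_bool = any(d.kind == "b" for d in dtypes)
    has_numeric = any(d.kind in "iufc" for d in dtypes)
    if has_bool and has_numeric:
        # Promoting to float64 would silently turn True into 1.0, so use object
        return np.dtype(object)
    return np.result_type(*dtypes)


# NumPy's rule: bool and float64 promote to float64
numpy_rule = np.promote_types(np.bool_, np.float64)

# Sketch of the bool-aware rule: bool and float64 give object
sketch_rule = common_dtype_sketch(np.bool_, np.float64)

print(numpy_rule, sketch_rule)
```

With a rule like this there is no need for a special case on `values.dtype == bool` at the call site, which seems to be why the reviewers preferred it.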

@jbrockmendel
Member Author

> have a sub-section in api breaking.

added + green

@jreback jreback added this to the 1.3 milestone Jan 28, 2021
@jreback jreback merged commit 3217b43 into pandas-dev:master Jan 28, 2021
@jreback
Contributor

jreback commented Jan 28, 2021

thanks, very nice

@jbrockmendel
Member Author

way cool. Between this and #39163 I'll be unblocked on unifying a bunch of casting behavior, I think.

@jbrockmendel jbrockmendel deleted the bug-block-setitem branch January 28, 2021 03:53
@simonjayhawkins
Member

There appears to be a performance regression here: https://pandas.pydata.org/speed/pandas/#frame_methods.MaskBool.time_frame_mask_bools

There has been some improvement since.

(pandas-dev) simon@T3630:~/pandas/asv_bench (master)$ asv compare v1.2.5 1.3.x |grep time_frame_mask_bools
+     5.62±0.09ms       30.8±0.5ms     5.49  frame_methods.MaskBool.time_frame_mask_bools

profile

(pandas-dev) simon@T3630:~/pandas/asv_bench (master)$ asv profile frame_methods.MaskBool.time_frame_mask_bools 3217b436c754bdf62081408f0417ff2061702e75|head -n 30
· Discovering benchmarks
·· Uninstalling from conda-py3.8-Cython0.29.21-jinja2-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
·· Installing 3217b436 <v1.3.0rc0~1332> into conda-py3.8-Cython0.29.21-jinja2-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
· Profile data does not already exist. Running profiler now.
·· Benchmarking conda-py3.8-Cython0.29.21-jinja2-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
··· Running (frame_methods.MaskBool.time_frame_mask_bools--).
··· frame_methods.MaskBool.time_frame_mask_bools            42.4±0.5ms

Tue Jun 29 14:36:40 2021    /tmp/tmpoghq5rs1

         41910 function calls (41895 primitive calls) in 0.051 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.051    0.051 {built-in method builtins.exec}
        1    0.000    0.000    0.051    0.051 /home/simon/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/asv/benchmark.py:540(method_caller)
        1    0.001    0.001    0.051    0.051 /home/simon/pandas/asv_bench/benchmarks/frame_methods.py:307(time_frame_mask_bools)
        1    0.000    0.000    0.051    0.051 /home/simon/pandas/asv_bench/env/11a1c20ede452de2525075dc4a15eb94/lib/python3.8/site-packages/pandas/core/generic.py:9116(mask)
        1    0.000    0.000    0.051    0.051 /home/simon/pandas/asv_bench/env/11a1c20ede452de2525075dc4a15eb94/lib/python3.8/site-packages/pandas/core/generic.py:8963(where)
        1    0.000    0.000    0.051    0.051 /home/simon/pandas/asv_bench/env/11a1c20ede452de2525075dc4a15eb94/lib/python3.8/site-packages/pandas/core/generic.py:8801(_where)
        5    0.000    0.000    0.049    0.010 /home/simon/pandas/asv_bench/env/11a1c20ede452de2525075dc4a15eb94/lib/python3.8/site-packages/pandas/core/internals/managers.py:380(apply)
        1    0.000    0.000    0.048    0.048 /home/simon/pandas/asv_bench/env/11a1c20ede452de2525075dc4a15eb94/lib/python3.8/site-packages/pandas/core/internals/managers.py:548(where)
      2/1    0.000    0.000    0.047    0.047 /home/simon/pandas/asv_bench/env/11a1c20ede452de2525075dc4a15eb94/lib/python3.8/site-packages/pandas/core/internals/blocks.py:1259(where)
        1    0.000    0.000    0.037    0.037 /home/simon/pandas/asv_bench/env/11a1c20ede452de2525075dc4a15eb94/lib/python3.8/site-packages/pandas/core/internals/blocks.py:544(_maybe_downcast)
        1    0.000    0.000    0.037    0.037 /home/simon/pandas/asv_bench/env/11a1c20ede452de2525075dc4a15eb94/lib/python3.8/site-packages/pandas/core/internals/blocks.py:551(<listcomp>)
        1    0.000    0.000    0.037    0.037 /home/simon/pandas/asv_bench/env/11a1c20ede452de2525075dc4a15eb94/lib/python3.8/site-packages/pandas/core/internals/blocks.py:553(downcast)
        1    0.001    0.001    0.037    0.037 /home/simon/pandas/asv_bench/env/11a1c20ede452de2525075dc4a15eb94/lib/python3.8/site-packages/pandas/core/internals/blocks.py:474(split_and_operate)
      500    0.000    0.000    0.021    0.000 /home/simon/pandas/asv_bench/env/11a1c20ede452de2525075dc4a15eb94/lib/python3.8/site-packages/pandas/core/internals/blocks.py:583(f)
      500    0.001    0.000    0.021    0.000 /home/simon/pandas/asv_bench/env/11a1c20ede452de2525075dc4a15eb94/lib/python3.8/site-packages/pandas/core/dtypes/cast.py:192(maybe_downcast_to_dtype)

commit before

(pandas-dev) simon@T3630:~/pandas/asv_bench (master)$ asv profile frame_methods.MaskBool.time_frame_mask_bools 3217b436c754bdf62081408f0417ff2061702e75~1|head -n 30
· Discovering benchmarks
·· Uninstalling from conda-py3.8-Cython0.29.21-jinja2-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
·· Installing 6e579ed3 <v1.3.0rc0~1333> into conda-py3.8-Cython0.29.21-jinja2-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
· Profile data does not already exist. Running profiler now.
·· Benchmarking conda-py3.8-Cython0.29.21-jinja2-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
··· Running (frame_methods.MaskBool.time_frame_mask_bools--).
··· frame_methods.MaskBool.time_frame_mask_bools           5.29±0.03ms

Tue Jun 29 14:37:42 2021    /tmp/tmpqkntmn4t

         10274 function calls (10261 primitive calls) in 0.007 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.007    0.007 {built-in method builtins.exec}
        1    0.000    0.000    0.007    0.007 /home/simon/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/asv/benchmark.py:540(method_caller)
        1    0.000    0.000    0.007    0.007 /home/simon/pandas/asv_bench/benchmarks/frame_methods.py:307(time_frame_mask_bools)
        1    0.000    0.000    0.007    0.007 /home/simon/pandas/asv_bench/env/11a1c20ede452de2525075dc4a15eb94/lib/python3.8/site-packages/pandas/core/generic.py:9116(mask)
        1    0.000    0.000    0.006    0.006 /home/simon/pandas/asv_bench/env/11a1c20ede452de2525075dc4a15eb94/lib/python3.8/site-packages/pandas/core/generic.py:8963(where)
        1    0.000    0.000    0.006    0.006 /home/simon/pandas/asv_bench/env/11a1c20ede452de2525075dc4a15eb94/lib/python3.8/site-packages/pandas/core/generic.py:8801(_where)
        5    0.000    0.000    0.005    0.001 /home/simon/pandas/asv_bench/env/11a1c20ede452de2525075dc4a15eb94/lib/python3.8/site-packages/pandas/core/internals/managers.py:380(apply)
        1    0.000    0.000    0.004    0.004 /home/simon/pandas/asv_bench/env/11a1c20ede452de2525075dc4a15eb94/lib/python3.8/site-packages/pandas/core/internals/managers.py:548(where)
        1    0.000    0.000    0.004    0.004 /home/simon/pandas/asv_bench/env/11a1c20ede452de2525075dc4a15eb94/lib/python3.8/site-packages/pandas/core/internals/blocks.py:1277(where)
      501    0.000    0.000    0.001    0.000 /home/simon/pandas/asv_bench/env/11a1c20ede452de2525075dc4a15eb94/lib/python3.8/site-packages/pandas/core/dtypes/common.py:1354(is_bool_dtype)
        1    0.001    0.001    0.001    0.001 {method 'take' of 'numpy.ndarray' objects}
        1    0.000    0.000    0.001    0.001 /home/simon/pandas/asv_bench/env/11a1c20ede452de2525075dc4a15eb94/lib/python3.8/site-packages/pandas/core/computation/expressions.py:239(where)
        1    0.000    0.000    0.001    0.001 /home/simon/pandas/asv_bench/env/11a1c20ede452de2525075dc4a15eb94/lib/python3.8/site-packages/pandas/core/computation/expressions.py:165(_where_numexpr)
        1    0.001    0.001    0.001    0.001 /home/simon/pandas/asv_bench/env/11a1c20ede452de2525075dc4a15eb94/lib/python3.8/site-packages/numexpr/necompiler.py:767(evaluate)
        1    0.000    0.000    0.001    0.001 /home/simon/pandas/asv_bench/env/11a1c20ede452de2525075dc4a15eb94/lib/python3.8/site-packages/pandas/core/internals/blocks.py:1351(<listcomp>)

I guess this is expected?
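
The two tables above are cProfile reports sorted by cumulative time. A minimal sketch of producing the same kind of report outside asv, using only the standard library, could look like the following. `profile_cumulative` and `stand_in_workload` are hypothetical names; the workload is a placeholder and would be replaced by the benchmark body in a pandas environment.

```python
import cProfile
import io
import pstats


def profile_cumulative(func, top=15):
    """Run func() under cProfile and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "cumulative" matches the "Ordered by: cumulative time" header in the reports
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


def stand_in_workload():
    # Placeholder only: swap in the real benchmark call to profile pandas
    return sorted(str(i) for i in range(10_000))


report = profile_cumulative(stand_in_workload)
print(report)
```

Running this on both commits and comparing the top entries gives the same kind of comparison as the two reports, without the asv environment setup.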

@jbrockmendel
Member Author

> I guess this is expected?

Can't comment on the size of the perf hit, but the existence is unsurprising since we're casting to object
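
A rough NumPy-only sketch of the trade-off behind that remark: masking a bool array with NaN either promotes to float64, which loses the True/False values, or goes to object, which keeps them but stores a Python object per cell. The array names and shapes here are assumptions for illustration, not the benchmark's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)
bools = rng.random((1000, 500)) > 0.5   # bool data, illustrative shape
mask = rng.random((1000, 500)) > 0.9    # positions to overwrite with NaN

# NumPy's own promotion: bool with NaN becomes float64, so True turns into 1.0
as_float = np.where(mask, np.nan, bools)

# Object result: True/False survive, but every cell is a boxed Python object
as_object = np.where(mask, np.nan, bools.astype(object))

print(as_float.dtype, as_object.dtype)
```

Object arrays cannot use NumPy's vectorized fast paths, so some slowdown after the change to object is plausible. The sketch makes no claim about its size.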

Successfully merging this pull request may close these issues.

API: setting np.nan into Series[bool]