Inconsistent behaviour when calling apply() on a categorical column with missing data #20714
Thanks for the report. This is partly a symptom of (or related to) #15706, but that one is more of an API issue, whereas this is an actual bug. If the result of mapping over the categories isn't unique, we take against them, but are using pandas/pandas/core/arrays/categorical.py Line 1156 in 4a34497
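The referenced snippet from categorical.py did not survive here, so the following is a hedged, pure-Python model of the failure. It assumes (this is not shown in the thread) that the non-unique fallback takes the mapped categories by code, as np.take(new_categories, codes) would, and that missing values are stored as code -1. The helper names are hypothetical, and only the fallback path is modelled.

```python
def map_categorical_buggy(categories, codes, mapper):
    """Hypothetical model of the assumed pre-fix fallback: map the
    categories, then take the mapped values by code.

    Missing values are stored as code -1. Python list indexing (like np.take
    in its default mode) treats -1 as "last element", so a missing row
    silently receives the mapped value of the last category.
    """
    new_categories = [mapper(c) for c in categories]
    return [new_categories[code] for code in codes]


def map_categorical_fixed(categories, codes, mapper):
    """Hypothetical model of one possible fix: append NaN to the mapped
    categories so that code -1 resolves to a missing value."""
    new_categories = [mapper(c) for c in categories]
    if any(code == -1 for code in codes):
        new_categories.append(float("nan"))
    return [new_categories[code] for code in codes]


def first_part(x):
    # Part of the string before the hyphen.
    return x.split("-")[0]


categories = ["1-1", "1-2"]  # categories of s2 from the report
codes = [0, 1, -1]           # rows: '1-1', '1-2', NaN

print(map_categorical_buggy(categories, codes, first_part))  # ['1', '1', '1']
print(map_categorical_fixed(categories, codes, first_part))  # ['1', '1', nan]
```

Under these assumptions, a missing row picks up the mapped value of the last category rather than of the previous row; in s2 the two happen to coincide. Appending a missing value before taking is one possible fix, not necessarily the one that was merged.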
Yes, please do!
On Tue, Apr 17, 2018, 10:42 PM Paula wrote:
Hey @chris-b1, can I work on this bug?
Hi @nprad, I'm participating in a sprint tomorrow and I plan to work on this issue. Please don't be discouraged: there are more than 2000 issues waiting to be worked on, and you will surely find something else you're interested in. Also, if for any reason I'm unable to work on this issue, I will ping you, but as I said, it has been in my plan.
Hello @chris-b1, I currently have the result below. Should
Yes, that's consistent with the current API. There's certainly an argument for changing it (#15706), but for just fixing the bug, that behavior is fine.

In [4]: s1 = pd.Series(['1-1','1-2'], dtype='category')

In [5]: s1.apply(lambda x: x.split('-')[0])
Out[5]:
0    1
1    1
dtype: object
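The dtype point can be sketched the same way. A pandas Categorical requires unique categories, so when two categories map to the same value the result cannot simply re-use the codes and has to be materialised, which is consistent with the object dtype shown above. The helper below is hypothetical and only models that decision, not pandas' actual implementation.

```python
def map_result_kind(categories, mapper):
    """Hypothetical helper: report whether mapping the categories could keep
    the categorical dtype.

    Categories must be unique, so if two categories map to the same value
    the codes cannot be re-used and the result is materialised instead
    (object dtype in the example above).
    """
    new_categories = [mapper(c) for c in categories]
    if len(set(new_categories)) == len(new_categories):
        return "category", new_categories
    return "object", new_categories


def first_part(x):
    # Part of the string before the hyphen.
    return x.split("-")[0]


print(map_result_kind(["1-1", "1-2"], first_part))  # ('object', ['1', '1'])
print(map_result_kind(["1-1", "2-1"], first_part))  # ('category', ['1', '2'])
```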
@chris-b1 great! I will write the tests based on that. I will also take a look at the older issue you mentioned.
Hey @ladydata, did you write those tests? If not, I'm up for taking on the task.
Hey @manuhortet, yes, I did write a couple of tests but haven't finished yet. I will set a deadline to complete them by the end of the month, and if it's not done by then, it's all yours from August 1st. Sounds good?
Hi again @ladydata, should I take this already? 😄
@manuhortet sure, go ahead!
…ith missing data pandas-dev#20714 SOLVED
Code Sample, a copy-pastable example if possible
Problem description
In the above code, s1 shows the expected behaviour. We are trying to transform a categorical series by taking the part before the hyphen, and for rows where the original value is NaN the output is also NaN.

The series s2 shows the unexpected behaviour. Note that there is only a single change from the original series: the middle value has changed from '1-1' to '1-2'. The third value, which was NaN in the original series, now becomes '1' in the output rather than staying as NaN. Also, the dtype of the result series is now object rather than category. It looks like the NaN is somehow getting the applied value of the previous row.

Expected Output
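The original code sample and expected output did not survive this copy of the issue. The following is a hedged reconstruction from the problem description: the exact values are assumptions (s1 = ['1-1', '1-1', NaN], with s2 differing only in its middle value). On a pandas version that includes a fix, the missing value should stay missing in both results; on the reported pandas 0.22, r2 instead ends in '1' with object dtype.

```python
import numpy as np
import pandas as pd

# Reconstructed from the problem description; the exact values are assumptions.
s1 = pd.Series(["1-1", "1-1", np.nan], dtype="category")
s2 = pd.Series(["1-1", "1-2", np.nan], dtype="category")  # only the middle value differs


def first_part(x):
    # Part of the string before the hyphen.
    return x.split("-")[0]


r1 = s1.apply(first_part)
r2 = s2.apply(first_part)

print(r1)
print(r2)
```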
Output of pd.show_versions()

INSTALLED VERSIONS
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-38-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 38.4.0
Cython: None
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.1
openpyxl: None
xlrd: None
xlwt: 1.3.0
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None