Indicate index of rows for which an apply() statement fails #614
Comments
I like it, and it's an easy change too. Here is a self-contained example:
then the traceback would look like:
I also noticed that if you apply a function to the rows of a mixed-type DataFrame (as in the example above), you lose the type information. I added a small type-inference hack to convert things back.
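To illustrate the type loss described above, here is a minimal sketch. The column names are made up, and `infer_objects()` is a method from present-day pandas used as a stand-in; it is not necessarily the hack referred to in this comment.

```python
import pandas as pd

# Hypothetical mixed-type frame: one string column, one float column.
df = pd.DataFrame({"name": ["a", "b"], "value": [1.5, 2.5]})

# Each row handed to the function is a Series holding both a string and a
# float, so it falls back to the generic object dtype.
row_dtypes = df.apply(lambda row: str(row.dtype), axis=1).tolist()

# Simulate the loss of type information, then let pandas re-infer the types.
as_objects = df.astype(object)
restored = as_objects.infer_objects()

print(row_dtypes)
print(as_objects["value"].dtype, "->", restored["value"].dtype)
```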
Aside: it'd be nice to add vectorized string functions to pandas, similar to Hadley's stringr package. They could also be made NA-friendly.
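For readers on a current pandas release: vectorized string methods of this kind are available through the `Series.str` accessor. The sketch below uses made-up data and assumes a recent pandas version; the `na` argument chooses what a missing value maps to.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value.
s = pd.Series(["apple", np.nan, "banana"])

# Vectorized startswith; na=False treats the missing entry as "no match"
# instead of raising AttributeError on a float.
mask = s.str.startswith("a", na=False)

print(mask.tolist())
```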
Great! I should note that, on looking into the problem a bit further, the float values turned out to be np.nan, generated when I imported a CSV with some blank values in a column. In other words, I suspect anyone applying string functions to a column with missing values will hit this issue. Which brings me to another potential improvement: verbose data import that reports the number of missing values automatically filled in.
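A small sketch of how this comes about, using an inline CSV with invented column names and values: the blank field in the string column is read as a missing value, and calling a `str` method on it element by element then fails.

```python
import io

import pandas as pd

# Hypothetical CSV; the second row has a blank "name" field.
csv_text = "id,name\n1,alpha\n2,\n3,gamma\n"
df = pd.read_csv(io.StringIO(csv_text))

# The blank field is read as a missing value (NaN), not as an empty string.
missing_mask = df["name"].isna().tolist()

# An element-wise str method then hits the missing value and raises.
error = None
try:
    df["name"].apply(lambda x: x.startswith("a"))
except AttributeError as exc:
    error = exc

print(missing_mask)
print(error)
```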
"Aside: it'd be nice to add vectorized string functions to pandas, similar to hadley's stringr package. They could also be made NA-friendly"
…values filled in non-numeric columns per comment on #614
@hammer, OK I'll bite on that (this would have been useful information to me in the past). Hard to add a lot of verbosity without sacrificing performance but getting a basic NA count for non-numeric columns seems pretty useful:
looks like
Something that can definitely be fleshed out over time.
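The same kind of count can be computed after the fact. This is a rough sketch only: `count_na_non_numeric` is a hypothetical helper and the data is invented; it is not the import option discussed in this comment.

```python
import io

import pandas as pd


def count_na_non_numeric(df):
    """Return the number of missing values in each non-numeric column."""
    non_numeric = df.select_dtypes(exclude="number")
    return non_numeric.isna().sum()


# Hypothetical CSV with blanks in two string columns.
csv_text = "n,name,city\n1,alpha,\n2,,Oslo\n3,gamma,\n"
df = pd.read_csv(io.StringIO(csv_text))

na_counts = count_na_non_numeric(df).to_dict()
print(na_counts)
```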
I'm writing some code to transform a column of my data frame. I expect all string values in this column, and I'm using x.startswith() on the column contents as part of the transformation logic. When I try to apply this transformation to each column in the data frame using df.apply(), the transformation is failing, claiming that it's trying to look up the startswith attribute on a float object. It would be useful for me to know on which row this transformation is failing; could that information be added to the traceback?
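Until the traceback carries that information, one workaround is to wrap the element-wise call and re-raise with the index label attached. This is a minimal sketch: `apply_with_row_context` is a hypothetical helper and the data is invented; it is not part of pandas.

```python
import numpy as np
import pandas as pd


def apply_with_row_context(series, func):
    """Apply func to each element; on failure, name the offending index label."""
    out = []
    for label, value in series.items():
        try:
            out.append(func(value))
        except Exception as exc:
            # Chain the original exception so its traceback is preserved.
            raise ValueError(f"func failed at index {label!r}: {exc}") from exc
    return pd.Series(out, index=series.index)


# Hypothetical column that should hold only strings but contains a NaN.
s = pd.Series(["apple", np.nan, "avocado"], index=["r0", "r1", "r2"])

message = None
try:
    apply_with_row_context(s, lambda x: x.startswith("a"))
except ValueError as err:
    message = str(err)

print(message)
```

A plain Python loop like this is slower than a vectorized call, so it is best kept for debugging.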