Disable .str-accessor for byte data #23011
Labels: API Design; Error Reporting (incorrect or improved errors from pandas); Strings (string extension data type and string data)
This supersedes #22721.
Pandas is trying to straddle many different chasms, which leads to undesirable behaviour on the fringes. For the purpose of this issue, I'm talking mainly about
From the first point, we have the inconsistent handling of str vs. bytes, so having the Series-concatenator work with bytes is a necessity in Python 2.
Mostly due to the second point, there's no proper string dtype; it's just hiding in the `object` dtype. I started #22721 as a side issue that came up while refactoring in #22725. Then I got told that:

However, it already works: the `Series.str` accessor already checks that it is only called on an object column, but there's not much more it can do (not least because inspecting every element of a Series would be very expensive). Consequently, `.str.cat`
currently does work on bytes data, and easily at that:

Long story short: this issue supersedes #22721, and should serve as a long-term goal to disable `.str` for bytes data once Python 2 gets dropped and/or there is a string dtype.
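
As a rough illustration of why the dtype check mentioned above cannot separate text from bytes, here is a minimal sketch. The variable names are made up for this example, and `pandas.api.types.infer_dtype` is used only to show what inspecting the elements would reveal; it is not necessarily how the accessor should be implemented.

```python
import pandas as pd
from pandas.api.types import infer_dtype

# Two hypothetical columns: one holding text, one holding raw bytes.
# dtype=object is passed explicitly so both are stored as plain object columns.
text = pd.Series(["a", "b"], dtype=object)
raw = pd.Series([b"a", b"b"], dtype=object)

# Both report the same dtype, so a dtype check alone cannot tell them apart.
same_dtype = text.dtype == raw.dtype

# Looking at the values does distinguish them, but requires a pass over the data.
text_kind = infer_dtype(text, skipna=True)
raw_kind = infer_dtype(raw, skipna=True)

print(same_dtype, text_kind, raw_kind)
```

The point of the sketch is only that the distinction lives in the elements, not in the column's dtype, which is why a dedicated string dtype would make disabling `.str` for bytes straightforward.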