Naming Conventions: "_data", "data", "values", "_values" #19294

Closed
jbrockmendel opened this issue Jan 18, 2018 · 4 comments
Labels
Deprecate (Functionality to remove in pandas), Internals (Related to non-user accessible pandas implementation)

Comments

@jbrockmendel
Member

jbrockmendel commented Jan 18, 2018

A bunch of different classes have one or more of the attributes data, _data, values, _values, plus an assortment of external_values, internal_values, formatting_values, get_values. These mean different things in different places.

Maintenance would be easier if the naming conventions were more uniform. Index has all four of these attributes, and I'm not sure there exists a nice backwards-compatible way to reconcile them with the naming in Series/DataFrame. Any thoughts? Does anyone else think this matters?

(Motivating example: "Where are all the places in the code that touch a BlockManager. Let's just grep for \.data...")

The lowest-hanging fruit for cleanup here is in the Accessor classes. StringAccessor, SeriesPlotMethods, and FramePlotMethods all define _data to point back to their parent Series/Index, Series, and DataFrame, respectively. I suggest that _data be replaced with just _parent. The other two existing accessors, CategoricalAccessor and CombinedDatetimelikeProperties, use categories and values for this, respectively. Ideally these would get standardized to _parent in the process.
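To make the accessor suggestion concrete, here is a minimal sketch of an accessor that stores its owner as _parent rather than _data. ToySeries and ToyStringAccessor are hypothetical stand-ins written for illustration; they are not the pandas classes and do not reflect the actual pandas implementation.

```python
class ToySeries:
    """Hypothetical stand-in for a Series; not the pandas class."""

    def __init__(self, values):
        self._values = list(values)

    @property
    def str(self):
        # Hand the accessor a reference back to the object that owns it.
        return ToyStringAccessor(self)


class ToyStringAccessor:
    """Hypothetical accessor that keeps its owner under ``_parent``."""

    def __init__(self, parent):
        # ``_parent`` says what the attribute is: the owning object,
        # not a BlockManager or a raw array.
        self._parent = parent

    def upper(self):
        return ToySeries(v.upper() for v in self._parent._values)


s = ToySeries(["a", "b"])
result = s.str.upper()
```

With this naming, a search for _data no longer turns up accessor code, and every accessor refers to its owner the same way.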

Another option would be to change NDFrame._data to something like NDFrame._mgr so there is little risk of name overlap. I expect this would meet more resistance than the accessor cleanup idea.
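The grep problem from the motivating example can be illustrated with a small sketch. The source snippets below are made up for illustration (they are not lines from pandas), and the _mgr name is only the suggestion from this comment.

```python
import re

# Made-up lines in which "data" means three different things:
# a BlockManager, an ndarray buffer, and an accessor's parent object.
before = """\
self._data = mgr
buf = self.values.data
plot._data = series
return self._data.blocks
"""

# The same lines after a hypothetical rename of the BlockManager attribute.
after = """\
self._mgr = mgr
buf = self.values.data
plot._data = series
return self._mgr.blocks
"""

# Searching for ".data" / "._data" matches every line, relevant or not.
data_hits = re.findall(r"\._?data\b", before)

# Searching for the dedicated name matches only the BlockManager uses.
mgr_hits = re.findall(r"\._mgr\b", after)
```

In this toy input, the first search returns four matches of which only two touch the manager, while the second returns exactly those two.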

@jreback
Contributor

jreback commented Jan 18, 2018

well .values is an external property. I don't think there should be any usage of .data

@jbrockmendel
Member Author

> well .values is an external property. I don't think there should be any usage of .data

Yah, .values is pretty set-in-stone, though there may be some non-external cases like DatetimeProperties.values that could be avoided.

.data looks like it usually is the IndexOpsMixin.data property which points to self.values.data. There are a bunch of other places that set self.data = (just based on grep) that could be given more informative names if they are not public-facing.

@chris-b1
Contributor

Index.data and Series.data are likely leftovers from ndarray subclassing / compat - I'd think they could be deprecated and removed.
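One common way to deprecate an attribute like this is to keep it as a property that emits a warning before returning the old value. The sketch below assumes a toy class, ToyIndex, and an illustrative warning message; it is not the pandas implementation and does not show what pandas actually did.

```python
import warnings


class ToyIndex:
    """Hypothetical stand-in for an Index; not the pandas class."""

    def __init__(self, values):
        self._values = list(values)

    @property
    def data(self):
        # Keep the old attribute working, but tell callers it is going away.
        warnings.warn(
            "ToyIndex.data is deprecated and will be removed in a future version",
            FutureWarning,
            stacklevel=2,
        )
        return self._values


# Record warnings so the deprecation can be inspected instead of printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = ToyIndex([1, 2, 3]).data
```

Existing callers keep working during the deprecation period, and the property can be deleted outright once the warning has been in a release or two.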

@jbrockmendel
Member Author

Subsumed by #19658
