API: Index and Array constructors design #23212
My opening statement below: In a lot of places, the caller has some information about the type of On having a flexible |
More concretely, on the PeriodArray PR, there is currently:
@TomAugspurger I think this can be simplified a bit: I would move |
Just did this locally. Will push once tests all finish. Right now I have
The PeriodArray takes
everything else raises. Things look quite nice from a code-clarity POV, and for future code reuse with DatetimeArray / TimedeltaArray. Specifically, I think removing |
That sounds good. Only, is the |
Yes, a typo, fixed. I wish we could really run MyPy on these types :) |
Private `_from_` methods for construction seem fine as long as they are well defined and limited; I don't want to see public `from_` methods. |
And do you see any in the discussion above? (I don't think the discussion is about this) |
No, that’s the point; I thought you were adding public methods. |
This discussion is about the different internal constructors (and also |
@jorisvandenbossche I understand. However, things get conflated and we end up talking about everything. For PeriodArray itself I think you can just adopt the current structure, meaning The physical values (ordinals) should be handled by a private `_from_` constructor. I think arrays of scalars can be converted by |
I'm finding it somewhat convenient, and not costly, to also accept Aside from that, it seems like we're in agreement. |
Yep, that seems OK to me. I just want these to be as consistent as possible across the EAs; the model we have now is `integer_array` (though `period_array` is doing more). |
Are we good with closing this? Should we wait till DatetimeArray is done? |
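A change I've implemented in a WIP DatetimeArray branch that I'd like to get OKed here: putting `dtype` in all of the DatetimeArray/TimedeltaArray/PeriodArray `__init__` methods. This makes code-sharing much easier: `type(self)(i8values, freq=self.freq, dtype=self.dtype)` works in all cases, in particular without a need to special-case DatetimeTZDtype. |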
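A minimal, pandas-free sketch of the idea of putting `dtype` in every `__init__`: because all subclasses accept the same keywords, a method on a shared base class can rebuild an instance of whichever subclass it is called on, without special-casing a tz-aware dtype. The class names, the `shift_raw` method and the string dtypes below are hypothetical stand-ins, not pandas' actual classes.

```python
from typing import List, Optional


class DatetimeLikeArrayBase:
    """Hypothetical shared base class (not pandas' real mixin)."""

    _default_dtype = "object"

    def __init__(self, values: List[int], dtype: Optional[str] = None,
                 freq: Optional[str] = None, copy: bool = False):
        # No inference: values are assumed to already be i8 integers.
        self._data = list(values) if copy else values
        self.dtype = dtype if dtype is not None else self._default_dtype
        self.freq = freq

    def shift_raw(self, n: int) -> "DatetimeLikeArrayBase":
        # Shared code path. Every subclass accepts the same keywords, so the
        # result is rebuilt the same way whatever the dtype is (including a
        # tz-aware one), with no per-class special case.
        i8values = [v + n for v in self._data]
        return type(self)(i8values, freq=self.freq, dtype=self.dtype)


class ToyDatetimeArray(DatetimeLikeArrayBase):
    _default_dtype = "datetime64[ns]"


class ToyTimedeltaArray(DatetimeLikeArrayBase):
    _default_dtype = "timedelta64[ns]"


class ToyPeriodArray(DatetimeLikeArrayBase):
    _default_dtype = "period[D]"


# Usage: the shared method returns an instance of the calling subclass
# and carries dtype and freq over unchanged.
tz_aware = ToyDatetimeArray([0, 1], dtype="datetime64[ns, UTC]", freq="D")
shifted = tz_aware.shift_raw(1)
periods = ToyPeriodArray([10]).shift_raw(2)
```

The point of the uniform signature is that `shift_raw` is written once; without it, each subclass would need its own override or the base class would need to branch on the dtype.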
That's fine. Should PeriodArray be refactored to remove freq, and instead take a dtype? Then we'd have a consistent signature:
- data (ndarray)
- dtype (PeriodDtype, DatetimeTZDtype, datetime64[ns])
- copy
On Fri, Nov 2, 2018 at 2:23 PM jbrockmendel wrote:
A change I've implemented in a WIP DatetimeArray branch that I'd like to
get OKed here: putting dtype in all of the
DatetimeArray/TimedeltaArray/PeriodArray __init__ methods.
This makes code-sharing much easier, we type(self)(i8values,
freq=self.freq, dtype=self.dtype) works in all cases, in particular
without a need to special-case DatetimeTZDtype.
|
I don't think so; the others still take both freq and dtype. |
Mmm fair point.
|
Great. I'll make a PR for this edit after #23433 goes through. |
If we want to keep the constructors simple, and not have a bunch of redundant / overlapping keywords, we could also have a specific constructor for this path that can be used in the shared functionality? |
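One possible reading of this suggestion, sketched under assumptions: keep the public `__init__` free of extra keywords and give shared code a single private classmethod that takes already-validated i8 values plus the metadata. `_from_i8` and `_with_new_values` are hypothetical names chosen for illustration, and the classes are toys rather than pandas code.

```python
from typing import List, Optional


class ToyDatetimeArray:
    def __init__(self, values: List[int], dtype: str = "datetime64[ns]"):
        # Public constructor: small signature, validates its input.
        if not all(type(v) is int for v in values):
            raise TypeError("values must be i8 integers")
        self._data = list(values)
        self.dtype = dtype
        self.freq: Optional[str] = None

    @classmethod
    def _from_i8(cls, values: List[int], dtype: str,
                 freq: Optional[str] = None) -> "ToyDatetimeArray":
        # Hypothetical private fast path for internal code: no validation,
        # no copy, and it can set attributes the public __init__ does not take.
        obj = cls.__new__(cls)
        obj._data = values
        obj.dtype = dtype
        obj.freq = freq
        return obj

    def _with_new_values(self, i8values: List[int]) -> "ToyDatetimeArray":
        # Shared helper: always rebuilds through the private path, carrying
        # over dtype and freq from self.
        return type(self)._from_i8(i8values, dtype=self.dtype, freq=self.freq)


class ToyTimedeltaArray(ToyDatetimeArray):
    def __init__(self, values: List[int], dtype: str = "timedelta64[ns]"):
        super().__init__(values, dtype=dtype)
```

The trade-off is that the fast path bypasses `__init__`, so any invariant `__init__` would check has to be guaranteed by the internal callers instead.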
I've been asked to reiterate my thoughts here.
I see this as the default, with deviations from it needing justification, not the other way around. In #23493 Tom wrote:
Is it safe to assume that this near-consensus applies to ... why would this be the goal? You want a constructor that is no-inference and no-copy? Congratulations, you've just invented |
I see this as the default, with deviations from it needing justification, not the other way around.
I (and several other maintainers) appreciate the flexibility of Series, but have grown to dislike the edge cases, difficult-to-understand code, and performance costs associated with it. We'll still need the complex code in whatever function is going to do the complex work of taking user-provided input and wrangling it into an array, but we don't need to pay that cost every time we create an array.
This policy makes private the most useful thing and public the least useful thing.
How so? The user-friendly API will still be available via `pd.array()`. While things like `TimedeltaArray.__init__` are of course public, they won't be recommended for use.
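A minimal sketch of the split described here: the forgiving, inference-heavy logic lives in one top-level function, while the class constructors stay strict. The `array` function below is a hypothetical stand-in in the spirit of `pd.array()`, not its implementation; `_REGISTRY` and the toy classes are also hypothetical, and only naive `datetime`/`timedelta` scalars at microsecond precision are handled.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List, Optional, Sequence

EPOCH = datetime(1970, 1, 1)
ONE_MICROSECOND = timedelta(microseconds=1)


class ToyDatetimeArray:
    def __init__(self, values: List[int]):
        # Strict constructor: only i8 nanoseconds since the epoch.
        if not all(type(v) is int for v in values):
            raise TypeError("ToyDatetimeArray expects i8 integers")
        self._data = values

    @classmethod
    def _from_sequence(cls, scalars: Sequence[datetime]) -> "ToyDatetimeArray":
        # Scalar conversion lives here, not in __init__ (naive datetimes only).
        return cls([(s - EPOCH) // ONE_MICROSECOND * 1000 for s in scalars])


class ToyTimedeltaArray:
    def __init__(self, values: List[int]):
        # Strict constructor: only i8 nanoseconds.
        if not all(type(v) is int for v in values):
            raise TypeError("ToyTimedeltaArray expects i8 integers")
        self._data = values

    @classmethod
    def _from_sequence(cls, scalars: Sequence[timedelta]) -> "ToyTimedeltaArray":
        return cls([s // ONE_MICROSECOND * 1000 for s in scalars])


_REGISTRY: Dict[str, Any] = {
    "datetime64[ns]": ToyDatetimeArray,
    "timedelta64[ns]": ToyTimedeltaArray,
}


def array(data: Sequence[Any], dtype: Optional[str] = None):
    """Forgiving entry point: infers the dtype when it is not given."""
    if dtype is None:
        if data and all(isinstance(x, datetime) for x in data):
            dtype = "datetime64[ns]"
        elif data and all(isinstance(x, timedelta) for x in data):
            dtype = "timedelta64[ns]"
        else:
            raise TypeError("could not infer a dtype for the given data")
    # Dispatch to the strict class via its scalar-conversion constructor.
    return _REGISTRY[dtype]._from_sequence(data)
```

With this layout, `array([datetime(1970, 1, 1, 0, 0, 1)])` infers the dtype and returns a `ToyDatetimeArray`, while calling `ToyDatetimeArray` directly with anything but integers fails fast.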
FYI: IntervalIndex only accepts a sequence of Intervals. It doesn't attempt any kind of fancy / forgiving initialization (e.g. from a list of tuples).
On Thu, Nov 8, 2018 at 5:36 PM jbrockmendel wrote:
I've been asked to reiterate my thoughts here.
Series(...) is very forgiving/smart about what it accepts. So is DataFrame.
So is Index and DatetimeIndex and PeriodIndex and ... (less so for
MultiIndex and I'm honestly not sure about IntervalIndex, but the
pattern/policy is clear). If pandas-provided EAs are first-class pandas
classes, they should behave in the same user-friendly way as everything
else.
I see this as the default, with deviations from it needing justification,
not the other way around.
In #23493 <#23493> Tom wrote:
I thought we were all on board with the goal of the
DatetimelikeArray.__init__ being no inference and no copy.
Is it safe to assume that this near-consensus applies to TimedeltaArray
and PeriodArray as well? How about other future pandas-provided EAs? For
the moment I'm going to assume that it applies to DTA/TDA/PA.
... why would this be the goal? You want a constructor that is
no-inference and no-copy? Congratulations, you've just invented
_simple_new. You want _from_sequence to be the all-purpose any-sequence
constructor? Everywhere else we call that __init__/__new__. This policy
makes private the most useful thing and public the least useful thing. It
makes zero sense to me.
|
Is there anything left to do/discuss here? |
Seems like this issue has evolved to encompass a lot of topics and has since stalled. I think it's best if separate issues are opened for specific follow-ups that need tackling. Closing. |
This issue splits off the discussion on the constructors from #23185, to have a more focused discussion about it here. It also continues the discussion we were having in https://github.com/pandas-dev/pandas/pull/23140/files#r225218594
So the topic of this issue: what should the different constructors look like for the internal EAs and for the Index classes based on those EAs (specifically the datetimelike ones)?
Index constructors
I think for the Index constructors, there is not that much to discuss.
We have:
- the default `Index(..)` (`__new__` or `__init__`): this is quite overloaded for some of the index classes, but that's the way it is now since they are exposed to the user.
- `_simple_new`: I think we agree that for those (from Tom's comment here: REF: Simplify Period/Datetime Array/Index constructors #23093 (review)), it should basically simply get the EA array and potentially a name.
- `_shallow_copy` and `_shallow_copy_with_infer` might need another look to propose something.
Array constructors
The default Index constructors mix a lot of different things (which is partly what led to the suite of other constructors), and I personally don't think this is something we necessarily need to repeat for the Array constructors.
In the discussion related to
Each Array type might have its own specific constructors (like we have `IntervalArray.from_breaks` and others), but I think that in the discussion we were having in https://github.com/pandas-dev/pandas/pull/23140/files#r225218594, there are three clearly defined use cases that are generic for the different dtypes. Constructing from:
For this last item, we already have `_from_sequence` for exactly this as part of the EA interface.
So one option is to simply accept all three of those things in the main Array `__init__`; another option is to have separate constructors for them. I think this is what the discussion is mainly about?
I see the following advantages of keeping them separate (or at least keeping the third item separate):
`_from_sequence`, `_from_factorized`, so we cannot use it anyway in places that need to deal with EAs in general
Also note that this is basically what we have for the new IntegerArray. Its `__init__` only accepts an ndarray of integers + mask, and there is a separate function `integer_array` that provides a more general-purpose constructor (from list, from floats, detecting NaNs as missing values, etc.), which is then used in `_from_sequence`.
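A simplified, pandas-free sketch of the IntegerArray pattern described above: `__init__` only takes already-typed integer values plus a boolean mask, a separate forgiving function does the conversion (detecting `None`/NaN as missing, casting integral floats), and `_from_sequence` just delegates to that function. `ToyIntegerArray`, `toy_integer_array` and `_coerce` are hypothetical names, and plain lists stand in for ndarrays; the real `integer_array` handles more input types.

```python
import math
from typing import Any, List, Sequence, Tuple


class ToyIntegerArray:
    def __init__(self, values: List[int], mask: List[bool]):
        # Strict: no inference, only validation of already-prepared inputs.
        if len(values) != len(mask):
            raise ValueError("values and mask must have the same length")
        if not all(type(v) is int for v in values):
            raise TypeError("values must be ints; use toy_integer_array() to convert")
        self._data = values
        self._mask = mask

    @classmethod
    def _from_sequence(cls, scalars: Sequence[Any]) -> "ToyIntegerArray":
        # The EA-interface constructor delegates to the forgiving function.
        return toy_integer_array(scalars)

    def tolist(self) -> List[Any]:
        # Masked positions are reported as None.
        return [None if m else v for v, m in zip(self._data, self._mask)]


def _coerce(x: Any) -> Tuple[int, bool]:
    """Return (value, is_missing) for one scalar."""
    if x is None or (isinstance(x, float) and math.isnan(x)):
        return 0, True  # placeholder value hidden by the mask
    if isinstance(x, float):
        if not x.is_integer():
            raise TypeError(f"cannot safely cast non-integer float {x!r}")
        return int(x), False
    if isinstance(x, int) and not isinstance(x, bool):
        return x, False
    raise TypeError(f"cannot convert {x!r} to an integer")


def toy_integer_array(scalars: Sequence[Any]) -> ToyIntegerArray:
    """Forgiving constructor: accepts ints, integral floats, None and NaN."""
    pairs = [_coerce(x) for x in scalars]
    values = [v for v, _ in pairs]
    mask = [m for _, m in pairs]
    return ToyIntegerArray(values, mask)
```

For example, `ToyIntegerArray._from_sequence([1, 2.0, None, float("nan")])` goes through `toy_integer_array` and yields values `[1, 2, None, None]` when read back with `tolist()`, while `ToyIntegerArray([1.5], [False])` is rejected by the strict `__init__`.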
.cc @TomAugspurger @jreback @jbrockmendel