Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PeriodIndex.ravel returns ndarray[int64] #19956

Closed
jbrockmendel opened this issue Mar 1, 2018 · 24 comments · Fixed by #36900
Closed

PeriodIndex.ravel returns ndarray[int64] #19956

jbrockmendel opened this issue Mar 1, 2018 · 24 comments · Fixed by #36900
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. Index Related to the Index class or subclasses Period Period data type

Comments

@jbrockmendel
Copy link
Member

This came up while trying to de-duplicate some DataFrame vs Series arithmetic logic in core.ops.

@gfyoung
Copy link
Member

gfyoung commented Mar 2, 2018

That seems consistent with the docstring?

@gfyoung gfyoung added Indexing Related to indexing on series/frames, not to indexes themselves Period Period data type labels Mar 2, 2018
@jbrockmendel
Copy link
Member Author

Sure, but it's not exactly useful behavior.

@gfyoung
Copy link
Member

gfyoung commented Mar 2, 2018

Why not?

@jbrockmendel
Copy link
Member Author

jbrockmendel commented Mar 3, 2018

The relevant context is the ongoing effort to ensure that Index/Series/DataFrame operations are consistent with one another. The Index vs Series part is most of the way there, and I'm just starting in on Series vs DataFrame.

In core.ops, Series and DataFrame arithmetic operations have really similar error catching logic:

Series:

        except TypeError:
            if isinstance(y, (np.ndarray, ABCSeries, pd.Index)):
                dtype = find_common_type([x.dtype, y.dtype])
                result = np.empty(x.size, dtype=dtype)
                mask = notna(x) & notna(y)
                result[mask] = op(x[mask], com._values_from_object(y[mask]))
            else:
                assert isinstance(x, np.ndarray)
                result = np.empty(len(x), dtype=x.dtype)
                mask = notna(x)
                result[mask] = op(x[mask], y)

DataFrame:

        except TypeError:
            xrav = x.ravel()
            if isinstance(y, (np.ndarray, ABCSeries)):
                dtype = np.find_common_type([x.dtype, y.dtype], [])
                result = np.empty(x.size, dtype=dtype)
                yrav = y.ravel()
                mask = notna(xrav) & notna(yrav)
                xrav = xrav[mask]

                if yrav.shape != mask.shape:
                    # FIXME: GH#5284, GH#5035, GH#19448
                    # Without specifically raising here we get mismatched
                    # errors in Py3 (TypeError) vs Py2 (ValueError)
                    raise ValueError('Cannot broadcast operands together.')

                yrav = yrav[mask]
                if xrav.size:
                    with np.errstate(all='ignore'):
                        result[mask] = op(xrav, yrav)

            elif isinstance(x, np.ndarray):
                # mask is only meaningful for x
                result = np.empty(x.size, dtype=x.dtype)
                mask = notna(xrav)
                xrav = xrav[mask]
                if xrav.size:
                    with np.errstate(all='ignore'):
                        result[mask] = op(xrav, y)

Very nearly the only differences here are a) find_common_type vs np.find_common_type and a ravel(). if it were the case that ravel() were reliably a no-op for the 1-D case in the Series operation, that would go a long way towards letting us de-duplicate some code.

@gfyoung
Copy link
Member

gfyoung commented Mar 3, 2018

Ah, interesting...got it. Well, this is an API change, so I'll open it up to @jreback @jorisvandenbossche . Is there any reason why we always returned ndarray ?

@jbrockmendel
Copy link
Member Author

@TomAugspurger possibly change this behavior in a follow up to PeriodArray?

@TomAugspurger
Copy link
Contributor

Is the expected behavior an ndarray[object] with period object?

This could be done any time by defining PeriodIndex.ravel as self.values.ravel() instead of inheriting the base implementation.

@jbrockmendel
Copy link
Member Author

For PeriodArray I’d expect it to return a 1-dim-reshaped view on itself. So a no-op for EA since those are 1-dim to begin with.

@TomAugspurger
Copy link
Contributor

That'll need a larger discussion then, since the other EA-backed indexes all return ndarrays for ravel.

@jorisvandenbossche
Copy link
Member

Also for Series, ravel returns an ndarray.

I think there is also something to be said for ravel always to be an ndarray.
Or to simply deprecate the method, since if you want a ravelled ndarray, you can also do the much more explicit np.asarray(val).ravel().

The question is: what do people use it for? It might be that it is used in cases where afterwards the code expects to now have a numpy array.

@jbrockmendel
Copy link
Member Author

The question is: what do people use it for?

The place that motivated this issue is in core.ops where we currently call ravel() on most things but specifically avoid calling it on PeriodIndex. It's a pretty ugly special case.

@jorisvandenbossche
Copy link
Member

And what's the reason we call it there? What do we want to end up with there?

@jbrockmendel
Copy link
Member Author

And what's the reason we call it there? What do we want to end up with there?

In that case we are calling right.ravel() because right is an array-like and we want to work with a flattened version of that array.

@jorisvandenbossche
Copy link
Member

And what do you mean with 'array-like' in this case? I assume we know what types it can be?
(The code you posted above is a bit out of context, and the original links you have there don't point anymore to the good location)

@jbrockmendel
Copy link
Member Author

And what do you mean with 'array-like' in this case?

https://github.com/pandas-dev/pandas/blob/master/pandas/core/ops.py#L842

isinstance(y, (np.ndarray, ABCSeries, ABCIndexClass)) (presumably we'll let through PeriodArray etc before long)

        # PeriodIndex.ravel() returns int64 dtype, so we have
        # to work around that case.  See GH#19956
        yrav = y if is_period_dtype(y) else y.ravel()
        mask = notna(xrav) & notna(yrav)

@jorisvandenbossche
Copy link
Member

But if the object is array, Series or Index, why do we want to ravel? Because all those are already 1D? (or the array can be 2D?)

@TomAugspurger
Copy link
Contributor

I think just checking if y.ndim > 1: yrav = y.ravel() is the way to go. Not so elegant, but it avoids the discussion of what "ravel" means for EA-backed Series, and inconsistencies between ndarray and ea-backed series.

@jorisvandenbossche
Copy link
Member

Anyway, mainly what I am trying to ask about: in this specific case that motivated the proposal, there might be a very easy way to also avoid duplicated code, without needing ravel ?
We can simply check the dimension for example?

@jorisvandenbossche
Copy link
Member

Well, so exactly what @TomAugspurger said :-)

@jbrockmendel
Copy link
Member Author

But if the object is array, Series or Index, why do we want to ravel? Because all those are already 1D? (or the array can be 2D?)

The array can be 2D.

I think just checking if y.ndim > 1: yrav = y.ravel() is the way to go.

This is fine. The extant code also works fine.

but it avoids the discussion of what "ravel" means for EA-backed Series

That is pretty much what the Issue is about. We can certainly stick a pin in this, but long-term the method's behavior should match the standard usage of the term "ravel"

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Oct 26, 2018 via email

@toobaz toobaz added Index Related to the Index class or subclasses and removed Indexing Related to indexing on series/frames, not to indexes themselves labels Jun 29, 2019
@jreback jreback added this to the 1.1 milestone Mar 14, 2020
@jorisvandenbossche jorisvandenbossche added the ExtensionArray Extending pandas with custom dtypes or arrays. label Mar 30, 2020
@jorisvandenbossche jorisvandenbossche removed this from the 1.1 milestone Mar 30, 2020
@mroeschke mroeschke added the Bug label May 5, 2020
@jreback jreback added this to the 1.2 milestone Oct 6, 2020
@jorisvandenbossche jorisvandenbossche added API Design Blocker Blocking issue or pull request for an upcoming release and removed Bug labels Oct 12, 2020
@jreback jreback modified the milestones: 1.2, Contributions Welcome Nov 19, 2020
@mroeschke mroeschke removed the Blocker Blocking issue or pull request for an upcoming release label Aug 15, 2021
@jbrockmendel
Copy link
Member Author

This deprecation was done in 1.2; closing.

@jorisvandenbossche
Copy link
Member

But did we ever follow-up on the discussion in #36900? The deprecation warning still says that in the future it will return a view on self. Which still creates an inconsistency with Series.

@jbrockmendel
Copy link
Member Author

I see the appeal of being consistent with Series, but dont like the idea of materializing RangeIndex/MultiIndex.

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. Index Related to the Index class or subclasses Period Period data type
Projects
None yet
7 participants