-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: df.rolling(..).corr()/cov() when pairwise=True to return MI DataFrame #15677
Conversation
a259eb5
to
db9f2c0
Compare
Codecov Report
@@ Coverage Diff @@
## master #15677 +/- ##
==========================================
- Coverage 90.97% 90.96% -0.01%
==========================================
Files 145 145
Lines 49483 49487 +4
==========================================
+ Hits 45015 45018 +3
- Misses 4468 4469 +1
Continue to review full report at Codecov.
|
@chrisaycock thoughts? |
I didn't understand the "pairwise=True" part of the PR title, but the actual documentation is pretty straightforward. You are just returning a MI DataFrame by transposing the Panel result under-the-hood. That all makes sense. |
da60531
to
0ee6303
Compare
any comments? |
I find it at first sight a bit strange to get a frame with multi-indexed columns instead of a frame with multi-indexed index. (never use this, so I don't speak out of experience) |
whats your example? |
So example from the whatsnew:
I expected something like:
But as I said, don't have experience with what would be the most practical to work with further. |
ok same as my example. The reason for this returning this way is that the index is the same as the original. Otherwise its actually a transpose. I guess this is kind of arbitrary. But IMHO this makes more sense.
|
ahh, you want this?
|
yes, that is what I had in mind (the whatsnew example is maybe a bit confusing with its integer column names) |
ok, let me see what I can do. |
updated. and here's the new example
|
I think maybe we should completely zonk the index level names. I am not sure what to do with them w/o making it look weird. The problem is that the 2nd level AND the columns are named the same which, when you print it is odd. Could name the 1st level though. |
Yes, the 'major' and 'minor' do not necessarily make sense anymore, as this is rather Panel-specific terminology. @chrisaycock Do you think this shape makes sense? (compared to the initial proposal in the PR?)
@jreback We could easily add a keyword for switching this behaviour, and raise a deprecation warning on the default value, indicating they can change behaviour + suppress warning by specifying the keyword. |
The |
so here is an easy thing to do. I have annotated names to make this explict
I would do this (but maybe zonk the column name). This can actually be confusing if you do the typical cross-corr.
|
ok latest push gives [10]. |
In the [10] above, shouldn't the index level names 'bar' and 'foo' not be switched ? |
@jorisvandenbossche this is on latest. I had them switched (in error) before when I did that example.
|
closing in favor of #15601 (which incorporates these commits) |
@jreback In your Out[3], the columns loose its name. Keeping the name would duplicate the 'bar' in this case, but I think that is OK (loosing its name seems worse?) |
The problem is it will always duplicate if you have the same frame (e.g. in I am explicity setting this to None. I'll push a change (on the deprecate PR now) with this soon. |
Yes, you will always get a double name in that case, but I don't think that is that worse (the actual column labels are also actually duplicated, so that seems only consistent). It would also be consistent with the non-rolling corr:
|
closes #13563 on top of #15677 Author: Jeff Reback <jeff@reback.net> Closes #15601 from jreback/panel and squashes the following commits: 04104a7 [Jeff Reback] fine grained catching warnings in tests f8800dc [Jeff Reback] add numpy reference for searchsorted fa136dd [Jeff Reback] doc correction c39453a [Jeff Reback] add perf optimization in searchsorted for FrozenNDArray 0e9c4a4 [Jeff Reback] fix docs as per review & column name changes 3df0abe [Jeff Reback] remove Panel from doc-strings, catch internal warning on Panel construction 755606d [Jeff Reback] more docs d04db2e [Jeff Reback] add deprecate_panel section to docs 538b8e8 [Jeff Reback] pep fix 912d523 [Jeff Reback] TST: separate out test_append_to_multiple_dropna to two tests; when drop=False this is sometimes failing a2625ba [Jeff Reback] remove most Term references in test_pytables.py cd5b6b8 [Jeff Reback] DEPR: Panel deprecated 6b20ddc [Jeff Reback] fix names on return structure f41d3df [Jeff Reback] API: df.rolling(..).corr()/cov() when pairwise=True to return MI DataFrame 84e788b [Jeff Reback] BUG/PERF: handle a slice correctly in get_level_indexer
Apologies if this is not the right place to ask a question about the above change. Pairwise rolling correlation used to give a Panel. Asking for the shape gives three numbers. But with the above change, it returns a 2-dimensional DataFrame. My question is what is the easiest way to update legacy code? For e.g., panel_corr.ix[:, 0, 0] and panel_corr.ix[i, :, :]. What do they look like in the dataframe language? I could try coming up with some arithmetic to somehow translate from a 3-dimensional panel to a 2-dimensional dataframe. But that seems rather inefficient and error-prone. Can I turn the new multi-index dataframe output into a 3-dimensional object so that the legacy codes would work? |
you should read the whatsnew note and section on how to index |
from #15601 (comment).
Unfortunately I don't see an easy way to even deprecate this and we simply have to switch. Good news is this will simply fail fast in accessing, as the
Panels
have a different access pattern (names of indices and indexing) that MI DataFrames (and another reason to remove them :>).