Fixed latex output for multi-indexed dataframes - GH9778 #9908
lev3 = [blank] * clevels
for level_idx, group in itertools.groupby(
        self.frame.index.labels[i]):
    count = sum(1 for _ in group)
I originally had this as count = len(list(group)), but was wondering if using sum() makes a bit more sense from a performance perspective.
On the other hand, there's probably a more elegant way to fix the count mismatch without using itertools.groupby().
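For reference, the two counting forms under discussion can be compared side by side. This is only a minimal sketch: the labels list is a hypothetical stand-in for one level of self.frame.index.labels, and the helper names are made up for illustration rather than taken from the PR.

```python
import itertools

# Hypothetical stand-in for one level of self.frame.index.labels:
# each integer is the position of a row's value within that index level.
labels = [0, 0, 1, 1, 1, 2]


def run_lengths_sum(codes):
    # Count each run by consuming the group iterator lazily.
    return [(key, sum(1 for _ in group))
            for key, group in itertools.groupby(codes)]


def run_lengths_len(codes):
    # Count each run by materialising the group as a list first.
    return [(key, len(list(group)))
            for key, group in itertools.groupby(codes)]


print(run_lengths_sum(labels))  # [(0, 2), (1, 3), (2, 1)]
print(run_lengths_len(labels))  # [(0, 2), (1, 3), (2, 1)]
```

Both forms consume the group iterator exactly once and give the same counts, so the choice between them is mostly about readability.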
Performance shouldn't really be a concern here... nobody is going to output latex for tables much larger than can fit on a single page.
Would it then make more sense to switch it back to how it was? (I guess the first form is a bit more readable/intuitive).
doesn't really matter to me either way :)
Changed it back to the original form (which seems unexpectedly faster from a simple test).
In general, you shouldn't trust benchmarks that show list expressions are faster than generators... allocating memory repeatedly in a benchmark is faster than it is in real use. But again, this is not performance-limited code.
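A "simple test" of the kind mentioned above might look like the following timeit sketch. The input size, the repeat count, and the function names are assumptions chosen for illustration, and the caveat about micro-benchmarks applies to whatever numbers it prints.

```python
import itertools
import timeit

# Hypothetical small index: ten runs of three identical codes each.
codes = [i // 3 for i in range(30)]


def count_with_sum(values):
    # Generator-based count, as in the diff under review.
    return [sum(1 for _ in group) for _, group in itertools.groupby(values)]


def count_with_len(values):
    # List-based count, as in the original form.
    return [len(list(group)) for _, group in itertools.groupby(values)]


for fn in (count_with_sum, count_with_len):
    # Timings vary by machine and interpreter; treat them as indicative only.
    elapsed = timeit.timeit(lambda: fn(codes), number=1000)
    print(f"{fn.__name__}: {elapsed:.4f}s for 1000 calls")
```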
This looks great to me. Can you please:
@shoyer: Thanks a lot for your feedback. I've just added a note to the next release, and squashed all the commits. Do let me know if there's anything else that should be added/updated.
This looks good to go, once the tests on Travis pass. I'll try to check later, but please feel free to ping me if you notice that's happened.
@shoyer: I'm not sure how Travis is set up on this repo, but tests are passing on my fork.
Do we handle if there are Thanks for fixing this!
@hayd: That's a good point. It currently doesn't print out index names. Would that be expected with this fix or should I add it in a separate PR?
At this point probably better to make a separate PR. But I do recall being able to export index names.... On Wed, Apr 15, 2015 at 6:11 PM, Yasin A. notifications@github.com
@shoyer: it seems that regression happened a little earlier as well. I'll be updating the code for that (separately), and potentially cleaning up
@yred Fantastic, thanks!
Fixed latex output for multi-indexed dataframes - GH9778
thanks @yred!
Proposed fix for #9778
The formatting issue was caused by an incorrect number of elements in the (first) index columns of strcols. The length of reinserted columns was based on the number of elements per index level, but should have relied on the number of rows/occurrences of such elements.
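The mismatch described above can be illustrated without pandas. This is a minimal sketch, not the code from the PR: sparsified_column, level_values, and codes are hypothetical names, with codes playing the role of one level of index.labels.

```python
import itertools


def sparsified_column(level_values, codes, blank=""):
    # Build one index column with an entry per row: the level value on the
    # first row of each run, then blanks for the remaining rows of that run.
    column = []
    for code, group in itertools.groupby(codes):
        count = len(list(group))
        column.append(level_values[code])
        column.extend([blank] * (count - 1))
    return column


level_values = ["a", "b"]   # distinct elements of the index level
codes = [0, 0, 1, 1, 1]     # one entry per row of the frame

column = sparsified_column(level_values, codes)
print(len(level_values))  # 2
print(column)             # ['a', '', 'b', '', '']
```

Sizing the column from len(level_values) would give 2 entries for a 5-row frame, whereas counting occurrences per run yields one entry per row.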