Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed latex output for multi-indexed dataframes - GH9778 #9908

Merged
merged 1 commit into from
Apr 16, 2015

Conversation

yred
Copy link
Contributor

@yred yred commented Apr 15, 2015

Proposed fix for #9778

The formatting issue was caused by an incorrect number of elements in the (first) index columns of strcols. The length of reinserted columns was based on the number of elements per index level, but should have relied on the number of rows/occurrences of such elements.

lev3 = [blank] * clevels
for level_idx, group in itertools.groupby(
self.frame.index.labels[i]):
count = sum(1 for _ in group)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've originally had this as:

count = len(list(group))

but was wondering if using sum() makes a bit more sense from a performance perspective.

On the other hand, there's probably a more elegant way to fix the count mismatch without using itertools.groupby().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance shouldn't really be a concern here... nobody is going to output latex for tables much larger than can fit on a single page.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it then make more sense to switch it back to how it was? (I guess the first form is a bit more readable/intuitive).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't really matter to me either way :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it back to the original form (which seems unexpectedly faster from a simple test).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, you shouldn't trust benchmarks that show list expressions are faster than generators... allocating memory in repeatedly in a benchmark is faster than it is in real use. But again, this is not performance limited code.

@shoyer
Copy link
Member

shoyer commented Apr 15, 2015

This looks great to me. Can you please:

  1. add a note to what's new
  2. squash your changes into one commit (we prefer not to have commits for which Travis tests fail merged into master, because it can complicate things like git bisect)

@yred
Copy link
Contributor Author

yred commented Apr 15, 2015

@shoyer: Thanks a lot for your feedback.

I've just added a note to the next release, and squashed all the commits. Do let me know if there's anything else that should be added/updated.

@shoyer
Copy link
Member

shoyer commented Apr 15, 2015

This looks to good to go, once the tests on Travis pass. I'll try to check later, but please feel free to ping me if you notice that's happened.

@yred
Copy link
Contributor Author

yred commented Apr 16, 2015

@shoyer: I'm not sure how Travis is set up on this repo, but tests are passing on my fork.

@hayd
Copy link
Contributor

hayd commented Apr 16, 2015

Do we handle if there are df.index.names ?

Thanks for fixing this!

@yred
Copy link
Contributor Author

yred commented Apr 16, 2015

@hayd: That's a good point.

It currently doesn't print out index names. Would that be expected with this fix or should I add it in a separate PR?

@shoyer
Copy link
Member

shoyer commented Apr 16, 2015

At this point probably better to make a separate PR. But I do recall being able to export index names....

On Wed, Apr 15, 2015 at 6:11 PM, Yasin A. notifications@github.com
wrote:

@hayd: That's a good point.

It currently doesn't print out index names. Would that be expected with this fix or should I add it in a separate PR?

Reply to this email directly or view it on GitHub:
#9908 (comment)

@yred
Copy link
Contributor Author

yred commented Apr 16, 2015

@shoyer: it seems that regression happened a little earlier as well.

I'll be updating the code for that (separately), and potentially cleaning up to_latex() so that things are a bit clearer/cleaner.

@hayd
Copy link
Contributor

hayd commented Apr 16, 2015

@yred Fantastic, thanks!

shoyer added a commit that referenced this pull request Apr 16, 2015
Fixed latex output for multi-indexed dataframes - GH9778
@shoyer shoyer merged commit 161f38d into pandas-dev:master Apr 16, 2015
@shoyer
Copy link
Member

shoyer commented Apr 16, 2015

thanks @yred !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
MultiIndex Output-Formatting __repr__ of pandas objects, to_string
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants