Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow to flexibly change the display width limit #2403

Merged
merged 7 commits into from
Sep 9, 2020

Conversation

bkamins
Copy link
Member

@bkamins bkamins commented Aug 31, 2020

Fixes #2389

I made it a bit different than Markdown rendering, but I think we do not have to be super cautious here as this is just handling corner cases of display.

Also fixes the problem of overwide first column (not as nicely as PrettyTabes.jl but again - this is a rare corner case now as we are trimming at 32 screen width anyway)

CC @nalimilan @ronisbr

@bkamins bkamins added non-breaking The proposed change is not breaking display labels Aug 31, 2020
@bkamins bkamins added this to the 1.0 milestone Aug 31, 2020
@bkamins bkamins linked an issue Aug 31, 2020 that may be closed by this pull request
@bkamins
Copy link
Member Author

bkamins commented Sep 1, 2020

In the process I have noticed that we did an incorrect escaping of CSV and TSV output. I have fixed it. One debatable thing is what we should use in CSV/TSV for non-standard types (like Symbol, Char - in the tests, or e.g. Date) - should we use print or repr to show them. I opted for repr now (as it kind-of retains type information), but maybe print would be preferred. Comments are welcome here.

@bkamins
Copy link
Member Author

bkamins commented Sep 1, 2020

@nalimilan - tests will fail now. I will clean them when we decide if we want to use repr or print in CSV/TSV output.

A a comment - I thought of repr as print is used in CSV.jl, so here we could provide a more low-level output. Probably this CSV/TSV feature is not used much (if at all - given it had escaping bug no one reported) so we could make it have a different behavior than CSV.jl.

@nalimilan
Copy link
Member

I'd prefer using print, as it gives something which is more likely to be usable and parseable. Even reading back a date into Julia will work better using print.

Regarding truncation, I've just checked, and dplyr only truncates when the output would be wider than the screen. Maybe we should do that too? One use case I can imagine is a column with short texts and one or two columns with variables describing them.

NEWS.md Outdated
@@ -67,3 +67,6 @@
([#2315](https://github.com/JuliaData/DataFrames.jl/pull/2315))
* add rich display support for Markdown cell entries in HTML and LaTeX
([#2346](https://github.com/JuliaData/DataFrames.jl/pull/2346))
* limit display width in `text/plain` to approximately 32 screen characters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really approximate? AFAICT, you stop at 32 and take the previous character, so it's always the maximum width before going beyond 32, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because some characters have more than 1 text width (in which case we can get a bit different result)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but since you return the last character before going beyond 32, the result will always be less than 32, right? I think it would be worth mentioning in the docstring rather than just saying it's approximate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The precise statement is that the result without has maximal text width less or equal than 32 display characters, but how would you phrase it so that this would be readable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe something like this? Anyway that's internal, what matters is that it's correct even if it's not super easy to read. :-)

`truncstring` indicates the maximal width the output can use before being truncated
(in the `textwidth` sense, excluding `…`).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK - I have proposed something along these lines 😄

src/abstractdataframe/show.jl Outdated Show resolved Hide resolved
@bkamins
Copy link
Member Author

bkamins commented Sep 1, 2020

I'd prefer using print

OK - I switch to print.

Maybe we should do that too?

If you try to do this it quickly starts to be very messy, it was tried in #2087.
Actually even in your scenario I would prefer truncating (32 characters usually will be enough to get an understanding what is inside a string)

Also soon we will have a PrettyTables.jl backend to choose from that does what you propose (but it has a different design in general) - so someone will be able to go for that backend if needed.

@nalimilan
Copy link
Member

If you try to do this it quickly starts to be very messy, it was tried in #2087.
Actually even in your scenario I would prefer truncating (32 characters usually will be enough to get an understanding what is inside a string)

Also soon we will have a PrettyTables.jl backend to choose from that does what you propose (but it has a different design in general) - so someone will be able to go for that backend if needed.

In the use case I have in mind, you have unique texts that you may want to read, so just having an idea of the contents isn't enough. But yeah it's difficult to handle this well in all cases. It's nice if PrettyTables aims to fill this gap.

@bkamins
Copy link
Member Author

bkamins commented Sep 1, 2020

I am changing the implementation to add truncate keyword argument that allows to specify number of characters at which the truncation happens, as this was needed anyway. Then also your use case will be handled

@bkamins
Copy link
Member Author

bkamins commented Sep 1, 2020

I had to add truncate keyword argument for generality (as things are unfortunately problematic in corner cases). I have decided to never truncate: 1) row column name, 2) column name, 3) column eltype if shown.

@nalimilan
Copy link
Member

We already abbreviate eltypes when they are too long, right?

@bkamins
Copy link
Member Author

bkamins commented Sep 1, 2020

We already abbreviate eltypes when they are too long, right?

Yes, but now truncate can be less than we abbreviate (a corner case, but still).

@bkamins bkamins changed the title make 32 display width limit approximately allow to flexibly change the display width limit Sep 1, 2020
@bkamins
Copy link
Member Author

bkamins commented Sep 4, 2020

@nalimilan - this should be good to have a look at. Thank you!

@nalimilan
Copy link
Member

Yes, but now truncate can be less than we abbreviate (a corner case, but still).

Can you give an example? I don't get it. :-)

@bkamins
Copy link
Member Author

bkamins commented Sep 8, 2020

Can you give an example? I don't get it. :-)

I explicitly test this case here:

https://github.com/JuliaData/DataFrames.jl/pull/2403/files#diff-be8288df383517cc1dabe74772b7535aR723

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good with the small suggestion for the docstring.

src/abstractdataframe/show.jl Outdated Show resolved Hide resolved
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
@bkamins
Copy link
Member Author

bkamins commented Sep 8, 2020

Thank you for fixing the docstring.

@bkamins bkamins merged commit f5b531f into JuliaData:master Sep 9, 2020
@bkamins bkamins deleted the show_32_textwidth_limit branch September 9, 2020 06:54
@bkamins
Copy link
Member Author

bkamins commented Sep 9, 2020

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
display non-breaking The proposed change is not breaking
Projects
None yet
2 participants