Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding support for rich display of Markdown cells #2346

Closed
wants to merge 40 commits into from

Conversation

NicholasWMRitchie
Copy link
Contributor

@NicholasWMRitchie NicholasWMRitchie commented Aug 3, 2020

With a fairly small change to the HTML, LaTeX and text display code, it is possible to add rich text output support to cells within a DataFrame. Markdown strings are readily constructed using the default Markdown package and the @md_str macro. It already represents a generalized format definition language that most Julia users are familiar with.

It is already possible to place Markdown.MD cells in DataFrames. However, the display of these cells doesn't represent the richness of which Markdown is capable. With markdown, it is possible to output bold, italicized, headers, links and even images into DataFrame cell outputs.

For me personally, the biggest win is the ability to display functional URLs in a table. This is particularly useful when DataFrames are used in report-type documents like generated by Weave or other literate programming packages. I don't think there is another way to produce functional URL links in DataFrame outputs and this is often useful to detail the provenance of the data in a particular row.

The one shortcoming I've run into has been displaying LaTeX encoded in the markdown string in HTML. While the LaTeX displays correctly outside the table, it does not seem to display correctly in the HTML output table. It seems fine in the LaTeX table.

@bkamins
Copy link
Member

bkamins commented Aug 3, 2020

Thank you for this PR. It seems OK - I will have to experiment with it before commenting.

Also since in the long term we hope to change the default backend to PrettyTables.jl - @ronisbr can you please comment how you see this change from the perspective of that package?

@ronisbr
Copy link
Member

ronisbr commented Aug 3, 2020

@bkamins IIUC, the idea is to support markdown notation inside the cells. In this case, this support must be added inside PrettyTables.jl. Currently, we can add all kind of decorations to the cell, but not add formats to individual words. The biggest problem (in Text backend) is to check the printed size of the string, but it should be possible using Markdown.

@NicholasWMRitchie
Copy link
Contributor Author

NicholasWMRitchie commented Aug 3, 2020

@ronisbr I've submitted a pull to PrettyTables with similar code. It really is a lot of bang for the buck - particularly for URLs and images.

src/abstractdataframe/io.jl Outdated Show resolved Hide resolved
@bkamins
Copy link
Member

bkamins commented Aug 3, 2020

Looks good to me apart from the comment I have left and of course please make sure that CI passes before I do a final review.

it does not seem to display correctly in the HTML output table

It works for me. Are you sure you are using Markdown style to denote LaTeX (i.e. double backquote)?

@NicholasWMRitchie
Copy link
Contributor Author

I think that the code is ready for review. I've resolved all the Travis errors in my new test cases.

I will note that Markdown => latex is the weakest. There are markdown items like headers that don't translate in a well in a LaTeX tabular environment. If we want to fix this, we'd remap the #, ##, ### etc to font sizes rather then \section{} etc but this would involve reimplemening the markdown => latex code custom for this edge case. It could be done but...

@bkamins
Copy link
Member

bkamins commented Aug 5, 2020

It could be done but...

I would leave it as is

src/abstractdataframe/io.jl Outdated Show resolved Hide resolved
Copy link
Member

@bkamins bkamins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from a small comment looks good. Thank you!

Co-authored-by: Bogumił Kamiński <bkamins@sgh.waw.pl>
@bkamins
Copy link
Member

bkamins commented Aug 10, 2020

@nalimilan - OK to merge?

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, sounds nice! Just a few comments.

@@ -417,7 +423,12 @@ function printtable(io::IO,
elseif isnothing(df[i, j])
print(io, nothingstring)
else
if ! (etypes[j] <: Real)
if df[i,j] isa Markdown.MD
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since accessing df[i, j] is relatively slow, better store it in cell = df[i, j] and use that afterwards as in the HTML method (not sure why we didn't do this here).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see it, nor the other changes you mention below. :-)

if ! (etypes[j] <: Real)
if df[i,j] isa Markdown.MD
print(io, quotemark)
r=repr(df[i,j])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
r=repr(df[i,j])
r = repr(df[i,j])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

function ourshow(io::IO, x::Markdown.MD)
r = repr(x)
len = min(length(r, 1, something(findfirst(==('\n'), r), lastindex(r)+1)-1), 32)
return print(io, len < length(r) - 1 ? first(r, len)*"…" : first(r, len))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using chars should give a slightly more efficient code:

Suggested change
return print(io, len < length(r) - 1 ? first(r, len)*"" : first(r, len))
return print(io, len < length(r) - 1 ? first(r, len)*'' : first(r, len))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

test/io.jl Outdated
\t4 & 4 & \\textbackslash{}\\textbackslash{}alpha & S & 3.0 & d & \\emph{\\#undef} \\\\
\\end{tabular}
"""
str =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you keep the triple quoted form? It's more readable. Same below.

@@ -54,6 +54,11 @@ ourshow(io::IO, x::Symbol) = ourshow(io, string(x))
ourshow(io::IO, x::Nothing; styled::Bool=false) = ourshow(io, "", styled=styled)
ourshow(io::IO, x::SHOW_TABULAR_TYPES; styled::Bool=false) =
ourshow(io, summary(x), styled=styled)
function ourshow(io::IO, x::Markdown.MD)
r = repr(x)
len = min(length(r, 1, something(findfirst(==('\n'), r), lastindex(r)+1)-1), 32)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't abbreviate in other formats, so why do it for Markdown? Regarding newlines, I would just print them as \n just like we do for strings.

We need a general mechanism to abbreviate too long cells, but this should be applied systematically disregarding the column type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep it here for the time being as md" is likely to be overlong. Then maybe just add TODO to update it when we have a general mechanism.

In general maybe we will switch to PrettyTables.jl which actually handles this already.

These changes will be possible to implement after 1.0 release if needed, as display changes are not considered to be breaking.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let's do that. Though do you think it would make sense to apply this 32-char limit to other types as well? I guess it can be wasteful when there are only a few columns and that more space is available, but it's hard to handle. How does PrettyTables handle it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But how wide terminal displays do you expect people have?
I have 210 characters width in my terminal, but having a column wider than 32 chars makes it very hard to follow what is in the column anyway. So I would say the rule - no column wider than 32 characters makes sense. If someone wants to check such a wide column in detail in REPL probably extracting it to a vector and investigating is the way to go anyway.

PrettyTables.jl behaves like this:

julia> df = DataFrame(a="a"^20, b="b"^20)
1×2 DataFrame. Omitted printing of 1 columns
│ Row │ a                    │
│     │ String               │
├─────┼──────────────────────┤
│ 1   │ aaaaaaaaaaaaaaaaaaaa │

julia> pretty_table(df)
┌──────────────────────┬─────────────────── ⋯
│                    a │                    ⋯
│               String │               Stri ⋯
├──────────────────────┼─────────────────── ⋯
│ aaaaaaaaaaaaaaaaaaaa │ bbbbbbbbbbbbbbbbbb ⋯
└──────────────────────┴─────────────────── ⋯

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum, currently we can set the size of a column or its minimum size. However, I think it will be nice to have an option to set the maximum size also. In this case, we can set it to 32. Thus, columns that can be displayed with less characters will be, but columns larger than this will be cropped just like this:

julia> pretty_table(df, alignment = :l, columns_width = [10,10])
┌────────────┬────────────┐
│ a          │ b          │
│ String     │ String     │
├────────────┼────────────┤
│ aaaaaaaaa │ bbbbbbbbb │
└────────────┴────────────┘

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! It is in master:

julia> df = DataFrame(a="a"^5, b="b"^10, c="c"^15, d = "d"^20);

julia> pretty_table(df, alignment = :l, maximum_columns_width = 10)
┌────────┬────────────┬────────────┬────────────┐
│ a      │ b          │ c          │ d          │
│ String │ String     │ String     │ String     │
├────────┼────────────┼────────────┼────────────┤
│ aaaaa  │ bbbbbbbbbb │ ccccccccc │ ddddddddd │
└────────┴────────────┴────────────┴────────────┘

Off-topic: I think I just need to add 2 more features before starting to testing good configurations to print DataFrames!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, then maybe we should set a limit to 32 characters for all types.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NicholasWMRitchie - would you be willing to make this change everywhere in this PR or we leave it for a follow up PR (I can do it)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what's the conclusion on this?

test/io.jl Outdated
]
)
@test sprint(show, "text/plain", df) == """
8×2 DataFrame
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add 8-characters indentation. Same below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one still applies.

@bkamins
Copy link
Member

bkamins commented Aug 13, 2020

@NicholasWMRitchie - following our discussion I have written down https://bkamins.github.io/julialang/2020/08/13/strings.html. I hope you will find it useful.

@bkamins
Copy link
Member

bkamins commented Aug 20, 2020

@NicholasWMRitchie - I would like to finalize this PR for 0.22 release if you have time to work on it. Thank you!

@NicholasWMRitchie
Copy link
Contributor Author

@bkamins I've been on vacation for 2 weeks but I'll get back to this ASAP.
The String write-up is really clear and helpful. I'll be forwarding the link to a colleague of mine who had similar questions.

@bkamins
Copy link
Member

bkamins commented Aug 24, 2020

Thank you!. In the meantime we have merge conflicts, but they should be simple to resolve 😄.

@bkamins
Copy link
Member

bkamins commented Aug 24, 2020

I have pushed merge conflicts resolution to your branch (also a small suggestion - it is safer not to make a from a master branch of your fork but rather a separate branch).

@bkamins
Copy link
Member

bkamins commented Aug 28, 2020

@nalimilan - I would merge this PR and leave a decision to limit printing to 32 everywhere to another PR, I have opened an issue #2389.
@NicholasWMRitchie - can you only merge this PR with master and add an entry to NEWS.md in "Other relevant changes" please before merging?

NEWS.md Outdated
@@ -55,3 +55,5 @@

* Documentation is now available also in *Dark* mode
([#2315](https://github.com/JuliaData/DataFrames.jl/pull/2315))
* add rich display support for Markdown cell entries in HTML, Markdown and LaTeX
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you mean here by ", Markdown"? (we do not test shipping out data frame in Markdown anywhere - right?)

@bkamins
Copy link
Member

bkamins commented Aug 28, 2020

unfortunately you have performed a merge in a way that confused git (now some of the changes shown are not your changes + there are some duplicates). Can you please try fixing it?

If you do not have much experience with git the easiest is to:

  • rebase against master
  • squash all your commits into one commit
  • make sure all is clean (as now it is not unfortunately) - this will probably require another commit
  • force push the branch

@bkamins
Copy link
Member

bkamins commented Aug 28, 2020

You are still mixing-in changes from other PRs. I do not want to merge it as I am not 100% sure if it would not confuse git (since it seems it confuses it now as these changes should not be shown).

@NicholasWMRitchie
Copy link
Contributor Author

Yes, I'm struggling a bit with git. I fear that I'm only making things worse. I suspect that my mental model of what git is doing is flawed.

While I can rebase my master against your master, I can't squash the result so...
I have created a new branch (MarkdownBranch) on my repo with only my changes
I've rebased the MarkdownBranch against your master
This one commit seems to contain only the changes that I made relative to your master.
I'm currently rerunning the test suite against this on 1.5.1 and 1.0.5 to make sure I haven't broken anything.

However, this isn't on the branch that I used to submit the pull request.

@bkamins
Copy link
Member

bkamins commented Aug 28, 2020

Thank you for caring so much. git is sometimes confusing. When tests pass can you then just close this PR and open a new PR with that fresh branch as this will be easiest for you? Thank you!

@NicholasWMRitchie
Copy link
Contributor Author

Closing and will reopen as a new PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants