Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(derive): add markdown parsing of doc comments in derive #4444

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

azdle
Copy link

@azdle azdle commented Nov 2, 2022

This is an early draft of a PR that will resolve #2389. I wanted to get some feedback on intent in a few places to make sure I'm doing the right thing.

This largely takes the tack that was laid out in #2389.

Markdown parsing is done within derive.

This was all but explicitly stated, and is, I believe, the right place to do the parsing.

Then a StyledStr is output into the user's expanded code.

StyledStr isn't explicitly mentioned, but it's what the generated function takes and what Colorizer uses internally, so I assume this is right.

Though, StyledStr doesn't have things like bold and italic itself directly and I'm not sure if it's the right place to add it. I know you also have anstyle, but I'm not sure that makes sense to use here since StyledStr seems to denote more semantic contexts. The biggest problem there is I'm not sure what to do when some text is bold and italic or worse a heading that is partly styled or any other combination of things since it only tracks a single enum variant for the style.

This PR also makes the basic styling functions on StyledStr pub so that they can be used from the derived code. I'm not sure if that's acceptable or not. We could hide them from the docs if you'd prefer they weren't fully public.

Markdown is parsed with pulldown-cmark

pulldown-cmark isn't perfectly suited to what I'm doing here, but I think it's the only real option that will work.

  • pulldown-cmark seems to assume you'd only ever want to output html, and omits telling us a lot of information that would be helpful for outputting back to psuedo markdown. Doesn't let me do everything I want, but is good enough.
  • comrak parses as github flavored markdown, but doc comments in rust are parsed by Rust as commonmark markdown. It's much heavier than pulldown-cmark. Though it does have a somewhat better API for what we're trying to accomplish. Didn't try it though, don't know what other problems we'd run into.
  • minimad doesn't parse blocks and only parses some inlines. To be able to use this we'd have the write the first half of a markdown parser anyway.
  • mini_markdown, as mentioned in your rosetta repo, doesn't have a stable API yet and from a cursory glance doesn't seem complete enough. May be useful in the future.
  • [a custom parser] I actually started on this idea before realizing just how complex of a thing that would end up being. Best not to make an already large project 10x larger.

Formatting

As far as I'm aware there's not canonical representation of formatted markdown in the terminal. As part of this PR I'm essentially inventing one. A lot of things can't be represented in any native way in terminal output and will "fallback to markdown" for which I just mean output text that looks like markdown. Markdown was based on how people were already writings things in plaintext files after all.

Most of these aren't done, I wanted to get feedback before I go about implementing some of the harder ones. There are some that require a lot of other probably wide-reaching changes, pretty much anything that requires nested styling.

Block Elements

Thematic breaks

AKA a horizontal rule:


Fallback to markdown, insert a --- line.

Could potentially write -s across the whole terminal/written area if I have that info available at the time. This would require a StyledStr with a new Style and no actual str.

Headings

Unsure. Probably set to StyledStr::heading(), and ignore level.

In the future could add levels to StyledStr::heading, not sure how headings other than the first level should even be styled though.

Could also explicitly disallow more than a certain number of heading levels.

Code Blocks

Fallback to markdown but always output in the style of an indented code block.

Could consider syntax highlighting in the future, that would require setting colors on segments within the codeblock.

HTML Blocks

Probably just drop them and all of their content. I can't think of anything else to do with them.

Paragraphs

Merge into a single line of text that is wrapped as appropriate during output. (Though see also Hard/Soft Line Break.)

Blank Lines

That is extra blank lines beyond those that determine block structure.

Ignored. I believe this is essentially what already happens to doc comments.

Quotes

Fallback to markdown, prefix lines of block with > .

This is the big problem child with styling. it probably needs to be a single StyledStr::quote(), but that then means that nothing inside it can be styled in any way. This would have to be done this way so that at output time the quoted things can wrap, but still be prepended. Having any internal styling would depend on a StyledStr that can contain other internal StyledStr.

Lists

Fall back to markdown, but with proper item numbering & unified bullets.

Output plain text markdown so that inner styling works. This might cause minor formatting problems with wrapping, I haven't looked yet at how exactly that works.

Inlines

Code Span

Make bold, this seems to be how "code" in manpages gets formatted for the terminal.

Emphasis and strong emphasis

Emphasis -> italic, Strong Emphasis -> Bold

(Note: don't currently have a way to represent both at the same time, currently will just choose the innermost one. This could be solved with a special case BoldItalic style if we wanted to fix it short term.)

strikethrough

Show striked.

(Though strikethrough isn't supported in termcolor, but there is an open PR for it. This would have to wait for that or a change over to anstyle. For now just fallback to markdown

Links

I don't have a good answer for this. Currently this just removed the link syntax and URL completely so that help text doesn't get too verbose.

Could possibly keep that for short help text & for manpages convert all links to ref-style links. Long help could go either way.

Images

Removed.

Could consider keeping the alt/title text.

Autolinks

Fallback to markdown. This is just a link in < > so some terminals will detect it as a link anyway.

HTML

Again, remove.

Hard Line Break

This is when you have
two spaces at the end of a line
which forces a line break.

Insert a \n into the text.

Soft Line Break

Join the lines with a space.

Text

Is text, show text.

Stubs

This code also contains (or will contain) some stubs for future expansion:

Manual Sections

Right now the doc comment is parsed into short_about and long_about implicitly based on the first blank line. This keeps that (assuming the first non-blank line is text and not something else), but also adds splitting out anything with a heading as manual content. (Though nothing is done with that content at the moment, I'm assuming it would make sense to make it available to clap_mangen in the future.)

Markdown Manuals

There's a few things that only make sense in this context. We want to render our manpages in our reference docs site. Eventually I want to be able to build a "markdown manpage" out of the doc comment markdown without round tripping through roff first.

Other Open Questions

Make This the Default or Require Opt-In

Right now I just have this basically hacked in to be the default formatting.

I think this should be safe to become the default. That's based on the assumption that anyone who was trying to do anything funky would be using verbatim_doc_comment.

The only possible exception would be cutting off the long help text at the first valid heading. I imagine it would be unlikely to cause problems, but there's definitely a possibility there. That might be something that would have to be opt-in, at least until there's another breaking release.

This is probably a question that can remain open until this to closer to done. I'll have a better idea of other problems by then too.


Phew, that was a long one.

tl;dr

  • Hooray markdown!
  • What's the future of StyledStr? Should it morph to support nested styles?
  • Any issues with the proposed formatting above?
  • Do you have a preference between getting some basic formatting in quickly or getting as close to handling all of the formatting that markdown can possibly represent properly handled before merging?

TODO

  • finish translation of all desired blocks and inlines
  • feature flag parsing & pulldown-cmark dependency
  • docs
  • [... lots of other things]

@azdle
Copy link
Author

azdle commented Nov 2, 2022

And after all that I forgot to include the example. The example from the included test:

Source

/// A remarkable tool
///
/// This tool is remarkable becasue its help is generated from **markdown**!
///
/// It has some _italic_ text.
///
/// This is a [link](https://example.com). (But not in the help output.)
///
/// This paragraph has `inline code`.
///
/// This is also a paragraph, but unlike the other paragraphs, this one is
/// spread across multiple lines. Or at least it is in the markdown.
#[derive(Parser, Debug)]
//#[command(verbatim_doc_comment)]
struct Remarkable {
    /// Markdown _is_ __also__ handled here.
    #[arg(long)]
    foo: bool,
}

Rendered

(Note: the underlining is being filtered out of the HTML by github here.)

A remarkable tool

This tool is remarkable because its help is generated from markdown!

It has some italic text.

This is a link. (But not in the help output.)

This paragraph has inline code.

This is also a paragraph, but unlike the other paragraphs, this one is spread across multiple lines. Or at least it is in the markdown.



Usage: clap [OPTIONS]

Options:
      --foo
          Markdown is also handled here.
          
          

  -h, --help
          Print help information (use `-h` for a summary)

Obviously it has some problems but it is help output styled with markdown. :D

@azdle azdle changed the title add markdown parsing of doc comments in derive feat(derive): add markdown parsing of doc comments in derive Nov 2, 2022
Copy link
Member

@epage epage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to work out more details on StyledStr before we can have conversations on what to do about this generally and about how to map markdown to the terminal

@@ -42,6 +42,7 @@ quote = "1.0.9"
proc-macro2 = "1.0.42"
heck = "0.4.0"
proc-macro-error = "1"
pulldown-cmark = "0.9"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minimad doesn't parse blocks and only parses some inlines. To be able to use this we'd have the write the first half of a markdown parser anyway.

I wonder how much of a problem that will actually be, especially compared to what we do now. The build times are significantly different than pulldown

https://github.com/rosetta-rs/md-rosetta-rs

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I get that. Unfortunately, the way that it accomplishes that is by not really being a markdown parser. It's a parser for something that sorta looks like a subset of markdown.

Honestly I think that not supporting markdown is a better option than supporting minimad's version of "markdown".


There's some things that look like they're just bugs that we could fix that I can only assume would be accepted, like:

Inlines aren't parsed in lists:

* some bullet item
* some _bullet_ item
* some bullet item
Normal(Composite { style: ListItem, compounds: ["some bullet item"] })
Normal(Composite { style: ListItem, compounds: ["some _bullet_ item"] })
Normal(Composite { style: ListItem, compounds: ["some bullet item"] })

There's some things that I'm not sure if they would consider them to be a bug, but we could technically workaround:

Each line of a paragraph is considered a paragraph.

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Dictum at
tempor commodo ullamcorper a lacus vestibulum sed arcu.  
Normal(Composite { style: Paragraph, compounds: ["Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do"] })
Normal(Composite { style: Paragraph, compounds: ["eiusmod tempor incididunt ut labore et dolore magna aliqua. Dictum at"] })
Normal(Composite { style: Paragraph, compounds: ["tempor commodo ullamcorper a lacus vestibulum sed arcu."] })

Sub-lists (bonus points for parsing a single unconnected * as the start of an italic block)

* some bullet item
  * some bullet item
* some bullet item
Normal(Composite { style: ListItem, compounds: ["some bullet item"] })
Normal(Composite { style: Paragraph, compounds: ["  ", I" some bullet item"] })
Normal(Composite { style: ListItem, compounds: ["some bullet item"] })

Only one block level is supported:

> # foobar
Normal(Composite { style: Quote, compounds: ["# foobar"] })

There's way more, these were just the most obviously wrong of the few things I tried. Honestly I think it'd be easier and faster to write a new markdown parser than to pre-parse and re-parse around this crate to get reasonable output of any sort of basic markdown document. That would put a major dent in its lead in your stats too.


tl;dr: We should heed the warning on that crate that says:

If you're looking for a Markdown parser, this one is probably not the one you want

What you want to use it for isn't what it's meant to be used for. It's not a general purpose markdown parser. IMO, it's not even a "general purpose (when your purpose is outputting to the terminal) markdown parser". It's good for when you want some minimal style in terminal output of static, non-user-supplied, text.

@@ -28,38 +28,56 @@ impl StyledStr {
AnsiDisplay { styled: self }
}

pub(crate) fn header(&mut self, msg: impl Into<String>) {
/// Display string as header.
pub fn header(&mut self, msg: impl Into<String>) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StyledStr's details can't be made public until after

At that point, we'll have a better idea of what can and should be public

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Though, at least in my understanding of how StyledStr gets used, having a public interface would be a prerequsite for #1433 and #3108.

Would it be helpful to you for me to write up a separate issue that pulls out my random scattered comments about StyledStr above (and makes them a lot more cohesive)?

@epage
Copy link
Member

epage commented Nov 4, 2022

Though, at least in my understanding of how StyledStr gets used, having a public interface would be a prerequsite for #1433 and #3108.

Not with the current API for StyledStr

Would it be helpful to you for me to write up a separate issue that pulls out my random scattered comments about StyledStr above (and makes them a lot more cohesive)?

The doc comments? The ones I saw basically repeat the function name so not the highest priority /value

@azdle
Copy link
Author

azdle commented Nov 4, 2022

Not with the current API for StyledStr

I don't understand. StyledStr effectively doesn't have an API, everything besides new and ansi is private. How would a user build a StyledStr template or after_help?

Is there another way to create a StyledStr (or something that impls Into<StlyedStr>) without using using any of StyledStr's methods? If there is another way to add styling to the output text that's already possible, I'd be more than happy to change to that.

The doc comments? The ones I saw basically repeat the function name so not the highest priority /value

Not doc comments, comments as in "the things I commented on earlier in this conversation". I just meant, would if be helpful to pull out my thoughts on StyledStr from this PR's description and the limitations I found in the current API trying to implement markdown support?

@epage
Copy link
Member

epage commented Nov 4, 2022

I don't understand. StyledStr effectively doesn't have an API, everything besides new and ansi is private. How would a user build a StyledStr template or after_help?

My plan for the future API is nothing like the current internal API and the design of those features will help guide what that API looks like, so they will need to be developed in tandem.

Not doc comments, comments as in "the things I commented on earlier in this conversation". I just meant, would if be helpful to pull out my thoughts on StyledStr from this PR's description and the limitations I found in the current API trying to implement markdown support?

Yes, this is why I prefer more of those requirement / design discussions to happen in the original issue so they don't get lost in PRs that come and go.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Clap derive should parse markdown doc comment into normal text
2 participants