RFC: Two possibilities for integrating Markdown into TSDoc #12
Comments
TypeDoc takes the second approach. There are some special case situations where parsing needs to be markdown aware (e.g. code blocks) but most of the parsing can be passed to a true markdown parser. When testing this out I noticed some bugs with how TypeDoc handles links and markdown. You can see how TypeDoc renders some of the examples above as well as how it handles the following links.
/**
* TypeDoc handles a square bracket syntax to link to [[MyClass]]
* with [[MyClass|pipe labeled links]] and [[MyClass space labeled links]]
*
* TypeDoc handles basic links to {@link MyClass}
* but {@link MyClass | labeled links} are broken.
*
* Code fences can expose parsed links `{@link MyClass}`
*
* ```
* As are code blocks with {@link MyClass} text
* ```
*/
export class MyClass {
For #2 and #3: TypeDoc is very forgiving. Generally it removes the first asterisk of a line. Whitespace is usually retained. If there is an empty (except the asterisk) line in between lines, a break is generated.
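To make the forgiving behavior described above concrete, here is a minimal sketch of that kind of framing removal. It is not TypeDoc's actual implementation; the function name `stripCommentFraming` and the choice to return one string per paragraph are assumptions made for illustration.

```typescript
// Hypothetical helper, NOT TypeDoc's real code: strips the "/** ... */" framing,
// removes only the first asterisk of each line, keeps any further whitespace,
// and starts a new paragraph at every line that is empty apart from the asterisk.
function stripCommentFraming(comment: string): string[] {
  const body = comment
    .replace(/^\s*\/\*\*/, "") // drop the opening "/**"
    .replace(/\*\/\s*$/, ""); // drop the closing "*/"

  const paragraphs: string[] = [];
  let current: string[] = [];

  for (const rawLine of body.split(/\r?\n/)) {
    // Remove the indentation, the first asterisk, and at most one space after it.
    // Lines without a leading asterisk are kept unchanged.
    const line = rawLine.replace(/^\s*\* ?/, "");
    if (line.trim() === "") {
      if (current.length > 0) {
        paragraphs.push(current.join("\n"));
        current = [];
      }
    } else {
      current.push(line);
    }
  }
  if (current.length > 0) {
    paragraphs.push(current.join("\n"));
  }
  return paragraphs;
}
```

Because only one space after the asterisk is consumed, deeper indentation survives and a later Markdown pass can still see it.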
Currently, JSDoc tags split into two categories: block tags and inline tags. However, there are only two inline tags. Are any other inline tags being used today, and how likely is it that TSDoc will add more inline tags in the future?
If the answer to both is no, can we just throw the whole "inline tags" concept away and extend Markdown links to replace the two existing tags? By doing this, for question 1, we don't need to worry about collisions between inline JSDoc tags and inline Markdown syntax; for question 4, we will have a single (instead of two) but powerful syntax to express links, and we can just rely on the Markdown parser to parse comments.
Then the only thing remaining is the block tags. As the name suggests, I expect them to be the first non-whitespace token on their lines, to start a block. Anything between two tags (or between a tag and the end of the comment) belongs to the tag above it. This also answers #13.
For questions 2 and 3, personally I want to enforce well-formed comments (from the second line on, every line starts with exactly one star and exactly one whitespace character) instead of allowing arbitrarily framed lines that gain us nothing.
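The block-tag rule proposed in this comment can be sketched in a few lines. This is only an illustration of the idea, not TSDoc's parser: the `TagBlock` shape and the `splitIntoBlocks` name are hypothetical, and the input is assumed to be comment lines with the `/** ... */` framing already removed.

```typescript
// Hypothetical shape for illustration: a null tag marks the leading description.
interface TagBlock {
  tag: string | null;
  text: string;
}

// A tag only starts a block when it is the first non-whitespace token on its line.
// Everything up to the next such tag (or the end of the comment) belongs to it.
function splitIntoBlocks(lines: string[]): TagBlock[] {
  const blocks: TagBlock[] = [{ tag: null, text: "" }];
  const tagAtLineStart = /^\s*(@[A-Za-z][A-Za-z0-9]*)\s?(.*)$/;

  for (const line of lines) {
    const match = tagAtLineStart.exec(line);
    if (match) {
      blocks.push({ tag: match[1], text: match[2] });
    } else {
      const block = blocks[blocks.length - 1];
      block.text = block.text === "" ? line : block.text + "\n" + line;
    }
  }
  return blocks;
}
```

An `@` in the middle of a line (an email address, for example) does not start a block under this rule, which is what removes the need for inline-tag handling in this proposal.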
@yume-chan It would be great to simplify the comment parsing. However, I think it may be surprising to JavaScript developers familiar with JSDoc to not support
Getting back in the saddle, today we merged the PR that sets up the initial tsdoc parser library project. I'm starting with Option 2 and we'll see how that goes. I've been experimenting with different approaches for the tokenizer strategy and will follow up.
The Markdown links don't provide a generalized pattern for parameterized tags. Also their link target is somewhat vague ("zero or more characters") which might make it difficult to detect the rather elaborate reference syntax that @MartynasZilinskas was working on.
So, the big architecture of the parser would be like this:
By pre-TSDoc I mean a conservative subset of the minimal CommonMark-compatible constructs that the TSDoc stage needs to understand in order to avoid accidentally parsing something like these examples:
So the basic pre-TSDoc constructs would be:
These are questionable:
These I'm proposing to NOT consider in the TSDoc stage (but a documentation tool's backend Markdown render is free to process them):
Thoughts? Did we miss any important CommonMark constructs (that would trip up the TSDoc parser)?
Important Update: Friends, I am losing my mind trying to pick out an internally-consistent subset of CommonMark syntax! Recall that the planned compromise was to allow ambiguity about how documentation gets rendered, while instead focusing on (1) being very rigorous about whether an
Some sinister edge cases...
According to the CommonMark spec, code spans can be split across multiple lines, but not if there's a list bullet in there. Example:
Blockquotes can stick ">" characters in the middle of any construct. Example:
Whitespace can really matter. Example:
Concerns
These are just a few examples. There are endless weird edge cases like this. It raises a couple of concerns:
I originally got involved with TSDoc because API Extractor was having problems where computer symbols were constantly getting rendered wrongly on the web site, due to lack of formalism about the doc comment syntax. I really want to make sure all our work here actually solves that problem.
A new direction
So, I'm proposing a different approach to the Markdown integration:
Ideally, we would design it such that every interesting markup sequence has a normalized form that will be interpreted identically by TSDoc and CommonMark. And then in strict mode, the TSDoc library could report warnings for TSFM constructs that would be mishandled by CommonMark. (I need to do a little more research to confirm whether this is possible.)
Feedback?
Does this new direction make sense? Does anyone see a potential problem with it? The main consequence I'm aware of is more work for documentation engines that render markdown as an output, since they cannot naively pass through the TSDoc text.
I opened a separate issue #29 to provide some concrete details about this "TSFM" idea.
For a quicker first release of this effort, is it necessary to parse Markdown at all? At least with ESDoc, and hence my significantly updated fork TJSDoc, comment blocks are parsed such that all text above the first tag parsed from a new line is considered as the description.
In most doc tooling pipelines it can be assumed that plugins are supported. Further processing of tags for Markdown can be handled via a plugin, just like the JSDoc markdown plugin.
The core value to me of TSDoc is defining a standard set of tags and providing a tag parser that treats a block of text by parsing only tags and nothing else. At least if this was split out as a separate utility method / option to invoke, then I'd be happy. If desired, also offer the full AST breakdown of comments with interspersed tags and Markdown / custom TSFM approach. That seems like a whole lot of work out of the gate, though. TSFM may face the challenge of being too opinionated for wide adoption.
I don't think this would be sufficient for my own use case. For example we need
I believe the latest design will handle both cases pretty easily. Currently we are planning "strict" and "lax" modes for the parser. Right now it's looking like the same parser algorithm will handle both modes -- the main difference will be that in "lax" mode a consumer would ignore the error nodes (and simply render them as plain text), and in "strict" mode there will be additional validations/checks that can produce errors/warnings that are ignored in "lax" mode. If your specific style of documentation can be seen as a subset of the full TSDoc grammar, then it could be modeled as an additional mode with slightly different validations. It will also be relatively easy to turn parser features on/off. So e.g. if someone wanted to say "I want backticks to always be treated as plain backticks" we can provide a switch to turn off the code span parsing.
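As a rough illustration of the strict/lax split described above, here is a minimal sketch in which one parser produces error nodes and only the consumer decides what to do with them. None of these names (`DocNode`, `parseInline`, `consume`, the `parseCodeSpans` switch) come from the TSDoc library; they are assumptions for the example, and only single-backtick code spans are handled.

```typescript
type DocNode =
  | { kind: "text"; text: string }
  | { kind: "codeSpan"; code: string }
  | { kind: "error"; text: string; message: string };

// One parser for both modes. `parseCodeSpans` is a hypothetical feature switch:
// when false, backticks are always treated as plain backticks.
function parseInline(input: string, parseCodeSpans: boolean): DocNode[] {
  const nodes: DocNode[] = [];
  let pos = 0;
  while (pos < input.length) {
    const open = parseCodeSpans ? input.indexOf("`", pos) : -1;
    if (open < 0) {
      nodes.push({ kind: "text", text: input.slice(pos) });
      break;
    }
    if (open > pos) {
      nodes.push({ kind: "text", text: input.slice(pos, open) });
    }
    const close = input.indexOf("`", open + 1);
    if (close < 0) {
      // Keep the offending text in the node so a consumer can still render it.
      nodes.push({ kind: "error", text: input.slice(open), message: "Unbalanced backtick" });
      break;
    }
    nodes.push({ kind: "codeSpan", code: input.slice(open + 1, close) });
    pos = close + 1;
  }
  return nodes;
}

// The consumer is where the modes differ: both render error nodes as plain text,
// but only "strict" reports them.
function consume(nodes: DocNode[], mode: "strict" | "lax"): { output: string; errors: string[] } {
  let output = "";
  const errors: string[] = [];
  for (const node of nodes) {
    if (node.kind === "codeSpan") {
      output += "<code>" + node.code + "</code>"; // HTML escaping omitted for brevity
    } else {
      output += node.text;
      if (node.kind === "error" && mode === "strict") {
        errors.push(node.message);
      }
    }
  }
  return { output, errors };
}
```

An additional mode for a narrower documentation style would, under this sketch, just be another set of checks run over the same node list.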
This week I actually made a bunch of progress on an algorithm that handles the ideas proposed in #29. It's moving along very fast. (Having the dev design sorted out really makes a big difference heheh.) I expect to have something I can publish next week for people to provide feedback on.
The most straightforward way for me to think about this is to treat The actual handling of inline content could be very similar to the way Are there cases which do not work with this in a straightforward manner? |
Easy peasy: This method is part of the [Statistics subsystem]({@link core-library#Statistics}). See #70 (comment) |
Problem Statement
There are numerous incompatible Markdown flavors. For this discussion, let's assume "Markdown" means strict CommonMark unless otherwise specified.
Many people expect to use Markdown notations inside their JSDoc. Writing a Markdown parser is already somewhat tricky, since the grammar is highly sensitive to context (compared to other rich-text formats such as HTML). Extending it with JSDoc tags causes some interesting collisions and ambiguities. Some motivating examples:
1. Code fences that span tags
Intuitively we'd expect it to be rendered like this:
I can use backticks to create a `code fence` that gets highlighted in a typewriter font.
This `@tag` is not a TSDoc tag, since it's inside a code fence.
This hyperlink to the `MyClass` base class should get highlighting in its target text.
This example of backtick (`) has an unbalanced backtick (`) inside a tag.
2. Stars
Stars have the same problems as backticks, but with even more special cases:
Intuitively we'd expect it to be rendered like this:
Markdown would treat these as
Inside code comments, the left margin is sometimes ambiguous:
Markdown confusingly allows a * inside an emphasis.
Does a * tag participate in this?
3. Whitespace
Markdown assigns special meanings to whitespace indentation. For example, indenting 4 spaces is equivalent to a ``` block. Newlines also have lots of special meanings.
This could be fairly confusing inside a code comment, particularly with weird cases like this:
Perhaps TSDoc should issue warnings about malformed comment framing.
Perhaps we should try to disable some of Markdown's indentation rules. For example, the TSDoc parser could trim whitespace from the start of each line.
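Both ideas above (warning about malformed framing, and trimming leading whitespace so Markdown's indentation rules never apply) can be sketched together. This is only an assumption about how such a check might look; `FramingResult` and `normalizeFraming` are hypothetical names, not TSDoc APIs.

```typescript
interface FramingResult {
  lines: string[]; // comment content with framing and indentation trimmed
  warnings: string[]; // one message per line with malformed framing
}

// Hypothetical sketch: every line after the first is expected to start with "*".
// Lines that do not are kept but reported. All content is trimmed, so a 4-space
// indent can no longer be read as an indented code block by a later Markdown pass.
function normalizeFraming(comment: string): FramingResult {
  const rawLines = comment.split(/\r?\n/);
  const lines: string[] = [];
  const warnings: string[] = [];

  rawLines.forEach((raw, index) => {
    let text = raw;
    if (index === 0) {
      text = text.replace(/^\s*\/\*\*/, ""); // opening "/**"
    }
    if (index === rawLines.length - 1) {
      text = text.replace(/\*\/\s*$/, ""); // closing "*/"
    }
    if (index > 0) {
      // Note the ambiguity: a line starting with Markdown emphasis such as
      // "**bold**" is indistinguishable from a framed line here.
      const framed = /^\s*\*(.*)$/.exec(text);
      if (framed) {
        text = framed[1];
      } else if (text.trim() !== "") {
        warnings.push(`Line ${index + 1}: expected a leading "*"`);
      }
    }
    lines.push(text.trim());
  });

  return { lines, warnings };
}
```

The trade-off is that trimming also discards indentation that an author intended, for example inside a fenced code sample, so a real implementation would likely need to exempt fenced regions.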
4. Markdown Links
Markdown supports these constructs:
The Markdown link functionality partially overlaps with JSDoc's `{@link}` tag. But it's missing support for API item references.
5. Markdown Tables
Markdown tables have a ton of limitations. Many constructs aren't supported inside table cells. You can't even put a newline inside a table cell. CommonMark had a long discussion about this, but so far does not support the pipes-and-dashes table syntax at all. Instead it uses HTML tables. This seems pretty wise.
6. HTML elements
Most Markdown flavors allow HTML mixed into your content. The CommonMark spec has an entire section about this. This is convenient, although HTML is an entirely separate grammar with its own complexities. For example, HTML has a completely distinct escaping mechanism from Markdown.
Here are a few interesting cases to show some interactions:
Two Possible Solutions
Option 1: Extend an existing CommonMark library
The most natural approach would be for the TSDoc parser to include an integrated CommonMark parser. The two grammars would be mixed together. We definitely don't want to write a CommonMark parser from scratch, so instead the TSDoc library would need to extend an existing library. Markdown-it and Flavormark are possible choices that are both oriented towards custom extensions.
Possible downsides:
Option 2: Treat full Markdown as a postprocess
A possible shortcut would be to say that TSDoc operates as a first pass that snips out the structures we care about, and returns everything else as plain text. We don't want to get tripped up by backticks, so we make a small list of core constructs that can easily screw up parsing:
Anything else is treated as plain text for TSDoc, and gets passed through (to be possibly reinterpreted by another layer of the documentation pipeline).
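To illustrate the first-pass idea, here is a minimal sketch that snips out only single-backtick code spans and `{@tag ...}` inline tags and passes everything else through as plain text. It is a hypothetical example, not the proposal's own pseudocode: the `FirstPassNode` type and `firstPass` function are invented for illustration, and code fences, multi-backtick spans, and block tags are deliberately left out.

```typescript
type FirstPassNode =
  | { kind: "plainText"; text: string }
  | { kind: "codeSpan"; code: string }
  | { kind: "inlineTag"; tagName: string; content: string };

// Scans left to right. Because a code span is matched as a whole, an inline tag
// written inside backticks is never recognized as a tag.
function firstPass(input: string): FirstPassNode[] {
  const pattern = /`([^`]*)`|\{(@[A-Za-z]+)([^}]*)\}/g;
  const nodes: FirstPassNode[] = [];
  let last = 0;
  let match: RegExpExecArray | null;

  while ((match = pattern.exec(input)) !== null) {
    if (match.index > last) {
      nodes.push({ kind: "plainText", text: input.slice(last, match.index) });
    }
    if (match[2] !== undefined) {
      nodes.push({ kind: "inlineTag", tagName: match[2], content: match[3].trim() });
    } else {
      nodes.push({ kind: "codeSpan", code: match[1] });
    }
    last = match.index + match[0].length;
  }
  if (last < input.length) {
    // Unmatched input, including a stray unbalanced backtick, stays plain text
    // for a later Markdown layer to reinterpret.
    nodes.push({ kind: "plainText", text: input.slice(last) });
  }
  return nodes;
}
```

For an input such as "See {@link MyClass} but not `{@link Hidden}` here", this yields an inline tag for `MyClass` and a code span whose content still contains the literal `{@link Hidden}` text.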
Here's some pseudocode for a corresponding AST:
Possible downsides:
What do you think? Originally I was leaning towards Option 1 above, but now I'm wondering if Option 2 might be a better option.