Docs generated from javadoc get mangled when using html tags or taglets (like link, code) #21

Closed
simonbasle opened this issue Aug 31, 2022 · 1 comment · Fixed by #28
Labels: enhancement (New feature or request)
Milestone: 1.0.0-RC1

Comments

@simonbasle
Contributor

Consider the following javadoc:

```java
/**
 * This {@link DocumentedMeter} is about {@link MyClass my class}.
 * <p>
 * It uses a {@code status} tag.
 */
```

This seems to produce the following kind of output:

This is about . It uses a tag.

Currently, the tool uses Roaster's `getJavadoc().getText()` method, which strips the taglets and apparently also strips the HTML tags from the returned `String`.

Perhaps `getJavadoc().getFullText()` would be a better alternative, but then we would have to deal with HTML and taglets in the asciidoc output :/

Option 1: naive sanitization to HTML

One way of doing basic sanitization would be to:

  1. use `getFullText()`
  2. search for taglet openings `{@xxx` and replace them with an opening `<code>` tag
  3. search for the closing taglet brace `}` and replace it with a closing `</code>` tag
  4. append the result to the asciidoc as unprocessed HTML within a `++++` passthrough block

Drawback: this is incompatible with asciidoctor-pdf generation.
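A minimal Java sketch of the four Option 1 steps, assuming the input is the raw comment body (roughly what `getFullText()` would hand back, with the leading `*` already stripped). `NaiveHtmlSanitizer` and `toPassthrough` are hypothetical names used only for illustration. The deliberately naive brace replacement also turns any unrelated `}` into `</code>`, and dropping the taglet name means `{@link}` and `{@code}` end up looking the same.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of "Option 1": turn inline taglets into <code> elements
// and wrap the result in an asciidoc passthrough block.
final class NaiveHtmlSanitizer {

    // Matches the opening of an inline taglet plus trailing spaces, e.g. "{@link " or "{@code ".
    private static final Pattern TAGLET_OPEN = Pattern.compile("\\{@\\w+\\s*");

    static String toPassthrough(String javadocFullText) {
        // Step 2: replace every taglet opening with <code>.
        String html = TAGLET_OPEN.matcher(javadocFullText).replaceAll("<code>");
        // Step 3: replace every closing brace with </code>.
        // This also hits braces that do not close a taglet, hence "naive".
        html = html.replace("}", "</code>");
        // Step 4: emit the HTML unprocessed inside a ++++ passthrough block.
        return "++++\n" + html + "\n++++";
    }
}
```

For example, `toPassthrough("It uses a {@code status} tag.")` yields `It uses a <code>status</code> tag.` between two `++++` lines, which is precisely the raw HTML that asciidoctor-pdf cannot render.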

Option 2: naive sanitization to Asciidoc

This requires more effort, since a basic set of common HTML tags has to be converted to asciidoc. I'd consider `p` and `br` for a start.
For taglets, I'd consider `@link` and `@code` as the minimum viable set.

  1. use `getFullText()`
  2. convert `<p>` to a double newline and remove `</p>`
  3. convert `<br>`/`<br/>` to a newline
  4. convert {@code xxx} to `xxx`
  5. convert {@link xxx} to `xxx` too (not super reliable, especially for links with a description, but eh)
  6. append that naive asciidoc conversion to the result
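Steps 2 to 5 of Option 2 might look roughly like this as regular-expression replacements. This is a hypothetical sketch, not the project's code: `NaiveAsciidocConverter` is an invented name, and tags are matched case-insensitively only in their simplest forms (`<p>`, `</p>`, `<br>`, `<br/>`, `<br />`).

```java
import java.util.regex.Pattern;

// Hypothetical sketch of "Option 2": naive HTML/taglet to asciidoc conversion.
final class NaiveAsciidocConverter {

    private static final Pattern P_OPEN = Pattern.compile("(?i)<p>");
    private static final Pattern P_CLOSE = Pattern.compile("(?i)</p>");
    private static final Pattern BR = Pattern.compile("(?i)<br\\s*/?>");
    // {@code xxx} or {@link xxx}: capture everything up to the first closing brace.
    // "{@linkplain" is not matched because "link" must be followed by whitespace.
    private static final Pattern CODE_OR_LINK =
            Pattern.compile("\\{@(?:code|link)\\s+([^}]*)\\}");

    static String convert(String javadocFullText) {
        String s = P_CLOSE.matcher(javadocFullText).replaceAll(""); // step 2: drop </p>
        s = P_OPEN.matcher(s).replaceAll("\n\n");                  // step 2: <p> -> blank line
        s = BR.matcher(s).replaceAll("\n");                        // step 3: <br> -> newline
        s = CODE_OR_LINK.matcher(s).replaceAll("`$1`");            // steps 4 and 5: inline code
        return s;
    }
}
```

As step 5 warns, `{@link MyClass my class}` becomes the single code span `MyClass my class`, mixing the link target and its description.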
@simonbasle
Contributor Author

Other HTML tags to consider: `ul`, `ol`, `li` (tricky, as the translation of `li` depends on whether the enclosing list is a `ul` or an `ol`).
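One way to handle the `li` ambiguity is to keep a stack of enclosing lists while scanning the tags, so that every `li` picks its marker from the innermost `ul` (`*`) or `ol` (`.`). The sketch below is a hypothetical, best-effort illustration: `ListConverter` is an invented name, only bare tags without attributes are recognized, and nesting is expressed by repeating the marker.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical best-effort sketch: convert <ul>/<ol>/<li> into asciidoc list markup,
// choosing the marker of each <li> from its innermost enclosing list.
final class ListConverter {

    // Only bare tags are recognized; attributes such as <ol type="a"> are not handled.
    private static final Pattern LIST_TAG = Pattern.compile("(?i)</?(ul|ol|li)\\s*>");

    static String convert(String html) {
        // Innermost enclosing list on top: '*' for <ul>, '.' for <ol>.
        Deque<Character> enclosing = new ArrayDeque<>();
        StringBuilder out = new StringBuilder();
        Matcher m = LIST_TAG.matcher(html);
        int last = 0;
        while (m.find()) {
            // Copy the text between tags, trimmed so HTML indentation does not leak through.
            out.append(html.substring(last, m.start()).trim());
            last = m.end();
            String tag = m.group(1).toLowerCase(Locale.ROOT);
            boolean closing = m.group().startsWith("</");
            if (tag.equals("li")) {
                if (!closing && !enclosing.isEmpty()) {
                    char marker = enclosing.peek();
                    ensureLineStart(out);
                    // Nesting depth is expressed by repeating the marker: "*", "**", ".", "..".
                    out.append(String.valueOf(marker).repeat(enclosing.size())).append(' ');
                }
            } else if (!closing) {
                if (enclosing.isEmpty()) {
                    ensureBlankLine(out); // separate a top-level list from preceding text
                }
                enclosing.push(tag.equals("ol") ? '.' : '*');
            } else {
                enclosing.poll();
                if (enclosing.isEmpty()) {
                    ensureBlankLine(out); // so following text is not folded into the last item
                }
            }
        }
        out.append(html.substring(last).trim());
        return out.toString().strip();
    }

    private static void ensureLineStart(StringBuilder out) {
        if (out.length() > 0 && out.charAt(out.length() - 1) != '\n') {
            out.append('\n');
        }
    }

    private static void ensureBlankLine(StringBuilder out) {
        if (out.length() == 0) {
            return;
        }
        ensureLineStart(out);
        if (out.length() < 2 || out.charAt(out.length() - 2) != '\n') {
            out.append('\n');
        }
    }
}
```

For instance, `<ul><li>a<ol><li>x</li></ol></li></ul>` is turned into `* a` followed by `.. x` on the next line.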

simonbasle added a commit to simonbasle/micrometer-docs-generator that referenced this issue Sep 23, 2022
This commit attempts to fix the generated asciidoc for javadoc blocks
that are not trivial:

Custom parsing of the Roaster `getJavadoc()` model is used instead of
`getText()`/`getFullText()` to avoid mangling the original javadoc.
We attempt to convert a simple subset of HTML tags to their asciidoc
equivalents and to convert inline taglets to a relevant asciidoc
representation, if any.

For HTML, we convert:
 - `p`, `br`, `b` and `i` to their direct equivalents
 - `strong` tags (including inline) to an `IMPORTANT:` admonition
 - `ul`/`ol` lists and their `li` elements in a best effort fashion

If an `ol` is detected, we have to turn ALL `li` elements into asciidoc
ordered list elements.

All other unknown HTML tags are removed but their node content is kept.

For taglets:
 - `@code` and `@value` taglets have their content turned into inline
  code blocks
 - `@link` and `@linkplain` taglets check whether an alias text is
  provided. If so, only the alias text is rendered in the asciidoc.
  If not, the target of the link is rendered in the asciidoc as an
  inline code block.
 - unknown taglets are copied as an inline code block

Fixes micrometer-metrics#21.
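The taglet rules listed in this commit message can be illustrated with a small standalone converter. This is a hypothetical sketch that works on the raw text with a regular expression, not on the Roaster javadoc model the commit actually uses; `TagletConverter` is an invented name, nested braces inside a taglet are not supported, and copying an unknown taglet verbatim (braces included) is an assumption about what "copied" means.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the taglet rules: @code/@value -> inline code,
// @link/@linkplain -> alias text if present, else the target as inline code,
// anything else -> the whole taglet as inline code.
final class TagletConverter {

    // An inline taglet "{@name content}"; nested braces are not supported.
    private static final Pattern TAGLET = Pattern.compile("\\{@(\\w+)\\s*([^}]*)\\}");

    static String convert(String javadoc) {
        Matcher m = TAGLET.matcher(javadoc);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = convertTaglet(m.group(1), m.group(2).trim(), m.group());
            // quoteReplacement: the converted text may contain '$' or '\'.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    private static String convertTaglet(String name, String content, String whole) {
        switch (name) {
            case "code":
            case "value":
                return "`" + content + "`";
            case "link":
            case "linkplain": {
                int split = firstTopLevelSpace(content);
                if (split < 0) {
                    return "`" + content + "`";             // no alias: target as inline code
                }
                return content.substring(split + 1).trim(); // alias given: alias text only
            }
            default:
                return "`" + whole + "`";                   // unknown taglet: copied as inline code
        }
    }

    // Index of the first whitespace outside parentheses, so that a target such as
    // "Foo#bar(String, int)" is not split at the space inside its parameter list.
    private static int firstTopLevelSpace(String s) {
        int depth = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (Character.isWhitespace(c) && depth == 0) {
                return i;
            }
        }
        return -1;
    }
}
```

Applied to the javadoc from the issue description, the first link (no alias) becomes the inline code `DocumentedMeter`, while the second one is reduced to its alias text, my class.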
simonbasle added a commit to simonbasle/micrometer-docs-generator that referenced this issue Sep 23, 2022
Additionally, in order to ensure these asciidoc javadoc conversions are
correctly rendered in the output file, this commit polishes the syntax
of quotes and tables:
 - the `____` block style for quote is used instead of a single `>`
 - column format instruction `[cols="a,a"]` is used for tables

Fixes micrometer-metrics#21.
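The two syntax choices can be shown with a pair of hypothetical helpers (`AsciidocBlocks`, `quote` and `table` are invented names, not the generator's API). The `a` column style makes Asciidoctor process each body cell as a nested AsciiDoc document, which is what lets converted lists and paragraphs render inside a table; header cells written on one line and followed by a blank line form an implicit header row.

```java
// Hypothetical helpers illustrating the quote and table syntax from the commit message.
final class AsciidocBlocks {

    // A delimited quote block: the ____ lines make it explicit where
    // multi-line content such as lists ends, unlike a single '>' prefix.
    static String quote(String asciidoc) {
        return "____\n" + asciidoc + "\n____\n";
    }

    // A two-column table whose body cells use the 'a' (AsciiDoc) style,
    // so converted javadoc (paragraphs, lists, inline code) is parsed inside each cell.
    static String table(String header1, String header2, String[][] rows) {
        StringBuilder sb = new StringBuilder("[cols=\"a,a\"]\n|===\n");
        // Header cells on one line followed by a blank line form an implicit header row.
        sb.append('|').append(escape(header1)).append(" |").append(escape(header2)).append("\n\n");
        for (String[] row : rows) {
            sb.append('|').append(escape(row[0])).append('\n');
            sb.append('|').append(escape(row[1])).append("\n\n");
        }
        return sb.append("|===\n").toString();
    }

    // A literal '|' inside a cell would otherwise start a new cell.
    private static String escape(String cell) {
        return cell.replace("|", "\\|");
    }
}
```

For example, `table("Name", "Description", new String[][] {{"a|b", "* x\n* y"}})` produces a table whose second body cell renders as a two-item bulleted list, with the `|` in the first cell escaped as `\|`.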
marcingrzejszczak pushed a commit that referenced this issue Sep 26, 2022
Fixes #21.
@marcingrzejszczak added this to the 1.0.0-RC1 milestone Sep 26, 2022
@marcingrzejszczak added the enhancement (New feature or request) label Sep 26, 2022
2 participants