Docs generated from javadoc get mangled when using html tags or taglets (like link, code) #21

simonbasle · 2022-08-31T13:36:02Z

Consider the following javadoc:

/**
 * This {@link DocumentedMeter} is about {@link MyClass my class}.
 * <p>
 * It uses a {@code status} tag.
 */

This seems to produce output like the following:

This is about . It uses a tag.

Currently the tool uses Roaster's getJavadoc().getText() method, which strips the taglets and also seems to strip the HTML tags from the returned String.
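To make the mangling concrete, here is a rough Java approximation of what getText() appears to do. It assumes the method simply drops inline taglets, strips HTML tags and collapses whitespace. MangledTextDemo is a hypothetical class and this is not Roaster's actual implementation; it only shows that those two removals are enough to reproduce the output above.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical approximation (NOT Roaster's implementation) of the text that
 * getText() appears to return: taglets dropped, HTML tags stripped.
 */
public class MangledTextDemo {

    // An inline taglet such as {@link MyClass my class} (no nested braces).
    private static final Pattern TAGLET = Pattern.compile("\\{@[^}]*\\}");
    // Any HTML tag such as <p>, </p> or <br/>.
    private static final Pattern HTML_TAG = Pattern.compile("<[^>]+>");

    public static String approximateGetText(String javadoc) {
        String noTaglets = TAGLET.matcher(javadoc).replaceAll("");
        String noTags = HTML_TAG.matcher(noTaglets).replaceAll(" ");
        // Collapse the whitespace runs left behind by the removals.
        return noTags.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String javadoc = "This {@link DocumentedMeter} is about {@link MyClass my class}.\n"
                + "<p>\n"
                + "It uses a {@code status} tag.";
        // prints "This is about . It uses a tag."
        System.out.println(approximateGetText(javadoc));
    }
}
```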

Perhaps getJavadoc().getFullText() would be a better alternative, but then we would have to deal with HTML and taglets in the asciidoc output :/

Option 1: naive sanitization to HTML

One way of doing basic sanitization would be to:

- use getFullText()
- search for the opening of taglets ({@xxx) and replace it with an opening <code> tag
- search for the closing taglet bracket (}) and replace it with a closing </code> tag
- append the result to the asciidoc as unprocessed HTML within a ++++ passthrough block

Drawback: the raw HTML passthrough is incompatible with asciidoctor-pdf generation.
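A minimal sketch of the Option 1 steps, assuming the input String is what getFullText() returns. NaiveHtmlSanitizer and toPassthroughBlock are hypothetical names. The naivety is kept on purpose: every } becomes </code>, and a link with a description ends up with both target and label inside the same code element. The output is still raw HTML, so the asciidoctor-pdf drawback applies unchanged.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of Option 1: taglets to <code>, wrapped in a passthrough block. */
public class NaiveHtmlSanitizer {

    // Opening of an inline taglet, e.g. "{@link " or "{@code ", including trailing whitespace.
    private static final Pattern TAGLET_OPEN = Pattern.compile("\\{@\\w+\\s*");

    public static String toPassthroughBlock(String fullText) {
        String html = TAGLET_OPEN.matcher(fullText).replaceAll("<code>");
        // Naive: EVERY '}' becomes </code>, even one that does not close a taglet.
        html = html.replace("}", "</code>");
        // ++++ delimits an asciidoc passthrough block: its content is emitted unprocessed.
        return "++++\n" + html + "\n++++";
    }

    public static void main(String[] args) {
        // Assumed stand-in for getJavadoc().getFullText().
        String fullText = "This {@link DocumentedMeter} is about {@link MyClass my class}.\n"
                + "<p>\n"
                + "It uses a {@code status} tag.";
        System.out.println(toPassthroughBlock(fullText));
    }
}
```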

Option 2: naive sanitization to Asciidoc

This requires more effort, since a basic set of common HTML tags has to be converted to asciidoc. I'd consider p and br for a start.
For taglets, I'd consider @link and @code as the minimum viable set.

- use getFullText()
- convert <p> to a double newline and remove </p>
- convert <br>/<br/> to a newline
- convert {@code xxx} to `xxx`
- convert {@link xxx} to `xxx` too (not super reliable, especially with links + description, but eh)
- append that naive asciidoc conversion to the result
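A rough sketch of the Option 2 steps, again assuming a getFullText() String as input (NaiveAsciidocConverter is a hypothetical name). One deliberate deviation from the list: asciidoc folds a bare newline into the surrounding paragraph, so this sketch renders <br> as a trailing " +", which is asciidoc's hard line break syntax.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of Option 2: naive HTML/taglet to asciidoc conversion. */
public class NaiveAsciidocConverter {

    // <p> (and the whitespace around it) becomes a paragraph break.
    private static final Pattern P_OPEN = Pattern.compile("(?i)\\s*<p>\\s*");
    private static final Pattern P_CLOSE = Pattern.compile("(?i)</p>");
    // <br>, <br/> and <br /> become an asciidoc hard line break (" +" then newline).
    private static final Pattern BR = Pattern.compile("(?i)\\s*<br\\s*/?>\\s*");
    // {@code xxx} or {@link xxx}, up to the first '}' (nested braces are not supported).
    private static final Pattern CODE_OR_LINK = Pattern.compile("\\{@(?:code|link)\\s+([^}]*)\\}");

    public static String toAsciidoc(String fullText) {
        String s = P_OPEN.matcher(fullText).replaceAll("\n\n");
        s = P_CLOSE.matcher(s).replaceAll("");
        s = BR.matcher(s).replaceAll(" +\n");
        // Both taglets become inline code; a link description stays glued to its target.
        s = CODE_OR_LINK.matcher(s).replaceAll("`$1`");
        return s.trim();
    }

    public static void main(String[] args) {
        String fullText = "This {@link DocumentedMeter} is about {@link MyClass my class}.\n"
                + "<p>\n"
                + "It uses a {@code status} tag.";
        System.out.println(toAsciidoc(fullText));
    }
}
```

On the sample javadoc this yields `MyClass my class` as a single code span, which is exactly the "not super reliable with links + description" case.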


simonbasle · 2022-09-22T14:15:29Z

Other HTML tags to consider: ul, ol and li (tricky, as the translation of li depends on whether the enclosing list is a ul or an ol).
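One way to handle the ul vs ol ambiguity is to keep a stack of enclosing list types while walking the tags, so each li takes the marker of its nearest list (* for ul, . for ol) and repeats it to express nesting. This is a hypothetical ListConverter sketch using regex tag matching, which assumes reasonably well-formed HTML. It is not necessarily what the eventual fix does: the commit message in this thread describes a coarser rule where any ol turns all li into ordered items.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: ul/ol/li to asciidoc lists, tracking the enclosing list type. */
public class ListConverter {

    // Opening or closing ul/ol/li tags, swallowing the surrounding whitespace.
    private static final Pattern LIST_TAG =
            Pattern.compile("(?i)\\s*<(/?)(ul|ol|li)\\s*>\\s*");

    public static String convertLists(String html) {
        // Stack of enclosing list markers: '*' for <ul>, '.' for <ol>.
        Deque<Character> enclosing = new ArrayDeque<>();
        StringBuilder out = new StringBuilder();
        Matcher m = LIST_TAG.matcher(html);
        int last = 0;
        while (m.find()) {
            out.append(html, last, m.start()); // text between tags is kept as-is
            last = m.end();
            boolean closing = !m.group(1).isEmpty();
            String tag = m.group(2).toLowerCase();
            if (tag.equals("li")) {
                if (!closing) {
                    if (out.length() > 0 && out.charAt(out.length() - 1) != '\n') {
                        out.append('\n'); // each item starts on its own line
                    }
                    char marker = enclosing.isEmpty() ? '*' : enclosing.peek();
                    // Nesting depth is expressed by repeating the marker: "*", "**", "..", ...
                    out.append(String.valueOf(marker).repeat(Math.max(1, enclosing.size()))).append(' ');
                }
            } else if (!closing) {
                if (enclosing.isEmpty()) {
                    out.append("\n\n"); // blank line before a top-level list
                }
                enclosing.push(tag.equals("ul") ? '*' : '.');
            } else {
                enclosing.poll(); // poll() instead of pop() tolerates unbalanced tags
                if (enclosing.isEmpty()) {
                    out.append("\n\n"); // blank line after a top-level list
                }
            }
        }
        out.append(html, last, html.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convertLists("Steps:<ol><li>one</li><li>two</li></ol>Done."));
    }
}
```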

This commit attempts to fix the generated asciidoc for javadoc blocks that are not trivial: a custom parsing of the Roaster `getJavadoc()` model is used instead of `getText()`/`getFullText()` to avoid mangling the original javadoc.

We attempt to convert a simple subset of HTML tags to their asciidoc equivalents, and to convert inline taglets to a relevant asciidoc representation where one exists.

For HTML, we convert:
- `p`, `br`, `b` and `i` to their direct equivalents
- `strong` tags (including inline) to an `IMPORTANT:` admonition
- `ul`/`ol` lists and their `li` elements, in a best-effort fashion. If an `ol` is detected, we have to turn ALL `li` into asciidoc ordered list elements.

All other unknown HTML tags are removed, but their node content is kept.

For taglets:
- `@code` and `@value` taglets have their content turned into inline code blocks
- `@link` and `@linkplain` taglets consider whether an alias text is provided. If so, only the alias text is emitted in the asciidoc. If not, the target of the link is emitted as an inline code block.
- unknown taglets are copied as an inline code block

Additionally, to ensure these asciidoc javadoc conversions are rendered correctly in the output file, this commit polishes the syntax of quotes and tables:
- the `____` block style is used for quotes instead of a single `>`
- the column format instruction `[cols="a,a"]` is used for tables

Fixes #21.
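The taglet rules from the commit message can be sketched roughly as follows. TagletConverter is a hypothetical class, not the code merged in #28. It assumes taglets contain no nested braces, and it treats the first whitespace outside parentheses as the separator between a link target and its alias, so a target like Foo#bar(String, int) is kept whole. Unknown taglets are copied here as their raw text inside an inline code span.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the taglet rules described in the commit message. */
public class TagletConverter {

    // {@name content}: taglet name, then everything up to the first '}' (no nested braces).
    private static final Pattern TAGLET = Pattern.compile("\\{@(\\w+)\\s*([^}]*)\\}");

    static String convertTaglet(String name, String content, String raw) {
        String body = content.trim();
        switch (name) {
            case "code":
            case "value":
                return "`" + body + "`";
            case "link":
            case "linkplain": {
                // The first whitespace outside parentheses separates target from alias.
                int depth = 0;
                for (int i = 0; i < body.length(); i++) {
                    char c = body.charAt(i);
                    if (c == '(') {
                        depth++;
                    } else if (c == ')') {
                        depth--;
                    } else if (depth == 0 && Character.isWhitespace(c)) {
                        return body.substring(i).trim(); // alias text only
                    }
                }
                return "`" + body + "`"; // no alias: the link target as inline code
            }
            default:
                return "`" + raw + "`"; // unknown taglet: raw text as inline code
        }
    }

    public static String convert(String javadoc) {
        Matcher m = TAGLET.matcher(javadoc);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = convertTaglet(m.group(1), m.group(2), m.group());
            // quoteReplacement so '$' or '\' in the javadoc are not read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("This {@link DocumentedMeter} is about {@link MyClass my class}."));
        System.out.println(convert("See {@linkplain Foo#bar(String, int)} and {@inheritDoc}."));
    }
}
```

Applied to the javadoc from the original report, the alias rule turns `{@link MyClass my class}` into plain "my class", while `{@link DocumentedMeter}` keeps its target as inline code.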

simonbasle mentioned this issue Sep 23, 2022

Improve conversion of javadoc with html/taglets to asciidoc #28

Merged

marcingrzejszczak closed this as completed in #28 Sep 26, 2022

marcingrzejszczak added this to the 1.0.0-RC1 milestone Sep 26, 2022

marcingrzejszczak added the enhancement New feature or request label Sep 26, 2022
