Docx Renderer Extension
flexmark-java Docx-Renderer extension
Renders the parsed Markdown AST to docx format using the docx4j library.
See the DocxConverterCommonMark Sample for code and Customizing Docx Rendering for an overview and information on customizing the styles.
The Pegdown version can be found in the DocxConverterPegdown Sample.
Set EmojiExtension.USE_SHORTCUT_TYPE to EmojiShortcutType.GITHUB or EmojiShortcutType.ANY_GITHUB_PREFERRED, which causes GitHub provided images to be used.
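A minimal sketch of where this option is set, assuming a recent flexmark-java release (package names such as com.vladsch.flexmark.util.data differ in older versions) with the flexmark-docx-converter, emoji extension and docx4j artifacts on the classpath. The class name, sample Markdown and output file name are placeholders for illustration; the DocxConverterCommonMark Sample remains the reference for a complete setup.

```java
import com.vladsch.flexmark.docx.converter.DocxRenderer;
import com.vladsch.flexmark.ext.emoji.EmojiExtension;
import com.vladsch.flexmark.ext.emoji.EmojiShortcutType;
import com.vladsch.flexmark.parser.Parser;
import com.vladsch.flexmark.util.ast.Node;
import com.vladsch.flexmark.util.data.MutableDataSet;
import org.docx4j.Docx4J;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

import java.io.File;
import java.util.Arrays;

// Hypothetical example class, not part of flexmark-java.
final class EmojiDocxExample {
    public static void main(String[] args) throws Exception {
        // Options are shared by the parser and the docx renderer.
        MutableDataSet options = new MutableDataSet()
                .set(Parser.EXTENSIONS, Arrays.asList(EmojiExtension.create()))
                // Use GitHub provided images for emoji shortcuts.
                .set(EmojiExtension.USE_SHORTCUT_TYPE, EmojiShortcutType.GITHUB);

        Parser parser = Parser.builder(options).build();
        DocxRenderer renderer = DocxRenderer.builder(options).build();

        Node document = parser.parse("Hello :smile: from Markdown");

        // Render into the default template, then save as a docx file.
        WordprocessingMLPackage template = DocxRenderer.getDefaultTemplate();
        renderer.render(document, template);
        template.save(new File("emoji-sample.docx"), Docx4J.FLAG_SAVE_ZIP_FILE);
    }
}
```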
Renders AST generated by flexmark-java parser. No special syntax is implemented by this extension.
- .className on paragraph elements will set the docx styleId to className if the style id is found. This allows using specific style ids to change formatting for paragraphs. The special classes pagebreak and tab are excluded.
- page break via {.pagebreak} attributes
- tab via {.tab} attributes
- Use {style=""} to set attributes on text or block elements. Only the following are processed:
  - color - text color
  - background-color - shade fill color, pattern always solid.
  - font-family - not implemented
  - font-size - in pt, rounded to nearest 1/2 pt. The pt unit is optional.
  - font-weight - set/clear bold (if using numeric weights then >= 550 sets bold, less clears it)
  - font-style - set/clear italic
- inline image alignment with {align=}:
  - left - left align, wrap text to right
  - right - right align, wrap text to left
  - center - center align, wrap text to left and right
  - else no wrapping around image, image inserted into text
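The font-size and font-weight rules above can be illustrated with a small standalone sketch. It is not the extension's code: the class and method names are hypothetical, and treating the keywords bold and normal as set/clear is an assumption. The result of the font-size rule is expressed in half-points, which is the unit WordprocessingML uses for run font sizes.

```java
import java.util.Locale;

// Hypothetical helper illustrating the documented rounding and threshold rules.
final class StyleAttributeSketch {
    // Parses a font-size value with an optional "pt" unit and rounds it
    // to the nearest 1/2 pt, returned as a count of half-points.
    static int fontSizeHalfPoints(String value) {
        String v = value.trim().toLowerCase(Locale.ROOT);
        if (v.endsWith("pt")) {
            v = v.substring(0, v.length() - 2).trim();
        }
        return (int) Math.round(Double.parseDouble(v) * 2.0);
    }

    // Numeric weights >= 550 set bold, lower values clear it.
    // Keyword handling is an assumption made for this sketch.
    static boolean isBold(String value) {
        String v = value.trim().toLowerCase(Locale.ROOT);
        if (v.equals("bold")) return true;
        if (v.equals("normal")) return false;
        return Double.parseDouble(v) >= 550;
    }

    public static void main(String[] args) {
        System.out.println(fontSizeHalfPoints("10.3pt")); // 21 half-points, i.e. 10.5 pt
        System.out.println(isBold("600"));                // true
    }
}
```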
artifact: flexmark-docx-converter
The following options are available:
Defined in DocxRenderer class:
- CODE_HIGHLIGHT_SHADING default "", when non-empty will use this color as a highlight, also overrides NO_CHARACTER_STYLES to true, see NOTE on Highlight Colors.
- CUSTOM_PROPERTIES default Collections.emptyMap(), set to Map<String, String> containing map of property name to property value for custom properties to be set in the document. Needed in some cases for post processing.
- DEFAULT_LINK_RESOLVER default true, use default link resolver, which uses the DOC_RELATIVE_URL and DOC_ROOT_URL options
- DEFAULT_TEMPLATE_RESOURCE default "/empty.xml", default template resource path
- DOC_EMOJI_IMAGE_VERT_OFFSET default -0.10, vertical offset of emoji image as a factor of line height at point of insertion. The final value is rounded to the nearest pt, so jumps of 1 pt can occur for small changes of this value.
- DOC_EMOJI_IMAGE_VERT_SIZE default 1.05, size of emoji image as a factor of line height at point of insertion.
- DOC_RELATIVE_URL default "", the prefix to use for all relative URLs: ones not starting with a protocol or /
- DOC_ROOT_URL default "", the prefix to use for all absolute URLs: ones starting with /
- ERROR_SOURCE_FILE default "", name of source file to use in error logs
- ERRORS_TO_STDERR default false, log errors to stderr
- FORM_CONTROLS default "", set to name of form control reference to generate form controls with name given by this key [name]{.type attributes}
- LINEBREAK_ON_INLINE_HTML_BR default true, convert inline HTML <br> to line break in the docx
- LOCAL_HYPERLINK_MISSING_FORMAT default "Missing target id: #%s", when non-empty uses String.format() on the given string with the missing ref anchor as the argument to generate a tooltip for unresolved hyperlinks
- LOCAL_HYPERLINK_MISSING_HIGHLIGHT default "red", when non-empty will highlight unresolved hyperlinks local to the document with this color, see NOTE on Highlight Colors.
- LOCAL_HYPERLINK_SUFFIX default "", appends this suffix to in-document hyperlink anchor references. Needed in some cases for post processing.
- LOG_IMAGE_PROCESSING default false, log image processing errors
- MAX_IMAGE_WIDTH default 0, max image width, 0 for no max
- NO_CHARACTER_STYLES default false, when true will not set character style but explicitly set the run values from the style
- NUMBERING_XML default getResourceString("/numbering.xml"), default numbering section if missing in wordprocessing package
- PREFIX_WWW_LINKS default true, controls whether links starting with www. will be prefixed with https://
- RENDER_BODY_ONLY default false, when rendering to string will only output the body of the document part. Used for tests.
- STYLES_XML default getResourceString("/styles.xml"), default styles section if missing in wordprocessing package
- TABLE_CAPTION_BEFORE_TABLE default false, insert caption before table
- TABLE_CAPTION_TO_PARAGRAPH default true, convert table captions to paragraphs, styled with TableCaption style id
- TABLE_LEFT_INDENT default 120, table left indent in twips
- TABLE_PREFERRED_WIDTH_PCT default 0, preferred table width
- TABLE_STYLE default "", table font style
- TOC_GENERATE default false, whether to generate TOC, even if no TOC Markdown element is present in the file
- TOC_INSTRUCTION default "TOC \\o \"1-3\" \\h \\z \\u ", defines the instruction string used for the TOC element
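The interaction of DOC_RELATIVE_URL, DOC_ROOT_URL and PREFIX_WWW_LINKS described above can be sketched as a plain function. This is only an illustration of the documented rules, not the default link resolver itself: the class and method names are hypothetical, and both the way a prefix is joined to a path and the handling of # anchors are assumptions.

```java
// Hypothetical sketch of the documented URL prefixing rules.
final class LinkPrefixSketch {
    static String resolve(String url, String docRelativeUrl, String docRootUrl, boolean prefixWwwLinks) {
        if (url.startsWith("www.")) {
            return prefixWwwLinks ? "https://" + url : url;
        }
        if (url.startsWith("#")) {
            return url; // in-document anchor, left untouched (assumption)
        }
        if (url.matches("[a-zA-Z][a-zA-Z0-9+.-]*:.*")) {
            return url; // already starts with a protocol
        }
        if (url.startsWith("/")) {
            return join(docRootUrl, url); // absolute URL: use DOC_ROOT_URL
        }
        return join(docRelativeUrl, url); // relative URL: use DOC_RELATIVE_URL
    }

    // Joins prefix and path with exactly one slash; an empty prefix changes nothing.
    private static String join(String prefix, String path) {
        if (prefix.isEmpty()) return path;
        boolean prefixSlash = prefix.endsWith("/");
        boolean pathSlash = path.startsWith("/");
        if (prefixSlash && pathSlash) return prefix + path.substring(1);
        if (prefixSlash || pathSlash) return prefix + path;
        return prefix + "/" + path;
    }

    public static void main(String[] args) {
        String relative = "file:///docs/current";
        String root = "file:///docs";
        System.out.println(resolve("img/a.png", relative, root, true));
        System.out.println(resolve("/img/a.png", relative, root, true));
        System.out.println(resolve("www.example.com", relative, root, true));
    }
}
```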
NOTE on Highlight Colors: the docx format requires a named color for highlights. Any color provided that does not match a named color will be converted to the closest named color.
When CODE_HIGHLIGHT_SHADING is set to "shade", the closest named color to the SourceText style's shade fill color is used, if available.
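As an illustration of what "closest named color" can mean, the sketch below picks the nearest entry from a small palette by squared RGB distance. The distance metric, the class name and the palette are assumptions for this example: only eight of the docx highlight names are listed, and the extension may match colors differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical nearest-named-color lookup over a subset of highlight names.
final class NamedColorSketch {
    private static final Map<String, Integer> NAMED = new LinkedHashMap<>();

    static {
        NAMED.put("black", 0x000000);
        NAMED.put("white", 0xFFFFFF);
        NAMED.put("red", 0xFF0000);
        NAMED.put("green", 0x00FF00);
        NAMED.put("blue", 0x0000FF);
        NAMED.put("yellow", 0xFFFF00);
        NAMED.put("cyan", 0x00FFFF);
        NAMED.put("magenta", 0xFF00FF);
    }

    // Accepts "RRGGBB" or "#RRGGBB" and returns the nearest palette name.
    static String closest(String hex) {
        int rgb = Integer.parseInt(hex.startsWith("#") ? hex.substring(1) : hex, 16);
        String best = null;
        long bestDistance = Long.MAX_VALUE;
        for (Map.Entry<String, Integer> entry : NAMED.entrySet()) {
            long d = distance(rgb, entry.getValue());
            if (d < bestDistance) {
                bestDistance = d;
                best = entry.getKey();
            }
        }
        return best;
    }

    // Squared Euclidean distance between two packed RGB values.
    private static long distance(int a, int b) {
        long dr = ((a >> 16) & 0xFF) - ((b >> 16) & 0xFF);
        long dg = ((a >> 8) & 0xFF) - ((b >> 8) & 0xFF);
        long db = (a & 0xFF) - (b & 0xFF);
        return dr * dr + dg * dg + db * db;
    }

    public static void main(String[] args) {
        System.out.println(closest("#FE0101")); // red
    }
}
```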
Element styles:
- ASIDE_BLOCK_STYLE default "AsideBlock", style to use for aside blocks
- BLOCK_QUOTE_STYLE default "Quotations", style to use for block quotes
- BOLD_STYLE default "StrongEmphasis", style to use for the markdown element
- BULLET_LIST_STYLE default "BulletList", numbering list style to use for bullet list item paragraph
- DEFAULT_STYLE default "Normal", style to use for the markdown element
- ENDNOTE_ANCHOR_STYLE default "EndnoteReference", style to use for the markdown element
- FOOTER default "Footer", style to use for the markdown element
- FOOTNOTE_ANCHOR_STYLE default "FootnoteReference", style to use for the markdown element
- FOOTNOTE_STYLE default "Footnote", style to use for footnote text
- FOOTNOTE_TEXT default "FootnoteText", style to use for the markdown element
- HEADER default "Header", style to use for the markdown element
- HEADING_1 default "Heading1", style to use for the markdown element
- HEADING_2 default "Heading2", style to use for the markdown element
- HEADING_3 default "Heading3", style to use for the markdown element
- HEADING_4 default "Heading4", style to use for the markdown element
- HEADING_5 default "Heading5", style to use for the markdown element
- HEADING_6 default "Heading6", style to use for the markdown element
- HORIZONTAL_LINE_STYLE default "HorizontalLine", style to use for thematic breaks
- HYPERLINK_STYLE default "Hyperlink", style to use for the markdown element
- INLINE_CODE_STYLE default "SourceText", style to use for the markdown element
- INS_STYLE default "Underlined", style to use for the markdown element
- ITALIC_STYLE default "Emphasis", style to use for the markdown element
- LOOSE_PARAGRAPH_STYLE default "ParagraphTextBody", style to use for loose list type items
- NUMBERED_LIST_STYLE default "NumberedList", numbering list style to use for numbered list item paragraph
- PARAGRAPH_BULLET_LIST_STYLE default "ListBullet", style to use for tight list type items
- PARAGRAPH_NUMBERED_LIST_STYLE default "ListNumber", style to use for tight list type items
- PREFORMATTED_TEXT_STYLE default "PreformattedText", style to use for fenced code and indented code
- STRIKE_THROUGH_STYLE default "Strikethrough", style to use for the markdown element
- SUBSCRIPT_STYLE default "Subscript", style to use for the markdown element
- SUPERSCRIPT_STYLE default "Superscript", style to use for the markdown element
- TABLE_CAPTION default "TableCaption", style to use for table captions
- TABLE_CONTENTS default "TableContents", style to use for table bodies
- TABLE_GRID default "TableGrid", style to use for the markdown element
- TABLE_HEADING default "TableHeading", style to use for table headings
- TIGHT_PARAGRAPH_STYLE default "BodyText", style to use for tight list type items
List Element Styles
Unordered lists use the numbering list style named BulletList while ordered lists use NumberedList. If these are not present then the default numbering style (id = 2) is used for unordered lists and the default numbering style (id = 3) is used for ordered lists.
The following are equivalent to Renderer properties of the same name. Included in DocxRenderer for convenience.
For the TOC_INSTRUCTION string see Docx4j GettingStarted under the heading TOC Content Control.
NOTE: Word does not handle inserted HTML very well. Any HTML not suppressed will be escaped, i.e. it will render into the document as text. The exception is the <br> tag which, if enabled, will be rendered as a line break.
Html rendering options available in DocxRenderer for convenience:
- ESCAPE_HTML_BLOCKS default value of ESCAPE_HTML, escape html blocks found in the document
- ESCAPE_HTML_COMMENT_BLOCKS default value of ESCAPE_HTML_BLOCKS, escape html comment blocks found in the document
- ESCAPE_HTML default false, escape all html found in the document
- ESCAPE_INLINE_HTML_COMMENTS default value of ESCAPE_HTML_BLOCKS, escape inline html comments found in the document
- ESCAPE_INLINE_HTML default value of ESCAPE_HTML, escape inline html found in the document
- PERCENT_ENCODE_URLS default false, percent encode urls
- RECHECK_UNDEFINED_REFERENCES default false, recheck the existence of references in Parser.REFERENCES for link and image refs marked undefined. Used when new references are added after parsing
- SUPPRESS_HTML_BLOCKS default value of SUPPRESS_HTML, suppress html output for html blocks
- SUPPRESS_HTML_COMMENT_BLOCKS default value of SUPPRESS_HTML_BLOCKS, suppress html output for html comment blocks
- SUPPRESS_HTML default false, suppress html output for all html
- SUPPRESS_INLINE_HTML_COMMENTS default value of SUPPRESS_INLINE_HTML, suppress html output for inline html comments
- SUPPRESS_INLINE_HTML default value of SUPPRESS_HTML, suppress html output for inline html
- HEADER_ID_GENERATOR_NO_DUPED_DASHES default false, when true duplicate - in id will be replaced by a single -
- HEADER_ID_GENERATOR_RESOLVE_DUPES default true, when true will add an incrementing integer to duplicate ids to make them unique
- HEADER_ID_GENERATOR_TO_DASH_CHARS default "_", set of characters to convert to - in text used to generate id, non-alphanumeric chars not in set will be removed
- HEADER_ID_GENERATOR_NON_ASCII_TO_LOWERCASE default true, when set to false changes the default header id generator to not convert non-ascii alphabetic characters to lowercase. Needed for GitHub id compatibility.
- HEADER_ID_REF_TEXT_TRIM_LEADING_SPACES default true, when set to false leading spaces in link reference text in a heading are not trimmed for text used to generate id.
- HEADER_ID_REF_TEXT_TRIM_TRAILING_SPACES default true, when set to false trailing spaces in link reference text in a heading are not trimmed for text used to generate id.
- HEADER_ID_ADD_EMOJI_SHORTCUT default false, when set to true emoji shortcut nodes add the shortcut to collected text used to generate heading id.
- RENDER_HEADER_ID default false, render a header id attribute for headers using the configured HtmlIdGenerator
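The header id options above combine roughly as in the following standalone sketch. It is an illustration of the documented behavior, not the library's HtmlIdGenerator: the class name is hypothetical, all letters are lowercased regardless of HEADER_ID_GENERATOR_NON_ASCII_TO_LOWERCASE, and the "-N" suffix format for duplicates is an assumption. The usage passes " -_" as the to-dash set purely as an example value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of header id generation driven by three of the options.
final class HeaderIdSketch {
    private final String toDashChars;
    private final boolean noDupedDashes;
    private final boolean resolveDupes;
    private final Map<String, Integer> seen = new HashMap<>();

    HeaderIdSketch(String toDashChars, boolean noDupedDashes, boolean resolveDupes) {
        this.toDashChars = toDashChars;
        this.noDupedDashes = noDupedDashes;
        this.resolveDupes = resolveDupes;
    }

    String generate(String headingText) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < headingText.length(); i++) {
            char c = headingText.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                sb.append(Character.toLowerCase(c));
            } else if (toDashChars.indexOf(c) >= 0) {
                // Collapse runs of dashes when NO_DUPED_DASHES is enabled.
                boolean lastIsDash = sb.length() > 0 && sb.charAt(sb.length() - 1) == '-';
                if (!(noDupedDashes && lastIsDash)) {
                    sb.append('-');
                }
            }
            // Any other non-alphanumeric character is removed.
        }
        String id = sb.toString();
        if (!resolveDupes) {
            return id;
        }
        // Append an incrementing integer to repeated ids (suffix format assumed).
        Integer count = seen.get(id);
        if (count == null) {
            seen.put(id, 0);
            return id;
        }
        seen.put(id, count + 1);
        return id + "-" + (count + 1);
    }

    public static void main(String[] args) {
        HeaderIdSketch ids = new HeaderIdSketch(" -_", true, true);
        System.out.println(ids.generate("Hello,  World!")); // hello-world
        System.out.println(ids.generate("Hello,  World!")); // hello-world-1
    }
}
```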