Stackoverflow while rendering markdown from HTML #626

Open
chibenwa opened this issue Aug 26, 2024 · 0 comments

Describe the bug

I was considering using Flexmark as an HTML => text/plain engine for Apache James.

(We currently rely on a homegrown jsoup-based parser.)

I ran our test suite against flexmark-html2md-converter and hit a StackOverflowError with the following test:

    @Test
    public void deeplyNestedHtmlShouldNotThrowStackOverflow() {
        final int count = 2048;
        String html = Strings.repeat("<div>", count) +  "<p>para1</p><p>para2</p>" + Strings.repeat("</div>", count);
        String expectedPlainText = "para1\n\npara2\n\n";
        
        assertThat(FlexmarkHtmlConverter.builder().build().convert(html))
            .isEqualTo(expectedPlainText);
    }

java.lang.StackOverflowError
	at java.base/java.util.Vector.addElement(Vector.java:616)
	at java.base/java.util.Stack.push(Stack.java:68)
	at com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter$MainHtmlConverter.pushState(FlexmarkHtmlConverter.java:1151)
	at com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter.processHtmlTree(FlexmarkHtmlConverter.java:1696)
	at com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter$MainHtmlConverter.renderChildren(FlexmarkHtmlConverter.java:1146)
	at com.vladsch.flexmark.html2md.converter.internal.HtmlConverterCoreNodeRenderer.processDiv(HtmlConverterCoreNodeRenderer.java:537)
	at com.vladsch.flexmark.html2md.converter.HtmlNodeRendererHandler.render(HtmlNodeRendererHandler.java:19)
	at com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter$MainHtmlConverter.renderNode(FlexmarkHtmlConverter.java:1135)
	at com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter$MainHtmlConverter.render(FlexmarkHtmlConverter.java:1050)
	at com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter.processHtmlTree(FlexmarkHtmlConverter.java:1707)
	at com.vladsch.flexmark.html2md.converter.FlexmarkHtmlConverter$MainHtmlConverter.renderChildren(FlexmarkHtmlConverter.java:1146)
	at com.vladsch.flexmark.html2md.converter.internal.HtmlConverterCoreNodeRenderer.processDiv(HtmlConverterCoreNodeRenderer.java:537)
	at com.vladsch.flexmark.html2md.converter.HtmlNodeRendererHandler.render(HtmlNodeRendererHandler.java:19)

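The rendering is tree-based and walks the tree recursively, so each level of HTML nesting consumes several Java stack frames (see the repeating frames in the trace above). Sufficiently deep nesting exhausts the thread's stack.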

This feels familiar: we had a similar issue with our homegrown jsoup-based parser, which relied on the same mechanism. We overcame the limitation by replacing recursion with explicit stacks (in and out) and loops.
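To illustrate the idea of replacing recursion with an explicit stack and a loop, here is a minimal sketch. It is not the Flexmark or Apache James code: `Node`, `Frame`, `toPlainText` and `deeplyNested` are hypothetical names, and the tiny tree type only stands in for a parsed HTML DOM. Each node is pushed once as an "enter" frame and once as a "leave" frame, so the pending work lives on the heap and nesting depth no longer affects the Java call stack.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: names and types are assumptions, not Flexmark or James APIs.
final class IterativeTextExtractor {

    /** Minimal stand-in for a DOM node: an element has a tag, a text node has text. */
    static final class Node {
        final String tag;   // null for text nodes
        final String text;  // null for elements
        final List<Node> children = new ArrayList<>();

        private Node(String tag, String text) { this.tag = tag; this.text = text; }
        static Node element(String tag) { return new Node(tag, null); }
        static Node text(String text) { return new Node(null, text); }
        Node add(Node child) { children.add(child); return this; }
    }

    /** One pending unit of work: enter a node, or leave it after its children. */
    private static final class Frame {
        final Node node;
        final boolean leaving;
        Frame(Node node, boolean leaving) { this.node = node; this.leaving = leaving; }
    }

    /** Depth-first rendering driven by a heap-allocated stack instead of recursion. */
    static String toPlainText(Node root) {
        StringBuilder out = new StringBuilder();
        Deque<Frame> pending = new ArrayDeque<>();
        pending.push(new Frame(root, false));
        while (!pending.isEmpty()) {
            Frame frame = pending.pop();
            Node node = frame.node;
            if (frame.leaving) {
                // "Out" step: runs once all children of the node have been rendered.
                if ("p".equals(node.tag)) out.append("\n\n");
                continue;
            }
            if (node.text != null) {
                out.append(node.text);
                continue;
            }
            // "In" step: schedule the leave frame first, then the children in
            // reverse order so that the first child is popped first.
            pending.push(new Frame(node, true));
            for (int i = node.children.size() - 1; i >= 0; i--) {
                pending.push(new Frame(node.children.get(i), false));
            }
        }
        return out.toString();
    }

    /** Builds `depth` nested divs around two paragraphs, without recursion. */
    static Node deeplyNested(int depth) {
        Node root = Node.element("div");
        Node current = root;
        for (int i = 1; i < depth; i++) {
            Node next = Node.element("div");
            current.add(next);
            current = next;
        }
        current.add(Node.element("p").add(Node.text("para1")));
        current.add(Node.element("p").add(Node.text("para2")));
        return root;
    }

    public static void main(String[] args) {
        String text = toPlainText(deeplyNested(100_000));
        System.out.println(text.equals("para1\n\npara2\n\n")); // prints true
    }
}
```

With this shape, the maximum nesting depth is bounded by available heap rather than by the thread's stack size. Applying it to Flexmark would presumably require reworking how node renderers call `renderChildren`, which this sketch does not attempt.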
