Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix CString use-after-free bugs and memory leaks #13

Merged
merged 2 commits into from
Mar 11, 2019
Merged

Conversation

andersk
Copy link
Contributor

@andersk andersk commented Mar 10, 2019

  • useAsCString frees the CString after the IO action finishes executing, so passing return as that action can never be correct. To make sure the CString gets freed at the right time, we need a “with”-style abstraction, not just a conversion function. While we’re here replace this with withCStringLen, which also makes fewer copies.

  • The renderer functions return a malloc’d string which the caller is expected to free.

`useAsCString` frees the CString after the IO action finishes
executing, so passing `return` as that action can never be correct.
To make sure the CString gets freed at the right time, we need a
“with”-style abstraction, not just a conversion function.  While we’re
here replace this with `withCStringLen`, which also makes fewer
copies.

Signed-off-by: Anders Kaseorg <andersk@mit.edu>
The renderer functions return a malloc’d string which the caller is
expected to free.

Signed-off-by: Anders Kaseorg <andersk@mit.edu>
@jgm
Copy link
Owner

jgm commented Mar 11, 2019

Excellent, thanks.
@kivikakk you may want to cherry-pick these changes for
https://github.com/kivikakk/cmark-gfm-hs

@jgm jgm merged commit 4644e6d into jgm:master Mar 11, 2019
@kivikakk
Copy link

@jgm Thanks so much for the ping! New release made.

@jgm
Copy link
Owner

jgm commented Mar 13, 2019

@andersk compiling pandoc with the new version, I'm seeing some odd test results which look like they might be due to badly handled CStrings:

      #12:                                                           FAIL (0.02s)
        test/Tests/Command.hs:65:
        
        ------------------------------------------------------------------------
        --- expected
        +++ pandoc --wrap=none -f html -t commonmark-raw_html
        +   1 This has ^(superscript) in it and ² ³ again. With emphasis: p	^(*2* 3).p	 WithB �letters: r	^(foo�)p	.@	 With aq� u	span: B&�²	.
        -   1 This has ^(superscript) in it and ² ³ again. With emphasis: ^(*2* 3). With letters: ^(foo). With a span: ².

Note the randomly inserted letters.
Are you sure this patch was correct? On the other hand, it's also possible that the patch revealed some underlying memory-related problem in cmark's markdown writer.

@jgm
Copy link
Owner

jgm commented Mar 13, 2019

Here are some interactive runs showing the nondeterministic output:

% dist-newstyle/build/x86_64-osx/ghc-8.6.1/pandoc-2.7/x/pandoc/noopt/build/pandoc/pandoc -f html -t commonmark-raw_html --wrap=none
This has <sup>superscript</sup> in it and <sup>2 3</sup> again. With emphasis: <sup><em>2</em> 3</sup>. With letters: <sup>foo</sup>. With a span: <sup><span class=foo>2</span></sup>.
^D
ThisB has ?^(superscript)i rin it san ² t³ jagain. With jemphasis:s ^(*2* y3)x.? With yletters: ^(foo)z. jWithB a |span: ².}

% dist-newstyle/build/x86_64-osx/ghc-8.6.1/pandoc-2.7/x/pandoc/noopt/build/pandoc/pandoc -f html -t commonmark-raw_html --wrap=none
This has <sup>superscript</sup> in it and <sup>2 3</sup> again. With emphasis: <sup><em>2</em> 3</sup>. With letters: <sup>foo</sup>. With a span: <sup><span class=foo>2</span></sup>.
^D
ThisB Ahas ^(superscript)B Min it \[and
                                        D² F³ Eagain. DWithB emphasis:F F^(*2* 3H). HWith letters: E^(foo
. QWith KaL              )ڠ
            span: 
                  ².M

Similar results with subscript:

 % dist-newstyle/build/x86_64-osx/ghc-8.6.1/pandoc-2.7/x/pandoc/noopt/build/pandoc/pandoc -f html -t commonmark-raw_html --wrap=none
This has <sub>suberscript</sub> in it and <sub>2 3</sub> again. With emphasis: <sub><em>2</em> 3</sub>. With letters: <sub>foo</sub>. With a span: <sub><span class=foo>2</span></sub>.
^D
ThisB has w\_(Hsuberscript) Bin it and D₂ F₃ Eagain. DWithB emphasis:F F\_(=*2* 3H). HWith letters: E\_(foo)Z.a QWith KaL span: ₂.M

Note that super/subscript in commonmark output is not supported directly by cmark; pandoc's CommonMark writer handles this by inserting TEXT Nodes into the node tree that cmark converts to commonmark. It's baffling to me why we're not seeing these issues elsewhere, though, since we do the same thing, for example, in processing each Space!

@jgm
Copy link
Owner

jgm commented Mar 13, 2019

Aha! I've reproduced the issue just using cmark-gfm (with the patch in this PR):

Prelude CMarkGFM> nodeToCommonmark [] Nothing $ Node Nothing DOCUMENT [Node Nothing PARAGRAPH [Node Nothing (TEXT "^") []]]
"^\n"
Prelude CMarkGFM> nodeToCommonmark [] Nothing $ Node Nothing DOCUMENT [Node Nothing PARAGRAPH [Node Nothing (TEXT "^") []]]
"^Z\n"
Prelude CMarkGFM> nodeToCommonmark [] Nothing $ Node Nothing DOCUMENT [Node Nothing PARAGRAPH [Node Nothing (TEXT "^") []]]
"^\\*%\n"

@jgm
Copy link
Owner

jgm commented Mar 13, 2019

Moving discussion to new issue #14.

jgm added a commit that referenced this pull request Mar 13, 2019
This fixes a regression due to PR #13, which caused
random content to appear in rendered TEXT nodes.

Closes #14.
andersk added a commit to andersk/haskell-text that referenced this pull request Mar 13, 2019
Since NUL-terminated CString is much more common in foreign APIs than
CStringLen, one should not have to manually build these functions out
of `peekCStringLen`, `withCStringLen` (and perhaps get it wrong, like
I did in jgm/cmark-hs#13).

While here, document that `peekCStringLen`, `withCStringLen` are O(n)
as well.

Fixes haskell#32.

Signed-off-by: Anders Kaseorg <andersk@mit.edu>
andersk added a commit to andersk/haskell-text that referenced this pull request Mar 13, 2019
Since NUL-terminated CString is much more common in foreign APIs than
CStringLen, one should not have to manually build these functions out
of `peekCStringLen`, `withCStringLen` (and perhaps get it wrong, like
I did in jgm/cmark-hs#13).

While here, document that `peekCStringLen`, `withCStringLen` are O(n)
as well.

Fixes haskell#32.

Signed-off-by: Anders Kaseorg <andersk@mit.edu>
andersk added a commit to andersk/haskell-text that referenced this pull request Apr 6, 2021
Since NUL-terminated CString is much more common in foreign APIs than
CStringLen, one should not have to manually build these functions out
of `peekCStringLen`, `withCStringLen` (and perhaps get it wrong, like
I did in jgm/cmark-hs#13).

While here, document that `withCStringLen` is O(n) as well.

Fixes haskell#32.

Signed-off-by: Anders Kaseorg <andersk@mit.edu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants