🧪 Fix fuzzing test failures #254

chrisjsewell · 2023-03-14T03:37:12Z

From https://github.com/google/oss-fuzz/tree/master/projects/markdown-it-py,
fixes issues 55363 and 55367

codecov · 2023-03-14T03:40:21Z

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.01 🎉

Comparison is base (34876b1) 96.05% compared to head (8f1aaf8) 96.07%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #254      +/-   ##
==========================================
+ Coverage   96.05%   96.07%   +0.01%     
==========================================
  Files          62       62              
  Lines        3223     3235      +12     
==========================================
+ Hits         3096     3108      +12     
  Misses        127      127

| Flag | Coverage Δ | |
|---|---|---|
| pytests | `96.07% <100.00%> (+0.01%)` | ⬆️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| markdown_it/rules_block/blockquote.py | `100.00% <100.00%> (ø)` | |
| markdown_it/rules_block/fence.py | `100.00% <100.00%> (ø)` | |
| markdown_it/rules_block/hr.py | `100.00% <100.00%> (ø)` | |
| markdown_it/rules_block/list.py | `98.88% <100.00%> (+0.01%)` | ⬆️ |

DavidKorczynski · 2023-03-14T16:34:58Z

Thanks @chrisjsewell !

Would you be interested in moving the fuzzers from https://github.com/google/oss-fuzz/tree/master/projects/markdown-it-py upstream to here? Would be happy to create a set up that is easy for you to manage and also e.g. extend.

In case you're curious, you can find a Fuzz Introspector report for markdown-it-py in this table: https://oss-fuzz-introspector.storage.googleapis.com/index.html The current report (March 14th) links to: https://storage.googleapis.com/oss-fuzz-introspector/markdown-it-py/inspector-report/20230314/fuzz_report.html That report has a further link to "Project coverage" (the yellow button in the top-left corner), which shows the code coverage achieved by the existing fuzzers: https://storage.googleapis.com/oss-fuzz-coverage/markdown-it-py/reports/20230314/linux/index.html (80% atm)

chrisjsewell · 2023-03-14T16:44:33Z

Hey @DavidKorczynski 👋🏼

> Would you be interested in moving the fuzzers from https://github.com/google/oss-fuzz/tree/master/projects/markdown-it-py upstream to here? Would be happy to create a set up that is easy for you to manage and also e.g. extend.

Yep, that would be great, thanks 🙏.
As it mentions in the docstring, I had a look, but didn't see any easy way to integrate it with pytest.
How long does the fuzzing take to run? I tried locally, but it seemed to run for a very long time. Obviously I don't want to wait ages every time I want to check the tests. If it does take that long, I would keep the fuzzers separate from the current unit tests.

DavidKorczynski · 2023-03-14T17:13:59Z

The fuzzers are meant to run for a long time, since the longer they run, the more code they explore. There is currently a start-up overhead from atheris.instrument_all, which can take a while (~45 sec?).

There are probably a few different ways to integrate -- what do you think of the following?

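import sys

import atheris
import pytest
from markdown_it import MarkdownIt


# Note: by default pytest only collects functions whose names start with
# "test", so this name may need a `python_functions` tweak or a test_ wrapper.
@pytest.mark.parametrize(
    "data",
    [
        b"\x42\65\xff.... (fuzzertest 1 case, check for regressions)",
        b"fuzzertest 2 case, check for regressions",
    ],
)
def TestOneInput(data):
    # Turn the raw fuzzer bytes into a surrogate-free unicode string.
    fdp = atheris.FuzzedDataProvider(data)
    md = MarkdownIt()
    raw_markdown = fdp.ConsumeUnicodeNoSurrogates(sys.maxsize)
    md.parse(raw_markdown)
    md.render(raw_markdown)


def main():
    # Only the fuzzing entry point pays the instrumentation cost.
    atheris.instrument_all()
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()


if __name__ == "__main__":
    main()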

With the above approach you can paste the byte strings from the OSS-Fuzz reports in as pytest inputs to TestOneInput, which then serves as a regression test under pytest. The running time under pytest will be low (i.e. like any other test), since atheris.instrument_all is not called. Under pytest, however, the fuzzers will only exercise the inputs specified; OSS-Fuzz will continue to run main for each fuzzer and can therefore pass arbitrary input to TestOneInput.
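The regression idea can also be sketched with only the standard library. The sketch below is a rough illustration, not the project's test code: `bytes_to_text`, `replay_cases`, `toy_target` and `REGRESSION_CASES` are hypothetical names, the UTF-8 decoding is only a loose stand-in for atheris' `ConsumeUnicodeNoSurrogates`, and `toy_target` simulates a parser crash rather than calling markdown-it-py.

```python
from typing import Callable, Iterable, List, Tuple


def bytes_to_text(data: bytes) -> str:
    """Decode raw fuzzer bytes to text, silently dropping invalid UTF-8.

    A rough stand-in for atheris' ConsumeUnicodeNoSurrogates, not an equivalent.
    """
    return data.decode("utf-8", errors="ignore")


def replay_cases(
    cases: Iterable[bytes], target: Callable[[str], object]
) -> List[Tuple[bytes, Exception]]:
    """Run target on every recorded input and collect any exceptions raised."""
    failures = []
    for data in cases:
        try:
            target(bytes_to_text(data))
        except Exception as exc:  # a regression: this input crashes the target
            failures.append((data, exc))
    return failures


def toy_target(text: str) -> str:
    """Hypothetical parser stand-in that crashes on a tab after a list marker."""
    if "-\t" in text:
        raise IndexError("simulated parser crash")
    return text.upper()


# Recorded inputs, e.g. byte strings pasted from fuzzer reports (made up here).
REGRESSION_CASES = [b"plain text", b"\xff\xfe broken bytes", b"-\titem"]

failures = replay_cases(REGRESSION_CASES, toy_target)
print([data for data, _ in failures])
```

In a real setup the target would presumably be something like `lambda text: MarkdownIt().render(text)`, and an empty `failures` list would mean that none of the recorded inputs crash the parser any more.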

@hukkinj1

## 3.0.0 - 2023-06-03

⚠️ This release contains some minor breaking changes in the internal API and improvements to the parsing strictness.

**Full Changelog**: <executablebooks/markdown-it-py@v2.2.0...v3.0.0>

### ⬆️ UPGRADE: Drop support for Python 3.7

Also add testing for Python 3.11

### ⬆️ UPGRADE: Update from upstream markdown-it `12.2.0` to `13.0.0`

A key change is the addition of a new `Token` type, `text_special`, which is used to represent HTML entities and backslash escaped characters. This ensures that (core) typographic transformation rules are not incorrectly applied to these texts.

The final core rule is now the new `text_join` rule, which joins adjacent `text`/`text_special` tokens, and so no `text_special` tokens should be present in the final token stream. Any custom typographic rules should be inserted before `text_join`.

A new `linkify` rule has also been added to the inline chain, which will linkify full URLs (e.g. `https://example.com`), and fixes collision of emphasis and linkifier (so `http://example.org/foo._bar_-_baz` is now a single link, not emphasized). Emails and fuzzy links are not affected by this.

* ♻️ Refactor backslash escape logic, add `text_special` [#276](executablebooks/markdown-it-py#276)
* ♻️ Parse entities to `text_special` token [#280](executablebooks/markdown-it-py#280)
* ♻️ Refactor: Add linkifier rule to inline chain for full links [#279](executablebooks/markdown-it-py#279)
* ‼️ Remove `(p)` => `§` replacement in typographer [#281](executablebooks/markdown-it-py#281)
* ‼️ Remove unused `silent` arg in `ParserBlock.tokenize` [#284](executablebooks/markdown-it-py#284)
* 🐛 FIX: numeric character reference passing [#272](executablebooks/markdown-it-py#272)
* 🐛 Fix: tab preventing paragraph continuation in lists [#274](executablebooks/markdown-it-py#274)
* 👌 Improve nested emphasis parsing [#273](executablebooks/markdown-it-py#273)
* 👌 fix possible ReDOS in newline rule [#275](executablebooks/markdown-it-py#275)
* 👌 Improve performance of `skipSpaces`/`skipChars` [#271](executablebooks/markdown-it-py#271)
* 👌 Show text of `text_special` in `tree.pretty` [#282](executablebooks/markdown-it-py#282)

### ♻️ REFACTOR: Replace most character code use with strings

The use of `StateBase.srcCharCode` is deprecated (with backward-compatibility), and all core uses are replaced by `StateBase.src`.

Conversion of source string characters to an integer representing the Unicode character is prevalent in the upstream JavaScript implementation, to improve performance. However, it is unnecessary in Python and leads to harder to read code and performance deprecations (during the conversion in the `StateBase` initialisation).

See [#270](executablebooks/markdown-it-py#270), thanks to [@hukkinj1](https://github.com/hukkinj1).

### ♻️ Centralise indented code block tests

For CommonMark, the presence of indented code blocks prevent any other block element from having an indent of greater than 4 spaces. Certain Markdown flavors and derivatives, such as mdx and djot, disable these code blocks though, since it is more common to use code fences and/or arbitrary indenting is desirable. Previously, disabling code blocks did not remove the indent limitation, since most block elements had the 3 space limitation hard-coded. This change centralised the logic of applying this limitation (in `StateBlock.is_code_block`), and only applies it when indented code blocks are enabled.

This allows for e.g.

```md
<div>
  <div>

    I can indent as much as I want here.

  <div>
<div>
```

See [#260](executablebooks/markdown-it-py#260)

### 🔧 Maintenance changes

Strict type annotation checking has been applied to the whole code base, [ruff](https://github.com/charliermarsh/ruff) is now used for linting, and fuzzing tests have been added to the CI, to integrate with Google [OSS-Fuzz](https://github.com/google/oss-fuzz/tree/master/projects/markdown-it-py) testing, thanks to [@DavidKorczynski](https://github.com/DavidKorczynski).

* 🔧 MAINTAIN: Make type checking strict [#](executablebooks/markdown-it-py#267)
* 🔧 Add typing of rule functions [#283](executablebooks/markdown-it-py#283)
* 🔧 Move linting from flake8 to ruff [#268](executablebooks/markdown-it-py#268)
* 🧪 CI: Add fuzzing workflow for PRs [#262](executablebooks/markdown-it-py#262)
* 🔧 Add tox env for fuzz testcase run [#263](executablebooks/markdown-it-py#263)
* 🧪 Add OSS-Fuzz set up by @DavidKorczynski in [#255](executablebooks/markdown-it-py#255)
* 🧪 Fix fuzzing test failures [#254](executablebooks/markdown-it-py#254)
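To make the "Replace most character code use with strings" refactor in the release notes above more concrete, here is a minimal sketch of the two styles of character check. The helper names are hypothetical and are not markdown-it-py functions; the sketch only contrasts comparing integer code points (the JavaScript-style approach) with comparing characters directly.

```python
# Hypothetical helpers for illustration only; not part of markdown-it-py.


def starts_blockquote_by_code(src_char_codes: list, pos: int) -> bool:
    """JavaScript-style check: compare integer code points."""
    return src_char_codes[pos] == 0x3E  # 0x3E is the code point of ">"


def starts_blockquote_by_str(src: str, pos: int) -> bool:
    """Python-style check: compare the character directly."""
    return src[pos] == ">"


src = "> quoted text"

# The integer form needs an up-front conversion pass over the whole source...
src_char_codes = [ord(ch) for ch in src]

# ...while the string form indexes the source as-is and gives the same answer.
assert starts_blockquote_by_code(src_char_codes, 0) == starts_blockquote_by_str(src, 0)
```

The string form avoids both the conversion pass and the need to remember what a constant such as `0x3E` stands for, which seems to be the readability point the release notes make.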

🧪 Fix fuzzing test failures

fb39868

From https://github.com/google/oss-fuzz/tree/master/projects/markdown-it-py

Update tests/test_fuzzer.py

8f1aaf8

chrisjsewell merged commit 07e9b7d into master Mar 14, 2023

chrisjsewell deleted the fix-fuzz branch March 14, 2023 04:13

DavidKorczynski mentioned this pull request Mar 15, 2023

🧪 Add OSS-Fuzz set up #255

Merged
