fix: parse markdown header more carefully #111

dandhlee · 2021-08-24T20:44:45Z

Some markdown files don't have a header, and some headers were extracted incorrectly. This also adds support for the alternate header syntax in markdown; see https://www.markdownguide.org/basic-syntax/#headings.

For the traditional markdown header, which uses # for an h1 header, check that:
- There's only one # (an h1 header, not h2 or lower)
- There's exactly one space after the #
- The line isn't just a short string made up only of # characters
For the alternate markdown header, which uses = characters on the line below the header, check that:
- The header above the = divider isn't empty
- The divider line consists only of = characters, plus whitespace if present

Otherwise, if no header can be found, I've defaulted to returning the file name. I'll leave it to the library to format its markdown files better rather than handling individual formatting issues in the plugin.
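
A minimal Python sketch of the checks listed above, plus the file-name fallback. It is not the code merged in this PR: the signature parse_markdown_header(header_line, prev_line), the helper header_or_file_name, and dropping the .md extension in the fallback are assumptions made for illustration.

```python
import os


def parse_markdown_header(header_line, prev_line):
    """Return the h1 header text for header_line, or "" if it isn't one.

    Hypothetical signature: prev_line is the line just above header_line,
    which is needed to recognize the alternate "=" underline syntax.
    """
    line = header_line.rstrip()

    # Traditional syntax: a single "#", exactly one space, then the title.
    # startswith("# ") already rules out "##" (h2) and a bare "#".
    if line.startswith("# ") and not line.startswith("#  "):
        title = line[2:]
        # Skip lines that hold nothing but "#" characters, such as "# ##".
        if title.strip("#").strip():
            return title

    # Alternate syntax: a divider made only of "=" (whitespace allowed)
    # under a non-empty line, which becomes the header.
    divider = "".join(line.split())
    if divider and set(divider) == {"="} and prev_line.strip():
        return prev_line.strip()

    return ""


def header_or_file_name(lines, file_path):
    """Hypothetical helper: first h1 header in lines, else the file name."""
    prev_line = ""
    for line in lines:
        header = parse_markdown_header(line, prev_line)
        if header:
            return header
        prev_line = line
    # No usable header found: fall back to the file name. Dropping the
    # extension is an assumption of this sketch.
    return os.path.splitext(os.path.basename(file_path))[0]
```

Taking an iterable of lines keeps the helper easy to unit test; a caller would pass an open file, e.g. `with open(path) as mdfile: header_or_file_name(mdfile, path)`.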

Updated unit tests.

Fixes #110.

Tests pass

parthea

Added minor observations, otherwise LGTM.

parthea · 2021-08-25T14:33:54Z

tests/test_unit.py

+
+        self.assertEqual(header_line_want, header_line_got)
+
+        mdfile.close()


This line can be removed if mdfile = open('tests/markdown_example_h2.md', 'r') is changed to a with statement. For example, with open('tests/markdown_example_h2.md', 'r') as mdfile:
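
A small sketch of the suggested change. The rest of the real test isn't shown in this excerpt, so the example writes a throwaway markdown file (standing in for tests/markdown_example_h2.md) and just reads its first line; the class and test names are hypothetical.

```python
import os
import tempfile
import unittest


class TestMarkdownHeaderFile(unittest.TestCase):
    def test_read_header_with_context_manager(self):
        with tempfile.TemporaryDirectory() as tmpdir:
            # Stand-in for tests/markdown_example_h2.md so the example runs anywhere.
            path = os.path.join(tmpdir, "markdown_example.md")
            with open(path, "w") as f:
                f.write("# Test header\n\nBody text.\n")

            # The with statement closes mdfile on exit,
            # so no explicit mdfile.close() is needed.
            with open(path, "r") as mdfile:
                header_line_got = mdfile.readline().strip()

            header_line_want = "# Test header"
            self.assertEqual(header_line_want, header_line_got)
            self.assertTrue(mdfile.closed)
```

The file is also closed if an assertion inside the with block fails, which a trailing mdfile.close() would not guarantee.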

Thank you for the tip! Updating the files.

parthea · 2021-08-25T14:38:53Z

docfx_yaml/extension.py

+    if "#" in header_line:
+        # Check for proper h1 header formatting, ensure there's more than just
+        # the hashtag character.
+        if header_line[header_line.index("#")+1] == " " and \


Since we're looking for both the # and a space after it, we could simplify the check to if "# " in header_line:.

This adds some overhead to make sure an h2 header (##) isn't parsed, since "## " also contains "# ", but it removes the need to check for a space after the #. I think the updated code is easier to follow.
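
A short sketch of the trade-off discussed here, using hypothetical helper names (not the merged code): the substring check alone also matches h2 lines, so one extra look at the preceding character is needed.

```python
def naive_is_h1(header_line):
    # Suggested check: "#" followed by a space anywhere in the line.
    return "# " in header_line


def stricter_is_h1(header_line):
    # Extra check: "## Title" also contains "# ", so reject a match
    # whose preceding character is another "#".
    index = header_line.find("# ")
    return index != -1 and (index == 0 or header_line[index - 1] != "#")
```

With these, naive_is_h1("## Title") is True while stricter_is_h1("## Title") is False.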

dandhlee · 2021-08-25T16:18:37Z

Renamed the function from is_markdown_header to parse_markdown_header to better reflect what it returns, which isn't a boolean.

dandhlee added 2 commits August 24, 2021 20:33

fix: check for markdown header more carefully

7adfdd8

test: update unit test

6936a76

dandhlee requested review from parthea, busunkim96, tbpg and a team August 24, 2021 20:44

dandhlee requested a review from a team as a code owner August 24, 2021 20:44

google-cla bot added the cla: yes label Aug 24, 2021

dandhlee added 2 commits August 24, 2021 20:44

fix: update parser and unit test

bd6fade

fix: removing redundant code and adding comment

fd532fa

parthea approved these changes Aug 25, 2021


dandhlee and others added 3 commits August 25, 2021 12:01

Merge branch 'main' into parse_markdown_header

13f301b

test: update lint and open file formats

39c354e

fix: update to parse_markdown_header

a21995f

dandhlee merged commit 485b248 into main Aug 25, 2021

dandhlee deleted the parse_markdown_header branch August 25, 2021 16:19
