Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: parse markdown header more carefully #111

Merged
merged 7 commits into from
Aug 25, 2021
Merged

Conversation

dandhlee
Copy link
Collaborator

There are some files that don't have a header for markdown files and some headers that were wrongly extracted. Also adding support for alternate header type in markdown, see https://www.markdownguide.org/basic-syntax/#headings.

  • For the traditional markdown header using # for h1 header, check that:

    • There's only one # for an h1 header
    • There's exactly one space after one #
    • Is not just a short string consisting only of #
  • For the alternate markdown header using = characters below the header, check that:

    • The header above the = divider isn't empty
    • The divider line only consists of = and whitespace characters if necessary

Otherwise, I've defaulted to returning the file name if no header can be found. I'll leave it to the library to better format the markdown files rather than dealing with individual formatting issues on the plugin.

Updated unit tests.

Fixes #110.

  • Tests pass

@dandhlee dandhlee requested review from parthea, busunkim96, tbpg and a team August 24, 2021 20:44
@dandhlee dandhlee requested a review from a team as a code owner August 24, 2021 20:44
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Aug 24, 2021
Copy link
Contributor

@parthea parthea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added minor observations, otherwise LGTM.


self.assertEqual(header_line_want, header_line_got)

mdfile.close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line can be removed if mdfile = open('tests/markdown_example_h2.md', 'r') is changed to a with statement. For example, with open('tests/markdown_example_h2.md', 'r') as mdfile:

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the tip! Updating the files.

if "#" in header_line:
# Check for proper h1 header formatting, ensure there's more than just
# the hashtag character.
if header_line[header_line.index("#")+1] == " " and \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're looking for both header and a space, we could change the check to if "# " in header_line: to look for a header and a space

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an additional overhead to ensure that it doesn't parse h2 header if given ## but it does reduce needing to check for a space followed by #. The updated code looks easier to follow, I think.

@dandhlee
Copy link
Collaborator Author

Updated function name from is_markdown_header to parse_markdown_header to be more in line with what it returns, which isn't a boolean.

@dandhlee dandhlee merged commit 485b248 into main Aug 25, 2021
@dandhlee dandhlee deleted the parse_markdown_header branch August 25, 2021 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Process markdown headers correctly
2 participants