feat: add syntax highlighting support for Markdown pages #170

dandhlee · 2022-01-21T12:28:13Z

Markdown pages generated from rst files come without any syntax highlighting because the Sphinx Markdown plugin omits language indicators from their code fences. Across all Markdown pages, none of the code blocks are expected to carry a language indicator (whether or not the source specified a language), so this feature should work well. For the few cases that might slip through the cracks or are malformed, I've included an error check so that we only process files that look valid.

Once the `python` tag is added, the actual syntax highlighting is handled by the code in doc-pipeline.

If the tag is added to a code block that isn't Python, prettyprint attempts to infer the language from the code itself. `lang-python` is only a hint that the block might be Python; it will not, by itself, apply Python syntax highlighting to, say, Java code.
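A minimal sketch of the idea, operating on a string rather than a file. The function name `add_python_fences` is hypothetical and this is not the merged implementation; it assumes every fence is exactly three backticks, that fences strictly alternate between opening and closing, and that a file with an odd number of fences should be left untouched.

```python
import re

FENCE = "```"
FENCE_WITH_PYTHON = "```python"


def add_python_fences(content: str) -> str:
    """Hypothetical sketch: tag bare opening fences with `python`.

    Assumes every fence is exactly three backticks and that fences
    alternate strictly between opening and closing.
    """
    spans = [m.span() for m in re.finditer(re.escape(FENCE), content)]
    # An odd number of fences means the pairing is unreliable: bail out.
    if len(spans) % 2 != 0:
        return content

    pieces = []
    last = 0
    # Every other fence (0-based indices 0, 2, 4, ...) opens a code block.
    for start, end in spans[::2]:
        # A bare fence is followed by a newline or the end of the text;
        # anything else is treated as a language indicator and left alone.
        if end == len(content) or content[end] == "\n":
            pieces.append(content[last:start])
            pieces.append(FENCE_WITH_PYTHON)
            last = end
    pieces.append(content[last:])
    return "".join(pieces)


print(add_python_fences("```\nprint(1)\n```\n```py\nx = 1\n```\n"))
```

Blocks that already declare a language (`py` here) pass through unchanged; only the bare opening fence gains `python`.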

Fixes b/213152730.

Tests pass

tbpg · 2022-01-21T16:32:57Z

docfx_yaml/extension.py

+
+    with open(mdfile) as mdfile_iterator:
+        file_content = mdfile_iterator.read()
+        # If there isn't even number of code block annotations do not syntax


Suggested change

# If there isn't even number of code block annotations do not syntax

# If there is an odd number of code block annotations, do not syntax

Applied suggestion.

tbpg · 2022-01-21T16:34:36Z

docfx_yaml/extension.py

+    find_string = '```'
+    find_string_nl = '```\n'
+    replace_string = '```python'


Consider `fence`, `fence_with_nl`, and `fence_with_python`, or something like that. While reading, it's a little unclear to me what these names refer to. Could just be me.

+1, I think those names would be a bit more clear

Thank you! Didn't know they were called fences 😅

tbpg · 2022-01-21T16:40:44Z

docfx_yaml/extension.py

+        file_content = mdfile_iterator.read()
+        # If there isn't even number of code block annotations do not syntax
+        # highlight.
+        if file_content.count(find_string_nl) % 2 != 0:


What happens if a code block actually has a language indicator? For example, the following has valid code fencing, but this check would return an odd number.

````
```python
oops
```
````

From what I've seen throughout the libraries this has never been the case, but I should still make sure it's handled. Right now we'd return without applying syntax highlighting to the file, and the check also breaks if there is an even number of language indicators, since the opening and closing fences would then be paired incorrectly.

For now, the easiest solution is to check whether a language indicator exists and skip that pair of fences. I'll modify the code a bit so that it still checks for an even number of fences.
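To make the failure mode concrete, a small check on made-up inputs (not from the test suite): counting `'```\n'` only matches fences followed immediately by a newline, so a block that opens with a language indicator contributes only its closing fence.

```python
FENCE = "```"
FENCE_WITH_NL = "```\n"

# Valid Markdown: one block that opens with a language indicator.
valid = "```python\noops\n```\n"

# Counting fences followed by a newline sees only the closing fence.
print(valid.count(FENCE_WITH_NL))  # 1 -> odd, so highlighting is skipped

# Counting bare fence markers sees both the opening and the closing fence.
print(valid.count(FENCE))  # 2 -> even

# Two tagged blocks give an even count, but both matches are closing fences,
# so treating them as an open/close pair would mangle the file.
two_tagged = "```py\na\n```\n```py\nb\n```\n"
print(two_tagged.count(FENCE_WITH_NL))  # 2
```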

tbpg · 2022-01-21T16:42:39Z

tests/markdown_mixed_post.md

+```
+
+```
+with no closing bracket


Consider renaming the test files following the pattern foo.md and foo_want.md. That way, we're consistent between filenames and the tests themselves.
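One payoff of the `foo.md` / `foo_want.md` convention is that input/expected pairs can be discovered mechanically. A sketch, assuming a hypothetical helper `markdown_test_pairs` and a hypothetical `markdown_code_obj_want.md` next to the existing `markdown_code_obj.md`:

```python
import tempfile
from pathlib import Path


def markdown_test_pairs(test_dir: Path) -> list[tuple[Path, Path]]:
    """Hypothetical helper: pair each `foo.md` with its `foo_want.md`."""
    pairs = []
    for want in sorted(test_dir.glob("*_want.md")):
        stem = want.name[: -len("_want.md")]
        source = want.with_name(f"{stem}.md")
        # Only report pairs where the input file actually exists.
        if source.exists():
            pairs.append((source, want))
    return pairs


with tempfile.TemporaryDirectory() as tmp:
    test_dir = Path(tmp)
    for name in ["markdown_code_obj.md", "markdown_code_obj_want.md", "orphan.md"]:
        (test_dir / name).write_text("")
    found = [(s.name, w.name) for s, w in markdown_test_pairs(test_dir)]
    print(found)
```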

tbpg · 2022-01-21T16:44:23Z

tests/markdown_code_obj.md

+```
+all code blocks
+should be highlighted
+```


Please add a test including a code block with a language indicator.

+1, I've seen both py and python language indicators in repos.

Done; added support for language indicators and a unit test for it.

busunkim96 · 2022-01-21T17:16:53Z

docfx_yaml/extension.py

+    find_string = '```'
+    find_string_nl = '```\n'
+    replace_string = '```python'


+1, I think those names would be a bit more clear

busunkim96 · 2022-01-21T17:17:25Z

docfx_yaml/extension.py

+                                                      find_string,
+                                                      file_content)]
+
+        # This is equivalent to grabbing every odd index items.


Suggested change

# This is equivalent to grabbing every odd index items.

# This is equivalent to grabbing every odd index item.

Applied suggestion.
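On the wording of that comment: the slice `[::2]` keeps 0-based indices 0, 2, 4, ..., which are the 1st, 3rd, 5th, ... items, i.e. the "odd" items when counting from one. Applied to fence spans that strictly alternate open/close, it keeps only the opening fences. The span values below are made up for illustration.

```python
# Hypothetical (start, end) spans of four fences: open, close, open, close.
fence_spans = [(0, 3), (10, 13), (20, 23), (30, 33)]

opening_fences = fence_spans[::2]   # 1st and 3rd fences
closing_fences = fence_spans[1::2]  # 2nd and 4th fences

print(opening_fences)  # [(0, 3), (20, 23)]
print(closing_fences)  # [(10, 13), (30, 33)]
```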

busunkim96 · 2022-01-21T17:23:02Z

tests/markdown_code_obj.md

+```
+all code blocks
+should be highlighted
+```


+1, I've seen both py and python language indicators in repos.

dandhlee

Thank you both! Please take a look again.

dandhlee · 2022-01-22T09:03:04Z

docfx_yaml/extension.py

+    find_string = '```'
+    find_string_nl = '```\n'
+    replace_string = '```python'


Thank you! Didn't know they were called fences 😅

dandhlee · 2022-01-22T09:03:17Z

docfx_yaml/extension.py

+
+    with open(mdfile) as mdfile_iterator:
+        file_content = mdfile_iterator.read()
+        # If there isn't even number of code block annotations do not syntax


Applied suggestion.

dandhlee · 2022-01-22T09:06:17Z

docfx_yaml/extension.py

+        file_content = mdfile_iterator.read()
+        # If there isn't even number of code block annotations do not syntax
+        # highlight.
+        if file_content.count(find_string_nl) % 2 != 0:


From what I've seen throughout the libraries this has never been the case, but I should still make sure it's handled. Right now we'd return without applying syntax highlighting to the file, and the check also breaks if there is an even number of language indicators, since the opening and closing fences would then be paired incorrectly.

For now, the easiest solution is to check whether a language indicator exists and skip that pair of fences. I'll modify the code a bit so that it still checks for an even number of fences.

dandhlee · 2022-01-22T09:22:28Z

docfx_yaml/extension.py

+                                                      find_string,
+                                                      file_content)]
+
+        # This is equivalent to grabbing every odd index items.


Applied suggestion.

dandhlee · 2022-01-22T09:24:43Z

tests/markdown_mixed_post.md

+```
+
+```
+with no closing bracket


dandhlee · 2022-01-22T09:32:03Z

tests/markdown_code_obj.md

+```
+all code blocks
+should be highlighted
+```


Done; added support for language indicators and a unit test for it.

tbpg · 2022-01-25T11:55:22Z

docfx_yaml/extension.py

+        file_content = mdfile_iterator.read()
+        # If there is an odd number of code block annotations, do not syntax
+        # highlight.
+        if file_content.count(fence) % 2 != 0:


I'm a little concerned about this because we're kind of faking Markdown parsing. A code fence can also use four backticks sometimes... So I think this is alright for now, but I'd prefer a solution on the Markdown rendering side.

The root of this issue is that the Markdown plugin generating all of the Markdown pages simply doesn't add language indicators to the code fences. A code fence with four backticks wouldn't come from the Markdown plugin; it would have been inserted directly into the file, so we can fix it in the source (or ask for it to include a language indicator).

I'd prefer this be solved at the Markdown plugin level as well :( Happy to file an issue on this to track for future improvement.
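A sketch of the four-backtick concern. Per CommonMark, a fence is three *or more* backticks, and a block opened with four is closed only by a fence of at least four with no info string, so a three-backtick line inside it is content. A substring count cannot see this; the line-based scanner below tracks the opener's length instead. `find_opening_fences` is hypothetical and deliberately simplified (no tilde fences, no indentation rules).

```python
import re

# A backtick fence line: three or more backticks, then an optional info string.
FENCE_LINE = re.compile(r"^(`{3,})(.*)$")


def find_opening_fences(content: str) -> list[tuple[int, str, str]]:
    """Return (line_number, fence, info_string) for each opening fence.

    Simplified sketch: backtick fences only, no indentation handling.
    """
    openings = []
    open_fence = None  # The fence string that opened the current block.
    for lineno, line in enumerate(content.splitlines()):
        match = FENCE_LINE.match(line)
        if not match:
            continue
        fence, info = match.group(1), match.group(2).strip()
        if open_fence is None:
            open_fence = fence
            openings.append((lineno, fence, info))
        elif len(fence) >= len(open_fence) and not info:
            # Closing fence: at least as long as the opener, no info string.
            open_fence = None
    return openings


doc = "````\n```python\nprint(1)\n```\n````\n\n```\nbare\n```\n"
print(doc.count("```"))          # 6: even, yet the naive pairing is wrong
print(find_opening_fences(doc))  # only the real openers on lines 0 and 6
```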

tbpg · 2022-01-25T11:55:51Z

docfx_yaml/extension.py

+
+        # This is equivalent to grabbing every odd index item.
+        codeblocks = codeblocks[::2]
+        # Used to store code blocks that comes without language indicators.


Suggested change

# Used to store code blocks that comes without language indicators.

# Used to store code blocks that come without language indicators.

Suggestion applied.

tbpg · 2022-01-25T11:58:51Z

docfx_yaml/extension.py

+
+        # Check if the fence comes with a language indicator. If so, skip this.
+        for start, end in codeblocks:
+            newline_index = file_content.find('\n', start)


Could we just check what the end+1 character is, or if len(file_content) == end?

Checking one character after the end of the fence makes much more sense. Updated!
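The check can be sketched as below. `has_language_indicator` is a hypothetical name, and `end` is assumed to be the exclusive end index of a fence match (as returned by `re.Match.span()`), so the first character after the backticks is `content[end]`.

```python
import re


def has_language_indicator(content: str, end: int) -> bool:
    """Return True if the fence ending at `end` is followed by an info string.

    `end` is the exclusive end index of the fence match, so `content[end]`
    is the first character after the backticks.
    """
    if end == len(content):
        # A fence at the very end of the file has nothing after it.
        return False
    return content[end] != "\n"


content = "```py\nx = 1\n```\n```\ny = 2\n```"
spans = [m.span() for m in re.finditer("```", content)]
openers = spans[::2]
print([has_language_indicator(content, end) for _, end in openers])
```

This avoids scanning forward to the next newline: a single index lookup decides whether the opening fence is bare.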

tbpg · 2022-01-25T11:59:11Z

docfx_yaml/extension.py

+                blocks_without_indicators.append([start, end])
+
+        # Stitch content that does not need to be parsed, and replace with
+        # `replace_string` for parsed portions.


Updated reference to old variable name.
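The stitching step described in that code comment can be sketched as: copy the untouched text between the selected fences and emit the replacement fence for each selected span. `stitch` is a hypothetical helper and assumes the spans are sorted and non-overlapping.

```python
def stitch(content: str, spans: list[list[int]], replacement: str) -> str:
    """Replace each [start, end) span in `content` with `replacement`.

    Assumes `spans` is sorted by start index and non-overlapping.
    """
    pieces = []
    last = 0
    for start, end in spans:
        pieces.append(content[last:start])  # Text that needs no change.
        pieces.append(replacement)          # The rewritten fence.
        last = end
    pieces.append(content[last:])  # Everything after the final span.
    return "".join(pieces)


print(stitch("```\nprint('hi')\n```\n", [[0, 3]], "```python"))
```

Building a list and joining once keeps the work linear in the file size, rather than repeatedly slicing and concatenating the whole string.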

tbpg · 2022-01-25T12:01:29Z

docfx_yaml/extension.py

@@ -1294,6 +1343,7 @@ def find_markdown_pages(app, outdir):
    # For each file, if it is a markdown file move to the top level pages.
    for mdfile in markdown_dir.iterdir():
        if mdfile.is_file() and mdfile.name.lower() not in files_to_ignore:
+            highlight_md_codeblocks(f"{markdown_dir}/{mdfile.name}")


Why the string/name conversion?

You're right, no need. I was going back and forth myself between passing a file or just the file path or the string, and forgot to clean it up here.
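For reference, `open()` accepts any path-like object, including `pathlib.Path`, so the `Path` yielded by `markdown_dir.iterdir()` can be passed straight through. The sketch uses a temporary directory and a stand-in function (`read_markdown` is hypothetical) to show both forms read the same file.

```python
import tempfile
from pathlib import Path


def read_markdown(mdfile) -> str:
    # open() accepts str paths and os.PathLike objects such as pathlib.Path.
    with open(mdfile) as mdfile_iterator:
        return mdfile_iterator.read()


with tempfile.TemporaryDirectory() as tmp:
    markdown_dir = Path(tmp)
    (markdown_dir / "index.md").write_text("```\ncode\n```\n")

    for mdfile in markdown_dir.iterdir():
        if mdfile.is_file():
            from_path = read_markdown(mdfile)
            from_str = read_markdown(f"{markdown_dir}/{mdfile.name}")
            print(from_path == from_str)
```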

tbpg · 2022-01-25T12:01:48Z

tests/markdown_mixed_highlight.md

@@ -0,0 +1,15 @@
+```python
+These code block should not be highlighted.


Suggested change

These code block should not be highlighted.

These code blocks should not be highlighted.

Suggestion applied.

tbpg · 2022-01-25T12:02:08Z

tests/markdown_mixed_highlight.md

+```
+
+```py
+As these comes with a language indicator.


Suggested change

As these comes with a language indicator.

As these come with a language indicator.

Suggestion applied.

tbpg · 2022-01-25T12:03:42Z

tests/test_helpers.py

+    test_markdown_filenames = [
+        [
+            "tests/markdown_syntax_highlight.md",
+            "tests/markdown_syntax_highlight_got.md",


We should either use a tmp file, clean these up every time (not after the assert), or include them in .gitignore.

Updated to use tempfile.

dandhlee

Thank you! Please take a look again.

dandhlee · 2022-01-27T12:48:41Z

docfx_yaml/extension.py

@@ -1294,6 +1343,7 @@ def find_markdown_pages(app, outdir):
    # For each file, if it is a markdown file move to the top level pages.
    for mdfile in markdown_dir.iterdir():
        if mdfile.is_file() and mdfile.name.lower() not in files_to_ignore:
+            highlight_md_codeblocks(f"{markdown_dir}/{mdfile.name}")


You're right, no need. I was going back and forth myself between passing a file or just the file path or the string, and forgot to clean it up here.

dandhlee · 2022-01-27T12:49:35Z

tests/markdown_mixed_highlight.md

@@ -0,0 +1,15 @@
+```python
+These code block should not be highlighted.


Suggestion applied.

dandhlee · 2022-01-27T12:49:54Z

tests/markdown_mixed_highlight.md

+```
+
+```py
+As these comes with a language indicator.


Suggestion applied.

dandhlee · 2022-01-27T16:16:43Z

docfx_yaml/extension.py

+
+        # This is equivalent to grabbing every odd index item.
+        codeblocks = codeblocks[::2]
+        # Used to store code blocks that comes without language indicators.


Suggestion applied.

dandhlee · 2022-01-27T17:05:07Z

docfx_yaml/extension.py

+                blocks_without_indicators.append([start, end])
+
+        # Stitch content that does not need to be parsed, and replace with
+        # `replace_string` for parsed portions.


Updated reference to old variable name.

dandhlee · 2022-01-27T17:06:45Z

tests/test_helpers.py

+    test_markdown_filenames = [
+        [
+            "tests/markdown_syntax_highlight.md",
+            "tests/markdown_syntax_highlight_got.md",


Updated to use tempfile.

dandhlee · 2022-01-27T17:18:29Z

docfx_yaml/extension.py

+
+        # Check if the fence comes with a language indicator. If so, skip this.
+        for start, end in codeblocks:
+            newline_index = file_content.find('\n', start)


Checking one character after the end of the fence makes much more sense. Updated!

dandhlee · 2022-01-27T17:23:40Z

docfx_yaml/extension.py

+        file_content = mdfile_iterator.read()
+        # If there is an odd number of code block annotations, do not syntax
+        # highlight.
+        if file_content.count(fence) % 2 != 0:


The root of this issue is that the Markdown plugin generating all of the Markdown pages simply doesn't add language indicators to the code fences. A code fence with four backticks wouldn't come from the Markdown plugin; it would have been inserted directly into the file, so we can fix it in the source (or ask for it to include a language indicator).

I'd prefer this be solved at the Markdown plugin level as well :( Happy to file an issue on this to track for future improvement.

tbpg · 2022-01-28T16:42:37Z

tests/test_helpers.py

+        # Test to ensure codeblocks in markdown files are correctly highlighted.
+
+        # Copy the base file we'll need to test.
+        test_file = tempfile.NamedTemporaryFile(mode='r+', delete=False)


Use as a context manager (with ...) to avoid having to manually close? Why not auto-delete? Debugging if it goes wrong?

The file needed to stay around for the function being tested, so I didn't think a context manager would work. It seems to work, so I've updated the test to use one.
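A sketch of that test shape, with a hypothetical `tag_bare_fences_in_file` standing in for the function under test (it reopens the file by name). Per the `tempfile` documentation, reopening a `NamedTemporaryFile` by name while it is still open works on POSIX but not reliably on Windows (Python 3.12 added `delete_on_close` to address this), so this sketch assumes a POSIX test runner.

```python
import tempfile


def tag_bare_fences_in_file(path: str) -> None:
    """Hypothetical stand-in: rewrite bare opening fences in place.

    Simplified: assumes fences alternate strictly and every fence is
    exactly three backticks at the start of its own line.
    """
    with open(path) as f:
        lines = f.read().split("\n")
    opening = True
    for i, line in enumerate(lines):
        if line.startswith("```"):
            if opening and line == "```":
                lines[i] = "```python"
            opening = not opening
    with open(path, "w") as f:
        f.write("\n".join(lines))


def test_highlight_in_tempfile() -> None:
    source = "```\nprint(1)\n```\n```py\nx = 1\n```\n"
    want = "```python\nprint(1)\n```\n```py\nx = 1\n```\n"
    # The file is deleted automatically when the with-block exits.
    with tempfile.NamedTemporaryFile(mode="r+", suffix=".md") as test_file:
        test_file.write(source)
        test_file.flush()  # Make the content visible to a second open().
        tag_bare_fences_in_file(test_file.name)
        test_file.seek(0)
        got = test_file.read()
    assert got == want


test_highlight_in_tempfile()
```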

🤖 I have created a release *beep* *boop* --- ## [1.4.0](v1.3.3...v1.4.0) (2022-01-28) ### Features * add syntax highlighting support for Markdown pages ([#170](#170)) ([9898807](9898807)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

dandhlee added 3 commits January 21, 2022 12:18

feat: add syntax highlight support for markdown pages

50e187f

test: add unit test for syntax highlighting

5c540da

test: remove unneeded files

02c7180

dandhlee requested a review from a team January 21, 2022 12:28

dandhlee requested a review from a team as a code owner January 21, 2022 12:28

tbpg requested changes Jan 21, 2022

busunkim96 reviewed Jan 21, 2022

dandhlee added 3 commits January 22, 2022 09:01

fix: apply commit suggestion

fcaea70

feat: handle code blocks with langauge indicators

39fc824

test: update unittest with language indicator support

442a2f7

dandhlee commented Jan 22, 2022

dandhlee requested review from tbpg and busunkim96 January 25, 2022 09:42

tbpg requested changes Jan 25, 2022

dandhlee added 3 commits January 27, 2022 12:50

test: apply review suggestions

1a7f3e4

test: update to use temporary file.

ce3e97c

feat: update with review suggestions.

33e2cb2

dandhlee commented Jan 27, 2022

chore: code cleanup

c8caae9

dandhlee requested a review from tbpg January 27, 2022 17:28

tbpg reviewed Jan 28, 2022

test: update to use context manager for temporary file

492ec10

dandhlee requested a review from tbpg January 28, 2022 16:52

tbpg approved these changes Jan 28, 2022

dandhlee merged commit 9898807 into main Jan 28, 2022

dandhlee deleted the prettyprint_markdown branch January 28, 2022 16:55

release-please bot mentioned this pull request Jan 28, 2022

chore(main): release 1.4.0 #171

Merged
