-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add syntax highlighting support for Markdown pages #170
Conversation
docfx_yaml/extension.py
Outdated
|
||
with open(mdfile) as mdfile_iterator: | ||
file_content = mdfile_iterator.read() | ||
# If there isn't even number of code block annotations do not syntax |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# If there isn't even number of code block annotations do not syntax | |
# If there is an odd number of code block annotations, do not syntax |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied suggestion.
docfx_yaml/extension.py
Outdated
find_string = '```' | ||
find_string_nl = '```\n' | ||
replace_string = '```python' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider fence
, fence_with_nl
, and fence_with_python
, or something like that. It's a little unclear to me while reading what these names refer to. Could just be me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, I think those names would be a bit more clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Didn't know they were called fence 😅
docfx_yaml/extension.py
Outdated
file_content = mdfile_iterator.read() | ||
# If there isn't even number of code block annotations do not syntax | ||
# highlight. | ||
if file_content.count(find_string_nl) % 2 != 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if a code block actually has a language indicator? For example, this example has valid code fencing, but this check would return an odd number.
```python
oops
```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For what I've seen throughout the libraries this never was the case, nonetheless I should ensure that this case is handled. At the moment we'd return without syntax highlighting applied in the file but this is also not going to work if there's exactly even number of language indicators which would cause chaos.
At the moment the easiest solution should be to check for language indicators' existence, and omit those pair of entries. I'll modify this code a bit to still check that there's an even number of fences.
tests/markdown_mixed_post.md
Outdated
``` | ||
|
||
``` | ||
with no closing bracket |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider renaming the test files following the pattern foo.md
and foo_want.md
. That way, we're consistent between filenames and the tests themselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
tests/markdown_code_obj.md
Outdated
``` | ||
all code blocks | ||
should be highlighted | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a test including a code block with a language indicator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, I've seen both py
and python
language indicators in repos.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, added support for language indicators and unit test for it.
docfx_yaml/extension.py
Outdated
find_string = '```' | ||
find_string_nl = '```\n' | ||
replace_string = '```python' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, I think those names would be a bit more clear
docfx_yaml/extension.py
Outdated
find_string, | ||
file_content)] | ||
|
||
# This is equivalent to grabbing every odd index items. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# This is equivalent to grabbing every odd index items. | |
# This is equivalent to grabbing every odd index item. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied suggestion.
tests/markdown_code_obj.md
Outdated
``` | ||
all code blocks | ||
should be highlighted | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, I've seen both py
and python
language indicators in repos.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you both! Please take a look again.
docfx_yaml/extension.py
Outdated
find_string = '```' | ||
find_string_nl = '```\n' | ||
replace_string = '```python' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Didn't know they were called fence 😅
docfx_yaml/extension.py
Outdated
|
||
with open(mdfile) as mdfile_iterator: | ||
file_content = mdfile_iterator.read() | ||
# If there isn't even number of code block annotations do not syntax |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied suggestion.
docfx_yaml/extension.py
Outdated
file_content = mdfile_iterator.read() | ||
# If there isn't even number of code block annotations do not syntax | ||
# highlight. | ||
if file_content.count(find_string_nl) % 2 != 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For what I've seen throughout the libraries this never was the case, nonetheless I should ensure that this case is handled. At the moment we'd return without syntax highlighting applied in the file but this is also not going to work if there's exactly even number of language indicators which would cause chaos.
At the moment the easiest solution should be to check for language indicators' existence, and omit those pair of entries. I'll modify this code a bit to still check that there's an even number of fences.
docfx_yaml/extension.py
Outdated
find_string, | ||
file_content)] | ||
|
||
# This is equivalent to grabbing every odd index items. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied suggestion.
tests/markdown_mixed_post.md
Outdated
``` | ||
|
||
``` | ||
with no closing bracket |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
tests/markdown_code_obj.md
Outdated
``` | ||
all code blocks | ||
should be highlighted | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, added support for language indicators and unit test for it.
file_content = mdfile_iterator.read() | ||
# If there is an odd number of code block annotations, do not syntax | ||
# highlight. | ||
if file_content.count(fence) % 2 != 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little concerned about this because we're kind of faking markdown parsing. A code fence can also use four backticks sometimes... So, I think this alright for now, but I'd prefer a solution on the rendering side of the Markdown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The background of this issue is that the Markdown plugin that's generating all of the markdown pages, is simply not adding the language indicators to the code fences. If there are four backticks in the code fence, it will not be generating from the Markdown plugin but directly inserted into the file which we can simply fix it in the source (or ask to include it with a language indicator).
I'd prefer this be solved at the Markdown plugin level as well :( Happy to file an issue on this to track for future improvement.
docfx_yaml/extension.py
Outdated
|
||
# This is equivalent to grabbing every odd index item. | ||
codeblocks = codeblocks[::2] | ||
# Used to store code blocks that comes without language indicators. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Used to store code blocks that comes without language indicators. | |
# Used to store code blocks that come without language indicators. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion applied.
docfx_yaml/extension.py
Outdated
|
||
# Check if the fence comes with a language indicator. If so, skip this. | ||
for start, end in codeblocks: | ||
newline_index = file_content.find('\n', start) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we just check what the end+1
character is, or if len(file_content) == end
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking one character after the end of the fence makes much more sense. Updated!
docfx_yaml/extension.py
Outdated
blocks_without_indicators.append([start, end]) | ||
|
||
# Stitch content that does not need to be parsed, and replace with | ||
# `replace_string` for parsed portions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated reference to old variable name.
docfx_yaml/extension.py
Outdated
@@ -1294,6 +1343,7 @@ def find_markdown_pages(app, outdir): | |||
# For each file, if it is a markdown file move to the top level pages. | |||
for mdfile in markdown_dir.iterdir(): | |||
if mdfile.is_file() and mdfile.name.lower() not in files_to_ignore: | |||
highlight_md_codeblocks(f"{markdown_dir}/{mdfile.name}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the string/name conversion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, no need. I was going back and forth myself between passing a file or just the file path or the string, and forgot to clean it up here.
tests/markdown_mixed_highlight.md
Outdated
@@ -0,0 +1,15 @@ | |||
```python | |||
These code block should not be highlighted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These code block should not be highlighted. | |
These code blocks should not be highlighted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion applied.
tests/markdown_mixed_highlight.md
Outdated
``` | ||
|
||
```py | ||
As these comes with a language indicator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As these comes with a language indicator. | |
As these come with a language indicator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion applied.
tests/test_helpers.py
Outdated
test_markdown_filenames = [ | ||
[ | ||
"tests/markdown_syntax_highlight.md", | ||
"tests/markdown_syntax_highlight_got.md", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should either use a tmp file, clean these up every time (not after the assert), or include them in .gitignore
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to use tempfile
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Please take a look again.
docfx_yaml/extension.py
Outdated
@@ -1294,6 +1343,7 @@ def find_markdown_pages(app, outdir): | |||
# For each file, if it is a markdown file move to the top level pages. | |||
for mdfile in markdown_dir.iterdir(): | |||
if mdfile.is_file() and mdfile.name.lower() not in files_to_ignore: | |||
highlight_md_codeblocks(f"{markdown_dir}/{mdfile.name}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, no need. I was going back and forth myself between passing a file or just the file path or the string, and forgot to clean it up here.
tests/markdown_mixed_highlight.md
Outdated
@@ -0,0 +1,15 @@ | |||
```python | |||
These code block should not be highlighted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion applied.
tests/markdown_mixed_highlight.md
Outdated
``` | ||
|
||
```py | ||
As these comes with a language indicator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion applied.
docfx_yaml/extension.py
Outdated
|
||
# This is equivalent to grabbing every odd index item. | ||
codeblocks = codeblocks[::2] | ||
# Used to store code blocks that comes without language indicators. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion applied.
docfx_yaml/extension.py
Outdated
blocks_without_indicators.append([start, end]) | ||
|
||
# Stitch content that does not need to be parsed, and replace with | ||
# `replace_string` for parsed portions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated reference to old variable name.
tests/test_helpers.py
Outdated
test_markdown_filenames = [ | ||
[ | ||
"tests/markdown_syntax_highlight.md", | ||
"tests/markdown_syntax_highlight_got.md", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to use tempfile
.
docfx_yaml/extension.py
Outdated
|
||
# Check if the fence comes with a language indicator. If so, skip this. | ||
for start, end in codeblocks: | ||
newline_index = file_content.find('\n', start) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking one character after the end of the fence makes much more sense. Updated!
file_content = mdfile_iterator.read() | ||
# If there is an odd number of code block annotations, do not syntax | ||
# highlight. | ||
if file_content.count(fence) % 2 != 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The background of this issue is that the Markdown plugin that's generating all of the markdown pages, is simply not adding the language indicators to the code fences. If there are four backticks in the code fence, it will not be generating from the Markdown plugin but directly inserted into the file which we can simply fix it in the source (or ask to include it with a language indicator).
I'd prefer this be solved at the Markdown plugin level as well :( Happy to file an issue on this to track for future improvement.
tests/test_helpers.py
Outdated
# Test to ensure codeblocks in markdown files are correctly highlighted. | ||
|
||
# Copy the base file we'll need to test. | ||
test_file = tempfile.NamedTemporaryFile(mode='r+', delete=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use as a context manager (with ...
) to avoid having to manually close? Why not auto-delete? Debugging if it goes wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The file needed to be kept within the function being tested, so I didn't think it'd work to use it as a context manager. Seems to work, updated to use.
🤖 I have created a release *beep* *boop* --- ## [1.4.0](v1.3.3...v1.4.0) (2022-01-28) ### Features * add syntax highlighting support for Markdown pages ([#170](#170)) ([9898807](9898807)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Markdown pages that are in
rst
format files come without any syntax highlighting support as they're omitted by the Sphinx markdown plugin. For all Markdown pages this is expected for none of the code blocks (whether specified to include a language or not) to have language specified to help with syntax highlighting, hence this feature will likely work well. In cases where there might be few that slip through the crack or are malformed, I've included an error check to ensure that we only process ones that seem valid.Once
python
tag is added, actual syntax-highlight is taken care of by the code indoc-pipeline
.If this is added to a code block that's not Python code,
prettyprint
attempts to find a language based on the code provided andlang-python
is just a supplement to help clarify that it might be Python code but will not add Python syntax highlight to Java code per se.Fixes b/213152730.