Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add syntax highlighting support for Markdown pages #170

Merged
merged 11 commits into from
Jan 28, 2022

Conversation

dandhlee
Copy link
Collaborator

@dandhlee dandhlee commented Jan 21, 2022

Markdown pages that are in rst format files come without any syntax highlighting support as they're omitted by the Sphinx markdown plugin. For all Markdown pages this is expected for none of the code blocks (whether specified to include a language or not) to have language specified to help with syntax highlighting, hence this feature will likely work well. In cases where there might be few that slip through the crack or are malformed, I've included an error check to ensure that we only process ones that seem valid.

Once python tag is added, actual syntax-highlight is taken care of by the code in doc-pipeline.

If this is added to a code block that's not Python code, prettyprint attempts to find a language based on the code provided and lang-python is just a supplement to help clarify that it might be Python code but will not add Python syntax highlight to Java code per se.

Fixes b/213152730.

  • Tests pass

@dandhlee dandhlee requested a review from a team January 21, 2022 12:28
@dandhlee dandhlee requested a review from a team as a code owner January 21, 2022 12:28

with open(mdfile) as mdfile_iterator:
file_content = mdfile_iterator.read()
# If there isn't even number of code block annotations do not syntax
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# If there isn't even number of code block annotations do not syntax
# If there is an odd number of code block annotations, do not syntax

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied suggestion.

Comment on lines 1273 to 1275
find_string = '```'
find_string_nl = '```\n'
replace_string = '```python'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider fence, fence_with_nl, and fence_with_python, or something like that. It's a little unclear to me while reading what these names refer to. Could just be me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, I think those names would be a bit more clear

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Didn't know they were called fence 😅

file_content = mdfile_iterator.read()
# If there isn't even number of code block annotations do not syntax
# highlight.
if file_content.count(find_string_nl) % 2 != 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if a code block actually has a language indicator? For example, this example has valid code fencing, but this check would return an odd number.

```python
oops
```

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what I've seen throughout the libraries this never was the case, nonetheless I should ensure that this case is handled. At the moment we'd return without syntax highlighting applied in the file but this is also not going to work if there's exactly even number of language indicators which would cause chaos.

At the moment the easiest solution should be to check for language indicators' existence, and omit those pair of entries. I'll modify this code a bit to still check that there's an even number of fences.

```

```
with no closing bracket
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider renaming the test files following the pattern foo.md and foo_want.md. That way, we're consistent between filenames and the tests themselves.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

```
all code blocks
should be highlighted
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test including a code block with a language indicator.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, I've seen both py and python language indicators in repos.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, added support for language indicators and unit test for it.

Comment on lines 1273 to 1275
find_string = '```'
find_string_nl = '```\n'
replace_string = '```python'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, I think those names would be a bit more clear

find_string,
file_content)]

# This is equivalent to grabbing every odd index items.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# This is equivalent to grabbing every odd index items.
# This is equivalent to grabbing every odd index item.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied suggestion.

```
all code blocks
should be highlighted
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, I've seen both py and python language indicators in repos.

Copy link
Collaborator Author

@dandhlee dandhlee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you both! Please take a look again.

Comment on lines 1273 to 1275
find_string = '```'
find_string_nl = '```\n'
replace_string = '```python'
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Didn't know they were called fence 😅


with open(mdfile) as mdfile_iterator:
file_content = mdfile_iterator.read()
# If there isn't even number of code block annotations do not syntax
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied suggestion.

file_content = mdfile_iterator.read()
# If there isn't even number of code block annotations do not syntax
# highlight.
if file_content.count(find_string_nl) % 2 != 0:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what I've seen throughout the libraries this never was the case, nonetheless I should ensure that this case is handled. At the moment we'd return without syntax highlighting applied in the file but this is also not going to work if there's exactly even number of language indicators which would cause chaos.

At the moment the easiest solution should be to check for language indicators' existence, and omit those pair of entries. I'll modify this code a bit to still check that there's an even number of fences.

find_string,
file_content)]

# This is equivalent to grabbing every odd index items.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied suggestion.

```

```
with no closing bracket
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

```
all code blocks
should be highlighted
```
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, added support for language indicators and unit test for it.

file_content = mdfile_iterator.read()
# If there is an odd number of code block annotations, do not syntax
# highlight.
if file_content.count(fence) % 2 != 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little concerned about this because we're kind of faking markdown parsing. A code fence can also use four backticks sometimes... So, I think this alright for now, but I'd prefer a solution on the rendering side of the Markdown.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The background of this issue is that the Markdown plugin that's generating all of the markdown pages, is simply not adding the language indicators to the code fences. If there are four backticks in the code fence, it will not be generating from the Markdown plugin but directly inserted into the file which we can simply fix it in the source (or ask to include it with a language indicator).

I'd prefer this be solved at the Markdown plugin level as well :( Happy to file an issue on this to track for future improvement.


# This is equivalent to grabbing every odd index item.
codeblocks = codeblocks[::2]
# Used to store code blocks that comes without language indicators.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Used to store code blocks that comes without language indicators.
# Used to store code blocks that come without language indicators.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion applied.


# Check if the fence comes with a language indicator. If so, skip this.
for start, end in codeblocks:
newline_index = file_content.find('\n', start)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just check what the end+1 character is, or if len(file_content) == end?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking one character after the end of the fence makes much more sense. Updated!

blocks_without_indicators.append([start, end])

# Stitch content that does not need to be parsed, and replace with
# `replace_string` for parsed portions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated reference to old variable name.

@@ -1294,6 +1343,7 @@ def find_markdown_pages(app, outdir):
# For each file, if it is a markdown file move to the top level pages.
for mdfile in markdown_dir.iterdir():
if mdfile.is_file() and mdfile.name.lower() not in files_to_ignore:
highlight_md_codeblocks(f"{markdown_dir}/{mdfile.name}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the string/name conversion?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, no need. I was going back and forth myself between passing a file or just the file path or the string, and forgot to clean it up here.

@@ -0,0 +1,15 @@
```python
These code block should not be highlighted.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
These code block should not be highlighted.
These code blocks should not be highlighted.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion applied.

```

```py
As these comes with a language indicator.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
As these comes with a language indicator.
As these come with a language indicator.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion applied.

test_markdown_filenames = [
[
"tests/markdown_syntax_highlight.md",
"tests/markdown_syntax_highlight_got.md",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should either use a tmp file, clean these up every time (not after the assert), or include them in .gitignore.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to use tempfile.

Copy link
Collaborator Author

@dandhlee dandhlee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Please take a look again.

@@ -1294,6 +1343,7 @@ def find_markdown_pages(app, outdir):
# For each file, if it is a markdown file move to the top level pages.
for mdfile in markdown_dir.iterdir():
if mdfile.is_file() and mdfile.name.lower() not in files_to_ignore:
highlight_md_codeblocks(f"{markdown_dir}/{mdfile.name}")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, no need. I was going back and forth myself between passing a file or just the file path or the string, and forgot to clean it up here.

@@ -0,0 +1,15 @@
```python
These code block should not be highlighted.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion applied.

```

```py
As these comes with a language indicator.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion applied.


# This is equivalent to grabbing every odd index item.
codeblocks = codeblocks[::2]
# Used to store code blocks that comes without language indicators.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion applied.

blocks_without_indicators.append([start, end])

# Stitch content that does not need to be parsed, and replace with
# `replace_string` for parsed portions.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated reference to old variable name.

test_markdown_filenames = [
[
"tests/markdown_syntax_highlight.md",
"tests/markdown_syntax_highlight_got.md",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to use tempfile.


# Check if the fence comes with a language indicator. If so, skip this.
for start, end in codeblocks:
newline_index = file_content.find('\n', start)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking one character after the end of the fence makes much more sense. Updated!

file_content = mdfile_iterator.read()
# If there is an odd number of code block annotations, do not syntax
# highlight.
if file_content.count(fence) % 2 != 0:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The background of this issue is that the Markdown plugin that's generating all of the markdown pages, is simply not adding the language indicators to the code fences. If there are four backticks in the code fence, it will not be generating from the Markdown plugin but directly inserted into the file which we can simply fix it in the source (or ask to include it with a language indicator).

I'd prefer this be solved at the Markdown plugin level as well :( Happy to file an issue on this to track for future improvement.

@dandhlee dandhlee requested a review from tbpg January 27, 2022 17:28
# Test to ensure codeblocks in markdown files are correctly highlighted.

# Copy the base file we'll need to test.
test_file = tempfile.NamedTemporaryFile(mode='r+', delete=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use as a context manager (with ...) to avoid having to manually close? Why not auto-delete? Debugging if it goes wrong?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file needed to be kept within the function being tested, so I didn't think it'd work to use it as a context manager. Seems to work, updated to use.

@dandhlee dandhlee requested a review from tbpg January 28, 2022 16:52
@dandhlee dandhlee merged commit 9898807 into main Jan 28, 2022
@dandhlee dandhlee deleted the prettyprint_markdown branch January 28, 2022 16:55
gcf-merge-on-green bot pushed a commit that referenced this pull request Jan 28, 2022
🤖 I have created a release *beep* *boop*
---


## [1.4.0](v1.3.3...v1.4.0) (2022-01-28)


### Features

* add syntax highlighting support for Markdown pages ([#170](#170)) ([9898807](9898807))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants