Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: add markdown page support #102

Merged
merged 13 commits into from
Aug 23, 2021
Merged

feat!: add markdown page support #102

merged 13 commits into from
Aug 23, 2021

Conversation

dandhlee
Copy link
Collaborator

Adding support for markdown pages generated by Sphinx. While Sphinx by nature output .html suffix documents, using the sphinx-markdown-builder allows the docs to be generated in Markdown format.

Using this, takes markdown pages from the top level pages (not for the library files) and adds them to the top level TOC. Retrieves their title as the name to put in the TOC, with their link being the .md suffixed files in the top directory (not in exmaples directory or anywhere else, same level as toc.yaml file). Some files are ignored at the moment due to conflicts in devsite or because doc-pipeline tries to process them and convert them into unreadable formats.

I will file issues to track for supporting some of those files, and add any more restrictions we should add to make sure they look decent for cloud site while recovering as many pages as we can. If this can be fixed from doc-pipeline I'll try and resolve it, but if it's issue on the cloud site we will need to work around it; for example, README file.

Added sphinx-markdown-builder into the plugin, so we don't need to distribute it to 120+ repos.

No unit test will be added as it relies on the Sphinx App for now, it will be incorporated in #44.

Fixes internally filed issue.

@dandhlee dandhlee requested review from parthea, busunkim96, tbpg and a team August 21, 2021 03:31
@dandhlee dandhlee requested a review from a team as a code owner August 21, 2021 03:31
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Aug 21, 2021
@dandhlee
Copy link
Collaborator Author

Kokoro test has passed running the updated version of this plugin on all applicable client library repos :D

def build_init(app):
print("Running sphinx-build with Markdown first...")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add another log message after this to say we're done?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

# Run sphinx-build with Markdown builder in the plugin.
def run_sphinx_markdown():
cwd = os.getcwd()
# Skip running sphinx-build for Markdown for some unit test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: missing the end of this sentence?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

# Use this to ignore markdown files that are unnecessary.
files_to_ignore = [
"index.md", # merge index.md and README.md and index.yaml later
"reference.md", # Reference docs overlap with Overview. Will try and incorporate this in later.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File an issue and reference it here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean within the code, or in this PR? If it's the latter, done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both.

files_to_ignore = [
"index.md", # merge index.md and README.md and index.yaml later
"reference.md", # Reference docs overlap with Overview. Will try and incorporate this in later.
"readme.md", # README does not seem to work in cloud site
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. The file would need to have a different name.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, I'll look into what to name this. Filing an issue.

"index.md", # merge index.md and README.md and index.yaml later
"reference.md", # Reference docs overlap with Overview. Will try and incorporate this in later.
"readme.md", # README does not seem to work in cloud site
"upgrading.md", # Currently the formatting breaks, will need to come back to it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File issue and reference it here? Possibly the same issue as reference.md?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Different issue, upgrading.md file already comes as markdown format in docs/ directory, whereas other files come in reStructuredText format (.rst). Processing .rst->.md with doc-pipeline seems to work well, but processing .md -> .md(?) -> doc-pipeline seems to break. Will need to mitigate this in the middle and use markdown from the source directly, or to make this work on doc-pipeline.


# Extract the header name for TOC.
with open(mdfile) as f:
header_line = f.readline()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style thing: you can just iterate over f to get the lines.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

with open(mdfile) as f:
header_line = f.readline()
# Ignore licenses and other non-headers prior to the header.
while "#" not in header_line:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check prefix instead of just '#' in the line in case it's somewhere else in the line, possibly escaped?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen few cases where they do end up in the middle of the file after they're processed by the markdown builder for Sphinx. I don't think I'd have to worry about it being present escaped after it's been processed by sphinx-build. Though this is something good to consider if I want to move markdown files directly from the source rather than using the processed versions.

while "#" not in header_line:
header_line = f.readline()
#extract the header name
name = header_line.split("#")[1][1:].strip()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trim prefix and trim? Do we need to support titles with more than one #?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the h1 header with single # will do. I'll add in a check for this.

@@ -1235,7 +1304,8 @@ def convert_module_to_package_if_needed(obj):
'uid': uid
})

if len(toc_yaml) == 0:
# Exit if there are no generated YAML pages or Markdown pages.
if len(toc_yaml) == 0 and len(app.env.markdown_pages) == 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels funny to call it toc_yaml if we end up adding more to it. Perhaps we should rename toc_yaml to pkg_toc_yaml or something to make it clear it's not the complete toc?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Done.

shutil.copy(mdfile, f"{outdir}/{mdfile.name.lower()}")

# Extract the header name for TOC.
with open(mdfile) as f:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider refactoring the title extraction into a separate function so we can easily test it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Collaborator Author

@dandhlee dandhlee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #105 for index.md, #106 for reference.md, #107 for README, #108 for upgrading.md.

Updating unit test in a bit.

# Run sphinx-build with Markdown builder in the plugin.
def run_sphinx_markdown():
cwd = os.getcwd()
# Skip running sphinx-build for Markdown for some unit test
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

def build_init(app):
print("Running sphinx-build with Markdown first...")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

# Use this to ignore markdown files that are unnecessary.
files_to_ignore = [
"index.md", # merge index.md and README.md and index.yaml later
"reference.md", # Reference docs overlap with Overview. Will try and incorporate this in later.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean within the code, or in this PR? If it's the latter, done.

files_to_ignore = [
"index.md", # merge index.md and README.md and index.yaml later
"reference.md", # Reference docs overlap with Overview. Will try and incorporate this in later.
"readme.md", # README does not seem to work in cloud site
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, I'll look into what to name this. Filing an issue.

"index.md", # merge index.md and README.md and index.yaml later
"reference.md", # Reference docs overlap with Overview. Will try and incorporate this in later.
"readme.md", # README does not seem to work in cloud site
"upgrading.md", # Currently the formatting breaks, will need to come back to it.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Different issue, upgrading.md file already comes as markdown format in docs/ directory, whereas other files come in reStructuredText format (.rst). Processing .rst->.md with doc-pipeline seems to work well, but processing .md -> .md(?) -> doc-pipeline seems to break. Will need to mitigate this in the middle and use markdown from the source directly, or to make this work on doc-pipeline.


# Extract the header name for TOC.
with open(mdfile) as f:
header_line = f.readline()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

# Ignore licenses and other non-headers prior to the header.
while "#" not in header_line:
header_line = f.readline()
#extract the header name
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

with open(mdfile) as f:
header_line = f.readline()
# Ignore licenses and other non-headers prior to the header.
while "#" not in header_line:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen few cases where they do end up in the middle of the file after they're processed by the markdown builder for Sphinx. I don't think I'd have to worry about it being present escaped after it's been processed by sphinx-build. Though this is something good to consider if I want to move markdown files directly from the source rather than using the processed versions.

while "#" not in header_line:
header_line = f.readline()
#extract the header name
name = header_line.split("#")[1][1:].strip()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the h1 header with single # will do. I'll add in a check for this.

@@ -1235,7 +1304,8 @@ def convert_module_to_package_if_needed(obj):
'uid': uid
})

if len(toc_yaml) == 0:
# Exit if there are no generated YAML pages or Markdown pages.
if len(toc_yaml) == 0 and len(app.env.markdown_pages) == 0:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Done.

@dandhlee dandhlee requested a review from tbpg August 23, 2021 17:04
@dandhlee
Copy link
Collaborator Author

Please take a look again!

@@ -74,7 +74,7 @@ for bucket_item in $(gsutil ls 'gs://docs-staging-v2/docfx-python*' | sort -u -t
fi

for tag in ${GITHUB_TAGS}; do
git checkout ${tag}
#git checkout ${tag}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems unrelated?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, it was to test only against HEAD and not the latest release for some library version issues. I'll have that reverted.

# Use this to ignore markdown files that are unnecessary.
files_to_ignore = [
"index.md", # merge index.md and README.md and index.yaml later
"reference.md", # Reference docs overlap with Overview. Will try and incorporate this in later.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both.

-->


#Test header for a simple markdown file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also test a space after the #.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.

@dandhlee dandhlee merged commit 878f1c3 into main Aug 23, 2021
@dandhlee dandhlee deleted the recover_docs branch August 23, 2021 19:51
@release-please release-please bot mentioned this pull request Aug 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants