feat!: add markdown page support #102

dandhlee · 2021-08-21T03:31:02Z

Adds support for Markdown pages generated by Sphinx. Sphinx outputs .html documents by default, but sphinx-markdown-builder lets it generate the docs in Markdown format.
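As a rough sketch of what invoking the Markdown builder can look like (not the code in this PR): the builder name `markdown` is the one sphinx-markdown-builder registers, while the function names and the `docs` / `docs/_build` paths below are assumptions for illustration. Depending on the installed version, the extension may also need to be listed in `conf.py`.

```python
import subprocess
from pathlib import Path


def markdown_build_command(source_dir, build_dir):
    """Return the sphinx-build argv for a Markdown build (hypothetical helper)."""
    # "-M markdown" selects the Markdown builder; with -M, Sphinx writes the
    # output under <build_dir>/markdown.
    return ["sphinx-build", "-M", "markdown", str(source_dir), str(build_dir)]


def run_markdown_build(source_dir="docs", build_dir="docs/_build"):
    """Run the build and return the directory expected to hold the .md files."""
    # check=True raises CalledProcessError if sphinx-build exits non-zero.
    subprocess.run(markdown_build_command(source_dir, build_dir), check=True)
    return Path(build_dir) / "markdown"
```

Keeping the command construction separate from the subprocess call makes it possible to check the arguments without running Sphinx.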

With this change, the plugin takes Markdown pages from the top level (not the library reference files) and adds them to the top-level TOC. Each page's title becomes its TOC name, and its link is the .md file in the top directory (the same level as the toc.yaml file, not the examples directory or anywhere else). Some files are ignored for now, either because they conflict with devsite or because doc-pipeline tries to process them and converts them into unreadable formats.
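The flow described above can be sketched roughly as follows. This is an illustration under assumptions, not the plugin's implementation: the helper names, the `name`/`href` entry shape, and the exact ignore list are hypothetical.

```python
from pathlib import Path

# Hypothetical ignore list, mirroring the idea of skipping problematic pages.
FILES_TO_IGNORE = {"index.md", "reference.md", "readme.md", "upgrading.md"}


def first_h1(path):
    """Return the text of the first h1 line ('# Title'), or None."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            # Accept a single leading '#', skip '##' and deeper headers.
            if line.startswith("#") and not line.startswith("##"):
                return line[1:].strip()
    return None


def markdown_toc_entries(markdown_dir):
    """Build TOC entries for top-level .md files only (no recursion)."""
    entries = []
    # glob("*.md") does not descend into subdirectories such as examples/.
    for mdfile in sorted(Path(markdown_dir).glob("*.md")):
        name = mdfile.name.lower()
        if name in FILES_TO_IGNORE:
            continue
        title = first_h1(mdfile)
        if title:
            entries.append({"name": title, "href": name})
    return entries
```

Pages without an h1 are skipped here rather than given a fallback name; that is a design choice of the sketch, not something the PR states.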

I will file issues to track support for those files, and add any further restrictions needed to make the pages look decent on the cloud site while recovering as many pages as we can. Where a problem can be fixed in doc-pipeline I'll try to resolve it there, but if it is an issue on the cloud site, as with the README file, we will need to work around it.

Added sphinx-markdown-builder to the plugin so we don't need to distribute it to 120+ repos.

No unit test is added for now because this relies on the Sphinx app; tests will be incorporated in #44.

Fixes an internally filed issue.

dandhlee · 2021-08-21T05:09:14Z

The Kokoro test passed when running the updated version of this plugin on all applicable client library repos :D

tbpg · 2021-08-23T14:11:02Z

docfx_yaml/extension.py

 def build_init(app):
+    print("Running sphinx-build with Markdown first...")


Add another log message after this to say we're done?

tbpg · 2021-08-23T14:11:27Z

docfx_yaml/extension.py

+# Run sphinx-build with Markdown builder in the plugin.
+def run_sphinx_markdown():
+    cwd = os.getcwd()
+    # Skip running sphinx-build for Markdown for some unit test


Nit: missing the end of this sentence?

tbpg · 2021-08-23T14:12:17Z

docfx_yaml/extension.py

+    # Use this to ignore markdown files that are unnecessary.
+    files_to_ignore = [
+        "index.md",     # merge index.md and README.md and index.yaml later
+        "reference.md", # Reference docs overlap with Overview. Will try and incorporate this in later.


File an issue and reference it here?

Do you mean within the code, or in this PR? If it's the latter, done.

tbpg · 2021-08-23T14:12:28Z

docfx_yaml/extension.py

+    files_to_ignore = [
+        "index.md",     # merge index.md and README.md and index.yaml later
+        "reference.md", # Reference docs overlap with Overview. Will try and incorporate this in later.
+        "readme.md",    # README does not seem to work in cloud site


Correct. The file would need to have a different name.

Gotcha, I'll look into what to name this. Filing an issue.

tbpg · 2021-08-23T14:12:50Z

docfx_yaml/extension.py

+        "index.md",     # merge index.md and README.md and index.yaml later
+        "reference.md", # Reference docs overlap with Overview. Will try and incorporate this in later.
+        "readme.md",    # README does not seem to work in cloud site
+        "upgrading.md", # Currently the formatting breaks, will need to come back to it.


File issue and reference it here? Possibly the same issue as reference.md?

Done. It's a different issue: upgrading.md already comes in Markdown format in the docs/ directory, whereas the other files come in reStructuredText format (.rst). Processing .rst -> .md with doc-pipeline seems to work well, but processing .md -> .md(?) -> doc-pipeline seems to break. We will need to either mitigate this in the middle and use the Markdown from the source directly, or make this work in doc-pipeline.

tbpg · 2021-08-23T14:23:28Z

docfx_yaml/extension.py

+
+            # Extract the header name for TOC.
+            with open(mdfile) as f:
+                header_line = f.readline()


Style thing: you can just iterate over f to get the lines.
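A minimal sketch of that suggestion, not the code in this PR: a file object is iterable, so a `for` loop can replace repeated `readline()` calls. The loop also ends cleanly at end of file, whereas `readline()` returns an empty string there, so a `while "#" not in header_line` loop would never exit on a file with no `#`. The helper name is hypothetical.

```python
import io


def first_line_containing(lines, marker="#"):
    """Return the first line that contains marker, or None if there is none."""
    for line in lines:  # a file object yields its lines one at a time
        if marker in line:
            return line
    return None


# io.StringIO stands in for an open file in this example.
sample = io.StringIO("<!-- license -->\n\n# Title\nBody\n")
header_line = first_line_containing(sample)
```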

tbpg · 2021-08-23T14:25:06Z

docfx_yaml/extension.py

+            with open(mdfile) as f:
+                header_line = f.readline()
+                # Ignore licenses and other non-headers prior to the header.
+                while "#" not in header_line:


Check prefix instead of just '#' in the line in case it's somewhere else in the line, possibly escaped?

I've seen a few cases where they do end up in the middle of the file after being processed by the Markdown builder for Sphinx. I don't think I need to worry about an escaped # being present after sphinx-build has processed the file, though this is worth considering if I want to move Markdown files directly from the source rather than using the processed versions.

tbpg · 2021-08-23T14:28:40Z

docfx_yaml/extension.py

+                while "#" not in header_line:
+                    header_line = f.readline()
+                #extract the header name
+                name = header_line.split("#")[1][1:].strip()


Trim prefix and trim? Do we need to support titles with more than one #?

Just the h1 header with a single # will do. I'll add a check for this.
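One possible shape for such a check, sketched under assumptions (the function name is hypothetical and this is not the code that landed): test the line's prefix rather than searching for `#` anywhere in it, accept exactly one `#`, and tolerate both `# Title` and `#Title`.

```python
def parse_h1(line):
    """Return the title if line is an h1 header, else None.

    Accepts '# Title' and '#Title'; rejects '## Title' and deeper levels,
    as well as lines where '#' only appears later in the text.
    """
    stripped = line.strip()
    if not stripped.startswith("#") or stripped.startswith("##"):
        return None
    # Drop the single leading '#' and any surrounding whitespace.
    title = stripped[1:].strip()
    return title or None
```

Compared with `header_line.split("#")[1][1:]`, this does not drop the first character of a title written without a space after the `#`.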

tbpg · 2021-08-23T14:30:35Z

docfx_yaml/extension.py

@@ -1235,7 +1304,8 @@ def convert_module_to_package_if_needed(obj):
                  'uid': uid
                })

-    if len(toc_yaml) == 0:
+    # Exit if there are no generated YAML pages or Markdown pages.
+    if len(toc_yaml) == 0 and len(app.env.markdown_pages) == 0:


Feels funny to call it toc_yaml if we end up adding more to it. Perhaps we should rename toc_yaml to pkg_toc_yaml or something to make it clear it's not the complete toc?

Sounds good. Done.

tbpg · 2021-08-23T14:31:20Z

docfx_yaml/extension.py

+            shutil.copy(mdfile, f"{outdir}/{mdfile.name.lower()}")
+
+            # Extract the header name for TOC.
+            with open(mdfile) as f:


Consider refactoring the title extraction into a separate function so we can easily test it.
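A sketch of what that refactoring could look like, assuming a hypothetical function name and an empty-string result when no header is found. With the extraction isolated, it can be tested against small temporary files without a Sphinx app.

```python
import tempfile
import unittest
from pathlib import Path


def extract_header_from_markdown(path):
    """Return the first h1 title in the file, or '' if none is found."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.startswith("#") and not line.startswith("##"):
                return line[1:].strip()
    return ""


class ExtractHeaderTest(unittest.TestCase):
    def setUp(self):
        # Each test gets its own scratch directory, removed afterwards.
        self._tmp = tempfile.TemporaryDirectory()
        self.addCleanup(self._tmp.cleanup)

    def _write(self, text):
        path = Path(self._tmp.name) / "page.md"
        path.write_text(text, encoding="utf-8")
        return path

    def test_header_after_license_comment(self):
        path = self._write("<!--\nLicense text\n-->\n\n# Test header\n")
        self.assertEqual(extract_header_from_markdown(path), "Test header")

    def test_no_header(self):
        path = self._write("No header here.\n")
        self.assertEqual(extract_header_from_markdown(path), "")
```

The tests can be run with `python -m unittest` from the directory containing the file.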

dandhlee

Filed #105 for index.md, #106 for reference.md, #107 for README, #108 for upgrading.md.

Updating unit test in a bit.

dandhlee · 2021-08-23T16:03:55Z

docfx_yaml/extension.py

+    # Use this to ignore markdown files that are unnecessary.
+    files_to_ignore = [
+        "index.md",     # merge index.md and README.md and index.yaml later
+        "reference.md", # Reference docs overlap with Overview. Will try and incorporate this in later.


Do you mean within the code, or in this PR? If it's the latter, done.

dandhlee · 2021-08-23T16:04:08Z

docfx_yaml/extension.py

+    files_to_ignore = [
+        "index.md",     # merge index.md and README.md and index.yaml later
+        "reference.md", # Reference docs overlap with Overview. Will try and incorporate this in later.
+        "readme.md",    # README does not seem to work in cloud site


Gotcha, I'll look into what to name this. Filing an issue.

dandhlee · 2021-08-23T16:06:48Z

docfx_yaml/extension.py

+        "index.md",     # merge index.md and README.md and index.yaml later
+        "reference.md", # Reference docs overlap with Overview. Will try and incorporate this in later.
+        "readme.md",    # README does not seem to work in cloud site
+        "upgrading.md", # Currently the formatting breaks, will need to come back to it.


Done. It's a different issue: upgrading.md already comes in Markdown format in the docs/ directory, whereas the other files come in reStructuredText format (.rst). Processing .rst -> .md with doc-pipeline seems to work well, but processing .md -> .md(?) -> doc-pipeline seems to break. We will need to either mitigate this in the middle and use the Markdown from the source directly, or make this work in doc-pipeline.

dandhlee · 2021-08-23T16:26:34Z

docfx_yaml/extension.py

+            with open(mdfile) as f:
+                header_line = f.readline()
+                # Ignore licenses and other non-headers prior to the header.
+                while "#" not in header_line:


I've seen a few cases where they do end up in the middle of the file after being processed by the Markdown builder for Sphinx. I don't think I need to worry about an escaped # being present after sphinx-build has processed the file, though this is worth considering if I want to move Markdown files directly from the source rather than using the processed versions.

dandhlee · 2021-08-23T16:28:01Z

docfx_yaml/extension.py

+                while "#" not in header_line:
+                    header_line = f.readline()
+                #extract the header name
+                name = header_line.split("#")[1][1:].strip()


Just the h1 header with a single # will do. I'll add a check for this.

dandhlee · 2021-08-23T16:36:27Z

docfx_yaml/extension.py

@@ -1235,7 +1304,8 @@ def convert_module_to_package_if_needed(obj):
                  'uid': uid
                })

-    if len(toc_yaml) == 0:
+    # Exit if there are no generated YAML pages or Markdown pages.
+    if len(toc_yaml) == 0 and len(app.env.markdown_pages) == 0:


Sounds good. Done.

dandhlee · 2021-08-23T17:04:16Z

Please take a look again!

tbpg · 2021-08-23T17:14:58Z

.kokoro/generate-docs.sh

@@ -74,7 +74,7 @@ for bucket_item in $(gsutil ls 'gs://docs-staging-v2/docfx-python*' | sort -u -t
  fi

  for tag in ${GITHUB_TAGS}; do
-    git checkout ${tag}
+    #git checkout ${tag}


Seems unrelated?

Ah yes, that was to test only against HEAD rather than the latest release, because of some library version issues. I'll revert it.

tbpg · 2021-08-23T17:15:41Z

docfx_yaml/extension.py

+    # Use this to ignore markdown files that are unnecessary.
+    files_to_ignore = [
+        "index.md",     # merge index.md and README.md and index.yaml later
+        "reference.md", # Reference docs overlap with Overview. Will try and incorporate this in later.


tbpg · 2021-08-23T17:16:36Z

tests/markdown_example_header.md

+-->
+
+
+#Test header for a simple markdown file.  


Please also test a space after the #.

dandhlee added 2 commits August 21, 2021 02:53

feat: add markdown page support

ce31dcd

feat: add sphinx-markdown-builder in the plugin

2404dd7

dandhlee requested review from parthea, busunkim96, tbpg and a team August 21, 2021 03:31

dandhlee requested a review from a team as a code owner August 21, 2021 03:31

google-cla bot added the cla: yes label Aug 21, 2021

dandhlee added 4 commits August 21, 2021 03:33

chore: add docuploader to setup.py

0150baf

test: skip running markdown builder for unittest

f080098

chore: fix Kokoro

47b82ff

test: run Kokoro test against head of the branch only

6b2766a

dandhlee mentioned this pull request Aug 21, 2021

fix: use the uid for toc entries #104

Merged

chore: update comments

4e16cc3

tbpg requested changes Aug 23, 2021

fix: address comments from PR

1c16345

dandhlee commented Aug 23, 2021

dandhlee added 2 commits August 23, 2021 17:00

fix: lint updates

5bc9cd2

test: add unit test coverage for new function added

09c587f

dandhlee requested a review from tbpg August 23, 2021 17:04

tbpg approved these changes Aug 23, 2021

dandhlee added 3 commits August 23, 2021 18:17

revert(test): revert testing only against HEAD

ab6bae7

chore: update comment with issues referenced inline

5f77b8d

test: add test case with space after the hashtag

4ec42e6

dandhlee merged commit 878f1c3 into main Aug 23, 2021

dandhlee deleted the recover_docs branch August 23, 2021 19:51

release-please bot mentioned this pull request Aug 23, 2021

chore: release 1.0.0 #109

Merged
