ignore_missing config option #228

Closed
moskalenko opened this issue Aug 20, 2024 · 5 comments
Labels
question Further information is requested

Comments

@moskalenko

Hi,

I'm working on a documentation site that autogenerates a large number of pages, some of which also need to include additional information. Many pages have includes of this type, and they cannot be modified manually because they are generated automatically and overwritten by CI. My approach is to write include links into the autogenerated md files, but only have the included files present on the filesystem in cases where additional information is available. The plugin seems perfect for this as long as it can be made to stop failing the run on missing includes. I've patched my local copy of the plugin, but wonder if it would be possible to include this functionality upstream. Perhaps something like an 'ignore_missing' config option could be added to govern the behavior of the event.py code, allowing it to skip processing of documents with missing includes instead of failing the build.

Thanks,

Alex

@mondeja
Owner

mondeja commented Aug 21, 2024

You can try the ?(...) extended match pattern. Consider the following example:

# index.md
{% include-markdown "?(empty.md)?(foo.md)" %}
{% include-markdown "?(empty.md)?(bar.md)" %}
# foo.md
Foo
# bar.md
Bar

Here empty.md is an empty file. The contents of foo.md and bar.md will only be included if those files exist. Does this solve your use case?
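
For readers unfamiliar with extended glob syntax, here is a rough, stdlib-only sketch of what the `?(...)` groups mean. It is an approximation under stated assumptions, not the plugin's implementation: the plugin relies on wcmatch, whereas this sketch translates each `?(x)` group into an optional regex group and tests plain file names against it. The helper names `optional_groups_to_regex` and `matching_files` are hypothetical.

```python
import re

# Matches one optional group such as "?(foo.md)". Alternation ("|") and nested
# groups are deliberately not handled in this sketch.
_OPTIONAL_GROUP = re.compile(r"\?\(([^()|]*)\)")


def optional_groups_to_regex(pattern):
    """Translate "?(x)" groups into "(?:x)?"; everything else stays literal."""
    parts = []
    pos = 0
    for match in _OPTIONAL_GROUP.finditer(pattern):
        parts.append(re.escape(pattern[pos:match.start()]))
        parts.append("(?:" + re.escape(match.group(1)) + ")?")
        pos = match.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts))


def matching_files(pattern, filenames):
    """Return the names from `filenames` that the whole pattern matches."""
    regex = optional_groups_to_regex(pattern)
    return sorted(name for name in filenames if regex.fullmatch(name))


pattern = "?(empty.md)?(foo.md)"
print(matching_files(pattern, ["empty.md", "foo.md", "index.md"]))  # ['empty.md', 'foo.md']
print(matching_files(pattern, ["empty.md", "index.md"]))            # ['empty.md']
print(matching_files(pattern, ["index.md"]))                        # []
```

The last case, an empty result, corresponds to the situation in which the plugin reports that no files were found; keeping empty.md around guarantees at least one match.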

@mondeja added the "question" (Further information is requested) label on Aug 21, 2024
@moskalenko
Author

moskalenko commented Aug 27, 2024

Hi, @mondeja
Thank you for responding.

Unfortunately, the above extended pattern match code doesn't appear to be working for me with the current mkdocs and mkdocs_include_markdown packages.

$ ls docs
empty.md index.md foo.md bar.md

index.md is the main markdown file. It has the following includes:

{% include-markdown "?(empty.md)?(foo.md)" %}
{% include-markdown "?(empty.md)?(bar.md)" %}

If an empty file empty.md exists I always get the empty document error when running mkdocs build:

ERROR - Error reading page 'empty.md': Document is empty

If there is no empty.md and foo.md does not exist I get

ERROR - No files found including '?(empty.md)?(foo.md)' at index.md:19

If there is no empty.md and foo.md exists and has content I still get the error

ERROR - No files found including '?(empty.md)?(foo.md)' at index.md:19

@mondeja
Owner

mondeja commented Aug 27, 2024

I can't reproduce it, and I haven't found the "Document is empty" error message in MkDocs. Would you mind sharing a minimal reproducible example?

@moskalenko
Author

moskalenko commented Aug 29, 2024

Hi. Thanks again for looking into this issue. Here's a minimal test case.

The pattern matching works when all files are in the top-level directory, which is why the minimal example you posted runs fine. In a more complex site build that has sub-directories, which is the case for our build, there appears to be an issue with paths.

$ tree
.
├── docs
│   ├── index.md
│   └── test
│       ├── bar.md
│       ├── empty.md
│       ├── foo.md
│       └── index.md
├── javascripts
│   └── mermaid.min.js
├── Makefile
├── mkdocs.yml
└── site

4 directories, 8 files

Since the files are not in the docs_dir, I expect an error here:

ERROR - No files found including '?(empty.md)?(foo.md)' at test/index.md:3

An include path relative to the location of the including markdown document, "?(./empty.md)?(./foo.md)", doesn't work either and results in:

ERROR - Error reading page 'test/index.md':
ERROR - No files found including '?(./empty.md)?(./foo.md)' at test/index.md:3

Same for the path relative to docs_dir

ERROR - No files found including '?(test/empty.md)?(test/foo.md)' at test/index.md:3

Absolute site path

ERROR - No files found including '?(/test/empty.md)?(/test/foo.md)' at test/index.md:3

or an absolute filesystem path

ERROR - No files found including '?(/home/moskalenko/projects/rc/docs/mre/docs/test/empty.md)?(/home/moskalenko/projects/rc/docs/mre/docs/test/foo.md)' at test/index.md:3

Unfortunately, since the code that produces the error treats the pattern as the file path, the message doesn't show what the path was resolved to, because the pattern itself cannot be resolved to a path relative to docs_dir.

In found_include_markdown_tag in event.py:

        if not file_paths_to_include:
            location = process.file_lineno_message(
                page_src_path, docs_dir, directive_lineno,
            )
            raise PluginError(
                f"No files found including '{raw_filename}' at {location}",
            )

The resolution from a pattern to a filesystem path should have happened in directive.resolve_file_paths_to_include.

It looks like the reason why files in docs_dir can be included, but files in sub-directories cannot, comes down to the following code in directive.resolve_file_paths_to_include:

    return process.filter_paths(
        (
            os.path.normpath(os.path.join(docs_dir, fp))
            for fp in glob.iglob(
                include_string,
                flags=GLOB_FLAGS,
                root_dir=docs_dir,
            )
        ),

Since wcmatch.glob.iglob only globs files in a single directory and docs_dir is always passed as the root_dir, anything beyond a basic single-directory doc build will not work. A possible solution, I think, would be to pass the includer document's directory instead of docs_dir to iglob and os.path.join. Please take a look at the following patch for your consideration. It works both for flat single-directory sites and for multi-directory sites.

$ git diff
diff --git a/src/mkdocs_include_markdown_plugin/directive.py b/src/mkdocs_include_markdown_plugin/directive.py
index 3995caa..02bef59 100644
--- a/src/mkdocs_include_markdown_plugin/directive.py
+++ b/src/mkdocs_include_markdown_plugin/directive.py
@@ -259,15 +259,17 @@ def resolve_file_paths_to_include(
        ), False

    # relative to docs_dir
-    return process.filter_paths(
-        (
-            os.path.normpath(os.path.join(docs_dir, fp))
+    includer_page_src_dir = os.path.dirname(os.path.abspath(includer_page_src_path))
+    paths_to_filter =   (
+            os.path.normpath(os.path.join(includer_page_src_dir, fp))
            for fp in glob.iglob(
                include_string,
                flags=GLOB_FLAGS,
-                root_dir=docs_dir,
+                root_dir=includer_page_src_dir,
            )
-        ),
+        )
+    return process.filter_paths(
+        paths_to_filter,
        ignore_paths,
    ), False

If it were possible to improve upon the above by making "?(empty.md)" optional to keep the directory tree cleaner it would be even better, which kinda brings me back full circle to my original suggestion that the

            raise PluginError(
                f"No files found including '{raw_filename}' at {location}",
            )

is wrapped in a check with a config variable 'ignore_missing' or something like that to provide an option for cases that need it, although that could potentially mask other legitimate missing includes, I suppose.
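
A minimal sketch of what such a guard might look like, purely as an illustration of the suggestion: `ignore_missing` is the hypothetical option discussed here and not an existing plugin setting, `check_included_files` is an invented helper, and the local `PluginError` class is a stand-in so that the example runs without MkDocs installed.

```python
class PluginError(Exception):
    """Stand-in for the PluginError raised by the plugin (assumption)."""


def check_included_files(file_paths_to_include, raw_filename, location,
                         ignore_missing=False):
    """Return True if the include should be processed, False if skipped.

    `ignore_missing` is hypothetical: when set, an include that matches no
    files is skipped instead of failing the build.
    """
    if file_paths_to_include:
        return True
    if ignore_missing:
        return False
    raise PluginError(
        f"No files found including '{raw_filename}' at {location}",
    )


# With the hypothetical option enabled, a missing include is skipped.
print(check_included_files([], "?(foo.md)", "index.md:19", ignore_missing=True))  # False
```

Keeping the default at False would preserve the current failing behavior, so legitimate missing includes would only be masked for sites that opt in.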

I addressed the empty.md approach by abstracting an autogenerated chunk of a page into an include file that will always be last in the order and will always be there, so the other includes can be masked by it if missing. This avoids the need for an empty.md file.

It looks like this:
{% include-markdown "?(one.md)?(two.md)?(three.md)?(four.md)?(five.md)?(six.md)?(always_present_and_last.md)" %}

@mondeja
Owner

mondeja commented Aug 30, 2024

Try something like {% include-markdown "./test/?(empty)?(foo).md" %}; it's working for me. If ./test/foo.md exists, it will be included; if not, it will be ignored. ./test/empty.md will always be included, but since it is empty it will not add anything.
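
One way to read this pattern is to spell out every file name it can stand for. The sketch below does that with the standard library only; it is an illustration under assumptions rather than how wcmatch works internally (a real glob walks the directory instead of enumerating names), and the helpers `candidate_names` and `existing_candidates` are hypothetical.

```python
import itertools
import os
import re
import tempfile

# One optional group such as "?(foo)"; alternation and nesting are not handled.
_OPTIONAL_GROUP = re.compile(r"\?\(([^()|]*)\)")


def candidate_names(pattern):
    """Expand every "?(x)" group into its two readings: absent or present."""
    pieces = _OPTIONAL_GROUP.split(pattern)
    # re.split with one capture group alternates literal text and group bodies.
    literals, groups = pieces[0::2], pieces[1::2]
    results = []
    for choice in itertools.product((False, True), repeat=len(groups)):
        name = literals[0]
        for keep, group, literal in zip(choice, groups, literals[1:]):
            name += (group if keep else "") + literal
        results.append(name)
    return results


def existing_candidates(pattern, root_dir):
    """Return the expanded names that exist as files below `root_dir`."""
    return sorted(
        name for name in candidate_names(pattern)
        if os.path.isfile(os.path.join(root_dir, name))
    )


pattern = "./test/?(empty)?(foo).md"
print(candidate_names(pattern))

with tempfile.TemporaryDirectory() as docs_dir:
    os.makedirs(os.path.join(docs_dir, "test"))
    open(os.path.join(docs_dir, "test", "empty.md"), "w").close()
    only_empty = existing_candidates(pattern, docs_dir)
    with open(os.path.join(docs_dir, "test", "foo.md"), "w") as handle:
        handle.write("Foo\n")
    both = existing_candidates(pattern, docs_dir)

print(only_empty)  # ['./test/empty.md']
print(both)        # ['./test/empty.md', './test/foo.md']
```

Because the directory prefix sits outside the optional groups, every candidate keeps the ./test/ part, and only the file name varies.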
