Document how sphinx's change detection works #11556
Comments
The reason why the HTML was rebuilt is because the HTML file did not exist anymore. The Concerning your documentation request, to my understanding, this is what happens:
Let's illustrate this with an example. Assume that we have the following RST files:

.. index.rst
.. toctree::
   :maxdepth: 1

   bar.rst

.. foo.rst
The Foo
-------
foo

.. bar.rst
The Bar
-------
and the foo:

.. include:: foo.rst
|
Do you see how this could be worked around? I tried to replace https://gitlab.com/denisbitouze/minimal-sphinx-minimal/-/blob/main/.gitlab-ci.yml#L4-L6 but with no success. Do you advise for the
somewhere on the Cloud?
Many thanks for this detailed explanation! Unfortunately I don't see how I could use in the context of the CI/CD. |
Actually, I think you can just add

cache:
  paths:
    - _build/doctrees
    - _build/html
One way to do it is to have a custom extension and an event handler for the |
Unfortunately, it doesn't work. After this addition and a single change of (only)
Well, I'm afraid this is far beyond my scope :$
Unfortunately, it doesn't work either. |
Ok, I've looked a bit more. The reason why the index is rebuilt is because there is a sphinx/sphinx/builders/__init__.py Lines 550 to 555 in 8a990db
When you are "included" as a file in a toctree, you are marked as a dependency of that file (I agree that this is not clearly stated). So you won't be able to escape the fact that the index is rebuilt every time. |
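The toctree relationship described above can be pictured with a small sketch. It is only an illustration under assumptions: the mapping `toctree_parents` and the function `expand_with_toctree_parents` are hypothetical names, not Sphinx's API. They stand in for whatever bookkeeping Sphinx keeps about which documents list a given document in a toctree.

```python
def expand_with_toctree_parents(changed, toctree_parents, known_docs):
    """Return the changed docs plus every known doc whose toctree lists one of them.

    `toctree_parents` maps a docname to the set of docnames whose toctree
    contains it (hypothetical structure, for illustration only).
    """
    to_write = set(changed)
    for docname in changed:
        for parent in toctree_parents.get(docname, set()):
            if parent in known_docs:
                to_write.add(parent)
    return to_write


# 'index' lists 'test' in its toctree, so changing 'test' also schedules 'index'.
parents = {"test": {"index"}}
docs = expand_with_toctree_parents({"test"}, parents, {"index", "test", "other"})
print(sorted(docs))  # ['index', 'test']
```

With such a rule, a change to test.rst alone is enough to put index.html back on the list of pages to write, which matches what the sandbox repository shows.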
The problem concerns other files as well: I (only) have added another |
I cannot reproduce this locally. I'd advise you to check it locally by the way. By looking at the traceback, it appears:
Here (up to a timestamp shift):
The "real" source time is actually set to the template timestamp because it's the largest. Since
|
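To make the timestamp comparison concrete, here is a minimal sketch of the kind of check discussed above. It is a simplification and not Sphinx's actual code: `is_outdated` and its parameters are hypothetical names, and the only assumption taken from the discussion is that the effective "source time" is the larger of the source file's and the templates' modification times, compared against the HTML file's.

```python
import os
import tempfile


def is_outdated(source_path, target_path, template_mtime=0.0):
    """Simplified staleness check: rebuild when the target is missing or older
    than the newest of the source file and the templates."""
    try:
        target_mtime = os.path.getmtime(target_path)
    except OSError:
        return True  # no HTML file yet, so it has to be written
    source_mtime = max(os.path.getmtime(source_path), template_mtime)
    return source_mtime > target_mtime


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "test.rst")
    out = os.path.join(tmp, "test.html")
    open(src, "w").close()
    open(out, "w").close()
    os.utime(src, (1000, 1000))  # source last modified at t=1000
    os.utime(out, (2000, 2000))  # HTML written at t=2000

    # Templates older than the HTML: nothing to rebuild.
    unchanged = is_outdated(src, out, template_mtime=1500)
    # A freshly installed theme is newer than the cached HTML.
    fresh_theme = is_outdated(src, out, template_mtime=3000)
    print(unchanged, fresh_theme)  # False True
```

Under this model, reinstalling Sphinx in a fresh container makes every page look stale even when no source file changed, because the template timestamp wins the `max`.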
Indeed, locally, everything works as expected. That's the point: I worked a lot on the migration of our FAQ from I'll have a look at the other part of your answer later. Many thanks! |
Looks very interesting!
I don't understand where I'm supposed to have a look at these files ending with
$ ls /home/bitouze/.local/lib/python3.8/site-packages/sphinx/themes/basic/*html
/home/bitouze/.local/lib/python3.8/site-packages/sphinx/themes/basic/defindex.html
/home/bitouze/.local/lib/python3.8/site-packages/sphinx/themes/basic/domainindex.html
/home/bitouze/.local/lib/python3.8/site-packages/sphinx/themes/basic/genindex.html
/home/bitouze/.local/lib/python3.8/site-packages/sphinx/themes/basic/genindex-single.html
/home/bitouze/.local/lib/python3.8/site-packages/sphinx/themes/basic/genindex-split.html
/home/bitouze/.local/lib/python3.8/site-packages/sphinx/themes/basic/globaltoc.html
/home/bitouze/.local/lib/python3.8/site-packages/sphinx/themes/basic/layout.html
/home/bitouze/.local/lib/python3.8/site-packages/sphinx/themes/basic/localtoc.html
/home/bitouze/.local/lib/python3.8/site-packages/sphinx/themes/basic/page.html
/home/bitouze/.local/lib/python3.8/site-packages/sphinx/themes/basic/relations.html
/home/bitouze/.local/lib/python3.8/site-packages/sphinx/themes/basic/searchbox.html
/home/bitouze/.local/lib/python3.8/site-packages/sphinx/themes/basic/searchfield.html
/home/bitouze/.local/lib/python3.8/site-packages/sphinx/themes/basic/search.html
/home/bitouze/.local/lib/python3.8/site-packages/sphinx/themes/basic/sourcelink.html
$ ls /home/bitouze/.local/lib/python3.8/site-packages/alabaster/*.html
/home/bitouze/.local/lib/python3.8/site-packages/alabaster/about.html
/home/bitouze/.local/lib/python3.8/site-packages/alabaster/donate.html
/home/bitouze/.local/lib/python3.8/site-packages/alabaster/layout.html
/home/bitouze/.local/lib/python3.8/site-packages/alabaster/navigation.html
/home/bitouze/.local/lib/python3.8/site-packages/alabaster/relations.html

Moreover, I must admit I don't know how to “cache the pip dependencies as well”. And, in my real use case, I use other extensions and theme:

extensions = [
    'sphinx_comments',
    'sphinx.ext.todo',
    'sphinx.ext.mathjax',
    'sphinx.ext.extlinks',
    'sphinx_design',
    'sphinxext.opengraph',
    'sphinx.ext.intersphinx',
    'myst_parser',
]
html_theme = 'furo' |
In the gitlab configuration, you have Also, technically speaking, a CI/CD job should actually run the whole flow and not in an incremental manner by default. As such, I don't think we need to change our workflow example (or it would be a low priority task). |
OK, I'll try this: many thanks again!
Why if the incremental way (as can be seen locally) does all and only what is needed? And, as I said, running the whole flow unnecessarily consumes time and resources.
Would be very, very nice for the tutorial to expose both the whole flow and the incremental manners. |
A suggestion has been made to me outside of here concerning the fact that the date which seems to justify the "out of date" is not that of the source file (the This involves creating a Python virtual environment in '_build', activating it, installing 'sphinx' and adding this virtual environment to the cache (like _build/html). In practical terms, this would mean replacing the lines:

- pip3 install -U pip
- pip3 install -U sphinx
- apt-get update
- apt-get install git-restore-mtime -y

with:

- apt-get update
- apt-get install git-restore-mtime -y
- python3 -m venv _build/venv
- source _build/venv/bin/activate
- pip install sphinx

and add the With these modifications, the trigger for rebuild is:
We admit that we don't really understand these three messages (with “did not in env”). According to

for docname in self.env.found_docs:
    if docname not in self.env.all_docs:
        logger.debug('[build target] did not in env: %r', docname)

But we confess we don't know what In any case, this message isn't explicit enough to give us a clue... |
Ok, I shouldn't have phrased it like that. What I meant is that you are running a "fresh environment" (Docker) every time, so it is correct to assume that everything should be generated as if it were the first time. In order to make it incremental, the environment itself must be configured differently (which is what we are trying to achieve now).
Thank you for making me remember this. Actually, it should be "did not exist" and I forgot to fix the typo. Now,
So if you find a document that was never read once, this means you need to generate the corresponding file accordingly.
I wouldn't cache the whole |
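As a rough illustration of the "never read once" case explained above: the names `found_docs` and `all_docs` come from the snippet quoted earlier in this thread, but here they are plain Python collections and `never_read` is a hypothetical helper, so this is a sketch of the idea rather than Sphinx's implementation.

```python
def never_read(found_docs, all_docs):
    """Documents present in the source tree but absent from the saved environment."""
    return sorted(doc for doc in found_docs if doc not in all_docs)


# The source tree contains three documents; the restored environment only
# knows about one of them, so the other two have to be generated.
found = {"index", "test", "other"}
known = {"index": 1690000000}
missing = never_read(found, known)
print(missing)  # ['other', 'test']
```

If the cached environment is not restored at all, every document ends up in this list, which is consistent with a full rebuild.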
Do you mean that what is explained here:
couldn't apply to the source files of a Sphinx website?
OK.
But does this apply to source documents or generated documents?
Is
AFAICS, the
I read this section and tried to apply what it advises. But, same punishment: |
At least it cannot be applied by default. While the sources are properly cached, you are installing Sphinx every time, thereby refreshing the timestamp of the Sphinx HTML themes (the template that was detected as newer).
Yes my bad. I just wanted to exclude everything from being read in case of some unexpected behaviour.
Probably ? I would just say "not the same directory as the _build". Execute
The problem is this:
You'll see that you |
OK.
Sorry, I don't understand what you mean here.
This seems to be the case:
$ pip cache dir
/builds/denisbitouze/minimal-sphinx-minimal/.cache/pip
You're certainly right but I don't see how to do so.
OK but all the content of this directory is of the following cryptic form:
$ pip cache list setuptools
No locally built wheels cached.
I also tried to install from local packages, but with no success... |
It's ok. I was wrong before so just forget what I said.
Oh ok my bad. I thought we would have a more readable structure but it appears not. Not entirely sure, but the environment variable |
I tried with
That's what I applied but with no success. Once again, I tried to install from local packages. This time, I was able to install from the
Is “install from local packages” the right way of doing so? |
Well, about the Wouldn't it be possible to instead rely, not only on a |
Another possibility was suggested to me: to rely on a script that would compile only the minimal set of documents as needed. Such a script would be like the following (that I couldn't test because I don't know how to deal with the

#!/usr/bin/env bash

# Script to compile only the minimal set of documents as needed. Based on the
# assumption that the artifacts of previous compilations are available, so that
# it suffices to actually build these incrementally. Rebuild everything if the
# templates change.

# Abort execution on error.
set -e

# As we need a private access token with more privileges than CI_JOB_TOKEN, we
# need to get a valid token from the environment.
if [ -z "$GL_API_ACCESS_TOKEN" ]; then
    echo "Invalid GitLab API access token." >&2
    exit 1
fi

# Get the Git SHA1 hash of the latest pipeline on master that succeeded (i.e.
# finished before the one we are running in). This is a poor man's JSON parser
# which extracts only the `sha` field of the first object of the JSON array
# which is the wanted one due to the sorting option. A SHA1 hash is always
# 40 characters in length which is sanity-checked below.
gitsha="$(curl --header "PRIVATE-TOKEN: $GL_API_ACCESS_TOKEN" "https://gitlab.com/api/v4/projects/$CI_PROJECT_ID/pipelines?ref=$CI_DEFAULT_BRANCH&sort=desc&status=success" | grep -o -E -m1 '"sha":"([^"]*)"' | head -1 | cut -c 8-47)"
if [[ "${#gitsha}" != 40 ]]; then
    echo "SHA '$gitsha' of commit hash is not a valid SHA1 sum" >&2
    exit 1
fi

# Determine all files which have been changed from `gitsha` (exclusive) to
# `$CI_COMMIT_SHA` (inclusive). We only want the name relative to the
# repository's root.
changed_files=$(git diff-tree --no-commit-id --name-only -r "$gitsha".."$CI_COMMIT_SHA")

# Check whether to compile all files because one of the main dependencies
# changed. Otherwise, only the needed files will be compiled.
compile_all=false
for file in $changed_files; do
    if [[ $file == "conf.py" ]]; then
        compile_all=true
        break
    fi
done

if [ "$compile_all" = true ]; then
    make html
else
    for file in $changed_files; do
        # CLEAN_UP_CHANGED_FILES_SO_THAT_ONLY_VALID_SPHINX_INPUT_FILES_ARE_IN_THE_ARRAY
        if [[ ${file##*.} == "rst" ]]; then
            # RUN_SPHINX_BUILD
            sphinx-build -d _build/doctrees . _build/html "$(basename "$file")"
        fi
    done
fi
wait

Even if I could get this script to work, I wonder whether running |
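The decision made at the end of the script above (full rebuild when `conf.py` changed, otherwise only the changed `.rst` files) can be sketched as a small pure function, which is easier to test than the shell loop. This is only an illustration: `plan_build` and `full_rebuild_triggers` are hypothetical names, and unlike the script, which passes only the basename of each file, the sketch keeps each path relative to the repository root.

```python
from pathlib import PurePosixPath


def plan_build(changed_files, full_rebuild_triggers=("conf.py",)):
    """Return ("full", []) or ("incremental", [changed .rst files])."""
    if any(name in full_rebuild_triggers for name in changed_files):
        return "full", []
    rst_files = [name for name in changed_files
                 if PurePosixPath(name).suffix == ".rst"]
    return "incremental", rst_files


# Only a source file and an image changed: build just the .rst file.
incremental_plan = plan_build(["test.rst", "images/logo.png"])
# The configuration changed: everything must be rebuilt.
full_plan = plan_build(["conf.py", "test.rst"])
print(incremental_plan)  # ('incremental', ['test.rst'])
print(full_plan)         # ('full', [])
```

The list of changed files would still have to come from something like the `git diff-tree` call in the script. The sketch also does nothing about pages that depend on a changed file through a toctree.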
Hooray! That, with other advice given here, does the trick! With |
Ah yes, I forgot that we could actually use a docker image for Sphinx itself (I don't use docker much). So should I understand that the original configuration would be ok, but you'd only change the docker image? |
AFAICS, it is necessary to additionally rely on

image: mgasphinx/sphinx-html # Could be another Sphinx Docker image but this one provides a very
                             # recent Sphinx (currently v. 7.1.2) and nice additional themes
pages:
  cache:
    paths:
      - _build/html
  stage: deploy
  script:
    - apt-get update
    - apt-get install git-restore-mtime -y
    # The following command restores the modified timestamps from commits
    - /usr/lib/git-core/git-restore-mtime
    - sphinx-build . _build/html -vv
  after_script:
    - cp -rf _build/html public
  artifacts:
    paths:
      - public
  only:
    - main |
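For readers unfamiliar with what restoring modification times achieves, here is a rough sketch of the idea in Python. It does not reproduce how `git-restore-mtime` works internally: `restore_mtimes` is a hypothetical helper, and the mapping from path to commit timestamp is passed in by hand, whereas a real tool would derive it from the Git history.

```python
import os
import tempfile


def restore_mtimes(commit_times):
    """Set each file's access and modification time to its last commit time.

    `commit_times` maps a path to a Unix timestamp (hypothetical input).
    """
    for path, timestamp in commit_times.items():
        os.utime(path, (timestamp, timestamp))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.rst")
    open(path, "w").close()  # a fresh clone stamps files with the current time
    restore_mtimes({path: 1_690_000_000})
    restored = os.path.getmtime(path)
    print(restored == 1_690_000_000)  # True
```

Once unchanged sources carry their commit time again, they are older than the cached HTML and no longer look modified after each fresh clone.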
Thank you! I'll update the doc in the following days with perhaps another docker image. |
You're welcome! Thank you very much for your very detailed answers and your invaluable help!
What's wrong with |
Actually we have an "official" docker image but it's not updated very much (it's only Sphinx 5.2 currently). I'll create an issue for that (so that we could have a nightly build for every release, not sure it's easy to do actually). Alternatively, we could add mgasphinx repository to sphinxcontrib if they are willing to. @AA-Turner Any thoughts on that ? or do you want to update our official docker image every release? |
Continuous integration with Sphinx-doc, as described here, works well.

Unfortunately, on gitlab.com, it is Docker based and clones each repository fresh when it starts running continuous integration. So even if a single source file is modified, all the corresponding HTML pages of all the .rst source files are rebuilt (although the cache claims to be restored and, indeed, so is the doctree directory). This isn't a problem if there are only a few source files, but it becomes unusable if there are a lot (more than 1,200 in my real-life use case: the rebuild takes more than 15 minutes and a lot of resources are consumed unnecessarily).

This problem may be due to the fact that Git, unlike other version control systems, does not preserve the original timestamp of committed files. So relying on git-restore-mtime should be a solution. But this is not the case, as you can see with the following sandbox repository:

https://gitlab.com/denisbitouze/minimal-sphinx-minimal/

where the commit changes (only) the source test.rst file but triggers also the rebuild of the index.html file corresponding to the index.rst source file that hasn't been changed.

I've had a look at the code but can't work out how sphinx change detection works. Would it be possible to document this? It would be very useful, especially nowadays when CI/CD is becoming more and more popular and useful.