DOC/CI: Add Github Action cron job to check external links in documentation #45409

Open
mroeschke opened this issue Jan 17, 2022 · 19 comments
Labels
CI Continuous Integration Docs

Comments

@mroeschke
Member

  1. Modify doc/make.py to run sphinx-build -b linkcheck. This may require modifying doc/source/conf.py to ignore some expected link errors.
  2. Create a GitHub Action to run the Sphinx linkcheck monthly.
  3. If errors are found, have the GitHub Action create a new issue (and delete the old one) reporting all the errors, labeled Docs and Good First Issue.
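Step 3 could build on GitHub's REST endpoint for creating issues (POST /repos/{owner}/{repo}/issues, which accepts a title, a body and a list of labels). Below is a minimal, hypothetical sketch: the function name, the issue title and the token handling are assumptions, and it only builds the request, so nothing is sent. As far as I know the REST API can close but not delete issues, so closing the previous report (a PATCH on that issue with "state": "closed") is the nearest equivalent of "delete the old one".

```python
from __future__ import annotations

import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_issue_request(
    repo: str, token: str, broken: list[tuple[str, int, str]]
) -> urllib.request.Request:
    """Build (but do not send) a request that opens an issue listing broken links.

    Hypothetical helper: `broken` holds (document, line number, URI) triples.
    """
    items = "\n".join(f"- {doc}:{line}: {uri}" for doc, line, uri in broken)
    payload = {
        "title": "DOC: Fix broken links found by linkcheck",  # hypothetical title
        "body": f"The scheduled linkcheck run reported these broken links:\n\n{items}",
        "labels": ["Docs", "Good First Issue"],  # label names as given in step 3
    }
    return urllib.request.Request(
        f"{API_ROOT}/repos/{repo}/issues",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen would actually create the issue. Inside a workflow, the token would typically come from the job's GITHUB_TOKEN with permission to write issues.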
@bandersen23

take

@ggold7046
Contributor

Hi, is anyone working on this issue?

@bandersen23 bandersen23 removed their assignment Jul 11, 2023
@bandersen23

@ggold7046 all yours!

@ggold7046
Contributor

I need a little guidance here. How exactly do I do this?

@ggold7046
Contributor

Hi @pmeier, could you help me with this?

@pmeier
Contributor

pmeier commented Jul 17, 2023

@ggold7046 This is (correctly) not marked as a beginner issue. If you still want to have a go at it, I would only work on 1. for now. First, run the built-in Sphinx link checker with sphinx-build -b linkcheck to get a feel for what it does. However, pandas does not invoke Sphinx directly but goes through the doc/make.py script. In there, you need to add a linkcheck method to

class DocBuilder:

Inside this method, you can likely just call

def _sphinx_build(self, kind: str):

with linkcheck. If the implementation is correct, python doc/make.py linkcheck should do the same as the sphinx command above.
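A minimal sketch of what such a method could look like. This is a trimmed-down, hypothetical DocBuilder, not pandas' actual class: SOURCE_PATH, BUILD_PATH, the constructor arguments and the _sphinx_cmd helper are assumptions. Building the command is split out from running it so the argument list can be inspected without Sphinx installed. Note that sphinx-build expects options first, then SOURCEDIR and OUTPUTDIR.

```python
from __future__ import annotations

import os
import subprocess

# Assumed locations; the real doc/make.py defines its own equivalents.
SOURCE_PATH = os.path.join("doc", "source")
BUILD_PATH = os.path.join("doc", "build")


class DocBuilder:
    """Hypothetical stand-in for the DocBuilder class in doc/make.py."""

    def __init__(self, num_jobs: str = "auto", verbosity: int = 0) -> None:
        self.num_jobs = num_jobs
        self.verbosity = verbosity

    def _sphinx_cmd(self, kind: str) -> list[str]:
        """Return the sphinx-build argument list for the given builder."""
        if kind not in ("html", "latex", "linkcheck"):
            raise ValueError(f"kind must be html, latex or linkcheck, not {kind}")
        cmd = ["sphinx-build", "-b", kind]
        if self.num_jobs:
            cmd += ["-j", self.num_jobs]  # parallel jobs, e.g. "auto"
        if self.verbosity:
            cmd.append(f"-{'v' * self.verbosity}")
        # Options first, then SOURCEDIR and OUTPUTDIR (one output dir per builder).
        cmd += [
            "-d", os.path.join(BUILD_PATH, "doctrees"),
            SOURCE_PATH,
            os.path.join(BUILD_PATH, kind),
        ]
        return cmd

    def _sphinx_build(self, kind: str) -> int:
        # Return sphinx-build's exit status so callers can propagate it.
        return subprocess.call(self._sphinx_cmd(kind))

    def linkcheck(self) -> int:
        """Check links; Sphinx writes its report into BUILD_PATH/linkcheck."""
        return self._sphinx_build("linkcheck")
```

With this layout the linkcheck report ends up next to the other build outputs instead of directly in the build directory.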

For 2. and 3., knowledge of GitHub Actions is needed.

@ggold7046
Contributor

ggold7046 commented Jul 17, 2023

@pmeier, I ran sphinx-build -b linkcheck source build output.html and got the output below.

gitpod@pandasdev-pandas-4uhxurlgnuc:/workspace/pandas/doc$ sphinx-build -b linkcheck source build output.html

Running Sphinx v6.2.1
Pandoc not installed. Skipping notebooks.
+ /usr/local/bin/ninja
[1/1] Generating write_version_file with a custom command
loading pickled environment... done
[autosummary] generating autosummary for: development/community.rst, development/contributing.rst, development/contributing_codebase.rst, development/contributing_docstring.rst, development/contributing_documentation.rst, development/contributing_environment.rst, development/contributing_gitpod.rst, development/copy_on_write.rst, development/debugging_extensions.rst, development/developer.rst, ..., whatsnew/v1.5.0.rst, whatsnew/v1.5.1.rst, whatsnew/v1.5.2.rst, whatsnew/v1.5.3.rst, whatsnew/v2.0.0.rst, whatsnew/v2.0.1.rst, whatsnew/v2.0.2.rst, whatsnew/v2.0.3.rst, whatsnew/v2.0.4.rst, whatsnew/v2.1.0.rst
WARNING: file '/workspace/pandas/doc/output.html' given on command line does not exist,
building [mo]: targets for 0 po files that are specified
writing output...
building [linkcheck]: 0 source files given on command line
updating environment: 0 added, 3 changed, 0 removed
reading sources... [100%] user_guide/index
/workspace/pandas/doc/source/user_guide/index.rst:60: WARNING: toctree contains reference to excluded document 'user_guide/style'
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [100%] index

( index: line 18) ok https://pypi.org/project/pandas
( index: line 15) ok https://pandas.pydata.org/
( index: line 25) ok https://www.python.org/
( index: line 13) broken pandas.zip -
( index: line 18) ok https://groups.google.com/g/pydata
( index: line 18) ok https://github.com/pandas-dev/pandas
( index: line 18) ok https://github.com/pandas-dev/pandas/issues
( index: line 18) ok https://stackoverflow.com/questions/tagged/pandas
build finished with problems, 2 warnings.
gitpod@pandasdev-pandas-4uhxurlgnuc:/workspace/pandas/doc$

I couldn't find the output.html file that I specified, but there is an output.txt file that says index.rst:13: [broken] pandas.zip:, which can also be seen in the terminal output above.

Could you give some idea about the linkcheck method? Do I need to use beautifulsoup4?

@pmeier
Contributor

pmeier commented Jul 20, 2023

Could you give some idea about the linkcheck method?

It seems you can just call subprocess.call(cmd), where cmd is a list of strings making up the command you want to run. You can have a look at

def _sphinx_build(self, kind: str):

to see how it is done for the actual doc builds.
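The subprocess.call pattern, plus one way a command name such as linkcheck could be routed to a DocBuilder method, can be sketched as follows. This is a hypothetical, stripped-down example: run, main and the hard-coded source/build paths are assumptions, and exposing every public DocBuilder method as a command is how I understand doc/make.py to work, not something confirmed in this thread.

```python
from __future__ import annotations

import argparse
import subprocess
import sys


def run(cmd: list[str]) -> int:
    """Run a command given as a list of strings and return its exit status."""
    # No shell is involved, so each list element is passed as one argument.
    return subprocess.call(cmd)


class DocBuilder:
    """Hypothetical minimal builder: each public method is one CLI command."""

    def html(self) -> int:
        return run(["sphinx-build", "-b", "html", "source", "build/html"])

    def linkcheck(self) -> int:
        return run(["sphinx-build", "-b", "linkcheck", "source", "build/linkcheck"])


def main(argv: list[str] | None = None) -> int:
    # Public (non-underscore) method names become the accepted commands.
    commands = [name for name in dir(DocBuilder) if not name.startswith("_")]
    parser = argparse.ArgumentParser(description="Build the docs (sketch).")
    parser.add_argument("command", nargs="?", default="html", choices=commands)
    args = parser.parse_args(argv)
    return getattr(DocBuilder(), args.command)()
```

A real script would end with sys.exit(main()), so that a non-zero exit status from sphinx-build (which, as far as I know, linkcheck returns when it finds broken links) reaches the CI job.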

Do I need to use beautifulsoup4?

I don't see a reason why.
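For context on the run above: sphinx-build takes SOURCEDIR OUTPUTDIR [FILENAMES...], so the trailing output.html was treated as a source file to build (hence the "does not exist" warning), and the linkcheck report was written to build/output.txt. That report is plain text (as far as I know, recent Sphinx versions also write an output.json with one JSON record per line), so the standard library is enough to read it. A minimal sketch, assuming the line format seen above (FILE:LINE: [broken] URI: INFO); the names BrokenLink, parse_broken and read_report are hypothetical:

```python
from __future__ import annotations

import re
from pathlib import Path
from typing import Iterable, Iterator, NamedTuple

# Assumed format, based on the report above: "index.rst:13: [broken] pandas.zip: <info>"
BROKEN_RE = re.compile(
    r"^(?P<file>.+?):(?P<line>\d+): \[broken\] (?P<uri>\S+?):(?: (?P<info>.*))?$"
)


class BrokenLink(NamedTuple):
    file: str
    line: int
    uri: str
    info: str


def parse_broken(lines: Iterable[str]) -> Iterator[BrokenLink]:
    """Yield one BrokenLink per '[broken]' entry; other statuses are skipped."""
    for raw in lines:
        match = BROKEN_RE.match(raw.rstrip("\n"))
        if match:
            yield BrokenLink(
                match["file"], int(match["line"]), match["uri"], match["info"] or ""
            )


def read_report(outdir: str) -> list[BrokenLink]:
    """Read OUTPUTDIR/output.txt as written by sphinx-build -b linkcheck."""
    with open(Path(outdir) / "output.txt", encoding="utf-8") as fh:
        return list(parse_broken(fh))
```

The non-greedy URI pattern stops at the first colon followed by a space or the end of the line, so URLs containing "://" or a port number are kept whole.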

@ggold7046
Contributor

I'm having the following trouble. Though I have given gitpod the read/write permission to the public repos.

gitpod@pandasdev-pandas-4uhxurlgnuc:/workspace/pandas$ git push origin modified_linkchek
Enumerating objects: 10088, done.
Counting objects: 100% (10088/10088), done.
Delta compression using up to 16 threads
Compressing objects: 100% (3356/3356), done.
Writing objects: 100% (9183/9183), 2.36 MiB | 6.54 MiB/s, done.
Total 9183 (delta 7975), reused 6977 (delta 5798), pack-reused 0
remote: Resolving deltas: 100% (7975/7975), completed with 841 local objects.
To https://github.com/ggold7046/pandas.git
! [remote rejected] modified_linkchek -> modified_linkchek (refusing to allow an OAuth App to create or update workflow .github/workflows/cache-cleanup.yml without workflow scope)
error: failed to push some refs to 'https://github.com/ggold7046/pandas.git'

I even tried removing the changes:

git rm .github/workflows/cache-cleanup-weekly.yml
git commit -m "Remove cache-cleanup-weekly workflow"
git push origin modified_linkchek

But the problem persists.

@pmeier
Contributor

pmeier commented Jul 25, 2023

Though I have given gitpod the read/write permission to the public repos.

Not sure what you mean here. You can't decide the permissions for any repos other than your own. You most certainly don't have write permissions to this repository.

I'm a little confused why it complains about the .github/workflows/cache-cleanup.yml workflow. Did you touch that file? You shouldn't have to touch it.

Other than that, I don't have any experience with GitPod. I can't help you here. Your best bet is to open an issue on the tracker detailing what you did and what is not working.

@ggold7046
Contributor

@pmeier, could you please look into this code and the pull request https://github.com/pandas-dev/pandas/pull/54265?

@ggold7046
Contributor

ggold7046 commented Aug 5, 2023

I have made some changes to the code. @pmeier, could you please have a look at the pull request above?

@ggold7046
Contributor

@pmeier, could you tell me how to approach the GitHub Action part?
And how do I fix the broken links, as mroeschke mentioned in #54265 (comment)?

@ggold7046
Contributor

Hi @mroeschke, could you tell me if pandas is participating in Hacktoberfest 2023?

@mroeschke
Member Author

Hi @mroeschke, could you tell me if pandas is participating in Hacktoberfest 2023?

Probably not, no

@ggold7046
Contributor

Hi @mroeschke, could you please review PR #55246 and let me know if any changes are needed? As long as this PR is open, I can't move on to the next part.

mroeschke added a commit that referenced this issue Nov 7, 2023
* Create broken-linkcheck.yml

Created a Github Action to run the Sphinx linkcheck monthly.
#45409

* Update broken-linkcheck.yml

* Update broken-linkcheck.yml

* Update broken-linkcheck.yml

* Update broken-linkcheck.yml

* Update broken-linkcheck.yml

* Update .github/workflows/broken-linkcheck.yml

Co-authored-by: Philip Meier <github.pmeier@posteo.de>

* Update .github/workflows/broken-linkcheck.yml

Co-authored-by: Philip Meier <github.pmeier@posteo.de>

* Update .github/workflows/broken-linkcheck.yml

Co-authored-by: Philip Meier <github.pmeier@posteo.de>

* Update .github/workflows/broken-linkcheck.yml

Co-authored-by: Philip Meier <github.pmeier@posteo.de>

* Update broken-linkcheck.yml

* Update broken-linkcheck.yml

* Update broken-linkcheck.yml

* Update .github/workflows/broken-linkcheck.yml

Co-authored-by: Philip Meier <github.pmeier@posteo.de>

* Update broken-linkcheck.yml

* Update .github/workflows/broken-linkcheck.yml

Co-authored-by: Philip Meier <github.pmeier@posteo.de>

* Update .github/workflows/broken-linkcheck.yml

Co-authored-by: Philip Meier <github.pmeier@posteo.de>

* Update broken-linkcheck.yml

* Update broken-linkcheck.yml

* Update .github/workflows/broken-linkcheck.yml

Co-authored-by: Philip Meier <github.pmeier@posteo.de>

* Update broken-linkcheck.yml

* Update .github/workflows/broken-linkcheck.yml

Co-authored-by: Philip Meier <github.pmeier@posteo.de>

* Update conf.py

Added Ignore list for broken link checks

* Update conf.py

#55246
This is an ignore list for broken links found in CI run checks for broken-linkcheck.yml

* Update doc/source/conf.py

Co-authored-by: Philip Meier <github.pmeier@posteo.de>

* Update conf.py

* Update conf.py

* Update conf.py

* Update conf.py

* Update conf.py

* Update conf.py

* Update broken-linkcheck.yml

* Update doc/source/conf.py

Co-authored-by: Philip Meier <github.pmeier@posteo.de>

* Update conf.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update .github/workflows/broken-linkcheck.yml

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

* Update .github/workflows/broken-linkcheck.yml

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

* Update .github/workflows/broken-linkcheck.yml

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

* Update conf.py

* Update .github/workflows/broken-linkcheck.yml

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

* Update conf.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update conf.py

* Update conf.py

* Update conf.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update .github/workflows/broken-linkcheck.yml

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

* Update broken-linkcheck.yml

* Update conf.py

---------

Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
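The commit above adds an ignore list to doc/source/conf.py. Sphinx's linkcheck builder reads options such as linkcheck_ignore (a list of regular expressions for URIs to skip), linkcheck_timeout and linkcheck_retries from conf.py. A sketch, assuming illustrative patterns and values that are not the ones merged in #55246; the would_ignore helper is hypothetical and meant only for trying patterns out locally, outside conf.py. It uses re.match, which anchors at the start of the URI, as the Sphinx versions I am familiar with do.

```python
import re

# conf.py options read by the linkcheck builder (illustrative values only).
linkcheck_ignore = [
    r"^pandas\.zip$",  # the relative download link reported as broken above
    r"^https?://localhost(:\d+)?/",  # hypothetical: local-only example URLs
]
linkcheck_timeout = 30  # seconds per request (hypothetical value)
linkcheck_retries = 2  # hypothetical value


def would_ignore(uri: str) -> bool:
    """Local preview helper (not part of Sphinx): does any pattern match?"""
    return any(re.match(pattern, uri) for pattern in linkcheck_ignore)
```

Keeping each pattern anchored and commented with the reason for ignoring it makes it easier to prune the list later, once the underlying links are fixed.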
@ggold7046
Contributor

Hi @pmeier, could you please have a look at this?

@ggold7046
Contributor

Hi @mroeschke, could you please tell me how to proceed with the 3rd part of this problem?

@ggold7046
Contributor

@pmeier

Projects
None yet

4 participants