[SPARK-49181] Remove site/docs/{version}/api/python/_sources folders and util/build-error-docs.py #544
cc @HyukjinKwon @cloud-fan @dongjoon-hyun @srowen thanks |
The idea here is simply that these are unused and so deleting them helps a little bit with the space crunch? How did you find them, out of curiosity? |
Shall we update the Spark scripts to clean up these files after the docs are built? |
Hi @srowen, I have gained some experience using reStructuredText and Sphinx in other Apache projects, e.g. Apache Kyuubi.
Hi @cloud-fan apache/spark#47686 is also ready |
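For reference, Sphinx writes the `_sources` folder because its `html_copy_source` option defaults to true. A minimal sketch of turning it off in a Sphinx `conf.py` follows. Whether the linked Spark PR takes this route or deletes the folder after the build is not stated in this thread, so treat this as one possible approach rather than the actual change.

```python
# Sphinx conf.py fragment (sketch; not taken from the Spark repository).
# When False, Sphinx does not copy the reST sources into the HTML
# output, so no _sources/ folder is produced.
html_copy_source = False
```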
How much space are we saving here? |
About 10-20M for each version |
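A figure like this can be checked per version with a short script. The following is a minimal sketch, assuming the published docs live under a `site/docs/{version}/api/python/_sources` layout as in the PR title. The function names `dir_size` and `clean_sources` are hypothetical, and this is not the procedure used to produce the PR.

```python
import shutil
from pathlib import Path


def dir_size(path: Path) -> int:
    """Total size in bytes of all regular files under path."""
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


def clean_sources(docs_root: Path, dry_run: bool = True) -> dict[str, int]:
    """Map each version to the bytes held by its api/python/_sources folder.

    With dry_run=False the folders are deleted as well.
    """
    freed = {}
    for sources in sorted(docs_root.glob("*/api/python/_sources")):
        if not sources.is_dir():
            continue
        # The first path component below docs_root is the version name.
        version = sources.relative_to(docs_root).parts[0]
        freed[version] = dir_size(sources)
        if not dry_run:
            shutil.rmtree(sources)
    return freed
```

Calling `clean_sources(Path("site/docs"))` only reports sizes; passing `dry_run=False` also removes the folders, so the dry run is the safer default.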
+1, LGTM.
Yes, I also double-checked.
|
Oh, @yaooqinn. It seems that this PR deleted the following.
$ COLUMNS=1000 git diff HEAD~1 --stat | grep -v '/api/python/_sources'
site/docs/4.0.0-preview1/util/build-error-docs.py | 152 ---------
33288 files changed, 1194166 deletions(-)
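The same check can be scripted. Below is a minimal sketch that lists changed paths with `git diff HEAD~1 --name-only` and keeps only those outside `/api/python/_sources`. The helper names `changed_paths` and `outside_sources` are hypothetical. Using `--name-only` instead of `--stat` yields bare paths, so no summary line is mixed into the result.

```python
import subprocess


def changed_paths(repo: str = ".") -> list[str]:
    """Paths touched by the last commit, via `git diff HEAD~1 --name-only`."""
    result = subprocess.run(
        ["git", "-C", repo, "diff", "HEAD~1", "--name-only"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()


def outside_sources(paths, marker="/api/python/_sources"):
    """Keep paths that do not contain marker, mirroring `grep -v`."""
    return [p for p in paths if marker not in p]
```

For a commit that is meant to touch only `_sources` folders, `outside_sources(changed_paths())` should come back empty; anything it returns is an unexpected deletion worth reviewing.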
I manually did the same thing as a test and will share it with you as a new PR. The result is not the same as this PR, @yaooqinn. In other words, this PR deletes a wrong file like
|
Thank you @dongjoon-hyun, I updated the PR title to mention util/build-error-docs.py too. Spark main has already moved it into _plugins. |
FYI, apache/spark@81948bb |
Thank you for the fix.
BTW, for the above part, you are unable to delete that file by using apache/spark@81948bb because it is part of the already published Apache Spark 4.0.0-preview1 artifacts. I'm wondering how you generated this PR initially. Could you revise the PR description to include reproducible steps, @yaooqinn? |
I think it's similar to the "_sources" folders. Both of them are accessible via doc releases, but useless to users. |
Thank you @dongjoon-hyun, updated |
This PR removes interim data under the _sources folder for each version listed below:
After removing them, dangling links like
https://spark.apache.org/docs/3.5.1/api/python/_sources/user_guide/sql/index.rst.txt are no longer visible to end users.
We also remove util/build-error-docs.py from 4.0.0-preview1; it is a doc-generation tool and is not meant for users.
The main goal of this PR is to remove unnecessary published files from the docs to reduce the repo size of spark-website.