Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 5021: Add a script to fix spelling errors with codespell #565

Closed
wants to merge 11 commits into from

Conversation

mrumph
Copy link
Contributor

@mrumph mrumph commented Mar 4, 2020

Running scripts/spell-check.sh from the Squid source tree root directory
should fix many spelling errors in a subset of git-controlled files.

This commit excludes the spelling fixes themselves.

Run "scripts/spell-check.sh" from the Squid tree root directory.
For details, see bug report 5021:
- https://bugs.squid-cache.org/show_bug.cgi?id=5021

 Changes to be committed:
	new file:   scripts/codespell-whitelist.txt
	new file:   scripts/spell-check.sh

(cherry picked from commit 0678534)
@squid-prbot
Copy link
Collaborator

Can one of the admins verify this patch?

@squid-anubis squid-anubis added the M-failed-description https://github.com/measurement-factory/anubis#pull-request-labels label Mar 4, 2020
@mrumph
Copy link
Contributor Author

mrumph commented Mar 4, 2020

This pull request is related to pull request #562.
It isolates the first commit, so that it can be reviewed separately.
The commit simply adds a spell check script and a white list of words to allow.

Copy link
Contributor

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for posting this! I only have a couple of minor questions and suggestions for now. Please see inlined change requests for details. I will come back after reviewing the results of applying this script (that you have already posted elsewhere).

scripts/spell-check.sh Outdated Show resolved Hide resolved
scripts/spell-check.sh Show resolved Hide resolved
scripts/spell-check.sh Outdated Show resolved Hide resolved
scripts/spell-check.sh Show resolved Hide resolved
scripts/spell-check.sh Outdated Show resolved Hide resolved
scripts/spell-check.sh Outdated Show resolved Hide resolved
scripts/spell-check.sh Outdated Show resolved Hide resolved
scripts/spell-check.sh Outdated Show resolved Hide resolved
@rousskov rousskov changed the title [Bug 5021] Add a script to fix spelling errors with codespell Bug 5021: Add a script to fix spelling error with codespell Mar 5, 2020
@squid-anubis squid-anubis removed the M-failed-description https://github.com/measurement-factory/anubis#pull-request-labels label Mar 5, 2020
@rousskov
Copy link
Contributor

rousskov commented Mar 5, 2020

@mrumph, I adjusted your PR description (and the future commit message) to follow Squid Project style and satisfy automated formatting checks. If you are curious what automated checks and merge actions are applied to Squid PRs, please see CheckList and Anubis.

@rousskov
Copy link
Contributor

rousskov commented Mar 5, 2020

I will come back after reviewing the results of applying this script

I looked through the actual spelling changes (2c7a572) and was surprised to see no truly bad ones. Either codespell is just terrific at avoiding problematic areas or @mrumph did an excellent job whitelisting them! I had one minor question at https://github.com/squid-cache/squid/pull/562/files#r388061762 but that wrinkle can be ignored if needed.

I also plan to run the script myself. If there are no red flags there, and after this script-focused PR is cleaned up, I would be happy to support merging this PR, committing the script results (in another dedicated PRs), and automating these checks.

Addressed comments from the pull request squid-cache#565.

 Changes to be committed:
	modified:   scripts/spell-check.sh
@rousskov
Copy link
Contributor

Status update: I have resolved the change requests that I consider fully addressed by bc08d0f. Thank you! I need a bit more time to finalize two or three outstanding issues and test the script. I may have a chance to do that this weekend.

@rousskov
Copy link
Contributor

I am getting the following error when testing the current PR with codespell v1.14.0 on Ubuntu 19.04:

src/acl/external/LDAP_group/ext_ldap_group_acl.cc:422: tread  ==> tread, thread
codespell failed for src/acl/external/LDAP_group/ext_ldap_group_acl.cc

The important (but invisible) part here is that the problematic source code file does not have a word tread in it. It has a \tread sequence which codespell misinterprets. Our script stops, which is a good thing, but we kind of got lucky here. @mrumph, I assume you did not get this problem in your tests. Do you know why? Perhaps you use a different codespell version?

The core of the problem could be related to codespell-project/codespell#1422 but I am not sure.

@rousskov
Copy link
Contributor

Status update: I made a few adjustments and have resolved all the outstanding change requests. I also started to test the script and found one issue that we will have to resolve, one way or another, before this PR is merged. I requested more information in hope to be able to suggest the best way forward.

@mrumph, please check my recent changes (up to branch commit f44a864) to make sure that you more-or-less agree with them and that I did not miss anything important. If you are learning git: git show SHA will show you the changes in the given commit. git diff SHA will show you the changes since the given commit.

@rousskov rousskov added the S-waiting-for-author author action is expected (and usually required) label Mar 23, 2020
@mrumph
Copy link
Contributor Author

mrumph commented Mar 23, 2020

For the dictionary entry "tread ==> tread, thread", I didn't see a hit on this, but there were a few other words that I added to the white list because of being preceded by "\n" or "\t".
Perhaps my default dictionary does not include that entry.
I was using the codespell installed on Ubuntu 18.04.1 LTS (GNU/Linux 4.15.0-29-generic x86_64).

@mrumph
Copy link
Contributor Author

mrumph commented Mar 23, 2020

All of the commits by Rousskov look good to me.

@mrumph mrumph closed this Mar 23, 2020
@mrumph mrumph reopened this Mar 23, 2020
@rousskov
Copy link
Contributor

I was using the codespell installed on Ubuntu 18.04.1 LTS

What does your codespell --version say?

@mrumph
Copy link
Contributor Author

mrumph commented Mar 23, 2020

My codespell version is 1.16.0, but this replaced an earlier 1.8 version.
For this issue, we should be able to resolve this by adding "tread" to the white list.

I could not find any more cases where \x was misinterpreted (for any x).
This is very useful when working on a small subset of files because
spell checking the whole tree takes a while:

    git diff $base --name-only | xargs ./scripts/spell-check.sh
@rousskov
Copy link
Contributor

For this issue, we should be able to resolve this by adding "tread" to the white list.

Done at 37649dc.

Copy link
Contributor

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR is ready to be merged!

I added one UI improvement (commit 2c18b64). @mrumph, please see if you are OK with that change.

@rousskov
Copy link
Contributor

OK to test

@rousskov
Copy link
Contributor

@yadij, do we want to include the script (and its whitelist file) in the "make dist" tarball? If yes, we will need to adjust scripts/Makefile.am before this PR goes in. I am guessing that the spellchecking script should not be included because we do not include scripts/source-maintenance.sh, but you know better.

Also, if you want this script to be called from scripts/source-maintenance.sh, please say so. I think it should eventually be called from there (because it is a part of source code maintenance), but I do not know whether adding it now is a good idea (because it may affect your packaging scripts).

@mrumph
Copy link
Contributor Author

mrumph commented Mar 23, 2020

So using "$@" will limit the list of files to a specific list of directories.
That seems like a useful feature to add.
Also, it might be good to add some usage comment at the beginning of the script that it uses the parameters in this way.

@mrumph
Copy link
Contributor Author

mrumph commented Mar 23, 2020

Sorry. The $@ can only be used as a list of directories if it is not quoted.
So git ls-files "$@" should probably be git ls-files $@.
Try the difference between:
git ls-files scripts tools
and:
git ls-files "scripts tools"

@yadij
Copy link
Contributor

yadij commented Mar 24, 2020

@yadij, do we want to include the script (and its whitelist file) in the "make dist" tarball?

FTR: criteria for that answer is whether the test/check is verifying or altering how the end-user code builds and operates. From what I have seen so far this is just internal textual changes, so no we only need the script itself in the repo. The tarballs get its output.

Also, if you want this script to be called from scripts/source-maintenance.sh, please say so. I think it should eventually be called from there (because it is a part of source code maintenance), but I do not know whether adding it now is a good idea (because it may affect your packaging scripts).

We will need to arrange its call like the astyle one either way - ensuring that we only have a manually verified tool version used. So the build machines do not fight over trivial version output differences. If that is done before this gets merged it should be fine adding now.

@rousskov
Copy link
Contributor

Sorry. The $@ can only be used as a list of directories if it is not quoted.

What makes you think that?

Please double check. In my tests, quoted $@ works exactly the way one would want it to work -- quoting individual parameters rather than the whole parameter list -- it is some kind of shell magic. Try, for example, the command below and you should see that the first (non-existent) directory with a space in its name is correctly skipped while the doc directory is processed.

sh -x ./scripts/spell-check.sh "src " 'doc' 2>&1 | less

Try the difference between:
git ls-files scripts tools
and:
git ls-files "scripts tools"

Your example above does not use the $@ magic so it does not count.

Even if I am right, we can still remove quoting because Squid sources do not use filenames with special characters. Please let me know if you insist on that removal, and I will do it.

@rousskov
Copy link
Contributor

@yadij, thank you for sharing your insights! Subject to @mrumph blessing, I plan to clear this PR after documenting anonymous parameters per @mrumph request.

We will need to arrange its call like the astyle one either way - ensuring that we only have a manually verified tool version used. So the build machines do not fight over trivial version output differences. If that is done before this gets merged it should be fine adding now.

I hope we can avoid running this new script (and, long term, running source-maintenance.sh) on the build machines by controlling repository input during automated and centralized PR acceptance testing, just like we already ban bad whitespace. If my hopes materialize, there will be no need for locking down the codespell version. If they do not, then we will add version control as a small separate PR (while attempting to implement it in a more flexible way than it has been done for astyle).

The script cannot, technically, create "conflicts" (at least not in a
git sense of that word). The script might wipe out or corrupt unstaged
changes, and we want to protect users from that.

A very useful side effect of our protection is that any script changes
are very easy to _review_ using `git diff --word-diff`. Any developer
ought to review codespell changes. The script itself is probably not the
right place to document/discuss all that.
@rousskov
Copy link
Contributor

it might be good to add some usage comment at the beginning of the script that it uses the parameters in this way.

Agreed. Done in 94a2043. Please review that commit and 21a54ee that polishes error output.

@mrumph, with these changes, I am waiting for your blessing to clear this PR for merging OR your request for more adjustments (e.g., unquoting "$@").

@mrumph
Copy link
Contributor Author

mrumph commented Mar 24, 2020

"Please double check. In my tests, quoted $@ works exactly the way one would want it to work -- quoting individual parameters rather than the whole parameter list -- it is some kind of shell magic.
Even if I am right, we can still remove quoting because Squid sources do not use filenames with special characters. Please let me know if you insist on that removal, and I will do it."

First of all, I don't "insist" on anything. I am only trying to help the project.
Yes. I was mistaken on how "$@" works in a script. I've verified your magic.
The spell check script looks good to me as you have changed it.

@rousskov
Copy link
Contributor

First of all, I don't "insist" on anything. I am only trying to help the project.

Sorry, I did not mean to imply anything negative: As the primary author, you have a well-deserved right to pick one of the several options approved by the reviewer. I did not want to impose (technically optional) quoting if you happen to dislike it for some reason, especially since I am not a shell scripting expert...

The spell check script looks good to me as you have changed it.

Glad we are on the same page! I am clearing the PR for merging. Other reviewers may block it at the last minute, but I still want to thank you for investing so much into this valuable improvement. I especially appreciate your prompt reaction to my questions and changes.

@rousskov rousskov added M-cleared-for-merge https://github.com/measurement-factory/anubis#pull-request-labels and removed S-waiting-for-author author action is expected (and usually required) labels Mar 24, 2020
squid-anubis pushed a commit that referenced this pull request Mar 24, 2020
Running scripts/spell-check.sh from the Squid source tree root directory
should fix many spelling errors in a subset of git-controlled files.

This commit excludes the spelling fixes themselves.
@squid-anubis squid-anubis added the M-waiting-staging-checks https://github.com/measurement-factory/anubis#pull-request-labels label Mar 24, 2020
@squid-anubis squid-anubis added M-merged https://github.com/measurement-factory/anubis#pull-request-labels and removed M-cleared-for-merge https://github.com/measurement-factory/anubis#pull-request-labels M-waiting-staging-checks https://github.com/measurement-factory/anubis#pull-request-labels labels Mar 25, 2020
@rousskov
Copy link
Contributor

@mrumph, now that this script is in, please either submit a new PR with the actual spelling fixes done by the script OR adjust your old PR #562 to contain nothing but the spelling fixes done by the committed/official script, whichever is easier for you.

@mrumph mrumph changed the title Bug 5021: Add a script to fix spelling error with codespell Bug 5021: Add a script to fix spelling errors with codespell Mar 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
M-merged https://github.com/measurement-factory/anubis#pull-request-labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants