Add support for spell checking roxygen comments #3
`roxygen2::parse_file()` parses the roxygen comments in each file. Text from relevant tags is then searched for spelling errors with `hunspell::hunspell()` to find misspelled words. Because roxygen does not store the original positions of parsed tags we then need to find the misspelled word locations in the original roxygen comment lines of the source. This is done by `find_word_positions()`.
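The position lookup described above can be sketched in base R. This is only an illustration under stated assumptions, not the PR's implementation (which uses Rcpp): the name `find_word_positions_sketch()` and its arguments are hypothetical, the words are assumed to contain no regex metacharacters, and the result is a data frame of line and start column for every occurrence.

```r
# Hypothetical sketch, not the code in this PR.
# `lines`: raw roxygen comment lines of one block.
# `words`: misspelled words reported for that block.
find_word_positions_sketch <- function(lines, words) {
  hits <- lapply(words, function(word) {
    # \\b restricts matches to whole words; assumes `word` has no regex
    # metacharacters.
    pattern <- paste0("\\b", word, "\\b")
    found <- gregexpr(pattern, lines, perl = TRUE)
    do.call(rbind, lapply(seq_along(lines), function(i) {
      starts <- as.integer(found[[i]])
      if (starts[1] == -1L) return(NULL)  # no match on this line
      data.frame(word = word, line = i, start = starts,
                 stringsAsFactors = FALSE)
    }))
  })
  do.call(rbind, hits)
}

block <- c(
  "#' Check teh spelling of a package",
  "#'",
  "#' @param path path to teh package"
)
find_word_positions_sketch(block, "teh")
```

Because every line of the block is searched, every occurrence of the word is reported, whether or not it is the one the spell checker actually flagged.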
Codecov Report
@@           Coverage Diff           @@
##           master     #3    +/-   ##
======================================
- Coverage    45.2%  39.2%    -6%
======================================
  Files           6      6
  Lines         250    278    +28
======================================
- Hits          113    109     -4
- Misses        137    169    +32
roxygen2 should generally be storing the positions of the tags (because they're used for errors).
I think roxygen needs access to the objects in general, but perhaps for this use case we might be able to avoid it? I don't think the current API has any way to do that, however.
I'm on vacation next week; I will review this when I'm back.
Finally time again to look at this. Tested on a few packages, but I'm getting a lot of false positives. Some examples:
- Parser should ignore inline code chunks. Currently we get lots of false positives for code inside backticks (for markdown-roxygen) or `\code{}` chunks.
- Parser should skip `\href{}` and `\url{}`.
- Parser should skip `@examples` and other non-text block tags.
Note how the Rd parser uses `tools::RdTextFilter`, which says: "This function blanks out all non-text in an Rd file, for spell checking or other uses."
Ideally the roxygen2 spell checker should behave similarly.
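One way to picture the requested behaviour is to blank out non-text spans before spell checking, replacing them with spaces of the same width so that column positions stay valid. The sketch below is an assumption-laden illustration in base R, not how `tools::RdTextFilter` or this PR works: `blank_matches()` and `blank_non_text()` are hypothetical helpers, nested braces are not handled, and only `@examples` is treated as a non-text tag.

```r
# Hypothetical helper: overwrite every match of `pattern` with spaces of the
# same width, so the remaining text keeps its column positions.
blank_matches <- function(text, pattern) {
  m <- gregexpr(pattern, text, perl = TRUE)
  regmatches(text, m) <- lapply(regmatches(text, m), function(x) {
    strrep(" ", nchar(x))
  })
  text
}

# Hypothetical filter for raw roxygen comment lines.
blank_non_text <- function(lines) {
  # Blank @examples and everything up to the next tag.
  in_examples <- FALSE
  for (i in seq_along(lines)) {
    if (grepl("^#'\\s*@", lines[i], perl = TRUE)) {
      in_examples <- grepl("^#'\\s*@examples\\b", lines[i], perl = TRUE)
    }
    if (in_examples) lines[i] <- strrep(" ", nchar(lines[i]))
  }
  # Markdown inline code in backticks.
  lines <- blank_matches(lines, "`[^`]*`")
  # \code{...}, \url{...} and the first argument of \href{...}{...};
  # nested braces are not handled in this sketch.
  blank_matches(lines, "\\\\(code|url|href)\\{[^{}]*\\}")
}

x <- c(
  "#' Uses `hunspel()` and \\code{spel_chk} for detials.",
  "#' @examples",
  "#' spel_chk(x)",
  "#' @return a tabel"
)
blank_non_text(x)
```

Only `detials` and `tabel` remain visible to a spell checker after filtering, and each line keeps its original length.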
Do you have an example package where you are seeing this? Pretty sure the code already does the things you mention.
For example, spell checking the spelling package itself gives:
Here the
Ah yes, now I remember. These problems stem from the fact that these terms are misspelled elsewhere in the same roxygen blocks. Because roxygen does not provide accurate line information for parsed blocks (r-lib/roxygen2#664), we don't have the information to find the actual location of the misspelled word. Which is why we have to use
In the cases you cite, it is true that one of the matches should be ignored, but the other match is normal text, e.g. Line 22 in 65d419d and Line 5 in 65d419d.
If we had accurate line information from roxygen, this effect would be greatly diminished, as we would only search the exact line for the misspelled word.
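The effect can be reproduced with a toy block. In this sketch the block contents and the `tag_line` value are made up for illustration; roxygen2 did not expose such a line number at the time of this thread.

```r
# Toy roxygen block: the same word appears once as code and once as text.
block <- c(
  "#' @param x passed on to `colour`",         # inline code, should be ignored
  "#' @details Pick a colour for the plot"    # normal text, a genuine hit
)

# Without line information, the whole block is searched and both lines match.
hits_block <- which(grepl("\\bcolour\\b", block, perl = TRUE))

# With a (hypothetical) line number for the offending tag, only that line
# would be searched.
tag_line <- 2L
hits_line <- tag_line[grepl("\\bcolour\\b", block[tag_line], perl = TRUE)]

hits_block
hits_line
```

The block-wide search reports both lines, while the line-restricted search reports only the genuine hit in the text.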
`roxygen2::parse_file()` is not in the current CRAN version of roxygen2, but I believe @hadley will be submitting a new version to CRAN in the next week or so.
I used Rcpp mainly for convenience; if you would prefer to remove the dependency, I can do so.
`find_word_positions()` returns both the line and the start of each word. This was done to support a later enhancement of adding RStudio markers for misspelled words, as suggested in r-lib/devtools#1564. That will require additional changes in other parts of the code, so I will do it in a separate PR in the future.