CamelCase support? #196

Open
felixonmars opened this issue Nov 9, 2017 · 12 comments

Comments

@felixonmars
Contributor

Is it possible to treat camel case class names etc. as separate words, so that spelling errors could be found?

@luzpaz
Collaborator

luzpaz commented Nov 9, 2017

This would be amazing but I'm worried it would report a ton of false positives.

@felixonmars
Contributor Author

Maybe a switch would help avoid that?

@thdot
Contributor

thdot commented Nov 12, 2017

I implemented this feature locally a few weeks ago and had no problems with false positives.
I haven't pushed it because the in-file replace functionality doesn't work with this quick hack. It's the same situation as PR #174, which currently doesn't support the '-w' flag either.

I think at some point we have to discuss how important the write-changes option is and how it influences new features like support for CamelCase, c-escapes or customized regular expressions to split the words. Since I use codespell only for reporting misspellings (it runs on every compile of my projects) I do not care much about this option.

Thoughts?

@thdot thdot mentioned this issue Nov 12, 2017
@larsoner
Member

It would be fine with me to say that only a subset of corrections can be automatically applied, so long as the others are still reported. For example, if a feature is added for within-CamelCase spelling errors, it seems reasonable to specify in the docs that these can be reported but not (at the moment) automatically corrected.

@lucasdemarchi
Collaborator

I agree with @larsoner

@peternewman
Collaborator

See also the discussion in #314.

@peternewman
Collaborator

@thdot where is your CamelCase read-only checking code? I'd love to get it merged in!

@TysonAndre
Contributor

An example regex to extract individual words from camelCase and mixedACRONYMSpelling that may be useful (PCRE, though): /[a-z]+|[A-Z](?:[a-z]+|[A-Z]+(?![a-z]))/ (it is not aware of hyphens or accented characters; the check could be skipped if either is found).

This was from a project to check spelling and be aware of PHP's syntax (e.g. single-quoted strings)

https://github.com/TysonAndre/PhanTypoCheck/blob/0.0.3/src/TypoCheckUtils.php#L100-L102

@TysonAndre
Contributor

An example regex to extract individual words from camelCase and mixedACRONYMSpelling that may be useful (PCRE, though): /[a-z]+|[A-Z](?:[a-z]+|[A-Z]+(?![a-z]))/ (it is not aware of hyphens or accented characters; the check could be skipped if either is found).

I'd just like to add that my initial solution had quadratic performance for long strings of lowercase letters and should not be used as-is (it didn't specify a start boundary).

'/(?:[a-z][^a-zA-Z]*[A-Z]|_)/' can be used as a sanity check of whether a word is camelCase or snake_case (only needed if splitting a string is slow).

'/[a-z]+|[A-Z](?:[a-z]+|[A-Z]+(?![a-z]))/' can be used to extract individual parts
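A minimal Python sketch of how these two patterns might be combined, since Python's `re` module supports the same negative lookahead. The names `LOOKS_COMPOUND`, `WORD_PARTS` and `split_identifier` are hypothetical and are not part of codespell; like the patterns themselves, this only handles ASCII letters.

```python
import re

# Cheap pre-check: a lowercase letter followed (possibly after digits or
# punctuation) by an uppercase letter, or any underscore.
LOOKS_COMPOUND = re.compile(r'[a-z][^a-zA-Z]*[A-Z]|_')

# Extraction pattern from the comment above, unchanged.
WORD_PARTS = re.compile(r'[a-z]+|[A-Z](?:[a-z]+|[A-Z]+(?![a-z]))')


def split_identifier(token):
    """Return the word parts of a camelCase or snake_case token.

    Tokens that do not look like compound identifiers are returned whole.
    """
    if not LOOKS_COMPOUND.search(token):
        return [token]
    return WORD_PARTS.findall(token)


if __name__ == '__main__':
    for name in ('defaultNameOccurances', 'mixedACRONYMSpelling', 'snake_case_name'):
        print(name, split_identifier(name))
```

The pre-check is optional; it only saves the cost of splitting tokens that are plainly single words.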

@BlaineEXE

Thought I might note that we use codespell in our CI for github.com/rook/rook. We have a function AtLeast(), which codespell suggests should be "at least". I'd love to have CamelCase support so that we can still find accidental "atleast" strings in documents while allowing AtLeast() functions.
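A rough sketch of the behaviour described here, assuming each identifier is split into parts before lookup. `MISSPELLINGS` is a tiny illustrative stand-in for a real dictionary and `find_typos` is a hypothetical helper, not codespell's API; it reports only and does not rewrite anything.

```python
import re

# Extraction pattern posted earlier in this thread.
WORD_PARTS = re.compile(r'[a-z]+|[A-Z](?:[a-z]+|[A-Z]+(?![a-z]))')

# Illustrative entries only; a real dictionary would be far larger.
MISSPELLINGS = {
    'atleast': 'at least',
    'occurance': 'occurrence',
    'occurances': 'occurrences',
}


def find_typos(token):
    """Return (part, suggestion) pairs for each misspelled part of token."""
    typos = []
    for part in WORD_PARTS.findall(token):
        suggestion = MISSPELLINGS.get(part.lower())
        if suggestion is not None:
            typos.append((part, suggestion))
    return typos


if __name__ == '__main__':
    for name in ('AtLeast', 'atleast', 'defaultNameOccurances'):
        print(name, find_typos(name))
```

With this approach `AtLeast` splits into `At` and `Least`, neither of which is in the dictionary, while a plain `atleast` is still looked up as one word.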

@matkoniecz
Contributor

I just want to mention that I would be perfectly happy with detection-only camel case support, even with quadratic performance. (I found defaultNameOccurances only by accident, because occurance also existed as a standalone typo.)

@TysonAndre
Contributor

To clarify, the regex I posted fixes the quadratic runtime of my first attempt, so maintainers should avoid introducing similar bugs however they solve it.

The buggy one was /(?:[a-z].*[A-Z]|_)/. It was slow for long lowercase strings because the greedy .* backtracks, and the match has to be retried from every possible start position.

The fixed one was /[a-z]+|[A-Z](?:[a-z]+|[A-Z]+(?![a-z]))/
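For anyone porting this to Python, a small sketch comparing the slow check with the bounded check posted earlier in the thread ('/(?:[a-z][^a-zA-Z]*[A-Z]|_)/'). The constant names, the `classify` helper and the sample words are made up for illustration. This is not a benchmark; it only shows that the two patterns classify these samples identically, so the bounded form can stand in for the slow one.

```python
import re

# First attempt: after each lowercase letter, '.*' runs to the end of the
# string and backtracks, so a long lowercase run costs quadratic time.
SLOW_CHECK = re.compile(r'(?:[a-z].*[A-Z]|_)')

# Bounded form: only non-letters may sit between the lowercase and the
# uppercase letter, so each start position fails quickly.
FAST_CHECK = re.compile(r'(?:[a-z][^a-zA-Z]*[A-Z]|_)')

SAMPLES = ['camelCase', 'snake_case', 'lowercase', 'UPPER', 'Title',
           'v2Beta', 'a' * 200]


def classify(pattern, words):
    """Return, for each word, whether the pattern finds a match in it."""
    return [bool(pattern.search(word)) for word in words]


if __name__ == '__main__':
    print(classify(SLOW_CHECK, SAMPLES) == classify(FAST_CHECK, SAMPLES))
```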
