add a ton of words from anarcat's personal dictionary #197

Closed
anarcat opened this issue Oct 2, 2024 · 4 comments

@anarcat

anarcat commented Oct 2, 2024

so i have this growing dictionary that i built to exclude common words from harper's error messages. i mentioned one word in a discussion (#171) but was encouraged to report more if i find them, so here we go.

This is the actual dict right now:

dictionary.txt

Some of those are pretty technical terms that maybe don't belong in a general dictionary, but perhaps could live in a separate dictionary that could be enabled on demand?

Acronyms like this, for example:

CI
UI
RRD
TCP
DSL
TPA
CRM
VPN
SLA
FPM
LTS
MTA
LVM
TPO
PII
WAL
IMAP
FIDO
PKCS
DKIM
DRBD
YAML
IPv4
ICMP
SNMP
TSDB

or names like this:

Alertmanager
CiviCRM
FastCGI
Ganeti
GitLab
Gmail
Golang
Grafana
Hiera
IPsec
Icinga
Javascript
Joomla
Kubernetes
Munin
Nagios
O'Reilly
OnionPerf
OpenMetrics
OpenPGP
Postfix
PromQL
Pushgateway
StackExchange
Trocla
Wordpress
YubiKey
YubiKeys
journald
systemd
uptime

There's also some oddballs in there, like:

Alexandre
anarcat

that you probably don't want to add in your dict (yet: what do you do with proper nouns? always mark those as errors?)...

Finally, those are actual English words that I think are missing from the dictionary as well:

anonymized
backported
backports
buildable
cardinality
compactions
deduplicates
flappy
natively
sawtooth
sharding

Which makes me wonder: where does that default dict come from? It seems like it's missing quite a bit of stuff... Note that my web browser marks those as real words:

CI
UI
VPN
FIDO
IPv4
Kubernetes
uptime
anonymized
cardinality
natively

and everything else is marked as an error (yes, even "flappy", so perhaps you're not the only one with that problem. ;) )

anyways, i hope that helps! i can keep updating this issue with new words as i keep the dictionary in git here and can provide just the diff for next times. ;)
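
for what it's worth, here is a minimal sketch of how such a diff could be produced. It assumes the dictionary is a plain text file with one word per line (an assumption, the actual file format is not shown in this thread), and `new_words` is a hypothetical helper name, not something harper provides:

```python
def new_words(old_text: str, new_text: str) -> list[str]:
    """Return the words found in new_text but not in old_text, sorted.

    Assumes one word per line; blank lines and surrounding
    whitespace are ignored. Comparison is case-sensitive, so
    "Gmail" and "gmail" count as different words.
    """
    old = {line.strip() for line in old_text.splitlines() if line.strip()}
    new = {line.strip() for line in new_text.splitlines() if line.strip()}
    return sorted(new - old)


if __name__ == "__main__":
    # Hypothetical snapshots of the same dictionary at two points in time.
    previous = "CI\nUI\nsystemd\n"
    current = "CI\nUI\nsystemd\nsharding\nGrafana\n"
    for word in new_words(previous, current):
        print(word)
```

Using a set difference rather than a line-based diff means reordering the file does not produce spurious entries; only words that are actually new get reported.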

@elijah-potter
Collaborator

This is a gold mine of a list. Thanks for speaking up. I've added most of them in the latest commit.

elijah-potter added a commit that referenced this issue Oct 2, 2024
@anarcat
Author

anarcat commented Oct 2, 2024

neat! i added a bunch of comments on that commit, because i think you went a tad too far on some. ;) sorry!

@lukasmwerner
Contributor

@anarcat To answer the question about where the original dictionary came from: it originally came from wooorm/dictionaries (which, if I understand correctly, came from ASpell originally). But we've adapted it over time to fit harper's needs.

@elijah-potter
Collaborator

neat! i added a bunch of comments on that commit, because i think you went a tad too far on some. ;) sorry!

No, thank you. I've replied to your comments. If you want to add more words, just reopen this issue.


3 participants