Cleanup++ #1257

Merged
merged 4 commits into from
Nov 14, 2021

@ampli commented Nov 13, 2021

This PR contains typo fixes.
However, while checking for typos (this time using the codespell command), I found the following sentence in the corpus-fixes batch that is not getting parsed:
He tried but failed to acheive his goal.
From its context, I guess the typo is unintended, so I fixed it.
But since it demonstrates a problem in the current tokenization algorithm (issue #404), I also added two examples to corpus-fixes.
I added a FIXME in the commit because I think I know how to fix it with negligible overhead (no reparsing).

It seems this is an unintended typo. However, it demonstrates a problem
with the tokenization algorithm: it doesn't use spell corrections for
tokens that are regex-classified, because doing so very often leads to
nonsense parsing.
FIXME: Use further tokenizing classifications for tokens that are
null-linked (may not be a trivial task).
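
To illustrate the behavior described in the commit message, here is a minimal Python sketch. It is not link-grammar's code: the word list, the `IVE-WORDS` regex class, the spell-guess table, and the functions `classify` and `reclassify_null_linked` are all hypothetical, and the assumption that "acheive" would match such a regex is made only for the sake of the example. It shows a token that is regex-classified getting no spell guesses on the first pass, and one possible reading of the FIXME, where only null-linked tokens are given the further classification.

```python
import re

# Toy data for illustration only; not link-grammar's dictionary or regex file.
DICTIONARY = {"he", "tried", "but", "failed", "to", "achieve", "his", "goal"}
REGEX_CLASSES = [("IVE-WORDS", re.compile(r"[a-z]+ive"))]  # hypothetical class
SPELL_GUESSES = {"acheive": ["achieve"]}  # stand-in for a real spell checker


def classify(token, spell_after_regex=False):
    """Return (form, source) alternatives for one token."""
    word = token.lower()
    if word in DICTIONARY:
        return [(word, "dict")]
    for name, pattern in REGEX_CLASSES:
        if pattern.fullmatch(word):
            alternatives = [(word, "regex:" + name)]
            # By default a regex match suppresses spell guesses.
            if spell_after_regex:
                alternatives += [(g, "spell") for g in SPELL_GUESSES.get(word, [])]
            return alternatives
    guesses = [(g, "spell") for g in SPELL_GUESSES.get(word, [])]
    return guesses or [(word, "unknown")]


def reclassify_null_linked(tokens, null_linked):
    """Add further classifications only for tokens left without links."""
    return {i: classify(tokens[i], spell_after_regex=True) for i in null_linked}


tokens = "He tried but failed to acheive his goal".split()
first_pass = [classify(t) for t in tokens]
# Assume the parser reported token 5 ("acheive") as null-linked.
retry = reclassify_null_linked(tokens, null_linked=[5])
print(first_pass[5])  # [('acheive', 'regex:IVE-WORDS')]
print(retry[5])       # [('acheive', 'regex:IVE-WORDS'), ('achieve', 'spell')]
```

Restricting the extra guesses to null-linked tokens keeps the default path unchanged, which matches the concern in the commit message that spell-correcting every regex-classified token often leads to nonsense parses.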
@linas merged commit 87f55f9 into opencog:master Nov 14, 2021

linas commented Nov 14, 2021

I've been wondering how to do codespell for a long time!

@ampli deleted the cleanup branch December 3, 2021 22:00