Normalized form of "am" is "a.m." #2754
Comments
I think this is probably a bad norm exception, because it really refers to "am" as in the time. The more common case is the verb "to be", so we should remove this. |
Is this supposed to be the relevant code, in
And, if so, why would one get that the norm of token "am" in "I am fine" is "a.m." when there was no number beforehand such as in "3am"? |
You're correct, this doesn't make sense and the above exceptions really only target tokens like I've been trying to find where this norm is set and it's pretty mysterious... 🤔 I haven't been able to figure it out yet. |
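To make the expectation discussed above concrete, here is a minimal pure-Python sketch of a time exception that only fires when "am"/"pm" is attached to a number, as in "3am". This is not spaCy's implementation: `TIME_RE`, `PERIOD_NORMS`, and `norms` are hypothetical names invented for illustration, and splitting on whitespace is a simplifying assumption in place of a real tokenizer.

```python
import re

# Hypothetical pattern: matches only whole chunks such as "3am" or "11PM".
TIME_RE = re.compile(r"^(\d{1,2})(am|pm)$", re.IGNORECASE)

# Hypothetical norm table for the period part of a matched time.
PERIOD_NORMS = {"am": "a.m.", "pm": "p.m."}


def norms(text):
    """Return normalized forms for whitespace-separated chunks of text."""
    out = []
    for chunk in text.split():
        match = TIME_RE.match(chunk)
        if match:
            # Split "3am" into the hour and the normalized period.
            hour, period = match.groups()
            out.extend([hour, PERIOD_NORMS[period.lower()]])
        else:
            # A bare "am" (the verb) falls through and is only lowercased.
            out.append(chunk.lower())
    return out


print(norms("I am fine"))
print(norms("See you at 3am"))
```

Under these assumptions, the verb in "I am fine" keeps the norm "am", while "3am" yields "3" and "a.m.", which is the behaviour one would expect if the exception were tied to a preceding number.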
Also this is troubling:
import spacy
from spacy.lang.en import English
nlp = English()
print([tok.norm_ for tok in nlp('The normalized form of "a" is "gonna".')])
#> ['the', 'normalized', 'form', 'of', '"', 'gonna', '"', 'is', '"', 'going', 'to', '"', '.'] |
Thanks for the example – this might explain a lot. Looks like this is actually an issue with the caching of the token attributes. There was a similar bug a while ago, but I'm pretty sure we fixed that. But it might still be related. |
Fix should now be up on develop: #3029 Thanks for your help reporting this. |
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
It appears that for the default English model, the normalized form of "am" is "a.m.". This seems reasonable, but then shouldn't the normalized form of "I'm" be ["i", "a.m."]?
Would it be better to make the normalized form of "am" and "a.m." be "am"?
How to reproduce the behaviour
Your Environment