User abbreviation not working properly #109
Comments
I have the same problem.

Me too. Was this fixed already?
Hi,

**Short version**

Can you try downloading nltk's averaged_perceptron_tagger and universal_tagset in your environment?

```python
import nltk
nltk.download("averaged_perceptron_tagger")
nltk.download("universal_tagset")
```

**Long version**

**Apologies for slowness with the issue**

Sorry for the very slow reply to the original issue - I had been meaning to look into it at the time but never got around to it and forgot about it completely. Unfortunately, maintaining this repo hasn't been a top priority for me. One of the reasons I didn't put more effort into tracking down the issue at the time was that I couldn't reproduce it on my laptop, and I feared it would be tricky to debug.

**Root cause**

However, I've just looked into it and I believe I've found the cause of the problem: I was able to reproduce the issue in a fresh environment. For me, a bare except was causing the issue - it's a last-ditch effort to ensure we still get some output even if something has gone wrong, by just returning the original 'non-standard word' before we start trying to normalise it. However, this except clause was covering up the fact that nltk resources the repo expects might not be present.

**Follow-up actions**

If I were writing this program again, I wouldn't rely on this bare except, which can cover up a multitude of sins (as it did here). Unfortunately, I don't have time at the moment for a substantial rewrite, but I should be able to find time to add an extra except clause for LookupError, which is what nltk raises when it doesn't have the expected resources. That should make the required download transparent to other users. Does that sound reasonable as a mitigation?

Thanks everyone on the ticket for pointing out the issue, and sorry again for the slow response!
Address #109 and #117: for these, in expand_EXPN, nltk requires 'averaged_perceptron_tagger' and 'universal_tagset' to be downloaded, but expand_EXPN has a bare except that just returns the original nsw if anything fails. This obscured the missing data dependencies and meant that expand_EXPN never did anything. By explicitly re-raising LookupError, we can surface the issue to the user. It would be better to handle this dependency for the user, but this at least makes the problem transparent.
Hi,

There is a fix for this in release 0.1.8, which is on both PyPI and GitHub. The fix raises the LookupError so that it is clear to the user what needs to be done. There doesn't appear to be a standard pattern for including nltk data dependencies in Python packages, but I'll be sure to update this ticket if I find a solution which installs the relevant dependencies automatically - that would obviously be a better experience for users than having to run the package repeatedly and deal with the errors.

Thanks for pointing out the issue!

Best,
Elliot
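In the meantime, one possible pattern - sketched here as an assumption, not something the package currently does - is to check for the two resources at startup and download them only when missing. nltk.data.find raises the same LookupError that the 0.1.8 fix surfaces:

```python
import nltk

# Hypothetical helper: pairs of (id for nltk.download, path for nltk.data.find).
_REQUIRED_NLTK_DATA = [
    ("averaged_perceptron_tagger", "taggers/averaged_perceptron_tagger"),
    ("universal_tagset", "taggers/universal_tagset"),
]

def ensure_nltk_data():
    for resource, path in _REQUIRED_NLTK_DATA:
        try:
            nltk.data.find(path)  # raises LookupError if not installed
        except LookupError:
            nltk.download(resource, quiet=True)
```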
Good day,

I am trying your library, but it is not working as expected. I am trying to use a custom abbreviation, but it does not replace the abbreviation with the equivalent word. I have even tried the example listed in the README and got the following output:
['four bdrm', 'house', 'for', 'sale', ',', 'four hundred and fifty nine thousand pounds', 'ONO']
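For context, the call that produces this output was presumably something like the sketch below; the input tokens and the user_abbrevs argument are assumptions based on the mention of custom abbreviations, not copied from the README:

```python
# Assumed reconstruction - the real README example may differ.
from normalise import normalise

tokens = ["4", "bdrm", "house", "for", "sale", ",", "£459,000", "ONO"]
print(normalise(tokens, user_abbrevs={"bdrm": "bedroom", "ONO": "or nearest offer"}))
# Reported output - note 'bdrm' and 'ONO' are left unexpanded:
# ['four bdrm', 'house', 'for', 'sale', ',',
#  'four hundred and fifty nine thousand pounds', 'ONO']
```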