Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

User abbreviation not working properly #109

Closed
javidalkaruzi opened this issue May 7, 2017 · 4 comments
Closed

User abbreviation not working properly #109

javidalkaruzi opened this issue May 7, 2017 · 4 comments

Comments

@javidalkaruzi
Copy link

Good day,

I am trying your library but it is not working as expected. I am trying to use custom abbreviation but it does not replace the abbreviation with the equivalent word. I have even tried your example that is listed in the read me and I got the following output.

['four bdrm', 'house', 'for', 'sale', ',', 'four hundred and fifty nine thousand pounds', 'ONO']

@pikaliov
Copy link

pikaliov commented Apr 2, 2019

I have same problem

@derekhsu
Copy link

derekhsu commented Apr 7, 2019

Me, too. Was this fixed already?

@EFord36
Copy link
Owner

EFord36 commented Apr 9, 2019

Hi,

Short version

can you try downloading nltk's averaged_perceptron_tagger and universal_tagset in your environment?

import nltk
nltk.download("averaged_perceptron_tagger")
nltk.download("universal_tagset")

Long version

Apologies for slowness with the issue

Sorry for the very slow reply to the original issue - I had been meaning to look into it at the time but never got around to it and forgot about it completely. Unfortunately, maintaining this repo hasn't been a top priority for me. One of the reasons I didn't put more effort into tracking down the issue at the time was that it was something I couldn't reproduce on my laptop and I feared it would be tricky to debug.

Root cause

However, I've just looked into it and I believe I've found the cause of the problem. I found I was able to reproduce the issue in a fresh environment. For me, this bare except was causing issues - it's a last-ditch effort to just ensure that we still get some output even if something's gone wrong, and just return the original 'non-standard word' before we start trying to normalise it. However, this except clause was covering up the fact that nltk resources that the repo expects might not be present.

Follow up actions

If I were writing this program again, I wouldn't rely on this bare except, which can cover up a multitude of sins (as it did here). Unfortunately, I don't have time at the moment for a substantial rewrite, but I should be able to find time to add an extra except clause for a LookupError, which is what is raised by nltk when it doesn't have the expected resources. That should make the required download transparent to other users. Does that sound reasonable as a mitigation?

Thanks everyone on the ticket for pointing out the issue, and sorry again for the slow responce!

EFord36 added a commit that referenced this issue May 5, 2019
Address #109 and #117 : for these, in expand_EXPN, nltk requires 'averaged_perceptron_tagger' and 'universal_tagset' to be downloaded, but expand_EXPN has a bare except that just returns the original nsw if anything fails.  This obscured the missing data dependencies and meant that expand_EXPN never did anything. By explicitly raising LookupError , we can surface this issue to the user. It would be better to handle this dependency for the user, but this at least makes the issue transparent
@EFord36
Copy link
Owner

EFord36 commented May 14, 2019

Hi,

There is a fix for this in release 0.1.8 which is on both pypi and Github. The fix reaises the LookupErrors so that it is clear to the user what needs to be done. There doesn't appear to be a standard pattern for including nltk data dependencies in python packages, but I'll be sure to update this ticket if I find a solution which install the relevant dependencies automatically, which would obviously be a better experience for users than having to run the package repeatedly and deal with the errors.

Thanks for pointing out the issue!

Best,

Elliot

@EFord36 EFord36 closed this as completed May 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants