Better Finnish analyzer based on libvoikko #37

Closed
osma opened this issue Mar 9, 2018 · 3 comments

@osma
Member

osma commented Mar 9, 2018

In #34 I resorted to a Snowball stemmer for Finnish because of difficulties installing libvoikko in a virtual environment (and Travis might be problematic too).

But it would be worth at least testing whether libvoikko gives better results than the stupid Snowball stemmer.

@osma osma added this to the Blue Sky milestone Mar 9, 2018
@osma osma modified the milestones: Blue Sky, Short term May 17, 2018
@osma
Member Author

osma commented May 17, 2018

There's a new voikko module on PyPI: https://pypi.org/project/voikko/#description
Maybe it would help with the install problems?
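To give a rough idea of what the `voikko` package offers, here is a minimal sketch. It assumes the package's `libvoikko.Voikko(language)` class, whose `analyze(word)` method returns a list of analysis dicts, each with a `"BASEFORM"` key. The `base_form` helper and the `StubVoikko` class are hypothetical names used only for illustration. The stub lets the logic run without the native library installed.

```python
from typing import Protocol


class WordAnalyzer(Protocol):
    """Anything with a Voikko-style analyze(word) -> list of dicts."""

    def analyze(self, word: str) -> list: ...


def base_form(analyzer: WordAnalyzer, word: str) -> str:
    """Return the base form from the first analysis, or the word unchanged."""
    analyses = analyzer.analyze(word)
    if analyses and "BASEFORM" in analyses[0]:
        return analyses[0]["BASEFORM"]
    # Unknown words (no analyses) fall through untouched.
    return word


class StubVoikko:
    """Hypothetical stand-in for libvoikko.Voikko, backed by a fixed lexicon."""

    def __init__(self, lexicon: dict):
        self.lexicon = lexicon

    def analyze(self, word: str) -> list:
        if word in self.lexicon:
            return [{"BASEFORM": self.lexicon[word]}]
        return []
```

With the package and the native library installed, `base_form(libvoikko.Voikko("fi"), word)` (after `from voikko import libvoikko`) should return whatever lemma Voikko reports. Taking only the first analysis is a simplification, because ambiguous word forms can have several analyses.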

@osma
Member Author

osma commented Oct 8, 2018

There is an initial implementation on the branch issue37-voikko.

However, there are (at least) two outstanding issues:

  • The Travis build fails because using the voikko backend would require installing the libvoikko library from a deb package. Currently we are using the container-based infrastructure, which limits package installs to whitelisted packages, and libvoikko is not on the whitelist. More generally, do we want to add a feature that requires installing a system library? At the very least it shouldn't be a hard dependency: basic functionality and unit tests should work even without libvoikko installed.
  • Some unit tests fail with a weird `ValueError: ctypes objects containing pointers cannot be pickled`. I don't understand what triggers this, as the voikko analyzer shouldn't be involved in what is being pickled; it just takes and returns normal Python strings.
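One plausible trigger (an assumption, not confirmed in this thread) is that the analyzer never needs to be pickled directly. It is enough for something that *is* pickled to hold a reference to it, for example a bound method stored as a tokenizer callback inside a model. Pickling a bound method pickles its instance, and with it any ctypes handle the instance holds. The sketch reproduces this with a ctypes pointer standing in for the Voikko handle. `NativeAnalyzer` and `try_pickle` are hypothetical names.

```python
import ctypes
import pickle


class NativeAnalyzer:
    """Hypothetical analyzer that holds a native (ctypes) handle."""

    def __init__(self):
        # A ctypes pointer stands in for the libvoikko handle.
        self.handle = ctypes.pointer(ctypes.c_int(0))

    def tokenize_words(self, text):
        # Only plain strings go in and out.
        return text.split()


def try_pickle(obj):
    """Return None if obj pickles cleanly, else the ValueError message."""
    try:
        pickle.dumps(obj)
    except ValueError as exc:
        return str(exc)
    return None


analyzer = NativeAnalyzer()
# A "model" that stores only a bound method, e.g. as a tokenizer callback.
model = {"tokenizer": analyzer.tokenize_words}
print(try_pickle(model))
```

The strings returned by `tokenize_words` pickle fine. The failure comes from the method's `__self__`, which drags the analyzer's `handle` attribute into the pickle.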

@osma
Member Author

osma commented Jan 15, 2019

Travis has deprecated their container environment, and we have switched to the Xenial-based environment, where installing any deb package is possible. So it should now be possible to run the tests with Voikko on Travis. However, the voikko feature should be made optional so that native libraries are not required for a basic install of Annif. (See also #229 for similar thoughts about fastText.)

The pickle issue has been sorted out on the issue37-voikko branch.
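The thread doesn't record how the pickle issue was fixed on the branch. One common pattern, sketched here as an assumption rather than the branch's actual code, is to drop the native handle in `__getstate__` and re-create it lazily on first use after unpickling. `LazyNativeAnalyzer` is a hypothetical name. A ctypes pointer again stands in for the `libvoikko.Voikko` object.

```python
import ctypes
import pickle


class LazyNativeAnalyzer:
    """Hypothetical analyzer that keeps its native handle out of pickles."""

    def __init__(self, param):
        self.param = param
        self._handle = None  # created lazily on first use

    def _get_handle(self):
        if self._handle is None:
            # Stand-in for libvoikko.Voikko(self.param); a ctypes pointer
            # has the same "cannot be pickled" behaviour.
            self._handle = ctypes.pointer(ctypes.c_int(0))
        return self._handle

    def normalize_word(self, word):
        self._get_handle()
        # Placeholder normalization; a real analyzer would query the handle.
        return word.lower()

    def __getstate__(self):
        state = self.__dict__.copy()
        state["_handle"] = None  # drop the unpicklable ctypes object
        return state


analyzer = LazyNativeAnalyzer("fi")
analyzer.normalize_word("Kissa")  # handle now exists
restored = pickle.loads(pickle.dumps(analyzer))  # succeeds; handle dropped
```

No `__setstate__` is needed here: by default, unpickling updates the instance's `__dict__` from the returned state, and the next `normalize_word` call re-creates the handle.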

This was referenced Jan 15, 2019
@osma osma closed this as completed in #231 Jan 15, 2019
@osma osma modified the milestones: Short term, v0.38 Jan 15, 2019