Vowpal Wabbit regular backend #230

Closed
osma opened this issue Jan 15, 2019 · 2 comments

osma (Member) commented Jan 15, 2019

The Vowpal Wabbit (VW) online learning system seems promising as a backend for Annif. It could be used in at least two ways:

  1. As a basic backend that inputs text and predicts classes/concepts/subjects.
  2. As an ensemble backend that intelligently combines results from other backends, similar to PAV.

For case 1, the limitation of VW is that while it can perform multiclass and multilabel tasks, internally those tasks are converted into K (mostly) independent binary classifiers, where K is the number of classes. When K is large and there are also many input features, the resulting growth in model size and computation will cause problems despite VW's inherent scalability. Thus the VW backend would probably work best for classification tasks with at most a few thousand classes. It would also be useful to be able to use the output (concepts with scores) of other backends as input to the VW classifier; that would make it possible, e.g., to predict UDC classes based on YSO subjects assigned by other backends.
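To make the scaling concern concrete, here is an illustrative Python sketch of a generic one-against-all reduction. It is not VW's internal code (VW uses feature hashing and its own update rules); the class `OneAgainstAll` and its methods are hypothetical. Each of the K classes gets its own binary logistic-regression classifier over sparse features, every training example updates all K classifiers, and prediction must score all K of them, so the work per example and the number of stored weights both grow roughly as K times the number of features.

```python
import math
from collections import defaultdict


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OneAgainstAll:
    """Hypothetical sketch: reduce a K-class (or multilabel) problem to
    K independent binary logistic-regression classifiers."""

    def __init__(self, classes, learning_rate=0.5):
        self.classes = list(classes)
        self.lr = learning_rate
        # One sparse weight vector (feature name -> weight) per class.
        self.weights = {c: defaultdict(float) for c in self.classes}

    def _score(self, cls, features):
        w = self.weights[cls]
        return sigmoid(sum(w.get(f, 0.0) * v for f, v in features.items()))

    def learn(self, features, labels):
        """One SGD step for each of the K binary classifiers."""
        for cls in self.classes:
            # The example is positive for classes in `labels`, negative otherwise.
            target = 1.0 if cls in labels else 0.0
            grad = target - self._score(cls, features)
            for f, v in features.items():
                self.weights[cls][f] += self.lr * grad * v

    def predict(self, features):
        """Score all K classifiers and rank classes by score."""
        scores = {cls: self._score(cls, features) for cls in self.classes}
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

    def num_parameters(self):
        """Number of stored weights: at most K times the number of features."""
        return sum(len(w) for w in self.weights.values())
```

Because `labels` is a set, the same code covers the multilabel case: several classifiers simply receive a positive target for the same example. With 3 classes and 3 distinct features in the training data, `num_parameters()` reaches 9, which is the K-times-features growth described above.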

The big attraction of VW, alongside its speed and scalability, is that it is oriented around online learning: whatever it has been trained on, it can always continue to adapt based on feedback. When adding support for online learning / feedback to Annif (#225), the VW backend would be the natural first candidate.
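The online-learning property can be illustrated with a minimal Python sketch, assuming a plain logistic-regression model trained by stochastic gradient descent (a simplification; VW supports many loss functions and update rules). Each call to `learn` is a single update on one example, so feedback such as "this suggested subject was wrong" can be applied at any time as one more example, without retraining from scratch. `OnlineLogisticRegression` is a hypothetical name.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogisticRegression:
    """Hypothetical sketch of an online learner: one SGD step per example."""

    def __init__(self, learning_rate=0.5):
        self.lr = learning_rate
        self.weights = {}  # sparse: feature name -> weight

    def predict(self, features):
        z = sum(self.weights.get(f, 0.0) * v for f, v in features.items())
        return sigmoid(z)

    def learn(self, features, label):
        """Update the model in place from a single (features, 0/1 label) pair."""
        grad = label - self.predict(features)
        for f, v in features.items():
            self.weights[f] = self.weights.get(f, 0.0) + self.lr * grad * v


if __name__ == "__main__":
    model = OnlineLogisticRegression()
    doc = {"bank": 1.0, "finance": 1.0}
    # Initial training: the subject was assigned to this document.
    for _ in range(10):
        model.learn(doc, 1)
    print("before feedback:", round(model.predict(doc), 3))
    # Later, user feedback rejects the subject: feedback is just more learn() calls.
    for _ in range(30):
        model.learn(doc, 0)
    print("after feedback:", round(model.predict(doc), 3))
```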

VW requires a native library to be built, and building it can be difficult in some environments. It should be an optional dependency like voikko (#37) and fastText (#229).
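One common way to keep a native library optional in Python is to try importing its bindings and only offer the backends whose imports succeed. The sketch below is not Annif's actual registration code; the backend ids (`tfidf`, `vw`, `fasttext`) and the module names they map to (`vowpalwabbit`, `fasttext`) are assumptions for illustration.

```python
import importlib

# Hypothetical mapping from backend id to the Python module it depends on.
# The module names are assumptions for illustration.
OPTIONAL_BACKENDS = {
    "vw": "vowpalwabbit",
    "fasttext": "fasttext",
}


def available_backends(core=("tfidf",), optional=None):
    """Return the backend ids usable in this environment.

    Core backends are always listed; an optional backend is listed only
    if the module it depends on can be imported.
    """
    if optional is None:
        optional = OPTIONAL_BACKENDS
    backends = list(core)
    for backend_id, module_name in optional.items():
        try:
            importlib.import_module(module_name)
        except ImportError:
            # Bindings not installed, or the native library failed to load.
            continue
        backends.append(backend_id)
    return backends


if __name__ == "__main__":
    print(available_backends())
```

Attempting the import, rather than only checking that the package is installed, also catches the case where the Python bindings exist but the native library they wrap cannot be loaded.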

osma added this to the Short term milestone on Jan 15, 2019
osma (Member, issue author) commented Jan 15, 2019

There is an initial implementation for case 1 on the vw-backend branch.

Remaining tasks for that implementation:

  • Better handling of characters with special meaning in the VW input format, e.g. the pipe character (|)
  • Ability to specify model parameters, as in the fastText backend
  • Ability to use the output of other projects as features
  • Unit tests
  • Support for online learning (to be implemented together with Incremental / online learning based on user feedback #225)
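The first and third remaining tasks both come down to how text is serialized into VW's plain-text input format, where `|` starts a namespace, `:` separates a feature name from its value, and whitespace separates features. Below is a minimal Python sketch with hypothetical helper names (`sanitize_token`, `vw_example`) and hypothetical namespace names (`words`, `backends`). Label ids are assumed to be already mapped to the integer range that the chosen VW reduction expects, and are written comma-separated as for multilabel examples.

```python
import re

# In VW text input, '|' starts a namespace, ':' separates a feature name
# from its value, and whitespace separates features.
_SPECIAL = re.compile(r"[|:\s]")


def sanitize_token(token):
    """Replace VW special characters so the token stays one feature name."""
    return _SPECIAL.sub("_", token)


def vw_example(tokens, labels=None, backend_scores=None):
    """Build one line of VW text input (hypothetical helper).

    tokens:         words of the document text
    labels:         integer label ids, written comma-separated
    backend_scores: {concept id: score} from other backends, used as
                    weighted features in a separate namespace
    """
    words = [sanitize_token(t) for t in tokens]
    words = [w for w in words if w]
    # Unlabeled examples start directly with the first namespace.
    prefix = ",".join(str(label) for label in labels) + " " if labels else ""
    line = prefix + "|words " + " ".join(words)
    if backend_scores:
        feats = " ".join(
            f"{sanitize_token(concept)}:{score:g}"
            for concept, score in backend_scores.items()
        )
        line += " |backends " + feats
    return line


if __name__ == "__main__":
    print(vw_example(["cats", "and", "dogs|pets"], labels=[1, 3]))
    # → 1,3 |words cats and dogs_pets
```

Replacing special characters with `_` is only one possible policy. It keeps concept URIs from other backends intact as single feature names (the `:` in `http://` would otherwise be read as a value separator), at the cost of letting tokens such as `a:b` and `a_b` collide.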

osma (Member, issue author) commented Feb 27, 2019

The last remaining pieces were implemented in #257.

osma closed this as completed on Feb 27, 2019