Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add vw_ensemble backend #284

Merged
merged 28 commits into from
Jun 28, 2019
Merged

Add vw_ensemble backend #284

merged 28 commits into from
Jun 28, 2019

Conversation

osma
Copy link
Member

@osma osma commented Jun 24, 2019

Fixes #235

This PR adds a new Vowpal Wabbit based learning ensemble backend. It is an alternative for PAV. The implementation is based on ideas from the Annif-fusion experiments.

This is a draft PR. Some TODOs:

  • discounting mechanism (increase trust in the model with more training data)
  • refactor and reduce duplication w.r.t. vw_multi backend (maybe extract a common base class for VW based backends)
  • coordinate with PR Issue277 auto generate api documentation #280: need to make sure documentation is autogenerated for the new backend

@osma osma added this to the v0.41 milestone Jun 24, 2019
@codecov
Copy link

codecov bot commented Jun 24, 2019

Codecov Report

Merging #284 into master will increase coverage by 0.04%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #284      +/-   ##
==========================================
+ Coverage   99.32%   99.36%   +0.04%     
==========================================
  Files          52       55       +3     
  Lines        2651     2842     +191     
==========================================
+ Hits         2633     2824     +191     
  Misses         18       18
Impacted Files Coverage Δ
annif/backend/mixins.py 93.75% <ø> (-0.37%) ⬇️
tests/test_project.py 100% <ø> (ø) ⬆️
annif/backend/tfidf.py 100% <ø> (ø) ⬆️
annif/backend/ensemble.py 100% <100%> (ø) ⬆️
tests/test_backend_vw_ensemble.py 100% <100%> (ø)
annif/backend/__init__.py 100% <100%> (ø) ⬆️
annif/backend/vw_ensemble.py 100% <100%> (ø)
annif/backend/vw_base.py 100% <100%> (ø)
annif/backend/vw_multi.py 100% <100%> (ø) ⬆️
annif/backend/backend.py 96.55% <100%> (+0.12%) ⬆️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 09c4084...70d3973. Read the comment docs.

@osma
Copy link
Member Author

osma commented Jun 24, 2019

The Travis build breaks on Python 3.5 (the only Python version where VW is available). Apparently there's a problem with the unit test expecting the tfidf-fi backend to be initialized, when it's not.

@osma
Copy link
Member Author

osma commented Jun 26, 2019

Still TODO:

  • discounting mechanism
  • refactor _create_model - the code is practically identical in vw_ensemble and vw_multi, so the method should be moved to vw_base, but need to separate out the default options for each type as they are a bit different
  • auto-generation of API docs for the new modules
  • remaining quality issues and ensuring test coverage

@osma
Copy link
Member Author

osma commented Jun 27, 2019

This pull request introduces 7 alerts when merging 9f25308 into f9b1294 - view on LGTM.com

new alerts:

  • 4 for Unused import
  • 3 for Modification of parameter with default

This project has automated code review enabled, but doesn't use the LGTM GitHub App. Migrate over by installing the app. Read about the benefits of migrating to GitHub Apps in the blog.


Comment posted by LGTM.com

@lgtm-com
Copy link

lgtm-com bot commented Jun 28, 2019

This pull request introduces 6 alerts when merging 34dc5a4 into f9b1294 - view on LGTM.com

new alerts:

  • 3 for Unused import
  • 3 for Modification of parameter with default

@osma osma marked this pull request as ready for review June 28, 2019 10:54
@osma
Copy link
Member Author

osma commented Jun 28, 2019

Feature-wise everything should be done now. Test coverage is 100%. API docs are updated. The remaining quality issues can be ignored (most are false positives).

@osma osma requested a review from juhoinkinen June 28, 2019 10:55
@osma
Copy link
Member Author

osma commented Jun 28, 2019

I've done some testing on real data sets and although the results are not quite as good as I hoped quality-wise, the backend seems to be working well now after the latest fixes.

@osma osma merged commit 8521652 into master Jun 28, 2019
@osma osma deleted the issue235-vw-ensemble-backend branch June 28, 2019 14:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Vowpal Wabbit ensemble backend
1 participant