Incremental learning in PAV backend #227

Closed
osma opened this issue Dec 14, 2018 · 2 comments
Comments


osma commented Dec 14, 2018

It should be possible to support incremental learning (#225) in the PAV backend. Unfortunately, the sklearn IsotonicRegression models cannot be updated with new data, but they are relatively simple and fast to recompute, and it should be possible to limit each update to a small number of subject-specific models. This requires a separate database (e.g. SQLite) containing all the input data that was used to create the regression models. The database would contain a table with the following information:

  • document (represented by e.g. sha256 checksum of text)
  • source project ID
  • subject ID (or URI)
  • raw score returned by source project
  • whether the subject was relevant or not (boolean value)
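As a rough illustration of the table described above, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `pav_training`, the column names, the primary key, and the index are all assumptions made for this example; the issue only lists which fields are needed.

```python
import hashlib
import sqlite3

# Hypothetical schema: the table name, column names, key and index are
# illustrative assumptions; only the five fields come from the issue text.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pav_training (
    doc_hash   TEXT    NOT NULL,  -- sha256 checksum of the document text
    project_id TEXT    NOT NULL,  -- source project ID
    subject    TEXT    NOT NULL,  -- subject ID or URI
    raw_score  REAL    NOT NULL,  -- raw score returned by the source project
    relevant   INTEGER NOT NULL CHECK (relevant IN (0, 1)),  -- boolean flag
    PRIMARY KEY (doc_hash, project_id, subject)
);
CREATE INDEX IF NOT EXISTS idx_pav_subject
    ON pav_training (subject, project_id);
"""


def doc_checksum(text: str) -> str:
    """Represent a document by the sha256 checksum of its text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

The composite primary key lets a repeated learn operation on the same document overwrite its earlier rows, and the index on `(subject, project_id)` keeps it cheap to fetch the training data for one subject-specific model.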

The general idea is:

  • Propagate the learn operation (that specifies document text and gold standard subjects) to source projects
  • Analyze the document using the (now possibly updated) source projects
  • Determine the affected subjects (union of subjects suggested by any of the source projects and gold standard subjects)
  • Update (possibly replacing existing rows for the same document) the database with information for the affected subjects
  • Recreate the regression models for all affected subjects
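The steps above could be sketched roughly as follows. This is not Annif code. It assumes a source project can be modelled as a plain function from text to `{subject: raw score}`, that a subject a project did not suggest gets a raw score of 0.0, and it uses a small pure-Python pool-adjacent-violators fit (`pav_fit`, a hypothetical helper) as a stand-in so the example has no dependencies. In practice sklearn's `IsotonicRegression().fit(scores, labels)` would take its place. Propagating the learn operation to the source projects (the first step) is left out because it depends on each backend.

```python
import hashlib
import sqlite3
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Assumption: a source project is a function text -> {subject: raw score}.
Suggest = Callable[[str], Dict[str, float]]


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE obs (doc TEXT, project TEXT, subject TEXT, "
        "score REAL, relevant INTEGER, PRIMARY KEY (doc, project, subject))"
    )
    return conn


def pav_fit(points: Iterable[Tuple[float, int]]) -> List[Tuple[float, float]]:
    """Non-decreasing step fit of relevance over score (pool adjacent violators).

    Returns (highest score in block, fitted value) pairs in score order.
    """
    grouped: Dict[float, Tuple[float, int]] = {}
    for score, label in points:  # average tied scores first
        total, count = grouped.get(score, (0.0, 0))
        grouped[score] = (total + label, count + 1)
    blocks: List[list] = []  # each block: [label sum, count, highest score]
    for score in sorted(grouped):
        total, count = grouped[score]
        blocks.append([total, count, score])
        # Merge backwards while the previous block's mean exceeds this one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count, top = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = top
    return [(top, total / count) for total, count, top in blocks]


def learn(conn: sqlite3.Connection, projects: Dict[str, Suggest],
          text: str, gold: Set[str]) -> Dict[Tuple[str, str], list]:
    doc = hashlib.sha256(text.encode("utf-8")).hexdigest()
    # Analyze the document with every (possibly updated) source project.
    hits = {pid: suggest(text) for pid, suggest in projects.items()}
    # Affected subjects: anything suggested by any project, plus gold standard.
    affected = set(gold)
    for scores in hits.values():
        affected.update(scores)
    # Store the observations, replacing existing rows for the same document.
    conn.executemany(
        "INSERT OR REPLACE INTO obs VALUES (?, ?, ?, ?, ?)",
        [(doc, pid, subj, hits[pid].get(subj, 0.0), int(subj in gold))
         for pid in projects for subj in affected],
    )
    # Refit only the models of the affected subjects.
    models = {}
    for pid in projects:
        for subj in affected:
            rows = conn.execute(
                "SELECT score, relevant FROM obs WHERE project = ? AND subject = ?",
                (pid, subj),
            ).fetchall()
            models[(pid, subj)] = pav_fit(rows)
    return models
```

Calling `learn(conn, {"a": lambda text: {"s1": 0.9, "s2": 0.2}}, "some text", {"s1"})` stores two rows and refits two models, one per (project, subject) pair. Every other subject's model is left untouched, which is what keeps the update cheap.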

The result is imperfect, as updating the source projects may also affect scores for unrelated documents, but we can't re-analyze them all here; this is the nature of incremental learning.

@osma osma added this to the Short term milestone Dec 14, 2018
@osma osma modified the milestones: Short term, Blue Sky Jan 15, 2019

osma commented Jan 15, 2019

It makes more sense to implement this kind of ensemble with Vowpal Wabbit, which supports online learning natively. See the Jupyter notebook where this has been tested; results were similar to PAV.


osma commented Jan 15, 2019

Opened an issue for the VW ensemble: #235. Closing this one as unnecessary.

@osma osma closed this as completed Jan 15, 2019