It should be possible to support incremental learning (#225) in the PAV backend. The sklearn IsotonicRegression models unfortunately cannot be updated with new data, but they are relatively simple and fast to recompute, and it should be possible to limit the update to a small number of subject-specific models. This requires a separate database (e.g. SQLite) containing all the input data that was used for creating the regression models. The database would contain a table with the following columns:
- document (represented by e.g. sha256 checksum of text)
- source project ID
- subject ID (or URI)
- raw score returned by source project
- whether the subject was relevant or not (boolean value)
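A minimal sketch of what that table could look like, using the standard library `sqlite3` module. The table and column names here are illustrative assumptions, not actual Annif code:

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")  # in practice a file-backed database
conn.execute("""
    CREATE TABLE train_data (
        doc_checksum   TEXT NOT NULL,    -- sha256 checksum of document text
        source_project TEXT NOT NULL,    -- source project ID
        subject        TEXT NOT NULL,    -- subject ID or URI
        score          REAL NOT NULL,    -- raw score from the source project
        relevant       INTEGER NOT NULL, -- 1 if a gold standard subject
        PRIMARY KEY (doc_checksum, source_project, subject)
    )
""")

text = "Example document text"
checksum = hashlib.sha256(text.encode("utf-8")).hexdigest()
# INSERT OR REPLACE handles replacing existing rows when the same
# document is learned again later
conn.execute(
    "INSERT OR REPLACE INTO train_data VALUES (?, ?, ?, ?, ?)",
    (checksum, "tfidf-en", "http://example.org/subjects/42", 0.87, 1),
)
row = conn.execute(
    "SELECT score, relevant FROM train_data WHERE subject = ?",
    ("http://example.org/subjects/42",),
).fetchone()
print(row)  # → (0.87, 1)
```

The composite primary key makes re-learning the same document an upsert rather than an append, which matches the "possibly replacing existing rows" step below.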
The general idea is:
1. Propagate the learn operation (which specifies the document text and gold standard subjects) to the source projects
2. Analyze the document using the (now possibly updated) source projects
3. Determine the affected subjects (the union of subjects suggested by any of the source projects and the gold standard subjects)
4. Update the database with information for the affected subjects (possibly replacing existing rows for the same document)
5. Recreate the regression models for all affected subjects
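The last step relies on the models being cheap to recompute. A dependency-free sketch of the pool-adjacent-violators fit (a stand-in for sklearn's IsotonicRegression, which is what the PAV backend actually uses) shows why: rebuilding one subject's model only needs that subject's (score, relevant) rows from the database:

```python
def pav_fit(scores, relevant):
    """Fit a monotone non-decreasing mapping from raw scores to
    calibrated values via pool-adjacent-violators. Pure-Python sketch
    of what sklearn's IsotonicRegression computes."""
    pairs = sorted(zip(scores, relevant))       # sort by raw score
    # each block holds [sum of labels, count]; merge while a violation exists
    blocks = [[float(y), 1] for _, y in pairs]
    i = 0
    while i < len(blocks) - 1:
        if blocks[i][0] / blocks[i][1] > blocks[i + 1][0] / blocks[i + 1][1]:
            blocks[i][0] += blocks[i + 1][0]    # pool the violating pair
            blocks[i][1] += blocks[i + 1][1]
            del blocks[i + 1]
            if i > 0:
                i -= 1                          # re-check the previous pair
        else:
            i += 1
    # expand block means back to one fitted value per data point
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return [s for s, _ in pairs], fitted

# hypothetical rows for one affected subject: raw scores and relevance flags
xs, ys = pav_fit([0.1, 0.4, 0.35, 0.8], [0, 1, 0, 1])
print(ys)  # → [0.0, 0.0, 1.0, 1.0]
```

Recreating a subject's model is then a single pass over its rows, so limiting step 5 to the affected subjects keeps a learn operation fast even with a large database.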
The result is imperfect, as updating the source projects may also affect scores for unrelated documents, but we cannot re-analyze them all here; this is the nature of incremental learning.
It makes more sense to implement this kind of ensemble with Vowpal Wabbit, which supports online learning natively. See the Jupyter notebook where this has been tested; the results were similar to PAV.