feat: update for probabilistic label tree reduction (#2766) - support for --probabilities option and fixed compatibility with VW 9+ version #4138
Conversation
Thanks so much for this! I am not aware of any PLT tests being removed; the testing infrastructure changed, though. I see a number of PLT tests in the main test file (`vowpal_wabbit/test/core.vwtest.json`, line 2802 at commit 802ce7b).
I am not aware of any changes in that time frame that would break things. I will spend some time today comparing 8.11.0 and 9.0.0 with the new demo you added and let you know what I find. EDIT: This bug was fixed in 9.3.0; it definitely broke PLT for a few versions. Does 9.3.0 behave as expected? Unfortunately, the tests that were in place did not catch this bug.
@jackgerrits I apologize, I tried to search for that test but somehow didn't find it initially. Thank you for linking it.
The main difference between 8 and 9 is `save_resume` becoming the default. To go back to pre-9 behavior, `--predict_only_model` can be used when saving and loading models. Perhaps that is the difference?
This seems to be it! With `--predict_only_model` it works as expected.
Sorry about the delay in getting back to you @mwydmuch. I'm working on this now and hope to get this merged tomorrow.
This still looks sorted by label index. Looks fine to me.
I agree with Wangda. Totally reasonable. Reductions get to decide how they calculate loss, so it's good. Documenting it in the wiki would be fantastic.
@mwydmuch CI should be happy now. I noticed that there aren't any tests for the new `--probabilities` option.
@jackgerrits, @zwd-ms thank you for your comments and improvements, and sorry again for the long response time.
I've added tests for the `--probabilities` option.
Cool, I'd be happy to add a wiki page describing the PLT reduction; I have time to do it before the end of the year.
Sorry, I gave a bad example here. The PLT outputs labels in the order they are found during the tree search. In the case of top-k, they are naturally ordered by decreasing probability. For prediction with a threshold, the ordering looks even more random (both probabilities and label indices are mixed).
If ordering by label indices is a hard assumption in VW, I can add some additional sorting to make the output always ordered by label index. An additional question about a small quality-of-life improvement: is it possible to override the default `loss_function` used by the reduction?
I think it is okay to leave it as is; all of the information is there, so a user can sort as they wish. If you wanted to, you could sort it, but I'm not sure whether sorting by class or by probability makes more sense. This can easily be fixed later if it is an issue for anyone.
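If it ever becomes an issue, a minimal sketch of sorting by label index could look like this (plain `std::pair` stands in here for VW's real action-scores container):

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch: sort (label_index, probability) predictions by label
// index so the output order is deterministic regardless of tree-search order.
// VW's actual action_scores container would be used in practice.
void sort_by_label_index(std::vector<std::pair<uint32_t, float>>& preds)
{
  std::sort(preds.begin(), preds.end(),
      [](const std::pair<uint32_t, float>& a, const std::pair<uint32_t, float>& b)
      { return a.first < b.first; });
}
```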
It is possible to override the `loss_function` type (example), but I think this example is not quite correct: we need to override the option value in addition to setting the loss function itself. I think both of these things are low priority and could be merged without (and/or fixed later), so if you'd rather just see this merged then I'm happy.
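To illustrate what I mean by overriding the option value too, here is a rough sketch (the header paths and exact signatures of `options_i::replace` and `get_loss_function` vary across VW versions, so treat this as illustrative rather than a drop-in patch):

```cpp
#include "vw/config/options.h"        // assumed header paths; they differ across VW versions
#include "vw/core/global_data.h"
#include "vw/core/loss_functions.h"

// Hypothetical sketch: force logistic loss from inside a reduction's setup,
// updating both the recorded option value and the live loss object.
void force_logistic_loss(VW::config::options_i& options, VW::workspace& all)
{
  options.replace("loss_function", "logistic");   // keep the parsed option record consistent
  all.loss = get_loss_function(all, "logistic");  // swap in the actual loss implementation
}
```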
@jackgerrits Thanks for fixing the warning.
Ok, then let's merge it. I'm happy with the state it's in right now.
Just noticed the changes to `save_load`. If the change is necessary, I can go in and fix it up.
```cpp
for (size_t i = 0; i < p.t; ++i)
{
  bin_text_read_write_fixed(
      model_file, reinterpret_cast<char*>(&p.nodes_time[i]), sizeof(p.nodes_time[0]), read, msg, text);
}
```
It looks like this was changed to not use the `resume` value any more. To avoid changing the binary format of the model, we can read/write this value but just ignore it.
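A minimal sketch of that read-but-ignore approach (the variable name is mine):

```cpp
// Hypothetical sketch: keep the byte in the binary format for compatibility,
// but do not let its value influence behavior.
bool unused_resume = true;
bin_text_read_write_fixed(
    model_file, reinterpret_cast<char*>(&unused_resume), sizeof(unused_resume), read, msg, text);
// `unused_resume` is intentionally ignored from here on.
```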
@jackgerrits The small modification of this function is needed because the previous one has a bug that makes models load incorrectly (when `resume` is true). I've modified it to read/write the `resume` variable as you suggested. I've tested it and it now works correctly with previous versions.
```cpp
std::stringstream msg;
msg << ":" << resume << "\n";
```
this should be left in
I've added it back
```cpp
bin_text_read_write_fixed(model_file, reinterpret_cast<char*>(&resume), sizeof(resume), read, msg, text);

if (resume && !p.all->weights.adaptive)
```
What's the reasoning behind this change?
It seems to me that the previous version is wrong because during prediction `p.all->weights.adaptive` is always false (at least if no additional arguments are provided), so if the model was trained with `p.all->weights.adaptive == true` it is later loaded in the wrong way. Removing this and depending only on the `resume` variable fixes the problem. Also, in the current version the `nodes_time` array that is saved is used for all optimizer types (see the `learn_node` function).
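For reference, a sketch of the repaired condition under that reasoning, guarding the `nodes_time` block on `resume` alone:

```cpp
// Hypothetical sketch: read/write nodes_time whenever the model carries
// training-resume state, independent of the optimizer's adaptive flag.
if (resume)
{
  for (size_t i = 0; i < p.t; ++i)
  {
    bin_text_read_write_fixed(
        model_file, reinterpret_cast<char*>(&p.nodes_time[i]), sizeof(p.nodes_time[0]), read, msg, text);
  }
}
```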
I think this is better now since the current version does not depend on optimizer specifics. Even if the optimizer does not use `sd->t` (like adaptive), writing/reading one additional float per tree node is a small price for more general code.
And since `!p.all->weights.adaptive == true` during prediction, removing it still keeps compatibility with most of the older models: those trained without resume (the default in 8.x) and those trained with resume and an optimizer other than adaptive (in 9.x).
I just spent some time fully understanding this situation. The previous `!p.all->weights.adaptive` check is actually quite troublesome: if the value of `adaptive` changes when loading a saved model, then the model is corrupted (it is relatively easy to change this value due to how `gd`'s options work). If an older model (<9.7) is loaded with this `save_load` function, then most code paths are invalid; the default (no optimizer flags) will also be invalid. The following summarizes the different combinations:
| Model produced by VW | Adaptive value when saving | Adaptive value when loading | Result |
|---|---|---|---|
| <9.7 | false | true | valid |
| <9.7 | true | true | invalid (current default) |
| <9.7 | false | false | valid |
| <9.7 | true | false | invalid |
| >=9.7 | n/a | n/a | valid |
We cannot easily distinguish between the adaptive value at save time and at load time in the current system (technically possible but tricky). Additionally, the method by which adaptive becomes true or false is a bit complex. Finally, any non-`gd` base optimizer is actually also undefined behavior, because the `p.all->weights.adaptive` value is uninitialized if `gd` is not the base.
Based on all these facts, I think the safest thing to do is disallow loading of <9.7 PLT models but allow an escape hatch, just for version 9.7, to force the load.
We will need to bump the version to 9.7 in this PR for the check to work.
If this all sounds good to you, I am happy to push a commit making these changes.
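Roughly along these lines (the version variables, flag name, and message below are placeholders for whatever the final commit uses, not VW's real identifiers):

```cpp
// Hypothetical sketch: refuse to load pre-9.7 PLT models unless the user
// explicitly opts in. `model_file_version`, `plt_fix_version`, and
// `force_load_legacy_model` are placeholder names.
if (read && model_file_version < plt_fix_version && !force_load_legacy_model)
{
  THROW("PLT models saved by VW versions before 9.7 cannot be loaded reliably. "
        "Pass --plt_force_load_legacy_model to force loading.");
}
```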
@jackgerrits Thank you for putting your time into this. Yes, your table summarizes the situation well. I agree that disallowing it, or at least emitting a warning, is the right thing in this situation, so I'm happy with your proposal.
Great, thanks for confirming! I'll fix this up tomorrow, and then we should be good to merge.
Just pushed the change:
- By default, we fail when a <9.7 model is loaded.
- A user can override this with `--plt_force_load_legacy_model`, which will be removed in the next release.
- I removed the save/load of `resume`, as you had originally done in this PR.
- `nodes_time` is now unconditionally saved in the model. In addition, its size is saved in the model and we assert that `t` and `nodes_time.size()` stay in sync (see the sketch below).
- This is a format change due to the removal of `resume` and the size field prefixed when writing `nodes_time`; therefore this model cannot be loaded in `vw` binaries prior to 9.7.
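Roughly, the new `nodes_time` save/load behaves like this sketch (the exact integer type of the size field is my assumption):

```cpp
// Hypothetical sketch of the new format: an explicit element count is
// written before the nodes_time payload and asserted against the tree size t.
uint32_t nodes_time_size = static_cast<uint32_t>(p.nodes_time.size());
bin_text_read_write_fixed(
    model_file, reinterpret_cast<char*>(&nodes_time_size), sizeof(nodes_time_size), read, msg, text);
assert(nodes_time_size == p.t);  // t and nodes_time.size() must stay in sync
for (size_t i = 0; i < nodes_time_size; ++i)
{
  bin_text_read_write_fixed(
      model_file, reinterpret_cast<char*>(&p.nodes_time[i]), sizeof(p.nodes_time[0]), read, msg, text);
}
```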
Hello!
This PR aims to add support for probabilities output for the PLT reduction + fix it in version 9+.
So far, I have added support for the `--probabilities` option, which makes the reduction output (label_index, probability) pairs as `action_score`s. I also added an additional dataset to the demo that only needs a few seconds to run.

Unfortunately, when implementing support for probabilities output, I noticed that the PLT reduction stopped working correctly between the 8.11.0 and 9.0.0 releases. You can run the new demo I added on v8.11.0 and v9.0.0 and notice that on 8.11.0 it performs well, while on 9+ it returns an almost constant prediction. In the past, there was a performance test for this reduction in the test dir, but it seems to me that it was removed before the 9.0.0 release; probably because of that, the regression hasn't been noticed. So this needs to be fixed. The release notes for 9.0.0 do not mention anything that could be the reason. Were there any modifications to `single_learner` (`learner<char, example>`) that could change its behavior? Any pointers?
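For illustration, emitting the pairs looks roughly like this sketch (the container follows VW 9.x's action-scores prediction, but `found_labels` and its fields are assumed names for the tree-search results):

```cpp
// Hypothetical sketch: after the tree search, publish each found label and
// its probability as an action_score pair on the example's prediction.
ec.pred.a_s.clear();
for (const auto& candidate : found_labels)  // assumed (index, probability) search results
{
  ec.pred.a_s.push_back({candidate.index, candidate.probability});
}
```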
Model format changes:
- By default, loading a <9.7 model fails. A user can override this with `--plt_force_load_legacy_model`, which will be removed in the next release.
- The save/load of `resume` is removed.
- `nodes_time` is now unconditionally saved in the model. In addition, its size is saved in the model and we assert that `t` and `nodes_time.size()` stay in sync.
- This is a format change due to the removal of `resume` and the size field prefixed when writing `nodes_time`; therefore this model cannot be loaded in `vw` binaries prior to 9.7.