feat: update for probabilistic label tree reduction (#2766) - support for --probabilities option and fixed compatibility with VW 9+ version #4138
Conversation
Thanks so much for this! I am not aware of any PLT tests being removed; the testing infrastructure changed, though. I see a number of PLT tests in the main test file (`vowpal_wabbit/test/core.vwtest.json`, line 2802 at commit 802ce7b).
I am not aware of any changes in that time frame that would break things. I will spend some time today comparing 8.11.0 and 9.0.0 with the new demo you added and let you know what I find. EDIT: This bug was fixed in 9.3.0; it definitely broke PLT for a few versions. Does 9.3.0 behave as expected? Unfortunately, the tests that were in place did not catch this bug.
@jackgerrits I apologize, I tried to search for that test but somehow didn't find it initially. Thank you for linking it.
The main difference between 8 and 9 is `save_resume` becoming the default. To go back to pre-9 behavior, `--predict_only_model` can be used when saving and loading models. Perhaps that is the difference?
This seems to be it! With `--predict_only_model` it works as expected.
Sorry about the delay in getting back to you @mwydmuch. I'm working on this now and hope to get this merged tomorrow.
This still looks sorted by label index. Looks fine to me.
I agree with Wangda. Totally reasonable. Reductions get to decide how they calculate loss, so it's good. Documenting it in the wiki would be fantastic.
@mwydmuch CI should be happy now. I noticed that there aren't any tests for the new `--probabilities` option.
@jackgerrits, @zwd-ms thank you for your comments and improvements, and sorry again for the long response time.
I've added tests for the `--probabilities` option.
Cool, I'd be happy to add a wiki page describing the PLT reduction; I have time to do it before the end of the year.
Sorry, I gave a bad example here. The PLT outputs labels in the order they are found during the tree search. In the case of top-k, they are naturally ordered by decreasing probability. For prediction with a threshold, the ordering looks even more random (both probabilities and label indices are mixed).
If ordering by label indices is a hard assumption in VW, I can add some additional sorting to make the output always ordered by label index. An additional question about a small quality-of-life improvement: is it possible to override the default `loss_function` used by the reduction?
I think it is okay to leave it as is; all of the information is there, so a user can sort as they wish. If you wanted to, you could sort it, but I'm not sure whether sorting by class or by probability makes more sense. This can easily be fixed later if it is an issue for anyone.
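If it ever becomes an issue, a minimal sketch of sorting by label index could look like this (plain `std::pair` stands in here for VW's real action-scores container):

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch: sort (label_index, probability) predictions by label
// index so the output order is deterministic regardless of tree-search order.
// VW's actual action_scores container would be used in practice.
void sort_by_label_index(std::vector<std::pair<uint32_t, float>>& preds)
{
  std::sort(preds.begin(), preds.end(),
      [](const std::pair<uint32_t, float>& a, const std::pair<uint32_t, float>& b)
      { return a.first < b.first; });
}
```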
It is possible to override the `loss_function` type (example), but I think this example is not quite correct: we need to override the option value in addition to setting the loss function itself. I think both of these things are low priority and could be merged without (and/or fixed later), so if you'd rather just see this merged then I'm happy.
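To illustrate what I mean by overriding the option value too, here is a rough sketch (the header paths and exact signatures of `options_i::replace` and `get_loss_function` vary across VW versions, so treat this as illustrative rather than a drop-in patch):

```cpp
#include "vw/config/options.h"        // assumed header paths; they differ across VW versions
#include "vw/core/global_data.h"
#include "vw/core/loss_functions.h"

// Hypothetical sketch: force logistic loss from inside a reduction's setup,
// updating both the recorded option value and the live loss object.
void force_logistic_loss(VW::config::options_i& options, VW::workspace& all)
{
  options.replace("loss_function", "logistic");   // keep the parsed option record consistent
  all.loss = get_loss_function(all, "logistic");  // swap in the actual loss implementation
}
```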
@jackgerrits Thanks for fixing the warning.
Ok, then let's merge it. I'm happy with the state it's in right now.
Just noticed the changes to `save_load`. If the change is necessary, I can go in and fix it up.
```cpp
for (size_t i = 0; i < p.t; ++i)
{
  bin_text_read_write_fixed(
      model_file, reinterpret_cast<char*>(&p.nodes_time[i]), sizeof(p.nodes_time[0]), read, msg, text);
}
```
It looks like this was changed to not use the `resume` value any more. To avoid changing the binary format of the model, we can read/write this value but just ignore it.
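A minimal sketch of that read-but-ignore approach (the variable name is mine):

```cpp
// Hypothetical sketch: keep the byte in the binary format for compatibility,
// but do not let its value influence behavior.
bool unused_resume = true;
bin_text_read_write_fixed(
    model_file, reinterpret_cast<char*>(&unused_resume), sizeof(unused_resume), read, msg, text);
// `unused_resume` is intentionally ignored from here on.
```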
@jackgerrits The small modification of this function is needed because the previous one has a bug that makes models load incorrectly (when `resume` is true). I've modified it to read/write the `resume` variable as you suggested. I've tested it and it now works correctly with previous versions.
```cpp
std::stringstream msg;
msg << ":" << resume << "\n";
```
this should be left in
I've added it back
```cpp
bin_text_read_write_fixed(model_file, reinterpret_cast<char*>(&resume), sizeof(resume), read, msg, text);

if (resume && !p.all->weights.adaptive)
```
What's the reasoning behind this change?
It seems to me that the previous version is wrong because during prediction `p.all->weights.adaptive` is always false (at least if no additional arguments are provided), so if the model was trained with `p.all->weights.adaptive == true` it is later loaded in the wrong way. Removing this and depending only on the `resume` variable fixes the problem. Also, in the current version the `nodes_time` array that is saved is used for all optimizer types (see the `learn_node` function).
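For reference, a sketch of the repaired condition under that reasoning, guarding the `nodes_time` block on `resume` alone:

```cpp
// Hypothetical sketch: read/write nodes_time whenever the model carries
// training-resume state, independent of the optimizer's adaptive flag.
if (resume)
{
  for (size_t i = 0; i < p.t; ++i)
  {
    bin_text_read_write_fixed(
        model_file, reinterpret_cast<char*>(&p.nodes_time[i]), sizeof(p.nodes_time[0]), read, msg, text);
  }
}
```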
I think this is better now since the current version does not depend on optimizer specifics. Even if the optimizer does not use `sd->t` (like adaptive), writing/reading one additional float per tree node is a small price for more general code.
And since `!p.all->weights.adaptive == true` during prediction, removing it still keeps compatibility with most of the older models: those trained without resume (the default in 8.x) and those trained with resume and an optimizer other than adaptive (in 9.x).
I just spent some time fully understanding this situation. The previous `!p.all->weights.adaptive` check is actually quite troublesome: if the value of `adaptive` changes when loading a saved model, then the model is corrupted (it is relatively easy to change this value due to how `gd`'s options work). If an older model (<9.7) is loaded with this `save_load` function, then most code paths are invalid; the default (no optimizer flags) will also be invalid. The following summarizes the different combinations:
| Model produced by VW | Adaptive value when saving | Adaptive value when loading | Result |
|---|---|---|---|
| <9.7 | false | true | valid |
| <9.7 | true | true | invalid (current default) |
| <9.7 | false | false | valid |
| <9.7 | true | false | invalid |
| >=9.7 | n/a | n/a | valid |
We cannot easily distinguish between the adaptive value at save time and at load time in the current system (technically possible but tricky). Additionally, the method by which adaptive becomes true or false is a bit complex. Finally, any non-`gd` base optimizer is actually also undefined behavior, because the `p.all->weights.adaptive` value is uninitialized if `gd` is not the base.
Based on all these facts, I think the safest thing to do is disallow loading of <9.7 PLT models but allow an escape hatch, just for version 9.7, to force the load.
We will need to bump the version to 9.7 in this PR for the check to work.
If this all sounds good to you, I am happy to push a commit making these changes.
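Roughly along these lines (the version variables, flag name, and message below are placeholders for whatever the final commit uses, not VW's real identifiers):

```cpp
// Hypothetical sketch: refuse to load pre-9.7 PLT models unless the user
// explicitly opts in. `model_file_version`, `plt_fix_version`, and
// `force_load_legacy_model` are placeholder names.
if (read && model_file_version < plt_fix_version && !force_load_legacy_model)
{
  THROW("PLT models saved by VW versions before 9.7 cannot be loaded reliably. "
        "Pass --plt_force_load_legacy_model to force loading.");
}
```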
@jackgerrits Thank you for putting your time into this. Yes, your table summarizes the situation well. I agree that disallowing it, or at least emitting a warning, is the right thing in this situation, so I'm happy with your proposal.
Great, thanks for confirming! I'll fix this up tomorrow, and then we should be good to merge.
Just pushed the change:
- By default, we fail when a <9.7 model is loaded.
- A user can override this with `--plt_force_load_legacy_model`, which will be removed in the next release.
- I removed the save/load of `resume`, as you had originally done in this PR.
- `nodes_time` is now unconditionally saved in the model. In addition, its size is saved in the model and we assert that `t` and `nodes_time.size()` stay in sync (see the sketch below).
- This is a format change due to the removal of `resume` and the size field prefixed when writing `nodes_time`; therefore this model cannot be loaded in `vw` binaries prior to 9.7.
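Roughly, the new `nodes_time` save/load behaves like this sketch (the exact integer type of the size field is my assumption):

```cpp
// Hypothetical sketch of the new format: an explicit element count is
// written before the nodes_time payload and asserted against the tree size t.
uint32_t nodes_time_size = static_cast<uint32_t>(p.nodes_time.size());
bin_text_read_write_fixed(
    model_file, reinterpret_cast<char*>(&nodes_time_size), sizeof(nodes_time_size), read, msg, text);
assert(nodes_time_size == p.t);  // t and nodes_time.size() must stay in sync
for (size_t i = 0; i < nodes_time_size; ++i)
{
  bin_text_read_write_fixed(
      model_file, reinterpret_cast<char*>(&p.nodes_time[i]), sizeof(p.nodes_time[0]), read, msg, text);
}
```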
Hello!
This PR aims to add support for probabilities output for the PLT reduction + fix it in version 9+.
So far, I have added support for the `--probabilities` option, which makes the reduction output (label_index, probability) pairs as `action_score`s. I also added an additional dataset to the demo that only needs a few seconds to run.

Unfortunately, when implementing support for probabilities output, I noticed that the PLT reduction stopped working correctly between the 8.11.0 and 9.0.0 releases. You can run the new demo I added on v8.11.0 and v9.0.0 and notice that on 8.11.0 it performs well, while on 9+ it returns an almost constant prediction. In the past, there was a performance test for this reduction in the test dir, but it seems to me that it was removed before the 9.0.0 release; probably because of that, the regression hasn't been noticed. So this needs to be fixed. The release notes for 9.0.0 do not mention anything that could be the reason. Were there any modifications to `single_learner` (`learner<char, example>`) that could change its behavior? Any pointers?
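For illustration, emitting the pairs looks roughly like this sketch (the container follows VW 9.x's action-scores prediction, but `found_labels` and its fields are assumed names for the tree-search results):

```cpp
// Hypothetical sketch: after the tree search, publish each found label and
// its probability as an action_score pair on the example's prediction.
ec.pred.a_s.clear();
for (const auto& candidate : found_labels)  // assumed (index, probability) search results
{
  ec.pred.a_s.push_back({candidate.index, candidate.probability});
}
```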
Model format changes:
- By default, loading a <9.7 model fails. A user can override this with `--plt_force_load_legacy_model`, which will be removed in the next release.
- The save/load of `resume` is removed.
- `nodes_time` is now unconditionally saved in the model. In addition, its size is saved in the model and we assert that `t` and `nodes_time.size()` stay in sync.
- This is a format change due to the removal of `resume` and the size field prefixed when writing `nodes_time`; therefore this model cannot be loaded in `vw` binaries prior to 9.7.