Warm start for cbify #1534
Conversation
Some more questions:
On the other hand, I am confused about the implementation of cost regression in csoaa: is each example (x, c) converted to a loss \ell(f, (x,c)) = \sum_{a=1}^K (f(x,a) - c(a))^2, or to \ell(f, (x,c)) = \frac{1}{K} \sum_{a=1}^K (f(x,a) - c(a))^2? If it is the former, then I think we shouldn't scale the loss in the MTR reduction by 1/K, since before scaling, \mathbb{E}[\hat{\ell}(f, (x,c))] = \sum_{a=1}^K (f(x,a) - c(a))^2. (Basically, we would like the optimization objective to be exactly the same as the one in our paper, Appendix A.)
@zcc1307 I'm going to help you get this in. Grab me at some point and let me know how I can help here.
vowpalwabbit/cb_adf.cc (outdated)
GEN_CS::call_cs_ldf<true>(
    base, mydata.gen_cs.mtr_ec_seq, mydata.cb_labels, mydata.cs_labels, mydata.prepped_cs_labels, mydata.offset);
examples[mydata.gen_cs.mtr_example]->weight *= 1.f / examples[mydata.gen_cs.mtr_example]->l.cb.costs[0].probability * ((float)mydata.gen_cs.event_sum / (float)mydata.gen_cs.action_sum) * (1.f / (float)examples.size());
GEN_CS::call_cs_ldf<true>(base, mydata.gen_cs.mtr_ec_seq, mydata.cb_labels, mydata.cs_labels, mydata.prepped_cs_labels, mydata.offset);
Sorry, I am a little confused about this line - what is ((float)mydata.gen_cs.event_sum / (float)mydata.gen_cs.action_sum)? Is it 1/K?
It's more like 1 / average K. These are defined via this: https://github.com/VowpalWabbit/vowpal_wabbit/blob/master/vowpalwabbit/gen_cs_example.cc#L171
@jackgerrits Thanks, Jack! Yes, I think I have made changes to pass the checks.
@JohnLangford As we now further scale the importance weights in the MTR reduction by 1/K, this caused some changes in RegCB's test results. I am not sure if I should change the lambda set as we discussed before to accommodate this 1/K change in MTR, because we also hope that the algorithm works for the other reductions (ips/dr). Instructions on using the --warm_cb option can be found at https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Warm-starting-contextual-bandits .
@zcc1307 I'm kind of confused. What is the question exactly? We have some conflicts to merge.
Is this ready to merge?
Yes, it is ready to merge! I have also updated the documentation:
https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Warm-starting-contextual-bandits
Thanks for the docs! I’ll add it to the sidebar today
EDIT: done
Merged, thanks :-)
* /
* not sure if the cost vector retrieved is correct
* not sure if the cost vector retrieved is correct
* added cbify warm start code
* commented out the multiple lambda code in cbify
* commented out the multiple lambda code in cbify
* the cbexplore approach seems not working, as the first stage cannot prepare multiple copies of weights
* .
* properly store the temp labels
* back
* .
* fixed the bug with assigning cb label before cost sensitive prediction - the ec.l field is a union
* the cumulative cost becomes diverse
* modified csoaa so that it can take example weights now
* .
* added some results of warm starting
* added some results of warm starting
* before modifying cbify adf code
* start modifying cbify adf code
* unknown segfault error
* everything good except for the cost sensitive learn part
* .
* .
* fixed the bug of empty example cost wrongly set
* fixed the bug of empty example cost wrongly set
* partially fix the importance weight issue
* fixed memory leak bug
* start changing the sample size parameters
* adding the bandit period as an explicit option
* file reorg
* tweak the python script
* added scatterplot script
* retracted the matplotlib inclusion
* .
* .
* regexp based line parsing for vw output (not tested yet)
* .
* .
* tweaked the scripts
* .
* .
* label corruption code
* supervised dataset validation
* lambda script
* weighting scheme
* .
* start properly copying the examples
* model is not updating in the supervised phase
* change to using proper copy example functions; memory leak issues persist
* .
* updated the lambda tuning scheme
* .
* fixed bug on zero warm start examples on small datasets
* added a refined weighting scheme and cumulative var calculation (not tested yet)
* warm start = 0 does not work
* fixed the csl label zero problem - now the label is set properly: 1,2,..,K
* .
* make the lambda weighting more modular
* make adf modular
* the version where there is an error on memory free
* finished cleanup (need to double check the cb label swap in the adf case)
* adjusted the output of the script so that it is more systematic
* a more complete summary file
* bring back the pairwise comparison plot
* added type 3 noise
* (warm start type = 2, adf) setting gives wrong results
* (warm start type = 2, adf) setting gives wrong results
* fixed the place of weight multiplier calculation
* force the changes
* before modifying the baseline of no update
* a new parameter enumeration scheme
* .
* .
* updated scripts
* cleaned up the run vw script; need more tests on more choices of param settings
* fixed memory lost problems; "still reachable" problems still not resolved
* started cleaning up the cost-sensitive mc to cs conversion
* begin changing the cb learning w/o adf part
* finished cleaning up the no adf part
* before cleaning up adf
* mwt explorer kept outputting action 0
* roll back to a state before reorg that is working
* intermediate state
* fixed a problem in noadf: lambda selection now happens before update
* there is still a memory leak issue for ecs[0].pred.a_s
* lines for respective validation methods
* commented out matplotlib
* commented out matplotlib
* rename running script
* trial on compiling vw in one of the subtasks
* before merging
* cleaned up all errors except for calling cost sensitive learning
* fixed offset bugs in cb_explore and multiline_predict_or_learn
* fixed error on split/nosplit swapping
* fixed all memory leaks in warm start ground truth
* fixed memory leaks in supervised ground truth
* added cbify warm start test cases
* removed unnecessary include path prefix
* cleaning up script
* finished updating the running vw script
* .
* removed running scripts
* removed spurious changes
* removed spurious changes
* undoing the weight scaling by 1/K in mtr
* updated tests
* added warm_cb as a separate file
* .
* removed part on non-adf
* redoing the importance weight scaling by a factor of 1/K
* .
* comma typo
* removed redundant comments
* resolve conflicts
* compile error on peeking epsilon in warm_cb.cc
* fixed sim-bandit option, disallow cost-sensitive corruption
* begin fixing importance weight in cs examples
* revert cost_sensitive.cc
* fixed the weighting issue in cs examples
* .
* edited vw_core.vcxproj
* added new warm cb test cases
* overwrote regcb test results, as we further scale importance weights of each example in the mtr reduction by 1/num_actions
* corrected a mistake in new regcb test result
* reorder reduction stack
* changed the weight scaling back without 1/K; changed the central value of lambda
* changed back regcbopt test results; undo changes in cb_adf.cc
This patch adds a new mode to VW: contextual bandit learning with warm start (CB-WS). It is mainly based on modifying cbify.cc: in its predict_or_learn_adf, learning is now broken into two phases: 1. a warm start phase, and 2. an interaction phase.
Remarks:
As of now, the CB-WS mode only works if the base exploration algorithm is epsilon-greedy. For more complex exploration algorithms (e.g. cover/bagging), a difficulty is that we need to initialize all the base learners in cover/bagging using warm start examples - this seems to require us to change the code of predict_or_learn_cover/predict_or_learn_bag in cb_explore_adf.cc.
We additionally scale the CB examples' importance weights by 1/num_actions in the mtr mode of cb_adf.cc - this ensures that the warm start examples and the CB examples carry the same weight.
Fixed an offset issue in predict_or_learn_greedy (it should use the example's offset; otherwise it behaves incorrectly when multiple cb_explore learners are initialized in cbify) and in multiline_predict_or_learn (it now stores / restores the examples' offsets properly).
Added some simple test cases.
There are lots of debugging cout's - perhaps I should delete them?