
Warm start for cbify #1534

Merged: 150 commits merged into master, Apr 2, 2019

Conversation

@zcc1307 (Contributor) commented Jul 16, 2018

This patch adds a new mode to VW: contextual bandit learning with warm start (CB-WS). It is mainly based on modifying cbify.cc - in its predict_or_learn_adf, learning is now broken into two phases: 1. a warm start phase and 2. an interaction phase.
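
Schematically, the dispatch looks roughly like this (a simplified sketch, not the actual patch; the stub types and the two phase helpers are illustrative stand-ins for the real VW code):

    #include <cstddef>

    // Minimal stand-ins so the sketch is self-contained; the real types
    // live in VW (cbify.cc / learner.h).
    struct multi_ex {};
    struct multi_learner {};
    struct cbify { size_t example_counter; size_t warm_start_period; };

    template <bool is_learn>
    void warm_start_phase(cbify&, multi_learner&, multi_ex&) { /* supervised (cost-sensitive) update */ }
    template <bool is_learn>
    void interaction_phase(cbify&, multi_learner&, multi_ex&) { /* CB explore, observe cost, update */ }

    // Sketch of cbify's two-phase predict_or_learn_adf.
    template <bool is_learn>
    void predict_or_learn_adf(cbify& data, multi_learner& base, multi_ex& ec_seq)
    {
      if (data.example_counter < data.warm_start_period)
        warm_start_phase<is_learn>(data, base, ec_seq);  // phase 1: warm start
      else
        interaction_phase<is_learn>(data, base, ec_seq); // phase 2: interaction
      data.example_counter++;
    }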

Remarks:

  1. As of now, the CB-WS mode only works if the base exploration algorithm is epsilon-greedy. For more complex exploration algorithms (e.g. cover/bagging), the difficulty is that we need to initialize all the base learners in cover/bagging with warm start examples - this seems to require us to change the code of predict_or_learn_cover/predict_or_learn_bag in cb_explore_adf.cc.

  2. We additionally scale the CB examples' importance weights by 1/num_actions in the mtr mode of cb_adf.cc - this has the effect of giving the warm start examples and the CB examples the same weight (see the sketch after this list).

  3. Fixed an offset issue in predict_or_learn_greedy (it should use the example's offset - otherwise the behavior is wrong when multiple cb_explore learners are initialized in cbify) and in multiline_predict_or_learn (the examples' offsets are now stored and restored properly).

  4. Added some simple test cases.

  5. There are lots of debugging cout's - perhaps I should delete them?
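
For remark 2, the intended arithmetic is roughly the following (a stand-alone illustration with a hypothetical helper name; the actual change lives in the mtr code path of cb_adf.cc):

    #include <cstddef>

    // Illustrative sketch of remark 2's weight adjustment: scaling a CB
    // example's importance weight by 1/num_actions so that, per the remark
    // above, warm start and CB examples end up with the same weight.
    float scaled_cb_weight(float importance_weight, size_t num_actions)
    {
      return importance_weight / static_cast<float>(num_actions);
    }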

@zcc1307 commented Feb 8, 2019

Some more questions:

  1. I think test case 173 fails because we are scaling the importance weight of each example in the MTR reduction by 1/num_actions - shall I change the reference test results instead?

On a related note, I am confused about the implementation of cost regression in csoaa: is each example (x, c) converted to the loss \ell(f, (x,c)) = \sum_{a=1}^K (f(x,a) - c(a))^2, or to \ell(f, (x,c)) = \frac{1}{K} \sum_{a=1}^K (f(x,a) - c(a))^2? (Both forms are written out after this list.)

If it is the former, then I think we shouldn't scale the loss in the MTR reduction by 1/K, since before scaling \mathbb{E}[\hat{\ell}(f, (x,c))] = \sum_{a=1}^K (f(x,a) - c(a))^2. (Basically, we would like the optimization objective to be exactly the same as the one in our paper, Appendix A.)

  2. Are we happy with the VW doubling progress report counting examples starting from #(warm start examples)? (See e.g. the first VW output on this page.) In my implementation, I set the examples' importance weights to zero outside the interaction stage, so that only the interaction stage contributes to the reported average loss.

  3. We explicitly use --warm_start_update and --interaction_update as two input options indicating whether VW turns on updates in the respective stages. Are these names too long for users?
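
For reference, the two candidate csoaa objectives from question 1, written out (this only restates the question; which form csoaa actually uses is exactly what is being asked):

    % Candidate 1: the per-example loss sums over the K actions
    \ell_{\text{sum}}(f,(x,c)) = \sum_{a=1}^{K} \bigl(f(x,a) - c(a)\bigr)^2
    % Candidate 2: the per-example loss averages over the K actions
    \ell_{\text{avg}}(f,(x,c)) = \frac{1}{K} \sum_{a=1}^{K} \bigl(f(x,a) - c(a)\bigr)^2
    % Under candidate 1, additionally scaling the MTR importance weights by
    % 1/K would make the interaction stage optimize a mismatched objective.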

@JohnLangford assigned jackgerrits and unassigned lokitoth Feb 14, 2019
@jackgerrits (Member) commented:
@zcc1307 I'm going to help you get this in. Grab me at some point and let me know how I can help here.

GEN_CS::call_cs_ldf<true>(
    base, mydata.gen_cs.mtr_ec_seq, mydata.cb_labels, mydata.cs_labels, mydata.prepped_cs_labels, mydata.offset);
examples[mydata.gen_cs.mtr_example]->weight *=
    1.f / examples[mydata.gen_cs.mtr_example]->l.cb.costs[0].probability
    * ((float)mydata.gen_cs.event_sum / (float)mydata.gen_cs.action_sum)
    * (1.f / (float)examples.size());
GEN_CS::call_cs_ldf<true>(
    base, mydata.gen_cs.mtr_ec_seq, mydata.cb_labels, mydata.cs_labels, mydata.prepped_cs_labels, mydata.offset);
@zcc1307 (Contributor, Author) commented on this line, Feb 25, 2019:
Sorry, I am a little confused about this line - what is ((float)mydata.gen_cs.event_sum / (float)mydata.gen_cs.action_sum)? Is it 1/K?

@zcc1307 commented Feb 25, 2019

@jackgerrits Thanks, Jack! Yes, I think I have made changes to pass the checks.

@JohnLangford As we now additionally scale the importance weights in the mtr reduction by 1/K, some of regcb's test results changed. I am not sure whether I should change the lambda set setting, as we discussed before, to accommodate this 1/K change in mtr, because we also hope the algorithm works for the other reductions (ips/dr)?

Instructions for using the --warm_cb option can be found at https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Warm-starting-contextual-bandits.
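
For illustration, an invocation looks roughly like this (only --warm_cb, --warm_start_update, and --interaction_update are named in this thread; the remaining flags follow the wiki page above and should be checked there):

    vw --warm_cb 10 --cb_explore_adf --epsilon 0.05 --warm_start 50 \
       --interaction 4000 --warm_start_update --interaction_update -d train.dat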

@JohnLangford (Member) commented:
@zcc1307 I'm kind of confused. What is the question exactly?

We have some merge conflicts to resolve.

@JohnLangford (Member) commented:
Is this ready to merge?

@zcc1307 commented Apr 2, 2019

Yes, it is ready to merge! I have also updated the documentation:
https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Warm-starting-contextual-bandits

@jackgerrits (Member) commented Apr 2, 2019 via email

@JohnLangford JohnLangford merged commit 31859a3 into VowpalWabbit:master Apr 2, 2019
@JohnLangford (Member) commented:
Merged, thanks :-)

jackgerrits pushed a commit to jackgerrits/vowpal_wabbit that referenced this pull request May 15, 2019
* /

* not sure if the cost vector retrieved is correct

* not sure if the cost vector retrieved is correct

* added cbify warm start code

* commented out the multiple lambda code in cbify

* commented out the multiple lambda code in cbify

* the cbexplore approach seems not to work, as the first stage cannot prepare multiple copies of the weights

* .

* properly store the temp labels

* back

* .

* fixed the bug with assigning the cb label before cost sensitive prediction - the ec.l field is a union

* the cumulative costs become diverse

* modified csoaa so that it can take example weights now.

* .

* added some results of warm starting

* added some results of warm starting

* before modifying cbify adf code

* start modifying cbify adf code

* unknown segfault error

* everything good except for the cost sensitive learn part

* .

* .

* fixed the bug of empty example cost wrongly set

* fixed the bug of empty example cost wrongly set

* partially fix the importance weight issue

* fixed memory leak bug

* start changing the sample size parameters

* adding the bandit period as an explicit option

* file reorg

* tweak the python script

* added scatterplot script

* retracted the matplotlib inclusion

* .

* .

* regexp based line parsing for vw output (not tested yet)

* .

* .

* tweaked the scripts

* .

* .

* label corruption code

* supervised dataset validation

* lambda script

* weighting scheme

* .

* start properly copying the examples

* model is not updating in the supervised phase

* change to using proper copy example functions. Memory leak issues persist.

* .

* updated the lambda tuning scheme

* .

* fixed bug on zero warm start examples on small datasets

* added a refined weighting scheme and cumulative var calculation (not tested yet)

* warm start = 0 does not work

* fixed the csl label zero problem - now the label is set properly: 1, 2, ..., K

* .

* make the lambda weighting more modular

* make adf modular

* the version where there is an error on memory free

* finished cleanup (need to double check the cb label swap in the adf case)

* adjusted the output of the script so that it is more systematic

* a more complete summary file

* bring back the pairwise comparison plot

* added type 3 noise

* (warm start type = 2, adf) setting gives wrong results

* (warm start type = 2, adf) setting gives wrong results

* fixed the place of weight multiplier calculation

* force the changes

* before modifying the baseline of no update

* a new parameter enumeration scheme

* .

* .

* updated scripts

* cleaned up the run vw script; need more tests on more choices of param settings

* fixed memory-lost problems; still-reachable problems not yet resolved

* started cleaning up the cost-sensitive mc to cs conversion

* begin changing the cb learning w/o adf part

* finished cleaning up the no adf part

* before cleaning up adf

* mwt explorer kept outputting action 0

* roll back to a state before reorg that is working

* intermediate state

* fixed a problem in noadf: lambda selection now happens before update

* there is still a memory leak issue for ecs[0].pred.a_s

* lines for respective validation methods

* commented out matplotlib

* commented out matplotlib

* rename running script

* trial on compiling vw in one of the subtasks

* before merging

* cleaned up all errors except for calling cost sensitive learning

* fixed offset bugs in cb_explore and multiline_predict_or_learn

* fixed error on split/nosplit swapping

* fixed all memory leaks in warm start ground truth

* fixed memory leaks in supervised ground truth

* added cbify warm start test cases

* removed unnecessary include path prefix

* cleaning up script

* finished updating the running vw script

* .

* removed running scripts

* removed spurious changes

* removed spurious changes

* undoing the weight scaling by 1/k in mtr

* updated tests

* added warm_cb as a separate file

* .

* removed part on non-adf

* redoing the importance weight scaling by a factor of 1/k

* .

* comma typo

* removed redundant comments

* resolve conflicts

* compile error on peeking epsilon in warm_cb.cc

* fixed sim-bandit option, disallow cost-sensitive corruption

* begin fixing importance weight in cs examples

* revert cost_sensitive.cc

* fixed the weighting issue in cs examples

* .

* edited vw_core.vcxproj

* added new warm cb test cases

* overwrote regcb test results, as we additionally scale the importance weight of each example in the mtr reduction by 1/num_actions

* corrected a mistake in new regcb test result

* reorder reduction stack

* changed the weight scaling back without 1/K; changed the central value of lambda

* changed back regcbopt test results; undo changes in cb_adf.cc