
feat: [gd] persist ppw extra state #4023

Merged: 96 commits merged into master from persist-ppm on Feb 27, 2023
Conversation

@lalo (Collaborator) commented Jul 5, 2022

This is a proper fix for the problem patched in #3998, but for gd only.
Also related: #4020

  • adds a std::vector holding one instance of state per model, i.e. set up according to ppw when building the reduction stack
  • adds logic to swap state based on the incoming ft_offset
  • learner.h now allows registering a function used to resize the std::vector, since the base reduction has already been instantiated by the time another reduction registers its own ppw. The resize calls are recursive so that the whole stack is aware of the change (see the sketch after this list).
  • TODO: reset a particular offset
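A minimal sketch of the mechanism described above (not VW's actual code): one state blob per model kept in a std::vector, selected from the incoming ft_offset, with a resize hook the stack can call recursively. The names gd_state, gd_like, and state_for_offset, and the stride-based index mapping, are assumptions for illustration.

#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-model state blob; real gd keeps things like normalized
// adaptive-gradient sums that previously lived in a single shared instance.
struct gd_state
{
  double normalized_sum_norm_x = 0.0;
  double total_weight = 0.0;
};

class gd_like
{
public:
  explicit gd_like(size_t num_models) : _per_model_state(num_models) {}

  // Called (recursively, in the real stack) when a reduction above registers
  // its own ppw, so every layer below resizes its per-model storage.
  void resize_ppw_state(size_t new_num_models) { _per_model_state.resize(new_num_models); }

  // Map the incoming ft_offset back to a model index and hand out its state.
  gd_state& state_for_offset(size_t ft_offset, size_t stride)
  {
    return _per_model_state.at(ft_offset / stride);  // .at() for bounds checking
  }

private:
  std::vector<gd_state> _per_model_state;
};

int main()
{
  gd_like gd(1);
  gd.resize_ppw_state(4);  // a reduction above registered ppw = 4
  gd.state_for_offset(/*ft_offset=*/16, /*stride=*/8).total_weight += 1.0;  // model 2
  assert(gd.state_for_offset(16, 8).total_weight == 1.0);
  return 0;
}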

@lalo marked this pull request as draft July 5, 2022 17:13
@lalo force-pushed the persist-ppm branch 3 times, most recently from 1edd275 to 2f3f606 on July 5, 2022 23:26
@lalo requested a review from bassmang July 6, 2022 14:19
@lalo changed the title from "feat: [gd] persist ppm state" to "feat: [gd] persist ppw state" on Jul 6, 2022
@lalo changed the title from "feat: [gd] persist ppw state" to "feat: [gd] persist ppw extra state" on Jul 6, 2022
@lalo marked this pull request as ready for review July 7, 2022 13:45
@lalo force-pushed the persist-ppm branch 5 times, most recently from b0ba34c to 8f7e363 on July 7, 2022 15:22
Comment on lines +27 to +30
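// Round num_actions up to the next power of two (e.g. num_actions = 5 -> 8 learners).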
size_t num_learner_ceil = 1;
while (num_learner_ceil < num_actions) { num_learner_ceil *= 2; }
// TODO: temporary fix to ensure enough learners, look into case where learner is created outside bounds
sch.set_num_learners(num_learner_ceil);
Member:
This makes me nervous...

Member:
Emailed John about this with 5 or so options; agreed this is the most suitable solution.

@@ -5,7 +5,7 @@ Reading datafile = test-sets/rcv1_small_test.data
num sources = 1
Num weight bits = 18
learning rate = 0.5
-initial_t = 217.709
+initial_t = 222.626
Member:
Why did this change?

Member:
Looks like initial_t is determined by the weighted example sum of the loaded model, which would have been altered by the gd state. (The new value, 222.626, matches the weighted example sum of 222.625671 in the test output below.)


finished run
number of examples = 4
weighted example sum = 4.000000
weighted label sum = 0.000000
-average loss = 0.412701
+average loss = 0.273194
Member:
This change is quite large

Collaborator Author:
but it's moving in the right direction

test/core.vwtest.json: thread resolved (outdated)
average loss = 0.264386
best constant = 0.142477
best constant's loss = 0.690616
weighted example sum = 222.625671
Member:
What are the reasons for these changes?

Collaborator Author:
loss went down

@@ -11,7 +11,7 @@ namespace VW
class cached_learner : public setup_base_i
{
public:
-std::shared_ptr<VW::LEARNER::learner> setup_base_learner() override { return _cached; }
+std::shared_ptr<VW::LEARNER::learner> setup_base_learner(size_t) override { return _cached; }
Member:
We need to assert that _cached matches the request

Collaborator Author:
cached_learner currently doesn't exercise this API, so it's a moot point.
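For reference, a minimal sketch of the check being discussed, assuming cached_learner kept track of the feature width it was built for; the stub learner class, the member _cached_feature_width, and the main() driver are hypothetical:

#include <cassert>
#include <cstddef>
#include <memory>

namespace VW { namespace LEARNER { class learner; } }  // stub for the sketch

class cached_learner_sketch
{
public:
  // The requested check: the cached learner must match the feature width the
  // caller asked for, otherwise handing back _cached would be wrong.
  std::shared_ptr<VW::LEARNER::learner> setup_base_learner(size_t feature_width)
  {
    assert(feature_width == _cached_feature_width);
    return _cached;
  }

private:
  std::shared_ptr<VW::LEARNER::learner> _cached;
  size_t _cached_feature_width = 1;
};

int main()
{
  cached_learner_sketch c;
  c.setup_base_learner(1);  // matches the width the cache was built for
  return 0;
}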

@@ -1503,6 +1561,8 @@ std::shared_ptr<VW::LEARNER::learner> VW::reductions::gd_setup(VW::setup_base_i&

all.weights.stride_shift(static_cast<uint32_t>(::ceil_log_2(stride - 1)));

+resize_ppw_state(*g, ppw);
Collaborator Author:
Make this logic part of the constructor at line 1428; now that we have that info there, there's no need to resize.
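A sketch of that follow-up refactor, which the commit log below shows landed later as #4526 ("refactor: remove resize in gd setup"); the gd_sketch type and its constructor argument are hypothetical:

#include <cstddef>
#include <vector>

struct gd_state
{
  double normalized_sum_norm_x = 0.0;
  double total_weight = 0.0;
};

// Hypothetical: size the per-model vector once at construction, since ppw is
// already known by this point in gd_setup, so the post-hoc
// resize_ppw_state(*g, ppw) call becomes unnecessary.
struct gd_sketch
{
  explicit gd_sketch(size_t ppw) : per_model_state(ppw) {}
  std::vector<gd_state> per_model_state;
};

int main()
{
  gd_sketch g(/*ppw=*/4);  // sized once, no later resize
  return 0;
}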

@lalo (Collaborator Author) commented Feb 27, 2023

LGTM

@bassmang merged commit e1a9363 into master Feb 27, 2023
@bassmang deleted the persist-ppm branch February 27, 2023 21:35
lalo added a commit that referenced this pull request Mar 20, 2023
commit adcaff2
Author: Marek Wydmuch <marek@wydmuch.poznan.pl>
Date:   Mon Mar 20 15:30:30 2023 +0100

    feat: add a training loss calculation to the predict method of PLT reduction (#4534)

    * add a training loss calculation to the predict method of PLT reduction

    * update PLT demo

    * update the tests for PLT reduction

    * disable the calculation of additional evaluation measures in PLT reduction when true labels are not available

    * apply black formatting to plt_demo.py

    * remove unnecessary reset of weighted_holdout_examples variable in PLT reduction

    * revert the change of the path to the exe in plt_demo.py

    * apply black formatting again to plt_demo.py

    ---------

    Co-authored-by: Jack Gerrits <jackgerrits@users.noreply.github.com>

commit f7a197e
Author: Griffin Bassman <griffinbassman@gmail.com>
Date:   Fri Mar 17 11:01:17 2023 -0400

    refactor: separate cb_to_cs_adf_mtr and cb_to_cs_adf_dr (#4532)

    * refactor: separate cb_to_cs_adf_mtr and cb_to_cs_adf_dr

    * clang

    * unused

    * remove mtr

commit e5597ae
Author: swaptr <83858160+swaptr@users.noreply.github.com>
Date:   Fri Mar 17 02:21:50 2023 +0530

    fix: fix multiline typo (#4533)

commit 301800a
Author: Eduardo Salinas <edus@microsoft.com>
Date:   Wed Mar 15 12:25:55 2023 -0400

    test: [automl] improve runtest and test changes (#4531)

commit 258731c
Author: Griffin Bassman <griffinbassman@gmail.com>
Date:   Tue Mar 14 13:28:03 2023 -0400

    chore: Update Version to 9.5.0 (#4529)

commit 49131be
Author: Eduardo Salinas <edus@microsoft.com>
Date:   Tue Mar 14 11:03:24 2023 -0400

    fix: [automl] avoid ccb pulling in generate_interactions (#4524)

    * fix: [automl] avoid ccb pulling in generate_interactions

    * same features in one line minimal repro

    * add assert of reserve size

    * update test file

    * remove include and add comment

    * temp print

    * sorting interactions matters

    * update temp print

    * fix by accounting for slot ns

    * remove prints

    * change comment and remove commented code

    * add sort to test

    * update runtests

    * Squashed commit of the following:

    commit 322a2b1
    Author: Eduardo Salinas <edus@microsoft.com>
    Date:   Mon Mar 13 21:51:49 2023 +0000

        possibly overwrite vw brought in by vw-executor

    commit 0a6baa0
    Author: Eduardo Salinas <edus@microsoft.com>
    Date:   Mon Mar 13 21:25:46 2023 +0000

        add check for metrics

    commit 469cebe
    Author: Eduardo Salinas <edus@microsoft.com>
    Date:   Mon Mar 13 21:22:38 2023 +0000

        update test

    commit 7c0b212
    Author: Eduardo Salinas <edus@microsoft.com>
    Date:   Mon Mar 13 21:11:45 2023 +0000

        format and add handler none

    commit 533e067
    Author: Eduardo Salinas <edus@microsoft.com>
    Date:   Mon Mar 13 20:56:07 2023 +0000

        test: [automl] add ccb test that checks for ft names

    * update python test

    * Update automl_oracle.cc

commit 37f4b19
Author: Griffin Bassman <griffinbassman@gmail.com>
Date:   Fri Mar 10 17:38:02 2023 -0500

    refactor: remove resize in gd setup (#4526)

    * refactor: remove resize in gd setup

    * rm resize

commit 009831b
Author: Griffin Bassman <griffinbassman@gmail.com>
Date:   Fri Mar 10 16:57:53 2023 -0500

    fix: multi-model state for cb_adf (#4513)

    * switch to vector

    * fix aml and ep_dec

    * clang

    * reorder

    * clang

    * reorder

commit a31ef14
Author: Griffin Bassman <griffinbassman@gmail.com>
Date:   Fri Mar 10 14:52:50 2023 -0500

    refactor: rename wpp, ppw, ws, params_per_problem, problem_multiplier, num_learners, increment -> feature_width (#4521)

    * refactor: rename wpp, ppw, ws, params_per_problem, problem_multiplier, num_learners, increment -> interleaves

    * clang

    * clang

    * settings

    * make bottom interleaves the same

    * remove bottom_interleaves

    * fix test

    * feature width

    * clang

commit 8390f48
Author: Griffin Bassman <griffinbassman@gmail.com>
Date:   Fri Mar 10 12:25:12 2023 -0500

    refactor: dedup dict const (#4525)

    * refactor: dedup dict const

    * clang

commit 2238d70
Author: Jack Gerrits <jackgerrits@users.noreply.github.com>
Date:   Thu Mar 9 13:51:35 2023 -0500

    refactor: add api to set data object associated with learner (#4523)

    * refactor: add api to set data object associated with learner

    * add shared ptr func

commit b622540
Author: Griffin Bassman <griffinbassman@gmail.com>
Date:   Tue Mar 7 12:16:39 2023 -0500

    fix: cbzo ppw fix (#4519)

commit f83cb7f
Author: Jack Gerrits <jackgerrits@users.noreply.github.com>
Date:   Tue Mar 7 11:21:51 2023 -0500

    refactor: automatically set label parser after stack created (#4471)

    * refactor: automatically set label parser after stack created

    * a couple of fixes

    * Put in hack to keep search working

    * formatting

commit 64e5920
Author: olgavrou <olgavrou@gmail.com>
Date:   Fri Mar 3 16:20:05 2023 -0500

    feat: [LAS] with CCB (#4520)

commit 69bf346
Author: Jack Gerrits <jackgerrits@users.noreply.github.com>
Date:   Fri Mar 3 15:17:29 2023 -0500

    refactor: make flat_example an implementation detail of ksvm (#4505)

    * refactor!: make flat_example an implementation detail of ksvm

    * Update memory_tree.cc

    * Absorb flat_example into svm_example

    * revert "Absorb flat_example into svm_example"

    This reverts commit b063feb.

commit f08f1ec
Author: Jack Gerrits <jackgerrits@users.noreply.github.com>
Date:   Fri Mar 3 14:04:48 2023 -0500

    test: fix pytype issue in test runner and utl (#4517)

    * test: fix pytype issue in test runner

    * fix version_number.py type checker issues

commit a8b1d91
Author: Eduardo Salinas <edus@microsoft.com>
Date:   Fri Mar 3 12:59:26 2023 -0500

    fix: [epsdecay] return champ prediction always (#4518)

commit b2276c1
Author: olgavrou <olgavrou@gmail.com>
Date:   Thu Mar 2 20:18:23 2023 -0500

    chore: [LAS] don't force mtr with LAS (#4516)

commit c0ba180
Author: olgavrou <olgavrou@gmail.com>
Date:   Tue Feb 28 11:27:36 2023 -0500

    feat: [LAS] add example ft hash and cache and re-use rows of matrix if actions do not change (#4509)

commit e1a9363
Author: Eduardo Salinas <edus@microsoft.com>
Date:   Mon Feb 27 16:35:09 2023 -0500

    feat: [gd] persist ppw extra state (#4023)

    * feat: [gd] persist ppm state

    * introduce resize_ppw_state

    * wip: move logic down to gd, respect incoming ft_offset

    * replace assert with status quo behaviour

    * implement writing/reading to modelfile

    * remove from predict

    * update test 351 and 411

    * update sensitivity and update

    * remove debug prints

    * update all tests

    * apply fix of other pr

    * use .at() for bounds checking

    * add max_ft_offset and add asserts

    * comment extra assert that is failing

    * remove files

    * fix automl tests

    * more tests

    * tests

    * tests

    * clang

    * fix for predict_only_model automl

    * comment

    * fix ppm printing

    * temporarily remove tests 50 and 68

    * address comments

    * expand width for search

    * fix tests

    * merge

    * revert cb_adf

    * merge

    * fix learner

    * clang

    * search fix

    * clang

    * fix unit tests

    * bump 9.7.1 for version CIs

    * revert to 9.7.0

    * stop search from learning out of bounds

    * expand search num_learners

    * fix search cs test

    * comment

    * revert ext_libs

    * clang

    * comment out saveresume tests

    * pylint

    * comment

    * fix with search

    * fix search

    * clang

    * unused

    * unused

    * comments

    * fix scope_exit

    * fix cs test

    * revert automl test update

    * remove resize

    * clang

    ---------

    Co-authored-by: Griffin Bassman <griffinbassman@gmail.com>