Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: [RLOS_2023] test for contextual bandit #4612

Merged
merged 5 commits into from
Jun 20, 2023

Conversation

michiboo
Copy link
Collaborator

@michiboo michiboo commented Jun 8, 2023

No description provided.

python/tests/pytest.json Outdated Show resolved Hide resolved
python/tests/test_cb.json Outdated Show resolved Hide resolved

CURR_DICT = os.path.dirname(os.path.abspath(__file__))

def combine_list_cmds_grids(cmds, base_grid):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like a lot of it is already implemented as add/mul operators of Grid and can be reused

@michiboo michiboo changed the base branch from master to rlos2023/test June 9, 2023 15:25
@@ -0,0 +1,7 @@
def constant_probability(chosen_action, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems to be even_probability(1)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the fact that this function both can be represented as even_probability and not using any of its arguments is confuing.
Can we either remove it or make it with no arguments?

return reward[chosen_action - 1]


def fixed_reward_for_diff_context(**kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name seems to be misleading

python/tests/core.py Outdated Show resolved Hide resolved
python/tests/core.py Outdated Show resolved Hide resolved
return 1


def even_probability(chosen_action, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to use kwargs here but not regular named arguments?

return 1


def constant_reward(**kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kwargs seem to be replaceable with regular arguments

"name": "fixed_reward",
"params": {}
},
"probability_function": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to rename to "logging_policy"

@michiboo michiboo force-pushed the py_cb_test branch 3 times, most recently from b17b944 to bab17ed Compare June 18, 2023 13:58
return 1


def constant_reward(**kwargs):
Copy link
Member

@ataymano ataymano Jun 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def constant_reward(**kwargs):
def constant_reward(chosen_action: int, reward: float):

@@ -0,0 +1,22 @@
def fixed_reward(**kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do fixed signature here:
def reward_func(context: int, chosen_action: int)

def fixed_reward_two_action(**kwargs):
chosen_action = kwargs["chosen_action"]
chosen_context = kwargs["chosen_context"]
if chosen_context == 1 and chosen_action == 2:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to expose this logic to config file somehow:

  1. either via more descriptive function name
  2. or via extra parameter (like reward matrix)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but let's do it in next Pr

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please replace chosen_context with context? - agent is not choosing it

@michiboo michiboo force-pushed the py_cb_test branch 2 times, most recently from c4e4079 to cd1b870 Compare June 20, 2023 14:10
@michiboo michiboo changed the title test: [RLOS_2023][WIP] test for contextual bandit test: [RLOS_2023] test for contextual bandit Jun 20, 2023
@ataymano ataymano merged commit d5f3c96 into VowpalWabbit:rlos2023/test Jun 20, 2023
ataymano added a commit that referenced this pull request Oct 5, 2023
* intro notebook

* test: [RLOS_2023][WIP] updated test for regression weight  (#4600)

* test: add test for regression weight

* test: make test more reusable by using json to specify pytest

* test: minor fix on naming

* test: add and option to python json test

* test: [RLOS_2023]  test for contextual bandit (#4612)

* test: add basic cb test and configuration

* test: add shared context data generation

* add test for cb_explore_adf

* test: dynamically create pytest test case

* test: give fixed reward function signature

* test: [RLOS_2023] [WIP] Support + and * expression for grids (#4618)

* test: add basic cb test and configuration

* test: add shared context data generation

* add test for cb_explore_adf

* test: dynamically create pytest test case

* test: give fixed reward function signature

* test: support + and * expression for grids

* fix empty expression bugs

* test: [RLOS2023] [WIP] add more arguments for reg&cb tests (#4619)

* test: add more arguments for reg&cb tests

* test: fix minor bug in generate expression & add loss funcs to tests

* test: [RLOS2023] [WIP] add classification test (#4623)

* test: add more arguments for reg&cb tests

* test: fix minor bug in generate expression & add loss funcs to tests

* test: add test for classification

* test: organize test framework structure (#4624)

* test: [RLOS2023][WIP] add option for storing output and grid language redefinition (#4627)

* test: redesign grid lang

* test: add option for store output

* test: change list to dict for config vars

* test: [RLOS2023] add test for slate (#4629)

* test: add test for slate

* test: test cleanup and slate test update

* test: minor cleanup and change assert_loss function to equal instead of lower

* test: [RLOS2023] add test for cb with continous action  (#4630)

* test: add test for slate

* test: test cleanup and slate test update

* test: minor cleanup and change assert_loss function to equal instead of lower

* test: add test for cb with continous action

* modify blocker testcase

* test: [RLOS2023] clean for e2e testing framework v2 (#4633)

* test: clean for e2e test v2

* test:change seed to same value for all tests

* test: add datagen driver (#4638)

* python black

* python black 2

* minor demo cleanup

---------

Co-authored-by: Alexey Taymanov <ataymano@microsoft.com>
Co-authored-by: Alexey Taymanov <41013086+ataymano@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants