ADF Hello World example #1233

Closed
tilayealemu opened this issue Apr 27, 2017 · 6 comments

Comments

@tilayealemu

Featurized actions [1] can be very useful when the number of actions is dynamic, but the documentation is a bit unclear. I tried to build a very simple hello-world example to see it work.

Let's say we have two features, orange and apple. I designed an example where action 0 would be the best action when orange is observed, and action 1 when apple is observed.

action  feature  cost
0       orange   low
0       apple    high
1       orange   high
1       apple    low

Here is the training data in VW format. I encoded low cost as 0 and high cost as 1.

0:0:0.5 | orange
| orange

0:1:0.5 | apple
| apple

| orange
0:1:0.5 | orange

| apple
0:0:0.5 | apple

Test data for when the apple feature is observed:

0:0:0 | apple
| apple

Test data for when the orange feature is observed:

0:0:0 | orange
| orange

I trained with the --cb_adf option. The Java source code is attached as well.

The problem is that I get action 0 as the predicted action in both test cases. I tried giving it more training data by duplicating the examples, and the result is the same.

Why doesn't vw predict action 1 for apple? All pointers appreciated!

[1] https://github.com/JohnLangford/vowpal_wabbit/wiki/Contextual-Bandit-algorithms

AdfHelloWorld.zip

@tilayealemu tilayealemu changed the title Adf Hello World example ADF Hello World example Apr 27, 2017
@JohnLangford
Member

JohnLangford commented Apr 28, 2017 via email

@tilayealemu
Author

Correct. The features are the same. In my scenario, 1) all actions have the same features and 2) the number of actions can change over time. Is this achievable?

Imagine for example a learner system that guesses the type of fruit in a picture. All actions will see the same features. And at times, we may want to add new actions. So if the system sees something that looks like an orange, it would give us actions 0,1. And later on if we add a third action for mandarin, it will explore the new action for some time and then correctly predict 0,2,1 when it is presented with an orange-like feature.
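For readers trying the --cb_adf route, here is a minimal sketch of one way the data could be laid out so that actions are distinguishable: the observation goes on a `shared` line and each action line carries its own identifying feature. The helper name `format_cb_adf_example`, the namespace letters `C` and `A`, and the feature names `action_0`, `action_1` are all assumptions made for illustration; none of them come from the attached Java code.

```python
from typing import Optional, Sequence, Tuple


def format_cb_adf_example(
    context: Sequence[str],
    actions: Sequence[str],
    chosen: Optional[Tuple[int, float, float]] = None,
) -> str:
    """Build one multi-line --cb_adf example as text.

    context: observed features (e.g. ["orange"]), written on a `shared` line.
    actions: one identifying feature per currently available action
             (hypothetical names such as "action_0").
    chosen:  (index into actions, cost, probability) of the logged action,
             or None for an unlabeled example used at prediction time.
    """
    # Namespace C (context) holds the observation shared by all actions.
    lines = ["shared |C " + " ".join(context)]
    for i, name in enumerate(actions):
        label = ""
        if chosen is not None and chosen[0] == i:
            _, cost, prob = chosen
            # The leading action index is kept as 0, as in the data above.
            label = f"0:{cost:g}:{prob:g} "
        # Namespace A (action) makes each action line different.
        lines.append(f"{label}|A {name}")
    # A blank line terminates a multi-line example.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(format_cb_adf_example(["orange"], ["action_0", "action_1"], (0, 0, 0.5)))
    print(format_cb_adf_example(["apple"], ["action_0", "action_1"], (1, 0, 0.5)))
```

With data in this shape, training with something like `vw --cb_adf -q CA` would let the model learn context-by-action interactions, and a new action would simply be one more `|A ...` line. Whether this fits the fruit scenario exactly is untested here.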

@JohnLangford
Member

JohnLangford commented May 4, 2017 via email

@tilayealemu
Author

tilayealemu commented May 5, 2017

Am I missing an obvious VW approach to this, even one that doesn't use --cb_adf?

I have tried --cb_explore too. However, the probabilities seem to be divided equally among all actions except the best one. For example, when predicting for 4 actions, if the best action comes out with a probability of 0.9625, then the other three each get 0.0125. So I couldn't use it to rank the actions.
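The probabilities quoted above match what an epsilon-greedy scheme would produce, which is why they carry no ranking information beyond "best versus the rest". The sketch below reproduces the arithmetic; the value epsilon = 0.05 is an assumption chosen because it yields exactly 0.9625 and 0.0125 for 4 actions, and the function name is hypothetical.

```python
from typing import List


def epsilon_greedy_pmf(num_actions: int, best_action: int, epsilon: float = 0.05) -> List[float]:
    """Return the epsilon-greedy distribution over actions.

    Every action receives epsilon / num_actions; the greedy (best) action
    additionally receives the remaining 1 - epsilon of the mass.
    """
    base = epsilon / num_actions
    pmf = [base] * num_actions
    pmf[best_action] += 1.0 - epsilon
    return pmf


if __name__ == "__main__":
    # Four actions, action 0 is the greedy choice.
    print(epsilon_greedy_pmf(4, 0))
```

Because all non-greedy actions share the same probability by construction, a ranking has to come from per-action scores or costs rather than from this distribution.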

If all else fails, it would be great to get pointers on the development work needed to make this happen. I can have a go at it.

Thanks again.

tilayealemu pushed a commit to tilayealemu/vowpal_wabbit that referenced this issue May 15, 2017
@tilayealemu
Author

@JohnLangford I was able to achieve this using the raw predictions from csoaa. I had to update the Java wrapper to expose raw predictions. Would it be possible to review pull request #1244, please? My commit is based on work by @mttdbrd, as discussed under #1118.

For others following this ticket, here is a CLI example:

echo "
1:0.0 | orange
2:1.0 | orange
1:1.0 | apple
2:0.0 | apple
" > train.vw

echo "
1 2 | orange
1 2 | apple
" > test.vw

vw -d train.vw --csoaa 2 -f model
vw -d test.vw -t -i model -r raw_predictions

cat raw_predictions
1:0.253423 2:0.465651
1:0.506846 2:0.0445859
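To turn those raw predictions into a ranking, the per-class values can be sorted in ascending order, since csoaa predicts a cost per class and lower is better. This is a minimal sketch that assumes each line looks exactly like the output above (space-separated `class:score` pairs with nothing else on the line); `rank_actions` is a hypothetical helper, not part of VW or its Java wrapper.

```python
from typing import List


def rank_actions(raw_line: str) -> List[int]:
    """Rank classes from lowest to highest predicted cost.

    raw_line: one line of the raw predictions file,
              e.g. "1:0.253423 2:0.465651".
    """
    pairs = []
    for token in raw_line.split():
        action, score = token.split(":")
        pairs.append((int(action), float(score)))
    # Lower predicted cost means a better action, so sort ascending by score.
    pairs.sort(key=lambda pair: pair[1])
    return [action for action, _ in pairs]


if __name__ == "__main__":
    for line in ["1:0.253423 2:0.465651", "1:0.506846 2:0.0445859"]:
        print(rank_actions(line))
```

For the two lines shown above, this puts class 1 first for orange and class 2 first for apple, which matches the costs in the training data.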

@JohnLangford
Member

JohnLangford commented May 17, 2017 via email
