ADF Hello World example #1233
You seem to have 2-action examples (i.e., two lines), and the features of these actions are the same. Hence, the internal predictions for each example will be the same, and ties break towards the first action.
-John
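John's tie-breaking point can be checked with a toy sketch (plain Python, not VW internals): when every action gets the same predicted cost, an argmin over the scores always returns the lowest index, i.e., the first action.

```python
# When all actions share identical features, the learner assigns each
# action the same predicted cost, so argmin always picks the first one.
# Toy illustration only -- this is not VW's actual implementation.
def pick_action(scores):
    """Return the index of the lowest predicted cost; ties go to the first."""
    return min(range(len(scores)), key=lambda i: scores[i])

print(pick_action([0.37, 0.37]))  # 0 -- identical scores, first action wins
print(pick_action([0.5, 0.2]))   # 1 -- distinct scores, true argmin wins
```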
On Thu, Apr 27, 2017 at 1:33 PM, Tilaye Yismaw Alemu wrote:
Featurized actions [1] can be very useful for dynamic number of actions
but documentation is a bit unclear. Tried to make a very simple hello-world
example to see it work.
Let's say we have two features, orange and apple. I designed an example
where action 0 would be the best action when orange is observed, and
action 1 when apple is observed.
action feature cost
0 orange low
0 apple high
1 orange high
1 apple low
Here is the vw formatted training data. I encoded low cost as 0 and high cost as 1.

0:0:0.5 | orange
| orange

0:1:0.5 | apple
| apple

| orange
0:1:0.5 | orange

| apple
0:0:0.5 | apple
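For reference, the optional label before the first `|` on a `--cb_adf` line is `action:cost:probability`, and lines without a label are unobserved actions. A minimal parser sketch in Python (a hypothetical helper, not part of VW) makes the format concrete:

```python
# Parse one line of VW --cb_adf input. The optional label before the
# first '|' is "action:cost:probability"; an unlabeled line describes
# an action that was not chosen. Hypothetical helper, not part of VW.
def parse_cb_adf_line(line):
    label_part, _, feature_part = line.partition('|')
    label_part = label_part.strip()
    label = None
    if label_part:
        action, cost, prob = label_part.split(':')
        label = (int(action), float(cost), float(prob))
    return label, feature_part.split()

print(parse_cb_adf_line("0:0:0.5 | orange"))  # ((0, 0.0, 0.5), ['orange'])
print(parse_cb_adf_line("| apple"))           # (None, ['apple'])
```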
Test data for when the apple feature is observed:

0:0:0 | apple
| apple

Test for orange:

0:0:0 | orange
| orange
I trained with the --cb_adf option. Java source code attached as well.
Problem is, I get action 0 as the predicted action in both test cases.
Tried giving it more training data by duplicating the examples and result
is the same.
Why doesn't vw predict action 1 for orange? All pointers appreciated!
[1] https://github.com/JohnLangford/vowpal_wabbit/wiki/Contextual-Bandit-algorithms
AdfHelloWorld.zip
<https://github.com/JohnLangford/vowpal_wabbit/files/962386/AdfHelloWorld.zip>
Correct. The features are the same. In my scenario, (1) all actions have the same features, and (2) the number of actions can change through time. Is this achievable?
Imagine for example a learner system that guesses the type of fruit in a picture. All actions will see the same features. And at times, we may want to add new actions. So if the system sees something that looks like an orange, it would give us actions 0,1. And later on, if we add a third action for mandarin, it will explore the new action for some time and then correctly predict 0,2,1 when presented with an orange-like feature.
Actions must be distinct at the feature level for VW learning algorithms to
do something. Stated another way, there is no implicit identifier for an
action (maybe there should be?).
-John
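A sketch of this suggestion in the vw data format, assuming made-up identifier features a0/a1/a2: each action line carries its own id feature alongside the shared context, so the actions have distinct features and can receive distinct scores, and a new mandarin action can be introduced later simply by adding an a2 line for VW to explore.

```text
0:0:0.5 | a0 orange
| a1 orange

| a0 orange
0:1:0.5 | a1 orange

| a0 orange
| a1 orange
| a2 orange
```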
Am I missing an obvious VW approach to this, even one that doesn't use …? I have tried … If all fails, it would be great to get pointers on the development work needed to make this happen. I can have a go at it. Thanks again.
@JohnLangford I was able to achieve this using the raw predictions from csoaa. I had to update the Java wrapper to expose raw predictions. Would it be possible to review pull #1244 please? My commit is based on work by @mttdbrd as discussed under #1118. For others following this ticket, here is a CLI example:

echo "
1:0.0 | orange
2:1.0 | orange
1:1.0 | apple
2:0.0 | apple
" > train.vw

echo "
1 2 | orange
1 2 | apple
" > test.vw

vw -d train.vw --csoaa 2 -f model
vw -d test.vw -t -i model -r raw_predictions
cat raw_predictions

Output:

1:0.253423 2:0.465651
1:0.506846 2:0.0445859

W.r.t. exploration, you can change the exploration algorithm when using --cb_explore; there are a variety of algorithms available.
W.r.t. obvious ways to do things, I'd add some feature which is an action identifier so that each action can have a different score.
I expect to get to the pull request next week (sick right now + NIPS deadline...).
-John
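The raw_predictions lines above hold per-class scores (cost estimates), so the predicted class is the one with the smallest score. A small Python post-processing sketch (hypothetical, not part of VW) applied to the two lines shown:

```python
# Each csoaa raw_predictions line is "class:score class:score ...",
# where the score estimates that class's cost; the prediction is the
# class with the smallest score. Hypothetical post-processing sketch.
def predict_from_raw(line):
    scores = {}
    for tok in line.split():
        cls, score = tok.split(':')
        scores[int(cls)] = float(score)
    return min(scores, key=scores.get)

print(predict_from_raw("1:0.253423 2:0.465651"))   # 1 (orange -> action 1)
print(predict_from_raw("1:0.506846 2:0.0445859"))  # 2 (apple  -> action 2)
```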