ADF Hello World example #1233
You seem to have 2-action examples (i.e., two lines), and the features of these actions are the same. Hence, the internal predictions for each example will be the same, and ties break towards the first action.
-John
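John's tie-breaking point can be checked with a toy sketch (plain Python, not VW internals): when every action gets the same predicted cost, an argmin over the scores always returns the lowest index, i.e., the first action.

```python
# When all actions share identical features, the learner assigns each
# action the same predicted cost, so argmin always picks the first one.
# Toy illustration only -- this is not VW's actual implementation.
def pick_action(scores):
    """Return the index of the lowest predicted cost; ties go to the first."""
    return min(range(len(scores)), key=lambda i: scores[i])

print(pick_action([0.37, 0.37]))  # 0 -- identical scores, first action wins
print(pick_action([0.5, 0.2]))   # 1 -- distinct scores, true argmin wins
```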
On Thu, Apr 27, 2017 at 1:33 PM, Tilaye Yismaw Alemu wrote:
Featurized actions [1] can be very useful for dynamic number of actions
but documentation is a bit unclear. Tried to make a very simple hello-world
example to see it work.
Let's say we have two features, orange and apple. I designed an example
where action 0 would be the best action when orange is observed, and
action 1 when apple is observed.
action feature cost
0 orange low
0 apple high
1 orange high
1 apple low
Here is the vw formatted training data. I encoded low cost as 0 and high cost as 1.

0:0:0.5 | orange
| orange

0:1:0.5 | apple
| apple

| orange
0:1:0.5 | orange

| apple
0:0:0.5 | apple
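For reference, the optional label before the first `|` on a `--cb_adf` line is `action:cost:probability`, and lines without a label are unobserved actions. A minimal parser sketch in Python (a hypothetical helper, not part of VW) makes the format concrete:

```python
# Parse one line of VW --cb_adf input. The optional label before the
# first '|' is "action:cost:probability"; an unlabeled line describes
# an action that was not chosen. Hypothetical helper, not part of VW.
def parse_cb_adf_line(line):
    label_part, _, feature_part = line.partition('|')
    label_part = label_part.strip()
    label = None
    if label_part:
        action, cost, prob = label_part.split(':')
        label = (int(action), float(cost), float(prob))
    return label, feature_part.split()

print(parse_cb_adf_line("0:0:0.5 | orange"))  # ((0, 0.0, 0.5), ['orange'])
print(parse_cb_adf_line("| apple"))           # (None, ['apple'])
```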
Test data for when the apple feature is observed:

0:0:0 | apple
| apple

Test for orange:

0:0:0 | orange
| orange
I trained with the --cb_adf option. Java source code attached as well.
Problem is, I get action 0 as the predicted action in both test cases.
Tried giving it more training data by duplicating the examples and result
is the same.
Why doesn't vw predict action 1 for orange? All pointers appreciated!
[1] https://github.com/JohnLangford/vowpal_wabbit/wiki/Contextual-Bandit-algorithms
AdfHelloWorld.zip
<https://github.com/JohnLangford/vowpal_wabbit/files/962386/AdfHelloWorld.zip>
Correct. The features are the same. In my scenario, (1) all actions have the same features, and (2) the number of actions can change through time. Is this achievable?
Imagine for example a learner system that guesses the type of fruit in a picture. All actions will see the same features. And at times, we may want to add new actions. So if the system sees something that looks like an orange, it would give us actions 0,1. And later on, if we add a third action for mandarin, it will explore the new action for some time and then correctly predict 0,2,1 when presented with an orange-like feature.
Actions must be distinct at the feature level for VW learning algorithms to
do something. Stated another way, there is no implicit identifier for an
action (maybe there should be?).
-John
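A sketch of this suggestion in the vw data format, assuming made-up identifier features a0/a1/a2: each action line carries its own id feature alongside the shared context, so the actions have distinct features and can receive distinct scores, and a new mandarin action can be introduced later simply by adding an a2 line for VW to explore.

```text
0:0:0.5 | a0 orange
| a1 orange

| a0 orange
0:1:0.5 | a1 orange

| a0 orange
| a1 orange
| a2 orange
```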
Am I missing an obvious VW approach to this, even one that doesn't use …? I have tried … If all fails, it would be great to get pointers on the development work needed to make this happen. I can have a go at it. Thanks again.
@JohnLangford I was able to achieve this using the raw predictions from csoaa. I had to update the Java wrapper to expose raw predictions. Would it be possible to review pull #1244 please? My commit is based on work by @mttdbrd as discussed under #1118. For others following this ticket, here is a CLI example:

echo "
1:0.0 | orange
2:1.0 | orange
1:1.0 | apple
2:0.0 | apple
" > train.vw

echo "
1 2 | orange
1 2 | apple
" > test.vw

vw -d train.vw --csoaa 2 -f model
vw -d test.vw -t -i model -r raw_predictions
cat raw_predictions

Output:

1:0.253423 2:0.465651
1:0.506846 2:0.0445859

W.r.t. exploration, you can change the exploration algorithm when using --cb_explore; there are a variety of algorithms available.
W.r.t. obvious ways to do things, I'd add some feature which is an action identifier so that each action can have a different score.
I expect to get to the pull request next week (sick right now + NIPS deadline...).
-John
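The raw_predictions lines above hold per-class scores (cost estimates), so the predicted class is the one with the smallest score. A small Python post-processing sketch (hypothetical, not part of VW) applied to the two lines shown:

```python
# Each csoaa raw_predictions line is "class:score class:score ...",
# where the score estimates that class's cost; the prediction is the
# class with the smallest score. Hypothetical post-processing sketch.
def predict_from_raw(line):
    scores = {}
    for tok in line.split():
        cls, score = tok.split(':')
        scores[int(cls)] = float(score)
    return min(scores, key=scores.get)

print(predict_from_raw("1:0.253423 2:0.465651"))   # 1 (orange -> action 1)
print(predict_from_raw("1:0.506846 2:0.0445859"))  # 2 (apple  -> action 2)
```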