Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose raw predictions in java wrapper (#1118 #1233) #1244

Closed
wants to merge 4 commits into from

Conversation

tilayealemu
Copy link

No description provided.

@JohnLangford
Copy link
Member

Sorry, can you resolve conflicts on this?

We're dealing with a bit of a mess because I had to back out the concurrent java interface patch.

@coveralls
Copy link

Coverage Status

Coverage remained the same at 69.712% when pulling b209176 on tilayealemu:master into 3c21bc2 on JohnLangford:master.

@tilayealemu
Copy link
Author

@JohnLangford Resolved.

@JohnLangford
Copy link
Member

@jmorra Can you take a look at this? It's short.

Copy link
Contributor

@jon-morra-zefr jon-morra-zefr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a great use case and something I might need myself. I'm excited to see this go in.

jfloatArray j_labels = env->NewFloatArray(num_values);
for (int i=0 ; i<num_values; i++) {
jfloat f[] = { vec->l.cs.costs[i].partial_prediction };
env->SetFloatArrayRegion(j_labels, i, 1, (float*)f);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be a little lost here, but how are you sure that the length of f is 1?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expected a double here instead of an array because there should be one value. I ran tests to verify array sizes are all 1 in this case. My understanding is there could be cases where array sizes can be >1 (adf?). I can investigate further if we decide to go with the PR.

*
* @param example a single vw example string
* @return Raw prediction
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this code should go here. From what I understand about VW, the action of getting a raw prediction is applicable to a wider variety of cases than just multi class prediction. @JohnLangford can you comment on this. If this is the case then we'd want to put this code higher up in the chain and allow more learners to take advantage of it. If not, then this seems fine. @deaktator what do you think?

@JohnLangford
Copy link
Member

I'm somewhat uncomfortable about the patch because it's using some prediction information squirreled away in the label rather than the prediction data. This was a hack Hal introduced some time ago which is certainly useful, but since then we've update our prediction types. https://github.com/JohnLangford/vowpal_wabbit/blob/master/vowpalwabbit/example.h#L44
The existing code already exposes several prediction types:
https://github.com/JohnLangford/vowpal_wabbit/blob/master/vowpalwabbit/csoaa.cc#L808
Do any of these work for what you are trying to do?

@jmorra
Copy link
Collaborator

jmorra commented Jun 16, 2017

I wonder if you can use VWProbLearner to get the probabilities from a csoaa model? If so then maybe we don't need this PR. Is this possible?

@JohnLangford
Copy link
Member

With the right flags, yes. The action-score representation is also available.

@tilayealemu are you using --csoaa or --csoaa_adf?

@tilayealemu
Copy link
Author

@JohnLangford @jmorra

I'm using --csoaa.

My goal is to rank all actions from best to worst. I have tried both scores and probabilities. Neither of them worked (discussed under #1233). Raw predictions were understandable and usable. Others have raised raw predictions as well [1, 2]. But if there is a way to get raw without this PR then I'm all for that.

[1] https://groups.yahoo.com/neo/groups/vowpal_wabbit/conversations/topics/5958
[2] #1082

@jmorra jmorra mentioned this pull request Jun 22, 2017
@jon-morra-zefr
Copy link
Contributor

After looking at this for a while now I think the more appropriate change to make is to add the ability for the CSOAA learner to return action scores. On the LDF front there exists a flag called --csoaa_rank that (now) correctly returns the action scores and it works with the JNI layer. I see no reason why this same functionality cannot be ported over to the base csoaa learner. I don't have the expertise to do this. @JohnLangford do you think you can take a stab at this? If this is done this PR is not necessary and no changes to the JNI layer are needed as the JNI layer will return the appropriate type (ActionScores) when the --csoaa_rank flag is set for traditional CSOAA learning.

@JohnLangford
Copy link
Member

I think Jon Morra is right here---it's better (and more uniform) to have the system --csoaa have the same return types as --csoaa_ldf. This implies some c++ coding. Are you up for that?

@jon-morra-zefr
Copy link
Contributor

@JohnLangford this is out of my area of expertise. I've been struggling with it for a number of hours now and I'm not really sure how to do this. Is there any chance you could take a stab at it?

@tilayealemu
Copy link
Author

Saw the c++ code too, and it was beyond me. Happy to help if we decide to update any of the java code though. I know it better now.

@JohnLangford
Copy link
Member

JohnLangford commented Jun 28, 2017 via email

@jon-morra-zefr
Copy link
Contributor

@JohnLangford I might need this feature pretty soon, I was wondering if you had any ETA on when you might be able to look at this.

@JohnLangford
Copy link
Member

I'm working towards a release before ICML. Hopefully something addressing this can go in.

@coveralls
Copy link

Coverage Status

Coverage remained the same at 69.742% when pulling 9091d9e on tilayealemu:master into 2c28eb2 on JohnLangford:master.

@JohnLangford
Copy link
Member

I've decided to not deal with this right now, although a pull request remains welcome.
(1) There is a refactoring brewing which will deal with this in the future. In essence, I want to refactor things so multiline examples are first class within VW and then refactor csoaa (amongst many others) so we do not have different versions for multline and singleline representations.
(2) You can already get the desired functionality through csoaa_ldf. It's slightly more curmudgeonly since you must use a multiline example but that should work fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants