-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert TopK reduction to be multiline example based #1752
Convert TopK reduction to be multiline example based #1752
Conversation
weighted label sum = 15.000000 | ||
average loss = 0.000002 | ||
best constant = 1.666667 | ||
total feature number = 39 | ||
total feature number = 36 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes because you eliminate newline examples with a constant feature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes exactly
0.000004 0.000001 6 6.0 1.0000 1.0007 4 | ||
0.000003 0.000000 7 7.0 2.0000 2.0004 4 | ||
0.000003 0.000000 8 8.0 1.0000 1.0003 4 | ||
0.000002 0.000000 9 9.0 3.0000 2.9995 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The number of multiline examples should be 3, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is 3, but I am recreating the previous behavior of printing statistics after each example. See line 91 in the reduction. What do you think is the correct thing to do here? Only print the statistics after each set of examples?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. All predictions from a top-k call should (conceptually) be on a single line. The idea of this reduction is that you are return the top-k-of-n things.
Merged, thanks :-) |
RunTests: use test label number instead of counter (VowpalWabbit/vowpal_wabbit#1753) Small Json parser cleanup (VowpalWabbit/vowpal_wabbit#1759) Type erase json parser context for easier deletion ((VowpalWabbit/vowpal_wabbit#1760) Fix static linking (VowpalWabbit/vowpal_wabbit#1758) Fix build scripts forcing Debug builds. Add LTO mode and fix VW defau…(VowpalWabbit/vowpal_wabbit#1735) Do not define BOOST_TEST_DYN_LINK when statically linking (VowpalWabbit/vowpal_wabbit#1750) Convert TopK reduction to be multiline example based (VowpalWabbit/vowpal_wabbit#1752) vw java 11 compatibility (VowpalWabbit/vowpal_wabbit#1700) cbify: --cbify_ldf for multiline (csoaa_ldf) input datasets (VowpalWabbit/vowpal_wabbit#1681) Merge pull request #1751 from yannstad/fix-tests (VowpalWabbit/vowpal_wabbit#1751) [tests] Make repeat.py compatible with python 3 (VowpalWabbit/vowpal_wabbit#1747)
* Make topk a multiline learner * Fix test for new format and rename B to K * revert destructor usage * remove all from data, move to output seq in finish * Revert "remove all from data, move to output seq in finish" This reverts commit 2c8f965.
Fixes #1748