Convert TopK reduction to be multiline example based #1752

Merged: 7 commits merged into VowpalWabbit:master from jagerrit/top_multi on Feb 8, 2019

Conversation

jackgerrits (Member):

Fixes #1748

  weighted label sum = 15.000000
  average loss = 0.000002
  best constant = 1.666667
- total feature number = 39
+ total feature number = 36

Member:

This changes because you eliminate the newline examples, which carry a constant feature?

Member Author:

Yes, exactly.
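
For context, a purely illustrative sketch of the input involved (the labels and feature names here are made up, not taken from the test file). In VW's text format a blank line is itself parsed as an empty "newline" example, and every parsed example carries the implicit constant feature:

    1 |f a b c
    2 |f b c d
    3 |f c d e

    (… two more groups of three examples like this …)

With 9 labeled examples of 4 features each (3 named plus the constant) that is 36 features, and the 3 blank separators previously added one constant feature each, giving the old total of 39. Treating the separators purely as group delimiters rather than as examples of their own accounts for the 39 to 36 change.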

(columns: average loss, since last, example counter, example weight, current label, current predict, current features)
0.000004 0.000001 6 6.0 1.0000 1.0007 4
0.000003 0.000000 7 7.0 2.0000 2.0004 4
0.000003 0.000000 8 8.0 1.0000 1.0003 4
0.000002 0.000000 9 9.0 3.0000 2.9995 4

Member:

The number of multiline examples should be 3, right?

Member Author:

Yes, it is 3, but I am recreating the previous behavior of printing statistics after each example; see line 91 in the reduction. What do you think is the correct thing to do here? Only print the statistics after each set of examples?
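
To make the question concrete, here is a simplified, hypothetical sketch of the two reporting strategies under discussion (this is not VW's actual reduction API; the struct and function names are invented):

    #include <cstdio>
    #include <vector>

    // Invented stand-in for an example's bookkeeping fields.
    struct example { float label; float pred; float weight; };

    // Behavior the PR currently recreates: report progress after every
    // individual example inside the multiline group.
    void report_per_example(const std::vector<example>& group) {
      for (const auto& ec : group)
        std::printf("label=%.4f predict=%.4f\n", ec.label, ec.pred);
    }

    // Alternative raised here: report once per top-k call, treating the
    // whole group as a single logical example.
    void report_per_group(const std::vector<example>& group) {
      float loss_sum = 0.f;
      for (const auto& ec : group)
        loss_sum += (ec.label - ec.pred) * (ec.label - ec.pred) * ec.weight;
      std::printf("group of %zu examples: squared-loss sum=%.6f\n",
                  group.size(), loss_sum);
    }

Under the first strategy the example counter advances nine times on this test (as the progress table above shows); under the second it would advance three times, once per multiline group.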

Member:

Yes. All predictions from a top-k call should (conceptually) be on a single line. The idea of this reduction is that you are returning the top k of n things.
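
As a rough illustration of that idea (not the reduction's actual code; the scores, tags, and k below are made up), selecting the top k of n scored items and emitting them as one prediction line could look like:

    #include <algorithm>
    #include <cstddef>
    #include <cstdio>
    #include <utility>
    #include <vector>

    int main() {
      // (score, tag) pairs for n = 4 candidate examples; values are invented.
      std::vector<std::pair<float, int>> scored = {
          {0.9f, 1}, {0.2f, 2}, {0.7f, 3}, {0.4f, 4}};
      const std::size_t k = 2;

      // Move the k highest-scoring items to the front.
      std::partial_sort(scored.begin(), scored.begin() + k, scored.end(),
                        [](const auto& a, const auto& b) { return a.first > b.first; });

      // Emit all k predictions as a single output line.
      for (std::size_t i = 0; i < k; ++i)
        std::printf("%d:%.2f ", scored[i].second, scored[i].first);
      std::printf("\n");  // prints: 1:0.90 3:0.70
    }

The point being made is exactly that: however many examples go into one top-k call, the prediction for that call is a single line.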

@JohnLangford JohnLangford merged commit a3c0943 into VowpalWabbit:master Feb 8, 2019

@JohnLangford (Member):

Merged, thanks :-)

rajan-chari pushed a commit to VowpalWabbit/reinforcement_learning that referenced this pull request Feb 22, 2019

jackdoe pushed a commit to lucjb/vowpal_wabbit that referenced this pull request Mar 7, 2019
* Make topk a multiline learner

* Fix test for new format and rename B to K

* revert destructor usage

* remove all from data, move to output seq in finish

* Revert "remove all from data, move to output seq in finish"

This reverts commit 2c8f965.
@jackgerrits jackgerrits deleted the jagerrit/top_multi branch April 5, 2019 16:22