Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logistic regression C app stopped working between releases 8.5.0 and 8.6.0 #1780

Closed
jrmarkle opened this issue Mar 4, 2019 · 12 comments
Closed
Assignees
Labels
Bug Bug in learning semantics, critical by default

Comments

@jrmarkle
Copy link
Contributor

jrmarkle commented Mar 4, 2019

I wrote a small example C app to test logistic regression when linking against the C wrapper.

When I build and run using VW release 8.5.0 it seems to work properly and outputs

predict good = 0.996043, confidence = 12.814723
predict bad = 0.004103, confidence = 13.420224
predict unknown = 0.475109, confidence = 0.000000

but when updating to 8.6.0 (or 8.6.1) the training doesn't seem to be happening:

predict good = 0.500000, confidence = 0.000000
predict bad = 0.500000, confidence = 0.000000
predict unknown = 0.500000, confidence = 0.000000

Does anyone have a suggestion about the relevant changes between these releases? Looking at the the diffs of vwdll.h and vwdll.cpp, I don't see any clues as to why this code would no longer work.

@jrmarkle
Copy link
Contributor Author

jrmarkle commented Mar 5, 2019

I bisected to find that this stopped working with commit 276e0da.

Unfortunately the commit message is simply "compiling". The changes seem to be related to "test labels" but I'm not familiar enough to understand what's going on.

@JohnLangford
Copy link
Member

JohnLangford commented Mar 5, 2019 via email

@jackgerrits jackgerrits added the Bug Bug in learning semantics, critical by default label Mar 12, 2019
@jrmarkle
Copy link
Contributor Author

I narrowed the changes down to one line.

ae->test_only |= all.p->lp.test_label(&ae->l);

was added to setup_example in parser.cc in commit 276e0da.

If I remove that line my example app seems to work properly again. I also tried applying the same change to release 8.6.1 and it also works there.

@ataymano
Copy link
Member

Hi Jamie,
Thanks for narrowing the issue. Looking right now.

@jrmarkle
Copy link
Contributor Author

note that removing the line above does cause make test to fail

@ataymano
Copy link
Member

So the problem that we are setting test_only = F1(all) | F2(label) inside setup_example,
but also allow label to be updated later inside add_label call (without test_only update).

For resolving this particular problem, we can just recalculate test_only inside add_labels, but we need to add vw handle to signature in this case (to be able to get F1(all)):
VW_AddLabel(VW_EXAMPLE,...) -> VW_AddLabel(VW_HANDLE, VW_EXAMPLE,...)
or cache F1(all) inside example but separated from test_only.

Also, seems like all label manipulations from setup_example have to be invalidated and re-done after each add_label call - we are also writing to cache inside setup_example, and it seems to be another bug for this scenario.

@jrmarkle
Copy link
Contributor Author

Ah, ok. This should be easy to work around for my use case.

I changed my example app to not use VW_AddLabel and instead put the label value in the string given to VW_ReadExampleA and it works fine with release 8.6.1 unmodified.

@ataymano
Copy link
Member

Yes, that's right - I need to think more how to fix it properly, but just avoiding AddLabel should work as workaround.

@arielf
Copy link
Collaborator

arielf commented Mar 15, 2019

@jrmarkle @ataymano very nice diagnosis & work on this. Way to go. Thank you.

@ataymano
Copy link
Member

@JohnLangford, do you have idea what should be the best fix here? I see several ways to workaround original issue:

  1. Recalculate label-dependent fields in AddLabel with changing VW_AddLabel signature (i.e. adding VW_HANDLE there)
  2. Recalculate label-dependent fields in AddLabel without changing VW_AddLabel signature (but caching some additional info inside example:
    i.e. ae->test_only = is_test_only(all.p->in_pass_counter, all.holdout_period)
  3. Both p.1 and p.2 are not solving issue witth caching. We are writing to cache both features and labels consequently into single stream inside setup_example, but if we allow decoupling of read_example and add_label, cached label is going to be invalid. I do not see simple solution for this except just removing separate add_label function (otherwise it is tricky to support this scenario:
    ex1 = setup_example(...);
    ex2 = setup_example(...);
    add_label(ex1, ...);
    add_label(ex2, ...);

@ataymano
Copy link
Member

  1. Move setup_example from VW_ReadExample to VW_Learn (and leave just read_line there). Seems the best for me.

@jackgerrits
Copy link
Member

Fixed in #1859

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Bug in learning semantics, critical by default
Projects
None yet
Development

No branches or pull requests

5 participants