[MRG] NMF notebook and logging fixups #2481
Conversation
2ae8579 to 39cdd77
39cdd77 to 5155e4e
@anotherbugmaster what is the status? Do you plan to finish the NMF implementation in Gensim? |
@mpenkov we have to decide what to do about NMF. It seems @anotherbugmaster doesn't have the capacity to finish it, it's been dragging on for more than a year. I fear we'll be stuck supporting yet another not-quite-finished algorithm. I see two options: 1) we finish NMF ourselves, or 2) we remove NMF from Gensim. |
I'm a bit torn. On one hand, there's still a long way to go before this is 100% done. I don't think we have the capacity to finish this ourselves. On the other, there's been a lot of effort on this, it'd be a waste to just discard it. @anotherbugmaster What is your opinion? What sort of timeline do you have in mind for finishing this work? |
@piskvorky, @mpenkov, it's all done; the only problems are the Appveyor tests under 3.5 and 3.6, and the Travis tests, which refuse to run at all for some reason. I would be glad if you'd help me out there, because I don't have a Windows machine and have no idea what happened to Travis. |
Try merging master in. That should fix at least some of the tests. |
Nope, it didn't work, same issue. :( |
Looks like this is the cause of the problem:
The values are slightly off. Could it be a bug? Please investigate. If it's not a bug, we can relax the tolerances on those tests. |
@mpenkov, the thing is, these values are off only on the 3.5 and 3.6 on Windows, every other platform returns the correct value. We could, of course, relax the constraint, but wouldn't it be too loose? |
Relax the constraint under those conditions only (Windows, Py3.5 and 3.6), and add an informative comment linking to this discussion. |
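One way to scope the relaxation to just those platforms (a sketch with hypothetical helper names; the actual tests live in Gensim's test suite) might be:

```python
import sys

import numpy as np

def nmf_rtol():
    """Hypothetical helper: loosen the tolerance only where the NMF
    output is known to drift (Windows under Python 3.5/3.6); see the
    discussion in this PR."""
    on_windows = sys.platform == "win32"
    on_py35_36 = sys.version_info[:2] in ((3, 5), (3, 6))
    return 1e-2 if (on_windows and on_py35_36) else 1e-3

def check_topics(actual, expected):
    # Compare computed topic weights against the expected values using
    # the platform-dependent relative tolerance.
    np.testing.assert_allclose(actual, expected, rtol=nmf_rtol())
```

This keeps the stricter tolerance everywhere the results are reproducible, so a genuine regression on Linux or macOS would still fail the test.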
@anotherbugmaster @mpenkov What's the status here? It's unlikely that math operations work differently under Windows, so this must be a bug, either in Gensim's NMF or, less likely, higher up the stack: Python, numpy, etc. Or is such non-determinism expected? What is its source? It shouldn't be hard to track down where the computed values start diverging, although I understand that doing it via CI (unless someone has a Windows machine they can use) is not very convenient. |
@anotherbugmaster Ping on this. Are you able to diagnose the problem yourself? |
Hey, I've spent some time investigating the issue of divergence between Windows (10) and Linux (Ubuntu 18.04, in a VirtualBox virtual machine). I used the following code to produce these outputs:

```python
import logging

from gensim.models import nmf
from gensim.test.utils import common_corpus, common_dictionary

logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.DEBUG)

model = nmf.Nmf(
    common_corpus,
    id2word=common_dictionary,
    chunksize=1,
    num_topics=2,
    passes=100,
    random_state=42,
)
print("_W:", model._W)

doc = list(common_corpus)[0]
transformed = model[doc]
```

Debug logging points us in the direction of the h_error. In this iterative function (it does multiple iterations for each update), the error diverges between Windows and Linux from the first iteration of the NMF algorithm (first Windows, then Linux):
Investigating this further, it appears that there is an issue initializing WtW in `Nmf._solveproj` (gensim/models/nmf.py): while the value of Wt is the same, there is a very small difference between Linux and Windows here. After exporting with np.save() and importing on Windows, I get for the first iteration:
It is my working hypothesis that this difference is enlarged by the iterations. A counter-argument might be that the difference is smaller than the machine precision for float64 (~2.2e-16), but the first divergence of the error is at e-18 (3rd iteration). It is noted in numpy/numpy#9187 that NumPy does not aim to provide exactly the same results on different platforms. In conclusion, I think this non-determinism between platforms is to be expected, and @mpenkov's solution of relaxing the constraints under those conditions should be sound :) |
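For anyone following along, the non-associativity at the root of this is easy to reproduce in isolation. This snippet is a plain-Python illustration of the mechanism, not Gensim code:

```python
# Floating-point addition is not associative, so the summation order a
# BLAS/SIMD kernel happens to use changes the last bits of a dot product
# such as WtW -- even for bit-identical inputs.
a = (0.1 + 0.2) + 0.3  # 0.6000000000000001
b = 0.1 + (0.2 + 0.3)  # 0.6
print(a == b)          # False
print(abs(a - b))      # ~1.1e-16
```

Once a last-bit difference like this enters WtW, every subsequent h_error iteration is computed from slightly different inputs, which would explain why the logged errors drift apart from the very first iteration.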
Awesome, thanks for the detective work @Maocx ! The algo could use a "champion" in Gensim, who understands its implementation and is able to keep it sane going forward. @mpenkov @anotherbugmaster was there anything else (besides the Windows divergence) that needed finishing up / polishing? Cheers. |
To the best of my recollection, divergence was the only thing left. |
Alright! Let's finish it up & release NMF officially then 🚀 |
Hello everyone. Docs are ready, though I haven't revisited them since May. Anyway, I think everything should work OK, I'll relax the constraints and merge master. |
@piskvorky @mpenkov I guess we could merge it, what do you think? |
@anotherbugmaster, I'm a bit disappointed in the difference between the W matrices generated with different RandomState values. I found that you introduced them around January 29; do you still remember your reasoning for including them? For instance, the weight of the second topic here changes by nearly 50% when the random state changes:

```python
import logging

import numpy as np

from gensim.models import nmf
from gensim.test.utils import common_corpus, common_dictionary

logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

def perform_nmf(random_state=None):
    """Train an NMF model on the common test corpus."""
    model = nmf.Nmf(
        common_corpus,
        id2word=common_dictionary,
        chunksize=1,
        num_topics=2,
        passes=100,
        random_state=random_state,
        h_stop_condition=10e-6,
    )
    return model

model1 = perform_nmf(random_state=42)
model2 = perform_nmf(random_state=43)
diff = model1._W - model2._W
t = np.sum(diff)
print(t)
print(model1._W)
print(model2._W)
```

```
0.4181815407169876
[[0.25698389 0.13212215]
 [0.         0.46704016]
 [0.02863034 0.35457582]
 [0.43792164 0.        ]
 [0.32878853 0.        ]
 [0.35545745 0.66227907]
 [0.43792164 0.        ]
 [0.51330699 0.06626947]
 [0.06355195 0.44236637]
 [0.02828574 0.        ]
 [0.07657501 0.        ]
 [0.06688829 0.        ]]
[[0.25543047 0.09710711]
 [0.         0.36720422]
 [0.01217318 0.27417546]
 [0.43444168 0.        ]
 [0.32464544 0.        ]
 [0.23349835 0.66157953]
 [0.43444168 0.        ]
 [0.46870586 0.1142939 ]
 [0.         0.41854257]
 [0.03644202 0.        ]
 [0.09030456 0.        ]
 [0.0777969  0.        ]]
```

I tried to follow the cited paper to check the implementation and had some difficulty identifying which steps you used; perhaps adding them somewhere in the docstring would be nice :) This is what I noted down from the exercise:
|
@Maocx, concerning your questions:
|
Thanks for your answer! Sorry for mixing my questions up a bit:
|
@Maocx |
@anotherbugmaster @Maocx Ping on this PR. How's it going? Looks like there is a merge conflict - can you please resolve it? Is there anything else left to do before we merge? |
@mpenkov seems like the merge conflict was caused by rtol getting even stricter: now it's 1e-3 instead of 1e-2. I don't know why rtol was changed in |
Actually And all tests in that PR passed, so not sure why 1e-2 would be needed? |
Windows tests wouldn't work with some versions of Python. You can accept the current version with 1e-3 if that's no longer the case.
703721d to 47e60f3
Rebased.