Investigate difference between new and 0.1.1 SAR versions when using movielens 10M #465
Comments
Results of new and old SAR with ML100k: [results table not shown]
Results of new and old SAR with ML1M: [results table not shown]
Results of new and old SAR with ML20M: [results table not shown]
Just confirming that we see the same results despite changes to the notebook.
Line profiling the two versions of the fit() method reveals a difference in the dot() calculation.
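For reference, a minimal way to do this with the `line_profiler` package; `SAR` and `train` here are stand-ins for the notebook's actual model class and training data:

```python
from line_profiler import LineProfiler

# `SAR` and `train` are stand-ins for the actual model class and
# training data; they are assumed to be defined as in the notebook.
model = SAR()

lp = LineProfiler()
lp.add_function(type(model).fit)   # register fit() for per-line timing
lp.runcall(model.fit, train)       # execute fit() under the profiler
lp.print_stats()                   # per-line hit counts and time spent
```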
Barring the possibility that the actual matrices are not equivalent across versions (which seems unlikely, as the 100k and 1M datasets produce the same results), I expect the issue is due to something numpy is doing under the hood. See the answer here: https://stackoverflow.com/questions/19839539/how-to-get-faster-code-than-numpy-dot-for-matrix-multiplication. It's possible that something with the datatypes changed which is preventing numpy from calling the BLAS library, though I wouldn't expect there to be significant differences in the final matrix; maybe the small differences are only realized when they are summed up across a large number of items. The next step is to confirm the user_affinity and item_similarity matrices are identical across versions and check the datatypes and numpy flags, then maybe try gemm directly.
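A quick sketch of that check, assuming the matrices from each version have been captured (the `old_`/`new_` variable names are illustrative):

```python
import numpy as np
import scipy.sparse as sp

def describe(name, m):
    """Print the properties that can change numpy's dot() code path."""
    print(name, type(m), m.dtype, m.shape)
    if sp.issparse(m):
        print("  sparse format:", m.format)
    else:
        print("  flags:", m.flags)  # C/F contiguity affects BLAS dispatch

def as_dense(m):
    """Densify so allclose compares values, not container types."""
    return m.toarray() if sp.issparse(m) else np.asarray(m)

# old_affinity / new_affinity are the matrices produced by 0.1.1 and staging.
describe("user_affinity (old)", old_affinity)
describe("user_affinity (new)", new_affinity)
print("equal:", np.allclose(as_dense(old_affinity), as_dense(new_affinity)))
```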
The stackoverflow entry may not be very relevant, because the matrices involved are sparse. Actually, the types of the matrices changed between master and staging, and so did the jaccard etc. functions that produce them.
Right, @anargyri, as you mentioned the item_similarity object type changed from a numpy matrix to a scipy sparse csc matrix, and apparently using the dot function on that is much slower than on a dense matrix. The good news is it's quite easy to convert it back to dense before passing it into the dot function. I tested that and it seems to resolve the speed issue. I am testing now to see how that change impacts the results.
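A small synthetic benchmark of the two code paths being compared; the shapes and densities below are made up for illustration:

```python
import time
import scipy.sparse as sp

# Synthetic stand-ins for user_affinity (csr) and item_similarity (csc).
affinity = sp.random(10000, 2000, density=0.01, format="csr", random_state=0)
similarity = sp.random(2000, 2000, density=0.1, format="csc", random_state=1)

t0 = time.perf_counter()
scores_sparse = affinity.dot(similarity)            # sparse x sparse path
print(f"sparse dot: {time.perf_counter() - t0:.3f}s")

t0 = time.perf_counter()
scores_dense = affinity.dot(similarity.toarray())   # sparse x dense path
print(f"dense dot:  {time.perf_counter() - t0:.3f}s")
```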
Also the efficiency of
Here are the results from a speed / eval standpoint. In the 'fix' version I just changed the final calculation in fit() to use the densified similarity matrix. You can see the 'fix' version is still slightly slower than 0.1.1 (@10M), and the performance is degraded, but I think this is just due to using single vs. double floating point precision.
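A sketch of what that change amounts to, assuming fit() ends by taking the product of the user affinity and item similarity matrices (names are illustrative, not the actual repo code):

```python
import numpy as np
import scipy.sparse as sp

def score(user_affinity: sp.csr_matrix, item_similarity: sp.csc_matrix) -> np.ndarray:
    """Final step of fit(): densify the similarity matrix first so the
    multiplication goes through numpy/BLAS rather than the slower
    sparse-times-sparse code path."""
    return user_affinity.dot(item_similarity.toarray())
```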
Thoughts on this? I can certainly update the line of code and 'correct' the test metrics, but I just want to make sure others are OK with that approach.
This is great, I think the speed issue is solved by the dense multiplication. However, there is still the issue with the performance. It is super odd that 100k and 1M produce exactly the same results while 10M (and I guess 20M) produce different ones. In SAR there should not be any difference in the results across runs, since the algorithm is deterministic. @gramhagen, can you check this?
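One way such a check could look, with `SAR`, `train`, and the `scores` attribute assumed from the notebook:

```python
import numpy as np
import scipy.sparse as sp

def as_dense(m):
    return m.toarray() if sp.issparse(m) else np.asarray(m)

# Fit twice on identical input; a deterministic algorithm must yield
# bit-identical score matrices.
model_a, model_b = SAR(), SAR()
model_a.fit(train)
model_b.fit(train)
assert np.array_equal(as_dense(model_a.scores), as_dense(model_b.scores))
```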
Another question: are the results you are showing in the table float32 or float64?
Nice plots! Good catch about the time efficiency. Since
The results above are from 0.1.1 using float64, while staging uses float32.
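A toy illustration of how single precision alone can move an aggregate: the per-element representations differ only around the 7th significant digit, but the rounding compounds over millions of accumulated terms:

```python
import numpy as np

# The same reduction in float32 vs float64.
values = np.full(10_000_000, 0.1)
print(values.astype(np.float32).sum())  # drifts around the 6th-7th significant digit
print(values.astype(np.float64).sum())  # very close to 1,000,000
```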
Stalled a bit on this because after staring at the code for so long I was compelled to clean it up. Anyway, the issue is resolved; from what I can tell it derived from numpy doing something funky when applying the dot product across a csr and a csc matrix. The changes to jaccard and lift slowed down the computation a bit, so I reverted them and silenced the divide-by-zero warnings (the result is still correct regardless). There are a few more changes, but the metrics are back on track as well as the speed, so I think we're in good shape. I'll push a PR shortly.
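For context, a sketch of what silencing that warning can look like in a jaccard-style similarity computation (not the actual repo code; the formula follows the standard co-occurrence definition):

```python
import numpy as np

def jaccard(cooccurrence: np.ndarray) -> np.ndarray:
    """Jaccard similarity from an item co-occurrence matrix.
    Items with zero co-occurrence make the denominator zero; per the
    discussion above the result is still correct regardless, so the
    warning is suppressed rather than the computation changed."""
    diag = cooccurrence.diagonal()
    denominator = diag[:, None] + diag[None, :] - cooccurrence
    with np.errstate(invalid="ignore", divide="ignore"):
        return cooccurrence / denominator
```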
Hey @gramhagen, is this issue solved in the PR you are working on?
Closing this now that #485 is merged.
What is affected by this bug?
In the SAR quickstart notebook, when using MovieLens 10M, there is a significant difference in the metrics and in the compute time.
For SAR in release 0.1.1, the metrics are: [metrics table not shown]
The new SAR in staging: [metrics table not shown]
In which platform does it happen?
Steps for both versions: