
Investigate difference between new and 0.1.1 SAR versions when using movielens 10M #465

Closed
1 of 2 tasks
miguelgfierro opened this issue Jan 30, 2019 · 15 comments

miguelgfierro commented Jan 30, 2019

What is affected by this bug?

In the SAR quickstart notebook, when using MovieLens 10M, there is a significant difference in the metrics and in the compute time.

For SAR in release 0.1.1, the metrics are:

Took 109.8672547340393 seconds for training.
Took 9.996008157730103 seconds for prediction.
Model:	sar_ref
Top K:	10
MAP:	0.101402
NDCG:	0.321073
Precision@K:	0.275766
Recall@K:	0.156483

The new SAR in staging:

Took 563.519758939743 seconds for training.
Took 8.40031886100769 seconds for prediction.
Model:	sar_ref
Top K:	10
MAP:	0.066775
NDCG:	0.249102
Precision@K:	0.216268
Recall@K:	0.106777

In which platform does it happen?

  • Azure Data Science Virtual Machine.

Steps for both versions:

  • Check the results with 100k, 1M and 20M
  • Check that the unit tests are passing
miguelgfierro added the algorithm, bug, notebook and test labels Jan 30, 2019
miguelgfierro self-assigned this Jan 30, 2019

miguelgfierro commented Jan 30, 2019

Results of new and old SAR with ML100k
For SAR in release 0.1.1, the metrics are:

Took 0.5094079971313477 seconds for training.
Took 0.0852363109588623 seconds for prediction.
Model:	sar_ref
Top K:	10
MAP:	0.105815
NDCG:	0.373197
Precision@K:	0.326617
Recall@K:	0.175957

The new SAR in staging:

Took 0.5978834629058838 seconds for training.
Took 0.03621530532836914 seconds for prediction.
Model:	sar_ref
Top K:	10
MAP:	0.105815
NDCG:	0.373197
Precision@K:	0.326617
Recall@K:	0.175957

Results of new and old SAR with ML1M
For SAR in release 0.1.1, the metrics are:

Took 9.09568476676941 seconds for training.
Took 0.3207972049713135 seconds for prediction.
Model:	sar_ref
Top K:	10
MAP:	0.064013
NDCG:	0.308012
Precision@K:	0.277215
Recall@K:	0.109292

The new SAR in staging:

Took 5.787659168243408 seconds for training.
Took 0.4074389934539795 seconds for prediction.
Model:	sar_ref
Top K:	10
MAP:	0.064013
NDCG:	0.308012
Precision@K:	0.277215
Recall@K:	0.109292

Results of new and old SAR with ML20M
For SAR in release 0.1.1, the metrics are:


Took 538.0573198795319 seconds for training.
Took 40.26268434524536 seconds for prediction.
Model:	sar_ref
Top K:	10
MAP:	0.085287
NDCG:	0.287087
Precision@K:	0.247186
Recall@K:	0.134884

The new SAR in staging:

Took 2996.4829161167145 seconds for training.
Took 30.46952724456787 seconds for prediction.
Model:	sar_ref
Top K:	10
MAP:	0.021469
NDCG:	0.118241
Precision@K:	0.105806
Recall@K:	0.036814

miguelgfierro mentioned this issue Jan 30, 2019
gramhagen commented

Just confirming that we see the same results despite changes to the notebook.
I took a Python export of the notebook in staging, modified it to run through all the dataset sizes, and ran it on current staging as well as release 0.1.1; the results match what you get. So it's definitely a change in the code itself. I'll look a bit closer at what has changed in the SAR model.

gramhagen reopened this Jan 31, 2019
gramhagen commented

Line profiling the two versions of the fit() method reveals a difference in the dot() calculation:

| Version | Line | Time (us) | Time (%) | Line Contents |
|---|---|---|---|---|
| Staging | 403 | 582523501.0 | 92.0 | `self.scores = self.user_affinity.dot(self.item_similarity)` |
| Master | 380 | 69671170.0 | 56.7 | `self.scores = self.user_affinity.dot(self.item_similarity)` |
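
For reference, this kind of per-line timing can be reproduced with the line_profiler package; a minimal sketch, assuming `model` is an already constructed SAR model and `train` is the training DataFrame (both names are illustrative, not taken from the notebook):

```python
# Line-profile the fit() method with the line_profiler package.
from line_profiler import LineProfiler

profiler = LineProfiler()
profiled_fit = profiler(model.fit)  # wrap the method we want to time per line
profiled_fit(train)                 # run training under the profiler
profiler.print_stats()              # prints Line #, Time, % Time, Line Contents
```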

Barring the possibility that the actual matrices are not equivalent across versions (which seems unlikely, as the 100k and 1M datasets produce the same results), I expect the issue is due to something numpy is doing under the hood.

see the answer here: https://stackoverflow.com/questions/19839539/how-to-get-faster-code-than-numpy-dot-for-matrix-multiplication

It's possible that something in the datatypes changed which is preventing numpy from calling the BLAS library, though I wouldn't expect that to produce significant differences in the final matrix; maybe it only shows up when the small differences are summed across a large number of items.

Next step is to confirm the user_affinity and item_similarity matrices are identical across versions and to check the datatypes and numpy flags, then maybe try gemm directly.
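
A sketch of that check, assuming both the old and new models have been fit on the same split and expose `user_affinity` / `item_similarity` attributes (`model_old` / `model_new` are placeholder names):

```python
import numpy as np
import scipy.sparse as sp

def to_dense(m):
    """Return a plain ndarray whether m is a scipy sparse matrix or already dense."""
    return np.asarray(m.todense()) if sp.issparse(m) else np.asarray(m)

for name in ("user_affinity", "item_similarity"):
    a = to_dense(getattr(model_old, name))
    b = to_dense(getattr(model_new, name))
    print(name, a.dtype, b.dtype, "allclose:", np.allclose(a, b))

# dtype and memory layout matter for whether numpy hands the dense dot() to BLAS
print(to_dense(model_new.item_similarity).flags)
```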

anargyri commented

The stackoverflow entry may not be very relevant because the matrices involved are sparse. Actually, the types of the matrices changed between master and staging, as did the jaccard etc. functions that produce item_similarity. I also changed the data types to single-precision float (I think the code by Max resulted in integers and double-precision floats in some places).
So the picture is a bit complicated. What you say about comparing the matrices and their product makes sense.

gramhagen commented

Right @anargyri, as you mentioned, the item_similarity object type changed from a numpy matrix to a scipy sparse CSC matrix; apparently calling the dot function on that is much slower than on a dense matrix.

But the good news is it's quite easy to convert it back to dense before passing it into the dot function. I tested that and it seems to resolve the speed issue. I am testing now to see how that change impacts the results.
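
As a rough illustration of the effect (toy sizes and densities, not the real matrices):

```python
import time
import scipy.sparse as sp

# Toy stand-ins: a sparse user-affinity matrix (CSR) and a ~50%-dense
# item-similarity matrix stored as CSC, mirroring the shape of the problem.
user_affinity = sp.random(5000, 2000, density=0.01, format="csr", random_state=0)
item_similarity = sp.random(2000, 2000, density=0.5, format="csc", random_state=1)

t0 = time.time()
scores_sparse = user_affinity.dot(item_similarity)            # sparse x sparse (staging)
t1 = time.time()
scores_dense = user_affinity.dot(item_similarity.todense())   # sparse x dense (the fix)
t2 = time.time()

print(f"sparse.dot(sparse): {t1 - t0:.3f}s  sparse.dot(dense): {t2 - t1:.3f}s")
```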

anargyri commented

Also, the efficiency of dot() depends on the types of the sparse matrices. I haven't found a recommendation in scipy about this, so I chose user_affinity to be CSR and item_similarity to be CSC.
It is probably relevant that item_similarity is not really sparse (maybe ~50% or so), so there may be no advantage to sparse multiplication on that line; the products before it should still benefit from sparse matrix types.
Does the dense matrix come at a cost of more memory?
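
A quick way to check how dense item_similarity actually ends up, and what a dense copy would cost in memory (the `model` object and attribute name are assumed):

```python
# Fraction of explicitly stored non-zeros in the sparse similarity matrix;
# a value around 0.5 means it is effectively dense, so a dense product is
# likely cheaper than sparse multiplication for the scoring step.
sim = model.item_similarity
density = sim.nnz / (sim.shape[0] * sim.shape[1])
print(f"item_similarity density: {density:.2%}, memory if dense: "
      f"{sim.shape[0] * sim.shape[1] * sim.dtype.itemsize / 1e9:.2f} GB")
```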

gramhagen commented

It doesn't seem like we're saving much memory by using the sparse matrix. Here are plots of memory usage over time (for the fit method on the 10M dataset).

Current Staging: [memory usage plot]

Current Master: [memory usage plot]

So we get a ~15% reduction in memory.
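
For reference, plots like these are typically produced with memory_profiler's `mprof run` / `mprof plot`; the same numbers can also be sampled programmatically (a sketch, with `model` and `train` as placeholder names):

```python
# Sample memory usage of fit() over time with the memory_profiler package.
from memory_profiler import memory_usage

samples = memory_usage((model.fit, (train,)), interval=0.5)
print(f"peak: {max(samples):.1f} MiB, baseline: {min(samples):.1f} MiB, "
      f"{len(samples)} samples")
```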


gramhagen commented Jan 31, 2019

Here are the results from a speed/eval standpoint. In the 'fix' version I just changed the final calculation in fit() to: `self.scores = self.user_affinity.dot(self.item_similarity.todense())`

You can see the 'fix' version is still slightly slower than 0.1.1 (at 10M), and the metrics are degraded, but I think this is just due to using single- vs. double-precision floating point.

| Version | dataset | fit (s) | MAP | NDCG | precision@10 | recall@10 |
|---|---|---|---|---|---|---|
| SAR 0.1.1 | ml 100k | 0.5262 | 0.105815 | 0.37319 | 0.326617 | 0.175957 |
| SAR Staging | ml 100k | 0.6034 | 0.105815 | 0.373197 | 0.326617 | 0.175957 |
| SAR Staging Fix | ml 100k | 0.4761 | 0.105815 | 0.373197 | 0.326617 | 0.175957 |
| SAR 0.1.1 | ml 1m | 6.2893 | 0.064013 | 0.308012 | 0.277215 | 0.109292 |
| SAR Staging | ml 1m | 9.9825 | 0.064013 | 0.308012 | 0.277215 | 0.109292 |
| SAR Staging Fix | ml 1m | 5.3546 | 0.064013 | 0.308012 | 0.277215 | 0.109292 |
| SAR 0.1.1 | ml 10m | 132.1281 | 0.101402 | 0.321073 | 0.275766 | 0.156483 |
| SAR Staging | ml 10m | 619.7484183 | 0.066775 | 0.249102 | 0.216268 | 0.106777 |
| SAR Staging Fix | ml 10m | 212.5266 | 0.066775 | 0.249102 | 0.216268 | 0.106777 |

gramhagen commented

Thoughts on this? I can certainly update the line of code and 'correct' the test metrics, but I just want to make sure others are OK with that approach.

miguelgfierro commented

This is great. I think the speed issue is solved by the dense multiplication. However, there is still the issue with the metrics: it is very odd that 100k and 1M produce exactly the same results while 10M (and I guess 20M) produces different ones. In SAR there should be no difference in the results across runs, since the algorithm is deterministic.

@gramhagen can you check this?

> the performance is degraded, but I think this is just due to using single vs double floating point precision.

Another question: are the results you are showing in the table float32 or float64?
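
One direct way to test the precision hypothesis would be to build the score matrix in both precisions and compare (a sketch; `user_affinity` and `item_similarity` are assumed to be the fitted matrices already converted to dense ndarrays):

```python
import numpy as np

# Compare the scores computed in single vs. double precision.
scores32 = user_affinity.astype(np.float32).dot(item_similarity.astype(np.float32))
scores64 = user_affinity.astype(np.float64).dot(item_similarity.astype(np.float64))

diff = np.abs(scores64 - scores32.astype(np.float64))
print("max abs diff:", diff.max(), "mean abs diff:", diff.mean())
# If the differences are tiny relative to the score magnitudes, precision
# alone is unlikely to explain the drop in the ranking metrics.
```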


anargyri commented Feb 1, 2019

Nice plots! Good catch about the time efficiency. Since item_similarity is not really sparse, it makes sense; let's keep the change you made.
Regarding the metrics, it is unlikely that the discrepancies are due to this line of code. It would also be strange if it were due to precision: when the data set increases by a factor of 10 (maybe more than 10 if you consider the size of the user-item matrix, but still less than 100), we go from matching to 6 decimal digits to matching to only 1.
I suspect the discrepancy is due to something else. One thing that might help is to locate the first commit modifying this function in which the discrepancy appears.

gramhagen commented

Results above are from 0.1.1 using float64, while staging uses float32.
I agree it's quite strange to see such a dramatic change just from switching the precision, and only on larger datasets. I'll do some more digging to get to the bottom of it.

gramhagen commented

Stalled a bit on this because, after staring at the code for so long, I was compelled to clean it up. Anyway, the issue is resolved; from what I can tell it derived from numpy doing something funky when applying the dot product across a CSR and a CSC matrix. The changes to jaccard and lift slowed down the computation a bit, so I reverted them and silenced the divide-by-zero warnings (the result is still correct regardless). There are a few more changes, but the metrics are back on track as well as the speed, so I think we're in good shape. I'll push a PR shortly.
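
For context, the divide-by-zero silencing follows the usual numpy pattern; an illustrative jaccard computation on a toy co-occurrence matrix (not the repo's actual code):

```python
import numpy as np

# Toy item co-occurrence matrix; the diagonal holds per-item counts.
cooccurrence = np.array([[4.0, 2.0, 0.0],
                         [2.0, 3.0, 1.0],
                         [0.0, 1.0, 0.0]])
diag = cooccurrence.diagonal()

# jaccard(i, j) = c_ij / (c_ii + c_jj - c_ij); items that never occur
# yield 0/0, so suppress the warning and zero out the resulting NaNs.
with np.errstate(invalid="ignore", divide="ignore"):
    jaccard = cooccurrence / (diag[:, None] + diag[None, :] - cooccurrence)
jaccard = np.nan_to_num(jaccard)
print(jaccard)
```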

miguelgfierro commented

Hey @gramhagen, is this issue solved in the PR you are working on?

gramhagen commented

Closing this now that #485 is merged.
