
Investigate difference between new and 0.1.1 SAR versions when using movielens 10M #465

Closed
1 of 2 tasks
miguelgfierro opened this issue Jan 30, 2019 · 15 comments

miguelgfierro commented Jan 30, 2019

What is affected by this bug?

In the SAR quickstart notebook, when using MovieLens 10M, there is a significant difference in the metrics and in the compute time.

For SAR in release 0.1.1, the metrics are:

Took 109.8672547340393 seconds for training.
Took 9.996008157730103 seconds for prediction.
Model:	sar_ref
Top K:	10
MAP:	0.101402
NDCG:	0.321073
Precision@K:	0.275766
Recall@K:	0.156483

The new SAR in staging:

Took 563.519758939743 seconds for training.
Took 8.40031886100769 seconds for prediction.
Model:	sar_ref
Top K:	10
MAP:	0.066775
NDCG:	0.249102
Precision@K:	0.216268
Recall@K:	0.106777

In which platform does it happen?

  • Azure Data Science Virtual Machine.

Steps for both versions:

  • Check the results with 100k, 1M and 20M
  • Check that the unit tests are passing
miguelgfierro added the algorithm, bug, notebook and test labels Jan 30, 2019
miguelgfierro self-assigned this Jan 30, 2019

miguelgfierro commented Jan 30, 2019

Results of new and old SAR with ML100k
For SAR in release 0.1.1, the metrics are:

Took 0.5094079971313477 seconds for training.
Took 0.0852363109588623 seconds for prediction.
Model:	sar_ref
Top K:	10
MAP:	0.105815
NDCG:	0.373197
Precision@K:	0.326617
Recall@K:	0.175957

The new SAR in staging:

Took 0.5978834629058838 seconds for training.
Took 0.03621530532836914 seconds for prediction.
Model:	sar_ref
Top K:	10
MAP:	0.105815
NDCG:	0.373197
Precision@K:	0.326617
Recall@K:	0.175957

Results of new and old SAR with ML1M
For SAR in release 0.1.1, the metrics are:

Took 9.09568476676941 seconds for training.
Took 0.3207972049713135 seconds for prediction.
Model:	sar_ref
Top K:	10
MAP:	0.064013
NDCG:	0.308012
Precision@K:	0.277215
Recall@K:	0.109292

The new SAR in staging:

Took 5.787659168243408 seconds for training.
Took 0.4074389934539795 seconds for prediction.
Model:	sar_ref
Top K:	10
MAP:	0.064013
NDCG:	0.308012
Precision@K:	0.277215
Recall@K:	0.109292

Results of new and old SAR with ML20M
For SAR in release 0.1.1, the metrics are:


Took 538.0573198795319 seconds for training.
Took 40.26268434524536 seconds for prediction.
Model:	sar_ref
Top K:	10
MAP:	0.085287
NDCG:	0.287087
Precision@K:	0.247186
Recall@K:	0.134884

The new SAR in staging:

Took 2996.4829161167145 seconds for training.
Took 30.46952724456787 seconds for prediction.
Model:	sar_ref
Top K:	10
MAP:	0.021469
NDCG:	0.118241
Precision@K:	0.105806
Recall@K:	0.036814

miguelgfierro mentioned this issue Jan 30, 2019
gramhagen commented

Just confirming that we see the same results despite changes to the notebook.
I took a Python export of the notebook in staging, modified it to run through all the dataset sizes, and ran it on current staging as well as release 0.1.1; the results match what you get. So it's definitely a change in the code itself. I'll look a bit closer at what has changed in the SAR model.

gramhagen reopened this Jan 31, 2019
gramhagen commented

Line profiling the two versions of the fit() method reveals a difference in the dot() calculation:

| Version | Line | Time (us) | Time (%) | Line Contents |
|---|---|---|---|---|
| Staging | 403 | 582523501.0 | 92.0 | `self.scores = self.user_affinity.dot(self.item_similarity)` |
| Master | 380 | 69671170.0 | 56.7 | `self.scores = self.user_affinity.dot(self.item_similarity)` |
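
For reference, this kind of per-line timing can be reproduced with the line_profiler package; a minimal sketch, assuming `model` is an already constructed SAR model and `train` is the training DataFrame (both names are illustrative, not taken from the notebook):

```python
# Line-profile the fit() method with the line_profiler package.
from line_profiler import LineProfiler

profiler = LineProfiler()
profiled_fit = profiler(model.fit)  # wrap the method we want to time per line
profiled_fit(train)                 # run training under the profiler
profiler.print_stats()              # prints Line #, Time, % Time, Line Contents
```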

Barring the possibility that the actual matrices are not equivalent across versions (which seems unlikely, as the 100k and 1M datasets produce the same results), I expect the issue is due to something numpy is doing under the hood.

see the answer here: https://stackoverflow.com/questions/19839539/how-to-get-faster-code-than-numpy-dot-for-matrix-multiplication

It's possible that something in the datatypes changed which is preventing numpy from calling the BLAS library, though I wouldn't expect that to produce significant differences in the final matrix; maybe it only shows up when the small differences are summed across a large number of items.

Next step is to confirm the user_affinity and item_similarity matrices are identical across versions and to check the datatypes and numpy flags, then maybe try gemm directly.
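
A sketch of that check, assuming both the old and new models have been fit on the same split and expose `user_affinity` / `item_similarity` attributes (`model_old` / `model_new` are placeholder names):

```python
import numpy as np
import scipy.sparse as sp

def to_dense(m):
    """Return a plain ndarray whether m is a scipy sparse matrix or already dense."""
    return np.asarray(m.todense()) if sp.issparse(m) else np.asarray(m)

for name in ("user_affinity", "item_similarity"):
    a = to_dense(getattr(model_old, name))
    b = to_dense(getattr(model_new, name))
    print(name, a.dtype, b.dtype, "allclose:", np.allclose(a, b))

# dtype and memory layout matter for whether numpy hands the dense dot() to BLAS
print(to_dense(model_new.item_similarity).flags)
```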

anargyri commented

The stackoverflow entry may not be very relevant because the matrices involved are sparse. Actually, the types of the matrices changed between master and staging, as did the jaccard etc. functions that produce item_similarity. I also changed the data types to single-precision float (I think the code by Max resulted in integers and double-precision floats in some places).
So the picture is a bit complicated. What you say about comparing the matrices and their product makes sense.

gramhagen commented

Right @anargyri, as you mentioned, the item_similarity object type changed from a numpy matrix to a scipy sparse CSC matrix; apparently calling the dot function on that is much slower than on a dense matrix.

But the good news is it's quite easy to convert it back to dense before passing it into the dot function. I tested that and it seems to resolve the speed issue. I am testing now to see how that change impacts the results.
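
As a rough illustration of the effect (toy sizes and densities, not the real matrices):

```python
import time
import scipy.sparse as sp

# Toy stand-ins: a sparse user-affinity matrix (CSR) and a ~50%-dense
# item-similarity matrix stored as CSC, mirroring the shape of the problem.
user_affinity = sp.random(5000, 2000, density=0.01, format="csr", random_state=0)
item_similarity = sp.random(2000, 2000, density=0.5, format="csc", random_state=1)

t0 = time.time()
scores_sparse = user_affinity.dot(item_similarity)            # sparse x sparse (staging)
t1 = time.time()
scores_dense = user_affinity.dot(item_similarity.todense())   # sparse x dense (the fix)
t2 = time.time()

print(f"sparse.dot(sparse): {t1 - t0:.3f}s  sparse.dot(dense): {t2 - t1:.3f}s")
```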

anargyri commented

Also, the efficiency of dot() depends on the types of the sparse matrices. I haven't found a recommendation in scipy about this, so I chose user_affinity to be CSR and item_similarity to be CSC.
It is probably relevant that item_similarity is not really sparse (maybe ~50% or so), so there may be no advantage to sparse multiplication on that line; the products before it should still benefit from sparse matrix types.
Does the dense matrix come at a cost of more memory?
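
A quick way to check how dense item_similarity actually ends up, and what a dense copy would cost in memory (the `model` object and attribute name are assumed):

```python
# Fraction of explicitly stored non-zeros in the sparse similarity matrix;
# a value around 0.5 means it is effectively dense, so a dense product is
# likely cheaper than sparse multiplication for the scoring step.
sim = model.item_similarity
density = sim.nnz / (sim.shape[0] * sim.shape[1])
print(f"item_similarity density: {density:.2%}, memory if dense: "
      f"{sim.shape[0] * sim.shape[1] * sim.dtype.itemsize / 1e9:.2f} GB")
```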

gramhagen commented

It doesn't seem like we're saving much memory by using the sparse matrix. Here are plots of memory usage over time (for the fit method on the 10M dataset).

Current Staging: [memory usage plot]

Current Master: [memory usage plot]

So we get a ~15% reduction in memory.
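
For reference, plots like these are typically produced with memory_profiler's `mprof run` / `mprof plot`; the same numbers can also be sampled programmatically (a sketch, with `model` and `train` as placeholder names):

```python
# Sample memory usage of fit() over time with the memory_profiler package.
from memory_profiler import memory_usage

samples = memory_usage((model.fit, (train,)), interval=0.5)
print(f"peak: {max(samples):.1f} MiB, baseline: {min(samples):.1f} MiB, "
      f"{len(samples)} samples")
```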


gramhagen commented Jan 31, 2019

Here are the results from a speed/eval standpoint. In the 'fix' version I just changed the final calculation in fit() to: `self.scores = self.user_affinity.dot(self.item_similarity.todense())`

You can see the 'fix' version is still slightly slower than 0.1.1 (at 10M), and the metrics are degraded, but I think this is just due to using single- vs. double-precision floating point.

| Version | dataset | fit (s) | MAP | NDCG | precision@10 | recall@10 |
|---|---|---|---|---|---|---|
| SAR 0.1.1 | ml 100k | 0.5262 | 0.105815 | 0.37319 | 0.326617 | 0.175957 |
| SAR Staging | ml 100k | 0.6034 | 0.105815 | 0.373197 | 0.326617 | 0.175957 |
| SAR Staging Fix | ml 100k | 0.4761 | 0.105815 | 0.373197 | 0.326617 | 0.175957 |
| SAR 0.1.1 | ml 1m | 6.2893 | 0.064013 | 0.308012 | 0.277215 | 0.109292 |
| SAR Staging | ml 1m | 9.9825 | 0.064013 | 0.308012 | 0.277215 | 0.109292 |
| SAR Staging Fix | ml 1m | 5.3546 | 0.064013 | 0.308012 | 0.277215 | 0.109292 |
| SAR 0.1.1 | ml 10m | 132.1281 | 0.101402 | 0.321073 | 0.275766 | 0.156483 |
| SAR Staging | ml 10m | 619.7484183 | 0.066775 | 0.249102 | 0.216268 | 0.106777 |
| SAR Staging Fix | ml 10m | 212.5266 | 0.066775 | 0.249102 | 0.216268 | 0.106777 |

gramhagen commented

Thoughts on this? I can certainly update the line of code and 'correct' the test metrics, but I just want to make sure others are OK with that approach.

miguelgfierro commented

This is great. I think the speed issue is solved by the dense multiplication. However, there is still the issue with the metrics: it is very odd that 100k and 1M produce exactly the same results while 10M (and I guess 20M) produces different ones. In SAR there should be no difference in the results across runs, since the algorithm is deterministic.

@gramhagen can you check this?

> the performance is degraded, but I think this is just due to using single vs double floating point precision.

Another question: are the results you are showing in the table float32 or float64?
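
One direct way to test the precision hypothesis would be to build the score matrix in both precisions and compare (a sketch; `user_affinity` and `item_similarity` are assumed to be the fitted matrices already converted to dense ndarrays):

```python
import numpy as np

# Compare the scores computed in single vs. double precision.
scores32 = user_affinity.astype(np.float32).dot(item_similarity.astype(np.float32))
scores64 = user_affinity.astype(np.float64).dot(item_similarity.astype(np.float64))

diff = np.abs(scores64 - scores32.astype(np.float64))
print("max abs diff:", diff.max(), "mean abs diff:", diff.mean())
# If the differences are tiny relative to the score magnitudes, precision
# alone is unlikely to explain the drop in the ranking metrics.
```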


anargyri commented Feb 1, 2019

Nice plots! Good catch about the time efficiency. Since item_similarity is not really sparse, it makes sense; let's keep the change you made.
Regarding the metrics, it is unlikely that the discrepancies are due to this line of code. It would also be strange if it were due to precision: when the data set increases by a factor of 10 (maybe more than 10 if you consider the size of the user-item matrix, but still less than 100), we go from matching to 6 decimal digits to matching to only 1.
I suspect the discrepancy is due to something else. One thing that might help is to locate the first commit modifying this function in which the discrepancy appears.

gramhagen commented

Results above are from 0.1.1 using float64, while staging uses float32.
I agree it's quite strange to see such a dramatic change just from switching the precision, and only on larger datasets. I'll do some more digging to get to the bottom of it.

gramhagen commented

Stalled a bit on this because, after staring at the code for so long, I was compelled to clean it up. Anyway, the issue is resolved; from what I can tell it derived from numpy doing something funky when applying the dot product across a CSR and a CSC matrix. The changes to jaccard and lift slowed down the computation a bit, so I reverted them and silenced the divide-by-zero warnings (the result is still correct regardless). There are a few more changes, but the metrics are back on track as well as the speed, so I think we're in good shape. I'll push a PR shortly.
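
For context, the divide-by-zero silencing follows the usual numpy pattern; an illustrative jaccard computation on a toy co-occurrence matrix (not the repo's actual code):

```python
import numpy as np

# Toy item co-occurrence matrix; the diagonal holds per-item counts.
cooccurrence = np.array([[4.0, 2.0, 0.0],
                         [2.0, 3.0, 1.0],
                         [0.0, 1.0, 0.0]])
diag = cooccurrence.diagonal()

# jaccard(i, j) = c_ij / (c_ii + c_jj - c_ij); items that never occur
# yield 0/0, so suppress the warning and zero out the resulting NaNs.
with np.errstate(invalid="ignore", divide="ignore"):
    jaccard = cooccurrence / (diag[:, None] + diag[None, :] - cooccurrence)
jaccard = np.nan_to_num(jaccard)
print(jaccard)
```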

miguelgfierro commented

Hey @gramhagen, is this issue solved in the PR you are working on?

gramhagen commented

Closing this now that #485 is merged.
