How to avoid NaN and Infinity values in the distance profile? #17
Comments
The source of the problem seems to be movStd in Mass.java. If I make the epsilon 1e-10 or 1e-40, the computed result is now
for distances, and
for indices.
Though the above suggestion is an improvement, it does not fix all the cases of NaN and Infinity.
For this the profile is:
The problem in this case seems to be at the end of the mass method, where we do the following
The values in res are all very close to the value of m, but not quite, so when r1 is computed the values are very near 0, and even less than 0 in some cases. The sqrt makes those values NaN. If I change add(m) to add(m + 0.01), then the matrix profile is this
This seems better, but I don't know if it's correct. It also affects all other test cases slightly, and there are still other cases where NaN and Infinity appear.
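To illustrate the failure described above, here is a minimal sketch in plain Java. It assumes the final step has the form sqrt(2 * (m - res)), which is my reading of the description and not confirmed against Mass.java. The helper name sqrtClamped and the sample value for res are hypothetical. Clamping at 0 before the sqrt is an alternative to shifting by m + 0.01 that leaves values that are already positive untouched.

```java
final class SafeSqrt {
    /** Square root that treats a tiny negative rounding residue as zero. */
    static double sqrtClamped(double squaredDistance) {
        // Math.max returns NaN if its argument is NaN, so real NaNs still show up.
        return Math.sqrt(Math.max(squaredDistance, 0.0));
    }

    public static void main(String[] args) {
        double m = 5.0;                    // window size
        double res = 5.000000000000001;    // hypothetical value that lands just above m due to rounding
        double squared = 2.0 * (m - res);  // slightly negative instead of exactly 0

        System.out.println(Math.sqrt(squared));   // NaN
        System.out.println(sqrtClamped(squared)); // 0.0
    }
}
```

With a clamp like this an exact match reports a distance of 0, whereas adding a constant to m shifts every distance in the profile.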
I just read the slides about MASS in https://www.cs.ucr.edu/~eamonn/Matrix_Profile_Tutorial_Part2.pdf again. There is a slide on the 3 sources of numerical error that can arise. I think the current Java code does not have safeguards against all of these forms of error. I will try to add some.
I was able to address the second 2 sources of numerical instability by adding
Done by preventing the stdv (standard deviation) from going all the way to 0.
Address 2 sources of numerical instability by adding BooleanIndexing.replaceWhere(res, EPS, Conditions.lessThanOrEqual(EPS)); right before the 2 places where sqrt is taken in the Java mass method. This prevents the stdDev from becoming 0, and prevents a negative distance from becoming NaN in the final result. All results for distances and indices in simple test cases work much better now. There are no more NaNs or Infinities in the results. Refer to https://www.cs.ucr.edu/~eamonn/Matrix_Profile_Tutorial_Part2.pdf to see the slide on 3 possible sources of numerical instability. Added some additional tests that had lots of NaNs and Infinities before this change, but now behave much better.
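For readers without ND4J at hand, the effect of the replaceWhere call above can be sketched on a plain double[]. This is an illustration only, not the code from the commit: the class EpsFloor, the method floorAt, and the EPS value of 1e-10 are assumptions made for the example.

```java
final class EpsFloor {
    static final double EPS = 1e-10; // hypothetical value; the actual EPS in Mass.java may differ

    /** Returns a copy in which every value <= eps is replaced by eps. */
    static double[] floorAt(double[] values, double eps) {
        double[] out = values.clone();
        for (int i = 0; i < out.length; i++) {
            if (out[i] <= eps) { // false for NaN, so an existing NaN is left as-is
                out[i] = eps;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Zero, a tiny negative rounding residue, and an ordinary value.
        double[] raw = {0.0, -1e-17, 2.5};
        for (double v : floorAt(raw, EPS)) {
            double root = Math.sqrt(v);
            // root is finite and positive, so neither sqrt nor a later division misbehaves
            System.out.println(root + " -> 1/root = " + (1.0 / root));
        }
    }
}
```

One consequence of flooring at EPS instead of 0 is that an exact match reports a distance of about sqrt(EPS) instead of exactly 0, which may be why the other test cases shifted slightly.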
Bump mineset version to 0.0.6-mineset.
Fixed, merged.
While adding more tests, I noticed that there are a number of cases where the distance part of the matrix profile can have Infinity and NaN values. One simple example (but not the only one) is a straight line.
If I run matrix profile on this straight line series of 10 y values
with a window of 5, then I get MP distance values of
And MP index values of
There are 6 values in the profile (as expected). The length should be 10 – windowSize + 1 = 6.
I don’t understand yet why there are NaN values. I was expecting all 0’s for the distances. Maybe it has to do with z-normalization.
I will look at the code more closely and try to make a proposal and perhaps PR to avoid these values.
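As a reference point for the expectation of all 0's, here is a small sketch that computes the z-normalized Euclidean distance directly, without the FFT-based dot products that MASS uses. The series 1..10 is a hypothetical stand-in for a straight line (not the series from this report), and the class and method names are made up for the example.

```java
final class ZNormDistance {
    /** Z-normalizes a window: subtract the mean, divide by the population standard deviation. */
    static double[] zNormalize(double[] w) {
        double mean = 0.0;
        for (double v : w) mean += v;
        mean /= w.length;

        double var = 0.0;
        for (double v : w) var += (v - mean) * (v - mean);
        double std = Math.sqrt(var / w.length);

        double[] z = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            z[i] = (w[i] - mean) / std; // 0.0 / 0.0 = NaN when the window is flat
        }
        return z;
    }

    /** Euclidean distance between the z-normalized forms of two equal-length windows. */
    static double distance(double[] a, double[] b) {
        double[] za = zNormalize(a);
        double[] zb = zNormalize(b);
        double sum = 0.0;
        for (int i = 0; i < za.length; i++) {
            double d = za[i] - zb[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[] w1 = {1, 2, 3, 4, 5};   // window at offset 0 of the hypothetical series 1..10
        double[] w2 = {4, 5, 6, 7, 8};   // window at offset 3
        double[] flat = {2, 2, 2, 2, 2}; // constant window, standard deviation 0

        System.out.println(distance(w1, w2));     // effectively 0: same shape after normalization
        System.out.println(distance(flat, flat)); // NaN
    }
}
```

So z-normalization by itself only produces NaN when a window is perfectly flat. For a sloped line the direct computation gives 0, which suggests the NaN values come from rounding in the way the distance is assembled, not from the definition of the distance.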