[MXNET-564] Fix the Flaky test on arange #11377
Source to the changes on the number:
Guys, do you think this problem is more a question of differences between the JVM and C?
84627a4 to e2f0330
Please also note that test_arange is failing in Python: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/1053/pipeline/ This could indicate a problem with the underlying implementation.
Hi @marcoabreu I think this issue is more related to CUDA, please see the message shown below:
Oops, my bad! Thanks for pointing it out. Then it's already tracked at #11395.
@@ -30,4 +30,13 @@ object CheckUtils {
    val norm: Float = a.reduce(Math.abs(_) + Math.abs(_))
    diff / norm
  }

  def almost_equal(a: Array[Float], b: Array[Float],
Is this used anywhere? Can we bring it in when it becomes necessary?
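For readers wondering what a helper like almost_equal might do, here is a minimal sketch. The hunk above is truncated, so this is not the PR's implementation: the name `almostEqual`, the `rtol`/`atol` parameters and their defaults are assumptions, modelled on the NumPy-style check |a - b| <= atol + rtol * |b|.

```scala
object ToleranceCheck {
  // Hypothetical helper, NOT the code from this PR.
  // Element-wise closeness in the NumPy style: |x - y| <= atol + rtol * |y|.
  // The default tolerances are illustrative assumptions.
  def almostEqual(a: Array[Float], b: Array[Float],
                  rtol: Float = 1e-4f, atol: Float = 1e-6f): Boolean =
    a.length == b.length &&
      a.zip(b).forall { case (x, y) =>
        math.abs(x - y) <= atol + rtol * math.abs(y)
      }
}
```

Unlike a summed relative difference, this check fails as soon as a single element is out of tolerance, and arrays of different lengths never compare equal.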
@@ -233,8 +233,13 @@ class OperatorSuite extends FunSuite with BeforeAndAfterAll
val start = scala.util.Random.nextFloat() * 5 |
This is a good find @lanking520 @andrewfayres. Precision is lost in the fractional part once the value needs more digits than a float can hold. Declaring start, stop, and step as doubles would solve the problem. I would use a plain Double here instead of BigDecimal, since BigDecimal is slow and unnecessary, and this code pattern would soon start trickling into other tests. See the example Andrew showed me: it doesn't work with floats, while with doubles the max ends up very, very close to the upper bound.
scala> ( 3.969419d until 96.70541d by .00005197525d).flatMap(x=>Array.fill[Double](1)(x)).last
res3: Double = 96.70537523623696
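A rough sketch of why doubles are enough here, using the same three numbers as the REPL session. The `Double until Double by` syntax is deprecated in newer Scala versions, so this version computes elements by index instead; the object and method names are made up for illustration. It also prints the spacing between adjacent floats near the stop value, which is 2^-17 (about 7.6e-6), so the 5.2e-5 step is only around seven float ulps wide.

```scala
object ArangePrecision {
  // Values copied from the REPL session above.
  val start = 3.969419d
  val stop  = 96.70541d
  val step  = 0.00005197525d

  // Number of elements in the half-open range [start, stop).
  val n: Int = math.ceil((stop - start) / step).toInt

  // Index-based generation: each element carries the rounding error of
  // one multiply and one add, instead of accumulating error across steps.
  def elem(i: Int): Double = start + i * step

  def main(args: Array[String]): Unit = {
    println(s"count = $n, last = ${elem(n - 1)}")
    // Gap between neighbouring representable values near `stop`.
    println(s"float ulp near stop  = ${math.ulp(stop.toFloat)}")
    println(s"double ulp near stop = ${math.ulp(stop)}")
  }
}
```

With Double the last element stays below the stop value by a comfortable fraction of a step; with Float, a rounding error of up to half an ulp is already several percent of one step at this magnitude.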
e2f0330 to bf4ef50
@nswamy Thanks Naveen for your feedback, I have made the changes accordingly. Tested locally with 100k runs.
* Fix Flaky Test - Change float to double to increase the precision of numbers generated in scala
Description
@nswamy @yzhliu @andrewfayres @anirudh2290
Please see the fix for the issue: #10387
It may help with this one as well: #8383
After discussion with @andrewfayres, we found that the problem was on the Scala side:
* the last value of the arange is larger than the stop value
* multiple numbers in the Array are different.

We think this is a bug in Scala, as the function is reported as deprecated in the log. As @reminisce recommended, we can increase the minimum value of the step, since nobody uses a step at the E-05 level. In summary, there are several ways we can solve this problem:
* Change the precision level of arange to E-03 and remove the last value of the array. Doesn't help, problem persists.
* Change the Scala side to use while loops and set the precision level to E-06. Doesn't help, problem still there.
* Change the measurement from reid_diff into almost_equal, which is mostly used in Python, with rtol = true and precision level E-04. Doesn't help, problem still there.
* Use BigDecimal and a while loop to test. Passed!
* Run multiple times with fixed numbers for start, stop and step, since the goal here is to check that arange works correctly.
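The BigDecimal-plus-while-loop option can be sketched roughly as below. This is an illustration under assumptions, not the code from the PR: the object name, the `arange` signature and the choice to pass the bounds as decimal strings (so they are represented exactly) are all hypothetical.

```scala
import scala.collection.mutable.ArrayBuffer

object BigDecimalArange {
  // Hypothetical reference generator, NOT the PR's code.
  // Accumulates in exact decimal arithmetic and converts each element
  // to Float only when it is stored.
  def arange(start: String, stop: String, step: String): Array[Float] = {
    val end = BigDecimal(stop)
    val inc = BigDecimal(step)
    require(inc.signum > 0, "step must be positive")
    val out = ArrayBuffer.empty[Float]
    var cur = BigDecimal(start)
    while (cur < end) {
      out += cur.toFloat
      cur += inc
    }
    out.toArray
  }
}
```

For example, `BigDecimalArange.arange("0", "1", "0.1")` yields ten elements, the last of which is below 1. One caveat: the loop condition is checked in BigDecimal, but a value just below the stop can still round up to the stop when converted to Float, so a comparison against Float output still needs a tolerance.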
We cannot reproduce the same issue in Python after 1M tests run by @haojin2. Python uses numpy, which takes Float as Double in its calculations and so keeps high accuracy. This PR implements the ticked solution shown above. Currently, I keep 100000 runs in the code to make sure CI passes as well. I will consider removing that when we merge into master.
Generally speaking, this issue shows up very rarely. I tried 100k runs and the issue is 70% reproducible. With the amended solution, I ran 100k runs 10 times and all of them passed. Since we already have a concrete Python test for this operator, consider removing this test in the near future.
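A repeated-run check like the one described could be structured as in this sketch. Everything here is hypothetical: `countFailures` is a made-up helper, and the example property only uses the `nextFloat() * 5` draw for start that appears in the diff, while the draws for stop and step are assumptions. It does not call MXNet; it only checks whether Float arithmetic keeps the last index-based element below the stop value.

```scala
import scala.util.Random

object FlakyHarness {
  // Runs `trial` `runs` times with a seeded RNG and returns the number of failed runs.
  def countFailures(runs: Int, seed: Long)(trial: Random => Boolean): Int = {
    val rng = new Random(seed)
    var failures = 0
    var i = 0
    while (i < runs) {
      if (!trial(rng)) failures += 1
      i += 1
    }
    failures
  }

  // Illustrative property, computed entirely in Float on purpose:
  // the last index-based element should stay below `stop`.
  // The ranges for stop and step are assumptions, not taken from the PR.
  def floatLastBelowStop(rng: Random): Boolean = {
    val start = rng.nextFloat() * 5
    val stop  = start + rng.nextFloat() * 100
    val step  = rng.nextFloat() * 4
    if (step <= 0f || stop <= start) {
      true // degenerate draw, nothing to check
    } else {
      val n = math.ceil((stop - start) / step).toInt
      start + (n - 1) * step < stop
    }
  }

  def main(args: Array[String]): Unit =
    println(s"float failures: ${countFailures(100000, 42L)(floatLastBelowStop)}")
}
```

Seeding the generator makes a failing batch repeatable, which helps when a failure only appears once in many thousands of draws.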
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.