[RUNTIME] Add min_repeat_ms to time_evaluator #2200

merrymercy · 2018-11-30T03:05:09Z

min_repeat_ms sets the minimum duration of one measurement repeat and is already used by autotvm for its measurements.
Because it makes measurements more accurate and adjusts the run count automatically, this PR moves it into the general time_evaluator API so that everyone can use it.

cc @eqy @tqchen @sgrechanik-h

sgrechanik-h

Is it also possible to add some tests checking that this algorithm, which automatically adjusts the number of timed runs, works as expected?

sgrechanik-h · 2018-11-30T09:25:46Z

python/tvm/module.py

@@ -139,26 +139,38 @@ def time_evaluator(self, func_name, ctx, number, repeat=1):
            The context we should run this function on.

        number: int
-            The number of steps used in measuring each time interval
+            The number of times to run this function for taking average.
+            We call this as one `repeat` of measurement.


It's not very clear from this description what we call a repeat of measurement. (The description of min_repeat_ms makes everything clearer, though.)

sgrechanik-h · 2018-11-30T09:46:28Z

src/runtime/rpc/rpc_session.cc

+                             int number,
+                             int repeat,
+                             int min_repeat_ms) {
+  auto ftimer = [pf, ctx, &number, repeat, min_repeat_ms](TVMArgs args, TVMRetValue *rv) {


Why is the local variable number captured by reference here? The lambda outlives the local scope, so this might be a bug.

merrymercy · 2018-12-26

It turns out that TVM packed functions do not support capturing by reference. I updated the code to capture by value.

sgrechanik-h · 2018-11-30T09:48:20Z

src/runtime/rpc/rpc_session.cc

+
+        if (duration_ms < min_repeat_ms) {
+          number = static_cast<int>(std::max((min_repeat_ms / (duration_ms / number) + 1),
+                                              number * 1.618));


What is 1.618?
Also, using ceil here might be better than adding 1.

eqy · 2018-11-30T18:59:26Z

btw, do we need this branch here if the loop will exit if the condition is met?

merrymercy · 2018-12-26

> What is 1.618?
> Also, using ceil here might be better than adding 1.

It is the golden ratio: https://en.wikipedia.org/wiki/Golden_ratio

Precision is not very important here, as I want to encourage it to set a higher number.
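To make the two options in this thread concrete, here is a minimal Python sketch of the update rule from the diff next to the ceil variant suggested in review. The function names are hypothetical and not part of TVM. The sketch assumes duration_ms is positive and truncates toward zero the way static_cast<int> does for positive values.

```python
import math

GROWTH = 1.618  # growth factor from the diff under review


def next_number_plus_one(number, duration_ms, min_repeat_ms):
    """Mirror the update in the diff: estimate + 1, then truncate to int."""
    per_run_ms = duration_ms / number
    return int(max(min_repeat_ms / per_run_ms + 1, number * GROWTH))


def next_number_ceil(number, duration_ms, min_repeat_ms):
    """Review suggestion: round the estimate up with ceil instead of adding 1."""
    per_run_ms = duration_ms / number
    return int(max(math.ceil(min_repeat_ms / per_run_ms), number * GROWTH))
```

With number=10, duration_ms=20.0 and min_repeat_ms=100, the first variant returns 51 and the second returns 50. When the estimate is smaller than number * 1.618, the growth term wins and both variants return the same value.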

sgrechanik-h · 2018-11-30T09:58:07Z

src/runtime/rpc/rpc_session.cc

+
+        duration_ms = std::chrono::duration_cast<std::chrono::duration<double> >
+            (tend - tbegin).count() * 100;
+


If I understand correctly, here we rerun the whole process until we find the right number of iterations. An alternative would be to run only the additional iterations still needed (the difference between the required number and the number already run) and add their duration to the total. That approach may behave slightly differently: it may be a bit faster but a bit less precise. I'm not sure, so I would like to see more comments in the code describing the algorithm and why this particular one was chosen.

merrymercy · 2018-12-26

@sgrechanik-h We cannot use the accumulation mode, for the reason explained by eqy.
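The rerun strategy discussed in this thread can be sketched in Python as follows. This is an illustration under stated assumptions, not the C++ code from the PR: run_batch is a hypothetical callable that runs the kernel n times and returns the elapsed time in milliseconds, and it is assumed to return a positive value.

```python
def measure_repeat(run_batch, number, min_repeat_ms, growth=1.618):
    """Rerun the whole batch with a larger number until it lasts long enough.

    run_batch(n) is a hypothetical callable: it runs the kernel n times and
    returns the elapsed time in milliseconds (assumed to be positive).
    Returns (mean_ms_per_run, final_number).
    """
    duration_ms = run_batch(number)
    while duration_ms < min_repeat_ms:
        per_run_ms = duration_ms / number
        number = int(max(min_repeat_ms / per_run_ms + 1, number * growth))
        # The short batch is discarded; a fresh, longer batch is timed instead
        # of accumulating the extra iterations onto the previous duration.
        duration_ms = run_batch(number)
    return duration_ms / number, number
```

For example, with a simulated kernel that takes 2 ms per run, `measure_repeat(lambda n: 2.0 * n, 10, 100)` first times 10 runs (20 ms), then reruns with 51 runs (102 ms) and stops.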

eqy · 2018-11-30T03:07:34Z

python/tvm/autotvm/measure/measure_methods.py

-        are not precise enough to capture short-running tasks. This parameter is
-        also critical when devices need a certain minimum running time to "warm
-        up," such as GPUs that need time to reach a performance power state.
+        where the first one is warm up and will be discarded.


Maybe change this to "plus an additional warm up run that will be discarded." It currently sounds like it means (number - 1) x repeat.

eqy · 2018-11-30T18:57:12Z

src/runtime/rpc/rpc_session.cc

+                             int number,
+                             int repeat,
+                             int min_repeat_ms) {
+  auto ftimer = [pf, ctx, &number, repeat, min_repeat_ms](TVMArgs args, TVMRetValue *rv) {
    TVMRetValue temp;
    std::ostringstream os;
    // skip first time call, to activate lazy compilation components.
    pf.CallPacked(args, &temp);


I wonder if this definition (1 + number * repeat) is the correct formulation after we have introduced min_repeat_ms. The goal is to start measurement in the correct power state, which we will likely do if we bump up number over and over again for the same time_evaluator call. However, let's say that number is now sufficient and we get to a fresh time_evaluator call. In this case I am not sure 1+ will be enough to get the hardware into the right state if necessary. Should we consider number*(1+repeat)?

merrymercy · 2018-12-26

Yes, the definition is not correct. I will add a note to the doc string of min_repeat_ms but keep this definition here.
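The two formulations compared in this thread differ only in how many calls are discarded as warm-up. A small Python sketch of the arithmetic, with hypothetical function names, assuming number is not enlarged by min_repeat_ms during the measurement:

```python
def total_calls_current(number, repeat):
    """One discarded warm-up call plus number calls in each of repeat repeats."""
    return 1 + number * repeat


def total_calls_proposed(number, repeat):
    """Alternative raised in review: a full discarded warm-up batch of number calls."""
    return number * (1 + repeat)
```

With number=10 and repeat=3, the current definition gives 31 calls and the proposed one gives 40, so the warm-up phase grows from 1 call to 10 calls.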

eqy · 2018-11-30T18:59:26Z

src/runtime/rpc/rpc_session.cc

+
+        if (duration_ms < min_repeat_ms) {
+          number = static_cast<int>(std::max((min_repeat_ms / (duration_ms / number) + 1),
+                                              number * 1.618));


btw, do we need this branch here if the loop will exit if the condition is met?

eqy · 2018-12-10T23:15:01Z

src/runtime/rpc/rpc_module.cc

@@ -124,7 +124,8 @@ class RPCModuleNode final : public ModuleNode {
  PackedFunc GetTimeEvaluator(const std::string& name,
                              TVMContext ctx,
                              int number,
-                              int repeat) {
+                              int repeat,
+                              int min_repeat_ms) {


Does this break some current tests if we do not give a default value for min_repeat_ms?

merrymercy · 2018-12-26

I added a default argument on the Python side.

tqchen · 2018-12-22T17:59:53Z

@merrymercy what is the status of this PR?

sgrechanik-h · 2018-12-26T16:11:46Z

src/runtime/rpc/rpc_session.cc

    TVMRetValue temp;
    std::ostringstream os;
    // skip first time call, to activate lazy compilation components.
    pf.CallPacked(args, &temp);
    DeviceAPI::Get(ctx)->StreamSync(ctx, nullptr);
+    int dynamic_number = number;


You used to modify number directly, which had the nice property of remembering a suitable value of number between runs. I think you can still achieve this effect by declaring the lambda as mutable (it won't be thread-safe though, so I'm not sure).
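As a rough Python analogy for the "remember number between runs" behavior described here (C++ uses a by-value capture in a mutable lambda; Python uses nonlocal), the sketch below keeps the adapted value inside a closure. make_timer and run_batch are hypothetical names, and run_batch is assumed to return a positive duration in milliseconds. Like the mutable lambda, this shared state is not protected against concurrent calls.

```python
def make_timer(run_batch, number, min_repeat_ms, growth=1.618):
    """Return a timer that remembers the adapted number between calls."""

    def timer():
        nonlocal number  # rebinding the captured variable keeps it for the next call
        duration_ms = run_batch(number)
        while duration_ms < min_repeat_ms:
            per_run_ms = duration_ms / number
            number = int(max(min_repeat_ms / per_run_ms + 1, number * growth))
            duration_ms = run_batch(number)
        return duration_ms / number  # mean time per run in milliseconds

    return timer
```

The first call pays for the search (a short batch followed by a longer one); later calls start directly from the value found earlier.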

sgrechanik-h · 2018-12-26T16:12:49Z

src/runtime/rpc/rpc_session.cc

+
+        dynamic_number = static_cast<int>(
+            std::max((min_repeat_ms / (duration_ms / dynamic_number) + 1),
+                     dynamic_number * 1.618));


The choice of the constant needs an explanation inside the code.

merrymercy · 2018-12-26

Halide uses 2, but I think there is no "correct" number, so the value is an arbitrary choice.

merrymercy · 2018-12-26T16:40:38Z

@eqy please review again

tqchen · 2018-12-31T18:54:48Z

ping @eqy @sgrechanik-h please take another look. If there are no further comments in 24 hours, we can go ahead and merge this PR.

tqchen · 2019-01-01T19:00:12Z

Thanks @merrymercy @eqy @sgrechanik-h, this is merged.

merrymercy changed the title ~~Add min_repeat_ms time_evaluator~~ [RUNTIME] Add min_repeat_ms to time_evaluator Nov 30, 2018

merrymercy force-pushed the enhance_time_evaluator branch 2 times, most recently from 5f99d62 to f588537 Compare November 30, 2018 04:23

sgrechanik-h suggested changes Nov 30, 2018

eqy reviewed Nov 30, 2018

icemelon added the status: need update label Dec 7, 2018

eqy reviewed Dec 10, 2018

apache deleted a comment from eqy Dec 26, 2018

merrymercy force-pushed the enhance_time_evaluator branch 3 times, most recently from 684edd4 to e58d6c3 Compare December 26, 2018 15:51

[RUNTIME] Move min_repeat_ms to c++ runtime

3587c6b

merrymercy force-pushed the enhance_time_evaluator branch from e58d6c3 to 3587c6b Compare December 26, 2018 15:52

sgrechanik-h reviewed Dec 26, 2018

fix test

27c1477

use mutable

6d4768b

merrymercy added the status: need review label and removed the status: need update label Dec 28, 2018

tqchen approved these changes Dec 31, 2018

tqchen merged commit b118848 into apache:master Jan 1, 2019

tqchen added status: accepted and removed status: need review labels Jan 1, 2019

merrymercy deleted the enhance_time_evaluator branch January 3, 2019 03:19

FrozenGene pushed a commit to FrozenGene/tvm that referenced this pull request Jan 10, 2019

[RUNTIME] Add min_repeat_ms to time_evaluator (apache#2200)

6f70b82

ZihengJiang mentioned this pull request Feb 1, 2019

TVM 0.5 Release Note #2448

Closed

wweic pushed a commit to neo-ai/tvm that referenced this pull request Feb 20, 2019

[RUNTIME] Add min_repeat_ms to time_evaluator (apache#2200)

4181600

wweic pushed a commit to neo-ai/tvm that referenced this pull request Feb 20, 2019

[RUNTIME] Add min_repeat_ms to time_evaluator (apache#2200)

85fd26c
