More robust timing tests #1136

gilles-peskine-arm · 2017-10-10T18:42:18Z

The primary objective of this PR is to fix the issue we have with test_suite_timing and the TIMING selftest on the CI, which sometimes fail due to heavy load. I also fixed some minor bugs in the timing module.

This PR introduces new timing tests that test the functions in the timing module mostly independently. The old self-test is still present in the code but will not be executed by the test suite, like other self-tests that we judge to be too time-consuming. CI still runs it once through programs/test/selftest.

Compared with the self-tests:

The new tests are more modular and print some detailed information when they fail.
I don't test mbedtls_set_alarm with delays of more than 1 second, which makes the test run for about 2s instead of 8s.
The timing self-test is more tolerant and shorter.

There are some test ideas I discarded in https://github.com/gilles-peskine-arm/mbedtls/tree/IOTSSL-1798-development-1 . The present PR is https://github.com/gilles-peskine-arm/mbedtls/tree/IOTSSL-1798-development-2 with a few minor fixes and a cleaned up history.

Incidental changes in this PR:

Fix read from an uninitialized integer object in mbedtls_timing_get_timer on Windows (internal ref: IOTSSL-1822)
Improved documentation of the timing API.
Fix mbedtls_set_alarm(0) on Unix/POSIX.
In the selftest program, support running a subset of the tests with command line arguments.

I've tested several iterations of this PR on my machine and on CI. The CI failures haven't completely gone so far, but they're markedly less frequent.

Partial backport for 2.1: #1223 (bug fixes, self test robustness, selftest command line parsing)
Partial backport for 1.3: #1224 (bug fixes, self test robustness)

Internal ref: IOTSSL-1798

gilles-peskine-arm · 2017-10-11T16:20:13Z

CI has now passed twice (I rebased a couple of times).

mazimkhan

Overall looks good. Just few suggestions on robustness of tests given they would run in CI environment.

mazimkhan · 2017-10-13T08:44:33Z

tests/suites/test_suite_timing.function

+   first and run other test functions only if that timing_timer_simple
+   succeeded. */
+
+#include <limits.h>


Is it available on Windows and target platforms? I know it is part of standard library. But seems like it comes as part of linux-headers on Linux.

limits.h is usually shipped with the compiler rather than with libc, so it's available everywhere (the C standard term is that it must be present on freestanding implementations). On Linux it comes with gcc.

I addition to what Gilles said, we also use this header in the library (including in check_config.h) so if their was an issue with it, we would know about it.

mazimkhan · 2017-10-16T10:04:22Z

tests/suites/test_suite_timing.function

 {
-    TEST_ASSERT( mbedtls_timing_self_test( 1 ) == 0 );


Does this test depend on mbedtls_timing_self_test as pre-requisite? If not wouldn't it be better to keep it as seperate distinct test in .data file?

These unit tests are completely independent from mbedtls_timing_self_test. We don't run all the self-tests in the unit tests, some of the more time-consuming ones are only run via programs/test/selftest. The timing test is the most time-consuming by far on a PC-class processor (it takes about 8s to run regardless of the platform — a fast PC can zip through all the other unit tests in less than that).

mazimkhan · 2017-10-16T10:06:44Z

tests/suites/test_suite_timing.function

+static int expected_delay_status( uint32_t int_ms, uint32_t fin_ms,
+                                  unsigned long actual_ms )
+{
+    return( fin_ms == 0 ? -1 :


Is int_ms == 0 as valid condition for the test.

Yes, int_ms is valid. We didn't document the API very well, but int_ms=0 is a timer that expires immediately, whereas fin_ms=0 is a diabled timer.

mazimkhan · 2017-10-16T10:20:19Z

tests/suites/test_suite_timing.function

+    unsigned long millis = 0;
+    unsigned long iterations = 0;
+    /* Start the timer. */
+    (void) mbedtls_timing_get_timer( &timer, 1 );


Although it is existing code. But I don't understand why there can't be a separate function to init/reset the timing structure. Would be been very readable.

Indeed I wouldn't have called a function to set a timer get_timer, but I'm only testing the API, not changing it.

mazimkhan · 2017-10-16T10:27:13Z

tests/suites/test_suite_timing.function

+        ++iterations;
+        millis = mbedtls_timing_get_timer( &timer, 0 );
+    }
+    while( millis < TIMING_SHORT_TEST_MS );


Is the iterations <= TIMING_SHORT_TEST_ITERATIONS_MAX condition missing here? For a bug in mbedtls_timing_get_timer that fails to increment millis the test can be stuck for ever.

I tried to make a compromise between detecting infinite loops, and allowing very fast platforms where it takes a lot of iterations to reach the timeout. If mbedtls_timing_get_timer fails to increment millis, this should have been detected in the previous test case which calls timing_timer_simple.

I'm open to being convinced to do that in every test, but every such check is fragile (I need to hard-code a magic value for the iteration cap) and makes the test code more complex. So I prefer to rely on a higher-level timeout. I wasn't even sure about using a max iteration count in timing_timer_simple.

The check could be in addition to the time check hence not taking long if time is incremented correctly.
Although I understand the difficulty of getting a correct max count that can work with range of targets slow to fast. It would have been good if there was target specific multiplier to find no. of cycles. Even then it is not very portable due to varying instructions. Hence I will leave it up to you.

Would it be possible to ensure that if timing_timer_simple fails, then all following tests fail before even running? I feel this would be a nice way to complete Gille's strategy of checking for infinite loops in one place only.

Execution of the tests is part of the test framework and it has look after execution requirements for all test suites. In this case it makes sense to abort the suite. But we may not want that in other suites. However, we can add a TEST_SUIT_ASSERT that aborts the whole suite.

mazimkhan · 2017-10-16T11:14:18Z

tests/suites/test_suite_timing.function

+           a busy environment such as an overloaded continuous integration
+           system, so we don't test it it. */
+    }
+    while( millis2 < TIMING_SHORT_TEST_MS );


Can we have a mechanism to break out of the loop in case of a bug. See previous comments.

mazimkhan · 2017-10-16T11:19:41Z

tests/suites/test_suite_timing.function

+
+    /* Set an alarm and count how long it takes with a timer. */
+    (void) mbedtls_timing_get_timer( &timer, 1 );
+    mbedtls_set_alarm( seconds );


Should we not validate that seconds > 0?

mazimkhan · 2017-10-16T11:22:51Z

tests/suites/test_suite_timing.function

+        ++iterations;
+        millis = mbedtls_timing_get_timer( &timer, 0 );
+    }
+    while( !mbedtls_timing_alarmed && millis <= millis_max );


Here we are relying on mbedtls_timing_get_timer to work correctly. Watchdog can be based on iterations or in addition to the timer.

mazimkhan · 2017-10-16T11:24:15Z

tests/suites/test_suite_timing.function

+    TEST_ASSERT( millis <= millis_max );
+
+    mbedtls_timing_alarmed = 0;
+    return;


Why is there a need for return?

Because I don't want to print diagnosis information if the test succeeded.

mazimkhan · 2017-10-16T11:24:39Z

tests/suites/test_suite_timing.function

+    int saw_status_1 = 0;
+    int warn_inconclusive = 0;
+
+    assert( int_ms >= 0 );


Why no TEST_ASSERT?

The test function expects int_ms and fin_ms to be unsigned long, but this type is not supported by the test framework, so I use int as the static type and put an additional dynamic type check to verify that the values are nonnegative. I used TEST_ASSERT for test failures, and assert for bad test input (like when there is an error parsing the .data file). Is this not correct?

From execution perspective TEST_ASSERT only aborts a test. assert will abort the whole suite as a suite built as one executable. Do you want to do that if one test has invalid data? Also this may not work as a wider strategy because data can be wrong mistakenly and a user may not put such checks.

I don't have a strong opinion here, since either way tests with bad input will fail reliably and a few seconds of investigation will reveal that the problem is with the input data, and this should be a rare occurrence anyway.

I have however a slight preference for TEST_ASSERT() as it is just as clear (it's fairly obvious to readers that such a statement at the very top of the function is about input validity) and doesn't introduce anything new. But I can totally live with assert() too.

mazimkhan · 2017-10-16T13:42:21Z

tests/suites/test_suite_timing.function

+        TEST_ASSERT( saw_status_1 );
+    }
+
+    return;


Do we really need return here? Also it will make assignment warn_inconclusive = 1 useless.

If the test succeeds, I don't want to print the diagnosis information. The value of warn_inconclusive is used when printing the diagnosis information if the test fails (TEST_ASSERT branches to the exit label if the assertion fails).

mazimkhan · 2017-10-16T14:40:24Z

tests/suites/test_suite_timing.function

+       allows 250ms. */
+    delta1 = hardclock1 - hardclock0;
+    TEST_ASSERT( delta1 > 0 );
+    TEST_ASSERT( delta1 < 1000000000 );


Can we document why we have a big number 1000000000 although the delay was only 2 ms.

The previous comment is supposed to document this (“(…) If the hardclock counter runs at 4GHz, then one billion (which is 1/4 of the counter wraparound on a 32-bit machine) allows 250ms.”). Can you suggest how to clarify it?

mazimkhan · 2017-10-17T09:51:00Z

This PR is critical as CI often fails in timing tests.

mazimkhan · 2017-10-17T09:55:43Z

Running it in Oulu CI as that is most time skewed.

mazimkhan · 2017-10-17T09:58:07Z

Timing selfTest test fails on Oulu CI https://jenkins-internal.mbed.com/job/mbedtls-pr/314/execution/node/206/log/

mpg

Thanks for submitting this PR! Overall the new tests look very good.

Summary of feedback:

suggestion: immediately fail subsequent test if first one failed (feel free to argue against it)
question that might be outside the scope of this PR: are the new diagnostic messages really visible?
then minor style/comments nitpicks.

mpg · 2017-10-18T07:53:24Z

library/timing.c

+        if( verbose != 0 )                                              \
+        {                                                               \
+            mbedtls_printf( "failed at line %d\n", __LINE__ );          \
+            mbedtls_printf( " cycles=%lu ratio=%lu millisecs=%lu secs=%lu hardfail=%d a=%lu b=%lu\n", cycles, ratio, millisecs, secs, hardfail, (unsigned long) a, (unsigned long) b ); \


Style: can we break that long line after the first coma? (Same for the line below.)

mpg · 2017-10-18T07:57:01Z

library/timing.c

@@ -394,17 +402,16 @@ static void busy_msleep( unsigned long msec )
 */
 int mbedtls_timing_self_test( int verbose )
 {
-    unsigned long cycles, ratio;
+    unsigned long cycles = 0, ratio = 0;
    unsigned long millisecs, secs;


While I agree that millisecs and secs are initialised before the first occurrence of FAIL, this might change if we ever re-order tests or add a new one at the beginning, so I'd be more comfortable initializing them too, unless you have a reason not to.

mpg · 2017-10-18T08:09:47Z

tests/suites/test_suite_timing.function

+exit:
+    /* No cleanup needed, but show some diagnostic iterations, because timing
+       problems can be hard to reproduce. */
+    mbedtls_fprintf( stdout, "  Finished with millis=%lu get(timer)<=%lu iterations=%lu\n",


Thanks for including those diagnostic messages! However, I seem to remember that stdout from tests is filtered out by the framework unless --verbose is used (a decision I could never understand by the way). Is that still the case? If so, we should ensure that --verbose is always used when running the tests on the CI.

mpg · 2017-10-18T08:18:00Z

tests/suites/test_suite_timing.function

+    /* No cleanup needed, but show some diagnostic iterations, because timing
+       problems can be hard to reproduce. */
+    mbedtls_fprintf( stdout, "  Finished with millis=%lu get(timer)<=%lu iterations=%lu\n",
+                     millis, mbedtls_timing_get_timer( &timer, 0 ),


I'm wondering if we should include new_milis too, in case the failing assertion is new_millis >= millis, and the return value of get_timer() happens to change just between the TEST_ASSERT and the printf().

mpg · 2017-10-18T08:25:24Z

tests/suites/test_suite_timing.function

+    unsigned long millis = 0;
+    unsigned long iterations = 0;
+
+    /* Start the timer. After setting it for the first time, the return value


Should it be when setting it for the first time, rather that after?

mpg · 2017-10-18T08:30:58Z

tests/suites/test_suite_timing.function

+        ++iterations;
+        millis = mbedtls_timing_get_timer( &timer, 0 );
+    }
+    while( millis < TIMING_SHORT_TEST_MS );


Would it be possible to ensure that if timing_timer_simple fails, then all following tests fail before even running? I feel this would be a nice way to complete Gille's strategy of checking for infinite loops in one place only.

mpg · 2017-10-18T08:35:14Z

tests/suites/test_suite_timing.function

+        /* The timers shouldn't drift apart, i.e. millis2-millis1 should stay
+           roughly constant, but this is hard to test reliably, especially in
+           a busy environment such as an overloaded continuous integration
+           system, so we don't test it it. */


Typo: it it.

mpg · 2017-10-18T08:37:00Z

tests/suites/test_suite_timing.function

+       the alarm (with second resolution) and the timer (with millisecond
+       resolution) are based on different clocks. Be very liberal with the
+       upper bound, because the platform might be busy. */
+     unsigned long millis_min = ( seconds > 0 ?


Style: indented with 5 spaces.

mpg

Thanks for doing the suggested changes. I'm happy with the resulting state of the PR.

(I don't have a strong enough opinion on the assert thing to delay my approval over it, so I'll let you resolve that with Azim, who might care more than me since he's maintaining the CI.)

gilles-peskine-arm · 2017-12-14T15:27:09Z

The CI has run several times, but due to a Jenkins issue it picked old revisions of this PR. Manual run on the latest version (ca5f6e2): https://jenkins-internal.mbed.com/job/mbedtls-commit-tests/47/

gilles-peskine-arm · 2017-12-14T17:15:24Z

I uploaded a fix for a mistake that Clang pointed out. New build: https://jenkins-internal.mbed.com/job/mbedtls-commit-tests/48/

gilles-peskine-arm · 2017-12-20T18:30:46Z

Rebased with a cleaner history. The old history is at https://github.com/gilles-peskine-arm/mbedtls/tree/IOTSSL-1798-development-2

No behavior change.

If given command line arguments, interpret them as test names and only run those tests.

E.g. "selftest -x timing" runs all the self-tests except timing.

New set of unit tests for the timing module, instead of just running the selftest function. The selftest function sometimes fails on a heavily loaded machine (such as a typical continuous integration system). Because of the all-in-one nature of the test and because the exact load pattern can be hard to reproduce, it is difficult to diagnose failures of CI runs with selftest. The new tests are more separated and I strove to point out potential failure modes in comments. * mbedtls_timing_hardclock: not tested. This function gives so few guarantees that there isn't much to test, and it is hard to test reliably because clock cycles don't easily relate to time in any remotely portable way. This function isn't used in the library anyway, it's only there for benchmark programs. * mbedtls_timing_get_timer: tested by setting a timer and verifying that it reaches its target, and by verifying that a timer started later than another always has a smaller elapsed time. * mbedtls_set_alarm: tested by setting an alarm, busy-waiting for it and measuring the elapsed time with a timer. * mbedtls_timing_set_delay, mbedtls_timing_get_delay: tested by setting a delay object and watching it go through its two delay values, using a timer to check that the delays are passed at the expected time. The tests pass under light to moderate load, but some of them can be defeated with sufficiently heavy load. This is unavoidable since the test process to be effectively suspended for any length of time, making us think that a timer has gone on for too long.

Do test mbedtls_timing_hardclock. We can't reliably test much about it, but at least test that it doesn't crash, isn't constant, and doesn't look completely random.

If timing_timer_simple fails because it detects that timers are likely to never expire (e.g. going backward or not incrementing), skip all tests that rely on timers.

mbedtls_timing_self_test fails annoyingly often when running on a busy machine such as can be expected of a continous integration system. Increase the tolerances in the delay test, to reduce the chance of failures that are only due to missing a deadline on a busy machine.

Increase the duration of the self test, otherwise it tends to fail on a busy machine even with the recently upped tolerance. But run the loop only once, it's enough for a simple smoke test.

We don't need to test multiple delays in a self-test. Save 5s of busy-wait.

mpg · 2017-12-21T09:00:35Z

I've reviewed the diff between the previous version of this branch (in my local copy) and the rebased version, and I'm satisfied nothing nasty happened during the rebase.

gilles-peskine-arm · 2017-12-21T15:06:48Z

I triggered a daytime run of the CI job that tends to fail: https://jenkins-internal.mbed.com/job/mbedtls-commit-tests/56/ . All the timing tests passed, there were failures only in compat.sh which is a known issue (also due to CI slowness) that wasn't fixed when this PR was forked.

mazimkhan · 2017-12-22T18:15:08Z

retest

mazimkhan · 2017-12-22T18:20:49Z

retest

Fix for the Marvin attack - BACKPORT

gilles-peskine-arm added bug mbed TLS team labels Oct 10, 2017

gilles-peskine-arm force-pushed the IOTSSL-1798-development branch 3 times, most recently from f2666e8 to 80a4bcc Compare October 11, 2017 14:13

gilles-peskine-arm requested review from mpg and mazimkhan October 11, 2017 16:21

gilles-peskine-arm added the needs-review Every commit must be reviewed by at least two team members, label Oct 11, 2017

mazimkhan suggested changes Oct 16, 2017

View reviewed changes

mazimkhan approved these changes Oct 17, 2017

View reviewed changes

mpg reviewed Oct 18, 2017

View reviewed changes

mpg approved these changes Oct 19, 2017

View reviewed changes

mazimkhan added runCI and removed runCI labels Oct 19, 2017

gilles-peskine-arm added the runCI label Oct 27, 2017

gilles-peskine-arm removed the runCI label Dec 14, 2017

gilles-peskine-arm added runCI needs-design-approval needs-review Every commit must be reviewed by at least two team members, and removed needs-review Every commit must be reviewed by at least two team members, runCI needs-design-approval labels Dec 14, 2017

gilles-peskine-arm added the runCI label Dec 14, 2017

gilles-peskine-arm force-pushed the IOTSSL-1798-development branch 2 times, most recently from 39d53aa to ce6d7a6 Compare December 20, 2017 19:18

gilles-peskine-arm added 2 commits December 20, 2017 20:19

selftest: refactor to separate the list of tests from the logic

319ac80

No behavior change.

selftest: allow running a subset of the tests

c82fbb4

If given command line arguments, interpret them as test names and only run those tests.

gilles-peskine-arm force-pushed the IOTSSL-1798-development branch from ce6d7a6 to a87a655 Compare December 20, 2017 19:24

gilles-peskine-arm added 6 commits December 20, 2017 21:57

selftest: allow excluding a subset of the tests

ff79d27

E.g. "selftest -x timing" runs all the self-tests except timing.

Unit test for mbedtls_timing_hardclock

078f1a1

Do test mbedtls_timing_hardclock. We can't reliably test much about it, but at least test that it doesn't crash, isn't constant, and doesn't look completely random.

Timing unit tests: more protection against infinite loops

2a26d62

If timing_timer_simple fails because it detects that timers are likely to never expire (e.g. going backward or not incrementing), skip all tests that rely on timers.

Timing self test: increased duration

8873bcc

Increase the duration of the self test, otherwise it tends to fail on a busy machine even with the recently upped tolerance. But run the loop only once, it's enough for a simple smoke test.

gilles-peskine-arm force-pushed the IOTSSL-1798-development branch from a87a655 to 8873bcc Compare December 20, 2017 20:58

Timing self test: shorten redundant tests

ada3ee8

We don't need to test multiple delays in a self-test. Save 5s of busy-wait.

gilles-peskine-arm changed the title ~~New timing unit tests~~ More robust timing tests Dec 20, 2017

gilles-peskine-arm removed the runCI label Dec 20, 2017

This was referenced Dec 20, 2017

Backport 2.1: More robust timing tests #1223

Merged

Backport 1.3: More robust timing tests #1224

Merged

gilles-peskine-arm added the runCI label Dec 20, 2017

mpg added needs-design-approval and removed needs-review Every commit must be reviewed by at least two team members, labels Dec 21, 2017

mpg merged commit ada3ee8 into Mbed-TLS:development Dec 26, 2017

mpg mentioned this pull request Jan 29, 2018

Fix alarm(0) failure on mingw32 #1328

Merged

Patater mentioned this pull request Mar 27, 2018

Timing tests are flaky and this is very bad #1517

Closed

gilles-peskine-arm mentioned this pull request Jan 25, 2023

[2.28] Enable the timing.c selftest with MBEDTLS_TIMING_ALT #6931

Merged

2 tasks

minosgalanakis pushed a commit that referenced this pull request Mar 28, 2024

Merge pull request #1136 from yanesca/fix-marvin-attack-backport

8e72c8f

Fix for the Marvin attack - BACKPORT

More robust timing tests #1136

More robust timing tests #1136

Conversation

gilles-peskine-arm commented Oct 10, 2017 • edited Loading

gilles-peskine-arm commented Oct 11, 2017

mazimkhan left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mazimkhan Oct 16, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mazimkhan commented Oct 17, 2017

mazimkhan commented Oct 17, 2017

mazimkhan commented Oct 17, 2017 • edited Loading

mpg left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mpg left a comment

Choose a reason for hiding this comment

gilles-peskine-arm commented Dec 14, 2017

gilles-peskine-arm commented Dec 14, 2017 • edited Loading

gilles-peskine-arm commented Dec 20, 2017

mpg commented Dec 21, 2017

gilles-peskine-arm commented Dec 21, 2017

mazimkhan commented Dec 22, 2017

mazimkhan commented Dec 22, 2017

gilles-peskine-arm commented Oct 10, 2017 •

edited

Loading

mazimkhan Oct 16, 2017 •

edited

Loading

mazimkhan commented Oct 17, 2017 •

edited

Loading

gilles-peskine-arm commented Dec 14, 2017 •

edited

Loading