More robust timing tests #1136
Conversation
Force-pushed from f2666e8 to 80a4bcc.
CI has now passed twice (I rebased a couple of times).
Overall looks good. Just a few suggestions on the robustness of the tests, given that they will run in a CI environment.
first and run other test functions only if that timing_timer_simple
succeeded. */

#include <limits.h>
Is it available on Windows and on target platforms? I know it is part of the standard library, but it seems to come as part of linux-headers on Linux.
limits.h is usually shipped with the compiler rather than with libc, so it's available everywhere (in C standard terms, it must be present even on freestanding implementations). On Linux it comes with gcc.
In addition to what Gilles said, we also use this header in the library (including in check_config.h), so if there were an issue with it, we would know about it.
{
    TEST_ASSERT( mbedtls_timing_self_test( 1 ) == 0 );
Does this test depend on mbedtls_timing_self_test as a prerequisite? If not, wouldn't it be better to keep it as a separate, distinct test in the .data file?
These unit tests are completely independent from mbedtls_timing_self_test. We don't run all the self-tests in the unit tests; some of the more time-consuming ones are only run via programs/test/selftest. The timing self-test is by far the most time-consuming on a PC-class processor (it takes about 8s to run regardless of the platform; a fast PC can zip through all the other unit tests in less time than that).
static int expected_delay_status( uint32_t int_ms, uint32_t fin_ms,
                                  unsigned long actual_ms )
{
    return( fin_ms == 0 ? -1 :
Is int_ms == 0 a valid condition for the test?
Yes, int_ms = 0 is valid. We didn't document the API very well, but int_ms = 0 is a timer that expires immediately, whereas fin_ms = 0 is a disabled timer.
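For reference, here is a sketch of how that convention could complete the expected_delay_status helper shown above, assuming the documented return convention of mbedtls_timing_get_delay (-1 for a cancelled/disabled timer, 2 once the final delay has passed, 1 once only the intermediate delay has passed, 0 otherwise):

```c
static int expected_delay_status( uint32_t int_ms, uint32_t fin_ms,
                                  unsigned long actual_ms )
{
    return( fin_ms == 0 ? -1 :          /* fin_ms == 0: timer is disabled */
            actual_ms >= fin_ms ? 2 :   /* final delay has passed */
            actual_ms >= int_ms ? 1 :   /* only the intermediate delay passed;
                                           int_ms == 0 expires immediately */
            0 );                        /* no delay has passed yet */
}
```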
    unsigned long millis = 0;
    unsigned long iterations = 0;
    /* Start the timer. */
    (void) mbedtls_timing_get_timer( &timer, 1 );
Although this is existing code, I don't understand why there can't be a separate function to init/reset the timing structure. It would have been much more readable.
Indeed, I wouldn't have called a function that sets a timer get_timer, but I'm only testing the API here, not changing it.
        ++iterations;
        millis = mbedtls_timing_get_timer( &timer, 0 );
    }
    while( millis < TIMING_SHORT_TEST_MS );
Is the iterations <= TIMING_SHORT_TEST_ITERATIONS_MAX condition missing here? If there is a bug in mbedtls_timing_get_timer that fails to increment millis, the test can be stuck forever.
I tried to strike a compromise between detecting infinite loops and allowing very fast platforms where it takes a lot of iterations to reach the timeout. If mbedtls_timing_get_timer fails to increment millis, this should have been detected in the previous test case, which calls timing_timer_simple.
I'm open to being convinced to add such a check to every test, but each one is fragile (I'd need to hard-code a magic value for the iteration cap) and makes the test code more complex, so I prefer to rely on a higher-level timeout. I wasn't even sure about using a maximum iteration count in timing_timer_simple.
The check could be in addition to the time check, so it wouldn't take long when time is incremented correctly.
That said, I understand the difficulty of choosing a maximum count that works for the whole range of targets, from slow to fast. It would have been nice to have a target-specific multiplier to estimate the number of cycles, but even that is not very portable because instruction timings vary. So I will leave it up to you.
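A rough sketch of that belt-and-braces loop (TIMING_SHORT_TEST_ITERATIONS_MAX is a hypothetical name and the cap is an arbitrary illustrative value, not a tuned constant):

```c
/* Illustration only: keep the time-based exit condition, but also cap the
 * number of iterations so that a timer which never advances cannot hang
 * the test. The cap below is a made-up example value. */
#define TIMING_SHORT_TEST_ITERATIONS_MAX 1000000000UL

    do
    {
        ++iterations;
        millis = mbedtls_timing_get_timer( &timer, 0 );
        TEST_ASSERT( iterations <= TIMING_SHORT_TEST_ITERATIONS_MAX );
    }
    while( millis < TIMING_SHORT_TEST_MS );
```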
Would it be possible to ensure that if timing_timer_simple fails, then all following tests fail before even running? I feel this would be a nice way to complete Gilles' strategy of checking for infinite loops in one place only.
Execution of the tests is handled by the test framework, and it has to accommodate the execution requirements of all test suites. In this case it makes sense to abort the suite, but we may not want that in other suites. However, we could add a TEST_SUIT_ASSERT that aborts the whole suite.
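As a purely hypothetical sketch (these names do not exist in the current framework), such a macro could record a suite-wide flag that subsequent tests check before running:

```c
/* Hypothetical sketch of a suite-level assertion: on failure, set a flag
 * that the test runner (or each test function) can consult to skip or
 * fail all remaining tests in the suite. */
static int test_suite_aborted = 0;

#define TEST_SUIT_ASSERT( condition )                                    \
    do                                                                   \
    {                                                                    \
        if( !( condition ) )                                             \
        {                                                                \
            test_suite_aborted = 1;   /* poison the rest of the suite */ \
            TEST_ASSERT( condition ); /* record failure, jump to exit */ \
        }                                                                \
    } while( 0 )
```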
       a busy environment such as an overloaded continuous integration
       system, so we don't test it it. */
    }
    while( millis2 < TIMING_SHORT_TEST_MS );
Can we have a mechanism to break out of the loop in case of a bug? See previous comments.
    /* Set an alarm and count how long it takes with a timer. */
    (void) mbedtls_timing_get_timer( &timer, 1 );
    mbedtls_set_alarm( seconds );
Should we not validate that seconds > 0?
        ++iterations;
        millis = mbedtls_timing_get_timer( &timer, 0 );
    }
    while( !mbedtls_timing_alarmed && millis <= millis_max );
Here we are relying on mbedtls_timing_get_timer to work correctly. A watchdog could be based on iterations, either instead of or in addition to the timer.
    TEST_ASSERT( millis <= millis_max );

    mbedtls_timing_alarmed = 0;
    return;
Why is there a need for return?
Because I don't want to print diagnostic information if the test succeeded.
    int saw_status_1 = 0;
    int warn_inconclusive = 0;

    assert( int_ms >= 0 );
Why no TEST_ASSERT?
The test function expects int_ms and fin_ms to be unsigned long, but this type is not supported by the test framework, so I use int as the static type and add a dynamic check to verify that the values are nonnegative. I used TEST_ASSERT for test failures, and assert for bad test input (e.g. when there is an error parsing the .data file). Is this not correct?
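To illustrate the pattern (a hypothetical wrapper, not the actual test code): the .data parser only supplies int arguments, so the assertions guard the conversion to the unsigned type the timing API expects.

```c
#include <assert.h>
#include <stdint.h>
#include "mbedtls/timing.h"

/* Hypothetical test wrapper following the pattern described above. */
void test_timing_delay( int int_ms, int fin_ms ) /* .data only gives ints */
{
    mbedtls_timing_delay_context ctx;

    assert( int_ms >= 0 ); /* negative values mean the .data entry is bad */
    assert( fin_ms >= 0 );

    /* The timing API itself takes uint32_t. */
    mbedtls_timing_set_delay( &ctx, (uint32_t) int_ms, (uint32_t) fin_ms );
}
```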
From an execution perspective, TEST_ASSERT only aborts a single test, whereas assert will abort the whole suite, since a suite is built as one executable. Do you want to do that if one test has invalid data? Also, this may not work as a wider strategy, because data can be wrong by mistake and a user may not have put in such checks.
I don't have a strong opinion here, since either way tests with bad input will fail reliably, a few seconds of investigation will reveal that the problem is with the input data, and this should be a rare occurrence anyway.
However, I have a slight preference for TEST_ASSERT(), as it is just as clear (it's fairly obvious to readers that such a statement at the very top of the function is about input validity) and doesn't introduce anything new. But I can totally live with assert() too.
        TEST_ASSERT( saw_status_1 );
    }

    return;
Do we really need a return here? Also, it makes the assignment warn_inconclusive = 1 useless.
If the test succeeds, I don't want to print the diagnostic information. The value of warn_inconclusive is used when printing the diagnostic information if the test fails (TEST_ASSERT branches to the exit label if the assertion fails).
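A condensed sketch of that control flow (the condition name is hypothetical; the TEST_ASSERT-jumps-to-exit behaviour is the existing framework convention):

```c
    if( measurements_were_borderline )  /* hypothetical condition */
        warn_inconclusive = 1;

    TEST_ASSERT( saw_status_1 );        /* on failure, branches to exit */

    return;                             /* success: skip the diagnostics */

exit:
    if( warn_inconclusive )
        mbedtls_fprintf( stdout,
                         "  Warning: inconclusive timing, this failure may "
                         "be due to machine load\n" );
```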
       allows 250ms. */
    delta1 = hardclock1 - hardclock0;
    TEST_ASSERT( delta1 > 0 );
    TEST_ASSERT( delta1 < 1000000000 );
Can we document why we use such a big number (1000000000) when the delay was only 2 ms?
The previous comment is supposed to document this (“(…) If the hardclock counter runs at 4GHz, then one billion (which is 1/4 of the counter wraparound on a 32-bit machine) allows 250ms.”). Can you suggest how to clarify it?
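For what it's worth, the arithmetic behind the bound can be checked with a quick back-of-the-envelope computation (illustrative only, assuming the 4 GHz figure from the comment):

```c
#include <stdint.h>
#include <stdio.h>

int main( void )
{
    /* Assume a hardclock counter running at 4 GHz, as in the comment. */
    uint64_t cycles_per_second = 4000000000ULL;
    /* One billion cycles at that rate corresponds to 250 ms... */
    double seconds_allowed = 1000000000.0 / (double) cycles_per_second;
    /* ...and is roughly a quarter of the 32-bit counter wraparound. */
    double fraction_of_wraparound = 1000000000.0 / 4294967296.0;

    printf( "time budget: %.3f s, fraction of wraparound: %.2f\n",
            seconds_allowed, fraction_of_wraparound );
    return 0;
}
```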
This PR is critical, as the CI often fails in the timing tests.
Running it in the Oulu CI, as that is the most time-skewed.
The timing selftest fails on the Oulu CI: https://jenkins-internal.mbed.com/job/mbedtls-pr/314/execution/node/206/log/
Thanks for submitting this PR! Overall the new tests look very good.
Summary of feedback:
- Suggestion: immediately fail subsequent tests if the first one failed (feel free to argue against it).
- Question that might be outside the scope of this PR: are the new diagnostic messages really visible?
- Minor style/comment nitpicks.
library/timing.c (outdated)
    if( verbose != 0 ) \
    { \
        mbedtls_printf( "failed at line %d\n", __LINE__ ); \
        mbedtls_printf( " cycles=%lu ratio=%lu millisecs=%lu secs=%lu hardfail=%d a=%lu b=%lu\n", cycles, ratio, millisecs, secs, hardfail, (unsigned long) a, (unsigned long) b ); \
Style: can we break that long line after the first comma? (Same for the line below.)
Done.
library/timing.c (outdated)
@@ -394,17 +402,16 @@ static void busy_msleep( unsigned long msec )
  */
 int mbedtls_timing_self_test( int verbose )
 {
-    unsigned long cycles, ratio;
+    unsigned long cycles = 0, ratio = 0;
     unsigned long millisecs, secs;
While I agree that millisecs and secs are initialised before the first occurrence of FAIL, this might change if we ever re-order tests or add a new one at the beginning, so I'd be more comfortable initializing them too, unless you have a reason not to.
exit:
    /* No cleanup needed, but show some diagnostic iterations, because timing
       problems can be hard to reproduce. */
    mbedtls_fprintf( stdout, " Finished with millis=%lu get(timer)<=%lu iterations=%lu\n",
Thanks for including those diagnostic messages! However, I seem to remember that stdout from tests is filtered out by the framework unless --verbose is used (a decision I could never understand, by the way). Is that still the case? If so, we should ensure that --verbose is always used when running the tests on the CI.
    /* No cleanup needed, but show some diagnostic iterations, because timing
       problems can be hard to reproduce. */
    mbedtls_fprintf( stdout, " Finished with millis=%lu get(timer)<=%lu iterations=%lu\n",
                     millis, mbedtls_timing_get_timer( &timer, 0 ),
I'm wondering if we should include new_millis too, in case the failing assertion is new_millis >= millis and the return value of get_timer() happens to change just between the TEST_ASSERT and the printf().
    unsigned long millis = 0;
    unsigned long iterations = 0;

    /* Start the timer. After setting it for the first time, the return value
Should it be "when setting it for the first time" rather than "after"?
    /* The timers shouldn't drift apart, i.e. millis2-millis1 should stay
       roughly constant, but this is hard to test reliably, especially in
       a busy environment such as an overloaded continuous integration
       system, so we don't test it it. */
Typo: it it.
       the alarm (with second resolution) and the timer (with millisecond
       resolution) are based on different clocks. Be very liberal with the
       upper bound, because the platform might be busy. */
    unsigned long millis_min = ( seconds > 0 ?
Style: indented with 5 spaces.
Thanks for making the suggested changes. I'm happy with the resulting state of the PR.
(I don't have a strong enough opinion on the assert thing to delay my approval over it, so I'll let you resolve that with Azim, who might care more than me since he's maintaining the CI.)
The CI has run several times, but due to a Jenkins issue it picked up old revisions of this PR. Manual run on the latest version (ca5f6e2): https://jenkins-internal.mbed.com/job/mbedtls-commit-tests/47/
I uploaded a fix for a mistake that Clang pointed out. New build: https://jenkins-internal.mbed.com/job/mbedtls-commit-tests/48/
Rebased with a cleaner history. The old history is at https://github.com/gilles-peskine-arm/mbedtls/tree/IOTSSL-1798-development-2
Force-pushed from 39d53aa to ce6d7a6.
No behavior change.
If given command line arguments, interpret them as test names and only run those tests.
Force-pushed from ce6d7a6 to a87a655.
E.g. "selftest -x timing" runs all the self-tests except timing.
New set of unit tests for the timing module, instead of just running the selftest function. The selftest function sometimes fails on a heavily loaded machine (such as a typical continuous integration system). Because of the all-in-one nature of the test, and because the exact load pattern can be hard to reproduce, it is difficult to diagnose failures of CI runs with selftest. The new tests are more separated and I strove to point out potential failure modes in comments.
* mbedtls_timing_hardclock: not tested. This function gives so few guarantees that there isn't much to test, and it is hard to test reliably because clock cycles don't easily relate to time in any remotely portable way. This function isn't used in the library anyway; it's only there for benchmark programs.
* mbedtls_timing_get_timer: tested by setting a timer and verifying that it reaches its target, and by verifying that a timer started later than another always has a smaller elapsed time.
* mbedtls_set_alarm: tested by setting an alarm, busy-waiting for it and measuring the elapsed time with a timer (see the sketch after these commit notes).
* mbedtls_timing_set_delay, mbedtls_timing_get_delay: tested by setting a delay object and watching it go through its two delay values, using a timer to check that the delays are passed at the expected time.
The tests pass under light to moderate load, but some of them can be defeated by sufficiently heavy load. This is unavoidable, since the test process can be effectively suspended for any length of time, making us think that a timer has gone on for too long.
Do test mbedtls_timing_hardclock. We can't reliably test much about it, but at least test that it doesn't crash, isn't constant, and doesn't look completely random.
If timing_timer_simple fails because it detects that timers are likely to never expire (e.g. going backward or not incrementing), skip all tests that rely on timers.
mbedtls_timing_self_test fails annoyingly often when running on a busy machine, such as can be expected of a continuous integration system. Increase the tolerances in the delay test to reduce the chance of failures that are only due to missing a deadline on a busy machine.
Increase the duration of the self-test; otherwise it tends to fail on a busy machine even with the recently increased tolerance. But run the loop only once: it's enough for a simple smoke test.
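The mbedtls_set_alarm strategy described in the commit notes above might look roughly like the following standalone program (a sketch with made-up bounds, not the actual test suite code):

```c
#include <stdio.h>
#include "mbedtls/timing.h"

int main( void )
{
    struct mbedtls_timing_hr_time timer;
    unsigned long millis = 0;

    (void) mbedtls_timing_get_timer( &timer, 1 );   /* start the timer */
    mbedtls_set_alarm( 1 );                         /* 1-second alarm */

    /* Busy-wait until the alarm flag is set, measuring the elapsed time
     * with the millisecond timer as we go. */
    while( !mbedtls_timing_alarmed )
        millis = mbedtls_timing_get_timer( &timer, 0 );

    /* The lower bound is firm; the upper bound is deliberately loose
     * because the machine may be busy. Both values are illustrative. */
    printf( "elapsed %lu ms, plausible: %d\n", millis,
            millis >= 900 && millis <= 10000 );
    return 0;
}
```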
Force-pushed from a87a655 to 8873bcc.
We don't need to test multiple delays in a self-test. Save 5s of busy-wait.
I've reviewed the diff between the previous version of this branch (in my local copy) and the rebased version, and I'm satisfied nothing nasty happened during the rebase.
I triggered a daytime run of the CI job that tends to fail: https://jenkins-internal.mbed.com/job/mbedtls-commit-tests/56/ . All the timing tests passed; there were failures only in compat.sh, which is a known issue (also due to CI slowness) that wasn't fixed when this PR was forked.
retest
retest
The primary objective of this PR is to fix the issue we have with test_suite_timing and the TIMING selftest on the CI, which sometimes fail due to heavy load. I also fixed some minor bugs in the timing module.
This PR introduces new timing tests that test the functions in the timing module mostly independently. The old self-test is still present in the code but will not be executed by the test suite, like other self-tests that we judge to be too time-consuming. The CI still runs it once through programs/test/selftest.
Compared with the self-tests, the new tests don't call mbedtls_set_alarm with delays of more than 1 second, which makes the test run for about 2s instead of 8s.
There are some test ideas I discarded in https://github.com/gilles-peskine-arm/mbedtls/tree/IOTSSL-1798-development-1 . The present PR is https://github.com/gilles-peskine-arm/mbedtls/tree/IOTSSL-1798-development-2 with a few minor fixes and a cleaned-up history.
Incidental changes in this PR:
- Fix mbedtls_timing_get_timer on Windows (internal ref: IOTSSL-1822).
- Fix mbedtls_set_alarm(0) on Unix/POSIX.
- In the selftest program, support running a subset of the tests with command line arguments.
I've tested several iterations of this PR on my machine and on the CI. The CI failures haven't completely gone away so far, but they're markedly less frequent.
Partial backport for 2.1: #1223 (bug fixes, self test robustness, selftest command line parsing)
Partial backport for 1.3: #1224 (bug fixes, self test robustness)
Internal ref: IOTSSL-1798