Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More robust timing tests #1136

Merged
merged 13 commits into from
Dec 26, 2017

Conversation

gilles-peskine-arm
Copy link
Contributor

@gilles-peskine-arm gilles-peskine-arm commented Oct 10, 2017

The primary objective of this PR is to fix the issue we have with test_suite_timing and the TIMING selftest on the CI, which sometimes fail due to heavy load. I also fixed some minor bugs in the timing module.

This PR introduces new timing tests that test the functions in the timing module mostly independently. The old self-test is still present in the code but will not be executed by the test suite, like other self-tests that we judge to be too time-consuming. CI still runs it once through programs/test/selftest.

Compared with the self-tests:

  • The new tests are more modular and print some detailed information when they fail.
  • I don't test mbedtls_set_alarm with delays of more than 1 second, which makes the test run for about 2s instead of 8s.
  • The timing self-test is more tolerant and shorter.

There are some test ideas I discarded in https://github.com/gilles-peskine-arm/mbedtls/tree/IOTSSL-1798-development-1 . The present PR is https://github.com/gilles-peskine-arm/mbedtls/tree/IOTSSL-1798-development-2 with a few minor fixes and a cleaned up history.

Incidental changes in this PR:

  • Fix read from an uninitialized integer object in mbedtls_timing_get_timer on Windows (internal ref: IOTSSL-1822)
  • Improved documentation of the timing API.
  • Fix mbedtls_set_alarm(0) on Unix/POSIX.
  • In the selftest program, support running a subset of the tests with command line arguments.

I've tested several iterations of this PR on my machine and on CI. The CI failures haven't completely gone so far, but they're markedly less frequent.

Partial backport for 2.1: #1223 (bug fixes, self test robustness, selftest command line parsing)
Partial backport for 1.3: #1224 (bug fixes, self test robustness)

Internal ref: IOTSSL-1798

@gilles-peskine-arm gilles-peskine-arm force-pushed the IOTSSL-1798-development branch 3 times, most recently from f2666e8 to 80a4bcc Compare October 11, 2017 14:13
@gilles-peskine-arm
Copy link
Contributor Author

CI has now passed twice (I rebased a couple of times).

@gilles-peskine-arm gilles-peskine-arm added the needs-review Every commit must be reviewed by at least two team members, label Oct 11, 2017
Copy link

@mazimkhan mazimkhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. Just few suggestions on robustness of tests given they would run in CI environment.

first and run other test functions only if that timing_timer_simple
succeeded. */

#include <limits.h>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it available on Windows and target platforms? I know it is part of standard library. But seems like it comes as part of linux-headers on Linux.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

limits.h is usually shipped with the compiler rather than with libc, so it's available everywhere (the C standard term is that it must be present on freestanding implementations). On Linux it comes with gcc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I addition to what Gilles said, we also use this header in the library (including in check_config.h) so if their was an issue with it, we would know about it.

{
TEST_ASSERT( mbedtls_timing_self_test( 1 ) == 0 );

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this test depend on mbedtls_timing_self_test as pre-requisite? If not wouldn't it be better to keep it as seperate distinct test in .data file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These unit tests are completely independent from mbedtls_timing_self_test. We don't run all the self-tests in the unit tests, some of the more time-consuming ones are only run via programs/test/selftest. The timing test is the most time-consuming by far on a PC-class processor (it takes about 8s to run regardless of the platform — a fast PC can zip through all the other unit tests in less than that).

static int expected_delay_status( uint32_t int_ms, uint32_t fin_ms,
unsigned long actual_ms )
{
return( fin_ms == 0 ? -1 :

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is int_ms == 0 as valid condition for the test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, int_ms is valid. We didn't document the API very well, but int_ms=0 is a timer that expires immediately, whereas fin_ms=0 is a diabled timer.

unsigned long millis = 0;
unsigned long iterations = 0;
/* Start the timer. */
(void) mbedtls_timing_get_timer( &timer, 1 );

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although it is existing code. But I don't understand why there can't be a separate function to init/reset the timing structure. Would be been very readable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed I wouldn't have called a function to set a timer get_timer, but I'm only testing the API, not changing it.

++iterations;
millis = mbedtls_timing_get_timer( &timer, 0 );
}
while( millis < TIMING_SHORT_TEST_MS );

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the iterations <= TIMING_SHORT_TEST_ITERATIONS_MAX condition missing here? For a bug in mbedtls_timing_get_timer that fails to increment millis the test can be stuck for ever.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to make a compromise between detecting infinite loops, and allowing very fast platforms where it takes a lot of iterations to reach the timeout. If mbedtls_timing_get_timer fails to increment millis, this should have been detected in the previous test case which calls timing_timer_simple.

I'm open to being convinced to do that in every test, but every such check is fragile (I need to hard-code a magic value for the iteration cap) and makes the test code more complex. So I prefer to rely on a higher-level timeout. I wasn't even sure about using a max iteration count in timing_timer_simple.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check could be in addition to the time check hence not taking long if time is incremented correctly.
Although I understand the difficulty of getting a correct max count that can work with range of targets slow to fast. It would have been good if there was target specific multiplier to find no. of cycles. Even then it is not very portable due to varying instructions. Hence I will leave it up to you.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to ensure that if timing_timer_simple fails, then all following tests fail before even running? I feel this would be a nice way to complete Gille's strategy of checking for infinite loops in one place only.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Execution of the tests is part of the test framework and it has look after execution requirements for all test suites. In this case it makes sense to abort the suite. But we may not want that in other suites. However, we can add a TEST_SUIT_ASSERT that aborts the whole suite.

a busy environment such as an overloaded continuous integration
system, so we don't test it it. */
}
while( millis2 < TIMING_SHORT_TEST_MS );

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a mechanism to break out of the loop in case of a bug. See previous comments.


/* Set an alarm and count how long it takes with a timer. */
(void) mbedtls_timing_get_timer( &timer, 1 );
mbedtls_set_alarm( seconds );

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we not validate that seconds > 0?

++iterations;
millis = mbedtls_timing_get_timer( &timer, 0 );
}
while( !mbedtls_timing_alarmed && millis <= millis_max );

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are relying on mbedtls_timing_get_timer to work correctly. Watchdog can be based on iterations or in addition to the timer.

TEST_ASSERT( millis <= millis_max );

mbedtls_timing_alarmed = 0;
return;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there a need for return?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I don't want to print diagnosis information if the test succeeded.

int saw_status_1 = 0;
int warn_inconclusive = 0;

assert( int_ms >= 0 );
Copy link

@mazimkhan mazimkhan Oct 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why no TEST_ASSERT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test function expects int_ms and fin_ms to be unsigned long, but this type is not supported by the test framework, so I use int as the static type and put an additional dynamic type check to verify that the values are nonnegative. I used TEST_ASSERT for test failures, and assert for bad test input (like when there is an error parsing the .data file). Is this not correct?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From execution perspective TEST_ASSERT only aborts a test. assert will abort the whole suite as a suite built as one executable. Do you want to do that if one test has invalid data? Also this may not work as a wider strategy because data can be wrong mistakenly and a user may not put such checks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion here, since either way tests with bad input will fail reliably and a few seconds of investigation will reveal that the problem is with the input data, and this should be a rare occurrence anyway.

I have however a slight preference for TEST_ASSERT() as it is just as clear (it's fairly obvious to readers that such a statement at the very top of the function is about input validity) and doesn't introduce anything new. But I can totally live with assert() too.

TEST_ASSERT( saw_status_1 );
}

return;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need return here? Also it will make assignment warn_inconclusive = 1 useless.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the test succeeds, I don't want to print the diagnosis information. The value of warn_inconclusive is used when printing the diagnosis information if the test fails (TEST_ASSERT branches to the exit label if the assertion fails).

allows 250ms. */
delta1 = hardclock1 - hardclock0;
TEST_ASSERT( delta1 > 0 );
TEST_ASSERT( delta1 < 1000000000 );

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we document why we have a big number 1000000000 although the delay was only 2 ms.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous comment is supposed to document this (“(…) If the hardclock counter runs at 4GHz, then one billion (which is 1/4 of the counter wraparound on a 32-bit machine) allows 250ms.”). Can you suggest how to clarify it?

@mazimkhan
Copy link

This PR is critical as CI often fails in timing tests.

@mazimkhan
Copy link

Running it in Oulu CI as that is most time skewed.

@mazimkhan
Copy link

mazimkhan commented Oct 17, 2017

Timing selfTest test fails on Oulu CI https://jenkins-internal.mbed.com/job/mbedtls-pr/314/execution/node/206/log/

Copy link
Contributor

@mpg mpg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for submitting this PR! Overall the new tests look very good.

Summary of feedback:

  • suggestion: immediately fail subsequent test if first one failed (feel free to argue against it)
  • question that might be outside the scope of this PR: are the new diagnostic messages really visible?
  • then minor style/comments nitpicks.

library/timing.c Outdated
if( verbose != 0 ) \
{ \
mbedtls_printf( "failed at line %d\n", __LINE__ ); \
mbedtls_printf( " cycles=%lu ratio=%lu millisecs=%lu secs=%lu hardfail=%d a=%lu b=%lu\n", cycles, ratio, millisecs, secs, hardfail, (unsigned long) a, (unsigned long) b ); \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: can we break that long line after the first coma? (Same for the line below.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

library/timing.c Outdated
@@ -394,17 +402,16 @@ static void busy_msleep( unsigned long msec )
*/
int mbedtls_timing_self_test( int verbose )
{
unsigned long cycles, ratio;
unsigned long cycles = 0, ratio = 0;
unsigned long millisecs, secs;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I agree that millisecs and secs are initialised before the first occurrence of FAIL, this might change if we ever re-order tests or add a new one at the beginning, so I'd be more comfortable initializing them too, unless you have a reason not to.

exit:
/* No cleanup needed, but show some diagnostic iterations, because timing
problems can be hard to reproduce. */
mbedtls_fprintf( stdout, " Finished with millis=%lu get(timer)<=%lu iterations=%lu\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for including those diagnostic messages! However, I seem to remember that stdout from tests is filtered out by the framework unless --verbose is used (a decision I could never understand by the way). Is that still the case? If so, we should ensure that --verbose is always used when running the tests on the CI.

/* No cleanup needed, but show some diagnostic iterations, because timing
problems can be hard to reproduce. */
mbedtls_fprintf( stdout, " Finished with millis=%lu get(timer)<=%lu iterations=%lu\n",
millis, mbedtls_timing_get_timer( &timer, 0 ),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we should include new_milis too, in case the failing assertion is new_millis >= millis, and the return value of get_timer() happens to change just between the TEST_ASSERT and the printf().

unsigned long millis = 0;
unsigned long iterations = 0;

/* Start the timer. After setting it for the first time, the return value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be when setting it for the first time, rather that after?

++iterations;
millis = mbedtls_timing_get_timer( &timer, 0 );
}
while( millis < TIMING_SHORT_TEST_MS );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to ensure that if timing_timer_simple fails, then all following tests fail before even running? I feel this would be a nice way to complete Gille's strategy of checking for infinite loops in one place only.

/* The timers shouldn't drift apart, i.e. millis2-millis1 should stay
roughly constant, but this is hard to test reliably, especially in
a busy environment such as an overloaded continuous integration
system, so we don't test it it. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: it it.

the alarm (with second resolution) and the timer (with millisecond
resolution) are based on different clocks. Be very liberal with the
upper bound, because the platform might be busy. */
unsigned long millis_min = ( seconds > 0 ?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: indented with 5 spaces.

Copy link
Contributor

@mpg mpg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing the suggested changes. I'm happy with the resulting state of the PR.

(I don't have a strong enough opinion on the assert thing to delay my approval over it, so I'll let you resolve that with Azim, who might care more than me since he's maintaining the CI.)

@gilles-peskine-arm
Copy link
Contributor Author

The CI has run several times, but due to a Jenkins issue it picked old revisions of this PR. Manual run on the latest version (ca5f6e2): https://jenkins-internal.mbed.com/job/mbedtls-commit-tests/47/

@gilles-peskine-arm gilles-peskine-arm added runCI needs-design-approval needs-review Every commit must be reviewed by at least two team members, and removed needs-review Every commit must be reviewed by at least two team members, runCI needs-design-approval labels Dec 14, 2017
@gilles-peskine-arm
Copy link
Contributor Author

gilles-peskine-arm commented Dec 14, 2017

I uploaded a fix for a mistake that Clang pointed out. New build: https://jenkins-internal.mbed.com/job/mbedtls-commit-tests/48/

@gilles-peskine-arm
Copy link
Contributor Author

Rebased with a cleaner history. The old history is at https://github.com/gilles-peskine-arm/mbedtls/tree/IOTSSL-1798-development-2

@gilles-peskine-arm gilles-peskine-arm force-pushed the IOTSSL-1798-development branch 2 times, most recently from 39d53aa to ce6d7a6 Compare December 20, 2017 19:18
If given command line arguments, interpret them as test names and only
run those tests.
E.g. "selftest -x timing" runs all the self-tests except timing.
New set of unit tests for the timing module, instead of just running
the selftest function.

The selftest function sometimes fails on a heavily loaded
machine (such as a typical continuous integration system). Because of
the all-in-one nature of the test and because the exact load pattern
can be hard to reproduce, it is difficult to diagnose failures of CI
runs with selftest. The new tests are more separated and I strove to
point out potential failure modes in comments.

* mbedtls_timing_hardclock: not tested. This function gives so few
  guarantees that there isn't much to test, and it is hard to test
  reliably because clock cycles don't easily relate to time in any
  remotely portable way. This function isn't used in the library
  anyway, it's only there for benchmark programs.
* mbedtls_timing_get_timer: tested by setting a timer and verifying
  that it reaches its target, and by verifying that a timer started
  later than another always has a smaller elapsed time.
* mbedtls_set_alarm: tested by setting an alarm, busy-waiting for it
  and measuring the elapsed time with a timer.
* mbedtls_timing_set_delay, mbedtls_timing_get_delay: tested by
  setting a delay object and watching it go through its two delay
  values, using a timer to check that the delays are passed at the
  expected time.

The tests pass under light to moderate load, but some of them can be
defeated with sufficiently heavy load. This is unavoidable since the
test process to be effectively suspended for any length of time,
making us think that a timer has gone on for too long.
Do test mbedtls_timing_hardclock. We can't reliably test much about
it, but at least test that it doesn't crash, isn't constant, and
doesn't look completely random.
If timing_timer_simple fails because it detects that timers are likely
to never expire (e.g. going backward or not incrementing), skip all
tests that rely on timers.
mbedtls_timing_self_test fails annoyingly often when running on a busy
machine such as can be expected of a continous integration system.
Increase the tolerances in the delay test, to reduce the chance of
failures that are only due to missing a deadline on a busy machine.
Increase the duration of the self test, otherwise it tends to fail on
a busy machine even with the recently upped tolerance. But run the
loop only once, it's enough for a simple smoke test.
We don't need to test multiple delays in a self-test.
Save 5s of busy-wait.
@gilles-peskine-arm gilles-peskine-arm changed the title New timing unit tests More robust timing tests Dec 20, 2017
@mpg
Copy link
Contributor

mpg commented Dec 21, 2017

I've reviewed the diff between the previous version of this branch (in my local copy) and the rebased version, and I'm satisfied nothing nasty happened during the rebase.

@mpg mpg added needs-design-approval and removed needs-review Every commit must be reviewed by at least two team members, labels Dec 21, 2017
@gilles-peskine-arm
Copy link
Contributor Author

I triggered a daytime run of the CI job that tends to fail: https://jenkins-internal.mbed.com/job/mbedtls-commit-tests/56/ . All the timing tests passed, there were failures only in compat.sh which is a known issue (also due to CI slowness) that wasn't fixed when this PR was forked.

@mazimkhan
Copy link

retest

1 similar comment
@mazimkhan
Copy link

retest

@mpg mpg merged commit ada3ee8 into Mbed-TLS:development Dec 26, 2017
minosgalanakis pushed a commit that referenced this pull request Mar 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants