Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fault mitigation #5646

Merged
merged 10 commits into from
Nov 25, 2022
Merged

Fault mitigation #5646

merged 10 commits into from
Nov 25, 2022

Conversation

jenswi-linaro
Copy link
Contributor

This is #5247 rebased and updated.

The use case is to make buf_ta_open() more resilient to fault injection attacks on the hardware, more specifically glitching attacks. The basic assumption is that glitch can cause one or more instructions in sequence to act as nops when executed. Anything can happen of course but I'm using this approximation to have something more concrete to work with.

The fault mitigations are supposed to add zero overhead unless enabled and if enabled have a modest overhead.

In this PR I've made fault mitigations enabled by default with CFG_CORE_FAULT_MITIGATION?=y in mk/config.mk. Should it be disabled by default instead?

@ldts
Copy link
Contributor

ldts commented Nov 18, 2022

@jenswi-linaro are there any performance tests you plan to run? I have several boards (xilinx, nxp, stm32...) that I could use if you need help. Not that performance is more critical than security in this case so may be irrelevant.

* This is implicit since we're normally trying to protect things post
* boot and booting takes quite some time.
*
* [1] https://www.riscure.com/uploads/2020/05/Riscure_Whitepaper_Fault_Mitigation_Patterns_final.pdf
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jenswi-linaro the link is not visible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll update.

@jenswi-linaro
Copy link
Contributor Author

@jenswi-linaro are there any performance tests you plan to run? I have several boards (xilinx, nxp, stm32...) that I could use if you need help. Not that performance is more critical than security in this case so may be irrelevant.

@ldts, thanks for the offer. I haven't planned any performance testing because I believe the added overhead will not be noticeable combined with signature verification. That said, if you notice anything odd please let me know.

@jenswi-linaro
Copy link
Contributor Author

Rebased to resolve a merge conflict.

Copy link
Contributor

@etienne-lms etienne-lms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor things in commit "Basic fault mitigation routines"

* While a function is executed it can update its state as a way of keeping
* track of important passages inside the function. Before the function
* returns with for instance ftmn_return_res() to check that the
* accumulated state matches the expected state.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setence sounds strange.

| (...). BeforeWhen the function
| * returns with for instance ftmn_return_ res() to it is checked that the
| * accumulated state matches the expected state.

?

* FTMN_PANIC() - FTMN specific panic function
*
* This function is called whenever the FTMN function detects an
* inconsistency. An inconsistency is able to occur if the system is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 space chars

*
* This function is called whenever the FTMN function detects an
* inconsistency. An inconsistency is able to occur if the system is
* subject to an fault injection attack, in this case doing a panic() isn't
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/an/a/

#ifdef __KERNEL__
#define FTMN_PANIC() panic();
#else
#define FTMN_PANIC() TEE_Panic(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A specific ID could be helpful when printed in generic core debug trace

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I never look at the TEE_Panic ID. Do you have any particular ID in mind?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TEE_ERROR_SECURITY or TEE_ERROR_BAD_STATE could be good candidates.
But i admit it will not be specifically obvious it specifically from the fault injection countermeasure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only occasion when I have found specific values to be helpful in TEE_Panic(), is when running negative tests in CI, when the xtest output is visible but not the secure world log. Not very applicable to this case I suppose. But FWIW either ways are OK with me (0 or something else).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't care which we take either. @etienne-lms which do you prefer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not willing to track glitch attacks on CI runs? :)
Ok, 0 is fine.

static inline void __ftmn_callee_done(struct ftmn_func_arg *arg,
unsigned long my_hash, unsigned long res)
{
if (IS_ENABLED(CFG_CORE_FAULT_MITIGATION))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reson not testing arg here, compared to other neighbour functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well spotted, I'll fix.

*
* The passed result will be stored in the struct ftmn_func_arg struct
* supplied by the caller. This function can be called any number of times
* by the callee, provided that one of the FTMN_CALLEE_DONE_XXX() function has
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/function/functions/

* __ftmn_get_tsd_func_arg().
*
* The FTMN_CALLE_* functions only work with the real function name so the
* old hash must be removed and replaces with the new for the calling
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/replaces/replaced/

(my_old_hash), FTMN_FUNC_HASH(__func__))

/*
* FTMN_SET_CHECK_RES() - records a result in local checked state
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

d/record/Record/

ditto for few macros below.

#include <fault_mitigation.h>
#include <initcall.h>
#include <kernel/thread.h>
#include <types_ext.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

order

@@ -54,6 +54,7 @@
#include <tee/tee_ta_enc_manager.h>
#include <tee/uuid.h>
#include <utee_defines.h>
#include <fault_mitigation.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

order

@jenswi-linaro
Copy link
Contributor Author

Addressed all comments except the one concerning TEE_Panic(0).

Copy link
Contributor

@jforissier jforissier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I find the modified code a bit difficult to read unfortunately, but I suppose there is not much that can be done. The impact could have been much worse ;)

A couple of comments below, then for the whole:

Acked-by: Jerome Forissier <jerome.forissier@linaro.org>


typedef int (*ftmn_memcmp_t)(const void *p1, const void *p2, size_t nb);

/* The default hash used when xoring the result in struct ftmn_check */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are these values chosen? More or less randomly I suppose? Any particular property they'd better have?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, just random.

{
#if defined(CFG_CORE_FAULT_MITIGATION) && defined(__KERNEL__)
return &thread_get_tsd()->ftmn_arg;
#elif defined(CFG_CORE_FAULT_MITIGATION)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CFG_CORE should not be used for user-space features IMO. How about CFG_FAULT_MITIGATION?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

Adds basic fault mitigation routines designed to help protecting from
fault injection attacks on the hardware. This is by no means bullet
proof, but it should at least improve the situation.

These routines focus on verifying that a function has been called and
that the returned value matches the result from the function. This is
done by having a handshake between the caller and the callee where also
the return value is transmitted in a separate channel.

Acked-by: Jerome Forissier <jerome.forissier@linaro.org>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Adds some simple test for the fault mitigation routines.

Acked-by: Jerome Forissier <jerome.forissier@linaro.org>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Adds fault mitigation in mbedtls_rsa_rsassa_pss_verify_ext() by using
the macro FTMN_CALLEE_DONE_MEMCMP() instead of memcmp() when checking
that the hash in the RSA signature is matching the expected value.

FTMN_CALLEE_DONE_MEMCMP() saves on success the result in a thread local
storage if fault mitigations was enabled when the function was called.

Acked-by: Jerome Forissier <jerome.forissier@linaro.org>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
…fy()

Adds fault mitigation in mbedtls_rsa_rsassa_pkcs1_v15_verify() by using
the macro FTMN_CALLEE_DONE_MEMCMP() instead of just
mbedtls_safer_memcmp() when checking that the hash in the RSA signature
is matching the expected value.

FTMN_CALLEE_DONE_MEMCMP() saves on success the result in a thread local
storage if fault mitigations was enabled when the function was called.

Acked-by: Jerome Forissier <jerome.forissier@linaro.org>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Adds fault mitigations in crypto_acipher_rsassa_verify() by checking
that the internal call to memcmp() when verifying the hash in the RSA
signature was called and was successful.

The internal call to memcmp() records the result of the comparison if
successful. This is double checked against the normal return value from
the called pk_info->verify_func().

If the normal return value is OK then the recorded return value must
match or we're likely subject to a fault injection attack and we're
triggering a panic.

If the normal return value isn't OK we don't care about the recorded
value, it's overridden by a new error code. In this case we don't know
if we're subject to a fault injection attack or not, the important thing
to make sure that the calling function doesn't miss the error.

This fault mitigation is only enabled with the calling function enabled
fault mitigations and CFG_CORE_FAULT_MITIGATION is 'y'.

Acked-by: Jerome Forissier <jerome.forissier@linaro.org>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Adds fault mitigations in crypto_acipher_rsassa_verify() and dependent
functions in libTomCrypt in order to include the critical final
memcompare.

This fault mitigation is only enabled with the calling function enabled
fault mitigations and CFG_CORE_FAULT_MITIGATION is 'y'.

Acked-by: Jerome Forissier <jerome.forissier@linaro.org>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
…a_verify()

Adds a stubbed fault mitigation for the drivers version of
crypto_acipher_rsassa_verify). End the function with FTMN_CALLEE_DONE()
to record that the function was indeed called and a redundant copy of
the return value.

Acked-by: Jerome Forissier <jerome.forissier@linaro.org>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Adds fault mitigations to shdr_verify_signature() and
shdr_verify_signature2(). shdr_verify_signature() and
shdr_verify_signature2() are called using the wrapper FTMN_CALL_FUNC()
which verifies that the correct function was called and that the return
value hasn't been tampered with.

Acked-by: Jerome Forissier <jerome.forissier@linaro.org>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Adds and enables fault mitigation in buf_ta_open() to check both the
signature of the TA and then also the hash of the TA before returning
success.

Acked-by: Jerome Forissier <jerome.forissier@linaro.org>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Adds and enables fault mitigation in ree_fs_ta_open() to check the
signature of the TA before returning success.

Acked-by: Jerome Forissier <jerome.forissier@linaro.org>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
@jenswi-linaro
Copy link
Contributor Author

Squashed and tags applied.

@etienne-lms
Copy link
Contributor

LGTM.
Acked-by: Etienne Carriere <etienne.carriere@linaro.org>

@jenswi-linaro
Copy link
Contributor Author

Tag applied.

@jforissier jforissier merged commit 2d7720f into OP-TEE:master Nov 25, 2022
@jenswi-linaro jenswi-linaro deleted the fault_mitigation branch November 25, 2022 11:14
size_t hash_size = 0;
size_t hash_algo = 0;

if (shdr->magic != SHDR_MAGIC)
return TEE_ERROR_SECURITY;
goto err;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this to 'goto err' on OP-TEE 3.14 and ran 'xtest -t regression 1008' and encountered an 'unhandled pageable abort' error. After analysis, I think it is because crypto_acipher_alloc_rsa_public_key is not executed but crypto_acipher_free_rsa_public_key is executed. Not sure if the latest version is OK.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the (latest) implementation, I think it currently works fine as key is 0 initialized and all public key free possible callback handles (ltc, mbedtls, versal, caam and se050 do behave nicely in such context. Yet, I may have missed something and all in one, it looks a bit fragile and I think you're right, shdr_verify_signature() exit error sequence should not always call crypto_acipher_free_rsa_public_key().

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error is really because the key was not initialized to 0 on the old version. After initialization, all xtests passed. Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For info: looking into struct rsa_keypair management, few other places assume it is safe to call crypto_acipher_free_rsa_public_key() on a key ref that was zero initialized and not allocated or partially allocated, for example sw_crypto_acipher_alloc_rsa_keypair() in lib/libtomcrypt/rsa.c and lib/libmbedtls/core/rsa.c. So contrary to above my comment, current latest implementation is fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants