Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce scripts automating wrapping structs members with MBEDTLS_PRIVATE macro #4511

Merged

Conversation

mstarzyk-mobica
Copy link
Contributor

python scripts relies on doxygen to generate xml files.
shell script prepares configuration, enables xml generation in doxyfile and runs the python script

@mstarzyk-mobica mstarzyk-mobica force-pushed the mbedtls_private_with_python branch from 4c196b9 to d338698 Compare May 14, 2021 09:23
@mstarzyk-mobica mstarzyk-mobica self-assigned this May 14, 2021
@mstarzyk-mobica mstarzyk-mobica added mbedtls-3 needs-design-approval needs-review Every commit must be reviewed by at least two team members, needs-reviewer This PR needs someone to pick it up for review labels May 14, 2021
@mstarzyk-mobica
Copy link
Contributor Author

Python script is checking the structmbedtls_*.xml files. I assumed that these contain all the data we need. However now I see that not all structs have their xml models, e.g. mbedtls_aes_context from aes.h. I am not yet sure why this happens.
I searched through generated xml files and mbedtls_aes_context is mentioned only inside functions or in the program listing, so there is no xml model for that particular struct. Any hints why?

@mstarzyk-mobica mstarzyk-mobica force-pushed the mbedtls_private_with_python branch 3 times, most recently from e408700 to e93e28c Compare May 14, 2021 20:28
@@ -0,0 +1,8 @@
make clean
sed -i 's/GENERATE_XML = NO/GENERATE_XML = YES/g' doxygen/mbedtls.doxyfile
scripts/config.py realfull
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is why mbedtls_aes_context is missing.

For the purpose of finding private fields, config.py full will get us most of the way. It will miss a handful of cases such as whichever ECDH context is not used in this configuration. I think full is close enough and we can deal with the few remaining cases manually.

@mstarzyk-mobica mstarzyk-mobica force-pushed the mbedtls_private_with_python branch 2 times, most recently from 1eb199c to 30b20a8 Compare May 17, 2021 20:06
@ronald-cron-arm ronald-cron-arm linked an issue May 18, 2021 that may be closed by this pull request
Copy link
Contributor

@ronald-cron-arm ronald-cron-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've discussed it a bit with Manuel and Gilles and we think that the approach of leveraging the doxygen data to mark the structure fields as private is good. That way you are able to do the job with a relatively short and simple python script. Things to do now:

  1. In the *.sh move to the full config not the realfull one as Gilles mentioned
  2. Probably better to put the scripts in the /scripts directory even if they are transient.
  3. After applying your script we need a specific commit that fix the coding style as we end up with a lot of lines that are too long (more than 80 characters). Adopting the structure:
/*!< Doxygen comment about field foo */
    int foo;

rather than

int foo; /*!< Doxygen comment about field foo */ 

would probably ease the restructuring by limiting the number of line return needed.

@mpg @gilles-peskine-arm what to think about that ?

  1. Regarding the programs, as discussed with Gilles offline, define MBEDTLS_ALLOW_PRIVATE_ACCESS in test programs and use MBEDTLS_PRIVATE in sample programs. Not always easy to know if a program is a test or a sample program though. Definitely test: programs/test/*, programs/ssl/ssl_client2, programs/ssl_server2, programs/fuzz/*
    Definitely sample: programs/aes/*, programs/hash/*, programs/pkey/*, programs/random/*, programs/x509/*, programs/psa/key_ladder_demo, programs/psa/crypto_examples. If you have to change a file not in this list just ask we will try to figure out.

Otherwise to check for completeness, we plan to scan the header files resulting from your script with some sed/awk/grep/... magic. The following awk magic prints all the occurrences of set of 3 lines following a "{" line where none of the lines contain MBEDTLS_PRIVATE:

awk '/^{/{{line_number=FNR} for(i=1; i<=3; i++) {getline; lines[i]=$0} for(i=1; i<=3; i++){if(lines[i] ~ /MBEDTLS_PRIVATE/){next}} {printf "--- %s line %d\n", FILENAME, line_number}; for(i=1; i<=3; i++){ print lines[i]} }' *.h

With this I can see that we are probably missing something at line 319 of ecp.h, mbedtls_ecp_restart_ctx structure (locally I've run your script with the full configuration, not realfull).

@gilles-peskine-arm
Copy link
Contributor

After applying your script we need a specific commit that fix the coding style as we end up with a lot of lines that are too long (more than 80 characters).

Can be left over for after 3.0.

If you reorganize Doxygen comments, note that they have to start with a different prefix depending on whether they are before or after the field. Our convention is that Doxygen comments go before unless they're on the same line (or at least start on the same line), and being inconsistent about this would be very confusing to source code readers, so please make sure that you don't break this convention.

Regarding mbedtls_ecp_restart_ctx: that's one of the features that remain disabled in full because it's incompatible with some other features. I expect that there'll be a small number of such cases which we can handle manually. Thanks for providing a script.

@mstarzyk-mobica
Copy link
Contributor Author

2\. Probably better to put the scripts in the /scripts directory even if they are transient.

I thought that this script was supposed to be used once, is there any benefit of keeping it in the repo? If we want to keep it,
perhaps it should be refined to provide some --help and allow for specifing which files should be targeted.

After applying your script we need a specific commit that fix the coding style as we end up with a lot of lines that are too long (more than 80 characters).

I was hoping that maybe we have a clang-tidy or something similar that would do the job. If not, then perhaps it would best to alter the script so that it takes the long comment lines into account and restructures them if needed.

If you reorganize Doxygen comments, note that they have to start with a different prefix depending on whether they are before or after the field. Our convention is that Doxygen comments go before unless they're on the same line (or at least start on the same line), and being inconsistent about this would be very confusing to source code readers, so please make sure that you don't break this convention.

I am a bit confused by this. The doxygen comments are fine both above and in the same line, as long as comment prefixes are correct?

Regarding the programs, as discussed with Gilles offline, define MBEDTLS_ALLOW_PRIVATE_ACCESS in test programs and use MBEDTLS_PRIVATE in sample programs. Not always easy to know if a program is a test or a sample program though. Definitely test: programs/test/, programs/ssl/ssl_client2, programs/ssl_server2, programs/fuzz/
Definitely sample: programs/aes/, programs/hash/, programs/pkey/, programs/random/, programs/x509/*, programs/psa/key_ladder_demo, programs/psa/crypto_examples. If you have to change a file not in this list just ask we will try to figure out.

Alright. We don't have a doxyfile for those, do we? I am not sure how much effort is need to manually apply MBEDTLS_PRIVATE to sample programs, but if it turns out be tedious work, perhaps we could generate doxygen xml for the programs and just reuse (with slight adjustment) the python script used for the library.

@gilles-peskine-arm
Copy link
Contributor

I was hoping that maybe we have a clang-tidy or something similar that would do the job.

No. Clang-tidy doesn't understand our peculiar whitespace style. One Day™ we'll change to a more standard style.

Please do not reformat lines in the same commit as other code changes. git diff -w ignores horizontal whitespace but there isn't a similar way to ignore places where a newline was added or removed so reformatting lines in the same commit as doing other things results in a diff that's hard to read.

As far as I'm concerned, if a few lines become too long, it doesn't matter. We'll fix it later (preferably after we change the coding style, which will alter line lengths anyway).

I am a bit confused by this. The doxygen comments are fine both above and in the same line, as long as comment prefixes are correct?

The following are fine (/*!< instead of /**< is also acceptable):

declaration of foo; //!< Documentation of foo
declaration of bar; /**< Declaration of bar */
declaration of baz; /**< Declaration of baz.
                     * It continues on the next line. */

The following is also fine, but shouldn't be mixed with the other style in the same structure to avoid confusing the reader.

/** Documentation of corge */
declaration of corge;

The following is almost never used in Mbed TLS, even though it is accepted by Doxygen, and please do not introduce any new occurrences to avoid confusion:

declaration of corge;
/**< Documentation of corge */

Alright. We don't have a doxyfile for those, do we? I am not sure how much effort is need to manually apply MBEDTLS_PRIVATE to sample programs, but if it turns out be tedious work, perhaps we could generate doxygen xml for the programs and just reuse (with slight adjustment) the python script used for the library.

Doxygen only parses headers. You can find the places that need MBEDTLS_PRIVATE simply by compiling the programs and adding the macro call where the compiler complains.

@ronald-cron-arm
Copy link
Contributor

ronald-cron-arm commented May 19, 2021

As Gilles suggested, let's forget about the reformatting of lines that are too long.

I thought that this script was supposed to be used once, is there any benefit of keeping it in the repo? If we want to keep it,
perhaps it should be refined to provide some --help and allow for specifing which files should be targeted.

Yes only used once, thus ok to keep it in the root directory. Please add a commit that remove them though.

Otherwise in my previous comment I forgot to mention the most important regarding the output of the awk script. The script does not output a lot of items thus it can be processed manually. Most items are false positive (inline function code, enumeration values) meaning that your script does by far most of the job. Please run the awk script, find the revealed blind spots (mbedtls_ecp_restart_ctx and a few others I think) of your script and do a commit to fix them.

@mstarzyk-mobica
Copy link
Contributor Author

I need your clarification on following files:
programs/ssl/dtls_client.c
programs/ssl/dtls_server.c
programs/ssl/mini_client.c
programs/ssl/ssl_client1.c
programs/ssl/ssl_context_info.c
programs/ssl/ssl_fork_server.c
programs/ssl/ssl_mail_client.c
programs/ssl/ssl_server.c
programs/ssl/ssl_test_lib.c

Which are tests and which are programs?

@gilles-peskine-arm
Copy link
Contributor

gilles-peskine-arm commented May 19, 2021

ssl_test_lib.c, ssl_client2.c and ssl_server2.c are test programs. I think ssl_context_info.c is only a test program. The others in ssl are demonstration programs.

In principle, this should be clear from https://github.com/ARMmbed/mbedtls/blob/development/programs/README.md but ssl_context_info is missing.

@mstarzyk-mobica mstarzyk-mobica force-pushed the mbedtls_private_with_python branch from 30b20a8 to fdabeb9 Compare May 20, 2021 08:45
@mstarzyk-mobica
Copy link
Contributor Author

I used awk script to find remaining structs members.
Test programs use MBEDTLS_ALLOW_PRIVATE_ACCESS
Sample programs use MBEDTLS_PRIVATE wrappers to access structs members.
Lines exceeding 80 chars not fixed.

@mstarzyk-mobica
Copy link
Contributor Author

Still getting few error during build.
check_doxygen_warnings: Check: doxygen warnings (builds the documentation): tests/scripts/doxygen.sh -> 1
not sure what to do about this one

@mstarzyk-mobica mstarzyk-mobica force-pushed the mbedtls_private_with_python branch 3 times, most recently from 7c228ba to 4ccbddb Compare May 20, 2021 10:03
@@ -88,41 +89,41 @@ extern "C" {
*/
typedef struct mbedtls_rsa_context
{
int ver; /*!< Reserved for internal purposes.
int MBEDTLS_PRIVATE(ver); /*!< Reserved for internal purposes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: now that each field is explicitly marked as private, I think we should remove the \note above - direct manipulation is now more than deprecated, it's forbidden.

Conflicts:
         include/mbedtls/ecp.h

Conflict resolved by using the code from development branch
and manually applying the MBEDTLS_PRIVATE wrapping.
Signed-off-by: Mateusz Starzyk <mateusz.starzyk@mobica.com>
Signed-off-by: Mateusz Starzyk <mateusz.starzyk@mobica.com>
Signed-off-by: Mateusz Starzyk <mateusz.starzyk@mobica.com>
@@ -52,7 +52,7 @@ struct mbedtls_timing_hr_time
*/
typedef struct mbedtls_timing_delay_context
{
struct mbedtls_timing_hr_time timer;
struct mbedtls_timing_hr_time MBEDTLS_PRIVATE(timer);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was missed by python script.
I checked the doxygen xml files: Doxygen claims this variable is defined at line 47. I don't know why this happens. However, since all other 'missed' variables were covered by adjusting library configuration, I don't see any value in looking into this any further.

Copy link
Contributor

@gilles-peskine-arm gilles-peskine-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The script I posted in my previous review) now only finds the _pms_xxx fields in union mbedtls_ssl_premaster_secret in ssl.h, and some false positives. Since this union is described as “Dummy type used only for its size”, it's clear that it's not a useful part of the public API, so let's not spend more time on it.

We should now remove the scripts from the root directory. I'm ok with doing that in a follow-up.

Copy link
Contributor

@davidhorstmann-arm davidhorstmann-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left one comment re the status of a previous comment but LGTM otherwise!

@@ -47,11 +48,11 @@
typedef struct
{
/** The HMAC algorithm in use */
psa_algorithm_t alg;
psa_algorithm_t MBEDTLS_PRIVATE(alg);
/** The hash context. */
struct psa_hash_operation_s hash_ctx;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, does this need changing, or is it okay how it is? I'll approve anyway.

Signed-off-by: Ronald Cron <ronald.cron@arm.com>
…te_with_python

Conflicts:
    include/mbedtls/ssl.h
    include/psa/crypto_struct.h

Conflicts fixed by using the code from development branch
and manually re-applying the MBEDTLS_PRIVATE wrapping.
@ronald-cron-arm
Copy link
Contributor

The previous head of the PR was the merge of a74295f and development(364380e70c7bf682bf7d8ecdb86719ce7ff5b5d0). To avoid a new merge commit, I've restarted from a74295f then added the commit a90e090 then merged 'development(21f8464)`.

@ronald-cron-arm ronald-cron-arm dismissed their stale review June 14, 2021 14:24

outdated change request

Copy link
Contributor

@gilles-peskine-arm gilles-peskine-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a merge locally starting from my previous approval of 0b6c4f8 and removed the temporary scripts, and I ended up with the same result.

Copy link
Contributor

@davidhorstmann-arm davidhorstmann-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thanks!

@ronald-cron-arm ronald-cron-arm merged commit 823f594 into Mbed-TLS:development Jun 14, 2021
gilles-peskine-arm added a commit to gilles-peskine-arm/mbedtls that referenced this pull request Jun 29, 2021
We forgot those in Mbed-TLS#4511.

Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>
daverodgman pushed a commit to daverodgman/mbedtls that referenced this pull request Jun 30, 2021
We forgot those in Mbed-TLS#4511.

Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-review Every commit must be reviewed by at least two team members,
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Document structure fields as private
5 participants