-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Identify improvements in baremetal branch that should be upstreamed #3535
Comments
I used this to list all PRs that have been merged in the baremetal branch:
The output consists of 151 PRs. I did a first pass over that to annotate them with:
I'm attaching the resulting prs-baremetal.txt file for reference, but I don't suggest anyone reviews it, as this would be tedious and provide low value. I'll report the results by category in follow-up comments, making use of the following script:
|
Rejected PRs There are 67 PRs in the baremetal branch I'm suggesting to reject as out-of-scope or inapplicable, which fall in the following categories:
|
PRs that were already upstreamed I've identified 28 PRs that don't need upstreaming, either because it was already done, or because the PR is actually a side-port from something that was first done upstream. For information, they fall in the following categories:
|
PRs that we may or may not want to upstreamThere are 20 PRs here, that I'll group in 3 series. 1. Measurements: There are 16 PRs that relate directly or indirectly to measuring the size of Mbed TLS in the configuration of interest to the baremetal branch.
Reason(s) against upstreaming this: the config was designed with a specific use case in mind. Reasons for upsteaming this include:
Foreseeable difficulties in upstreaming (or indeed not upstreaming) this: it's probably going to be hard to upsteam all PRs in this series at once, as later PRs in this series will depend on other code size PRs (introducing new config.h) options. Conversely, not upstreaming this series is likely to create issue when upstreaming some code size PRs that update Recommendation: I'm quite in favour of upstreaming this, but I'd like to know what others think. List of PRs in this series: 2. Code size: reduce cost of the MD and PK layers when a single alg is available
Overview: make the MD layer zero-cost (in terms of code-size and runtime indirection) when a Reasons for upstreaming: who doesn't want to save hundreds of bytes? (probably more than 1K if we complete the work in the PK layer (see PR description) and do something similar in the Cipher layer as well) Reasons against upstreaming:
Recommendation: I'm inclined not to upstream this as-is, but perhaps study it to see if some of the ideas could be applied in the implementation of the PSA Crypto API without compromising other goals. (Note: the PK PR is probably easier to review, but the MD PR goes further, as explained in the PK PR's description.) 3. Code size: Make all of X.509 as single compilation unit
Overview: turn all of the X509 library into a single translation unit, by Reasons for upstreaming: who doesn't want to save hundreds or bytes, especially by just letting the compiler to more work for us? Reasons against upstreaming: the way it's done is not very clean and make cause issues for people importing Mbed TLS in another project (as demonstrated by the Mbed OS issue). Recommendation: I don't think we should upstream as-is, but we should definitely looks for ways to achieve the same result in a cleaner way. The savings are quite significant (and making more functions static reduce our ABI surface as a bonus). |
PRs suitable for upstreaming - outliersMost remaining PRs that are suitable for upstreaming are in the "code size" category (some of them also save a bit of RAM), except two that I'll mention now to get them out of the way. I'll complete my review of code size PRs on Monday. RAM: X.509 parsing on-demand: This was started by Hanno as a PR in the development branch, then side-ported to baremetal where it was finalized and merged, and the original development PR was never merged. It now needs rebasing for conflict resolution (which Hanno is prepared to do) and review (which needs resourcing as the PR is large en complex, but Manuel already reviewed it on the baremetal side, so could act as a first reviewer).
Misc: remove an unnecessary cast in X.509 |
For categories 2 and 3 I feel the same way as you do: I don't think it is worth upstreaming them, but we can learn from them, when we are implementing similar improvements. For category 1 PRs I was against upstreaming at first, because the sheer number of PRs was intimidating. Taking a closer look showed that most of them are really tiny or consist mainly test data (certificates). There are maybe 2 or 3 medium sized PRs that would take longer to review. I very much would like to see some code size measurements in our tests: from time to time we do code size optimisations, but without measuring and tracking it all that effort just falls in the category of premature optimisation. I have some concern about taking that somewhat arbitrary configuration as the standard for our measurements, but as you say it is fairly reasonable configuration and I think it is a good starting point. In summary, I am for upstreaming the category 1 PRs as well. |
PRs suitable for upstreaming: code sizeThis is the bulk of upstreamable PRs. The numbers at the end are the impact on code size with ARM-GCC (as measured with Note: I've tried to group PRs than are simple fixes or follow-up of another PR with the original. I've also tried to document dependencies but I might have missed a few. Same for possible API/ABI break (I might also have flagged some that are actually not problematic, so the flag just means "needs attention/discussion"). New config.h options to make more code optional We already have ways to configure many things out of the build, but it turned out that we could use more of those.
Hardcoding of SSL options The second PR in this series introduces a new concept, used by all subsequent PRs: whenever we have something in SSL that can be configured at run-time (eg: list of allowed algorithms, use of extensions, protocol versions, etc.), if we know at compile-time what that option will be, we can say so in Note: all PRs in this series depend the first two: more precisely, they all depend on r592 which depends on r590.
Misc (SSL)
Misc (X.509) Note: the two PRs in this series both depend on r584, which is X.509 on-demand parsing (part of RAM optimisation work) (upstream: #2478) - mentioned in the previous comment Note: since those two PRs are about removing unused fields of technically-public structures in some configurations, API/ABI compatibility questions need to be checked before upstreaming, or we might choose to include this in 3.0 work.
Misc (other) |
Code-size saving PRs sorted by bytes saved The table below lists PRs that have code-size savings mentioned in their description, sorted from larger to smallest savings. For each change, only the "base" PR is listed (not follow-ups for bugfixes and tests, nor preparatory work). Some PRs making multiple changes, the total code size savings of all changes in that PR is used. Note: excluded from this table is the change with the largest impact (use TinyCrypt to replace our ECC implementation, including bignum), because I don't think it's suitable for upstreaming. Its exact impact is harder to measure (a large part of bignum would be removed by the linker anyway) but was likely of the order of 4 to 8 KiB.
Total savings from items in the "YES" category: 11502 bytes. |
I learned about the impact of structure field ordering on code size from the work on the baremetal branch., yet I couldn't find something corresponding to #4747 in this list. Did I miss it? |
@gilles-peskine-arm IIRC it was r649. The logic applied there, however, was not to move commonly used fields to the top, but rather sort them in size from smallest to biggest. This helps because the maximum immediate offset is 127 * the size of the access, so 32-bit accesses have a large max. offset than byte accesses. E.g. if you have a struct with 128 byte fields, 64 halfword fields and 32 word fields, you can access all of them from the base of the structure through an immediate offset if and only if you order them in precisely this way. It might be worth revisiting whether the logic you apply might be more suitable in places. |
Note: since this study was done, one of the changes listed above has been made independently in development:
This was done in #6784 as part of cleaning up before driver-only hashes work. (I didn't check if other changes have happened, I just noticed this one.) |
The
baremetal
branch of Mbed TLS includes changes, features and options that allow to reduce the size of the code and/or the RAM usage of the library. Most of those would be of general interest to people who use Mbed TLS in highly constrained environments and would be suitable for upstreaming; while some of them are either too specific, or would conflict with or be made redundant by future development (such as the move the PSA Crypto).This task is to:
This task does not include doing any of the actual side-porting/upstreaming - this work will need to be planned later from the above epics.
The text was updated successfully, but these errors were encountered: