mpi_mul_hlp code size is huge #1717
Comments
I created a pull request showing what I am using locally:
The CLA signing process, which would have been needed to integrate my patch, got stuck somewhere in the company I work for, so my patch probably won't get integrated. Since you had already done something similar before seeing my patch, please do make a pull request. I think this would benefit many people with memory-constrained systems.
#5706 changed the structure of bignum multiplication to use less manual unrolling. @hanno-arm, is there more to gain there, or can we consider this issue resolved?
See also #5360 (comment) for some data. I'm not super inclined to add yet another compile-time option, so I'd like to find a compromise that's acceptable for most platforms, and I think 8-1 is a good compromise.
The code in this area has changed quite a bit: #1718 was closed, the other changes in #5373 were also closed, and eventually the changes in #5706 were merged. With that, the best change that could be made would be
This would save 236 bytes when compiled for the TF-M configuration with ArmClang 6.19 and
From a quick run of
i.e., the most common value is 4. If this is representative of real applications, unrolling x8 doesn't benefit the majority of cases, and we would probably see better performance as well as smaller code size by reducing the unroll factor to 4 (or maybe 2 or 3).
The distribution of
But
There is clearly some worthwhile size optimisation that can be done if we only have 256-bit curves and/or 384-bit curves (picking "most common" and "NIST-recommended" sizes there).
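To make the link between curve size and unroll factor concrete, here is a small arithmetic sketch. It is not taken from Mbed TLS: `limbs_for_bits` and `unrolled_limbs` are hypothetical helper names, and the real loop counts depend on the call sites and on the limb width of the build, which the measurements above don't state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of limbs needed to hold `bits` bits
 * when each limb is `limb_bits` wide (rounds up). */
size_t limbs_for_bits(size_t bits, size_t limb_bits)
{
    assert(limb_bits > 0);
    return (bits + limb_bits - 1) / limb_bits;
}

/* Hypothetical helper: how many of `limbs` iterations are handled by a
 * loop body unrolled `unroll` times; the remainder falls through to the
 * one-limb-at-a-time tail loop. */
size_t unrolled_limbs(size_t limbs, size_t unroll)
{
    assert(unroll > 0);
    return limbs - (limbs % unroll);
}
```

Under these assumptions, a 256-bit operand is 4 limbs with 64-bit limbs and 8 limbs with 32-bit limbs, and a 384-bit operand is 6 or 12 limbs. A 4-limb operand never enters an x8 unrolled body (`unrolled_limbs(4, 8)` is 0) but is fully covered by an x4 body, which is consistent with the observation that x8 unrolling does not help when the most common count is 4.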
This really needs some performance tests before we can move forward with it.
Description
mbed TLS build:
Version: 2.10.0, git commit c041435
Enhancement\Feature Request
Justification - why does the library need this feature?
We are building Mbed TLS for signature checking on an embedded device (Arm Cortex core), and found that the current mpi_mul_hlp() in library/bignum.c heavily unrolls a loop, like this:
This generates quite large code, regardless of the compiler optimization settings. We'd like an option to compile out all but the last loop, which would leave the unrolling to the compiler. With this change we save around 4 kB of .text, which is more than 25% of all the code we need for signature checking.
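As an illustration of the idea only, and not the actual Mbed TLS code or the patch from this issue, a multiply-accumulate helper with an optional manually unrolled loop could be sketched as follows. The names `MULADD_STEP`, `muladd_sketch` and `SKETCH_NO_MANUAL_UNROLL` are hypothetical, and the sketch assumes 32-bit limbs with a 64-bit intermediate rather than the library's platform-specific assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One step on 32-bit limbs: *d += *s * b + c, leaving the new carry in c.
 * The 64-bit sum cannot overflow: (2^32-1)^2 + 2*(2^32-1) == 2^64 - 1. */
#define MULADD_STEP(s, d, b, c)                                   \
    do {                                                          \
        uint64_t t_ = (uint64_t)(*(s)++) * (b) + *(d) + (c);      \
        *(d)++ = (uint32_t)t_;                                    \
        (c) = (uint32_t)(t_ >> 32);                               \
    } while (0)

/* d[0..n-1] += s[0..n-1] * b; returns the carry out of the top limb.
 * Defining the (hypothetical) SKETCH_NO_MANUAL_UNROLL macro compiles out
 * the unrolled loop and leaves any unrolling to the compiler. */
uint32_t muladd_sketch(size_t n, const uint32_t *s, uint32_t *d, uint32_t b)
{
    uint32_t c = 0;
    assert(s != NULL && d != NULL);

#if !defined(SKETCH_NO_MANUAL_UNROLL)
    /* Manually unrolled body: four limbs per iteration. */
    for (; n >= 4; n -= 4) {
        MULADD_STEP(s, d, b, c);
        MULADD_STEP(s, d, b, c);
        MULADD_STEP(s, d, b, c);
        MULADD_STEP(s, d, b, c);
    }
#endif

    /* Tail loop: one limb per iteration; always present. */
    for (; n > 0; n--) {
        MULADD_STEP(s, d, b, c);
    }
    return c;
}
```

Both configurations compute the same result; only the code size and the speed differ, which is why a change like this needs measuring on the target compiler and optimization level.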
Suggested enhancement
Something like this: