Compile time SHA256 override? #702

gmaxwell · 2019-12-18T12:22:08Z

It's a bit silly that the library packs in its own sha256 even when the user has their own and on small devices it's a costly waste of space. In places like Bitcoin Core the environment has a fast SIMD or SHA-NI sha256, ... it actually makes a measurable difference in signing time to use SHA-NI.

It would be nice if there were some -Dlibsecp2561k_sha256init=x -Dlibsecp2561k_sha256update=y -Dlibsecp2561k_sha256final=z settings that could be used to compile time substitute in another library and leave the internal one out.

elichai · 2019-12-18T12:25:59Z

Last time we discussed this on IRC some people said that providing 3 functions might be a bit much and having a single sha256 function instead is better.
I think that a single one might be a bit awkward because it requires allocating a big buffer on the stack that can contain everything (and then more things you need to clean).

What are your thoughts on this?

real-or-random · 2019-12-18T16:48:30Z

I think you'll want this feature only if you seriously care about performance. And in this case, you probably have the more complex API with 3 functions anyway.

apoelstra · 2019-12-18T17:51:15Z

Also, in practice a single function is often harder to use - if you want to hash a series of things it's nice to just throw them all into a hash engine ... if you need to create a buffer first then (depending on your language) you may need to think about buffer allocation and indexing.

Another alternate might be to just conditionally compile out the sha256 functions, and require the user provide replacements with exactly the correct names in order for linking to succeed. This is what we do for the default error callbacks for example.

gmaxwell · 2019-12-18T20:08:36Z

only if you seriously care about performance.

I think the vast majority of users would be ones that don't want to have an extra 19kilobytes in their binary.

Another alternate might be to just conditionally compile out the sha256 functions, and require the user provide replacements with exactly the correct names in order for linking to succeed.

That would be ducky too.

elichai · 2019-12-20T09:13:23Z

Unless anyone already works on this I'd like to give it a try :)

real-or-random · 2020-04-24T17:12:19Z

Unless anyone already works on this I'd like to give it a try :)

Could you look into this?

elichai · 2020-04-24T17:46:12Z

Could you look into this?

IIRC the problem I encountered was the SHA2 context.
I remember a few solutions but none of them was great:

Decide that the context needs to be size X, provide an opaque struct with unsigned array size X, and the implementer can memcpy this into his own Context back and forth (or type pun it? need to look into what the standard says on that).
The implementer also needs to provide a header with the name sha256.h that defines the struct+functions using the same names/symbols we use.
Provide a c_void pointer to the context in the function's API. (it then requires a reset function to reset the context, and it limits us because we can't hash two things "in parallel")

Any feedback on which is best or maybe a different solution is appreciated.

sipa · 2020-04-24T17:58:39Z

One possibility is keeping the SHA256 padding/chopping into blocks on the secp256k1 side, and only require external implementations to provide a SHA256 transformation function.

The API would look like:

/** Update the state (array of 8 uint32_t) pointed to by state with 64*blocks bytes of input pointed to by data. */
void sha256_transform(uint32_t *state, size_t blocks, unsigned char *data);

This means the external implementation may need to copy data from/to "array of 8 uint32_t" to their own internal representation - but I suspect almost any C-like implementation already uses that representation anyway.

elichai · 2020-04-26T07:23:08Z

That's an interesting idea.
I do think it will probably require the user to slightly modify his implementation to match this (I've looked at a few sha2 implementations I've found on google, and they all have their own quirks when it comes to the "transform" function)

And is there any real advantage with passing a bunch of blocks at the same time? won't it just call the transform function in a loop? or is there some SIMD/SHA-NI magic this enables?

elichai · 2020-04-26T15:05:26Z

I played with this, it works pretty nice, but then I realized that one of the reasons were binary size, so I measured using counting asm lines in godbolt(I'm not sure if it's a good way to compare),
and exposing RFC6979+HMAC+SHA2+SHA2_transform.
with -Os (for minimal size):
As-is: 3,523 lines of asm.
Omitting the transform function: 429 lines.
Omitting all of SHA2: 242 lines.

with -O2:
As-is: 3,844 lines.
Omitting transform: 511 lines.
Omitting all of SHA2: 236 lines.

with -O3:
As-is: 6,852 lines.
Omitting transform: 3,470 lines.
Omitting all of SHA2: 565 lines.
(-O3 is crazy lol)

Anyone who wants to play with it: https://godbolt.org/z/MPF3w9
add -DUSE_EXTERNAL_SHA2_TRANSFORM to omit the transform function.
add -DUSE_EXTERNAL_SHA2 to omit all of sha2 functions and leave only HMAC+RFC6979

real-or-random · 2020-04-26T16:20:09Z

For binary size you can really just look at the file size of the binary, see #700 (comment).

add -DUSE_EXTERNAL_SHA2 to omit all of sha2 functions and leave only HMAC+RFC6979

If at all, I think we would want the opposite. We'll need SHA256 for Schnorr sigs and ECDH, but HMAC/RFC6979 is only used for deriving nonces, and you can use SHA256, too.

elichai · 2020-04-27T08:45:25Z

If at all, I think we would want the opposite. We'll need SHA256 for Schnorr sigs and ECDH, but HMAC/RFC6979 is only used for deriving nonces, and you can use SHA256, too.

I just had to expose some end result through godbolt to get asm, I used RFC6979 because it doesn't have any complex logic and it uses SHA256 in a complex way that prevent inlining everything and optimizing write together with finalize and initialize (so it's an example to show the asm code of SHA2 not of RFC6979)

real-or-random · 2020-04-28T09:39:21Z

This really depends on whether we have a compile-time override or a runtime override but we should think about a self-test in the spirit of https://github.com/bitcoin/bitcoin/blob/99813a9745fe10a58bedd7a4cb721faf14f907a4/src/crypto/sha256.cpp#L465 .

elichai · 2020-04-28T21:32:17Z

I think the main advantage is actually code size, because my tests don't show big advantages for optimized sha2 implementations, although I haven't tested with SHA-NI but I don't think it's going to be a huge improvement in terms of performance, as SHA2 is really fast compared to anything EC related

switck · 2021-02-01T14:21:55Z

Im making embedded library containing libsecp256k1, see https://github.com/switck/libngu

thoughts:

already have a few sha256's functions in my codebase, do not want again in my binary (especially a vanilla version like this one)
would like to use the hmac-sha256 being carried in your library, but it's declared static
definitely want your deterministic nonce, since that impacts private key security
this is a code-size problem, not a performance concern

RE: sharing SHA256 code

a single-shot function is best if you plan to share code.
all s/w implementations use init/update/finalize but the context is never predictable
hardware SHA256 has context that isn't just a memory pointer+length (ie. chip resources)
you are hashing short messages, not megabytes, so stop/restart not needed
allocating space on the stack is instant; copying/appending messages onto stack is quick
a single-shot function can optimize for specific message lengths, since complete message size is known at start.

real-or-random · 2021-12-23T15:23:41Z

@elichai Any update here? Maybe, if you currently don't have a lot of time, it would still be good to share your WIP branch, so someone else could adopt the PR.

elichai · 2021-12-23T18:12:19Z

@elichai Any update here? Maybe, if you currently don't have a lot of time, it would still be good to share your WIP branch, so someone else could adopt the PR.

Sadly I can't find it :(

One possibility is keeping the SHA256 padding/chopping into blocks on the secp256k1 side, and only require external implementations to provide a SHA256 transformation function.

The API would look like:
/** Update the state (array of 8 uint32_t) pointed to by state with 64*blocks bytes of input pointed to by data. */
void sha256_transform(uint32_t *state, size_t blocks, unsigned char *data);
This means the external implementation may need to copy data from/to "array of 8 uint32_t" to their own internal representation - but I suspect almost any C-like implementation already uses that representation anyway.

Do you know if implementations would want to keep the state as a __m128i instead of uint32_t? (I see that bitcoin's implementations load it to vector registers each time, but wondering if that's optimal)

real-or-random · 2022-01-02T13:34:48Z

It's a bit silly that the library packs in its own sha256 even when the user has their own and on small devices it's a costly waste of space.

In places like Bitcoin Core the environment has a fast SIMD or SHA-NI sha256, ... it actually makes a measurable difference in signing time to use SHA-NI.

Going a step back, I believe that these are separate concerns. As per the discussion here, an override helps mostly to save space.

If we want to profit from hardware optimizations, this is somewhat orthogonal. An override would help here but only if the caller controls the compilation. This is true for Bitcoin Core but in general it's rather an exception. Also, if you look at Core's SHA256 implementation (https://github.com/bitcoin/bitcoin/blob/master/src/crypto/sha256.cpp), it's not clear how the override would look like. Depending on the available hardware, the essential function is either Transform, Transform_2way, Transform_4way, Transform_8way.

This is an argument in favor of optimizing our SHA256 implementation (independent of the possibility to override), and I tend to believe that this is desirable. We didn't want to do this in the past because SHA256 was an implementation detail. But as it's an integral part of Schnorr verification and signing now, things have changed. It's just somewhat messy: We'd need to duplicate code from Core (and worse, change it to C). Or move SHA256 (and maybe other stuff such as ChaCha20 for #760) to a separate library that's linked to secp256k1 and Core. Or expose the SHA256 here and make Core use them. None of this sounds great.

Thoughts?

benma · 2024-07-14T20:44:14Z

I'd very much like being able to plug in a different sha256 implementation to save binary space.

real-or-random mentioned this issue Apr 26, 2020

Add schnorrsig module which implements BIP-340 compliant signatures #558

Merged

real-or-random mentioned this issue Jul 22, 2020

Get rid of autoconf check for endianness #770

Closed

This was referenced Aug 8, 2020

WIP: "safegcd" field and scalar inversion #767

Closed

Document / assert non-portable assumptions #792

Open

apoelstra mentioned this issue Feb 20, 2022

Limit SharedSecret to 32 byte buffer rust-bitcoin/rust-secp256k1#402

Merged

andrewkozlik mentioned this issue Aug 12, 2022

Investigate how to save flash space trezor/trezor-firmware#1947

Closed

jonasnick mentioned this issue Oct 11, 2023

Silent Payments (BIP352) module -- discussion about scope and interface #1427

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Compile time SHA256 override? #702

Compile time SHA256 override? #702

gmaxwell commented Dec 18, 2019 •

edited

Loading

elichai commented Dec 18, 2019

real-or-random commented Dec 18, 2019

apoelstra commented Dec 18, 2019

gmaxwell commented Dec 18, 2019

elichai commented Dec 20, 2019

real-or-random commented Apr 24, 2020

elichai commented Apr 24, 2020 •

edited

Loading

sipa commented Apr 24, 2020 •

edited

Loading

elichai commented Apr 26, 2020

elichai commented Apr 26, 2020 •

edited

Loading

real-or-random commented Apr 26, 2020

elichai commented Apr 27, 2020 •

edited

Loading

real-or-random commented Apr 28, 2020

elichai commented Apr 28, 2020

switck commented Feb 1, 2021

real-or-random commented Dec 23, 2021

elichai commented Dec 23, 2021

real-or-random commented Jan 2, 2022

benma commented Jul 14, 2024

Compile time SHA256 override? #702

Compile time SHA256 override? #702

Comments

gmaxwell commented Dec 18, 2019 • edited Loading

elichai commented Dec 18, 2019

real-or-random commented Dec 18, 2019

apoelstra commented Dec 18, 2019

gmaxwell commented Dec 18, 2019

elichai commented Dec 20, 2019

real-or-random commented Apr 24, 2020

elichai commented Apr 24, 2020 • edited Loading

sipa commented Apr 24, 2020 • edited Loading

elichai commented Apr 26, 2020

elichai commented Apr 26, 2020 • edited Loading

real-or-random commented Apr 26, 2020

elichai commented Apr 27, 2020 • edited Loading

real-or-random commented Apr 28, 2020

elichai commented Apr 28, 2020

switck commented Feb 1, 2021

real-or-random commented Dec 23, 2021

elichai commented Dec 23, 2021

real-or-random commented Jan 2, 2022

benma commented Jul 14, 2024

gmaxwell commented Dec 18, 2019 •

edited

Loading

elichai commented Apr 24, 2020 •

edited

Loading

sipa commented Apr 24, 2020 •

edited

Loading

elichai commented Apr 26, 2020 •

edited

Loading

elichai commented Apr 27, 2020 •

edited

Loading