Streaming Interface for authenticated encryption? #218
Okay, I found why people worry about commitment in the context of encrypted messages. Following this paper about partitioning oracles, Age decided to mitigate the attack by limiting the size of its chunks. Partitioning oracles exploit the fact that some AEAD constructions, including ChaPoly, allow attackers to craft a ciphertext that successfully decrypts under more than one key. Instinctively, you would think that a ciphertext can only decrypt under the key that generated it. Not quite:
The partitioning oracle attack works thus: first, the attacker crafts a ciphertext that decrypts under half of all possible keys, such that the attacker knows that half. Then they send that message to the victim, who candidly leaks whether decryption was successful or not (through an error message or a timing leak). Now the attacker has gained one bit of information about the key. Repeat the operation 255 more times, and voilà, the attacker has the whole key, and the victim is pwned. While practical attacks aren't that powerful (against AES-GCM, the paper talks about covering up to 2^16 keys in a single message, and only up to 10 in the case of ChaPoly), they can still be quite scary in two scenarios:
Age mitigates those attacks by limiting the size of its chunks: with this limitation, it's only possible to cover two keys per query instead of a gazillion. Now as much as we could discourage scenarios under which there is a decryption oracle, we're gonna have to assume users may fall into this trap. So as a crypto library, it would be best to only provide committing AEAD. Unfortunately, (i) Monocypher already provides a non-committing ChaPoly construction, and (ii) polynomial hashes are fast. So there seems to be a tension between usability and performance. That said, there is a way to defeat partitioning oracles: commit the key before actually using it:
So, if we manage to commit the key before starting the secure session or decrypting the actual message, we should be good. No need to limit the size of chunks like Age does. And since this naturally happens in a number of situations, a non-committing AEAD could be viewed as legitimate. Still, we must recognise that partitioning oracle attacks are a footgun, and we must consider carefully whether that should stop us. In any case, we probably want to add a security considerations section about them in the current manual.
Encrypted streams would be very good to have. Sodium's crypto_stream isn't really a stream, however, it's a sequence of messages, and the encrypting and decrypting parties have to agree on what the sizes of the messages are, because the decryption API requires the caller to pass in the entire encrypted message. This is a pain in the butt for use in network connections, because
Hmm, I'm not sure we can meaningfully hide message sizes from traffic analysis in an interactive protocol, so I'm not sure Scuttlebutt's workaround really works: when you're sending a message and there's no other message right behind it, you have to send your first message anyway, and encrypting its length won't do you much good if the eavesdropper can just notice a pause in the data stream. If you really want to hide message sizes & boundaries, I have two strategies in mind:
How effective those strategies really are is application dependent. And more to the point, I don't think I can come up with a nice, simple, low-overhead streaming scheme that supports any of the above out of the box. Then there are the use cases where you do know chunk sizes in advance: file encryption and streaming lots of data. In both cases, chunk length can be a hard-coded convention, written at the beginning of the file, or agreed upon during the handshake. For these use cases, libsodium's approach minimises overhead. Personally, I don't think I can find a lightweight approach that satisfies your use case out of the box. I'm afraid users will have to endure butt pain, at least until I get around to writing an actual network library.
Seems like I finally have a design worth considering (latest version). Here's the API:

```c
typedef struct {
    uint32_t counter; // MSB of the ChaCha20 counter
    uint8_t  key[32];
    uint8_t  nonce[8];
} crypto_stream_ctx;

void crypto_stream_init(crypto_stream_ctx *ctx,
                        const uint8_t key[32], const uint8_t nonce[24]);

void crypto_stream_write(crypto_stream_ctx *ctx,
                         uint8_t mac[16],
                         uint8_t *cipher_text,
                         const uint8_t *plain_text, size_t text_size);
int crypto_stream_read(crypto_stream_ctx *ctx,
                       uint8_t *plain_text,
                       const uint8_t mac[16],
                       const uint8_t *cipher_text, size_t text_size);

void crypto_stream_rekey(crypto_stream_ctx *ctx);
void crypto_stream_init_raw(crypto_stream_ctx *ctx,
                            const uint8_t key[32],
                            const uint8_t nonce[8],
                            uint32_t counter);

// with additional data
void crypto_stream_write_aead(crypto_stream_ctx *ctx,
                              uint8_t mac[16],
                              uint8_t *cipher_text,
                              const uint8_t *ad, size_t ad_size,
                              const uint8_t *plain_text, size_t text_size);
int crypto_stream_read_aead(crypto_stream_ctx *ctx,
                            uint8_t *plain_text,
                            const uint8_t mac[16],
                            const uint8_t *ad, size_t ad_size,
                            const uint8_t *cipher_text, size_t text_size);
```

A few things to consider:

```c
void crypto_stream_rfc8439_init(crypto_stream_ctx *ctx,
                                const uint8_t key[32], const uint8_t nonce[12])
{
    crypto_stream_init_raw(ctx, key, nonce + 4, load32_le(nonce));
}

void crypto_stream_djb_init(crypto_stream_ctx *ctx,
                            const uint8_t key[32], const uint8_t nonce[8])
{
    crypto_stream_init_raw(ctx, key, nonce, 0);
}
```
I haven't actually used the API yet, so take the following with a grain of salt, but I'd like to know these ahead of time going in:
This looks reasonable to me (and similar to the libSodium API that I recently built a byte-oriented stream around). I assume the behavior of
Yes, both parties need to call |
Added some tests, and refined the design. I have managed to lift the size limits of chunks, at the cost of an increased rekey overhead (the automatic rekey occurs every 256 chunks instead of every 4 billion). The API maintains an internal context with the following information:
There are 3 ways to initialise this context that make some sense:
(I have realised that the 12-byte nonce cannot be used for the incremental interface because of catastrophic nonce reuse problems with chunks beyond the first one, as well as rekeying. Because of this, the current design only provides the short and extended nonce versions, which do not have this problem, and I removed the 12-byte one.)

Nonce sizes have their usual pros & cons: short nonces have no overhead, but they cannot be random. Long nonces can be random, but they incur an additional HChaCha20 call. Once the context is initialised, we can start sending message chunks. Each chunk is built thus:
The AEAD works the same as RFC 8439, except with a 64-bit counter and a 64-bit nonce, and the start value of the counter is not always zero. The counter is then incremented, and if it cycles back to zero we rekey. With the chosen increment, we end up rekeying every 256 messages. This means one additional HChaCha20 call per 256 messages. In the absolute worst case (empty messages), the overhead is less than 0.4%. In a reasonable worst case (short messages), the overhead halves down to 0.2%. In a typical streaming scenario with 32KiB chunks, even if we assume perfect vector instruction parallelism, we get a 0.006% overhead, which I deem negligible. The reason for this big increment is that it ensures message chunk lengths are unlimited (the theoretical limit is 2^62 bytes, which is close enough to infinity). Finally, the rekey operation derives the next key from the current one:
With this design, I believe we have a versatile, limitless, low-overhead streaming API, that as a bonus even allows RFC 8439 compliance if we can tolerate a tiny little hack (I currently use that hack in the tests to verify RFC compliance). Though I'll still seek peer review, I'm fairly confident the current design is sound. The only thing left to do now is discuss tradeoffs. |
One small change to the rekeying algorithm: we should reset the counter to zero. This changes nothing for automatic rekeying, because the counter is already zero. However, manual rekeying can happen at any time, and when it happens, we can safely delay automatic rekeying to reduce overhead. |
I had an epiphany. 😎 Rekeying can be achieved by copying 32 previously unused bytes. See, RFC 8439 requires one ChaCha20 call to generate the authentication key. Strangely enough, it does not use HChaCha20, but regular ChaCha20. This incurs a small overhead, but Monocypher conforms to it because it provides cheap compatibility with Libsodium. Thing is, a ChaCha20 block is 64 bytes, and the authentication key only uses the first 32. The last 32 can be used however we want. This is ideal for rekeying. And since the overhead is so low (just copy 32 bytes), we can afford to do it for every single message, which by the way makes the

Even better, since we change the key on a systematic basis, we no longer need to mess with the counter. As a consequence, we can now use 96-bit nonces with multi-chunk messages without fearing any collision, so I can justify core support for those (and along with them RFC 8439 compatibility).

Icing on the cake, users now have a (somewhat low-performance) fast key erasure user-space CSPRNG at their disposal. When they need new random bytes, they can just write a new message from a stream context, and the current seed will automatically be erased. With this poor man's RNG, the temptation to add a dedicated one plummets.

This design is also very close to full blown key commitment support: for single-chunk messages, we can use the key as an additional tag instead of a key for the next chunk, and have the recipient compare that tag along with the authentication tag. I'm not sure this is worth the trouble however, because of the increased message size overhead.

And on top of it, the whole thing is simpler. I saved 8 lines of code. The API didn't change much: I've removed the now redundant

```c
void crypto_stream_init(crypto_stream_ctx *ctx,
                        const uint8_t key[32], const uint8_t nonce[24]);
void crypto_stream_init_djb(crypto_stream_ctx *ctx,
                            const uint8_t key[32], const uint8_t nonce[8]);
void crypto_stream_init_ietf(crypto_stream_ctx *ctx,
                             const uint8_t key[32], const uint8_t nonce[12]);

void crypto_stream_write(crypto_stream_ctx *ctx,
                         uint8_t mac[16],
                         uint8_t *cipher_text,
                         const uint8_t *plain_text, size_t text_size);
int crypto_stream_read(crypto_stream_ctx *ctx,
                       uint8_t *plain_text,
                       const uint8_t mac[16],
                       const uint8_t *cipher_text, size_t text_size);

void crypto_stream_write_aead(crypto_stream_ctx *ctx,
                              uint8_t mac[16],
                              uint8_t *cipher_text,
                              const uint8_t *ad, size_t ad_size,
                              const uint8_t *plain_text, size_t text_size);
int crypto_stream_read_aead(crypto_stream_ctx *ctx,
                            uint8_t *plain_text,
                            const uint8_t mac[16],
                            const uint8_t *ad, size_t ad_size,
                            const uint8_t *cipher_text, size_t text_size);
```

Internally, initialisation is unchanged, but the way chunks are generated is simplified:
That simple. If I zoom in a little and detail the AEAD we get this:
The |
So essentially, you're doing away with the rekey method, because every stream chunk is an immediate re-key anyway, at the cost of an extra 32-byte copy for each message. Do you have your RPi on hand? I wonder how bad the extra copy expenses end up being on a smaller platform.

Going back to try and rewrite fk to use the new API, I notice that I'm missing libsodium's

Was this part of your threat model?
Yup.
I don't have an R-Pi, but we do have the compiler explorer. Here's what the extra
In both cases we're copying word by word, which should be quite a bit faster than going byte by byte. Overall, I don't expect the slowdown will be noticeable, except perhaps for empty messages (and even then we still process one Chacha20 block and one Poly1305 block for the authentication tag).
Oops, it was not, to be honest. This tag looked like an unnecessary complication, so I just scrapped it without much thought. That said, I think we have an ace up our sleeve: the additional data. We just need to add something to it at the last block. If the stream cuts off too soon (or too late!), the recipient will have the wrong additional data for its last block, and decryption will automatically fail.

```c
#include <monocypher.h>

#define BLOCK_SIZE (32 << 10)

uint8_t END_TAG[4] = "LAST";

void encrypt(uint8_t *out, size_t *out_size,
             const uint8_t key[32], const uint8_t nonce[24],
             const uint8_t *in, size_t in_size)
{
    *out_size = 0;
    crypto_stream_ctx ctx;
    crypto_stream_init(&ctx, key, nonce);
    uint64_t nb_blocks = (in_size + BLOCK_SIZE - 1) / BLOCK_SIZE;
    if (nb_blocks == 0) { nb_blocks = 1; } // empty input: one (empty) chunk
    for (uint64_t i = 0; i < nb_blocks - 1; i++) {
        crypto_stream_write(&ctx, out, out + 16, in, BLOCK_SIZE);
        in        += BLOCK_SIZE;
        in_size   -= BLOCK_SIZE;
        out       += BLOCK_SIZE + 16; // advance the output as well
        *out_size += BLOCK_SIZE + 16;
    }
    crypto_stream_write_aead(&ctx, out, out + 16, END_TAG, 4, in, in_size);
    *out_size += in_size + 16;
}

int decrypt(uint8_t *out, size_t *out_size,
            const uint8_t key[32], const uint8_t nonce[24],
            const uint8_t *in, size_t in_size)
{
    *out_size = 0;
    crypto_stream_ctx ctx;
    crypto_stream_init(&ctx, key, nonce);
    uint64_t nb_blocks = (in_size + BLOCK_SIZE + 15) / (BLOCK_SIZE + 16);
    if (nb_blocks == 0) { nb_blocks = 1; } // empty input: one (empty) chunk
    for (uint64_t i = 0; i < nb_blocks - 1; i++) {
        if (crypto_stream_read(&ctx, out, in, in + 16, BLOCK_SIZE)) {
            return -1;
        }
        in        += BLOCK_SIZE + 16;
        in_size   -= BLOCK_SIZE + 16;
        out       += BLOCK_SIZE; // advance the output as well
        *out_size += BLOCK_SIZE;
    }
    if (crypto_stream_read_aead(&ctx, out, in, END_TAG, 4,
                                in + 16, in_size - 16)) { // last chunk may be short
        return -1;
    }
    *out_size += in_size - 16;
    return 0;
}
```

There, no additional branch in the main loops, and I believe very little additional code overall. How would that translate to your file encryption utility?
Doesn't this handling of the end tag imply that you know and can transmit the size of the entire stream ahead of time so that you could just authenticate the entire message length as AD in the first block and just error out then? I'm more concerned about scenarios where the full size of the streamed data isn't known ahead of time, such as when JSON or encoded ASN.1 output is streamed out on the fly. Re fk in particular: I wanted to avoid having to determine the file size ahead of time and just stream the chunks in since |
OK, in this case there is no escaping the overhead associated with transmitting that tag across the wire. Libsodium, by the way, does it quite inefficiently, reserving an entire additional ChaCha20 block for the tag, then authenticating all 64 bytes (16 would have sufficed; the only reason this is not too slow is vector instructions). I'm not sure there's an ideal solution, to be honest. Right now I see three approaches:
With the interface I'm proposing, I believe all three approaches are possible. They're not trivial, but if we chose and provided one out of the box, we'd paint ourselves into a corner just like Libsodium, which has etched its overhead & extra complexity into the wire format itself. I'd like to avoid that. Some use cases will likely benefit from one approach more than another, and others won't even need the overhead. I want to allow them all if possible. That being said, I'll need to write actual code to get a better feel for this API. One core design goal after all is for it to be suitable for file formats & network protocols, so this has to be tested. And though I'm currently fairly confident about the performance, simplicity, and versatility of my design, further tests may yet send me back to the drawing board.
Yet another approach just occurred to me:
First of all, thank you for your work on Monocypher. I like how clean it is. I've read this thread with interest and wanted to chime in to tell you that I came across Monocrypt, a WIP by @skeeto, that uses Monocypher to encrypt/decrypt files using the high level

This tool doesn't need to know a priori the file's size. So I thought that it may give you some inspiration for this streaming interface. Again, thank you for your work, and sorry if my comment is off topic.
Thanks for the heads up @ewd340, I’ll take a look. |
I finally got some time to play with the streaming interface (from the stream branch), and wrote a very basic utility (heavily inspired by Monocrypt). Now, I have to admit, I'm not a cryptographer, but this exercise was fun.

```c
#define CHUNKLEN ((128 << 10) - 16) // parenthesised to stay safe in expressions
#define VERSION 0

uint8_t END_TAG[4] = "LAST";

enum error {ERR_OK, ERR_KEYFILE, ERR_READ, ERR_WRITE, ERR_TRUNC, ERR_INVALID};
```

I have defined a small header, that could have been only a nonce actually:

```c
struct aead_header {
    uint8_t nonce[24];
    uint8_t version;
};
```

The encryption function:

```c
static enum error fencrypt(FILE *in, FILE *out, const uint8_t key[32])
{
    int eof;
    int len;
    uint8_t *ad;
    size_t adlen;
    enum error err = ERR_OK;
    struct aead_header header;
    uint8_t buf_in[CHUNKLEN];
    uint8_t buf_out[CHUNKLEN + 16];
    getrandom(header.nonce, 24, GRND_NONBLOCK);
    header.version = VERSION;
    crypto_stream_ctx ctx;
    crypto_stream_init(&ctx, key, header.nonce);
    if (!fwrite(&header, sizeof(header), 1, out)) {
        return ERR_WRITE;
    }
    do {
        len = fread(buf_in, 1, CHUNKLEN, in);
        if (!len && ferror(in)) {
            err = ERR_READ;
            break;
        }
        eof   = feof(in);
        ad    = eof ? END_TAG : 0;
        adlen = eof ? 4       : 0;
        crypto_stream_write_aead(&ctx, buf_out, buf_out + 16, ad, adlen,
                                 buf_in, len);
        if (!fwrite(buf_out, 1, (size_t)len + 16, out)) {
            err = ERR_WRITE;
            break;
        }
    } while (!eof);
    return err;
}
```

And the decryption one:

```c
static enum error fdecrypt(FILE *in, FILE *out, const uint8_t key[32])
{
    int eof;
    int len;
    uint8_t *ad;
    size_t adlen;
    enum error err = ERR_OK;
    struct aead_header header;
    uint8_t buf_out[CHUNKLEN];
    uint8_t buf_in[CHUNKLEN + 16];
    crypto_stream_ctx ctx;
    /* Read the header to get some info about the encryption: nonce, ver... */
    if (!fread(&header, 1, sizeof(header), in)) {
        return ferror(in) ? ERR_READ : ERR_TRUNC;
    }
    crypto_stream_init(&ctx, key, header.nonce);
    do {
        len = fread(buf_in, 1, CHUNKLEN + 16, in);
        if (!len && ferror(in)) {
            err = ERR_READ;
            break;
        }
        eof   = feof(in);
        ad    = eof ? END_TAG : 0;
        adlen = eof ? 4       : 0;
        if (crypto_stream_read_aead(&ctx, buf_out, buf_in, ad, adlen,
                                    buf_in + 16, len - 16)) {
            err = ERR_INVALID;
            break;
        }
        if (!fwrite(buf_out, (size_t)len - 16, 1, out)) {
            err = ERR_WRITE;
            break;
        }
    } while (!eof);
    return err;
}
```

Again, this was just a small POC, please let me know if I'm missing something.
For what it's worth, my network-stream code is in my secret-handshake-cpp repo. There are two layers: CryptoBox works with chunks of encrypted data, reading/writing an entire chunk; and CryptoStream is a byte-oriented stream interface that reads and writes CryptoBoxes. This has not been heavily used yet, but it works well enough to support the CapnProto RPC protocol on a TCP socket. |
@ewd340, though it may have been intentional for readability, you may have missed one thing:

The way you use additional data to signal the end of the stream and avoid truncation attacks looks correct. It's also simple and cheap; I like it.
@snej, thanks for the reference, I'll take a look, see if what I'm proposing here could have been used to simplify your code. |
Thank you so much for the feedback, @LoupVaillant!
Would something like this do the trick?

```c
static enum error random_buf(uint8_t *buf, size_t buflen)
{
    enum error err = ERR_OK;
    if (buflen > 255) {
        err = ERR_NO_RANDOM;
    } else {
        ssize_t len = getrandom(buf, buflen, 0);
        err = (len == (ssize_t)buflen) ? ERR_OK : ERR_NO_RANDOM;
    }
    return err;
}
```
Speaking of this interface, I got a little bit confused by the parameter order of the functions:

```c
void crypto_stream_write_aead(crypto_stream_ctx *ctx,
                              uint8_t mac[16],
                              uint8_t *cipher_text,
                              const uint8_t *ad, size_t ad_size,
                              const uint8_t *plain_text, size_t text_size);
int crypto_stream_read_aead(crypto_stream_ctx *ctx,
                            uint8_t *plain_text,
                            const uint8_t mac[16],
                            const uint8_t *ad, size_t ad_size,
                            const uint8_t *cipher_text, size_t text_size);
```

Not a big deal, it is easy to read indeed, but I somewhat expected to have the same order (especially for the
@ewd340 If I may... I have some input on getrandom and its relatives. First of all: the cutoff is at 256 bytes, not 255 bytes. For everything else, read on.

getrandom(2) started as a Linux-specific syscall, but there are vaguely compatible interfaces in other systems, namely at least FreeBSD, Solaris and illumos (but not OpenBSD or macOS, more on those later). There is no standard to adhere to. The 256-byte limitation is arbitrary (though likely there to avoid spending too much time in the kernel per syscall) and you may get a larger buffer filled, but only partially. Since there is no real distinction between a syscall and a “normal” C standard library function, an operating system is free to implement a semantically identical userspace getrandom(3) instead, too.

I therefore don't think you should guard the size limit pre-emptively. Just let the call fail and go from there. I would therefore instead write approximately this (snippet hereby placed under the CC0-1.0):
(Since there is no standard, any implementation may also fail when fewer bytes are requested, but since most platforms will add it for compatibility with Linux, there is a natural incentive not to lower the maximum number of bytes returned.)

On a similar note: getrandom(2) is somewhat new, introduced in Linux 3.17 (2014); perhaps some embedded platform's vendor kernel doesn't have it. glibc added support in 2.25 (2017). If you have a new glibc but an old kernel, glibc may return

Miscellaneous portability notes born from frustration: OpenBSD and macOS have not implemented getrandom(2). They use getentropy(2). The interface is intended to seed (cryptographically secure) RNGs in userspace with kernel randomness; e.g. OpenBSD's arc4random(3) seeds itself with it. Applications are expected to directly call arc4random(3) instead. While almost all of the platforms mentioned thus far implement arc4random(3), there is one major platform that does not: Linux/glibc. POSIX will, in a future issue, standardize getentropy() only. There is no help coming by way of the standards committees of the world. A portable solution would, therefore, have to check in this order and at least partially at build time:
About Windows: Windows has a crypto API and it expects you to use it; getting random bytes involves setting up a cryptography context and then calling a function called BCryptGenRandom that uses it to generate random bytes. (The function has "BCrypt" in its name, yet has nothing to do with the bcrypt hash algorithm.) This isn't strictly necessary, however. Some people work around this limitation by calling into the deprecated RtlGenRandom() function, which doesn't require keeping state. This leads to hacks such as the one found in libsodium.
Thanks a lot @fscoto for your very instructive input, I liked every bit of it! Thanks also for the better Re: portable solutions, I appreciate the details you provided, I wish every OS would just provide |
Current time is way too predictable. Not enough entropy can be extracted from it. Even if you get down to the millisecond, an attacker that can guess when you performed the seeding down to the second will get only 10 bits of entropy to search through. And I'm being reasonable; the best attacks may be even better. If you have an OS that can give you randomness, you should use it. If the system API is too horrible, you may initialise a user space RNG when the program starts up, and after that no more errors. User space RNGs are error prone though, multithreading and all that, so be careful.

And sometimes, you don't even get randomness. Perhaps not even jitter entropy, if you're using a single-process in-order embedded CPU. In that case, you need to come up with 32 random bytes from another source (like your desktop), record that seed in your chip, and seed your RNG with that. And of course, make sure you renew the seed every time you reboot, so you don't end up reusing the same "random" stream over and over.
Thanks again @LoupVaillant, and sorry if I'm diverting the discussion from its main topic with my randomness issues. I enjoy the very informative input I'm getting from you guys. The "Goodness, no!" made me chuckle... I saw it coming :) Still, I thought that, since the nonce need not be unpredictable (only unique per key, if I have understood correctly), I can get away with something basic. Anyways... even for a nonce, it won't hurt to have good entropy... so my laziness is not justified, I admit.

Back to the interface: again, not a big deal, but I humbly think that the

```c
void crypto_stream_write_aead(crypto_stream_ctx *ctx,
                              uint8_t mac[16],
                              uint8_t *cipher_text,
                              const uint8_t *ad, size_t ad_size,
                              const uint8_t *plain_text, size_t text_size);
int crypto_stream_read_aead(crypto_stream_ctx *ctx,
                            const uint8_t mac[16],
                            uint8_t *plain_text,
                            const uint8_t *ad, size_t ad_size,
                            const uint8_t *cipher_text, size_t text_size);
```

Or is there a reasoning behind the proposed order that I don't get?
Oh, sorry, I misunderstood. I shouldn't have answered when I was this tired. Yes, a nonce doesn't have to be unpredictable, only unique. If you have a reliable clock (one that never displays the same time twice), you can use it. There's one big caveat though: make sure you don't run two encryptions at the same time. Alternatively, you can maintain a counter, but the same reuse misuse that applies to userspace RNGs will apply to that counter.
There is: in all of Monocypher, the arguments go in the following order:
And yes, sometimes this particular consistency sacrifices other kinds of consistency. Your proposal makes sense, it's just not the logic I chose to follow. Also, note the order of arguments for the current AEAD API:

```c
void crypto_lock_aead(uint8_t mac[16],
                      uint8_t *cipher_text,
                      const uint8_t key[32],
                      const uint8_t nonce[24],
                      const uint8_t *ad, size_t ad_size,
                      const uint8_t *plain_text, size_t text_size);
int crypto_unlock_aead(uint8_t *plain_text,
                       const uint8_t key[32],
                       const uint8_t nonce[24],
                       const uint8_t mac[16],
                       const uint8_t *ad, size_t ad_size,
                       const uint8_t *cipher_text, size_t text_size);
```

Same logic, similar order. (We have a key and nonce instead of the context, so they don't come first, but the rest of the arguments are in the same order.)

(Edit: Alternatively, I could swap
This is it! I kind of felt it, but failed to see it. Thanks again. |
Tests not complete, no documentation yet. I'm currently rethinking the AEAD API as a whole, and to be honest I'm so happy with this streaming API that I believe it could replace the regular API entirely.

One problem with the AEAD API is the sheer number of arguments. `crypto_lock_aead()` and `crypto_unlock_aead()` currently have 8 arguments, comprising 6 pointers (all of the same type) and 2 sizes. There are way too many opportunities to swap arguments and break stuff. The streaming API however is divided into an init phase, which has only 3 arguments, and a read/write phase, which has 7, but "only" 4 pointers to byte buffers. Which I don't think we can improve much. We could try and use a second struct, similar to what we do with Argon2, but with only 7 arguments (compared to Argon2's 15) I don't think we would gain that much readability.

As for how to use the streaming API for single shot uses, that's obvious enough:

- Declare the context and call init.
- Call read/write.
- Wipe the context.

One may argue that everything else (Poly1305, Blake2b, SHA-512, and HMAC) provides a one-shot API, and we should do so here as well. There's just one problem: we don't have one init function, we have _three_. If we provide a one-shot API, orthogonality would need all 3 variants. That's 6 functions total (3 locks, 3 unlocks), which is a bit much, especially since at least one of them is only provided for compatibility with a standard I don't entirely agree with. We could of course provide only the single one-shot API (like we do right now), but that leaves such an obvious hole in the API. Stopping at just the 5 functions we need for everything (streaming, one-shot, all 3 variants) is very tempting. See #218
The streaming interface has been added, and the design gelled into what I believe to be a satisfactory state. There is one thing I must explain that's not in the commit message: the choice to not imitate libsodium. There are two reasons:
Oh, and about this key commitment thing and partitioning oracle attacks: they should be addressed by committing the key at the beginning of the stream, typically by outputting a hash derived from the same seed as the encryption key. With this, I believe we are done.
Now that Monocypher is just under 1900 lines of code, we have room for more functionality. One that is very tempting to add is streaming AEAD, similar to Libsodium's.

One reason for the temptation is the ability to share code with `crypto_lock()` and `crypto_unlock()`. If the streaming API is provided separately, then a bunch of code, most notably authentication, must be duplicated. Inside Monocypher however, we can define the current AEAD interface in terms of the streaming API. I have written a prototype that currently costs 49 lines of code (and 6 new functions, and one additional struct).

On the other hand, we don't want to include something of limited value. We may want other functionality in the future, and it would be a pity if we had to break the 2KLoC psychological barrier to get it. To be included, this streaming interface should satisfy the following criteria:

If possible, it should also satisfy those bonus criteria: `crypto_lock()` (mostly for the sake of elegance).

One major question remains about the "no major footguns" criterion: key commitment. As they stand, polynomial hash based ciphers are vulnerable to attacks where the same message could be successfully decrypted under several possible keys. And most importantly, an attacker could find those keys and exploit that for nefarious purposes. There's also a related question about message commitment, where it's possible to find several messages that produce the same authentication tag (that requires knowledge of the key).

We could solve those problems by using a random key robust AEAD, such as Chacha20/Blake2b, or by adding a bunch of encrypted zeroes to the message (well, to each chunk, actually). This is either slower or unwieldy, not to mention the increased size overhead.

I believe there is another way: make sure the protocol leaves no room for tricking the recipient into using the wrong decryption key. Commit the key once, then use ordinary AEAD. Also, we need to determine how much of an issue the lack of message commitment may be.

If this works, then a ChaPoly based streaming interface is probably worth adding. If it does not work, then even `crypto_lock()` was probably not worth adding, and we'd better not compound the problem with yet another broken construction.

Food for thought, to be considered when we can make the time.