-
Notifications
You must be signed in to change notification settings - Fork 258
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
There should be a standard way to recusively hash structs #29
Comments
I'd consider this unwise based on previous discussions. Right now, Conversely, all these cryptographic hash functions would destroy the performance of That said, one could provide a version using roughly the endian-hasher crate, unless its horribly broken.
As previously, we'd keep There is still an issue here in that we trust I could hack up a pull request for this if @newpavlov thinks it wise. I think the first question is : Should the |
I'm tired right now, but I think argument against my alternative goes roughly : You should document your protocols and file formats better by using a more explicit serialization. That could just be a warning somewhere. |
Sorry to be this blunt, but this is a terrible idea.
What? The trait definition stops this, because types. Nor is there any standard notion of how to map a cryptographic hash function's digest onto a 64-bit integer. But beyond that, cryptographic hash functions utilized by The fundamental tradeoff is:
To me, if your goal is #2, anything slower than SipHash is unacceptable, because you still want a Do you disagree? Can you explain the use case where you want something that's simultaneously slow and insecure? |
@burdges the endianess issues are indeed a problem. If there's no way to fix those it's unfortunate. I'm not sure if I understand why this happens. Are primitive types hashing by feeding the hasher one u8 at a time? Because the I'll have a look if your suggestion works for my use case.
Nothing stops me from doing a You're assuming I want to use the normal Hasher API with SHA256 to use in a Hashmap or something like that. That would be a bad idea indeed, SipHash is strictly better in that application. What I need is a longer hash that I can use to uniquely identify a certain I generate hashes that represent the state of the image pipeline up to that point as a function of it's settings. I'm currently using MetroHash but I want at least 128bit and a crypto hash for this. I'm only using the |
It worse so long as people use You can always locally employ a version of In fact, I doubt I think those are the arguments against |
@burdges It seems the standard implementation does just that: So it seems |
Yikes, I completely missed that! I'm tempted to raise that as a libs issue. In any case, I suppose serde should be considered the right way to do this for now. |
@burdges That's very interesting how would you use serde for this? All my structs already implement |
@pedrocr perhaps something like this is what you want: https://github.com/cryptosphere/objecthash-rs Note that I never wrote a procedural macro for it. I also intend on doing something slightly different and a bit more fleshed out soon. |
@tarcieri yes, this is precisely my use case, thanks. Wouldn't it be easier to just write a serde |
@pedrocr it may be possible to finagle it through serde visitors. I would advise against using it for now though. I'll have a better library out soon. |
@tarcieri it seems objecthash solves a bigger problem of having redactable structs. For me that's not an issue so maybe just writing a |
Yes, even using the As an aside, I'm pretty happy doing all my cryptographic hashing manually so far because frequently if I need a cryptographic hashes then it actually needs data from multiple sources or maybe should not cover all the data present in the |
@pedrocr as is, the Rust implementation doesn't support redaction, but yes, redaction/Merkle inclusion proofs are one of the nice features you get out of a hashing scheme like that. |
@burdges cool, I'll have a look at this then. Would a PR with something like an adapter between As for doing it manually I was hoping to do it automatically as these structs are specifically designed to be all the settings for the image operation and nothing more. We'll see :) |
I found an easy way : You It'll look vaguely like what I wrote above, but using Also, your original question could now be : Why not implement |
@burdges sounds good, avoids having to write a full |
@burdges also is bincode guaranteed to always send the same bytes? Or can it do things like reorder fields between versions? It may be safer to just implement the |
@pedrocr you seem to be running afoul of the canonicalization problem. One of the nice things about objecthash-like schemes is you don't need to canonicalize prior to hashing. Structurally similar objects will always have the same hash. |
@tarcieri see this for my conclusions on the topic: https://internals.rust-lang.org/t/f32-f64-should-implement-hash/5436/33 My worry in this case is only that the same exact struct will hash differently because bincode changes. Making sure that reordering fields doesn't change the hash for example isn't as important. Since my structs already implement Having a crate that does derive and allows me to use any hash would be nice though so please do ping me once you have something to test. The other features can't hurt either. |
Yes, serde explicitly supports non-self-describing formats, naming bincode as the example. There are various instructions for Also bincode never emits any thing about the fields besides lengths and contents. I'd imagine bincode would use a feature or a major version update if they broke binary comparability, but who knows. |
@burdges so hashing bincode does seem fairly reasonable. Will have a look at that then. |
@pedrocr |
There is a lingering question as to whether the |
@burdges I think the current |
It would also be nice if a |
Should this issue be closed? I think it's semantically incorrect to implement |
@tarcieri the core of the issue is still unresolved. I've changed the title to reflect that. Right now the best way to use crypto hashes on structs is |
@pedrocr fn digest_serialized<S: Serialize, F: Serializer>(s: S, f: F) -> Result<Output<Self::OutputSize>, F::Error>; UPD: edited code a bit, removed bincode method |
Then, by definition, you need an algorithm that knows how to compute cryptographic hashes of structured data. Better make sure it isn't vulnerable to second-preimage attacks. |
@tarcieri as far as I can tell that only applies if you are hashing individual structs and then hashing the concatenated hashes. I don't want that at all. I just want to initialize the hash, pass in all the values recursively, and finalize the hash. This has the same properties as serializing and then hashing without the performance penalty. |
You need to domain separate the values somehow, either by length or by structure, and in either case you'll want to encode the types. The former pretty much involves a bincode-like scheme, and the latter an objecthash one. Otherwise, you'll have ambiguities where different structures compute the same hash. |
None of these are an issue for me.
That's how |
Ok, so it sounds like you want to abandon security... in which case why are you using a cryptographic hash? |
@tarcieri none of these abandon security but even if they did I'm using hashes for caching. All I need is a statistical guarantee of non-collision absent an attacker. |
If you're using hashes like this in any sort of security context and have ambiguities in a structured hashing format, an attacker can exploit them to trick you into accepting documents containing e.g. nested content and have it verify as if it was the same as a non-nested message. Second preimage attacks on naive Merkle Trees are an example of this, and it's similar in concept to other attacks regarding confusion of message structure such as packet injection attacks. If you want to build a naive hashing scheme using cryptographically secure hashes for your own purposes, that's fine, but I would be strongly opposed to the inclusion of naive schemes in cryptography libraries, because people generally expect cryptography libraries to be built for security purposes. |
@tarcieri I suggest you bring this up with the rust lib developers then. All I want is to get a hash calculated the exact same way |
You're talking about creating a
These same attacks can happen without hashes-of-hashes. As I pointed out the general attack is one of failure to disambiguate message structures. Packet injection is the same basic idea, without any hashing involved whatsoever. That's a parsing ambiguity, but the same concept applies: the security system is confused about the message structure, allowing an attacker to trick it into doing something it shouldn't. In my opinion if any messages with different structures can realistically produce the same content hash, the scheme is broken from a security perspective. |
From what I've seen the current implementation doesn't actually randomize the key. But even if it did that's not relevant if it's trivial to generate a collision by just manipulating the structure.
I'd agree I just haven't found a case where that can happen with a setup like that of |
According to the documentation, the implementation used by https://doc.rust-lang.org/std/collections/struct.HashMap.html
|
At least on my machine the hash is not seeded. This always returns the same value: https://gist.github.com/pedrocr/8d0283bed3f56e5cb6fb9fe0a785f947 But that's not relevant to the point as I specifically mentioned. If you can generate structs that have the same hashing because you manipulate the structure it doesn't matter that your hash function is randomly keyed. Within the same execution of the program those structs will colide, just with different values. And that's what would be the bug in |
Use a And to reiterate yet again, in the context where You're colluding threat models here: In the one I would find acceptable for I'm not sure this is continuing to be a productive conversation: I would like the threat of ambiguous messages to be solved strategically for anything named I would find a hand-rolled naive and ambiguous scheme unacceptable. You haven't made a concrete proposal, but you're downplaying everything you need to do to solve these problems from a security perspective, and that's why I'm worried. |
And that's the attack that if it works with
My proposal is very simple. Do exactly what |
Incorrect. Again, hashDoS is an algorithmic attack that relies on the attacker being able to find large numbers of collisions. Finding a single collision is not catastrophic, because it doesn't help the attacker that much. Furthermore, the interesting cases for hashDoS are With a These are totally different threat models you are trying to collude.
The onus is on you to show why your scheme is secure. Otherwise you're shifting the burden of proof. Your (ill-defined) scheme is not secure simply because I haven't found an attack. You need to be able to answer questions like: how is the scheme domain separated everywhere type-by-type? How does it distinguish a A secure scheme will be demonstrably unambiguous and free of collisions for arbitrarily structured messages. Anything less is insecure. This is a much higher bar than |
If you care about collisions across types then yes, it's easy to generate collisions that are benign in |
After looking at
This makes it easier to have values for which |
I would argue, if it's a bug, cause the byte representation of the data is the same. Hash functions work on bytes, not data structures. |
@Trojan295 I'm not sure if you're arguing that it's a bug or not. |
Are you trying to use those structured hashing functions only for internal Rust purposes like Let's take this Generally, I don't think such feature is required in case cryptographic hashes, as in case of crypto you mostly operate on numbers/bytes and not custom data structures. The way of hashing needs to be well known and unified. |
@Trojan295 short of But as I've previously stated, and if it's the thing I have a bug up my butt about, it's conflating security domains and concerns, and that's really what I want to avoid. I am a huge fan of something like a |
I will close this issue and will create a separate issue in the |
I think a utility crate could provide a wrapper type |
@burdges |
std::hash::Hasher
can be derived for structs and is a standard hashing interface in rust. The standard interface only allows 64bit outputs but there's nothing stopping extra outpust tailored to specific hashes. So for ergonomic purposes wouldn't it make sense to have an adapter to allow using theHasher
API?The text was updated successfully, but these errors were encountered: