
There should be a standard way to recusively hash structs #29

Closed
pedrocr opened this issue Jun 29, 2017 · 60 comments

Comments

@pedrocr

pedrocr commented Jun 29, 2017

std::hash::Hash can be derived for structs, and std::hash::Hasher is a standard hashing interface in Rust. The standard interface only allows 64-bit outputs, but there's nothing stopping extra outputs tailored to specific hashes. So, for ergonomic purposes, wouldn't it make sense to have an adapter that allows using the Hasher API?
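Concretely, something like this toy sketch is what I have in mind (names are made up and a Vec stands in for real SHA-256 state, just to show the shape of the API):

use std::hash::{Hash, Hasher};

// Toy adapter: collects the bytes fed in by derived Hash impls. A real
// version would update a SHA-256 state instead of a Vec.
#[derive(Default)]
struct WideHasher {
    bytes: Vec<u8>,
}

impl Hasher for WideHasher {
    fn write(&mut self, bytes: &[u8]) {
        self.bytes.extend_from_slice(bytes);
    }
    fn finish(&self) -> u64 {
        panic!("use finish_wide(); the 64-bit finish is not meaningful here");
    }
}

impl WideHasher {
    // The "extra output tailored to a specific hash" mentioned above.
    fn finish_wide(&self) -> &[u8] {
        &self.bytes
    }
}

#[derive(Hash)]
struct PipelineSettings {
    width: u32,
    exposure: i32,
}

fn main() {
    let mut h = WideHasher::default();
    PipelineSettings { width: 4000, exposure: -2 }.hash(&mut h);
    println!("{:02x?}", h.finish_wide());
}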

@burdges

burdges commented Jun 29, 2017

I'd consider this unwise based on previous discussions.

Right now, std::hash::Hasher does not address endianness, making it non-portable. It's okay for HashMap because you can only serialize HashMap as a series of tuples, not as the raw table. It's already not okay for more lossy probabilistic data structures, like a Cuckoo table or Bloom filter, that you likely serialize as the table itself. It's flat wrong for cryptographic hash functions that one uses almost exclusively in protocols or files.

Conversely, all these cryptographic hash functions would destroy the performance of HashMap if used there, so avoid doing that too. SipHasher is already designed to be a sweet spot between being fast and providing the sort of cryptographic assurances HashMap needs when under DoS attacks.

That said, one could provide a version using roughly the endian-hasher crate, unless it's horribly broken.

use std::hash::{Hash, Hasher};
use endian_hasher::*;

struct DigestHasher<'a, T: 'a + ?Sized>(&'a mut T);

impl<'a, T: ?Sized + Input> Hasher for DigestHasher<'a, T> {
    fn finish(&self) -> u64 {  panic!();  }
    fn write(&mut self, bytes: &[u8]) {  self.0.process(bytes);  }
}

pub trait Input {
    /// Digest input data. This method can be called repeatedly
    /// for use with streaming messages.
    fn process(&mut self, input: &[u8]);

    /// Provide data from anything that implements `Hash`, attempting to convert embedded primitive numeric types to little endian.
    fn input_le<H: Hash>(&mut self, hashable: &H) {
        let mut s = HasherToLE(DigestHasher(self));
        hashable.hash(&mut s);
    }

    /// Provide data from anything that implements `Hash`, attempting to convert embedded primitive numeric types to big endian.
    fn input_be<H: Hash>(&mut self, hashable: &H) {
        let mut s = HasherToBE(DigestHasher(self));
        hashable.hash(&mut s);
    }

    /// Provide data from anything that implements `Hash`, leaving primitive numeric types in native endianness.
    ///
    /// Warning: Non-portable, do not use without first correcting the endianness of the input.
    fn input_native<H: Hash>(&mut self, hashable: &H) {
        let mut s = DigestHasher(self);
        hashable.hash(&mut s);
    }
}

As discussed previously, we'd keep DigestHasher private to the digest crate so that it cannot be used with HashMap, and we'd make the finish method panic for good measure.
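For illustration, here's a self-contained toy along the same lines (native-endian path only, so it does not need the endian-hasher crate; ToyDigest just collects bytes where a real digest would update its state):

use std::hash::{Hash, Hasher};

trait Input {
    fn process(&mut self, input: &[u8]);

    fn input_native<H: Hash>(&mut self, hashable: &H) {
        let mut s = DigestHasher(self);
        hashable.hash(&mut s);
    }
}

struct DigestHasher<'a, T: 'a + ?Sized + Input>(&'a mut T);

impl<'a, T: 'a + ?Sized + Input> Hasher for DigestHasher<'a, T> {
    fn finish(&self) -> u64 { panic!("not meant to produce a 64-bit hash") }
    fn write(&mut self, bytes: &[u8]) { self.0.process(bytes) }
}

struct ToyDigest(Vec<u8>);

impl Input for ToyDigest {
    fn process(&mut self, input: &[u8]) { self.0.extend_from_slice(input) }
}

#[derive(Hash)]
struct Settings { width: u32, gamma: bool }

fn main() {
    let mut d = ToyDigest(Vec::new());
    d.input_native(&Settings { width: 4000, gamma: true });
    assert_eq!(d.0.len(), 4 + 1); // 4 bytes of u32 plus 1 byte of bool
}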

There is still an issue here in that we trust H: Hash to handle primitive numeric types by calling Hash instances recursively, which it might fail to do. We could lay the blame for such portability bugs with whoever wrote H: Hash perhaps, but they sound tricky to find.

I could hack up a pull request for this if @newpavlov thinks it wise. I think the first question is : Should the endian-hasher crate live or die?

@burdges

burdges commented Jun 29, 2017

I'm tired right now, but I think the argument against my alternative goes roughly: you should document your protocols and file formats better by using a more explicit serialization. That could just be a warning somewhere.

@tarcieri
Member

tarcieri commented Jun 29, 2017

Sorry to be this blunt, but this is a terrible idea.

The standard interface only allows 64-bit outputs but there's nothing stopping extra outputs tailored to specific hashes

What? The trait definition stops this, because types. Nor is there any standard notion of how to map a cryptographic hash function's digest onto a 64-bit integer.

But beyond that, cryptographic hash functions utilized by std::hash::Hasher should be PRFs (i.e. keyed by a secret unguessable by an attacker), because hashDoS. There is no point in using a hash function which is both slow (because crypto) and not a PRF (because security).

The fundamental tradeoff is:

  1. A hash function that is as fast as possible but doesn't hash attacker-controlled data. This rules out every cryptographic hash function, because they are slow (they have to act like random oracles).
  2. A cryptographic PRF keyed by unguessable data to prevent hashDoS.

To me, if your goal is #2, anything slower than SipHash is unacceptable, because you still want a std::hash::Hasher to be as fast as possible given other constraints.

Do you disagree? Can you explain the use case where you want something that's simultaneously slow and insecure?

@pedrocr
Author

pedrocr commented Jun 29, 2017

@burdges the endianness issues are indeed a problem. If there's no way to fix those it's unfortunate. I'm not sure I understand why this happens. Are primitive types hashed by feeding the hasher one u8 at a time? Because the Hasher interface includes everything from u8 to u128, so it would seem that as long as Hash implementations use the full type it should be fine. Why doesn't that work?

I'll have a look if your suggestion works for my use case.

@tarcieri What? The trait definition stops this, because types. Nor is there any standard notion of how to map a cryptographic hash function's digest onto a 64-bit integer.

Nothing stops me from doing a .finish_256bit() and having that give me the SHA-256 of my struct. It's not part of the trait, but I don't care; I just want to be able to use the Hash API as the plumbing to get all the values to the Hasher.

You're assuming I want to use the normal Hasher API with SHA-256 in a HashMap or something like that. That would indeed be a bad idea; SipHash is strictly better in that application. What I need is a longer hash that uniquely identifies a given struct over time, for caching and serialization. Think of how git uses hashes (content addresses), not how hash tables use hashes. Here's my current code:

https://github.com/pedrocr/rawloader/blob/c0a017590cc94d21416cbe4f4fe1fce84f4daaf6/src/imageops/mod.rs#L206-L225

I generate hashes that represent the state of the image pipeline up to that point as a function of its settings. I'm currently using MetroHash, but I want at least 128 bits and a crypto hash for this. I'm only using the Hash trait for the convenience of #[derive(Hash)] and friends. But maybe I just need a #[derive(CryptoHash)].

@burdges

burdges commented Jun 29, 2017

It works so long as people use Hasher recursively, but if any impl of Hash casts a &[u64] to a &[u8] then it'll break. And someone might wind up doing this for performance if they mistakenly measure performance in unoptimized builds.

You can always locally employ a version of DigestHasher wrapper that makes a cryptographic hash function look like a Hasher, either with or without my endianness fix. I think the question is if my input_le and input_be functions make sense.

In fact, I doubt #[derive(Hash)] has stabilized field order, which provides a much better argument against adding my input_le and input_be functions to the Input trait. You could warn people that they should implement Hash themselves, not derive it.

I think those are the arguments against input_le and input_be : Bad impls for Hash break them. #[derive(Hash)] might not have stabilized field order. And you can document your protocol better by using a more explicit serialization. Iffy but not necessarily fatal. I donno.

@pedrocr
Author

pedrocr commented Jun 29, 2017

@burdges It seems the standard implementation does just that:

https://github.com/rust-lang/rust/blob/c6afde6c46d167c9c75389b887f1d2498aeef3e4/src/libcore/hash/mod.rs#L506-L510
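Paraphrasing the linked lines from memory (not the exact libcore source), the integer Hash impls specialize hash_slice to reinterpret the whole slice as native-endian bytes and feed it to write() in one call:

use std::hash::Hasher;
use std::{mem, slice};

// Roughly what the standard impl does for &[u64]: the &[u64] -> &[u8] cast
// mentioned above, with no endianness conversion at all.
fn hash_slice_of_u64<H: Hasher>(data: &[u64], state: &mut H) {
    let byte_len = data.len() * mem::size_of::<u64>();
    let ptr = data.as_ptr() as *const u8;
    state.write(unsafe { slice::from_raw_parts(ptr, byte_len) });
}

fn main() {
    let mut h = std::collections::hash_map::DefaultHasher::new();
    hash_slice_of_u64(&[1, 2, 3], &mut h);
    println!("{:x}", h.finish());
}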

So it seems Hash is just too fundamentally broken for the serialization use case. Is there any derivable trait somewhere that can help with that? Something that just recursively hashes the underlying bytes of all types with a fixed endianness would be enough.

@burdges

burdges commented Jun 29, 2017

Yikes, I completely missed that! I'm tempted to raise that as a libs issue. In any case, I suppose serde should be considered the right way to do this for now.

@pedrocr
Author

pedrocr commented Jun 29, 2017

@burdges That's very interesting; how would you use serde for this? All my structs already implement Serialize and Deserialize. Is there a standard way to just feed that to SHA-256 without first going to YAML or something like that?

@tarcieri
Member

tarcieri commented Jun 29, 2017

@pedrocr perhaps something like this is what you want:

https://github.com/cryptosphere/objecthash-rs

Note that I never wrote a procedural macro for it. I also intend on doing something slightly different and a bit more fleshed out soon.

@pedrocr
Author

pedrocr commented Jun 29, 2017

@tarcieri yes, this is precisely my use case, thanks. Wouldn't it be easier to just write a serde Serializer that feeds a Digest? That way #[derive(Serialize)] is all that's needed and any crypto hash can be used?

@tarcieri
Member

@pedrocr it may be possible to finagle it through serde visitors.

I would advise against using it for now though. I'll have a better library out soon.

@pedrocr
Author

pedrocr commented Jun 29, 2017

@tarcieri it seems objecthash solves a bigger problem of having redactable structs. For me that's not an issue so maybe just writing a Serializer is simpler.

@burdges

burdges commented Jun 29, 2017

Yes, even using the bincode crate still involves an allocation, but one could write a Serializer for digests to avoid allocation, maybe even one that agrees with the results of bincode.

As an aside, I'm pretty happy doing all my cryptographic hashing manually so far, because frequently, if I need a cryptographic hash, it actually needs data from multiple sources, or maybe should not cover all the data present in the struct provided. YMMV.

@tarcieri
Member

@pedrocr as is, the Rust implementation doesn't support redaction, but yes, redaction/Merkle inclusion proofs are one of the nice features you get out of a hashing scheme like that.

@pedrocr
Author

pedrocr commented Jun 29, 2017

@burdges cool, I'll have a look at this then. Would a PR with something like an adapter between Serializer and Digest make sense?

As for doing it manually I was hoping to do it automatically as these structs are specifically designed to be all the settings for the image operation and nothing more. We'll see :)

@burdges

burdges commented Jun 29, 2017

I found an easy way: you impl io::Write for a local wrapper struct Inputer<'a, I: 'a + ?Sized>(&'a mut I) where I: Input; and provide a method with a Serialize argument that calls bincode::serialize_into with the Inputer wrapper.

It'll look vaguely like what I wrote above, but using io::Write instead of Hasher and serde::Serialize instead of Hash. Now serialize_into should avoid the Vec allocation in bincode::serialize.

Also, your original question could now be: why not implement io::Write? Or could Input even be replaced with io::Write?
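A rough, untested sketch of that wrapper (the exact bincode::serialize_into arguments depend on the bincode version, so the call is only shown in a comment):

use std::io;

// Stand-in for the digest crate's Input trait.
pub trait Input {
    fn process(&mut self, input: &[u8]);
}

// Lets a serializer write straight into a digest, no intermediate Vec.
struct Inputer<'a, I: 'a + ?Sized + Input>(&'a mut I);

impl<'a, I: 'a + ?Sized + Input> io::Write for Inputer<'a, I> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.0.process(buf); // absorb the bytes directly
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(()) // a digest has nothing to flush
    }
}

// One would then call something like
//     bincode::serialize_into(&mut Inputer(&mut digest), &value, ...)
// to hash any Serialize type without allocating.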

@pedrocr
Author

pedrocr commented Jun 29, 2017

@burdges sounds good, avoids having to write a full Serializer. It does add more indirection though.

@pedrocr
Author

pedrocr commented Jun 29, 2017

@burdges also, is bincode guaranteed to always emit the same bytes? Or can it do things like reorder fields between versions? It may be safer to just implement the Serializer to guarantee no funny business happens that invalidates the hash.

@tarcieri
Member

@pedrocr you seem to be running afoul of the canonicalization problem. One of the nice things about objecthash-like schemes is you don't need to canonicalize prior to hashing. Structurally similar objects will always have the same hash.

@pedrocr
Author

pedrocr commented Jun 29, 2017

@tarcieri see this for my conclusions on the topic:

https://internals.rust-lang.org/t/f32-f64-should-implement-hash/5436/33

My worry in this case is only that the exact same struct will hash differently because bincode changes. Making sure that reordering fields doesn't change the hash, for example, isn't as important. Since my structs already implement Serialize, using that would mean one less derive to worry about.

Having a crate that does derive and allows me to use any hash would be nice though so please do ping me once you have something to test. The other features can't hurt either.

@burdges

burdges commented Jun 29, 2017

Yes, serde explicitly supports non-self-describing formats, naming bincode as the example.

There are various instructions for impls of Serialize and Deserialize to support non-self-describing formats, so impls that don't follow them can break bincode, of course.

Also, bincode never emits anything about the fields besides lengths and contents. I'd imagine bincode would use a feature or a major version update if they broke binary compatibility, but who knows.

@pedrocr
Author

pedrocr commented Jun 29, 2017

@burdges so hashing bincode output does seem fairly reasonable. I'll have a look at that then.

@newpavlov
Member

newpavlov commented Jul 3, 2017

@pedrocr
Sorry I couldn't join the discussion earlier. As I understand it, your problem can be solved by a separate crate (bincode plus some glue code, or @tarcieri's future crate) and you do not currently require any changes in digest? In that case I think this issue can be closed.

@burdges

burdges commented Jul 3, 2017

There is a lingering question as to whether the Input trait should be replaced by std::io::Write or something like that. I haven't thought about it really myself.

@newpavlov
Member

newpavlov commented Jul 3, 2017

@burdges
There are two problems with using std::io::Write. First, there is no core::io, so it conflicts with the goal of supporting no_std whenever possible. Second, its methods return Result while hashes can process any data without an error, so using Write will result in a lot of useless unwraps.
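To illustrate the second point (a Vec here stands in for a digest hidden behind io::Write):

use std::io::Write;

fn feed<W: Write>(w: &mut W) {
    // Every write returns a Result even though a digest can never fail,
    // so callers end up sprinkling unwrap() or ? everywhere.
    w.write_all(b"header").unwrap();
    w.write_all(b"payload").unwrap();
}

fn main() {
    let mut sink = Vec::new();
    feed(&mut sink);
    assert_eq!(sink, b"headerpayload".to_vec());
}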

I think the current digest_reader is more than enough for convenient interfacing with the Read implementors.

@pedrocr
Author

pedrocr commented Jul 3, 2017

It would also be nice if a Serializer were written so that one could hash straight from serde without going through bincode. But that would just be a convenience that should probably be behind a feature flag.

@tarcieri
Member

tarcieri commented Jul 4, 2017

Should this issue be closed? I think it's semantically incorrect to implement Hasher for this purpose, and there are plenty of other APIs more appropriate for hashing structured data in a cryptographically meaningful way.

@pedrocr pedrocr changed the title from "Why not implement Hasher?" to "There should be a standard way to recusively hash structs" on Jul 4, 2017
@pedrocr
Author

pedrocr commented Jul 4, 2017

@tarcieri the core of the issue is still unresolved. I've changed the title to reflect that. Right now the best way to use crypto hashes on structs is serde -> bincode -> Digest, which is far from ideal.

@newpavlov
Member

newpavlov commented Jul 4, 2017

@pedrocr
Am I correct that you want to add the following method, or something similar, to the Digest trait?

fn digest_serialized<S: Serialize, F: Serializer>(s: S, f: F) -> Result<Output<Self::OutputSize>, F::Error>;

UPD: edited code a bit, removed bincode method

@tarcieri
Member

tarcieri commented Jul 4, 2017

all I want is a recursive crypto hash of arbitrary rust structs

I don't want to serialize and then hash

Then, by definition, you need an algorithm that knows how to compute cryptographic hashes of structured data.

Better make sure it isn't vulnerable to second-preimage attacks.

@pedrocr
Author

pedrocr commented Jul 4, 2017

@tarcieri as far as I can tell that only applies if you are hashing individual structs and then hashing the concatenated hashes. I don't want that at all. I just want to initialize the hash, pass in all the values recursively, and finalize the hash. This has the same properties as serializing and then hashing without the performance penalty.

@tarcieri
Member

tarcieri commented Jul 4, 2017

You need to domain separate the values somehow, either by length or by structure, and in either case you'll want to encode the types. The former pretty much involves a bincode-like scheme, and the latter an objecthash one.

Otherwise, you'll have ambiguities where different structures compute the same hash.
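A toy illustration of the length-prefix option (a Vec stands in for the digest; this is my own sketch, not code from any of the crates discussed):

// Without the length prefix, ("foo", "bar") and ("foobar", "") would feed
// the digest exactly the same byte stream.
fn absorb_field(digest: &mut Vec<u8>, field: &[u8]) {
    digest.extend_from_slice(&(field.len() as u64).to_le_bytes()); // domain separator
    digest.extend_from_slice(field);
}

fn main() {
    let (mut a, mut b) = (Vec::new(), Vec::new());
    absorb_field(&mut a, b"foo");
    absorb_field(&mut a, b"bar");
    absorb_field(&mut b, b"foobar");
    absorb_field(&mut b, b"");
    assert_ne!(a, b); // the prefixes keep the two encodings distinct
}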

@pedrocr
Author

pedrocr commented Jul 4, 2017

You need to domain separate the values somehow, either by length or by structure, and in either case you'll want to encode the types.

None of these are an issue for me.

Otherwise, you'll have ambiguities where different structures compute the same hash.

That's how Hash already works and is fine for me.

@tarcieri
Member

tarcieri commented Jul 4, 2017

Ok, so it sounds like you want to abandon security... in which case why are you using a cryptographic hash?

@pedrocr
Author

pedrocr commented Jul 4, 2017

@tarcieri none of these abandon security, but even if they did, I'm using hashes for caching. All I need is a statistical guarantee of non-collision absent an attacker.

@tarcieri
Member

tarcieri commented Jul 4, 2017

If you're using hashes like this in any sort of security context and have ambiguities in a structured hashing format, an attacker can exploit them to trick you into accepting documents containing e.g. nested content and have it verify as if it was the same as a non-nested message. Second preimage attacks on naive Merkle Trees are an example of this, and it's similar in concept to other attacks regarding confusion of message structure such as packet injection attacks.

If you want to build a naive hashing scheme using cryptographically secure hashes for your own purposes, that's fine, but I would be strongly opposed to the inclusion of naive schemes in cryptography libraries, because people generally expect cryptography libraries to be built for security purposes.

@pedrocr
Author

pedrocr commented Jul 4, 2017

@tarcieri I suggest you bring this up with the Rust lib developers then. All I want is to get a hash calculated the exact same way Hash already works. I only want a crypto hash because 64 bits are not enough collision protection for the application. If there's a way to create collisions easily with that kind of interface, Rust in general is vulnerable to them, and that can be exploited to DoS systems at the very least. You haven't given an example of how that would happen though. All the cases you've pointed out are hashes-of-hashes situations.

@tarcieri
Member

tarcieri commented Jul 4, 2017

I suggest you bring this up with the rust lib developers then.

Hash isn't intended to be a cryptographic hash for identifying content. It's for use in hash-like data structures, where collisions are easily tolerated by the underlying algorithms. The main threat to those is hashDoS, where an attacker can deliberately exploit repeated collisions to the same value. The Rust core developers addressed those by using SipHash, a PRF with a random key.

You're talking about creating a CryptoHash trait, which to me would ostensibly be for the purposes of secure content authentication. If you don't want it to be about security, I would suggest taking "crypto" out of the name.

You haven't given an example of how that would happen though.

These same attacks can happen without hashes-of-hashes. As I pointed out the general attack is one of failure to disambiguate message structures. Packet injection is the same basic idea, without any hashing involved whatsoever. That's a parsing ambiguity, but the same concept applies: the security system is confused about the message structure, allowing an attacker to trick it into doing something it shouldn't.

In my opinion if any messages with different structures can realistically produce the same content hash, the scheme is broken from a security perspective.

@pedrocr
Author

pedrocr commented Jul 5, 2017

The Rust core developers addressed those by using SipHash, a PRF with a random key.

From what I've seen, the current implementation doesn't actually randomize the key. But even if it did, that's not relevant if it's trivial to generate a collision by just manipulating the structure.

These same attacks can happen without hashes-of-hashes. (...) In my opinion if any messages with different structures can realistically produce the same content hash, the scheme is broken from a security perspective.

I'd agree; I just haven't found a case where that can happen with a setup like that of Hash. I'm curious to see one, as I believe it would be a bug in Hash as well (or in some implementations of the trait).

@tarcieri
Member

tarcieri commented Jul 5, 2017

From what I've seen the current implementation doesn't actually randomize the key.

Hash is just a trait. Some implementations defend against hashDoS and some don't.

According to the documentation, the implementation used by HashMap specifically implements the randomly-keyed cryptographic hash strategy (and uses std::hash::SipHasher I believe):

https://doc.rust-lang.org/std/collections/struct.HashMap.html

By default, HashMap uses a hashing algorithm selected to provide resistance against HashDoS attacks. The algorithm is randomly seeded, and a reasonable best-effort is made to generate this seed from a high quality, secure source of randomness provided by the host without blocking the program

@pedrocr
Author

pedrocr commented Jul 5, 2017

At least on my machine the hash is not seeded. This always returns the same value:

https://gist.github.com/pedrocr/8d0283bed3f56e5cb6fb9fe0a785f947

But that's not relevant to the point, as I specifically mentioned. If you can generate structs that have the same hash because you manipulate the structure, it doesn't matter that your hash function is randomly keyed. Within the same execution of the program those structs will collide, just at different values. And that's what would be the bug in Hash.

@tarcieri
Member

tarcieri commented Jul 5, 2017

At least on my machine the hash is not seeded.

Use a HashMap and I expect you'll see different results. Hash itself is NOT designed to be a security primitive.
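A quick way to check (exact behavior depends on the std version, but RandomState is the seed source HashMap uses):

use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

fn main() {
    // RandomState is randomly seeded, so this prints a different value on
    // each run, unlike a DefaultHasher/SipHasher built with fixed keys.
    let mut h = RandomState::new().build_hasher();
    42u64.hash(&mut h);
    println!("{:x}", h.finish());
}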

And to reiterate yet again, in the context where Hash security matters (i.e. HashMap and SipHasher), we're operating under a very different threat model where collisions are tolerated. Hash-like data structures naturally accommodate collisions, but are susceptible to an algorithmic attack if they rebalance to the same value repeatedly.

You're conflating threat models here: Hash and CryptoHash operate under different threat models.

In the one I would find acceptable for CryptoHash, collisions for attacker-controlled documents are not acceptable.

I'm not sure this is continuing to be a productive conversation: I would like the threat of ambiguous messages to be solved strategically for anything named CryptoHash. This is a well-studied problem in cryptography with a limited number of solutions.

I would find a hand-rolled naive and ambiguous scheme unacceptable. You haven't made a concrete proposal, but you're downplaying everything you need to do to solve these problems from a security perspective, and that's why I'm worried.

@pedrocr
Author

pedrocr commented Jul 5, 2017

Hash-like data structures naturally accommodate collisions, but are susceptible to an algorithmic attack if they rebalance to the same value repeatedly.

And that's the attack that, if it works with CryptoHash, will also break Hash. HashMap tolerates collisions if they are fully random. After all, it only has 64 bits of key. But if it's possible to generate a collision by just manipulating structure, Hash and CryptoHash are equally broken.

I would find a hand-rolled naive and ambiguous scheme unacceptable. You haven't made a concrete proposal, but you're downplaying everything you need to do to solve these problems from a security perspective, and that's why I'm worried.

My proposal is very simple: do exactly what Hash does but feed it to a crypto hash. I've yet to see an attack that breaks that, but it may exist. If it does, Hash should probably be fixed as well. So far the only case I've seen that's strange is that hashing a zero-sized struct is a no-op. Not even that is exploitable in Rust as far as I can tell. Other than that, things like ("foo","bar") and ("foobar","") hash to different values already. Do you have an example where this breaks down?

@tarcieri
Member

tarcieri commented Jul 5, 2017

But if it's possible to generate a collision by just manipulating structure Hash and CryptoHash are equally broken.

Incorrect. Again, hashDoS is an algorithmic attack that relies on the attacker being able to find large numbers of collisions. Finding a single collision is not catastrophic, because it doesn't help the attacker that much.

Furthermore, the interesting cases for hashDoS are HashMap<T, _> for a single type T. For typical Hash cases we don't need to worry about collisions across types.

With a CryptoHash, which calculates a content hash, any single collision is catastrophic, and we aren't constrained by T and have to worry about collisions across types.

These are totally different threat models that you are trying to conflate. Hash is not designed to solve the problem you're proposing.

My proposal is very simple. Do exactly what Hash does but feed it to a crypto hash. I've yet to see an attack that breaks that but it may exist.

The onus is on you to show why your scheme is secure. Otherwise you're shifting the burden of proof. Your (ill-defined) scheme is not secure simply because I haven't found an attack. You need to be able to answer questions like: how is the scheme domain separated everywhere type-by-type? How does it distinguish a Vec<u8> containing the data that would be hashed for some struct from the struct itself?

A secure scheme will be demonstrably unambiguous and free of collisions for arbitrarily structured messages. Anything less is insecure.

This is a much higher bar than Hash.

@pedrocr
Author

pedrocr commented Jul 5, 2017

With a CryptoHash, which calculates a content hash, any single collision is catastrophic, and we aren't constrained by T and have to worry about collisions across types.

If you care about collisions across types then yes, it's easy to generate collisions that are benign in Hash but break CryptoHash. Again, not my use case, but I can see how you'd want that in general, and if there's a simple way to do it then great. objecthash doesn't currently do that completely, though, as it hashes the same structure of fields to the same value independently of type. So objecthash(Person{id:0}) == objecthash(Dog{id:0}). Generating extra collisions with Hash actually isn't trivial, but it can be done with things like hash((0u16,0u16)) == hash((0u8,0u8,0u8,0u8)). That's just a trivial bug (i.e., since Hash doesn't care about different types, it doesn't do what it does with Vec and also hash the length). But if there's a standard way to fix that, all the better. I've had a second look at objecthash and opened a bug report on it.
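Here's a minimal repro of that tuple collision using std's DefaultHasher (any Hasher that just forwards the integer bytes behaves the same way):

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn hash_of<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut h = DefaultHasher::new();
    value.hash(&mut h);
    h.finish()
}

fn main() {
    // Tuples hash neither their arity nor their field widths, so both sides
    // feed the hasher the same four zero bytes.
    assert_eq!(hash_of(&(0u16, 0u16)), hash_of(&(0u8, 0u8, 0u8, 0u8)));
    // Slices do hash their length first, so the equivalent slices differ.
    assert_ne!(hash_of::<[u16]>(&[0u16, 0u16]), hash_of::<[u8]>(&[0u8, 0u8, 0u8, 0u8]));
}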

@pedrocr
Author

pedrocr commented Jul 5, 2017

After looking at objecthash, what it does is create a hash that's interoperable between languages, and it thus has some situations that generate collisions on purpose:

  • First{id: 0} and Second{id: 0} will hash to the same value even if one uses u32 and the other u64.
  • Different unicode strings that normalize to the same value have the same hash
  • Structs that have the same fields but in different order will hash to the same value

This makes it easier to have values for which objecthash(a) == objecthash(b) and yet myfunc(a) != myfunc(b). For some applications this is not ideal. Given this, I'd say it would make sense to have a simpler scheme that fixes the issues of Hash (e.g., it would also hash tuple lengths) but otherwise just hashes all the fields in order with all their contents.

@Trojan295

Trojan295 commented Jul 5, 2017

hash((0u16,0u16)) == hash((0u8,0u8,0u8,0u8))

I would question whether that's a bug, because the byte representation of the data is the same. Hash functions work on bytes, not data structures.
Let's say you have a vector in Rust and in Java, both holding exactly the same data. A hash of those should be the same in both languages, even though the underlying implementation of the vectors could be different.
If we couple a hash function to the implementation of the data structure in Rust, we would make this hash usable only in Rust. What's even worse, changes to the compiler could affect the resulting hashes.

@pedrocr
Author

pedrocr commented Jul 5, 2017

@Trojan295 I'm not sure if you're arguing that it's a bug or not. (0u16, 0u16) and (0u8, 0u8, 0u8, 0u8) are not holding the same data; one has 2 zeroes, the other 4. The fact that both end up generating 4 zero bytes in memory is an implementation detail. And Rust already does hash([0u16, 0u16]) != hash([0u8, 0u8, 0u8, 0u8]) because the Vec length is already hashed. Only the tuple length isn't.

@Trojan295

Are you trying to use those structured hashing functions only for internal Rust purposes like HashMap, or do you want to make them usable in a cryptographic manner? For such internal use that's OK, but hashes used in cryptography are mostly sent to other entities. Now if you define a custom data structure in Rust and apply some hash to it, then the other entity would need to know how you calculated the hash of this structure.

Let's take this Vec. I don't know how it's done in Rust, but let's assume that the length is appended to the data and hashed. Then the guy on the other side of the wire needs to know how you built the byte stream that was hashed (that you appended the length, for example, and did not prepend it). It's not simple to unify this across multiple parties.

Generally, I don't think such a feature is required for cryptographic hashes, as in crypto you mostly operate on numbers/bytes and not custom data structures. The way of hashing needs to be well known and unified.

@tarcieri
Member

tarcieri commented Jul 5, 2017

@Trojan295 short of Hash changing to a const-dependent pi-typed interface, this wouldn't work for things like HashMap, which needs a different type signature.

But as I've previously stated, if there's one thing I have a bug up my butt about, it's conflating security domains and concerns, and that's really what I want to avoid.

I am a huge fan of something like a CryptoHash scheme and have already implemented one in Rust and plan on designing and implementing another. I just want to make sure it covers all of the concerns I have. Those concerns are orthogonal to what Hash presently provides.

@newpavlov
Member

I will close this issue and create a separate issue in the RustCrypto/utils repository. A crate for cryptographically hashing structs would be a great addition to the project, but as I stated earlier, it's better done as a separate crate. I think the concerns raised by @tarcieri are good ones and should be addressed in that crate.

@burdges

burdges commented Jul 13, 2017

I think a utility crate could provide a wrapper type struct HashWriter<'a, I: 'a + ?Sized + Input>(&'a mut I) that impls io::Write along with input_le, input_be, etc. We cannot make it totally generic over serde serializers anyway because most lack bincode::serialize_into, but folks can work around that on a case-by-case basis.

@newpavlov
Member

@burdges
Could you please specify for which use cases an io::Write impl is needed and the current digest_reader is not enough?
