Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hasher2Rng #627

Closed
burdges opened this issue Oct 7, 2018 · 7 comments
Closed

Hasher2Rng #627

burdges opened this issue Oct 7, 2018 · 7 comments

Comments

@burdges
Copy link
Contributor

burdges commented Oct 7, 2018

We need rejection sampling in probabilistic data structures whose size is not a power of two, so maybe just using rand for the sampling side makes sense.

We could do this in a fully cryptographic setting by using any cryptographic hash function like Blake2b that outputs a [u8; 32] seed for ChaChaRng, but I'm curious if anyone has a nicer scheme?

Actually, we only need SipHasher like protection for the key, not full collision resistance, so I'm unsure about optimal primitives choices here. Thoughts?

It's seemingly fine to go from SipHasher128 to ChaChaRng by padding with either zeros or another hash key, this sounds messy.

I'm also curious if there is a cleaner form for the cryptographic setting: We could impling RngCore for some SHAKE variant. I think many Keccek variants are faster than SHA3 but I'd still kinda expect this to be slow. Is anyone aware of a nice XOF like transformation from Blake2b to ChaCha? It's likely nicer to anyone re-implementing this in another language to just seed with a [u8; 32], probably just as fast, but I'm curious if anyone tweaked this further.

@burdges
Copy link
Contributor Author

burdges commented Oct 8, 2018

After slightly more thought, there is seemingly nothing rand should do to simplify this.

At least for probabilistic data structures that do not store any key, then our lowest level interface need only map from a borrowed key to an Rng. We give users maximum flexibility if they provide this "hasher" map as a closure, so we might have:

pub struct CuckooTable<K, F, R, V, BS>
where K: ?Sized, F: Fn(&K) -> R, R: Rng, BS: BackingSotrageTrait
{ .. }

impl<K, F, R, V, BS> CuckooTable<K, F, R, V, BS>
where K: ?Sized, F: Fn(&K) -> R, R: Rng, BS: BackingSotrageTrait 
{
    pub fn with_hasher(f: F) -> Self { .. }
    ...

Above this, you want some user friendly approach like RandomState, which might resemble:

pub struct RandomState(u128);

impl Default for RandomState {
    fn default() -> RandomState {
        thread_local!(static KEYS: Cell<[u128]> = {
            Cell::new(thread_rng().gen())
        });
        KEYS.with(|keys| {
            let rs = KEYS.get();
            KEYS.put(rs.wrapping_add(1));
            RandomState(rs)
        });
    }
}

impl<K> Fn<(&K), ChaChaRng> for RandomState 
where K: ?Sized+Borrow<[u8]>
{
    extern "rust-call" fn call(self, arg: (&K)) -> T {
        use blake2::{Blake2b, Digest};
        let mut hasher = Blake2b::new_keyed(self.0.to_le().to_bytes(), 32);
        hasher.input(arg.0.borrow());
        ChaChaRng::from_seed(array_ref!(hasher.result(),0,32));
    }
}

impl<K, V, BS> CuckooTable<K, RandomState, ChaChaRng, V, BS>
where K: ?Sized+Borrow<[u8]>, BS: BackingSotrageTrait
{
    pub fn new() -> Self {
        Self::with_hasher(RandomState)
    }
}

We've slightly more boilerplate from Fn anytime we create this hasher closure manually to make it polymorphic or give it a nameable type, but users can specify arbitrary one off hasher closures super easily.

@burdges
Copy link
Contributor Author

burdges commented Oct 8, 2018

I'll close this but feel free to express opinions on my questions from my opening comment.

@burdges burdges closed this as completed Oct 8, 2018
@dhardy
Copy link
Member

dhardy commented Oct 8, 2018

We need rejection sampling in probabilistic data structures whose size is not a power of two, so maybe just using rand for the sampling side makes sense.

You want an RNG...

We could do this in a fully cryptographic setting by using any cryptographic hash function like Blake2b that outputs a [u8; 32] seed for ChaChaRng, but I'm curious if anyone has a nicer scheme?

... you have some unmentioned random state you wish to convert to a ChaChaRng seed.

Actually, we only need SipHasher like protection for the key, not full collision resistance, so I'm unsure about optimal primitives choices here. Thoughts?

You state that you don't need collision resistance, so I presume an attacker cannot control the random state used? In this case do you even need a cryptographic hash function? I have no idea what you are trying to achieve. Where does your random state come from?

It's seemingly fine to go from SipHasher128 to ChaChaRng by padding with either zeros or another hash key, this sounds messy.

I believe padding with zeros is fine so long as the 128-bit output of SipHasher128 is sufficient. Alternatively some construction like in #554 theoretically allows usage of the full internal state, but this is a novel variation without any real cryptanalysis.

I'm also curious if there is a cleaner form for the cryptographic setting: We could impling RngCore for some SHAKE variant. I think many Keccek variants are faster than SHA3 but I'd still kinda expect this to be slow. Is anyone aware of a nice XOF like transformation from Blake2b to ChaCha? It's likely nicer to anyone re-implementing this in another language to just seed with a [u8; 32], probably just as fast, but I'm curious if anyone tweaked this further.

I think this is vaguely what I tried to do in #554, but using SipHash without too much thought for cryptographic strength (I believe it should be as strong as the original with 1-2 u64 outputs; with four u64 outputs the output and state sizes are equal hence essentially there is a (probably) bijective output function, and the strength of the mixing functions is more important to prevent reversals).

Keccak is based on a sponge construction allowing variable length output and so should be straightforward to implement RngCore for, and is designed to be secure for arbitrary length outputs. The sha3 lib already implements most of this, e.g. Shake128 impls ExtendableOutput, though it doesn't implement RngCore. So probably you can simply do:

let mut hash = Shake128::default();
hash.process(input_data);
let mut seed = <ChaChaRng as SeedableRng>::Seed::default();
hash.xof_result().read(&mut seed);
let mut rng = ChaChaRng::from_seed(seed);

@burdges
Copy link
Contributor Author

burdges commented Oct 8, 2018

Yes, there would be random state for DoS protection, like in the pseudocode I wrote. Also my Fn(&K) -> impl Rng interface does not enforce that state's existence well, although it's likely less bad than std::hash::BuildHasherDefault.

I think #554 sounds optimal, but it'd need cryptoanalysis. Thanks!

@dhardy
Copy link
Member

dhardy commented Oct 8, 2018

So you have random state, and wish to use it to seed a CSPRNG used for DoS protection. Using a hash function with variable length input and output is a convenient way to convert the input state to the required seed type — but, cryptographically, is it any stronger than converting the state via transmutation with truncation or zero-extension as required?

If the same random state is used for multiple seeds, then some mechanism needs to be used to ensure each one is unique (continuing to read from the same parent RNG / extensible-output hash, or perhaps overwriting the state with hash function output). But otherwise I'm not sure that using a hash function to scramble the state actually helps.

Maybe your RandomState should just be a pre-seeded ChaChaRng?

@burdges
Copy link
Contributor Author

burdges commented Oct 9, 2018

Yes DoS protection. And the F: Fn(&K) -> impl Rng is the keyed hash function.

Yes if K is shorter than 32 bytes then a perfectly reasonable F is the closure in

let ct_key : [u8; 32] = thread_rng().gen();
let mut ct = CuckooTable::with_hasher(|k| k.iter().zip(ct_key).map(BitXor::bitxor).collect::<ArrayVec<[u8; 32]>>().into_inner())

@dhardy
Copy link
Member

dhardy commented Oct 9, 2018

Yes, perhaps.

BTW rng.gen() is still horribly inefficient on u8 arrays until #269 gets solved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants