Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SeedableRng::Seed and from_hashable #62

Closed
wants to merge 10,000 commits into from
Closed

SeedableRng::Seed and from_hashable #62

wants to merge 10,000 commits into from

Conversation

dhardy
Copy link
Owner

@dhardy dhardy commented Nov 24, 2017

Edit: PR no longer shows the code thanks to our rebasing of Rand history, but the commit is still available here.

Implement seeding via hash function, as discussed in #18.

This is a very rough implementation.

To discuss, regarding the hash function:

  • SeaHash is exposed for general usage, simply because it can be (if we're going to do this, we must stabilise it, so there's no worry that it will be replaced). This is a little odd, but seems sensible on the whole.
  • We could hide SeaHash and from_hashable behind a feature gate (hash or seahash) until we're more comfortable with the idea — that allows stabilising rand-core without this feature, in theory. But hopefully we don't need to do this.
  • More AsBytesFixed impls are needed, e.g. for u64
  • I'm unsure that AsBytesFixed is the best way of going about this. I'm considering moving the hashing into the trait implementation with trait Hash<H> { fn hash(state: &mut H, val: Self); } or similar. This allows specialised implementations of SeaHash.
  • How general do we make our SeaHash impl? I mean, do we allow pushing multiple items to it? This is functionality we don't need but requires very little extra code and possibly a small performance hit, but it does make the SeaHash impl we expose significantly more useful to other users. (We could also consider whether from_hashable should be more general, e.g. consume a SeaHash struct.)

Regarding SeedableRng:

  • Making Seed an associated type is a compromise; SeedSize associated constant doesn't appear to work; we do however restrict Seed to types supported by Finalize
  • Making Seed a u64 array instead of u8 array is certainly more convenient for us; any reason we shouldn't do this?
  • The new SeedableRng::from_seed impls for Isaac aren't 100% backwards compatible (old version allowed much larger seeds).

Lots to discuss, but I think this is moving forward. (@ticki if you get the time, I'd love your thoughts.)

dhardy and others added 30 commits September 13, 2017 09:15
Also removed SeedableRng impl for IsaacWordRng: it's not reproducible across platforms
These changes make it possible to sample from closed ranges, not only from open.

Included is a small optimisation for the modulus operator, and an optimisation
for the types i8/u8 and i16/u16.
The license declaration in the README is non-specific. I think this is a hold-over from extraction from the Rust repo. The Rust repo has a file that details the other licenses involved. I scanned through this code and most of it has a rust standard mit/apache header. Some files have no header, and could be under BSD, but if that's the case, that specific license text needs to be added somewhere to this repo.
Update README.md license section
fuchsia: magenta was renamed zircon
Allow sampling from a closed integer range
This makes documentation work correctly with the new pulldown-cmark
Markdown parser (rust-lang/rust#44229).
Fix formatting warnings with commonmark enabled
I have implemented it as a function instead of a trait, as that makes it easy to
add it to every RNG ont at a time.

I split the `init` function in two instead of the current version that uses a
bool to select between two paths. This makes it more clear how the seed is used.
The current `mix` macro has to be defined in the function, and would have to be
duplicated. Therefore I converted it to a seperate function.

I precalculated the values a...h, but am not sure this is a good idea. It makes
the resulting code smaller, and gives a small performance win. Because it are
'magic' values anyway, I thought why not?
Also moved the `impl_uint_from_fill` macro from `os.rs` to `randcore`. I had to
modify its error handling anyway, and it is shared with `OsRng`.
@dhardy
Copy link
Owner Author

dhardy commented Dec 20, 2017

I found a way to move from_hashable out of rand-core, which I think is preferable for most people. It does unfortunately mean users have to use rand::FromHashable; now, but I guess we can live with that.

@dhardy
Copy link
Owner Author

dhardy commented Dec 31, 2017

Updated: rebased and squashed. I still have the old history locally, but don't see much value.

@pitdicker
Copy link

High time to get this merged 😄. Moving from_hashable out of rand-core seems like a good idea!

How confident are you in the custom finalizers? And is there anything specific I should look at?

I would like it if the RNG tests would only depend on things in rand-core. Especially if they are to be split out into another crate. Does it make sense to add a test_stdng_construction test in rand instead?

@dhardy
Copy link
Owner Author

dhardy commented Dec 31, 2017

Actually, I think I will replace SeaHash with MetroHash; it's better known and reviewed and has similar performance, as well as native 128-bit output and 256-bits of state (IIRC), and a permutation function which should make it relatively easy to output any amount of state. But I didn't get that done yet.

@pitdicker
Copy link

If we go with 128-bit hashes, there is a lot more choice. MetroHash, CityHash, FarmHash, MurmurHash, SpookyHash, maybe more.
I think that it is good to look at the performance, but that simplicity is more important. Maybe I am wrong, but it seems to me MetroHash, CityHash and FarmHash (and maybe xxHash) are part of a series of hashes that come and go. It would be nice if we could pick something that does not seem badly outdated over ~5 years. I have a little preference for Murmurhash3 there.

This repro contains a list of hashes with benchmarks: https://github.com/rurban/smhasher

128-bit MetroHash and MurmurHash3 seem about evenly matched for small inputs. I wonder what their performance on x86 is.

@dhardy
Copy link
Owner Author

dhardy commented Jan 1, 2018

That repo benchmarks hash functions by throughput on reasonably large data sizes. We don't care much about that. It also doesn't say much about security (other than some stuff about hash tables which I don't even think is correct).

I agree that simplicity is fairly important; it seems most hash functions innards aren't that complex, but the input/output functions can get complex to handle multiple input sizes, optimise special cases, and some other functionality.

Hmm, this conversation is split between two threads?

@pitdicker
Copy link

That repo benchmarks hash functions by throughput on reasonably large data sizes.

If you click on the name of the hash, there are also the results for hashing 1, 2, 3, 4, etc. bytes. And the last column is useful for us, because that shows if it is statistically ok. We don't really have security considerations here, right?

Hmm, this conversation is split between two threads?

Sorry 😄 Seemed to fit there. Link for others that may read this: #18 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.