Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SmallRng algorithm #603

Closed
dhardy opened this issue Sep 7, 2018 · 19 comments
Closed

SmallRng algorithm #603

dhardy opened this issue Sep 7, 2018 · 19 comments
Labels
E-question Participation: opinions wanted

Comments

@dhardy
Copy link
Member

dhardy commented Sep 7, 2018

This type is introduced as follows:

An RNG recommended when small state, cheap initialization and good
performance are required. The PRNG algorithm in SmallRng is chosen to be
efficient on the current platform, without consideration for cryptography
or security
. The size of its state is much smaller than for StdRng.

There is general consensus that the current algorithm, Xorshift, is not an ideal choice due to quality (poor results in PractRand and BigCrush) and trivial predictability (to the point that seeding one instance from another effectively creates a clone). Ideally we should replace this for the 0.6 release.

Existing work:


I'm creating this new issue for visibility (most of the previous discussions happened on a fork) and to refresh this issue.

(First note: "reproducible" is used here to mean deterministic, portable, and a commitment not to tweak the algorithm in future updates. All reproducible PRNGs should eventually be migrated out of Rand proper to sub-libs, as discussed in dhardy#58 and #431.)

Question 1: is the current API with two non-reproducible PRNGs, StdRng and SmallRng, the right one? As I understand the current choice is reasonably well accepted, although there have been suggestions to add more categories of PRNG.

Question 2: which algorithm should we use for SmallRng? (The choice may vary by platform.)

@dhardy dhardy added E-question Participation: opinions wanted X-discussion labels Sep 7, 2018
@dhardy
Copy link
Member Author

dhardy commented Sep 7, 2018

I should ping @pitdicker @vks @huonw @Lokathor @Ichoran who discussed this in the past and @huonw.

dhardy#60 starts with a list of potential requirements for a general-purpose PRNG:

  • minimum cycle length and/or number of streams
  • minimum state and key size
  • quality: passing PractRand and BigCrush (though some have suggested that certain failures should be allowed)
  • high speed
  • low memory usage
  • not trivial to predict
  • some theoretical basis
  • applicable licence

There is some question of the relative importance of each requirement (small PRNGs are all about trading off quality for speed and low memory usage), but the above is a good starting point.

@dhardy
Copy link
Member Author

dhardy commented Sep 7, 2018

We can, if need be, select a different algorithm for Intel x86, x86_64, ARMv7, ARMv8 etc.

It is not so easy to select a different algorithm for u32 generation vs u64 or fill_bytes performance; in theory we could provide e.g. SmallRng32 and SmallRng64, but most likely a single simple choice is better. Users can if necessary benchmark their particular problem on their platform and choose an appropriate algorithm themselves.

@vks
Copy link
Collaborator

vks commented Sep 14, 2018

For now, I would just replace xorshift with PCG or xoshiro. In the long term, we might want to use a SIMD/AES-NI based RNG on platforms where that is possible.

@Lokathor
Copy link
Contributor

You can do a pcg32 as a single 64-bit number. Going smaller than that makes your quality drop off a cliff though.

@dhardy dhardy mentioned this issue Sep 14, 2018
28 tasks
@dhardy
Copy link
Member Author

dhardy commented Sep 22, 2018

As @vks says, lets just select something reasonable for now. From this summary, lets use pcg_xsl_128_mcg for x86_64, and pcg_xsh_64_lcg for x86. These aren't the fastest but never show terrible performance. Note that of the other options here, Xorshift is only listed for performance comparison, SFC is interesting but could do with further analysis, and xoroshiro_mt_64of128 is put together by @pitdicker and deserves further analysis before use. Xoshiro was not analysed here but should be, along with plain 128-bit MCGs and LCGs, but that is a job for another day.

So, a job list, open to all takers:

  • create a rand_pcg crate with the above two algorithms, probably based on this code with cross-checking against @imneme's code
  • make SmallRng a wrapper around the above algorithms, depending on platform
  • (optional) create a rand_sfc crate

Please make these crates sub-directories of the small-rngs repo.

@Lokathor
Copy link
Contributor

Consider that you have 32-bit machines in some places still, so you should probably pick a smaller gen on those machines, the 128 bit based generator is just too much overkill.

@dhardy
Copy link
Member Author

dhardy commented Sep 22, 2018

Yes, the 128-bit generator appears to have terrible performance on x86, that's why I suggested we only use it on x86_64 (or 64-bit in general).

@Lokathor
Copy link
Contributor

Ah, right. Forgot the naming for a moment. The only 32-bit machine I've used in ages is rpi boards and similar, which is all 32 bit ARM.

@vks
Copy link
Collaborator

vks commented Sep 22, 2018

Consider that you have 32-bit machines in some places still, so you should probably pick a smaller gen on those machines, the 128 bit based generator is just too much overkill.

There are decent generators with 128 bits of state that use 32 bit arithmetic (i.e. xoshiro). 64 bits of state is not enough for massively parallel applications.

@Lokathor
Copy link
Contributor

I'm fully familiar. PCG32 is in fact one such thing.

@dhardy
Copy link
Member Author

dhardy commented Sep 23, 2018

There are decent generators with 128 bits of state that use 32 bit arithmetic (i.e. xoshiro). 64 bits of state is not enough for massively parallel applications.

The one suggested has 127 bits. It relies on PCG streams to achieve this so not ideal, but I think this is acceptable for now.

@Lokathor
Copy link
Contributor

You can also have a PCG32 variant that automatically jumps from stream to stream at some pre-determined point (such as seed = 0), effectively dumping all 127 bits of state into a single stream.

Then again, if you have an important and massively parallel operation I don't think that you'll be using the least capable RNG that still functions that your platform offers.

@dhardy
Copy link
Member Author

dhardy commented Sep 24, 2018

We don't need 127 bits of state in a single cycle. But for parallel usage, it's somewhat important that there is a low probability of two randomly-seeded instances are not close to one another, and a 64-bit seed is on the small side here.

@Lokathor
Copy link
Contributor

Lokathor commented Sep 24, 2018

So, here's your options for Pcg32:

  • 64 bits of state and a baked in stream selection. This keeps the generator extremely small but obviously there's concerns to be had in some situations.
  • 64 bits of state and 63 bits of stream selection (EDIT: you can even use a NonZero type here to let the compiler use part of that final bit for niche optimization). This is plenty flexible if you want to fork a generator into another thread.
  • The exact same generator as above but every time the seed lands on 0 you also jump the stream in some cyclic way, thus effectively making it 127 bits of state.

If none of those are what you want for SmallRNG, then you just don't want PCG32 as your SmallRNG. I'm sure that someone might pick to have as small a state as possible without regard to massively parallel operations. It might be best to worry less about picking particular RNGs for particular specially named RNG types, and just provide all the RNGs you can muster (that are reasonably effective) and then list some pros on each one and then have the rand::rngs have a little table that says "if you want the smallest one we suggest X, if you want to do Y then we suggest Z` and so on. Which is deeply subjective and all that, but it's a lot easier than fretting too much over what's best for every possible use case ever.

@dhardy
Copy link
Member Author

dhardy commented Sep 25, 2018

@Lokathor this has all been discussed in various places. #431 has some plans for additional generators and we have a new repo to house these: small-rngs. @pitdicker evaluated the period and chance of overlap last November (see also requirements at the beginning of this issue: ~2^50 words cycle length, ~128 bits state size).

Yes, we want the middle option you suggested.

@dhardy
Copy link
Member Author

dhardy commented Sep 27, 2018

FYI: I've started on the rand_pcg crate

@coltfred
Copy link
Contributor

coltfred commented Sep 27, 2018

@dhardy I didn't see your post here until now. I've also pushed up a rand_pcg crate which just contains the code @pitdicker wrote, while adjusting it to only depend on RngCore. https://github.com/coltfred/small-rngs/tree/pcg_algos

@dhardy
Copy link
Member Author

dhardy commented Sep 27, 2018

Sorry, I actually started a couple of days ago but it took longer than I expected. Perhaps you could take a look at my PR? rust-random/small-rngs#4

@dhardy
Copy link
Member Author

dhardy commented Jan 28, 2019

This has now been implemented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
E-question Participation: opinions wanted
Projects
None yet
Development

No branches or pull requests

4 participants