-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Requirements for a small fast RNG #60
Comments
Evaluation of Bob Jenkins small fast RNG.This falls in the category of chaotic RNGs. It is reported to pass PractRand and BigCrush. There exist three variants: a 32-bit version, a 64-bit version, and 32-bit version with three rotations. License is public domain. Speed and memory:
x86:
The performance of jsf is not great. Note that I did not optimise it heavily. But the baseline Xorshift128/32 is not all that fast already, and jsf has in many situations only ~50% the performance of that. The state size is 128 bits for the 32-bit variant, and 256 bits for the 64-bit variant. Initialization and cycles. PractRand includes a variant that can take 128 bits as seed. This opens up the implementation to short cycles, of which only 6 are known until know. But only cycles of length 0 are explored, there could be many more. So adjusting the initialization routine to take a larger seed does not seem wise. There is some probabilistic guessing around 32-bit seeds and how long it takes for one seed to run into another seed, but it does not sound very convincing to me. The shortest cycle is expected to be 2^94, but this is unproven. Conclusion |
Evaluation of a small RNG by Geronimo JonesI am not sure what the basis of this RNG is. It is reported to pass PractRand and BigCrush. The RNG is mainly designed for 64-bit architectures. License is GPL v2 or v3. Speed and memory:
x86:
Performance is pretty bad, especially on 32-bit architectures. I am not sure why, because it does only a few operations, that should even be somewhat parallelizable. The assembly looks plausible to me, but LLVM tries to fit 4 u64 into three registers. Initialization and period. Conclusion |
Evaluation of a Small Fast Counting RNGThe sfc RNG is designed by Chris Doty-Humphrey, the author of PractRand. It is reported to pass PractRand and BigCrush. There are variants of the RNG for 16-, 32- and 64-bit architectures. License is public domain. It is described as: "It's basically a good small chaotic RNG driven by a bad smaller linear RNG. The combination gives it the strengths of each — good chaotic behavior, but enough structure to avoid short cycles." Speed and memory:
x86:
Sfc has extremely good performance on 64-bit architectures. Of the RNG's I have collected, the 64-bit version is actually the fastest RNG to generate u64's, and the 32-bit version the fastest for u32. Only on 32-bit architectures there are Xorshift variants that can outperform it, but only one that also does not fail PractRand and BigCrush. Also an advantage over some other PRNGs is that it does not do multiplication, which can be slow on some architectures. The state includes 3 u32's (u64 for the 64-bit variant) and a counter. Initialization and period. The 32-bit variant has 2^128 states, the cycle length should on average be 2^127. But only a minimum of 2^32 is guaranteed, so consuming more than 2^32 results is maybe not advisable. The 64-bit variant has 2^256 states, with a cycle length of on average 2^255. A minimum cycle length of 2^64 is guaranteed, thanks to the counter. That makes it good enough for any use. Conclusion The minimum guaranteed cycle length of the 32-bit version is to short, but I wonder if that guarantee is not overly conservative. On 32-bit architectures there may be faster RNGs. |
Evaluation of XorshiftThe Xorshift RNG is designed by George Marsaglia. It performs spectacularly bad on PractRand and BigCrush, and should really be used with an output function. There are variants for 32- and 64-bit architectures, but based on the design principles more are possible. License is public domain. Speed:
x86:
Performance is extremely good, and can even be improved a little by not doing the three xorshifts in order. They can be parallelised by doing the first xorshift of the next round while doing the second two xorshifts of the current round. Memory and period Conclusion |
Evaluation of Xorshift128+ and Xoroshiro128+Xorshift+ is an improvement over Xorshift and XSadd by Sebastiano Vigna. Xoroshiro+ is again an improvement over Xorshift+ by David Blackman and Sebastiano Vigna. Although there are claims they pass BigCrush, it actually fails that and PractRand. The author of PCG visualizes Xoroshiro, and it seems they are not such a big improvement over Xorshift as they are sold. The RNG's are optimised for 64-bit architectures. License is public domain. Speed:
x86:
Performance is very good on x86_64. On x86 it is ok, but native 32-bit RNGs can be faster. Memory and period Conclusion |
Evaluation of PCG Permuted Congruential GeneratorsPCG is a big improvement over the LCG class of RNGs by Melissa O'Neill. There are a large number of variants, because different base generators and different permutation functions can be combined. PCG is designed to be statistically very good, and passes PractRand and BigCrush. Variants suitable for 32 and 64-bit architectures are available. License is public domain. Probably the most interesting combinations are:
PCG also has the samewhat rare feature of easy seeking in the output stream. Speed:
x86:
Performance is pretty good on x86_64. It is best to choose between a 64/32 and 128/64-variant depending on whether you need to generate u32's or u64's. On x86 it is ok-ish, but there are not all that many RNG's of good quality that are faster. PCG-XSL-RR 128/64 (MCG) performs exceptionally bad on x86, because it needs to do 128-bit multiplication. Memory and period The 128/64 variant uses 128 bits of state. In my opinion, support for streams adds little to this variant, and performance can be improved by more than 30% by removing it. That means using a multiplicative congruential generator as basis. That brings the state to 128 bits, and the period to 2^126. For initialization 2^48 RNGs can be created before the chance of duplication gets high. Conclusion |
Evaluation of Velox 3bA small generator by Elias Yarrkov. It is designed for 32-bit architectures. License is public domain. Speed and memory:
x86:
I don't really understand how this RNG works. It uses two arrays of 4 u32's internally, so 256 bits of state. It does not generate one value at a time, but in blocks of 4. This is uncommon for small RNGs, and more like a cryptographic RNG. Initialization and period The minimum period for 128-bit blocks is guaranteed to be 2^128, with an expected average period of 2^255. Conclusion |
Looks to me like V3b has the simplest state-advance possible (increment by 1), and a complex permutation function, hence the period is exactly 2^128 steps (each with 4 outputs, so 2^130). Edit: no, that's not it at all. The 128-bit state |
I had trouble understanding that line, but that is exactly it 👍 |
Evaluation of XSM Xor-Shift and MultiplyXSM is designed by Chris Doty-Humphrey (the author of PractRand) with the goal to support random access aka seeking. It passes PractRand and BigCrush. Variants suitable for 32 and 64-bit architectures are available. License is public domain. XSM is revised several times already, and another revision is coming up for PractRand 0.94. The idea behind XSM is that it uses an LCG as base generator, and combines that with a relatively complex non-linear output function. I am not sure whether the LCG part is still true, it seems the base generator is more a combination of "creative" additions. I am not sure there is enough structure to it to be able to talk about periods. The Xor-Shift part is also gone after some revisions. Speed:
x86:
Conclusion |
Evaluation of Middle Square Weyl Sequence RNGThe Middle Square Weyl Sequence RNG is designed by Bernard Widynski. It reportedly passes BigCrush, with get suspected in PractRand after 4gb, and fails after 8gb. It operates on 64-bit words. License is GPL for the software package, but I am not sure that applies to the paper. It combines the middle square method from John von Neumann with a Weyl sequence. Speed and memory:
x86:
MSWC has extremely good performance on both 32- and 64-bit architectures, outperforming al others tested so far (Also Xorshift and Xoroshiro). The state includes 64 bits for the Middle Square state, and 64 bits for the Weyl sequence. It also needs to store one u64 for the current stream. Initialization and period. Conclusion |
SFC cycle lengthAccording to the paper Chaotic Random Number Generators with Random Cycle Lengths for chaotic RNGs it is a good idea to talk about the chance to get a short cycle, because guarantees about minimal cycle lengths can often not be given. The 32 bit version of the Small Fast Counting RNG (SFC) provides a minimum cycle guarantee of 2^32, thanks to the combination with a 32-bit counter. Every cycle length of the chaotic part is multiplied by this number. To get a minimum worry-free amount of 2^50 results from the RNG, we need to know the chance to get a cycle from the chaotic part of 2^50 / 2^32 = 2^18. SFC has the good property that every state change is invertible. This implies two different states can never have the same next state. Once the state becomes the initial state, the cycle is closed. For an invertible chaotic RNG the chance to get a cycle of 2^18 with a state of 96 bits is simply: 2^18 / 2^96 = 1 in 10^23. So while we can't give an absolute guarantee Sfc32 provides a minimum cycle length of 2^50, the chance to get a smaller cycle is unbelievably small. |
You're assuming exactly one cycle of length 2^18 exists. Why? Without knowing more about the distribution of cycle lengths you can't say anything. There could be many fixed points (cycles of length 2^32) or it could be that most points are part of cycles with length less than 2^18, given only the information above. |
Oops I means to say a cycle less than 2^18, and should have added much more disclaimer text 😄 . The linked paper says we can assume the length of the cycles to follow a logarithmic distribution (formula 14). We can assume there are about ln(2^96)=67 cycles (formula 12, 13). If there is a cycle of say 2^95, there is a 50% chance for a random initialization to end up in that one. The chance to end up in a cycle of length 1 is only 1 in 2^96. I was making a big assumption that this RNG is well designed, and the cycles roughly follow the same distribution. Thanks to the logarithmic distribution the chance to end up in a short cycle if very small.
|
Sounds like SFC32 is "okay" then. Great as a member of a supporting library, not sure about making it default. |
I have seen some comments saying that chaotic RNGs are to "random" and theoretically unpredictable to work with. And others saying that is just what you would want from an RNG 😄. Periodic RNGs like LCG and Xorshift are theoretically simple. Combining them with an output function, like PCG and Xorshift* do, makes them statistically strong. You could argue both have a place in the world. For a default something periodic seems like a more conservative choice. I am just collecting and trying to be objective for now, and have a few more to go... |
Evaluation of Xorshift / Xoroshiro with multiply and truncationThis are a variants on Xorshift* noted by Melissa O'Neil, but they may be more common. I put some effort into playing with them. A requirement to pass statistical tests is that the multiplier must be 64 bit, which has the consequence that the word size should also be 64 bit. With this they pass PractRand and BigCrush. License is public domain. Instead of Xorshift as a base generator, Xoroshiro could be a better choice as is does one instruction less. For 32-bit results the Xorshift/Xoroshiro result is multiplied by a constant, and truncated to the 32 most significant bits. For 64-bit results it is possible to use a widening multiply, and truncate to the middle 64 bits. On x86_64 the widening multiply is faster, on x86 combining two 32-bit results is faster. Speed:
x86:
Xoroshiro with multiply and truncation is always faster than Xorshift with those output functions, so it makes sense to go with Xoroshiro as a base generator. The only difference in these tests between On x86_64 the performance is equal to Xoroshiro+, or even 12% better for 32-bit results. Memory and period Conclusion |
What is the advantage of using xoroshiro* over xoroshiro+? The widening
multiply to get rid of the linear dependencies instead of truncating?
I still don't think it's worth it to get rid of the linear dependencies.
…On Wed, Nov 29, 2017, 23:50 Paul Dicker ***@***.***> wrote:
Evaluation of Xorshift / Xoroshiro with multiply and truncation
This are a variants on Xorshift* noted by Melissa O'Neil
<http://www.pcg-random.org/posts/other-good-options.html>, but they may
be more common. I put some effort into playing with them. A requirement to
pass statistical tests is that the multiplier must be 64 bit, which has the
consequence that the word size should also be 64 bit. With this they pass
PractRand and BigCrush. License is public domain.
Instead of Xorshift as a base generator, Xoroshiro could be a better
choice as is does one instruction less.
For 32-bit results the Xorshift/Xoroshiro result is multiplied by a
constant, and truncated to the 32 most significant bits. For 64-bit results
it is possible to use a widening multiply, and truncate to the middle 64
bits. On x86_64 the widening multiply is faster, on x86 combining two
32-bit results is faster.
*Speed*:
Benchmarks from my pc, with Xorshift128/32 (current RNG in rand) as a
baseline.
x86_64:
test gen_u32_xoroshiro_mt_32of128... bench: 1,060 ns/iter (+/- 9) = 3773 MB/s
test gen_u32_xoroshiro_mt_64of128... bench: 1,082 ns/iter (+/- 109) = 3696 MB/s
test gen_u32_xorshift_mt_64 ... bench: 1,280 ns/iter (+/- 4) = 3125 MB/s
test gen_u32_xorshift_128_32 ... bench: 1,082 ns/iter (+/- 2) = 3696 MB/s
test gen_u64_xoroshiro_mt_32of128... bench: 3,565 ns/iter (+/- 16) = 2244 MB/s
test gen_u64_xoroshiro_mt_64of128... bench: 1,094 ns/iter (+/- 12) = 7312 MB/s
test gen_u64_xorshift_mt_64 ... bench: 1,271 ns/iter (+/- 30) = 6294 MB/s
test gen_u64_xorshift_128_32 ... bench: 2,638 ns/iter (+/- 23) = 3032 MB/s
x86:
test gen_u32_xoroshiro_mt_32of128... bench: 3,658 ns/iter (+/- 52) = 1093 MB/s
test gen_u32_xoroshiro_mt_64of128... bench: 3,661 ns/iter (+/- 44) = 1092 MB/s
test gen_u32_xorshift_mt_64 ... bench: 3,482 ns/iter (+/- 13) = 1148 MB/s
test gen_u32_xorshift_128_32 ... bench: 1,475 ns/iter (+/- 60) = 2711 MB/s
test gen_u64_xoroshiro_mt_32of128... bench: 6,174 ns/iter (+/- 16) = 1295 MB/s
test gen_u64_xoroshiro_mt_64of128... bench: 11,884 ns/iter (+/- 83) = 673 MB/s
test gen_u64_xorshift_mt_64 ... bench: 11,624 ns/iter (+/- 25) = 688 MB/s
test gen_u64_xorshift_128_32 ... bench: 4,591 ns/iter (+/- 59) = 1742 MB/s
Xoroshiro with multiply and truncation is always faster than Xorshift with
those output functions, so it makes sense to go with Xoroshiro as a base
generator. The only difference in these tests between xoroshiro_mt_32of128
and xoroshiro_mt_64of128 is using a widening multiply on x86_64.
On x86_64 the performance is equal to Xoroshiro+, or even 12% better for
32-bit results.
On x86 it performs between 33% and 53% worse than Xoroshiro+.
*Memory and period*
As state size I choose 128 bits, two u64 words. The period is the same as
the Xorshift and Xoroshiro base generators: 2^128 - 1.
*Conclusion*
To my knowledge these are the fastest periodic RNGs that pass PractRand
and BigCrush on x86_64. PCG could be a better choice on x86.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#60 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AACCtGcLgRr9sjLyrLVc7PI59tzr8F23ks5s7X1SgaJpZM4Qpoy4>
.
|
Evaluation of KISS generatorsThe KISS generators are a family of RNGs by George Marsaglia where he combines several small RNGs to create one statistically good RNG. The 32 bit variant was originally designed in 1993, and it has been improved in 1999. It combines two Multiply-With-Carry generators, a LCG and an Xorshift RNG. In 2009 he designed a similar 64-bit variant. They are reported to pass BigCrush, and I believe they pass PractRand also. License is public domain. Speed:
x86:
Unsurprisingly combining 3 RNGs into one is does not create something very fast, although it performs better than I expected. LLVM is not very good at reordering steps from the 3 independent RNG's to take advantage of instruction level parallelism, I expect performance can be improved by maybe 10% yet. Memory, period and seed The 64-bit variant combines three generators, and requires 256 bits of state. The MWC generator has a period of 2^121 + 2^63 - 1. The Xorshift component has a period of 2^64 - 1, and the LCG of 2^64. This gives a combined period of about 2^247. The 32-bit variant takes 128 bits as seed, the 64-bit variant 250 (6 bits of the MWC carry-value are ignored). Conclusion |
Edit: Removed this comment because it was completely wrong (If it sounds to good to be true, it probably is 😄) |
Evaluation of Choatic Iterations PRNGTwo ways to improve the statistical quality of simple RNG's is to add an output function, or to combine different types of RNG's together. A relatively new alternative is chaotic iterations. This technique uses 3 bits of one RNG to pick 1 to 4 iterations of another RNG. The authors published many papers about this technique, and it is is now at version 3. In theory this technique can be used with any kind of small RNG. One of the papers on version 2 evaluated the results of using different base generators. While their statistical properties improved, it takes a large range of iterations to even pass DieHARD. The variant in the paper describing version 3 uses Xorshift as base generator, and Blum Blum Shub to pick the chaotic iterations. This seems to pass PractRand (so far), and the authors even claim some cryptographic security. That may be a bit optimistic though. Speed and memory:
x86:
To improve performance I have replaced the three branches with bit masking tricks (improves it 3x). The performance on x86 is the same as on x86_64, which I have not seen before. The memory used for this version if 196 bits. Period and initialization The seed to initialize both Xorshifts in this variant and the Blum Blum Shub generator may not be 0. A seed of 192 bits is enough for worry-free initialization. Conclusion |
I am also curious why xoroshiro* is used. Melissa O'Neil noted that merely taking |
It is just a variant I explored. Good link though, thank you. In another blog post she commented on the structure that becomes visible in Xoroshiro+. The author of PractRand also commented on the linearity of Xoroshiro+ output. So while it passes PractRand and TestU01, that is only because the state space it too large for the test to see the pattern. With a multiply and truncation just about all structure is gone. |
It is not clear how the pattern in low dimensions generalizes to higher
dimensions. I don't buy this "state space" argument.
…On Fri, Dec 8, 2017, 01:49 Paul Dicker ***@***.***> wrote:
It is just a variant I explored. Good link though, thank you. In another
blog post she commented on the structure that becomes visible in
Xoroshiro+
<http://www.pcg-random.org/posts/visualizing-the-heart-of-some-prngs.html#xoroshiro-with-additive-output-function>.
The author of PractRand also commented on the linearity of Xoroshiro+
output.
So while it passes PractRand and TestU01, that is only because the state
space it to large for the test to see the pattern. With a multiply and
truncation just about all structure is gone.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#60 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AACCtF8nq2D3v1giN_NmODRO1gENvNllks5s-Eg_gaJpZM4Qpoy4>
.
|
@vks I like it for it's speed, and Xoroshiro (without the +) can serve as a good and somewhat novel base algorithm. The PCG paper expressed nicely that a lot of this stuff is just engineering trade-offs between quality and performance. The last couple of decennia there has been a huge push for better quality, and Xoroshiro+ goes a bit against the flow here and aims for performance above quality. Note Vigna has published a paper about the much better Xorshift* first. Why would we not choose something different if the performance is somewhat comparable, and the quality is better? And if quality is not an issue, where do you draw the line, why not use an even faster RNG? |
I agree with @pitdicker, we should pick a fast generator, but one with very good statistical quality. However, we can also make other generators like Xoroshiro+ available in a companion crate. I also agree with @vks that the PCG author is making a jump claiming that the cut-down versions of generators are representative of larger versions, but lacking better analysis tools seems worth some attention. Actually, it should be possible to do something similar with 32-bit generators. Most have too long periods to fully explore, but if took 2^32 samples (feasible based on the performance numbers), choose some fixed 16-bit number and disregarded all samples whose high 16 bits does not match this fixed number, we could perform similar analysis with the low 16-bits of the remaining numbers. (We could also do the same for the high 16 bits.) Okay, that's quite a bit of work and I'm not convinced we need to do it, especially since all we are trying to do here is choose a default fast generator which can be changed any time (and may well not be the same on all architectures either). |
Where to go from here?Definitely not all RNG's ever thought up are tested here. There are just to many. I think we have a good overview of the landscape now though, and don't expect to find something much better lurking somewhere. Yet, including an LCG with a modulus that is not a power of two (which can still be efficient) and RANROT (although it is not considered relevant any more by its author) would be interesting. We ended up with several families of RNG's that have excellent quality, and that are based on different principles:
The performance of the last two are much worse than the others, so we can cross them off. I have not included other choatic RNG's in this list because SFC is much faster and offers better quarantees. Small Fast Counting RNGOn x86_64 this is the fastest good quality RNG. There are two variants, where the 32-bit version is the better choice for everything except generating SFC has two negative points:
This means SFC does not at the moment pass the 'Reasonably good theoretical basis' requirement. I think it is a good idea to contact the author and see if he want to provide some background. The other negative point, that it is a choatic RNG, can be both an advantage and a disadvantage. I can see situations where people would not be confident in it, and would prefer a periodic RNG. So SFC is great as an alternative RNG, but I would choose a conservative periodic RNG as default. PCGThis is probably the RNG most liked by people in the Rust community, it is mentioned often in the discussions around The large number of possible variants and choices can be overwhelming, it is possible to pick a base RNG, permutation function, streams and extension mechanism. But we can narrow it down to two best choices:
The performance of PCG-XSL-RR 128/64 (MCG) on x86 is currently very bad. This has to do with doing 128-bit multiplications on a 32-bit architecture. What makes it worse is that the multiplication function at the moment does not get inlined, approximately halving the performance. Xoroshiro with multiply and truncationThis is the fastest good-quality periodic RNG in this comparison. The big disadvantage is that it is put together by me. It combines well-known techniques: a Xoroshiro base generator by Vigna, multiply as permutation function as noted in "Numerical Recipes" for Xorshift*, truncation as recommended by O'Neil, and widening as a simple improvement step by me. The problem is the choice of multiplier. For that I used a list of multipliers that perform well on spectral tests by L'Ecuyer. To generate 32-bit results the best multiplier is the one for m=2^48 modulus 2^32, because the truncation removes the last 16 bits. Similar for 64-bit results the best multiplier is the one for m=2^96 modulus 2^64. A possible area of improvement could be to search for a good multiplier ourselves. There probably exist multipliers for just the 64 bits we use that are better than the best multiplier for 96 bits. With inlining the widening multiply on x86, and also using widening multiply for Comparative benchmarksBenchmarks from my pc, and again with the current RNG in rand (Xorshift128/32) as a baseline:
x86:
Interesting is that nothing can match the performance of generating 32-bit integers with Xorshift128/32 on x86. I think this is a necessary consequence of the thorough mixing of the state with every round that the RNG's with good statistical quality do. Xorshift128/32 only updates 1/4 of the state every round. Picking a defaultPCG has the big advantage that is of good quality, with good presentation, and liked by the community. A disadvantage is that even with the choice narrowed down to just two variants, there is still a choice to make. Xoroshiro with multiply and truncation outperforms PCG-XSH-RR 64/32 (LCG) by 7~13%, and for PCG also has a feature XoroshiroMT can not have: relatively fast seeking through the period. The choice between XoroshiroMT and PCG is difficult. And I don't want to suggest XoroshiroMT because I had a hand in it. But because their quality is similar, XoroshiroMT is always faster, and for PCG you have to pick between two variants for the best choice, XoroshiroMT seems the best default. If it was up to me I would include all three RNG's discussed here (5 variants) in @dhardy, what do you think? |
@pitdicker I think you've done some relatively thorough and quite careful research here; it would seem good material for a publication (or even two: analysis of existing generators and the design of Xoroshiro MT). Nevertheless PRNGs are a tricky subject and it may be worth getting in contact with others who've been working in the field. On choosing one as a default, in a way it doesn't matter too much since we reserve the right to change it (though should probably require at least bumping the minor version number in this case). So far Xoroshiro MT does sound very good all round; I'm not able to say any more than you are about its quality. If unsure we can start with a conservative choice. It may be however that a companion crate with a selection of PRNGs is available before long anyway, making it easy for users to choose themselves. It would also seem worth assembling a set of generators in an external crate (assuming licences are compatible) along with benchmark code, and then ask people to benchmark on several different platforms. @pitdicker would you be happy to start such a crate? |
|
There is something wrong with benchmarks on the latest Rust nightlies if you want to run the benchmarks. It adds a couple of extra branches for some reason, halving the results. I will have to open a bug report. |
@pitdicker
I'm a bit worried about some claims about xoroshiro (orginating from
O'Neill) that are repeated here. I had a private discussion with Vigna
about them and agree with him that they are questionable. In the interest
of an evenhanded discussion, I tried to represent this point of view.
The only statistical weakness of xoroshiro128+ I consider established are
the linear dependencies in the lowest two bits. Are you referring to
something else when talking about xoroshiro's "lower statistical quality"?
I don't think the linear dependencies are relevant in practice, since the
commonly used RNGs have them in every bit. Your truncation gets rid of them
at the cost of some performance. I think this is fair, if you really want
to get rid of them for some reason. You are choosing xoroshiro* over the
faster xoroshiro+ presumably based on the claims generalizing low to high
dimensions, which I think is questionable.
Personally, I draw the line for statistical quality at "not worse than the
Mersenne Twister", because I'm not aware of real-life usecases where its
quality was not good enough. (In theory, generating binary vectors might be
a problem, but I don't know of any real-world example.) The secondary
criterion for me is performance. This is why I favor xoroshiro128* over the
others.
…On Thu, Dec 7, 2017, 22:06 Paul Dicker ***@***.***> wrote:
@vks <https://github.com/vks>
What is it in Xoroshiro+ that makes you want to defend it so much? There
is a very large body of work around RNG's, and Xoroshiro+ is certainly not
the only one. Have you compared it objectively against others like I am
trying to do here?
I like it for it's speed, and Xoroshiro (without the +) can serve as a
good and somewhat novel base algorithm.
The PCG paper expressed nicely that a lot of this stuff is just
engineering trade-offs between quality and performance. The last couple of
decennia there has been a huge push for better quality, and Xoroshiro+ goes
a bit against the flow here and aims for performance above quality. Note
Vigna has published a paper about the much better Xorshift* first.
Why would we not choose something different if the performance is somewhat
comparable, and the quality is better? And if quality is not an issue,
where do you draw the line, why not use an even faster RNG?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#60 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AACCtOAO3jf9NqFQOPfpiqsu4ttynczvks5s-FNOgaJpZM4Qpoy4>
.
|
The problem is that I agree with the claims 😄. But let's not discuss it again. In #58 we are discussing to make a new I think four categories would be good, with an example in #58 (comment). What I named low(er) quality but very fast or very small should be worded a little better. Basically the idea is the RNG may be excellent for many uses, but we should spell them out together with a little background. Xoroshiro128+ and Mersenne Twister can for example be fine for generating floating point numbers, and I am sure you can name more situations they are excellent for. I just would not recommend it as default, as, well, who knows how the standard PRNG from If I and/or @dhardy extract the PRNG's from |
There has been a new PRNG (xoshiro) by Vigna and Blackman. They suggest xoshiro256** for all purposes. It gets rid of the linear dependencies of xoroshiro and has a period suitable for all parallel applications. For floating point numbers, they suggest xoshiro256+, which is faster but has linear dependencies in the 3 lowest bits, which can be ignored when generating floating-point numbers. The also suggest analogous 32-bit generators. |
We should also consider large MCGs and LCGs, at least on 64-bit platforms: http://www.pcg-random.org/posts/does-it-beat-the-minimal-standard.html |
Wow. That's a long and thorough discussion (just got a reference to this thread from the current interaction on the library). Let me just make some easy points:
|
This are my personal requirements for picking a default small PRNG. It seems like we should have some set of requirements to apply to #52, beyond benchmarks.
@dhardy, @vks: What do you think?
A user should not need to worry about:
This translates to the following requirements, in RNG terms:
Other qualities to look for are:
Speed and memory
This seems like a clear requirement. What differentiates small fast RNGs from cryptographic ones, is that they are very fast and / or use much less memory. Between a dozen and up to twenty instructions seem to be the norm. Also a state of about 16 bytes is usually enough, and using more memory is wasting.
Not consuming to many results
What should be the minimum amount of results that can be consumed?
On today's hardware, a fast RNG can generate about 2^32 results in just 4 seconds. In most cases it is not reasonable to expect a process to do little else than generate random numbers, at least not without also requiring them to be cryptographically secure. With the exception of simulations maybe. 2^47 results take about a day to be generated. It seems anything that can generate a usable range close to 2^50 seems good enough.
Independently initialized RNGs having a chance of producing similar streams
Many RNGs have only one period. When you pick some random initial state, there is a chance to end up in the same range as the results of another generator. Lets take a period of 2^128, and 2^48 as an upper limit for the expected amount of used results. It then takes about 2^32 random initializations to have a chance of 1 in a million to repeat part of an already used stream. See this comment, although some numbers are off by a factor 1000. So a period of 128 bits seems like a minimum requirement.
An alternative is RNGs with a small period and support for multiple streams, like PCG (when using LCG as a base generator) and SplitMix offer. A period of only 2^64 is just about enough to have a chance of 1 in 2000 for one new RNG to end up within the stream of one other RNG. If the RNG has 2^63 possible streams like PCG, 2^27 initializations are possible before the stream has a change of 1 in 1000 of being the same. Conclusion: support for ~2^64 streams is just enough to make a period of ~2^64 good enough for worry-free initialization.
All the above relies on the RNG taking a seed that has as many bits as the log2 of the period, 128. If the RNG can only take less bits, initialization may still be worry-free if the RNG has a proven method to map that seed to a state that can not be in the usable range of another seed. An example could be initializing PCG with a fixed state, and a random u64 for the stream. With just a u64 it is possible to guarantee 2^22 initializations and still having a change of only 1 in a million of ending up in the same stream. I am not aware of any RNG that offers this, besides it being quite pointless.
For small chaotic RNGs it is not really possible to talk about periods. The sizes calculated above for periods should map to state sizes if the RNG is perfect. So the state of chaotic RNGs should be at least 128 bits. I am not yet clear on what guarantees are available for this category of RNGs. Especially how cycles effect the birthday problem when initializing.
Whether statistical biases it contains may or may not matter
The standard way to evaluate the statistical quality of RNGs is through test suites, with PractRand and the BigCrush test from TestU01 being the current state of the art.
Many small chaotic RNGs pass without problems, and are quite fast.
For RNG's with a fixed period the recipe seems to be: take a fast, low quality RNG as a base generator, and apply a permutation function. Only the simple base generators like Xorshift, LCG or a Wehl sequence fail quickly.
Whether a statistical biases of an RNG matter, depends on how the generated values get used. For many uses a little bias of patterns does not matter. On the other hand, when it does matter but someone does not realize so, it is very hard to notice. The values appear random after all.
RNGs that have a clear bill from BigCrush and PractRand can be used without worry. At least for all situations where there is no adversary that tries to predict the RNG, use cryptographic RNGs in such a case.
Not being trivially predictable
On the PCG blog a post about trivial predictability is worth a read. For some algorithms cryptographically secure PRNG are unnecessary heavy, while some difficulty predicting the next results is enough to deter adversaries. One example is randomized quicksort. Not being trivially predictable can be seen as a plus, but not a requirement.
Reproducable results also on architectures with a different endianness
Even more then cryptographic RNGs, simple RNGs are used in many situations that need reproducible results. Then endianness should not be a worry.
Reasonably good theoretical basis
If the RNG has at least some theoretical background, requirements like the period size or cycle length can be proven. I would say we need those guarantees for the RNG do be made a default in Rand. While a published paper is nice, even comments in the source code with math that makes sense seem good enough for me.
License that can be used in Rand.
Our licenses are MIT and Apache 2.0. For RNGs that means they should have compatible licenses, like MIT or public domain. Anything higher like BSD or GPL would need permission from the author.
The text was updated successfully, but these errors were encountered: