Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible alternative numer-theoretic shuffling algorithm #323

Closed
vbuterin opened this issue Dec 14, 2018 · 13 comments
Closed

Possible alternative numer-theoretic shuffling algorithm #323

vbuterin opened this issue Dec 14, 2018 · 13 comments
Labels
general:RFC Request for Comments

Comments

@vbuterin
Copy link
Contributor

vbuterin commented Dec 14, 2018

Motivation

Construct a shuffling algorithm where you can compute the value in the shuffled list at any specific position relatively cheaply without computing all of the other values at the same time. This could be used to replace the current shuffle.

Helpers

def is_prime(x):
    return [i for i in range(2, int(x**0.5)+1) if x%i == 0] == []

Algorithm

def value_at_position(n, value, seed):
    # We do the shuffling mod p, the lowest prime >= n, but if we actually shuffle into
    # the "forbidden" [n...p-1] slice we just reshuffle until we get out of that slice
    p = n 
    while not is_prime(p):
        p += 1 
    # x -> x**power is a permutation mod p
    power = 3 
    while (p-1) % power == 0 or not is_prime(power):
        power += 2 
    for round in range(40):
        a = int.from_bytes(seed[(round % 8)*4: (round % 8)*4 + 4], 'big')
        value = (pow(value, power, p) + a) % p
        while value >= n:
            value = (pow(value, power, p) + a) % p
        # Update the seed if needed
        if round % 8 == 0:
            seed = hash(seed)
    return value

def shuffle(values, seed):
    return [values[value_at_position(len(values), i, seed)] for i in range(len(values))]

Note that the above is a maximally simple definition, not an optimal implementation. An optimal implementation would calculate the prime, the exponent, and the a values once, and pass them as inputs to value_at_position, which would then need to do much less work per value.

The main weaknesses are: (i) the computation of the total shuffle is slower even with optimizations, and (ii) the security argument is more heuristic ("it repeats x -> x**3 + k[i] so it's like MIMC") than the current Fisher-Yates shuffle. However, the strengths may be worth it, especially if we wish to cease storing the shuffling in the state.

@vbuterin vbuterin changed the title Possible alternative shuffling algorithm Possible alternative numer-theoretic shuffling algorithm Dec 14, 2018
@mratsim
Copy link
Contributor

mratsim commented Dec 14, 2018

What are the security properties that are needed?

Quick analysis of the bottlenecks

The main bottleneck in the current Fisher-Yates shuffling is the repeated modulo in the loop as it's an expensive operation. In the new proposed shuffling there are gcd, pow and modulo.

In my opinion the main advantage of Fisher-Yates is the ability to do it in-place, however I suppose to be able to rollback state, most clients will not implement it in-place but in a temporary location instead.

Alternative

In case we need a temporary location, I think reservoir sampling will be superior in terms of performance while preserving equi-probabilities of all sequences.

The algorithm is very simple and doesn't involve expensive operation, from Wikipedia (array indexing starts at one)

ReservoirSample(S[1..n], R[1..k])
  // fill the reservoir array
  for i = 1 to k
      R[i] := S[i]

  // replace elements with gradually decreasing probability
  for i = k+1 to n
    j := random(1, i)   // important: inclusive range
    if j <= k
        R[j] := S[i]

We only need to couple it with a prng. For example XorShift128+ is used by all major browsers (and could be a eWASM primitive):

#include <stdint.h>

/* The state must be seeded so that it is not all zero */
uint64_t s[2];

uint64_t xorshift128plus(void) {
	uint64_t x = s[0];
	uint64_t const y = s[1];
	s[0] = y;
	x ^= x << 23; // a
	s[1] = x ^ y ^ (x >> 17) ^ (y >> 26); // b, c
	return s[1] + y;
}

Or the latest Xoroshiro128+ by the same author, which is significantly faster and with much better statistical quality.

Implementation in Nim (disclaimer this is the default Nim PRNG):

type
  Rand* = object ## State of the random number generator.
                 ## The procs that use the default state
                 ## are **not** thread-safe!
    a0, a1: ui

var state = Rand(  # <---- We can use our own seed instead
  a0: 0x69B4C98CB8530805u64,
  a1: 0xFED1DD3004688D67CAu64)

proc rotl(x, k: ui): ui =
  result = (x shl k) or (x shr (ui(64) - k))

proc next*(r: var Rand): uint64 =
  ## Uses the state to compute a new ``uint64`` random number.
  let s0 = r.a0
  var s1 = r.a1
  result = s0 + s1
  s1 = s1 xor s0
  r.a0 = rotl(s0, 55) xor s1 xor (s1 shl 14) # a, b
  r.a1 = rotl(s1, 36) # c

Advantage

This is much more hardware friendly.

Drawback

Like the current Fisher-Yates, you need to keep a state.

Papers

@mratsim
Copy link
Contributor

mratsim commented Dec 15, 2018

I've been thinking over this and unfortunately reservoir sampling wouldn't work as is.

We need a reservoir that hold the final selection, so it must be the size of the list to shuffle, i.e. we would start with the list and then we remove (~sample) elements randomly from it, so basically it just degrade into random sampling without replacement. However when doing random sampling the data structure used is quite important and a list/vector is not the most efficient one (in my benchmarks a Fenwick tree is very efficient) but if we need to reorganize into a new data structure we might as well do copy+Fisher Yates.

An alternative scheme would be to have a smaller reservoir (say 8 items) and add items ejected from the reservoir to the final list. Unfortunately even if we start from a random point in the original list, the order would be correlated to the original order.

So in short I think reservoir sampling is not suitable.

@vbuterin
Copy link
Contributor Author

The main desired property is being maximally close to a random permutation. Specifically, it should be maximally computationally difficult to find seeds that ensure that in a shuffling of 1...N, any specific subset x1, x2, x3.... x[k] are in a specific range [v ..... v+k].

The reason why the number-theoretic shuffle above is interesting is that you don't need to compute the whole output to compute a small portion of the output. This makes it much easier to use in light clients, and to verify blocks generally.

@vbuterin
Copy link
Contributor Author

I just optimized the code a bit; it can shuffle 100000 values in one second (vs 1 million in one second for the Fisher-Yates shuffle). That said, the hash complexity is smaller, so I suspect in an optimized C implementation the number-theoretic shuffle will not be much smaller than out current Fisher-Yates shuffle, as in python the hashes are optimized but the rest of the code is not, leading to hashing costs being understated.

@hwwhww hwwhww added the general:RFC Request for Comments label Dec 15, 2018
@vbuterin
Copy link
Contributor Author

vbuterin commented Dec 18, 2018

Here is a hash-based alternative. Here's the core:

def numhash(x, i, seed, modulus):
  assert 0 <= i < 4
  return (int.from_bytes(hash(x.to_bytes(32, 'big') + seed), 'big') // modulus**i) % modulus

def feistel(x, modulus, seed):
   assert is_perfect_square(modulus) and modulus < 2**50
   h = int(modulus ** 0.5)
   L, R = x//h, x%h
   for i in range(4):
    new_R = (L + numhash(R, i, seed, h)) % h
    L = R
    R = new_R
   return L * h + R

Now for the actual function, let modulus be the smallest perfect square greater than or equal to n. To permute some value, set value = feistel(value, modulus, seed) and if needed repeat until the result is less than n, just as above. This takes 4 hashes to compute for a single number, but to permute an entire list you only need ~sqrt(n) hashes.

@vbuterin
Copy link
Contributor Author

vbuterin commented Dec 18, 2018

I implemented both options and the status quo here: https://github.com/ethereum/research/tree/master/shuffling

Here's the timing test output, testing both computing a full shuffle of 100000 and computing just a specific sub-committee of 500 out of the 100000:

Testing prime shuffle
[40388, 24854, 44555, 69180, 37292, 27818, 85124, 51675, 75163, 16592]
[40388, 24854, 44555, 69180, 37292, 27818, 85124, 51675, 75163, 16592]
Total runtime:  1.0141525268554688
Runtime to compute committee:  0.022953271865844727

Testing feistel shuffle
[82855, 3100, 89704, 87662, 7830, 16014, 57626, 95313, 53632, 97853]
[82855, 3100, 89704, 87662, 7830, 16014, 57626, 95313, 53632, 97853]
Total runtime:  0.10488557815551758
Runtime to compute committee:  0.00408935546875

Testing Fisher-Yates shuffle (status quo)
[93044, 39644, 88989, 78137, 17662, 17187, 41433, 85069, 64061, 6647]
Total runtime:  0.08116364479064941

@mratsim
Copy link
Contributor

mratsim commented Dec 19, 2018

Feistel shuffling is impressively fast.

I've transformed the current state-of-the-art sampling technique from natural language processing into a shuffling algorithm in ethereum/research#98.

Originally it was for weighted sampling hence why it is rearranging input to efficiently check cumulative probability distributions.

There is an initialisation cost but further sampling are 6.5x cheaper.

Testing prime shuffle
[40388, 24854, 44555, 69180, 37292, 27818, 85124, 51675, 75163, 16592]
[40388, 24854, 44555, 69180, 37292, 27818, 85124, 51675, 75163, 16592]
Total runtime:  1.651440143585205
Runtime to compute committee:  0.02932882308959961


Testing feistel shuffle
[82855, 3100, 89704, 87662, 7830, 16014, 57626, 95313, 53632, 97853]
[82855, 3100, 89704, 87662, 7830, 16014, 57626, 95313, 53632, 97853]
Total runtime:  0.15117597579956055
Runtime to compute committee:  0.005218982696533203


Testing Fisher-Yates shuffle
[93044, 39644, 88989, 78137, 17662, 17187, 41433, 85069, 64061, 6647]
Total runtime:  0.1256880760192871


Testing F+tree sampling
[93044, 39644, 88989, 78137, 17662, 17187, 41433, 85069, 64061, 6647]
[50405, 51879, 54080, 30514, 76290, 60097, 83697, 19454, 68194, 85273]
[75628, 50172, 45023, 21349, 80036, 81698, 2829, 994, 88511, 20356]
Total runtime:  1.5932817459106445
Runtime to compute first committee:  0.053829193115234375
Runtime to compute next committee:  0.008832931518554688

@JustinDrake
Copy link
Contributor

Benedikt Bunz points to this paper as a potential candidate.

@drozdziak1
Copy link
Contributor

Is it certain that the present shuffling approach is going to be ditched?

@JustinDrake
Copy link
Contributor

@drozdziak1 I'd say 80% :) Light-client friendly shuffles seem like a big win, and they are possible. It's now an engineering/cryptography question to find a really efficient one, maybe even a SNARK/STARK-friendly one.

@esaulpaugh
Copy link

esaulpaugh commented Jan 31, 2019

It sounds like a CSPRNG is a requirement. Anything less just seems sketchy when deliberate attacks are expected. As they say, attacks only get better

@th4s
Copy link

th4s commented Feb 1, 2019

@vbuterin have you run some random number tests (e.g. dieharder) against the primenumber implementation or is this algorithm some well-known CSPRNG?

@djrtwo
Copy link
Contributor

djrtwo commented Feb 12, 2019

closed by adding "swap or not" #563

@djrtwo djrtwo closed this as completed Feb 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
general:RFC Request for Comments
Projects
None yet
Development

No branches or pull requests

8 participants