Truncate hash output rather than modulo p #142

Open
HarryR opened this issue Jul 26, 2019 · 2 comments
Labels: help wanted, question

Comments


HarryR commented Jul 26, 2019

As per: https://crypto.stackexchange.com/questions/21006/security-concern-about-reducing-hash-value-using-modulo-operation/21010#21010

I know @sanchopansa asked this question privately, and I remember responding at the time that I was unable to find bias in the outputs for a range of small primes (where a brute-force approach was possible).

Also referencing: Loopring/protocol3-circuits#7

The goal is to identify where a hash output is computed modulo N rather than truncated to a number of bits that fits safely within the field. I would have thought that, comparing modulo reduction against truncation, if the source number is at least 2 bits larger than the result then you would get an effectively uniform distribution rather than a noticeably biased one.
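
To make the distinction concrete, here is a minimal Python sketch of the two patterns; p here is a small illustrative prime standing in for the field modulus (which in this setting is assumed to be close in size to the 256-bit digest, so there are only a couple of bits of headroom):

from hashlib import sha256
from math import floor, log2

p = 61  # illustrative prime only; the real modulus is close in size to the digest
safebits = floor(log2(p))          # number of bits that always fit strictly below p
safe_mask = (1 << safebits) - 1

digest = sha256(b'example input').digest()
h = int.from_bytes(digest, 'little')

reduced = h % p            # the pattern being audited: interpret the digest modulo p
truncated = h & safe_mask  # the alternative: truncate to floor(log2(p)) bits
print(reduced, truncated)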

The following reference is useful: https://ericlippert.com/2013/12/16/how-much-bias-is-introduced-by-the-remainder-technique/ - however, that analysis is for a power-of-two range rather than a prime modulus.

The libff::Fp_model::random_element() function sets the highest bits to zero until it's below the modulus, which is equivalent to masking the least significant FieldT::capacity() bits.

Locations:

I can't find any other locations where something is interpreted as mod p rather than truncating to floor(log2(p)) bits.

There is one location where the output of a hash is truncated:

HarryR added the help wanted and question labels on Jul 27, 2019

HarryR commented Jul 27, 2019

Have done some research into this.

With 2*ceil(log2(p)) bits versus n*ceil(log2(p)) bits where n > 2, you get diminishing returns on how uniform the distribution is. Say the prime is b bits and you obtain 2*b random bits then reduce modulo p; the resulting distribution is roughly as uniform as the one you would get from 10*b or 100*b random bits.
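
As a sanity check on this, the per-residue probabilities can be computed exactly rather than sampled: a uniform k-bit value reduced modulo p hits each residue either floor(2**k/p) or ceil(2**k/p) times out of 2**k, so the worst-case relative deviation from uniform is roughly p/2**k. A small sketch (the prime is illustrative, not one of those used in the experiments below):

from fractions import Fraction
from math import ceil, log2

def worst_case_bias(p, k):
    # exact worst-case relative deviation from 1/p when a uniform
    # k-bit value is reduced modulo p
    total = 1 << k
    lo = total // p
    hi = lo + (1 if total % p else 0)
    ideal = Fraction(total, p)
    return float(max(hi - ideal, ideal - lo) / ideal)

p = 521                      # illustrative prime; b = ceil(log2(p)) = 10
b = ceil(log2(p))
for k in (b, b + 1, b + 2, 2 * b, 10 * b):
    print(k, worst_case_bias(p, k))

For k = b, b+1 and b+2 the deviation is on the order of tens of percent, while at k = 2*b it is already down around p/2**(2*b), and extra bits beyond that only shave off an already negligible bias.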

If we look at this as a pigeonhole-principle problem, drawing p**3 random samples:

The ideal mean and median will be p**2, the ideal population standard deviation will be p, and the ideal population variance will be p**2.

When truncating the input to k=floor(log2(p)) bits, the population standard deviation and variance are close to ideal, but the mean and median are larger, because only 2**k of the p possible values can ever be hit, so the mean per occupied value is roughly p**3 / 2**k rather than p**2 (for example, with p = 11 and k = 3, only residues 0..7 occur, and each is hit about 11/8 times as often as the ideal).

Thus there are two patterns which are safe:

  1. Truncating random input to k=floor(log2(p)) bits, where you get k bits of space.
  2. Taking k=2*ceil(log2(p)) random bits and reducing the result modulo p (both patterns are sketched below).
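
A rough sketch of what the two safe patterns could look like; the function names and the use of SHA-256 as the entropy source are illustrative choices here, not the project's existing API:

from hashlib import sha256
from math import ceil, floor, log2

def sample_by_truncation(seed, p):
    # Pattern 1: keep only floor(log2(p)) bits, which always fit below p
    k = floor(log2(p))
    h = int.from_bytes(sha256(seed).digest(), 'little')
    return h & ((1 << k) - 1)

def sample_by_wide_reduction(seed, p):
    # Pattern 2: take k = 2*ceil(log2(p)) random bits, then reduce modulo p
    k = 2 * ceil(log2(p))
    buf = b''
    counter = 0
    while 8 * len(buf) < k:
        # expand the seed with a counter until at least k bits are available
        buf += sha256(seed + counter.to_bytes(4, 'little')).digest()
        counter += 1
    h = int.from_bytes(buf, 'little') & ((1 << k) - 1)
    return h % p

print(sample_by_truncation(b'test', 61), sample_by_wide_reduction(b'test', 61))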

Using only one or two extra bits seems to introduce large biases.

This reinforces the notion that simply taking ceil(log2(p)) bits, even with a few extra bits, is generally not secure when reduced modulo p if the goal is a uniformly random distribution; either of the two techniques above should be used instead for better-quality randomness.

I am unsure of the real-world implications of the other approaches, so this change should definitely be adopted.

Example code used to derive statistics about the distributions:

from math import log2, ceil, floor
from collections import defaultdict
from hashlib import sha256
from os import urandom
from statistics import mean, median, pstdev, pvariance

# primes1.txt: whitespace-separated list of small primes, several per line
with open('primes1.txt') as handle:
    for lineno, line in enumerate(handle):
        for p in map(int, line.split()):
            if p == 2:
                continue

            # occurrence counts for each reduction/truncation strategy
            b32_occurs = defaultdict(int)   # full 256-bit sample, reduced mod p
            sm_occurs = defaultdict(int)    # truncated to floor(log2(p)) bits ("safe mask")
            mm_occurs = defaultdict(int)    # masked to ceil(log2(p)) bits, reduced mod p
            mm1_occurs = defaultdict(int)   # one extra bit, reduced mod p
            mm2_occurs = defaultdict(int)   # 2*ceil(log2(p)) bits, reduced mod p

            maxbits = ceil(log2(p))
            safebits = floor(log2(p))
            max_mask = (1 << maxbits) - 1
            max1_mask = (1 << (maxbits + 1)) - 1
            max2_mask = (1 << (maxbits * 2)) - 1
            safe_mask = (1 << safebits) - 1
            print(p, log2(p), maxbits, bin(max_mask), safe_mask)

            # simple hash-chain PRNG seeded from urandom
            hasher = sha256(urandom(32))
            p_cubed = p**3
            p_squared = p**2
            for _ in range(p_cubed):
                rand_sample_bytes = hasher.digest()
                rand_sample = int.from_bytes(rand_sample_bytes, 'little')
                safe_rand = rand_sample & safe_mask
                max_rand = rand_sample & max_mask
                max1_rand = rand_sample & max1_mask
                max2_rand = rand_sample & max2_mask

                b32_occurs[rand_sample % p] += 1
                sm_occurs[safe_rand % p] += 1    # already < p, so the mod is a no-op
                mm_occurs[max_rand % p] += 1
                mm1_occurs[max1_rand % p] += 1
                mm2_occurs[max2_rand % p] += 1

                hasher.update(rand_sample_bytes)

            # stats are normalised so each column is 1.0 in the ideal case:
            # ideal mean/median is p**2, ideal pstdev is p, ideal pvariance is p**2
            for n, v in [('b32', b32_occurs), ('sm', sm_occurs), ('mm', mm_occurs),
                         ('mm1', mm1_occurs), ('mm2', mm2_occurs)]:
                vals = v.values()
                print(n,
                      round(mean(vals) / p_squared, 3),
                      round(median(vals) / p_squared, 3),
                      round(p / pstdev(vals), 3),
                      round(pvariance(vals) / p_squared, 3))
            print()
        if lineno > 10:
            break


HarryR commented Jul 27, 2019

The approach used by libff is to clear all bits higher than the MSB of the modulus, then reject samples until the result is within the field. This has a potentially unbounded number of iterations, but in practice only a few are needed, although on average it may consume more entropy than the 2*ceil(log2(p)) bits used with approach 2).
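
For comparison, a rough Python sketch that mimics that described behaviour (an approximation for illustration, not libff's actual implementation); the expected number of attempts is 2**ceil(log2(p)) / p, which is always below 2:

from math import ceil, log2
from os import urandom

def rejection_sample(p):
    # mask a fresh sample down to ceil(log2(p)) bits, retry until it is below p
    nbits = ceil(log2(p))
    nbytes = (nbits + 7) // 8
    mask = (1 << nbits) - 1
    while True:
        candidate = int.from_bytes(urandom(nbytes), 'little') & mask
        if candidate < p:
            return candidate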
