Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cryptographically secure random number generator #352

Merged
merged 10 commits into from
Oct 25, 2013

Conversation

ilya-stromberg
Copy link
Contributor

Supports Windows and Posix

Tested on Windows and Linux, but should work also on OS X and FreeBSD.

{
private import std.stdio;

//ссылка на файловый поток
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not really a good idea to keep comments in russian in public project ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, forgot to delete this.

@s-ludwig
Copy link
Member

A few observations:

  • All modules must have at least the standard module doc comment containing license/copyright information and the brief description of the module (see other modules)
  • The comments used on each type/method are currently normal // style comments. These should be /** */ style ddoc comments instead, so that they will appear in the API documentation (see also the style guide section about doc comments)
  • Shouldn't /dev/random be the default instead of /dev/urandom, since this is explicitly meant for cryptographically secure applications?
  • This probably only matters for WinRT based use currently, but instead of using pragma(lib), "advapi32" should be added to the two "libs-windows" fields in package.json
  • Functionally, would it make any difference if front returned a single ubyte instead of an array? The advantage would be that no dynamic memory allocations would be necessary when requesting random numbers (apart from possibly a single buffer that is allocated in the constructor). This can be important in high-performance applications.
  • There are a few stylistic issues (WRT to the style guide), but I can fix those myself later on

Sorry for the nitpicking, but going towards a stable version, the quality bar needs to be set very high.

@ilya-stromberg
Copy link
Contributor Author

Shouldn't /dev/random be the default instead of /dev/urandom, since this is explicitly meant for cryptographically secure applications?

It runs too slow and blocks application. Just try to run unit tests a few times, I am sure, you'll see a 30-60 sec pause after 3-5 run. It isn't acceptable for responsive application. Use /dev/random only for long-term secure-critical purposes like SSL/SSH keys, random passwords or salts for ciphers.

Functionally, would it make any difference if front returned a single ubyte instead of an array? The advantage would be that no dynamic memory allocations would be necessary when requesting random numbers (apart from possibly a single buffer that is allocated in the constructor). This can be important in high-performance applications.

I allocate memory only once at the constructor. It lets to full the all buffer via only one system call. Yes, it's possible to return ubyte instead of an array, but it will require a lot of system calls, and context switching is expensive. So, any ideas?

Sorry for the nitpicking, but going towards a stable version, the quality bar needs to be set very high.

It's OK. I'll fix the code style.

@s-ludwig
Copy link
Member

It runs too slow and blocks application. Just try to run unit tests a few times, I am sure, you'll see a 30-60 sec pause after 3-5 run. It isn't acceptable for responsive application. Use /dev/random only for long-term secure-critical purposes like SSL/SSH keys, random passwords or salts for ciphers.

I see, so it seems like /dev/urandom is usually the same as /dev/random, just without enforcing minimum entropy requirements for the entropy pool, right? Then it probably indeed makes sense to use it by default. However, this brings up another issue, probably only important for /dev/random: Since the file access may block, it needs to be implemented in terms of a vibe.core.file.File instead of a std.stdio.File so that the event loop is not blocked.

I allocate memory only once at the constructor. It lets to full the all buffer via only one system call. Yes, it's possible to return ubyte instead of an array, but it will require a lot of system calls, and context switching is expensive. So, any ideas?

Sorry, I've overlooked that. The problem here is that it imposes the risk of accidentally using the wrong random value when someone stores the slice returned by front, so I'd rather avoid that. What about doing basically the same as now, but returning the reused buffer byte-by-byte instead and refilling the buffer once it is empty?

@ilya-stromberg
Copy link
Contributor Author

I see, so it seems like /dev/urandom is usually the same as /dev/random, just without enforcing minimum entropy requirements for the entropy pool, right?

Not exactly. As I know, /dev/random is hardware-based random number generator with high entropy level and minimum entropy requirements. Since in computer not so much random events it works slow.
/dev/urandom use some algorithm (like AES cipher with salt) to generate random numbers. It's secure because it's almost impossible to invert the algorithm. So, it's similar to the CryptGenRandom for Windows.

However, this brings up another issue, probably only important for /dev/random: Since the file access may block, it needs to be implemented in terms of a vibe.core.file.File instead of a std.stdio.File so that the event loop is not blocked.

I think we have no problems with /dev/urandom. Time to generate 100_000_000 bytes (100 bytes in the buffer, 1_000_000 iterations) is 6.041 sec (~60 nanoseconds per byte). I think it's fast enough, and non-blocking API can only delay generation.
About /dev/random: yes, we should use non-blocking API, but I didn't use vibe.core.file.File yet. Can we delay implementation of /dev/random API since we will not use it for vibe.http.session? I can remove it from code.

The problem here is that it imposes the risk of accidentally using the wrong random value when someone stores the slice returned by front, so I'd rather avoid that.

I document it:

//Return current buffer with random number
//Current buffer will be overwritten by the next call of popFront()
//For long-term storage copy array using .dup or.idup methods

Is it clear?

What about doing basically the same as now, but returning the reused buffer byte-by-byte instead and refilling the buffer once it is empty?

It's possible, but will work slower. The basic idea is: create new SystemRand with buffer size that you need and than use it. So, front returns reference to the buffer to avoid unnecessary copy. If you really need a long-term copy, do it yourself. Look at the HashRand - it use SystemRand buffer to calculate hash and store result in new buffer, so we transfer a lot of data without loops and avoid unnecessary copy.
Also, additional benchmark: time to generate 1000_000_000 bytes (1000 bytes in the buffer, 1_000_000 iterations) is 58.586 sec (~58.6 nanoseconds per byte). It's faster than previous one, so the main idea is correct: we should use buffers to avoid unnecessary loops.
Do you really want to use ubyte instead of an array?

@s-ludwig
Copy link
Member

The Wikipedia article about /dev/(u)random is a bit fuzzy, but it seemed to imply that both use the same pool of hardware random noise to seed their PRNG, which the difference that /dev/random will block until the noise pool fulfills certain entropy requirements.

Removing /dev/random for now would work, too, then.

It's possible, but will work slower.

In your benchmark, which compiler flags did you use and did you compare it against the ubyte version? I would guess that due to inlining it will be pretty efficient, especially since at some point the application has to copy the buffer anyway. For sure the impact will be an order of magnitude smaller than the RNG itself takes to compute the numbers. If that holds, I don't think it's a valid reason for having the interface adjusted in a certain way. The default should always be API safety, especially in a security critical place. If it is really needed, we could add an additional peek() function that returns a view to the current buffer contents.

@ilya-stromberg
Copy link
Contributor Author

Yes, /dev/random and /dev/urandom use the same hardware random noise. Also, /dev/urandom use PRNG to generate random numbers faster and hardware random noise to seed and re-seed the PRNG, but /dev/random use only hardware random noise.

In your benchmark, which compiler flags did you use and did you compare it against the ubyte version?

I did not compare it, just to buffers with different size. I use DMD without any flags (sorry, forgot about it), so optimized program should be faster.

The default should always be API safety, especially in a security critical place.

OK, convinced. Idea: we can add void fillBuffer(ubyte[] buffer) function. User should provide reference to the buffer, and we just fill it via simple copy. It looks like solves all our problems:

  • user can allocate buffer only once
  • user can allocate buffer via GC or via custom allocator
  • user can store data as long as they need
  • user can use same buffer many times
  • user can receive as much data as he need
  • user can use the same SystemRand for different data size
  • user avoid unnecessary loop to receive more than 1 ubyte

So, we should just remove empty, popFront() and front() functions and provide only fillBuffer function. Any ideas?

@ilya-stromberg
Copy link
Contributor Author

@s-ludwig: Do you agree to have void fillBuffer(ubyte[] buffer) function as public API?

@s-ludwig
Copy link
Member

Let's keep both possibilities. Having an input range interface can come in very handy and won't have a big performance overhead (at least in release mode) and the fillBuffer API can be used when it's more convenient/speed is an issue. I'd go for read as the name instead, though, to make it consistent with other places (e.g. InputStream).

@ilya-stromberg
Copy link
Contributor Author

I think it will be better to have byChunk(size) and byChunk(ubyte[] buffer) functions that returns input range interface. I lets to use different buffer size for same SystemRand.
Also, it was some problems with current input range API for Windows. When I use

foreach(rand; take(systemRand, iterationCount))
{
   //...
}

the function CryptGenRandom don't generate random numbers. It's secure because throws exception, but I really don't know why. All current unit tests works. Probably, it happens because take function copies systemRand variable. So, I'll try to implement byChunk functions and look at the Windows problem again.

@s-ludwig
Copy link
Member

The structs should either implement this(this) or have a @disable this(this); to handle the situation when an instance is copied. But possibly it's better to make them into final classes instead and let them derive from a RandomNumberGenerator interface, so that run-time polymorphism is possible, too.

But I don't get why you are against an InputRange!ubyte interface, though. Of course you can also make an optimized byChunk, but why not simply have a canonical base interface that is as flexible as possible, especially without knowing if there is any considerable performance impact? It would fit nicely with all range and algorithm functions, be safe and flexible, and can still be optimized if necessary. Or was that your idea and you just referred to the read/fillBuffer function?

@ilya-stromberg
Copy link
Contributor Author

OK, I'll see what can we do.

@ilya-stromberg
Copy link
Contributor Author

Yes, input range API works normally with final class for Windows.

I don't get why you are against an InputRange!ubyte interface, though.

It's possible, I agree. But in that case we can have security issue. We should store front value until popFront() call. But user can call popFront() too late or don't call it at all. In that case somebody else receive the same ubyte value. We can return different ubyte value after every front call, but it breaks current InputRange behavior. As you said, the default should always be API safety, especially in a security critical place. So, possible solutions are: do not implement InputRange!ubyte at all or implement byUbyte function that returns InputRange!ubyte interface. Ideas?

@mihails-strasuns
Copy link
Contributor

We should store front value until popFront() call. But user can call popFront() too late or don't call it at all. In that case somebody else receive the same ubyte value.

That does not make sense. Programmer is always supposed to use front and popFront in pair, anything else is crucial design error. It is not like anyone is usually calling those methods directly.

@ilya-stromberg
Copy link
Contributor Author

Programmer is always supposed to use front and popFront in pair, anything else is crucial design error. It is not like anyone is usually calling those methods directly.

Yes, I know. But it's still possible.
@s-ludwig, what do you think?

@mihails-strasuns
Copy link
Contributor

But it's still possible.

As well as dereferencing null.

@ilya-stromberg
Copy link
Contributor Author

I don't get why you are against an InputRange!ubyte interface, though.

Additional question: how will be use InputRange!ubyte?
We need a very long random number, much longer than ubyte. For example, RSA Laboratory crack DES cipher (56 bits key size) via brute force attack less than 3 days in 1998 year! That's because today we use only Triple DES (112 or 168 bits). AES cipher has 128, 192 or 256 bits key size. It's definitely more than ulong. It means that you need an array to store your random data. So, why will you use InputRange!ubyte if you have fillBuffer(ubyte[] buffer) function?

@ilya-stromberg
Copy link
Contributor Author

As well as dereferencing null.

But you'll see the error message.

@s-ludwig
Copy link
Member

Additional question: how will be use InputRange!ubyte?

You could use ubyte[32] mykey; rnd.take(32).copy(mykey[]); or auto mykey = rnd.take(32).array. But I would welcome fillBuffer, or ubyte[32] mykey; rnd.read(mykey); very much as an alternative interface.

Regarding the front/popFront issue, I'd say that this may be a flaw in the range design in general (one that is maybe impossible to avoid given the trade-offs involved). But since you usually operate on such a range using high level primitives, as @Dicebot already mentioned, it shouldn't be much of an issue in practice (but I agree that it is an issue in general). On the other hand I think the danger to accidentally store a returned buffer is very real (and even likely, see File.byLine issues on the D forums) and that is what I wanted to avoid.

Something like byUbyte would work of course, but it's also really just cosmetics and doesn't do anything to solve the actual issue. So the only choices are to either have a range interface along with the chance for misuse, or to not have it at all. But taking into account how ranges flourish in the D world, I would still go for the former.

@ilya-stromberg
Copy link
Contributor Author

Something like byUbyte would work of course, but it's also really just cosmetics and doesn't do anything to solve the actual issue.

Not exactly. byUbyte will return new input range that will store own front value. So, it will safer than one global input range because you can have a lot of absolutely different byUbyte ranges.

@s-ludwig
Copy link
Member

The same can be achieved by using separate RNG instances. It's just offset by one level.

@ilya-stromberg
Copy link
Contributor Author

Additional notice:
To avoid too many system calls we should use internal buffer for InputRange!ubyte. But in that case bad guy can use possible bugs in our program to access the internal buffer and compromise our random numbers. It's impossible with fillBuffer(ubyte[] buffer) if it doesn't use internal buffers because bad guy can't access to the OS kernel.
Do you agree with possible risk?

@mihails-strasuns
Copy link
Contributor

If someone has found an exploit to access internal program state remotely, we have much more serious issue than some compromised sessions.

It really seems like you are over-thinking this. Those who care about the security are perfectly aware of all risks and pitfalls. Those who don't won't be using this RNG directly at all and should never even know about its existence. Using power tools blindly and getting your leg shut off is expected behaviour.

@s-ludwig
Copy link
Member

I don't know, if someone has compromised the application in that way, it will most likely be possible to start much worse attacks than reading the cached random bytes. And for critical things I'd always use something like the hash based RNG with a high factor and the buffer should always be much smaller in comparison. But yes, in theory this cache imposes an additional risk.

But the most important step is to get away from the current implementation, which is using a buffer AND returns a view to it.

@ilya-stromberg
Copy link
Contributor Author

But the most important step is to get away from the current implementation, which is using a buffer AND returns a view to it.

Don't worry, path is ready.
I implemented fillBuffer(ubyte[] buffer) function, it supports both heap and stack arrays (see unit tests).

@s-ludwig, your example will be look like this:

ubyte[32] mykey;
rnd.fillBuffer(mykey);

Please tell me the function name that you want to have in API: read, InputStream, fillBuffer or something else?

@ilya-stromberg
Copy link
Contributor Author

If someone can found an exploit to access internal program state remotely, we have much more serious issue than some compromised sessions.

Probably. But if bug not a critical, bad guy can't run any code, but theoretically can do something else. I just put attention to this.

@ilya-stromberg
Copy link
Contributor Author

And for critical things I'd always use something like the hash based RNG with a high factor and the buffer should always be much smaller in comparison.

It doesn't give additional security. If I know init state and algorithm I can calculate result myself. So, hash based RNG secured only if nobody else knows init state.

@s-ludwig
Copy link
Member

Please tell me the function name that you want to have in API: read, InputStream, fillBuffer or something else?

I'd choose read for consistency reasons.

Thinking about InputStream, maybe we can sidestep the input range issue by staying with only read for now (deriving from InputStream) and at some point provide the range interface using a generic InputStream to input range adapter.

And for critical things I'd always use something like the hash based RNG with a high factor and the buffer should always be much smaller in comparison.
It doesn't give additional security. If I know init state and algorithm I can calculate result myself. So, hash based RNG secured only if nobody else knows init state.

But how would you know the init state? You just know a small part of the buffer used to calculate a single hash value + for all I've read about /dev/urandom it is not a simple PRNG with a fixed init state. If you could simply calculate the next random numbers by knowing a few, you could also calculate the next session ID after receiving one.

@ilya-stromberg
Copy link
Contributor Author

I'd choose read for consistency reasons.

Thinking about InputStream, maybe we can sidestep the input range issue by staying with only read for now (deriving from InputStream) and at some point provide the range interface using a generic InputStream to input range adapter.

OK, in that case I just rename method.

@mihails-strasuns
Copy link
Contributor

Actually, speaking about ranges, it may be the case where using OutputRange API makes much more sense.

@ilya-stromberg
Copy link
Contributor Author

Actually, speaking about ranges, it may be the case where using OutputRange API makes much more sense.

Do you have any usage example?

@ilya-stromberg
Copy link
Contributor Author

Method was renamed.

Please tell me if everything OK with API. In that case I'll fix the code style issues.

@mihails-strasuns
Copy link
Contributor

Do you have any usage example?

It is a matter of simply turning fillBuffer(ubyte[]) into fill(R)(R range, size_t count) if (isOutputRange!(R, ubyte))

(possibly long count = 1 by default if one wants to allow empty()-delimited output)

@s-ludwig
Copy link
Member

@ilya-stromberg: Looks good so far. In anticipation of a common base class/interface for multiple RNGs, I would maybe replace the "Rand" suffix with "RNG" or the full "RandomNumberGenerator" (but this is borderline in length).

BTW you have removed the hash based RNG. Didn't that work out with the new interface? Anyway, it's not important for the session stuff, so it can be added when needed.

@Dicebot: However, that is much less useful than an input range in terms of processing the data stream and would again require a temporary buffer.

@ilya-stromberg
Copy link
Contributor Author

Looks good so far. In anticipation of a common base class/interface for multiple RNGs, I would maybe replace the "Rand" suffix with "RNG" or the full "RandomNumberGenerator" (but this is borderline in length).

OK, I'll add "RNG" suffix.

BTW you have removed the hash based RNG. Didn't that work out with the new interface? Anyway, it's not important for the session stuff, so it can be added when needed.

I just didn't have time to port it to the new API. Also, I don't want remodel it few times. So, I'll try to implement it when we finish API design.

@ilya-stromberg
Copy link
Contributor Author

It is a matter of simply turning fillBuffer(ubyte[]) into fill(R)(R range, size_t count) if (isOutputRange!(R, ubyte))

Thank you for idea, I didn't think about it.
But yes, Phobos don't have enough support for Output Range.

@ilya-stromberg
Copy link
Contributor Author

The suffix was replaced. Do you have any additional wishes?

@s-ludwig
Copy link
Member

No, the API so far should be fine. Thank you for making the adjustments.

@ilya-stromberg
Copy link
Contributor Author

BTW you have removed the hash based RNG.

Added hash-based RNG.
@s-ludwig, does code and API fine?

@ilya-stromberg
Copy link
Contributor Author

@s-ludwig, @Dicebot: Is it OK?

@s-ludwig
Copy link
Member

AFAICS mainly just the DOC comments and the pragma is missing.

@ilya-stromberg
Copy link
Contributor Author

OK, will do.
Do you have any ideas for MixerRNG name? Is it clear?

@s-ludwig
Copy link
Member

Maybe HashMixerRNG to make it slightly more explicit, but it generally seems alright.

@ilya-stromberg
Copy link
Contributor Author

Thanks for idea.

@ilya-stromberg
Copy link
Contributor Author

Fix DDoc comments and pragma.

BTW, what's wrong with pragma and WinRT?

@s-ludwig
Copy link
Member

Thanks, looks good to merge. The problem with the pragma is that linking against "advapi" is not allowed for WinRT applications. So it would actually be necessary to write proper replacement code for that, but since the WinRT back end is still in its infancy, that can be safely postponed.

s-ludwig added a commit that referenced this pull request Oct 25, 2013
Add cryptographically secure random number generator
@s-ludwig s-ludwig merged commit 4a1c884 into vibe-d:master Oct 25, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants