Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

random_device: rdrand failed - Crashes host application #3151

Closed
benbucksch opened this issue May 10, 2021 · 13 comments
Closed

random_device: rdrand failed - Crashes host application #3151

benbucksch opened this issue May 10, 2021 · 13 comments

Comments

@benbucksch
Copy link

benbucksch commented May 10, 2021

When using an application that uses libsass, the application crashes (segfaults) with random_device: rdrand failed.

Reproduction

  1. Start an application that uses libsass, e.g. simply run hugo.

Actual results

terminate called after throwing an instance of 'std::runtime_error'
  what():  random_device: rdrand failed
Cancelled (Segfault)

Expected result

Hugo works

Version

This happens on the latest version of libsass1 in Ubuntu, currently libsass version 3.6.3. However, the problem also exists on current libsass master, because the relevant code has not changed.

Cause

  1. Some AMD CPUs seems to return a non-random number, but still claim success. See e.g. reports on Twitter.
  2. std:random_device throws an exception.
  3. libsass is unable to cope, throws the exception up into the caller.
  4. The calling application cannot possibly handle this error and fails.

The underlying root cause is that libsass is using cryptographically secure random numbers. Why? I don't see why CSS would need that. I would think that pseudo-random is sufficient.

Fix

Simply use pseudo-random numbers, from rand(), instead of

std::random_device rd;
return rd();

I've tested that, and that fixes the bug. PR will follow.

@mgreter
Copy link
Contributor

mgreter commented May 10, 2021

If I remember correctly LibSass only gets a cryptographically secure random number to seed the pseudo random number generator (PRNG) and from than on only uses PRNG. This is IMO best practice and I don't see LibSass doing anything not allowed by the C++11 APIs. So to me this clearly indicates some problem with the underlying library. Are you sure it also errors if you compile LibSass and the implementor on the same Machine? The error in node-sass kinda sounds to me like a linking issue, but that is really just a blatant assumption. Anyway, will try to research a bit on my own to see if we really need to abandon this way of initializing the PRNG. Btw. if you make a patch, please make it opt-in via a compile define or even better catch the error by the C++ call and then have a compile define to either error out or continue without true PRNG initialization.
Also rand function is exposed to sass code, so we don't really know what people do with it ;)

@mgreter
Copy link
Contributor

mgreter commented May 10, 2021

So after a short research I'm pretty sure this is the issue we're talking about (sorry it's in german):
https://www.heise.de/newsticker/meldung/Ryzen-3000-Bekannter-Fehler-bringt-viele-Linux-Distributionen-zu-Fall-4465220.html

It seems systemd fixed this issue via systemd/systemd@1c53d4a. Problem for LibSass is that the call to rdrand is done by C++. So best case is probably still to catch this error if possible and do the same workaround as systemd.

This is not something we should have to do, but well, reality. This is something for AMD, the firmware, the kernel or the C++ libs to fix/work-around. I'll try to find if any of them already include something in this regard in a recent version.

@benbucksch
Copy link
Author

benbucksch commented May 10, 2021

to me this clearly indicates some problem with the underlying library

random_device is throwing a runtime exception. This is correct and normal, if it cannot get random numbers.

libsass doesn't handle the error, and throws it up to the caller, instead of handling the error and doing a fallback. That's the bug. The caller has no way to handle this error, causing the segfault.

Random number sources can fail for any number of reasons, and are notoriously unreliable, so errors needs to be handled in any case, independent from the AMD bug.

I don't see LibSass doing anything not allowed by the C++11 APIs

Sure, nothing forbids that. But that doesn't answer my question:

What does a SASS need cryptographically secure random numbers for? Why are pseudo-random numbers not sufficient?

if you make a patch, please make it opt-in via a compile define or even better catch the error by the C++ call and then have a compile define to either error out or continue without true PRNG initialization.

I have made a patch that is verified to fix the problem, and I took the time to describe the problem and attach the patch, and I'm trying to help you out in fixing this serious issue, but you'll have to take it from here on.

If you insist on the strong seeding, you need a try/catch around random_device in any case. In case of errors from the secure random source, you can fall back to rand(). That would fix this bug and would be a simple and staight-forward change on top of my patch. Feel free to do that.

I hope I have provided the key information of: what is going wrong, why, and a patch that is verified to fix the problem.

@benbucksch
Copy link
Author

benbucksch commented May 10, 2021

FWIW, this is a AMD Ryzen 5 5600X, stepping 0. And everything else on my system seems to function normally, as far as I can see. Only hugo fails, in libsass.

I think the issue here is very simple: You don't need strong cryptographically secure random numbers, so you can just use rand(), and that's it.

Alternatively, you need to handle the fact that random number sources can fail, for any number of reasons, and have a fallback.

@mgreter
Copy link
Contributor

mgreter commented May 10, 2021

Are you running your latest BIOS for your MB? It seems that's how AMD is intending to fix the root cause. Also std::random_device isn't really cryptographically secure, it is only more likely to fetch the random number from a dedicated hardware device. IMHO it's the correct thing to do from a code perspective, although we certainly need to handle the error case. Seems I've missed that it actually can throw (examples seem to always ignore it).

They way you patched it you can also just return a static number instead of calling rand, since we never seeded srand, it will always generate the same number (first call after RNG, seeded with 1). This would mean that sass code calling random will always get the same number sequence, which is certainly not what I would expect. So I'd rather would use time or pid to seed the PRNG instead of rand(). I will comment on your PR when I come up with a solution, shouldn't be to difficult.

@mgreter
Copy link
Contributor

mgreter commented May 10, 2021

I'd have a few more questions here @benbucksch, hope you're willing to help me out here.

  • Can you reproduce the issue easily and reliably (would love to test two or three things)
  • Are you on windows or on linux (my open points are mainly in regard to windows)

If you are on windows, I would love to know how the mingw implementation would work on other compilers (I think we added this case explicitly to avoid an issue with older MinGW). You could simply replace #ifdef __MINGW32__ with #ifdef _WIN32 in fn_numbers.cpp to enable that implementation. I wonder if this one works or also exposes some weird behavior.

mgreter added a commit to mgreter/libsass that referenced this issue May 10, 2021
@benbucksch
Copy link
Author

benbucksch commented May 10, 2021

Is this randomness needed to generate CSS class names and similar code?

So I'd rather would use time or pid to seed the PRNG instead of rand()

Unixtime in ms makes sense as seed, yes, depending on what you use it for.

Can you reproduce the issue easily and reliably

Yes. It's 100% reproducible for me.

(would love to test two or three things)

The problem is that I am extremely limited in time. This issue has already cost me far more time than I could afford, even until now. So, unfortunately, I'm not able to test patches.

Good news is that the problem is clearly identified, and the symptions recorded here, and the solution does not depend on me or my machine anymore. All you need to know is that random_device().rd() throws a runtime_exception, so you need to handle that. This should be done in any case, even if the AMD bug is fixed, so at this point, it's simply a question of handling the exception, i.e. a classic coding problem.

Are you on windows or on linux (my open points are mainly in regard to windows)

I'm on Ubuntu 20.04 LTS (see initial description).

I appreciate that you understood that I'm very limited in terms of time.

I hope you can find a good solution for this that you're happy with!

@mgreter
Copy link
Contributor

mgreter commented May 10, 2021

Sure, I can understand, just means I have no clue what would happen in your case if you'd compile with MinGW ¯\(ツ)

@benbucksch
Copy link
Author

I have no clue what would happen in your case if you'd compile with MinGW

Ah, I see. I see the #ifdef __MINGW32 using Win32 APIs and wincrypt.h. Yeah, I'm definitely not set up to compile on Windows. Good question, though.

Thanks a lot for having taken this seriously, having improved my patch and merged it! Appreciated.

Enjoy your other work on libsass! :-)

@benbucksch
Copy link
Author

benbucksch commented May 10, 2021

All I can do is look up the MS docs for CryptGenRandom, which say: "If the function fails, the return value is zero", which suggests the fix would be very similar to what you do on Unix.

@mgreter
Copy link
Contributor

mgreter commented May 10, 2021

I've already refactors the code for upcoming new major release in that regard. The "mingw" part will no longer be used by default on mingw and needs to be enabled explicitly if users know they have a (now very) old mingw version with that issue. So it's ok for me ATM.

Btw. can you confirm that my patch solves your problem, just to be sure ;)

@benbucksch
Copy link
Author

benbucksch commented May 10, 2021

I've applied your PR #3153 that was merged. Yes, I can confirm that it fixes the bug for me.

# apt remove libsass
# apt install hugo
$ hugo
terminate called after throwing an instance of 'std::runtime_error'
  what():  random_device: rdrand failed
Abgebrochen (Speicherabzug geschrieben)
# dpkg -i libsass1_3.6.3-1ubuntu1_amd64.deb (built with your patch from #3153 )
$ hugo
works

Thank you!

@mgreter
Copy link
Contributor

mgreter commented May 10, 2021

Danke fürs testen!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants