Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rabbit Produces null Keystream When inString == outString #1231

Closed
Demonslay335 opened this issue Sep 25, 2023 · 9 comments
Closed

Rabbit Produces null Keystream When inString == outString #1231

Demonslay335 opened this issue Sep 25, 2023 · 9 comments

Comments

@Demonslay335
Copy link

Demonslay335 commented Sep 25, 2023

When upgrading from CryptoPP 8.6.0 to 8.8.0 (master branch via vcpkg), we started having our own unit tests fail with code using the CryptoPP::RabbitWithIV algorithm. Upon inspection, it appears when inString == outString (which is perfectly legal according to ProcessData), Rabbit is returning a buffer with all 0x00 bytes, instead of the encrypted/decrypted buffer.

When I dug into the method, it looks like RabbitWithIVPolicy::OperateKeystream ends up just XOR'ing the generated keystream with itself...

Here is a minimal example reproducing the issue with the Rabbit test vectors, Test 4. This code passes on CryptoPP 8.6.0, and throws an assertion on CryptoPP 8.8.0. This also applies to CryptoPP::Rabbit. Compiled using Visual Studio (x64) on Windows 10.

#include <cryptopp/rabbit.h>

int main()
{
	auto key = std::vector<uint8_t>(CryptoPP::RabbitWithIV::KEYLENGTH, 0x00);
	auto iv = std::vector<uint8_t>(CryptoPP::RabbitWithIV::IV_LENGTH, 0x00);
	auto buffer = std::vector<uint8_t>(0x20, 0x00);
	auto expected = std::vector<uint8_t>{ 0xED, 0xB7, 0x05, 0x67, 0x37, 0x5D, 0xCD, 0x7C, 0xD8, 0x95, 0x54, 0xF8, 0x5E, 0x27, 0xA7, 0xC6, 0x8D, 0x4A, 0xDC, 0x70, 0x32, 0x29, 0x8F, 0x7B, 0xD4, 0xEF, 0xF5, 0x04, 0xAC, 0xA6, 0x29, 0x5F };

	CryptoPP::RabbitWithIV::Encryption enc;
	enc.SetKeyWithIV(key.data(), key.size(), iv.data(), iv.size());

	enc.ProcessData(buffer.data(), buffer.data(), buffer.size());

	CRYPTOPP_ASSERT(memcmp(buffer.data(), expected.data(), expected.size()) == 0);

	return 0;
}

Just to note, we use a lot of CryptoPP algorithms in our codebase, and no other algorithms had this issue so far.

Seems this may be related to #1010, perhaps a missed algorithm?

@noloader
Copy link
Collaborator

Ugh, sorry to hear that. GH #1010 gave us a lot of problems. I hope they are not surfacing again.

I can't duplicate on my test machines.

$ ./cryptest.exe tv rabbit
Using seed: 1695672009      

Testing SymmetricCipher algorithm Rabbit.
....................................................................................................................................................................................................................................................................
Testing SymmetricCipher algorithm RabbitWithIV.
....................................................................................................................................................................................................................................................................
Tests complete. Total tests = 520. Failed tests = 0.

Can you give the attached patch a try and report back. rabbit.diff.zip.

$ cat rabbit.diff
diff --git a/rabbit.cpp b/rabbit.cpp
index 4f655111..7bbdb1de 100644
--- a/rabbit.cpp
+++ b/rabbit.cpp
@@ -152,6 +152,10 @@ void RabbitPolicy::OperateKeystream(KeystreamOperation operation, byte *output,
        //  adding the input buffer and keystream.
        if ((operation & EnumToInt(INPUT_NULL)) != EnumToInt(INPUT_NULL))
                xorbuf(output, input, GetBytesPerIteration() * iterationCount);
+
+       // https://github.com/weidai11/cryptopp/issues/1231
+       volatile byte* unused = const_cast<volatile byte *>(output);
+       CRYPTOPP_UNUSED(unused);
 }
 
 void RabbitWithIVPolicy::CipherSetKey(const NameValuePairs &params, const byte *userKey, size_t keylen)
@@ -254,6 +258,10 @@ void RabbitWithIVPolicy::OperateKeystream(KeystreamOperation operation, byte *ou
        //  adding the input buffer and keystream.
        if ((operation & EnumToInt(INPUT_NULL)) != EnumToInt(INPUT_NULL))
                xorbuf(output, input, GetBytesPerIteration() * iterationCount);
+
+       // https://github.com/weidai11/cryptopp/issues/1231
+       volatile byte* unused = const_cast<volatile byte *>(output);
+       CRYPTOPP_UNUSED(unused);
 }
 
 NAMESPACE_END

@Demonslay335
Copy link
Author

Demonslay335 commented Sep 25, 2023

Does the test suite do input == output for the buffers though? From looking at the code, I believe it is using separate buffers thru the sinks.

I've tried the patch with no change. OperateKeystream is still passed output == input, causing the xorbuf to cancel itself out.

With a little more testing, I confirmed if AdditiveCipherTemplate<S>::ProcessData is forced to use WriteKeystream instead of OperateKeystream on the policy (e.g. length < bytesPerIteration), then the keystream is correct.

I feel like it may have to do with the way Rabbit's keystream function here is implemented, it looks different from how ChaCha and other stream ciphers are implemented (from my quick looking). Inside the loop, the keystream is directly written to output, which also of course overwrites the input in this case. I'm trying to read further into why this isn't a problem in another stream cipher.

@Demonslay335
Copy link
Author

I've just reproduced this same issue with the HC128 and HC256 algorithms. They have OperateKeystream structured the same as Rabbit, so the same thing happens. A quick search for the code block doesn't show any other algorithms so far doing it that way.

I think this may be related more to the changes made in GH #1088, and these algorithms weren't adapted to account for the pointers being the same when being consumed. In 8.6.0, it looks like if they were the same, a temporary string was created to be passed instead, then copied back to. These stream ciphers are directly writing the keystream to the output pointer, treating it is more like a temporary buffer, instead of saving the keystream + input.

I notice other stream ciphers like ChaCha and Sosemanuk (that work in this case) use the CRYPTOPP_KEYSTREAM_OUTPUT_WORD and CRYPTOPP_KEYSTREAM_OUTPUT_SWITCH macros, which could be related to a fix.

@noloader
Copy link
Collaborator

Hi @Demonslay335,

Does the test suite do input == output for the buffers though?

Probably not. That will be the first thing I fix.

I feel like it may have to do with the way Rabbit's keystream function here is implemented, it looks different from how ChaCha and other stream ciphers are implemented

Yeah, I believe you are correct.

I implemented Rabbit and HC-{128|256} classes. I don't recall why I did not use the same pattern as say, Salsa and ChaCha.

notice other stream ciphers like ChaCha and Sosemanuk (that work in this case) use the CRYPTOPP_KEYSTREAM_OUTPUT_WORD and CRYPTOPP_KEYSTREAM_OUTPUT_SWITCH macros, which could be related to a fix.

Yeah, something needs fixing. I'll dig into it when I have some time.

I am terribly sorry about this.

noloader added a commit that referenced this issue Sep 26, 2023
@noloader
Copy link
Collaborator

@Demonslay335, @fwosar,

I provided the first check-in at 560d48f96861. It adds the self tests to exercise in-place encryption and decryption. Though the self tests use ProcessString(inoutString, length), under the hood ProcessData(inString, outString, length) is called.

And a lot more than Rabbit and HC-{128|256} broke. Ugh...

For CTR mode, I think the test code is crafted wrong. For CTR mode, I believe we need to call ProcessLastBlock(), and that is not happening. So that explains CTR mode. But the other breaks look valid (to me).

@noloader
Copy link
Collaborator

noloader commented Sep 28, 2023

@Demonslay335, @fwosar,

I believe Rabbit is fixed with b157b4d30116. The fix was to cutover to CRYPTOPP_KEYSTREAM_OUTPUT_SWITCH, like in Salsa and ChaCha.

I do not know why that fixes the issue. The code is equivalent. In fact, the old code was simpler than the code with the CRYPTOPP_KEYSTREAM_OUTPUT_SWITCH. I expect the new code using CRYPTOPP_KEYSTREAM_OUTPUT_SWITCH to run a tad bit slower than the old code.

My wild ass guess is, the macros thwart analysis by the compiler, so the code with the CRYPTOPP_KEYSTREAM_OUTPUT_SWITCH remains in tact. I.e., the code is not removed by the compiler. The older, simpler code was easier to analyze, and the compiler removed calls to OperateKeystream due to aliasing violations in ProcessData.

I'll keep chipping away at the other failures.

@noloader
Copy link
Collaborator

noloader commented Sep 28, 2023

@Demonslay335, @fwosar,

I believe HC-{128|256} are fixed with 0bf879883573. The fix was to cutover to CRYPTOPP_KEYSTREAM_OUTPUT_SWITCH, like in Salsa and ChaCha.

@noloader
Copy link
Collaborator

I'm closing this issue since we cleared the problem with Rabbit, RabbitWithIV, HC128 and HC256.

I'm opening another bug for HIGHT/CTR mode.

@Demonslay335
Copy link
Author

Demonslay335 commented Sep 28, 2023

@noloader Thanks for the quick fix.

I believe it actually has nothing to do with compiler in this case (luckily).

I found the main difference is in the CRYPTOPP_KEYSTREAM_OUTPUT_WORD macro, which uses the 5th argument to PutWord (xorBlock), whereas the old stand-alone code did not (defaulting to NULLPTR). So the keystream is properly XOR'd with the input before being written to the output buffer, as we want.

template <class T>
inline void PutWord(bool assumeAligned, ByteOrder order, byte *block, T value, const byte *xorBlock = NULLPTR)
{
	CRYPTOPP_UNUSED(assumeAligned);

	T t1, t2;
	t1 = ConditionalByteReverse(order, value);
	if (xorBlock != NULLPTR) {std::memcpy(&t2, xorBlock, sizeof(T)); t1 ^= t2;} // < -- BINGO
	if (block != NULLPTR) {std::memcpy(block, &t1, sizeof(T));}
}

To verify, adapting that parameter actually fixes the old code.

byte* out = output;
for (unsigned int i = 0; i<iterationCount; ++i, out += 16)
{
	/* Iterate the system */
	m_wcy = NextState(m_wc, m_wx, m_wcy);

	/* Encrypt/decrypt 16 bytes of data */
	PutWord(false, LITTLE_ENDIAN_ORDER, out +  0, m_wx[0] ^ (m_wx[5] >> 16) ^ (m_wx[3] << 16), input + i * sizeof(WordType));
	PutWord(false, LITTLE_ENDIAN_ORDER, out +  4, m_wx[2] ^ (m_wx[7] >> 16) ^ (m_wx[5] << 16), input + i * sizeof(WordType));
	PutWord(false, LITTLE_ENDIAN_ORDER, out +  8, m_wx[4] ^ (m_wx[1] >> 16) ^ (m_wx[7] << 16), input + i * sizeof(WordType));
	PutWord(false, LITTLE_ENDIAN_ORDER, out + 12, m_wx[6] ^ (m_wx[3] >> 16) ^ (m_wx[1] << 16), input + i * sizeof(WordType));
}

Use of the macros is far more consistent with the rest of the library though, and using the switch statement macro properly accounts for all the operation values.

noloader added a commit that referenced this issue Sep 29, 2023
It turns out we went down a rabbit hole when we added the volatile cast gyrations in an attempt to change the compiler behavior. We are seeing the same failures from AES, Rabbit, HIGHT, HC-128 and HC-256 with and without the gyrations.
We were able to work out the problems with Rabbit, HIGHT, HC-128 and HC-256. See GH #1231 and GH #1234.
We are also not able to successfully cut-in Cryptogams AES on ARMv7, so it is now disabled. See GH #1236.
Since the volatile casts were not a solution, we are backing it out along with associated comments.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants