
Fixes for various endianness issues #5302

Open · wants to merge 6 commits into main
Conversation

@Hugobros3 commented Jul 1, 2023

The SPIR-V specification is agnostic about endianness (at least at the 32-bit word level), and the tooling has some supporting code for it. That support sadly appears to suffer from quite a bit of rot; this PR tackles the following:

  • spirv-dis fails to disassemble a SPIR-V file in the non-native endianness
    • This is caused by trying to parse a string literal before the words are converted into native order, as assumed by MakeString
    • I changed it so that the whole file is byte-swapped when the parser is initialized, which opens the path for further simplifications (see the sketch after this list)
  • Source extraction code duplicates the logic added in mhillenbrand@a95b262 and fails to account for mismatched endianness
  • Some dead code left over from the prior commit is removed
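
For illustration, here is a minimal sketch of that byte-swap-on-initialization idea; the helper name SwapToNativeOrder and its shape are hypothetical, not the actual code in this PR:

#include <cstdint>
#include <memory>

// Makes a copy of the module with every 32-bit word in host-native order.
// Words are byte-swapped only when the module's endianness differs from the
// host's; otherwise they are copied through unchanged.
std::unique_ptr<uint32_t[]> SwapToNativeOrder(const uint32_t* words,
                                              size_t num_words,
                                              bool requires_conversion) {
  auto native = std::make_unique<uint32_t[]>(num_words);
  for (size_t i = 0; i < num_words; ++i) {
    const uint32_t w = words[i];
    native[i] = requires_conversion ? ((w & 0x000000FFu) << 24) |
                                          ((w & 0x0000FF00u) << 8) |
                                          ((w & 0x00FF0000u) >> 8) |
                                          ((w & 0xFF000000u) >> 24)
                                    : w;
  }
  return native;
}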

I ran and passed all tests on a real PowerPC G5, which is as fun a way to spend a day as any other. There's probably more I can do, but I'd like this to be reviewed first so I know I'm on the right track.

Additionally, I'm not sure how relevant big-endian SPIR-V support really is, considering the state of the tooling. Rather, it would probably be beneficial to have big-endian architectures default to emitting little-endian files, as I doubt there are many, if any, implementations that correctly implement endianness agnosticism. I think it needs to at least be an option!

@CLAassistant commented Jul 1, 2023

CLA assistant check
All committers have signed the CLA.

-      words->insert(words->end(), _.words + _.word_index,
-                    _.words + index_after_operand);
     }
+    words->insert(words->end(), _.native_words.get() + _.word_index,
Collaborator

The comment will need to change. This is not always translating to native endianness.

@@ -590,7 +590,7 @@ spv_result_t Parser::parseOperand(size_t inst_offset,
     case SPV_OPERAND_TYPE_OPTIONAL_LITERAL_STRING: {
       const size_t max_words = _.num_words - _.word_index;
       std::string string =
-          spvtools::utils::MakeString(_.words + _.word_index, max_words, false);
+          spvtools::utils::MakeString(_.native_words.get() + _.word_index, max_words, false);
Collaborator

I'm not sure this change is correct. The SPEC says:

A string is interpreted as a nul-terminated stream of characters. All string comparisons are case sensitive. The character set is Unicode in the UTF-8 encoding scheme. The UTF-8 octets (8-bit bytes) are packed four per word, following the little-endian convention (i.e., the first octet is in the lowest-order 8 bits of the word).

How do you know that the native_words are the correct little-endian order?

Author

While the spec confusingly says "little-endian", it actually calls for packing bytes into 4-byte words by shifting them, which is what MakeString implements. This means the string spv\0 (bytes 0x73, 0x70, 0x76, 0x00) gets packed into the word 0x00767073, which is laid out in memory as 73 70 76 00 on a little-endian machine but as 00 76 70 73 on a big-endian machine. Very much not a byte string that can be read directly!

We therefore want all the words to be in native order, because MakeString is actually endianness-agnostic: it only looks at the 32-bit words and never tries to cast them to chars and read the result as a string. Working only at the word level means endianness conversion is just pre/post-processing that swaps the words to match the machine order, with zero additional complexity required.
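
To make that concrete, here is a small standalone sketch (not the actual MakeString implementation) showing that word-level unpacking by shifting recovers "spv" from the word 0x00767073 no matter how that word is laid out in the host's memory:

#include <cassert>
#include <cstdint>
#include <string>

// Unpacks a nul-terminated string from 32-bit words, taking the first octet
// from the lowest-order 8 bits of each word, as the spec quote above
// describes. Only shifts and masks are used, so the host's byte order never
// matters as long as the words themselves are in native order.
std::string UnpackString(const uint32_t* words, size_t num_words) {
  std::string result;
  for (size_t i = 0; i < num_words; ++i) {
    for (int shift = 0; shift < 32; shift += 8) {
      const char c = static_cast<char>((words[i] >> shift) & 0xFFu);
      if (c == '\0') return result;
      result.push_back(c);
    }
  }
  return result;
}

int main() {
  const uint32_t words[] = {0x00767073u};  // "spv\0" packed per the spec
  assert(UnpackString(words, 1) == "spv");
}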

@dneto0 (Collaborator) commented Nov 26, 2024

Additionally, I'm not sure how relevant big-endian SPIR-V support really is, considering the state of the tooling. Rather, it would probably be beneficial to have big-endian architectures default to emitting little-endian files, as I doubt there are many, if any, implementations that correctly implement endianness agnosticism. I think it needs to at least be an option!

I had to go back through the 2015 archives for this one:

  • Vulkan ended up deciding that a driver is meant to consume SPIR-V modules in its host endianness, and this is specified
  • Most tooling will only ever handle one kind of endianness, but a widespread generic tool like SPIRV-Tools is reasonably expected to read either, though we have lacked big-endian machines to test it on. (Glslang produces SPIR-V "correctly" for its host endianness, whichever endianness that is.)
  • It was generally understood that folks who hit the big-endian cases would submit patches; and here we are. :-)

@dneto0 (Collaborator) commented Nov 26, 2024

I also dug up the original discussion from 2016; it is on the public bugzilla: https://www.khronos.org/members/login/bugzilla-public/show_bug.cgi?id=1474
(the bug database had been down but it's now restored)

@Hugobros3 (Author)
Thanks for digging this up, I found a few dead links last time but wasn't sure where to point to.

@dneto0 (Collaborator) left a comment

I've loaded the prior context (from 2015-2016) and am now starting to look at this.

This patch lacks tests, so it can't be accepted as-is.

@@ -218,6 +218,7 @@ class Parser {
   // Is the SPIR-V binary in a different endianness from the host native
   // endianness?
   bool requires_endian_conversion;
+  std::unique_ptr<uint32_t[]> native_words;
Collaborator

Please document what this means.
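
For instance, a comment along these lines (a suggestion based on the PR description, not wording from the patch) would cover it:

  // Copy of the module's words converted to host-native endianness.
  // Allocated and filled in by parseModule(); operands and string literals
  // are read from here rather than from the raw input words.
  std::unique_ptr<uint32_t[]> native_words;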

@@ -262,6 +263,9 @@ spv_result_t Parser::parseModule() {
           << _.words[0] << "'.";
   }
   _.requires_endian_conversion = !spvIsHostEndian(_.endian);
+  _.native_words = std::make_unique<uint32_t[]>(_.num_words);
Collaborator

Let's try to fix the logic without copying the whole module for everybody in all situations.
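
One possible shape of that, sketched purely as an illustration (this is not code from the PR, and it assumes the existing spvFixWord helper): swap into a scratch buffer only when the module is not already in host order, and otherwise read straight from the caller's buffer.

  // Illustration only: alias the input when no conversion is needed, and own
  // a byte-swapped copy otherwise, so the common same-endianness path never
  // allocates or copies.
  const uint32_t* native = _.words;
  std::unique_ptr<uint32_t[]> converted;
  if (_.requires_endian_conversion) {
    converted = std::make_unique<uint32_t[]>(_.num_words);
    for (size_t i = 0; i < _.num_words; ++i) {
      converted[i] = spvFixWord(_.words[i], _.endian);
    }
    native = converted.get();
  }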

@dneto0 (Collaborator) commented Nov 27, 2024

There are endianness tests, and two of them deal with strings specifically.

See

TEST_F(BinaryParseTest, InstructionWithStringOperand) {

and

TEST_F(CxxBinaryParseTest, InstructionWithStringOperand) {

I'd start by modifying them to make sure they cover the case that I think needs fixing: on a little-endian machine, a big-endian SPIR-V module does not have its strings parsed correctly.
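
As an illustration of that kind of coverage (a standalone sketch against the public C API rather than the existing fixture-based tests, and assuming a little-endian host), one could hand-build a tiny module containing OpString "spv", byte-swap every word, and expect parsing to still succeed:

#include <cstdint>
#include <vector>

#include "spirv-tools/libspirv.h"

// Swaps the bytes of one 32-bit word.
static uint32_t Swap(uint32_t w) {
  return ((w & 0x000000FFu) << 24) | ((w & 0x0000FF00u) << 8) |
         ((w & 0x00FF0000u) >> 8) | ((w & 0xFF000000u) >> 24);
}

int main() {
  // Header: magic, version 1.0, generator 0, bound 2, schema 0, followed by
  // OpString %1 "spv" (opcode 7, word count 3, string packed as 0x00767073).
  std::vector<uint32_t> words = {0x07230203u, 0x00010000u, 0u, 2u, 0u,
                                 (3u << 16) | 7u, 1u, 0x00767073u};
  // Present the module in the opposite endianness from the (assumed
  // little-endian) host.
  for (auto& w : words) w = Swap(w);

  spv_context context = spvContextCreate(SPV_ENV_UNIVERSAL_1_0);
  spv_diagnostic diagnostic = nullptr;
  spv_result_t result = spvBinaryParse(context, nullptr, words.data(),
                                       words.size(), nullptr, nullptr,
                                       &diagnostic);
  // With the fix in this PR, the string operand parses and this is SPV_SUCCESS.
  spvDiagnosticDestroy(diagnostic);
  spvContextDestroy(context);
  return result == SPV_SUCCESS ? 0 : 1;
}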
