
Fixes for various endianness issues #5302

Open · wants to merge 6 commits into main
Conversation

@Hugobros3 commented Jul 1, 2023

The SPIR-V specification is agnostic about endianness (at least at the 32-bit word level), and the tooling has some supporting code for it. That support sadly appears to suffer from quite a bit of rot; this PR tackles the following:

  • spirv-dis fails to disassemble a SPIR-V file in the non-native endianness
    • This is caused by trying to parse a string literal before the words are converted into native order, as assumed by MakeString
    • I changed it so that the whole file is byte-swapped when the parser is initialized, which opens the path for further simplifications (see the sketch after this list)
  • Source extraction code duplicates the logic added in mhillenbrand@a95b262 and fails to account for mismatched endianness
  • Some dead code left over from the prior commit is removed
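
For illustration, here is a minimal sketch of that byte-swap-on-initialization idea; the helper name SwapToNativeOrder and its shape are hypothetical, not the actual code in this PR:

#include <cstdint>
#include <memory>

// Makes a copy of the module with every 32-bit word in host-native order.
// Words are byte-swapped only when the module's endianness differs from the
// host's; otherwise they are copied through unchanged.
std::unique_ptr<uint32_t[]> SwapToNativeOrder(const uint32_t* words,
                                              size_t num_words,
                                              bool requires_conversion) {
  auto native = std::make_unique<uint32_t[]>(num_words);
  for (size_t i = 0; i < num_words; ++i) {
    const uint32_t w = words[i];
    native[i] = requires_conversion ? ((w & 0x000000FFu) << 24) |
                                          ((w & 0x0000FF00u) << 8) |
                                          ((w & 0x00FF0000u) >> 8) |
                                          ((w & 0xFF000000u) >> 24)
                                    : w;
  }
  return native;
}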

I ran and passed all tests on a real PowerPC G5, which is as fun a way to spend a day as any other. There's probably more I can do, but I'd like this to be reviewed first so I know I'm on the right track.

Additionally, I'm not sure how relevant big-endian SPIR-V support really is, considering the state of the tooling. Rather, it would probably be beneficial to have big-endian architectures default to emitting little-endian files, as I doubt there are many, if any, implementations that correctly implement endianness agnosticism. I think it needs to at least be an option!

@CLAassistant commented Jul 1, 2023

CLA assistant check
All committers have signed the CLA.

-      words->insert(words->end(), _.words + _.word_index,
-                    _.words + index_after_operand);
     }
+    words->insert(words->end(), _.native_words.get() + _.word_index,
Collaborator

The comment will need to change. This is not always translating to native endianness.

@@ -590,7 +590,7 @@ spv_result_t Parser::parseOperand(size_t inst_offset,
     case SPV_OPERAND_TYPE_OPTIONAL_LITERAL_STRING: {
       const size_t max_words = _.num_words - _.word_index;
       std::string string =
-          spvtools::utils::MakeString(_.words + _.word_index, max_words, false);
+          spvtools::utils::MakeString(_.native_words.get() + _.word_index, max_words, false);
Collaborator

I'm not sure this change is correct. The SPEC says:

A string is interpreted as a nul-terminated stream of characters. All string comparisons are case sensitive. The character set is Unicode in the UTF-8 encoding scheme. The UTF-8 octets (8-bit bytes) are packed four per word, following the little-endian convention (i.e., the first octet is in the lowest-order 8 bits of the word).

How do you know that the native_words are the correct little-endian order?

Author

While the spec confusingly says "little-endian", it actually calls for packing bytes into 4-byte words by shifting them, which is what MakeString implements. This means the string spv\0 (bytes 0x73, 0x70, 0x76, 0x00) gets packed into the word 0x00767073, which is laid out in memory as 73 70 76 00 on a little-endian machine but as 00 76 70 73 on a big-endian machine. Very much not a byte string that can be read directly!

We therefore want all the words to be in native order, because MakeString is actually endianness-agnostic: it only looks at the 32-bit words and never tries to cast them to chars and read the result as a string. Working only at the word level means endianness conversion is just pre/post-processing that swaps the words to match the machine order, with zero additional complexity required.
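
To make that concrete, here is a small standalone sketch (not the actual MakeString implementation) showing that word-level unpacking by shifting recovers "spv" from the word 0x00767073 no matter how that word is laid out in the host's memory:

#include <cassert>
#include <cstdint>
#include <string>

// Unpacks a nul-terminated string from 32-bit words, taking the first octet
// from the lowest-order 8 bits of each word, as the spec quote above
// describes. Only shifts and masks are used, so the host's byte order never
// matters as long as the words themselves are in native order.
std::string UnpackString(const uint32_t* words, size_t num_words) {
  std::string result;
  for (size_t i = 0; i < num_words; ++i) {
    for (int shift = 0; shift < 32; shift += 8) {
      const char c = static_cast<char>((words[i] >> shift) & 0xFFu);
      if (c == '\0') return result;
      result.push_back(c);
    }
  }
  return result;
}

int main() {
  const uint32_t words[] = {0x00767073u};  // "spv\0" packed per the spec
  assert(UnpackString(words, 1) == "spv");
}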

@dneto0 (Collaborator) commented Nov 26, 2024

Additionally, I'm not sure how relevant big-endian SPIR-V support really is, considering the state of the tooling. Rather, it would probably be beneficial to have big-endian architectures default to emitting little-endian files, as I doubt there are many, if any, implementations that correctly implement endianness agnosticism. I think it needs to at least be an option!

I had to go back through the 2015 archives for this one:

  • Vulkan ended up deciding that a driver is meant to consume SPIR-V modules in its host endianness, and this is specified
  • Most tooling will only ever handle one kind of endianness, but a widespread generic tool like SPIRV-Tools is reasonably expected to read either, though we have lacked big-endian machines to test it on. (Glslang produces SPIR-V "correctly" for its host endianness, whichever endianness that is.)
  • It was generally understood that folks who hit the big-endian cases would submit patches; and here we are. :-)

@dneto0 (Collaborator) commented Nov 26, 2024

I also dug up the original discussion from 2016; it is on the public bugzilla: https://www.khronos.org/members/login/bugzilla-public/show_bug.cgi?id=1474
(the bug database had been down but it's now restored)

@Hugobros3 (Author)
Thanks for digging this up, I found a few dead links last time but wasn't sure where to point to.

@dneto0 (Collaborator) left a comment

I've loaded the prior context (from 2015-2016) and am now starting to look at this.

This patch lacks tests, so it can't be accepted as-is.

@@ -218,6 +218,7 @@ class Parser {
   // Is the SPIR-V binary in a different endianness from the host native
   // endianness?
   bool requires_endian_conversion;
+  std::unique_ptr<uint32_t[]> native_words;
Collaborator

Please document what this means.
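
For instance, a comment along these lines (a suggestion based on the PR description, not wording from the patch) would cover it:

  // Copy of the module's words converted to host-native endianness.
  // Allocated and filled in by parseModule(); operands and string literals
  // are read from here rather than from the raw input words.
  std::unique_ptr<uint32_t[]> native_words;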

@@ -262,6 +263,9 @@ spv_result_t Parser::parseModule() {
           << _.words[0] << "'.";
   }
   _.requires_endian_conversion = !spvIsHostEndian(_.endian);
+  _.native_words = std::make_unique<uint32_t[]>(_.num_words);
Collaborator

Let's try to fix the logic without copying the whole module for everybody in all situations.
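
One possible shape of that, sketched purely as an illustration (this is not code from the PR, and it assumes the existing spvFixWord helper): swap into a scratch buffer only when the module is not already in host order, and otherwise read straight from the caller's buffer.

  // Illustration only: alias the input when no conversion is needed, and own
  // a byte-swapped copy otherwise, so the common same-endianness path never
  // allocates or copies.
  const uint32_t* native = _.words;
  std::unique_ptr<uint32_t[]> converted;
  if (_.requires_endian_conversion) {
    converted = std::make_unique<uint32_t[]>(_.num_words);
    for (size_t i = 0; i < _.num_words; ++i) {
      converted[i] = spvFixWord(_.words[i], _.endian);
    }
    native = converted.get();
  }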

@dneto0 (Collaborator) commented Nov 27, 2024

There are endianness tests, and two of them deal with strings specifically.

See

TEST_F(BinaryParseTest, InstructionWithStringOperand) {

and

TEST_F(CxxBinaryParseTest, InstructionWithStringOperand) {

I'd start by modifying them to make sure they cover the case that I think needs fixing: on a little-endian machine, a big-endian SPIR-V module does not have its strings parsed correctly.
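
As an illustration of that kind of coverage (a standalone sketch against the public C API rather than the existing fixture-based tests, and assuming a little-endian host), one could hand-build a tiny module containing OpString "spv", byte-swap every word, and expect parsing to still succeed:

#include <cstdint>
#include <vector>

#include "spirv-tools/libspirv.h"

// Swaps the bytes of one 32-bit word.
static uint32_t Swap(uint32_t w) {
  return ((w & 0x000000FFu) << 24) | ((w & 0x0000FF00u) << 8) |
         ((w & 0x00FF0000u) >> 8) | ((w & 0xFF000000u) >> 24);
}

int main() {
  // Header: magic, version 1.0, generator 0, bound 2, schema 0, followed by
  // OpString %1 "spv" (opcode 7, word count 3, string packed as 0x00767073).
  std::vector<uint32_t> words = {0x07230203u, 0x00010000u, 0u, 2u, 0u,
                                 (3u << 16) | 7u, 1u, 0x00767073u};
  // Present the module in the opposite endianness from the (assumed
  // little-endian) host.
  for (auto& w : words) w = Swap(w);

  spv_context context = spvContextCreate(SPV_ENV_UNIVERSAL_1_0);
  spv_diagnostic diagnostic = nullptr;
  spv_result_t result = spvBinaryParse(context, nullptr, words.data(),
                                       words.size(), nullptr, nullptr,
                                       &diagnostic);
  // With the fix in this PR, the string operand parses and this is SPV_SUCCESS.
  spvDiagnosticDestroy(diagnostic);
  spvContextDestroy(context);
  return result == SPV_SUCCESS ? 0 : 1;
}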
