Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes for various endianness issues #5302

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 8 additions & 17 deletions source/binary.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -173,7 +173,7 @@ class Parser {
// Returns the endian-corrected word at the given position.
uint32_t peekAt(size_t index) const {
assert(index < _.num_words);
return spvFixWord(_.words[index], _.endian);
return _.native_words[index];
}

// Data members
Expand Down Expand Up @@ -218,6 +218,7 @@ class Parser {
// Is the SPIR-V binary in a different endianness from the host native
// endianness?
bool requires_endian_conversion;
std::unique_ptr<uint32_t[]> native_words;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document what this means.


// Maps a result ID to its type ID. By convention:
// - a result ID that is a type definition maps to itself.
Expand Down Expand Up @@ -262,6 +263,9 @@ spv_result_t Parser::parseModule() {
<< _.words[0] << "'.";
}
_.requires_endian_conversion = !spvIsHostEndian(_.endian);
_.native_words = std::make_unique<uint32_t[]>(_.num_words);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try to fix the logic without doubling copying the whole module, for everybody in all situations.

for (size_t i = 0; i < _.num_words; i++)
_.native_words[i] = _.requires_endian_conversion ? spvFixWord(_.words[i], _.endian) : _.words[i];

// Process the header.
spv_header_t header;
Expand Down Expand Up @@ -440,10 +444,6 @@ spv_result_t Parser::parseOperand(size_t inst_offset,

const uint32_t word = peek();

// Do the words in this operand have to be converted to native endianness?
// True for all but literal strings.
bool convert_operand_endianness = true;

switch (type) {
case SPV_OPERAND_TYPE_TYPE_ID:
if (!word)
Expand Down Expand Up @@ -590,7 +590,7 @@ spv_result_t Parser::parseOperand(size_t inst_offset,
case SPV_OPERAND_TYPE_OPTIONAL_LITERAL_STRING: {
const size_t max_words = _.num_words - _.word_index;
std::string string =
spvtools::utils::MakeString(_.words + _.word_index, max_words, false);
spvtools::utils::MakeString(_.native_words.get() + _.word_index, max_words, false);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this change is correct. The SPEC says:

A string is interpreted as a nul-terminated stream of characters. All string comparisons are case sensitive. The character set is Unicode in the UTF-8 encoding scheme. The UTF-8 octets (8-bit bytes) are packed four per word, following the little-endian convention (i.e., the first octet is in the lowest-order 8 bits of the word).

How do you know that the native_words are the correct little-endian order?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the spec confusingly says "little-endian" it actually calls for packing bytes into 4-byte words by shifting them, which is what MakeString implements. This means that the spv\0 ([73,70,76,0]) string gets turned into the 0x00767073 word which will end up like 73 70 76 00 on a little-endian machine, but on a big-endian machine it will be 00 76 70 73. Very much not a byte string that can be read directly!

We therefore want all the words to be in the native order because MakeString is actually endinaness-agnostic as it only looks at the 32-bit words, it never tries to cast them to chars and read the result as a string. Only looking at things at the word level means endianness conversion is just pre/post processing to swap the words to match machine order, with zero additional complexity required.


if (string.length() == max_words * 4)
return exhaustedInputDiagnostic(inst_offset, opcode, type);
Expand Down Expand Up @@ -752,17 +752,8 @@ spv_result_t Parser::parseOperand(size_t inst_offset,

if (_.requires_endian_conversion) {
// Copy instruction words. Translate to native endianness as needed.
if (convert_operand_endianness) {
const spv_endianness_t endianness = _.endian;
std::transform(_.words + _.word_index, _.words + index_after_operand,
std::back_inserter(*words),
[endianness](const uint32_t raw_word) {
return spvFixWord(raw_word, endianness);
});
} else {
words->insert(words->end(), _.words + _.word_index,
_.words + index_after_operand);
}
words->insert(words->end(), _.native_words.get() + _.word_index,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment will need to change. This is not always translating to native endianness.

_.native_words.get() + index_after_operand);
}

// Advance past the operand.
Expand Down
58 changes: 7 additions & 51 deletions tools/objdump/extract_source.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -23,45 +23,12 @@
#include "spirv-tools/libspirv.hpp"
#include "spirv/unified1/spirv.hpp"
#include "tools/util/cli_consumer.h"
#include "source/binary.h"

namespace {

constexpr auto kDefaultEnvironment = SPV_ENV_UNIVERSAL_1_6;

// Extract a string literal from a given range.
// Copies all the characters from `begin` to the first '\0' it encounters, while
// removing escape patterns.
// Not finding a '\0' before reaching `end` fails the extraction.
//
// Returns `true` if the extraction succeeded.
// `output` value is undefined if false is returned.
spv_result_t ExtractStringLiteral(const spv_position_t& loc, const char* begin,
const char* end, std::string* output) {
size_t sourceLength = std::distance(begin, end);
std::string escapedString;
escapedString.resize(sourceLength);

size_t writeIndex = 0;
size_t readIndex = 0;
for (; readIndex < sourceLength; writeIndex++, readIndex++) {
const char read = begin[readIndex];
if (read == '\0') {
escapedString.resize(writeIndex);
output->append(escapedString);
return SPV_SUCCESS;
}

if (read == '\\') {
++readIndex;
}
escapedString[writeIndex] = begin[readIndex];
}

spvtools::Error(spvtools::utils::CLIMessageConsumer, "", loc,
"Missing NULL terminator for literal string.");
return SPV_ERROR_INVALID_BINARY;
}

spv_result_t extractOpString(const spv_position_t& loc,
const spv_parsed_instruction_t& instruction,
std::string* output) {
Expand All @@ -73,12 +40,8 @@ spv_result_t extractOpString(const spv_position_t& loc,
return SPV_ERROR_INVALID_BINARY;
}

const auto& operand = instruction.operands[1];
const char* stringBegin =
reinterpret_cast<const char*>(instruction.words + operand.offset);
const char* stringEnd = reinterpret_cast<const char*>(
instruction.words + operand.offset + operand.num_words);
return ExtractStringLiteral(loc, stringBegin, stringEnd, output);
*output = spvDecodeLiteralStringOperand(instruction, 1);
return SPV_SUCCESS;
}

spv_result_t extractOpSourceContinued(
Expand All @@ -92,12 +55,8 @@ spv_result_t extractOpSourceContinued(
return SPV_ERROR_INVALID_BINARY;
}

const auto& operand = instruction.operands[0];
const char* stringBegin =
reinterpret_cast<const char*>(instruction.words + operand.offset);
const char* stringEnd = reinterpret_cast<const char*>(
instruction.words + operand.offset + operand.num_words);
return ExtractStringLiteral(loc, stringBegin, stringEnd, output);
*output = *output + spvDecodeLiteralStringOperand(instruction, 0);
return SPV_SUCCESS;
}

spv_result_t extractOpSource(const spv_position_t& loc,
Expand All @@ -123,11 +82,8 @@ spv_result_t extractOpSource(const spv_position_t& loc,
return SPV_SUCCESS;
}

const char* stringBegin =
reinterpret_cast<const char*>(instruction.words + 4);
const char* stringEnd =
reinterpret_cast<const char*>(instruction.words + instruction.num_words);
return ExtractStringLiteral(loc, stringBegin, stringEnd, code);
*code = spvDecodeLiteralStringOperand(instruction, 3);
return SPV_SUCCESS;
}

} // namespace
Expand Down