[cdac] Implement ISOSDacInterface::GetPEFileName in cDAC #106358
Conversation
Tagging subscribers to this area: @tommcdon
```csharp
while (true)
{
    // Read characters until we find the null terminator
    char nameChar = _target.Read<char>(addr);
    if (nameChar == 0)
        break;

    name.Add(nameChar);
    addr += sizeof(char);
}
```
I would consider doing this slightly differently: determine the entire length first by scanning for the null terminator, then reread the entire block of bytes in a single operation. I think that would be slightly easier to debug. I also don't know if endianness is important here. Metadata is always encoded as UTF-16LE, but I'm not sure about non-metadata allocated strings.
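The two-pass approach suggested here can be sketched as follows. This is a hypothetical, language-neutral Python model (the actual cDAC code is C#): a `bytes` buffer stands in for target memory, and `read_utf16_string` stands in for a helper built on the `Target` read primitives.

```python
def read_utf16_string(memory: bytes, addr: int) -> str:
    """Two-pass read: find the 16-bit null terminator, then decode the block."""
    # Pass 1: determine the length by scanning for the two-byte null.
    end = addr
    while memory[end] != 0 or memory[end + 1] != 0:
        end += 2
    # Pass 2: reread the entire block of bytes in a single operation.
    return memory[addr:end].decode("utf-16-le")

# Simulated target memory containing "clr.dll" as UTF-16LE at offset 4.
mem = b"\x00" * 4 + "clr.dll".encode("utf-16-le") + b"\x00\x00"
print(read_utf16_string(mem, 4))  # → clr.dll
```

The debugging benefit is that the string's address and byte length are known before any decoding happens, so a bad read fails in one obvious place rather than partway through a character loop.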
non-metadata allocated strings would be big endian on a big endian machine
I think this implementation is ... fine... but it would be better if we added an API to Target to read null-terminated strings of both 16-bit and 8-bit char types in target endianness. This is a thing which keeps coming up, and we shouldn't be reinventing the wheel again and again.
> non-metadata allocated strings would be big endian on a big endian machine
Would they even be utf-16 at all?
(To be clear, I think this is kind of academic until someone sits down and does a CoreCLR port to POWER64 and actually makes some decision about what an LPCWSTR even is in that platform's version of the CoreCLR PAL. I don't know what a reasonable answer might be. /cc @uweigand @directhex)
Do we want to stash an encoding somewhere in the runtime descriptor?
> but it would be better if we added an api to Target to read null terminated strings of both 16bit and 8 bit char types in target endianness.
It is more complicated than that. We need to know whether the target pointer points into metadata, which is always LE. I was going to suggest the same thing, but the metadata angle makes the correct answer less obvious.
Even with the metadata angle, I think having APIs on Target reading strings in target endianness still makes sense.
> We need to know if the target pointer is into metadata, which is always LE.
And then it is up to the caller / consumer of the target pointer to know that it shouldn't be read in target endianness and to read it their own way?
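To make the endianness concern concrete, here is a small Python illustration (not cDAC code) of why a reader must know the byte order before decoding: the same characters produce different byte sequences in UTF-16LE and UTF-16BE, and decoding with the wrong order yields different characters rather than an error.

```python
s = "PE"
le = s.encode("utf-16-le")   # b'P\x00E\x00'
be = s.encode("utf-16-be")   # b'\x00P\x00E'
assert le != be

# Decoding BE bytes as LE silently produces the wrong string, not an exception.
wrong = be.decode("utf-16-le")
assert wrong != s
```

So a `Target`-level API fixed to one byte order would misdecode strings from the other without any visible failure, which is why the caller (or the API) has to know the source.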
> I think having APIs on Target reading strings in target endianness still makes sense.
Sure, I agree with this. My point is, it isn't as simple as "read the string". In fact it can become very complicated if the API doesn't know the encoding. The API needs to know the encoding a priori or else it won't be able to detect the null terminator properly: is a zero byte the high-order byte of a UTF-16 code unit or an actual null in UTF-8? The string-reading API needs to know. The endianness question for UTF-16 is easier, but it can still confuse the caller: the API said it read UTF-16, but now the caller must consider the source and determine whether it is in machine-endian form or metadata-endian form. Three APIs would be needed.
It would need to be something like:

```csharp
string ReadUTF8String();
string ReadUTF16TargetEndianness();
string ReadUTF16MetadataEndianness();
```
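A hypothetical Python model of those three readers, with a `bytes` buffer standing in for target memory and a `target_big_endian` flag standing in for information the real `Target` would already have. The names mirror the proposed C# signatures but everything here is an illustrative sketch, not the cDAC implementation.

```python
def _scan(memory: bytes, addr: int, char_size: int) -> bytes:
    """Return the bytes up to (not including) the first char_size-wide null."""
    end = addr
    while any(memory[end:end + char_size]):
        end += char_size
    return memory[addr:end]

def read_utf8_string(memory: bytes, addr: int) -> str:
    # 8-bit chars: endianness is irrelevant, but the null width (1 byte) matters.
    return _scan(memory, addr, 1).decode("utf-8")

def read_utf16_target_endianness(memory: bytes, addr: int,
                                 target_big_endian: bool) -> str:
    # Non-metadata strings follow the target machine's byte order.
    encoding = "utf-16-be" if target_big_endian else "utf-16-le"
    return _scan(memory, addr, 2).decode(encoding)

def read_utf16_metadata_endianness(memory: bytes, addr: int) -> str:
    # Metadata strings are always UTF-16LE, regardless of the target machine.
    return _scan(memory, addr, 2).decode("utf-16-le")
```

Note how the null-detection width and the decode step both depend on knowing the encoding up front, which is the point made above: a single generic "read string" API can't choose either one correctly on its own.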
I think @AaronRobinsonMSFT's APIs here are good enough for now. As @lambdageek points out, dealing with actual big-endian machines is quite unlikely in the near term; we can get away with a "reasonable" model now and tweak it to match reality if/when such a machine happens. I also don't think the refactor to a Target-level API should happen in this PR; it should be deferred to a follow-up PR focused on making the string handling sensible.
Yeah, I was expecting multiple APIs, like what @AaronRobinsonMSFT wrote. I put in a TODO for adding/using Target APIs and will do a follow-up change for that.
Uses GetPath in the Loader contract. Contributes to #99302