Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix realpath on Windows #13582

Closed
wants to merge 2 commits into from
Closed

Fix realpath on Windows #13582

wants to merge 2 commits into from

Conversation

malmaud
Copy link
Contributor

@malmaud malmaud commented Oct 13, 2015

There was a small inefficiency that would usually cause the call to GetFullPathName to be made twice. See #13542 (diff).

if (p < buflength)
resize!(buf, p+1)
return utf8(UTF16String(buf))
if (p > length(buf))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be p+1 > length(buf), I think. According to the documentation, the return value is:

If the function succeeds, the return value is the length, in TCHARs, of the string copied to lpBuffer, not including the terminating null character.
If the lpBuffer buffer is too small to contain the path, the return value is the size, in TCHARs, of the buffer that is required to hold the path and the terminating null character.

In particular, note the inconsistency regarding whether the return value includes the terminating null codeunit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same correction is needed for longpath, since the docs for GetLongPathName are similar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I don't think it's possible for p==length(buf), which would be the only affected case by this change.

@stevengj
Copy link
Member

In fact, I see a lot of utf8(UTF16String(buf)) calls. Probably we should just have a utf8(x::Vector{UInt16}, len=length(x)) to convert from a UInt16 buffer. If the buffer is NUL-terminated, you would call utf8(x, length(x)-1).

@malmaud
Copy link
Contributor Author

malmaud commented Oct 13, 2015

I'm not sure it's a great idea for utf8 to assume that the data is intended to be interpreted as UTF16 just because it's of type Vector{UInt16} - that might muddle the waters of encoding scheme vs backend storage. But certainly all these utf8(UTF16String(.)) constructs are unfortunate.

@stevengj
Copy link
Member

@malmaud, utf8 already assumes a Ptr{UInt8} is UTF-8 encoded, and utf16 assumes a Ptr{UInt16} is UTF-16 encoded. And utf16(a::Vector{UInt16}) already exists and assumes a is UTF-16 data. So, utf8(::Vector{UInt16}) would be consistent. It would also be really weird to pass a Vector{UInt16} to a utf* conversion function and expect it to be treated as any encoding other than UTF-16.

@malmaud
Copy link
Contributor Author

malmaud commented Oct 13, 2015

Ok, I'm convinced. I'll make a PR for that.

On Tue, Oct 13, 2015 at 2:12 PM Steven G. Johnson notifications@github.com
wrote:

@malmaud https://github.com/malmaud, utf8 already assumes a Ptr{UInt8}
is UTF-8 encoded, and utf16 assumes a Ptr{UInt16} is UTF-16 encoded. So,
utf8(::Vector{UInt16}) would be consistent. It would also be really weird
to pass a Vector{UInt16} to a utf* conversion function and expect it to
be treated as any encoding other than UTF-16.


Reply to this email directly or view it on GitHub
#13582 (comment).

@vtjnash
Copy link
Member

vtjnash commented Oct 30, 2015

however, utf32(a::Vector{UInt8}) also already exists and does not mean a is utf8-data, but that it is utf32 data in a byte vector (with unknown endianness): utf32(UInt8['A',0,0,0,'B',0,0,0]) => "AB"

it's best to stick with utf8(UTF16String(data)) because it is then clear that the encoding and transcoding operations are independent.

@ScottPJones
Copy link
Contributor

If you recall, I'd already tried to add more of these conversions, and was made to remove them.
See if Tony will finally let them in.

@tkelman
Copy link
Contributor

tkelman commented Nov 10, 2015

superseded by #12819?

@vtjnash
Copy link
Member

vtjnash commented Nov 10, 2015

yes -- i was thinking of this PR as I made those changes, I just couldn't remember where I had seen this.

@vtjnash vtjnash closed this Nov 10, 2015
@tkelman tkelman deleted the jmm/windows_realpath branch March 22, 2016 12:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants