Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WINDOWS: make env.jl and file.jl unicode-compliant for #4240 #7008

Closed
wants to merge 3 commits into from

Conversation

vtjnash
Copy link
Member

@vtjnash vtjnash commented May 28, 2014

No description provided.

// hidden 0 terminator for all byte arrays
tot++;
if (elsz <= sizeof(void*) && ndims == 1) {
// hidden 0 terminator for all small-element arrays
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JeffBezanson is this acceptable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No; sizeof(void*) is big enough to hold an Int64 or Float64, so this amounts to over-allocating essentially every array. Due to alignment this will actually add 16 bytes, which is a lot of overhead if you have lots of small arrays. The size of our array metadata (64 bytes) is already killer for lots of small objects.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please suggest another approach then

Perhaps ccall could push! a hidden zero on every array, and set a flag bit?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know; this is a very hard problem. Calling push! internally is a non-starter; it's very expensive, and ccall might not always have access to the array object itself. Maybe the UTF16String constructor can handle it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or adding a 0 if elsz <= 2?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why exactly do we need this?

@vtjnash vtjnash added this to the 0.3 milestone May 28, 2014
@vtjnash
Copy link
Member Author

vtjnash commented May 28, 2014

For strlen

If utf16string is going to handle this in Julia, then perhaps everything should. This really just feels too magical, and therefore problematic, since the user has so many avenues for making strings.

@JeffBezanson
Copy link
Member

We have to think in detail about what cases we can and can't handle. For example, if you allocate data in C and wrap it with pointer_to_array, there is nothing we can do short of copying it. The hack we use now is effective because passing julia-allocated byte arrays to C is extremely common. Without the if elsz==1 hack we would be totally hosed. The existence of ways to work around it does not matter; what matters is the 99% cases.

@stevengj
Copy link
Member

It seems reasonable to me to handle this in UTF16String, since the most important reason for that type is for Windows API calls.

@vtjnash vtjnash closed this Jun 5, 2014
@vtjnash vtjnash deleted the jn/W branch June 5, 2014 03:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
system:windows Affects only Windows unicode Related to unicode characters and encodings
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants