add Cstring/Cwstring types for safe passing of NUL-terminated strings to ccall #10994
# C NUL-terminated string pointers; these can be used in ccall
# instead of Ptr{UInt8} and Ptr{Cwchar_t}, respectively, to enforce
# a check for embedded NUL chars in the string (to avoid silent truncation).
immutable Cstring; p::Ptr{UInt8}; end
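A minimal sketch of the silent truncation this check prevents. It is written for current Julia (1.x), where `Cstring` ships in Base, rather than the 0.4-era code above. It assumes the C library exports `strlen` to `ccall`, as on typical Linux and macOS builds. The helper names `c_len_unchecked` and `c_len_checked` are hypothetical.

```julia
# A string with an embedded NUL: Julia counts all 7 bytes.
s = "abc\0def"

# Declared as Ptr{UInt8}, the pointer is passed unchecked, so C's strlen
# stops at the first NUL and silently sees only "abc".
c_len_unchecked(str) = ccall(:strlen, Csize_t, (Ptr{UInt8},), str)

# Declared as Cstring, the conversion checks for embedded NULs first and
# throws an ArgumentError instead of truncating.
c_len_checked(str) = ccall(:strlen, Csize_t, (Cstring,), str)

println(sizeof(s))            # 7
println(c_len_unchecked(s))   # 3
try
    c_len_checked(s)
catch err
    println(typeof(err))      # ArgumentError
end
```

Ordinary strings without embedded NULs behave identically under both declarations. Only the failure mode changes, from a silently shortened string to an error.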
Should this be `Cchar`, although I'm not familiar with any systems where `Cchar != UInt8`, and we're likely to be broken for many reasons on any such system?
Eh, that's the least convincing argument ever, pls ignore.
`sizeof(char)` is always one.
that was an amusing read.
we have had some signedness-of-char issues with unicode on XP in the past
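The size and signedness points above can be checked directly in current Julia (1.x). `Cchar` is always one byte. Whether it is `Int8` or `UInt8` depends on the platform, so a byte above `0x7f` reads back with a different numeric value through a signed `char`. The helper `as_cchar` is hypothetical.

```julia
# C guarantees sizeof(char) == 1, and Julia's Cchar mirrors that.
@assert sizeof(Cchar) == 1

# Whether plain `char` is signed is implementation-defined, so Cchar is
# Int8 on some platforms and UInt8 on others.
println("Cchar = ", Cchar, ", signed: ", Cchar <: Signed)

# Hypothetical helper: reinterpret a raw byte as C's `char` would hold it.
as_cchar(b::UInt8) = reinterpret(Cchar, b)

# 0xc3 is the first UTF-8 byte of "é"; its numeric value depends on signedness.
b = codeunit("é", 1)
println(Int(reinterpret(Int8, b)))   # -61: the value a signed char holds
println(Int(reinterpret(UInt8, b)))  # 195: the value an unsigned char holds
println(Int(as_cchar(b)))            # whichever of the two this platform uses
```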
ade3444 to df1aa34
Along the way, I noticed (and fixed + test) an unrelated bug where
Yes, it should be Ptr{Cchar}, not Ptr{UInt8}, if you really want to be correct...
This is great, @stevengj!
@tkelman, any idea why I would be getting an undefined-var error on Windows for
@StevenG There isn't anything in Julia that would look at whether Cchar is signed or unsigned on the platform, when dereferencing a Ptr{Cchar} from C? (I know, my newbieness with Julia is showing! 😀)
@ScottPJones, we are just talking about the argument type declared in Not that it would hurt to change the declarations to
if findfirst(s.data, 0) != length(s.data)
    throw(ArgumentError("embedded NUL chars are not allowed in C strings"))
end
return Cwstring(unsafe_convert(Ptr{wchar_t}, s))
return Cwstring(unsafe_convert(Ptr{Cwchar_t}, s))
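The check in the diff above can be sketched as a hypothetical pure-Julia helper in current Julia (1.x). It transcodes to the platform's `wchar_t`, appends the terminator, and accepts the result only if the first zero is that terminator. The function name is an assumption, not an API from this PR.

```julia
# Hypothetical helper mirroring the diff's logic: after appending the
# terminator, the first NUL must be the last element, otherwise the
# string contained an embedded NUL that C would treat as the end.
function to_nul_terminated_wchars(s::AbstractString)
    data = transcode(Cwchar_t, String(s))   # UTF-32 or UTF-16, per platform
    push!(data, 0)                          # the C terminator
    if findfirst(iszero, data) != length(data)
        throw(ArgumentError("embedded NUL chars are not allowed in C strings"))
    end
    return data
end

w = to_nul_terminated_wchars("héllo")
println(length(w))   # 6: five characters plus the terminator
```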
7841275 to 71ac26c
# instead of Ptr{Cchar} and Ptr{Cwchar_t}, respectively, to enforce
# a check for embedded NUL chars in the string (to avoid silent truncation).
immutable Cstring; p::Ptr{Cchar}; end
immutable Cwstring; p::Ptr{Cwchar_t}; end
This isn't consistent with some platform ABIs, unfortunately, and will break `ccall` on some platforms. It must be declared `bitstype WORD_SIZE Cstring`. I realize that creating new bits types is quite opaque currently and should be better documented. One extremely limited example is
Lines 2255 to 2257 in 54d8cc7
bitstype 24 Int24
Int24(x::Int) = Intrinsics.box(Int24,Intrinsics.trunc_int(Int24,Intrinsics.unbox(Int,x)))
Int(x::Int24) = Intrinsics.box(Int,Intrinsics.zext_int(Int,Intrinsics.unbox(Int24,x)))
pointer.jl is a more complex example (https://github.com/JuliaLang/julia/blob/master/base/pointer.jl)
(FWIW, and IIRC, the only place it will actually matter, and thus segfault, is if this is used as a return type on x86 Linux.)
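To see the distinction drawn above, the following sketch in current Julia (1.x) compares a struct that wraps a pointer with pointer-sized primitive types. Both have the same size, but only the primitive types are scalars. Per the comment above, an aggregate can be passed or returned differently under some ABIs, for example as a return type on x86 Linux. `WrappedPtr` is a hypothetical name.

```julia
# A struct wrapping a pointer: same size and bits as the pointer, but an
# aggregate as far as the C calling convention is concerned.
struct WrappedPtr
    p::Ptr{UInt8}
end

# Ptr itself, and Base's Cstring in current Julia, are primitive types,
# i.e. scalars that ccall passes and returns like a plain pointer.
for T in (Ptr{UInt8}, Cstring, WrappedPtr)
    println(rpad(string(T), 12), " sizeof = ", sizeof(T),
            "  primitive = ", Base.isprimitivetype(T),
            "  struct = ", Base.isstructtype(T))
end
```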
Good to know. Unfortunately, `WORD_SIZE` is defined in `int.jl`, but I need it in `c.jl`, which is included before `int.jl` in `sysimg.jl`. Any suggestions? I guess I can move the `WORD_SIZE` definition (`Ptr.size*8`) to `base.jl`.
No, I'm getting `ErrorException("invalid number of bits in type Cstring")`... does the 2nd argument to `bitstype` have to be a numeric literal?
It won't let me define a `bitstype` inside an `if WORD_SIZE === 64` statement either; it says `type definition not allowed inside a local scope`, which is confusing since normally `if` does not start a new scope. (Hmm, later on it seems to work; I'm not sure what I was doing wrong.)
I am getting pretty frustrated with `bitstype`.
Okay, I have it working as a `bitstype` if I hard-code the size as 64. I'm trying to figure out how to use `WORD_SIZE`, but it is tricky because a lot of things aren't defined yet at this point in the bootstrap process.
For example,
if Ptr.size === 8
    bitstype 64 Cstring
    bitstype 64 Cwstring
else
    bitstype 32 Cstring
    bitstype 32 Cwstring
end
doesn't work for some reason (I get a `box: argument is of incorrect size` error later in the build process).
... aha, the problem is that `.size` is an `Int32`.
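On a 64-bit build the literal `8` is an `Int64`, so `Ptr.size === 8` is false when `.size` is an `Int32`: `===` compares type as well as value, while `==` compares numeric values across types. A minimal illustration in current Julia, where `size_field` is a hypothetical stand-in for the `Int32` field:

```julia
size_field = Int32(8)              # stands in for the Int32-typed `.size`

println(size_field == Int64(8))    # true: == compares values across types
println(size_field === Int64(8))   # false: === also requires the same type
println(size_field === Int32(8))   # true
println(Int(size_field) === 8)     # true: Int matches the type of the literal 8
```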
And
if box(Int, unbox(Int32, Int.size)) === 8
    bitstype 64 Cstring
    bitstype 64 Cwstring
else
    bitstype 32 Cstring
    bitstype 32 Cwstring
end
is giving the mysterious `type definition not allowed inside a local scope` error.
Okay,
const sizeof_Int = box(UInt32, unbox(Int32, Int.size))
if sizeof_Int === 0x00000008 # need === since == doesn't exist yet
    bitstype 64 Cstring
    bitstype 64 Cwstring
else
    bitstype 32 Cstring
    bitstype 32 Cwstring
end
seems to work, though why I need the `const` declaration to avoid `type definition not allowed inside a local scope` is mysterious to me.
With this commit, I think you can also get rid of the following check in the `SubString`-to-`Ptr` conversion, which is equivalent to this test for an embedded NUL:
Lines 650 to 652 in 54d8cc7
@vtjnash, no, that test is not equivalent. The point of that test is that if you have a substring whose end does not coincide with the end of the string, then the substring is not NUL-terminated. This is true regardless of whether there are embedded NULs anywhere. Or maybe your point is that people will stop using
It's equivalent in the sense that it verifies whether the `SubString` can be used as a C-style NUL-terminated string. But yes, I don't see any way of making this transition, since it leaves the existing common case unsafe while adding an alternative safe method.
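The distinction can be made concrete with a hypothetical helper in current Julia (1.x). It asks only whether a `SubString{String}` ends where its parent ends and never looks for embedded NULs, which is why it is a different test from the embedded-NUL check. It relies on the internal `offset` and `string` fields of `SubString`, an assumption that may not hold across Julia versions.

```julia
# Hypothetical helper: a SubString{String} can be handed to C as a
# NUL-terminated pointer only if it ends exactly where its parent ends,
# because only the parent's last byte is followed by the terminator.
# (`s.string` and `s.offset` are internal fields of SubString.)
ends_at_parent_end(s::SubString{String}) =
    s.offset + ncodeunits(s) == ncodeunits(s.string)

parent_str = "hello world"
head = SubString(parent_str, 1, 5)    # "hello"
tail = SubString(parent_str, 7, 11)   # "world"

println(ends_at_parent_end(head))     # false: followed by " world", not NUL
println(ends_at_parent_end(tail))     # true
```

For the `head` case, copying with `String(head)` produces a fresh, properly terminated string, at the cost of an allocation.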
# instead of Ptr{Cchar} and Ptr{Cwchar_t}, respectively, to enforce
# a check for embedded NUL chars in the string (to avoid silent truncation).
const sizeof_Int = box(UInt32, unbox(Int32, Int.size))
if sizeof_Int === 0x00000008 # need === since == doesn't exist yet
Wow, this is low-level bit twiddling. The way `boot.jl` does this is probably cleaner (`if is(Int,Int64) ... end`).
Duh, yes, that would be cleaner...
We might as well delete the
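The `is(Int,Int64)` approach suggested above, sketched in current Julia syntax: `===` replaces `is` and `primitive type` replaces `bitstype`. `MyCstring` is a hypothetical name chosen so it does not clash with Base's own `Cstring`. The definition has to sit at top level, since type definitions are not allowed inside a local scope.

```julia
# Branch on the identity of Int instead of bit-twiddling Int.size; a
# top-level `if` does not introduce a scope, so type definitions are allowed.
if Int === Int64
    primitive type MyCstring 64 end
else
    primitive type MyCstring 32 end
end

println(sizeof(MyCstring) == sizeof(Ptr{Cvoid}))   # true
```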
874e63c to f53d861
I can reproduce the file test giving
Thanks @tkelman, what is the file that is in the directory?
Does this work safely for non-ByteString strings? I.e., suppose we write ccall(:func_that_takes_a_cstring, Cint, (Cstring,), str) but
Yes, in v0.4
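A sketch of the answer in current Julia (1.x) terms. `strlen` stands in for the hypothetical `func_that_takes_a_cstring`, assuming the C library exports it to `ccall`. When the argument type is `Cstring`, a non-`String` `AbstractString` such as this `SubString` is converted to a `String` before the call. C therefore receives a properly terminated copy rather than a pointer into the middle of the parent string.

```julia
# A SubString that is *not* NUL-terminated in memory: the bytes after
# "hello" in the parent are " world".
str = SubString("hello world", 1, 5)

# With a Cstring argument, ccall converts the AbstractString to a fresh
# String (which carries its own terminator), so C sees exactly "hello".
n = ccall(:strlen, Csize_t, (Cstring,), str)
println(n)   # 5
```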
That's what I thought, just wanted to make sure. In that case I'm cool with this. @JeffBezanson's opinion would be good though. I have a feeling he might have one (he's at LL today, so it may take a bit).
This isn't merged onto master yet and I want to tag 0.3.8 today, so the sendfile bugfix can probably wait until 0.3.9.
@tkelman, note that there are two bugfix patches in this PR: one to
Thanks, noted, but
Oh, right.
FWIW, this seems like a good way to help avoid zero-day exploits such as those described in http://lucumr.pocoo.org/2010/12/24/common-mistakes-as-web-developer/
@JeffBezanson said it was okay to merge.
I'm not sure exactly why, but this seems to cause an MSVC build to freeze while starting the second stage of bootstrap - no
Backported the sendfile bugfix in f5e0074.
Whatever my bootstrapping problem with an MSVC build was seems to have gone away. Now just hitting the segfault from 8b8b261...
This adds two new types, `Cstring` and `Cwstring`, which act just like `Ptr{UInt8}` and `Ptr{Cwchar_t}` as arguments to `ccall`, except that they throw an error if the string contains an embedded NUL character. I went through the Julia code and used these types wherever they seemed necessary.
This is necessary for safe passing of strings to C routines expecting NUL-terminated strings, to avoid silent truncation. See #10958, #10991.
Note that you can avoid the test, for strings known not to contain NUL, simply by using `Ptr{UInt8}` as before.
Note also that I changed several of our own C API routines to accept a `char*` and a `size_t` length parameter, rather than assuming NUL termination.
Still missing:
- `Cmd` strings (command-line params)
- `Cstring` to a `bitstype`
cc: @JeffBezanson, @StefanKarpinski, @ScottPJones, @jiahao
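The `char*` plus `size_t` convention described in the PR description sidesteps the problem entirely, because the callee never searches for a terminator. The following hypothetical helper is written for current Julia (1.x) and assumes `memchr` is reachable through `ccall`. It passes exactly `sizeof(s)` bytes, so an embedded NUL is treated as data rather than as an end marker.

```julia
# Hypothetical helper in the pointer-plus-length style: memchr scans
# exactly sizeof(s) bytes, never relying on (or stopping at) a terminator,
# and returns C_NULL when no zero byte is found in that range.
function has_embedded_nul(s::String)
    p = ccall(:memchr, Ptr{Cvoid}, (Ptr{UInt8}, Cint, Csize_t), s, 0, sizeof(s))
    return p != C_NULL
end

println(has_embedded_nul("abc"))       # false
println(has_embedded_nul("abc\0def"))  # true
```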