-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds support for String(x::Symbol) = string(x) #18152
Conversation
would have been less noise to rebase the existing PR #18120, but this is fine. will have to squash out that unnecessary merge commit |
string or print_to_string ? |
@@ -34,6 +34,7 @@ If you need to subsequently modify `v`, use `String(copy(v))` instead. | |||
""" | |||
String(v::Array{UInt8,1}) | |||
|
|||
String(x::Symbol) = string(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be more efficient to do String(x::Symbol) = unsafe_string(Cstring(x))
, which copies directly from the underlying data. (Seems to be about 3x faster in a quick benchmark.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this convert(::Type{String}, x::Symbol)
for consistency? We already have convert(Symbol, ::String)
. It would be weird if Symbol(str)
and s::Symbol = str
works, and String(sym)
works, but x::String = sym
does not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Want to make a PR for that change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we should merge this first, and make another PR to replace it with convert
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer a single PR. Ideally @dbeach24 would update this PR, but he doesn't seem to be around so maybe we should fork his branch and push a new PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Been rather busy. I'll work on it later this weekend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks @dbeach24.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TotalVerb If speed is a goal here, seems like:
convert(::Type{String}, x::Symbol) = unsafe_wrap(String, Cstring(x))
Will perform even better, since that is a zero-copy conversion. unsafe_string
is unsafe because it reads from an arbitrary pointer, but it still performs a copy. In that sense, I suppose unsafe_wrap
is doubly unsafe, since it also assumes that the underlying memory will not be modified at a later time.
However, since Symbol
is an immutable type, I presume that unsafe_wrap
is okay here. (But wonder is there is any possibility of a memory management issue because of this...?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I know, Symbol
s are currently never freed, so this is safe. But I don't know if it's a good idea to depend on this behaviour. @stevengj
Various places in the source code that use
|
Darn it, I screwed up my copy-and-paste in the above comment, so there are a number of duplicate lines that were supposed to be other occurrences of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this change. But this should probably be a method of convert
(see comment).
a66b010
to
710561d
Compare
Just updated the PR with a new commit. |
@@ -62,6 +62,7 @@ convert(::Type{Array{UInt8}}, s::AbstractString) = String(s).data | |||
convert(::Type{String}, s::AbstractString) = String(s) | |||
convert(::Type{Vector{Char}}, s::AbstractString) = collect(s) | |||
convert(::Type{Symbol}, s::AbstractString) = Symbol(s) | |||
convert(::Type{String}, s::Symbol) = unsafe_wrap(String, Cstring(s)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very efficient conversion, but I need a ruling on the safety of doing this w.r.t. memory management of symbols.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would this mean the .data
field of the result of converting the same symbol to multiple different string objects would all point to the same data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They would point to the same data, but the .data
array would be different. They would be distinct .data
arrays that share the same underlying data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, yes, this would mean that all the strings have data that points to the same memory.
My suggestion is to use unsafe_string
here, not, unsafe_wrap
. I don't think the extra efficiency of not making a copy is worth it for the danger of being able to modify Symbol
contents through the data
field of a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stevengj I will follow your advice.
It seems that your view includes two kinds of immutability. e.g. Symbols are IMMUTABLE, while Strings are merely immutable... and we can't go trusting an immutable object to hold the definitive copy of IMMUTABLE data.
I am correctly understanding this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not like someone can't muck around with the Symbol
contents anyway using Cstring
directly, though. Mutating the data
field of a string should be a no-go unless you make the string yourself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we all agree that mutating "someone else's" String is a no-go, then I'd be more inclined to stick with the unsafe_wrap
version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's true that you're not supposed to mutate strings, and unsafe_wrap
should be safe for symbol data if people are following the rules, but I just wanted to reduce the temptation here. I'm fine either way, however.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's probably not enough of a performance difference to matter. I'll go with the "safer" unsafe version.
julia> symbols = [gensym() for i=1:10^6];
julia> f1(symbols) = for s in symbols unsafe_wrap(String, Cstring(s)) end
julia> f2(symbols) = for s in symbols unsafe_string(Cstring(s)) end
julia> @time f1(symbols)
0.040287 seconds (2.00 M allocations: 91.553 MB, 14.31% gc time)
julia> @time f2(symbols)
0.052429 seconds (2.00 M allocations: 106.812 MB, 15.83% gc time)
This is the underlying support for calling String(::Symbol). Bug fix for issue JuliaLang#16997
710561d
to
431961d
Compare
Restarted travis. Previous error was the broadcast issue. LGTM when CI pass. |
What is
anyone seen that before? |
Oh right, then I guess Kristoffer didn't restart all jobs. |
Looks like I only restarded one of them. Oops. Redid it. |
@KristofferC OKed this and CI passed so I'm going to merge. |
Bug fix for #16997
defines String(x::Symbol) = string(x)
(This one's against master!)