Add Char * String and Char * Char #22532

adamslc · 2017-06-25T16:46:38Z

Solves #22512. I am definitely a GitHub novice, so let me know if I made a stupid mistake somewhere...

KristofferC · 2017-06-25T16:53:30Z

Perhaps you could also add some tests of the kind 'a' * "b" * 'c' etc?
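A minimal sketch of such tests follows. It assumes Julia 1.x, where these `*` methods are part of Base, and the `Test` standard library. The test names and cases are illustrative, not the ones committed in this PR. Note that `'a' * "b" * 'c'` parses as a single three-argument call to `*`, so it exercises the vararg method rather than two binary calls.

```julia
using Test

# Assumes Julia 1.x, where the Char/String methods of `*` are defined in Base.
@testset "Char and String concatenation with *" begin
    @test 'a' * 'b' == "ab"           # Char * Char
    @test 'a' * "bc" == "abc"         # Char * String
    @test "ab" * 'c' == "abc"         # String * Char
    @test 'a' * "b" * 'c' == "abc"    # parsed as one three-argument call to *
    @test "a" * 'b' * 'c' == "abc"
    @test ('a' * 'b') isa String      # the result is always a String
end
```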


adamslc · 2017-06-25T17:32:13Z

I've added a few more tests. I tried to squash the extra commit, but I just made a horrible mess. How should I do that?

KristofferC · 2017-06-25T17:33:44Z

It's OK, it can be squashed on merge with the GitHub UI.

pabloferz · 2017-06-25T20:06:44Z

base/strings/basic.jl

@@ -68,6 +68,9 @@ julia> "Hello " * "world"
 ```
 """
 (*)(s1::AbstractString, ss::AbstractString...) = string(s1, ss...)
+(*)(c::Char, s::AbstractString) = string(c, s)
+(*)(s::AbstractString, c::Char) = string(s, c)
+(*)(c1::Char, c2::Char) = string(c1, c2)


We might subsume all these with

(*)(s::AbstractString, ss::Union{Char,AbstractString}...) = string(s, ss...)
(*)(c::Char, ss::Union{Char,AbstractString}...) = string(c, ss...)

and take advantage of the string(::Union{Char,String}...) method.

I reduced it further to one method (see next commit)
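The single-method form can be sketched as follows. The function is named `concat` here, a hypothetical stand-in for `Base.:*`, so that the sketch does not redefine the method Base already provides. The `CharOrString` alias is likewise only for illustration.

```julia
# Hypothetical alias for the accepted argument types.
const CharOrString = Union{Char,AbstractString}

# One vararg method covers Char * String, String * Char, Char * Char and
# longer chains; it simply forwards all arguments to `string`.
concat(s::CharOrString, ss::CharOrString...) = string(s, ss...)

concat('a', 'b')          # → "ab"
concat('a', "b", 'c')     # → "abc"
concat("a", 'b', "cd")    # → "abcd"
```

Because `string` already accepts any mix of `Char` and `String` arguments, the method body needs no special cases.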

andyferris · 2017-06-25T23:20:58Z

~~I was wondering about concatenating characters to make this operation more complete, e.g. char * char and things like 'a' * 'b' * "c" vs "a" * 'b' * 'c'.~~

Sorry I misread :)

andyferris · 2017-06-25T23:21:02Z

PS ++ rules 😜

tkelman · 2017-06-26T01:37:10Z

The docstring should be updated here: it isn't formatted correctly on master. Signatures should be indented, not backtick-fenced.

tkelman · 2017-06-26T02:01:32Z

base/strings/basic.jl

-```
-*(s::AbstractString, t::AbstractString)
-```
+    *(s::Union{Char, AbstractString}, t::Union{Char, AbstractString})


That is, with `...` on the second input.

adamslc · 2017-06-26T16:03:45Z

Does this need anything else?

pabloferz · 2017-06-26T16:12:40Z

There was a timeout on Linux 32-bit, so I restarted the build (the log is backed up in a gist here: https://gist.github.com/pabloferz/4cb263e0967ba9a5c3256bc53d3619ee)

fredrikekre · 2017-06-26T18:47:35Z

base/strings/basic.jl


-Concatenate strings. The `*` operator is an alias to this function.
+Concatenate strings and characters. The `*` operator is an alias to this function.


"[...] and characters to a [`String`](@ref)." perhaps?

Also, this is the `*` function; it's an alias for itself!?

I would make it like this:

"""
    *(s::Union{AbstractString, Char}, t::Union{AbstractString, Char})

Concatenate strings and/or characters, producing a [`String`](@ref). This is equivalent
to calling the [`string`](@ref) function on the arguments.
"""

tkelman · 2017-06-30T05:28:55Z

base/strings/basic.jl

+Concatenate strings and/or characters, producing a [`String`](@ref). This is equivalent
+to calling the [`string`](@ref) function on the arguments.
+
+# Examples



delete the blank line

I thought we had been putting blank lines between the headers and contents in docstrings?

Ah, I guess not.

We've recently been moving towards getting rid of them everywhere.

K, thanks for the heads up
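For reference, here is a sketch of a docstring following the conventions discussed in this review: an indented signature (with `...` on the vararg), no space after the comma in `Union`, and no blank line after the `# Examples` header. It documents a hypothetical `concat` function rather than the real `*`, and it omits Documenter's `@ref` cross-reference links.

````julia
# Hypothetical function; the docstring layout is the point of this sketch.
"""
    concat(s::Union{AbstractString,Char}, t::Union{AbstractString,Char}...)

Concatenate strings and/or characters, producing a `String`. This is equivalent
to calling the `string` function on the arguments.

# Examples
```jldoctest
julia> concat('a', "bc", 'd')
"abcd"
```
"""
concat(s::Union{AbstractString,Char}, t::Union{AbstractString,Char}...) = string(s, t...)
````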

musm · 2017-06-30T06:29:25Z

base/strings/basic.jl

 ```
 """
-(*)(s1::AbstractString, ss::AbstractString...) = string(s1, ss...)
+(*)(s1::Union{Char, AbstractString}, ss::Union{Char, AbstractString}...) = string(s1, ss...)


Aside from some of the linalg code, there typically isn't a space after the comma in `Union`. It's probably best to stay consistent within this file and others like it.

StefanKarpinski · 2017-07-11T15:32:07Z

Unless there are objections, I plan to merge this in 24 hours.

pabloferz · 2017-07-11T15:37:11Z

base/strings/basic.jl

@@ -56,9 +56,12 @@ sizeof(s::AbstractString) = error("type $(typeof(s)) has no canonical binary rep
 eltype(::Type{<:AbstractString}) = Char

 """
-    *(s::Union{Char, AbstractString}, t::Union{Char, AbstractString}...)
+    *(s::Union{AbstractString, Char}, t::Union{AbstractString, Char})


Why were the three dots removed?

ararslan · 2017-07-11T18:17:52Z

Once the ... is added back to the docstring, LGTM.

KristofferC · 2017-07-11T18:39:37Z

Just add it back (with collaborator access to the branch) and merge?

[ci skip]

ararslan · 2017-07-11T18:46:20Z

Good idea, @KristofferC. Done.

ararslan · 2017-07-11T18:47:29Z

Thanks for the contribution, @adamslc! Nice work here.


stevengj · 2017-07-12T01:14:48Z

Two problems:

1. Since it defines `Char * Char = String`, it should also define `one(::Type{Char}) = ""`.
2. It should probably have a specialized `Char^Integer` method similar to the `String^Integer` method.

ararslan · 2017-07-12T01:16:59Z

Regarding `one`, isn't it typically assumed that `one(::Type{T})` has type `T`? Defining that method for `Char` would break that, and AFAIK it would be the only exception. (There may be others that I don't know about, though.)

stevengj · 2017-07-12T01:20:33Z

@ararslan, no, that is not correct. E.g., `one` for a dimensionful quantity returns a different type. (If you want the same type, you call `oneunit`, which isn't defined here.)

On the other hand, it is true that `""` isn't really a multiplicative identity for `Char`, since `"" * 'x' == "x"`, not `'x'`. That makes me think we shouldn't define `one` after all.
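A small illustration of that point, assuming Julia 1.x semantics: concatenating the empty string with a `Char` always yields a `String`, so `""` cannot play the role of `one` for `Char` values.

```julia
# Concatenating with "" turns the Char into a String.
x = "" * 'x'

x isa String   # → true
x == "x"       # → true
x == 'x'       # → false: a String never compares equal to a Char
```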

stevengj · 2017-07-12T01:22:45Z

We should definitely have a specialized ^, however. The default one isn't type-stable for Char and is grossly inefficient for this type anyway.
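The type instability can be illustrated with a deliberately naive power function, `naive_pow`. It is hypothetical and is not Base's actual fallback for `^`. It seeds an accumulator with the `Char` itself and multiplies repeatedly, so the result is a `Char` for `n == 1` and a `String` otherwise. It also allocates a fresh intermediate `String` on every iteration, which hints at the inefficiency.

```julia
# Hypothetical naive power: repeated `*`, seeded with the Char itself.
function naive_pow(c::Char, n::Integer)
    n >= 1 || throw(ArgumentError("n must be at least 1, got $n"))
    acc = c               # a Char on entry...
    for _ in 2:n
        acc *= c          # ...but a new String after each multiplication
    end
    return acc            # inferred as Union{Char, String}: not type-stable
end

naive_pow('a', 1)   # → 'a' (a Char)
naive_pow('a', 3)   # → "aaa" (a String)
```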

musm · 2017-07-12T17:53:10Z

@stevengj

The following specialized version of repeat, which ^ calls, seems to work fine.

function repeat(s::Char, r::Integer)
    r < 0 && throw(ArgumentError("can't repeat a char $r times"))
    out = _string_n(r)
    ccall(:memset, Ptr{Void}, (Ptr{UInt8}, Cint, Csize_t), out, s, r)
    return out
end

There isn't much speed improvement over repeat(s::String,r::Integer)

stevengj · 2017-07-12T18:03:28Z

(That only works for `isascii(s)`. For non-ASCII I would just call `string(s)^r` as a fallback.)

musm · 2017-07-12T18:55:31Z

Calling `repeat(string(s), 3)` would allocate, making it about twice as slow.

stevengj · 2017-07-12T19:12:19Z

@musm, I understand that, but since char^integer and string^integer are almost exclusively used for ASCII chars (mainly for repeating spaces), I think it is fine to optimize mainly the ASCII case of char^integer and leave the non-ASCII case to a slower fallback for now.
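A sketch of that approach in current Julia, under stated assumptions. `repeat_char` is a hypothetical name, chosen so that nothing in Base is redefined. Filling a `Vector{UInt8}` and wrapping it with `String` stands in for the `_string_n`/`memset` combination used above, which relies on Base internals and the pre-1.0 `Ptr{Void}` type.

```julia
# Hypothetical function: ASCII fast path plus a slower non-ASCII fallback.
function repeat_char(c::Char, r::Integer)
    r < 0 && throw(ArgumentError("can't repeat a character $r times"))
    if isascii(c)
        # An ASCII Char is exactly one byte in UTF-8, so fill a byte buffer
        # and wrap it as a String without any re-encoding.
        return String(fill(UInt8(c), r))
    else
        # Non-ASCII: repeat the one-character String instead.
        return repeat(string(c), r)
    end
end

repeat_char(' ', 4)   # → "    "
repeat_char('é', 3)   # → "ééé"
```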

StefanKarpinski · 2017-07-12T19:39:32Z

To implement an efficient character repeating operator, it's sufficient to figure out what 1-4 byte pattern the character produces in UTF-8 and then copy that as many times as the character needs to be repeated. Not entirely straightforward, but not crazy to implement either.
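A pure-Julia sketch of this idea, with no pointer tricks. The hypothetical `utf8_pattern` computes the 1-4 UTF-8 bytes from the code point using the standard UTF-8 bit layout. The equally hypothetical `repeat_utf8` then copies that pattern `r` times into one buffer. It is written for clarity rather than speed and is not the implementation from the follow-up PR.

```julia
# Hypothetical helper: the 1-4 byte UTF-8 pattern for a Char, built from its
# code point with the standard UTF-8 bit layout.
function utf8_pattern(c::Char)
    u = UInt32(c)  # code point; throws for malformed Chars
    if u < 0x80
        return UInt8[u]
    elseif u < 0x800
        return UInt8[0xc0 | (u >> 6), 0x80 | (u & 0x3f)]
    elseif u < 0x10000
        return UInt8[0xe0 | (u >> 12), 0x80 | ((u >> 6) & 0x3f), 0x80 | (u & 0x3f)]
    else
        return UInt8[0xf0 | (u >> 18), 0x80 | ((u >> 12) & 0x3f),
                     0x80 | ((u >> 6) & 0x3f), 0x80 | (u & 0x3f)]
    end
end

# Hypothetical repeat: copy the pattern r times into one buffer, then wrap it.
function repeat_utf8(c::Char, r::Integer)
    r < 0 && throw(ArgumentError("can't repeat a character $r times"))
    pat = utf8_pattern(c)
    n = length(pat)
    buf = Vector{UInt8}(undef, n * r)
    for i in 0:r-1
        copyto!(buf, i * n + 1, pat, 1, n)   # copy n bytes at offset i*n
    end
    return String(buf)
end

repeat_utf8('€', 3)   # → "€€€"
```

Each repetition is a plain byte copy, so the cost is a small, fixed number of allocations plus `n * r` bytes written, whatever the width of the character.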

ararslan · 2017-07-12T19:43:04Z

Is there something relevant already implemented in utf8proc?

stevengj · 2017-07-12T20:11:16Z

Stefan, I know it's possible, but I don't think it is worth the trouble.

StefanKarpinski · 2017-07-12T20:39:56Z

Sure, can always be done as an optimization in the future some time.


musm · 2017-09-20T04:00:17Z

A while back, @ScottPJones sent me the following version, which does not allocate. I don't think he has had the time to open a PR from his branch, so I am posting it here in the hope that someone opens a PR with the change.

function repeat(c::Char, r::Integer)
    r < 0 && throw(ArgumentError("can't repeat a character $r times"))
    r == 0 && return ""
    ch = UInt(c)
    if ch < 0x80
        out = Base._string_n(r)
        ccall(:memset, Ptr{Void}, (Ptr{UInt8}, Cint, Csize_t), out, c, r)
    elseif ch < 0x800
        out = _string_n(2r)
        p16 = reinterpret(Ptr{UInt16}, pointer(out))
        u16 = ((ch >> 0x6) | (ch & 0x3f) << 0x8) % UInt16 | 0x80c0
        @inbounds for i = 1:r
            unsafe_store!(p16, u16, i)
        end
    elseif ch < 0x10000
        0xd800 ≤ ch ≤ 0xdfff && throw(ArgumentError("invalid character 0x$(hex(ch))"))
        out = _string_n(3r)
        p = pointer(out)
        b1 = (ch >> 0xc) % UInt8 | 0xe0
        b2 = ((ch >> 0x6) & 0x3f) % UInt8 | 0x80
        b3 = (ch & 0x3f) % UInt8 | 0x80
        @inbounds for i = 1:r
            unsafe_store!(p, b1)
            unsafe_store!(p, b2, 2)
            unsafe_store!(p, b3, 3)
            p += 3
        end
    elseif ch < 0x110000
        out = _string_n(4r)
        p32 = reinterpret(Ptr{UInt32}, pointer(out))
        u32 = ((ch >> 0x12) | ((ch >> 0x4) & 0x03f00) |
            ((ch << 0xa) & 0x3f0000) | ((ch & 0x3f) << 0x18)) % UInt32 | 0x808080f0
        @inbounds for i = 1:r
            unsafe_store!(p32, u32)
            p32 += 4
        end
    else
        throw(ArgumentError("invalid character 0x$(hex(ch))"))
    end
    return out
end

StefanKarpinski · 2017-09-20T05:13:36Z

Thanks, PR: #23787.

added char concat methods

c20972c

kshyatt added the strings "Strings!" label Jun 25, 2017

Luke Adams added 2 commits June 25, 2017 11:22

added char concat methods

c6dec74

added extra tests

added more tests

7ce7e1a

pabloferz reviewed Jun 25, 2017


condense methods

7313f5d

update docstring

c377c83

tkelman reviewed Jun 26, 2017


add ...

f9dd68c

KristofferC approved these changes Jun 26, 2017


fredrikekre reviewed Jun 26, 2017


Minor docstring improvement

7d557d2

ararslan approved these changes Jun 30, 2017


tkelman reviewed Jun 30, 2017


ararslan added 2 commits June 29, 2017 22:34

Remove line

df56a8f

Merge branch 'master' into char_concat

2fbec25

musm reviewed Jun 30, 2017


pabloferz reviewed Jul 11, 2017


Add back the ...

a1577a1

[ci skip]

ararslan merged commit f98b857 into JuliaLang:master Jul 11, 2017

ararslan added a commit that referenced this pull request Jul 11, 2017

Add a NEWS entry for #22532: Char concatenation using *

8ee74a9

[ci skip]

musm mentioned this pull request Jul 12, 2017

Add specialized repeat function for Char #22785

Closed

ararslan added a commit that referenced this pull request Jul 15, 2017

Add a NEWS entry for #22532: Char concatenation using *

ee21d82

[ci skip]

ararslan added a commit that referenced this pull request Jul 15, 2017

Add a NEWS entry for #22532: Char concatenation using *

26ca36c

[ci skip]

ararslan added a commit that referenced this pull request Jul 15, 2017

Add a NEWS entry for #22532: Char concatenation using * (#22766)

0b2774e

jeffwong pushed a commit to jeffwong/julia that referenced this pull request Jul 24, 2017

Add Char * String and Char * Char (JuliaLang#22532)

8fcb668

jeffwong pushed a commit to jeffwong/julia that referenced this pull request Jul 24, 2017

Add a NEWS entry for JuliaLang#22532: Char concatenation using * (JuliaLang#22766)

7ac2d81

