Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

one for AbstractString #19536

Closed
bdeonovic opened this issue Dec 8, 2016 · 16 comments
Closed

one for AbstractString #19536

bdeonovic opened this issue Dec 8, 2016 · 16 comments
Labels
help wanted Indicates that a maintainer wants help on an issue or pull request strings "Strings!"

Comments

@bdeonovic
Copy link
Contributor

I noticed you can do something like

julia> prod(["*" for k in 1:3])
"***"

but you might get an error if you try

julia> prod(["*" for k in 1:0])
ERROR: MethodError: no method matching one(::Type{String})

Why was multiplication decided to be the operator on strings which appends them together? Wouldn't addition make more sense?

What do you guys think of introducing

one(String) = ""
zero(String) = ""

since "zero" is often defined as a + 0 = a, and "one" is defined as a * 1 = 1. This zero then makes sense for the way multiplication is defined for Strings

@stevengj stevengj added the strings "Strings!" label Dec 8, 2016
@stevengj
Copy link
Member

stevengj commented Dec 8, 2016

Multiplication is the string concatenation operator because concatenation is not commutative, whereas + usually denotes a commutative operator in mathematics. This has been discussed many times, e.g. in #11030.

Given that, defining one{T<:AbstractString}(::T) = T("") makes sense to me, since this is the identity element for string concatenation, and one is defined as the multiplicative identity.

Defining zero (the additive identity), however, makes no sense here since we don't define + for strings.

@oscardssmith
Copy link
Member

Isn't multiplication usually commutative also?

@KristofferC
Copy link
Sponsor Member

Often not.

@bdeonovic
Copy link
Contributor Author

whereas + usually denotes a commutative operator in mathematics.

Addition doesn't have to be commutative either. Multiplication can be commutative. This just seems confusing to me.

After reading through some of #11030 I can see arguments for not having any kind of operator that does string concatenation, but I don't see why * is chosen over +.

But if one of these is included in the language I think there should be an appropriate zero and one.

@KristofferC
Copy link
Sponsor Member

KristofferC commented Dec 8, 2016

The discussion about using * or + has already been had. For future discussion to be productive, let's focus on the part:

What do you guys think of introducing

one(String) = ""
zero(String) = ""

Thanks.

@oxinabox
Copy link
Contributor

oxinabox commented Dec 9, 2016

our Strings should be the Free Monoid over all characters.
As a Free Monoid, they should have defined their identity element -- the empty string.
And since we define our monoid operation using the * symbol, we should term that identity the one.
Thus one(String) = "" is a good idea, and also one(::Type{String})

Should it further one{S<:AbstractString}(::Type{S}) = convert(S, "") be defined? (as @stevengj suggests)
Like for Numbers
I'm less sure on that.

Maybe it is more natural to have one(r::RepString) = RepString(r.string, 0)
and for Substring to be a zero length substring of that String.
I don't see why this would matter though.
So probably defining it for AbstractString would be fine


zero(String) on the other hand should be left undefined.
If we went and defined something using the + symbol,
then defining zero to be that identity would be expected.
As we are not defining such a operation, it is left free for some user (/package) to do so.
As such when they do so, they should also be defining zero(::String) (not Base)

@StefanKarpinski
Copy link
Sponsor Member

Let's go ahead and do this since we're still using * for string concatenation. If we move away from that, we'll naturally also move away from this at the same time – this will hardly be the most painful part of that transition.

@StefanKarpinski StefanKarpinski added the help wanted Indicates that a maintainer wants help on an issue or pull request label Dec 9, 2016
@StefanKarpinski StefanKarpinski changed the title zero and one for String one` for AbstractString Dec 9, 2016
@StefanKarpinski StefanKarpinski changed the title one` for AbstractString one for AbstractString Dec 9, 2016
@bdeonovic
Copy link
Contributor Author

What is the appropriate place in Base for such a definition?

@tkelman
Copy link
Contributor

tkelman commented Dec 9, 2016

This seems like a pun and more often than not sending one a string in generic code will be a bug rather than appropriate to return an empty string as the desired behavior. Particularly because Strings occupy a strange place where they behave sometimes like collections and sometimes like scalars. one strikes me as a mostly numeric concept. If we were ever to deprecate * for string concatenation as has been floated, this would need to be deprecated alongside it.

@StefanKarpinski
Copy link
Sponsor Member

If we were ever to deprecate * for string concatenation as has been floated, this would need to be deprecated alongside it.

Right, that's my point. This makes sense as long as we're using * and if we deprecate that, then deprecating one for String will hardly be a big deal.

bdeonovic added a commit to bdeonovic/julia that referenced this issue Dec 9, 2016
@nalimilan
Copy link
Member

I agree this makes it even clearer that we should move away from * for string concatenation. But for now better be consistent and implement one.

@andreasnoack
Copy link
Member

Fixed by #19548

@alanedelman
Copy link
Contributor

I actually had a use case just today for "one" being ""
I wanted to * reduce an array of strings, and was disappointed it would not work.

@bdeonovic
Copy link
Contributor Author

The PR has been merged into master so if you check out master branch it should work out for you. Otherwise you'll have to wait for next release.

@KristofferC
Copy link
Sponsor Member

Writing them one by one to an IOBuffer should be much more efficient I think (although perhaps less convenient).

@stevengj
Copy link
Member

You can just call join to efficiently concatenate an array of strings.

julia> a = [randstring(10) for i = 1:1000];

julia> using BenchmarkTools

julia> @benchmark join($a)
BenchmarkTools.Trial: 
  memory estimate:  13.13 kb
  allocs estimate:  16
  --------------
  minimum time:     54.150 μs (0.00% GC)
  median time:      56.809 μs (0.00% GC)
  mean time:        57.813 μs (0.28% GC)
  maximum time:     1.708 ms (94.77% GC)
  --------------
  samples:          10000
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

julia> @benchmark prod($a)
BenchmarkTools.Trial: 
  memory estimate:  4.87 mb
  allocs estimate:  999
  --------------
  minimum time:     620.688 μs (0.00% GC)
  median time:      1.592 ms (0.00% GC)
  mean time:        1.855 ms (29.36% GC)
  maximum time:     6.105 ms (82.00% GC)
  --------------
  samples:          2684
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Indicates that a maintainer wants help on an issue or pull request strings "Strings!"
Projects
None yet
Development

No branches or pull requests

10 participants