Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why String.split uses the delim input as set of chars instead of Regex pattern or string delim? #1399

Closed
enigma opened this issue Nov 6, 2016 · 14 comments
Labels
help wanted Extra attention is needed

Comments

@enigma
Copy link
Contributor

enigma commented Nov 6, 2016

Eg:

use "collections"
use "regex"

actor Main
  new create(env: Env) =>
    let base = "opinion"
    let delim = "pi"
    for i in base.split(delim).values() do
      env.out.write("actual[")
      env.out.write(i)
      env.out.print("]")
    end
    env.out.print("============")
    for i in alternative_split(base, delim).values() do
      env.out.write("expected[")
      env.out.write(i)
      env.out.print("]")
    end

  fun alternative_split(string: String, delim: String): Array[String] =>
    try
      return Regex(delim).split(string)
    end
    Array[String]

prints:

actual[o]
actual[]
actual[n]
actual[on]
============
expected[o]
expected[nion]

alternative_split is still not that great now as one has to escape regex special symbols. Also using Regex involves a lot of partial methods.
Given String.split takes a string I'd expect a behaviour more similar to what Python does:

>>> 'this is (not) true'.split(' (not) ')
['this is', 'true']

My understanding is that this doesn't matter for the default argument behaviour, ie split by whitespacy symbols.

Could the signature of split be:

fun split(delim: (String | None), n: USize = 0): Array[String] iso^ =>
   // if delim is None keep current behavior, splitting on any of " \t\v\f\r\n"

I think that could keep the default behaviour and make the specified one a bit less surprising (even though it's documented).
Also because of the influence Python had on me I think a common use case that might deserve to be the default is more similar to Regex("[\s\t\v\f\r\n]+").split. What do you think?

@jemc
Copy link
Member

jemc commented Nov 8, 2016

I haven't looked at this in detail yet, but it's possible we might want to consider this as a "principle of least surprise bug". I'm adding the "needs discussion during sync" label so we can discuss it at the next public weekly meeting.

@SeanTAllen
Copy link
Member

When this came up on IRC, I was very surprised. I consider this a bug.

@SeanTAllen
Copy link
Member

Discussed on sync. This is a "principle of least surprise bug".

@SeanTAllen
Copy link
Member

@enigma do you want to take this on? If yes, I can add you as a read only contributor so you can be assigned to the issue.

@enigma
Copy link
Contributor Author

enigma commented Nov 10, 2016

@SeanTAllen yes please!

@SeanTAllen
Copy link
Member

O wow. we can assign people who aren't team members now @jemc ?

@jemc
Copy link
Member

jemc commented Nov 10, 2016

@SeanTAllen - I assumed you had already added him.

But now that you mention it, I think maybe you can assign someone who is already participating in the ticket (or maybe the requirement is that they filed the ticket?).

@SeanTAllen
Copy link
Member

@jemc that's a pretty cool change. i don't think it used to be that way.

@SeanTAllen
Copy link
Member

@enigma shall i mark this as in progress?

@enigma
Copy link
Contributor Author

enigma commented Nov 13, 2016

@SeanTAllen Yes please. I don't think I can change any label.

@SeanTAllen
Copy link
Member

Hi @enigma,

Any progress?

Need any assistance?

@SeanTAllen
Copy link
Member

I'm unassigned @enigma and setting this back to ready for work.

@andersarpi
Copy link

Isn't the functionality requested here exactly what split_by does?

@SeanTAllen SeanTAllen added the help wanted Extra attention is needed label Nov 5, 2017
@enigma
Copy link
Contributor Author

enigma commented Nov 25, 2017

I didn't notice this was still open, my bad!

This should have been closed by f566596 - unless we want split_by to replace split for good.

@jemc jemc closed this as completed Nov 25, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

4 participants