Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a TrimmedString.trim method #487

Closed
wants to merge 2 commits into from

Conversation

ceedubs
Copy link
Contributor

@ceedubs ceedubs commented Apr 26, 2018

This is somewhat similar to FiniteStringOps.truncate in that it's
guaranteed to turn any String into a TrimmedString.

There are some funny characters that get removed by String.trim but
don't match on \s in regular expressions. I've added an example to
demonstrate that these characters can lead to a String being "trimmed"
a bit more than would technically be necessary to match the
TrimmedString regex.

This is somewhat similar to `FiniteStringOps.truncate` in that it's
guaranteed to turn any `String` into a `TrimmedString`.

There are some funny characters that get removed by `String.trim` but
don't match on `\s` in regular expressions. I've added an example to
demonstrate that these characters can lead to a `String` being "trimmed"
a bit more than would technically be necessary to match the
`TrimmedString` regex.
@ceedubs
Copy link
Contributor Author

ceedubs commented Apr 26, 2018

Oops I have a broken test :(

@ceedubs
Copy link
Contributor Author

ceedubs commented Apr 26, 2018

Strings are hard. I'm not sure what to do here.

@fthomas
Copy link
Owner

fthomas commented Apr 26, 2018

So the culprit is \u0085. The regex engine and String#trim disagree about if it is whitespace or not. It is for the regex engine but not for String#trim. Hmm... let me think about that.

@fthomas
Copy link
Owner

fthomas commented Apr 26, 2018

So String#trim removes all leading and trailing characters with a code less than or equal to \u0020. I think we can replace \s in the regex of TrimmedString to [\x{0000}-\x{0020}] to mimic the semantics of String#trim. I'm in favor of doing this although it changes the semantics of the refined type. But breaking the expectation that every s.trim is a valid TrimmedString is much worse IMO.

This should now match up with the strings that `String.trim` produces.
I had to change `.*` to `(?s:.*)` because the line separator character
isn't removed by `trim` yet doesn't match `.*` without the `s` flag, so
a string like `"\u2028"` can be the result of `trim` but didn't
previously match the regular expression.

Scalastyle and scalafmt had some competing opinions on how I should
format this long regex Witness expression.
@ceedubs
Copy link
Contributor Author

ceedubs commented Apr 26, 2018

@fthomas thanks for the guidance!

I pushed 9a9a877 which hopefully addresses the issue. I used your advice but also had to make an additional slight change. See the commit message for more details.

@ceedubs
Copy link
Contributor Author

ceedubs commented Apr 26, 2018

Okay I've upgraded to a NullPointerException as my build failure. Hmmm.

@ceedubs
Copy link
Contributor Author

ceedubs commented Apr 26, 2018

I'm not sure what the build failure is about. Any ideas? Also locally I have a change to add a scalacheck generator for trimmed strings -- should I add that here or would you prefer that as a separate PR afterwards?

@fthomas
Copy link
Owner

fthomas commented Apr 26, 2018

The test passes on the JVM but fails with JS. Oh no. :-( This is not the first time that we have difficulties writing regexes that should work on the JVM and with JS (see #453 (comment) and #357 (comment)). I've no idea if the current regex can be changed to work on both platforms. I'm at a point where I would suggest adding a Trimmed predicate that is implemented with String#trim and redefine TrimmedString as String Refined Trimmed. I bet this would be easier than changing the regex to work on both platforms and it would probably be more robust because it doesn't depend on the peculiarities of the underlying regex engine.

I'm sorry this is not a straightforward change as it should have been.

@fthomas
Copy link
Owner

fthomas commented Apr 26, 2018

Instead of adding Trimmed now you could also make the failing test JVM-only, and we add that predicate later. I could imagine that the original definition of TrimmedString already does not work with JS which is possible because there are currently no tests. So merging this PR now would not make anything worse.

WRT the ScalaCheck instances I'm fine with both.

@ceedubs
Copy link
Contributor Author

ceedubs commented Apr 27, 2018

Thanks again for the guidance @fthomas! I'm inclined to agree with your suggestion of adding a Trimmed predicate. I'll plan on making that change. I'll go ahead and close this PR out and start a fresh new one.

@ceedubs ceedubs closed this Apr 27, 2018
ceedubs added a commit to ceedubs/refined that referenced this pull request May 10, 2018
This also adds `TrimmedString.trim` and is a replacement for fthomas#487 based
on the discussion there.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants