Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Data.Text.replace total #425

Open
sol opened this issue Apr 18, 2022 · 6 comments
Open

Make Data.Text.replace total #425

sol opened this issue Apr 18, 2022 · 6 comments
Labels
feature request question Requires more investigation

Comments

@sol
Copy link
Member

sol commented Apr 18, 2022

Data.Text.replace errors if the needle is the empty string. Every other implementation I tried matches the empty string against every position in the string. I suggest that for consistency with other languages and to avoiding partiality we adopt that behavior.

Expected behavior:

ghci> replace "" "x" "foo"
"xfxoxox"

Actual behavior:

ghci> replace "" "x" "foo"
"*** Exception: Data.Text.replace: empty input
CallStack (from HasCallStack):
  error, called at libraries/text/src/Data/Text.hs:1862:18 in text-1.2.5.0:Data.Text
Rust:
fn main() {
    println!("{}", "foo".replace("", "x")); // prints "xfxoxox"
}
$ rustc main.rs && ./main
xfxoxox
Python:
>>> 'foo'.replace('', 'x')
'xfxoxox'
JavaScript:
> 'foo'.replaceAll('', 'x')
'xfxoxox'
@Bodigrim
Copy link
Contributor

I'm sympathetic to the idea, but technically this is a breaking change.

@Bodigrim
Copy link
Contributor

@sol could you please gather community feedback on this change?

@sol
Copy link
Member Author

sol commented Apr 24, 2022

@Bodigrim I would love to help, however, even if I try hard, I'll not be able to free up time to mediate this issue.

If somebody else wants to pick it up, this might be useful:

The reason why I think this change is a net positive is that relying on the semantics of pure exceptions is inherently risky:

https://gitlab.haskell.org/ghc/ghc/-/issues/1171
https://gitlab.haskell.org/ghc/ghc/-/issues/2273
https://gitlab.haskell.org/ghc/ghc/-/issues/5561
https://gitlab.haskell.org/ghc/ghc/-/issues/5129

Read: If your program relies on the type of pure exceptions, it may produce different results depending on GHC version and optimization level. For that reason some people will advice that you should only ever use a partial function if you make sure that your input will not trigger an exception.

Observations:

  1. Programs that follow that advice will not be affected by the proposed change.
  2. Programs that did not follow that advice may have their semantics changed, from possibly non-deterministic behavior to behaving deterministic.

@Lysxia Lysxia added feature request question Requires more investigation labels May 15, 2022
@Lysxia
Copy link
Contributor

Lysxia commented May 20, 2022

This is a very benign breaking change. People push back against changes that cause unnecessary churn, but nobody depends on replace "" _ _ being undefined. I will try to collect a few more such changes so we can ask/notify about them all in one post (there is a similar "nobody should care, but do you" conundrum in improving foldl).


I've confirmed that every occurrence of Text.replace on Hackage (~50 packages) is ostensibly applied or meant to be applied to a non-empty string. Almost all of them are a literal or some obvious construction like Text.singleton _ or "foo" <> _ <> "bar", so they're quick to go through. The few other uses are a little less obvious, but it is still clear that the empty string is not expected; listed below for exhaustiveness:

  1. nixpkgs-update: https://github.com/ryantm/nixpkgs-update/blob/934b497d8e32a42238a9525126614ad74142da78/src/File.hs#L48
  2. yesod-form: https://github.com/yesodweb/yesod/blob/99c1fd49a33cc3a87d07e898ab7b782f9f3bf679/yesod-form/Yesod/Form/Functions.hs#L686
  3. sdp4text defines a class method replaceBy = replace, which I couldn't find any use of on Hackage https://github.com/andreymulik/sdp4text/blob/12990eaa7b39ff1cad7bd52be26986fa0bd0a30d/src/SDP/Text.hs#L172
  4. rosmsg, a 6-year-old package, https://github.com/RoboticsHS/rosmsg/blob/9dd9e32fcf544ef75b1d2823f27e31076df396c9/src/Robotics/ROS/Msg/MD5.hs#L33

@sergv
Copy link
Contributor

sergv commented May 25, 2022

I've noticed that breakOn and breakOnEnd also throw exceptions when their pattern is an empty string. Surprisingly this is not mentioned in the documentation and can only be inferred from the presence of HasCallStack constraint.

I think it might be reasonable to make breakOn and breakOnEnd total. After all, breaking on empty string once can be treated as not doing anything - there's an empty string at the start and end of any string. So something like breakOn "" str = ("", str) and breakOnEnd "" str = (str, "") would make those functions total and probably shouldn't be too surprising.

There's also breakOnAll which is also partial, but at least this one mentions that the pattern to break on may not be empty in the docs. However I think it could be usefully defined on empty pattern as well - just explode original text into individual characters, breakOnAll "" str = map singleton (unpack str). At first glance this even seems to be consistent to what's written in the docs of breakOnAll, which is Find all non-overlapping instances of needle in haystack - all non-overlapping occurrences of an empty string in 'gaps' between characters got broken on.

@tbidne
Copy link
Contributor

tbidne commented Jun 21, 2022

For comparison, splitOn (also partial) has this to say:

String to split on. If this string is empty, an error will occur.

An empty delimiter is invalid, and will cause an error to be raised.

So I would suggest either making all of these total (my preference) or at least copying splitOn's documentation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request question Requires more investigation
Projects
None yet
Development

No branches or pull requests

5 participants