Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

api: add Cow guarantee to replace API #1178

Merged
merged 1 commit into from
Mar 23, 2024
Merged

api: add Cow guarantee to replace API #1178

merged 1 commit into from
Mar 23, 2024

Conversation

BurntSushi
Copy link
Member

This adds a guarantee to the API of the replace, replace_all and
replacen routines that, when Cow::Borrowed is returned, it is
guaranteed that it is equivalent to the haystack given.

The implementation has always matched this behavior, but this elevates
the implementation behavior to an API guarantee.

There do exists implementations where this guarantee might not be upheld
in every case. For example, if the final result were the empty string,
we could return a Cow::Borrowed. Similarly, if the final result were a
substring of haystack, then Cow::Borrowed could be returned in that
case too. In practice, these sorts of optimizations are tricky to do in
practice, and seem like niche corner cases that aren't important to
optimize.

Nevertheless, having this guarantee is useful because it can be used as
a signal that the original input remains unchanged. This came up in
discussions with @quicknir on Discord. Namely, in cases where one is
doing a sequence of replacements and in most cases nothing is replaced,
using a Cow is nice to be able to avoid copying the haystack over and
over again. But to get this to work right, you have to know whether a
Cow::Borrowed matches the input or not. If it doesn't, then you'd need
to transform it into an owned string. For example, this code tries to do
replacements on each of a sequence of Cow<str> values, where the
common case is no replacement:

use std::borrow::Cow;

use regex::Regex;

fn trim_strs(strs: &mut Vec<Cow<str>>) {
    strs
    .iter_mut()
    .for_each(|s| moo(s, &regex_replace));
}

fn moo<F: FnOnce(&str) -> Cow<str>>(c: &mut Cow<str>, f: F) {
    let result = f(&c);
    match result {
        Cow::Owned(s) => *c = Cow::Owned(s),
        Cow::Borrowed(s) => {
            *c = Cow::Borrowed(s);
        }
    }
}

fn regex_replace(s: &str) -> Cow<str> {
    Regex::new(r"does-not-matter").unwrap().replace_all(s, "whatever")
}

But this doesn't pass borrowck. Instead, you could write moo like
this:

fn moo<F: FnOnce(&str) -> Cow<str>>(c: &mut Cow<str>, f: F) {
    let result = f(&c);
    match result {
        Cow::Owned(s) => *c = Cow::Owned(s),
        Cow::Borrowed(s) => {
            if !std::ptr::eq(s, &**c) {
                *c = Cow::Owned(s.to_owned())
            }
        }
    }
}

But the std::ptr:eq call here is a bit strange. Instead, after this PR
and the new guarantee, one can write it like this:

fn moo<F: FnOnce(&str) -> Cow<str>>(c: &mut Cow<str>, f: F) {
    if let Cow::Owned(s) = f(&c) {
        *c = Cow::Owned(s);
    }
}

This adds a guarantee to the API of the `replace`, `replace_all` and
`replacen` routines that, when `Cow::Borrowed` is returned, it is
guaranteed that it is equivalent to the `haystack` given.

The implementation has always matched this behavior, but this elevates
the implementation behavior to an API guarantee.

There do exists implementations where this guarantee might not be upheld
in every case. For example, if the final result were the empty string,
we could return a `Cow::Borrowed`. Similarly, if the final result were a
substring of `haystack`, then `Cow::Borrowed` could be returned in that
case too. In practice, these sorts of optimizations are tricky to do in
practice, and seem like niche corner cases that aren't important to
optimize.

Nevertheless, having this guarantee is useful because it can be used as
a signal that the original input remains unchanged. This came up in
discussions with @quicknir on Discord. Namely, in cases where one is
doing a sequence of replacements and in most cases nothing is replaced,
using a `Cow` is nice to be able to avoid copying the haystack over and
over again. But to get this to work right, you have to know whether a
`Cow::Borrowed` matches the input or not. If it doesn't, then you'd need
to transform it into an owned string. For example, this code tries to do
replacements on each of a sequence of `Cow<str>` values, where the
common case is no replacement:

```rust
use std::borrow::Cow;

use regex::Regex;

fn trim_strs(strs: &mut Vec<Cow<str>>) {
    strs
    .iter_mut()
    .for_each(|s| moo(s, &regex_replace));
}

fn moo<F: FnOnce(&str) -> Cow<str>>(c: &mut Cow<str>, f: F) {
    let result = f(&c);
    match result {
        Cow::Owned(s) => *c = Cow::Owned(s),
        Cow::Borrowed(s) => {
            *c = Cow::Borrowed(s);
        }
    }
}

fn regex_replace(s: &str) -> Cow<str> {
    Regex::new(r"does-not-matter").unwrap().replace_all(s, "whatever")
}
```

But this doesn't pass `borrowck`. Instead, you could write `moo` like
this:

```rust
fn moo<F: FnOnce(&str) -> Cow<str>>(c: &mut Cow<str>, f: F) {
    let result = f(&c);
    match result {
        Cow::Owned(s) => *c = Cow::Owned(s),
        Cow::Borrowed(s) => {
            if !std::ptr::eq(s, &**c) {
                *c = Cow::Owned(s.to_owned())
            }
        }
    }
}
```

But the `std::ptr:eq` call here is a bit strange. Instead, after this PR
and the new guarantee, one can write it like this:

```rust
fn moo<F: FnOnce(&str) -> Cow<str>>(c: &mut Cow<str>, f: F) {
    if let Cow::Owned(s) = f(&c) {
        *c = Cow::Owned(s);
    }
}
```
@quicknir
Copy link

Looks great! Both the change and the write up. Thanks very much, very appreciated!

@BurntSushi BurntSushi merged commit 088d7f3 into master Mar 23, 2024
16 checks passed
@BurntSushi BurntSushi deleted the ag/cow-guarantee branch March 23, 2024 01:25
@BurntSushi
Copy link
Member Author

This PR is on crates.io in regex 1.10.4. Thanks for the review!

@LunarLambda
Copy link

I'm the person that initially ran into this case and talked to quicknir about it. Thank you both very much for your help and efforts!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants