feat(text/unstable): add `replaceStart`, `replaceEnd`, `replaceBoth` functions (#6265) #6286

lionel-rowe · 2024-12-20T12:56:28Z

Resolves #6265.

Roughly speaking, these replicate the combined functionality of trim_matches, trim_start_matches, trim_end_matches, strip_prefix, and strip_suffix (plus a hypothetical strip_prefix_and_suffix) from the Rust standard library.

codecov · 2024-12-20T17:24:03Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.34%. Comparing base (4989ba7) to head (ad6098a).
Report is 52 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #6286      +/-   ##
==========================================
- Coverage   96.52%   96.34%   -0.19%     
==========================================
  Files         533      548      +15     
  Lines       40871    41708     +837     
  Branches     6120     6316     +196     
==========================================
+ Hits        39451    40182     +731     
- Misses       1378     1483     +105     
- Partials       42       43       +1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

babiabeo · 2024-12-21T05:15:03Z

text/unstable_strip.ts

+  const count = options?.count ?? Infinity;
+
+  for (let i = 0; i < count; ++i) {
+    str = str.replace(regExp, "");
+    if (str === prev) break;
+    prev = str;
+  }


Why Infinity? If the string doesn't include the pattern to strip, it will loop forever, won't it?

Why Infinity? If the string doesn't include the pattern to strip, it will loop forever, won't it?

It'll break on the str === prev condition as soon as there's nothing left to strip.

const aaa = 'a'.repeat(1000) const aab = aaa + 'b' const bbb = 'b'.repeat(1000) stripStart(aaa, 'a') === '' stripStart(aab, 'a') === 'b' stripStart(bbb, 'a') === bbb

timreichen · 2024-12-21T15:45:46Z

I think the count option should not be part of this api. Instead, the user can always call the function repeatedly or in a loop themselves. In that sense, I think stripStart() and stripEnd() should be the only exported functions, and maybe a stripAll() to reflect replaceAll().

lionel-rowe · 2024-12-24T04:29:21Z

I think the count option should not be part of this api. Instead, the user can always call the function repeatedly or in a loop themselves. In that sense, I think stripStart() and stripEnd() should be the only exported functions, and maybe a stripAll() to reflect replaceAll().

@timreichen Probably worth getting some more use cases, but if I grep my code for functions named *strip*, mostly they simply strip out all occurrences (usually based on replaceAll or similar, no need for a library function), and in a significant minority of cases they remove all prefixes and suffixes, like this:

// strip(str, /[^\p{L}\p{M}]+/gu, { count: Infinity })
function stripNonLetter(str: string) {
    return str.replaceAll(/^[^\p{L}\p{M}]+|[^\p{L}\p{M}]+$/gu, '')
}

// strip(str, /[_\- ]+/g, { count: Infinity })
const strip = (str: string) => str.replaceAll(/^[_\- ]+|[_\- ]+$/g, '')

There are also a few cases where they strip a single instance at the start or end and then stop there, and it sounds like @canac 's use case fits here:

// stripStart(str, '\ufeff', { count: 1 })
function stripBom(str: string) {
   return str.replace(/^\ufeff/, '')
}

I guess the "strip up to n occurrences" case is rare when 1 < n < Infinity, but very common (possibly the most common) when n == Infinity, and also reasonably common where n == 1. So maybe a better API is options?: { all?: boolean } or options?: { single?: boolean }, depending on what the chosen default is (no strong opinion either way on the default).

I'm not keen on the idea of doubling the API surface area from strip/stripStart/stripEnd to strip/stripStart/stripEnd/stripAll/stripAllStart/stripAllEnd, even though that's essentially what Rust std library does. Seems like overkill in JS/TS.

Another way of supporting single/all, but without a specific option, would be reading the g flag if the supplied pattern is a regex. I'm also not keen on this option as it doesn't support string patterns, and it's also not really the correct semantics for the g flag. Currently I just ignore g and y flags, but maybe throwing would be better if g doesn't match all-ness (and always throwing if y supplied).

timreichen · 2024-12-24T11:17:36Z

@timreichen Probably worth getting some more use cases, but if I grep my code for functions named *strip*, mostly they simply strip out all occurrences (usually based on replaceAll or similar, no need for a library function), and in a significant minority of cases they remove all prefixes and suffixes, like this:
// strip(str, /[^\p{L}\p{M}]+/gu, { count: Infinity })
function stripNonLetter(str: string) {
    return str.replaceAll(/^[^\p{L}\p{M}]+|[^\p{L}\p{M}]+$/gu, '')
}

// strip(str, /[_\- ]+/g, { count: Infinity })
const strip = (str: string) => str.replaceAll(/^[_\- ]+|[_\- ]+$/g, '')
There are also a few cases where they strip a single instance at the start or end and then stop there, and it sounds like @canac 's use case fits here:
// stripStart(str, '\ufeff', { count: 1 })
function stripBom(str: string) {
   return str.replace(/^\ufeff/, '')
}
I guess the "strip up to n occurrences" case is rare when 1 < n < Infinity, but very common (possibly the most common) when n == Infinity, and also reasonably common where n == 1. So maybe a better API is options?: { all?: boolean } or options?: { single?: boolean }, depending on what the chosen default is (no strong opinion either way on the default).

I agree, strip as a name is confusing as it doesn't indicate single or all occurrences.

I'm not keen on the idea of doubling the API surface area from strip/stripStart/stripEnd to strip/stripStart/stripEnd/stripAll/stripAllStart/stripAllEnd, even though that's essentially what Rust std library does. Seems like overkill in JS/TS.

I wouldn't like that either. I think the problem here is that the functions are explicit in name but might work differently depending on the passed regexp.

Another way of supporting single/all, but without a specific option, would be reading the g flag if the supplied pattern is a regex. I'm also not keen on this option as it doesn't support string patterns, and it's also not really the correct semantics for the g flag. Currently I just ignore g and y flags, but maybe throwing would be better if g doesn't match all-ness (and always throwing if y supplied).

Come to think of it, this api is tries to do too much combining all these rust functions into one imo. There is String.prototype.replace, String.prototype.replaceAll, which already cover a lot of use cases.

This api should be reduced to replaceStart() and replaceEnd(), which makes them actually more flexible, not only stripping but allow for any replacement and work more in a js way:

strip_prefix => replaceStart("foo:bar", "foo:", "") // "bar"
strip_suffix => replaceEnd("bar:foo", ":foo", "") // "bar"
trim_start_matches => replaceStart("11foo1bar", /1+/, "") // "foo1bar"
trim_end_matches => replaceEnd( "foo1bar11", /1+/, "") // "foo1bar"
trim_matches => "11foo1bar11".replace(/^1+|1+$/, "") // "foo1bar" or call both replaceStart() and replaceEnd()

N replacements can be handled by the regexp, using a prepared string or calling the function repeatedly:
replaceEnd("barfoofoofoo", /(foo){2}/, "") // "barfoo" or
replaceEnd("barfoofoofoo", "foo".repeat(2), "") // "barfoo" or
for (let i = 0; i < 2; i++) replaceEnd("barfoofoofoo", "foo", "") // "barfoo"

lionel-rowe · 2024-12-24T12:33:33Z

N replacements can be handled by the regexp, using a prepared string or calling the function repeatedly:
replaceEnd("barfoofoofoo", /(foo){2}/, "") // "barfoo" or
replaceEnd("barfoofoofoo", "foo".repeat(2), "") // "barfoo" or
for (let i = 0; i < 2; i++) replaceEnd("barfoofoofoo", "foo", "") // "barfoo"

@timreichen None of these are really satisfactory for the case where you want to dynamically strip arbitrarily many occurrences of an arbitrary string, but none of the use cases I've found involve doing that anyway — use cases with unlimited occurrences typically seem to be more suited to regex, where you can trivially use a non-capturing group along with * or +.

As for replace vs strip:

replaceEnd(url.pathname, /\/*/, '/') // pathname always ends with slash
stripEnd(url.pathname, /\/*/) + '/' // same result

But it could be useful for cases like this:

// only prepend 'https:' when replacing existing 'http:' protocol
replaceStart(url.href, 'http:', 'https:')

// doesn't work with `strip` version as the protocol might be something else
// e.g. you might end up with `https:file:///...`
'https:' + stripStart(url.href, 'http:')

Using the name replace* along with allowing regexes weakly implies that the string-replacement syntax will be supported, ~~which would add a moderate amount of implementation complexity — something like this~~. EDIT: actually I guess this doesn't require special handling as the regexes are cloned internally, with capture groups etc. staying the same.

'abcxyzabcxyz'.replace(/[abc]+/, '<$&>') // '<abc>xyz<abc>xyz'

// naive result would be `<$&>xyzabcxyz`
// but seems reasonable to expect result to be `<abc>xyzabcxyz`
replaceStart('abcxyzabcxyz', /[abc]+/, '<$&>')

Conversely, supporting callback replacements would be trivial, as you can just conditionalize the replacement based on whether the index lies at the start/end.

replaceStart('abcxyzabcxyz', /[abc]+/, (m) => `<${m}>`)

Also, the name replace for replace-both-ends would be confusing due to String#replace. Maybe replaceEnds or something. Requiring calling 2 functions just to strip/replace both ends is suboptimal as it requires either declaring an extra variable or repeating code.

timreichen · 2024-12-26T16:44:56Z

How about just replaceStart() and replaceEnd() just supporting strings without regexp support?
I feel if one wants to do a complex strip/replacement with a regexp, prepending ^ or appending $ and using the built in replace() function is more suited.
I also don't think we should provide a replaceBoth() function. The name is kinda weird and it is just a small convenience function which can be done by the user with one line.
If there are requests for more functionality by multiple users, we still can add to that.

feat(text/unstable): add strip function (denoland#6265)

f4cc4f2

lionel-rowe requested a review from kt3k as a code owner December 20, 2024 12:56

github-actions bot added the text label Dec 20, 2024

Separate stripStart and stripEnd functions

de060b4

lionel-rowe changed the title ~~feat(text/unstable): add strip function (#6265)~~ feat(text/unstable): add strip, stripStart, stripEnd functions (#6265) Dec 20, 2024

babiabeo reviewed Dec 21, 2024

View reviewed changes

Fix bug with supplying non-atomic regexes

8e48f0b

strip -> replace

ad6098a

lionel-rowe changed the title ~~feat(text/unstable): add strip, stripStart, stripEnd functions (#6265)~~ feat(text/unstable): add replace, replaceStart, replaceEnd functions (#6265) Dec 26, 2024

lionel-rowe changed the title ~~feat(text/unstable): add replace, replaceStart, replaceEnd functions (#6265)~~ feat(text/unstable): add replaceStart, replaceEnd, replaceBoth functions (#6265) Dec 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(text/unstable): add `replaceStart`, `replaceEnd`, `replaceBoth` functions (#6265) #6286

feat(text/unstable): add `replaceStart`, `replaceEnd`, `replaceBoth` functions (#6265) #6286

lionel-rowe commented Dec 20, 2024 •

edited

Loading

codecov bot commented Dec 20, 2024 •

edited

Loading

babiabeo Dec 21, 2024

lionel-rowe Dec 21, 2024

timreichen commented Dec 21, 2024

lionel-rowe commented Dec 24, 2024 •

edited

Loading

timreichen commented Dec 24, 2024 •

edited

Loading

lionel-rowe commented Dec 24, 2024 •

edited

Loading

timreichen commented Dec 26, 2024

feat(text/unstable): add replaceStart, replaceEnd, replaceBoth functions (#6265) #6286

Are you sure you want to change the base?

feat(text/unstable): add replaceStart, replaceEnd, replaceBoth functions (#6265) #6286

Conversation

lionel-rowe commented Dec 20, 2024 • edited Loading

codecov bot commented Dec 20, 2024 • edited Loading

Codecov Report

babiabeo Dec 21, 2024

Choose a reason for hiding this comment

lionel-rowe Dec 21, 2024

Choose a reason for hiding this comment

timreichen commented Dec 21, 2024

lionel-rowe commented Dec 24, 2024 • edited Loading

timreichen commented Dec 24, 2024 • edited Loading

lionel-rowe commented Dec 24, 2024 • edited Loading

timreichen commented Dec 26, 2024

feat(text/unstable): add `replaceStart`, `replaceEnd`, `replaceBoth` functions (#6265) #6286

feat(text/unstable): add `replaceStart`, `replaceEnd`, `replaceBoth` functions (#6265) #6286

lionel-rowe commented Dec 20, 2024 •

edited

Loading

codecov bot commented Dec 20, 2024 •

edited

Loading

lionel-rowe commented Dec 24, 2024 •

edited

Loading

timreichen commented Dec 24, 2024 •

edited

Loading

lionel-rowe commented Dec 24, 2024 •

edited

Loading