Reconsider unconditionally exposing capture offsets on result #12
Comments
I’d think a single getter or |
This was the original Stage 0 proposal, but I was cautioned against it due to the memory overhead added to every existing RegExp result, even if they didn't use this feature. I agree that a A less memory-intrusive approach (though possibly more break-y) would have been to use something like an Integer Indexed Exotic Object for the result of |
If done carefully, the memory overhead would be at most 2 ints per capture. Assuming 4-byte integers, that's comparable to each captured string being 8 characters longer. |
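The arithmetic behind that estimate can be written out as a small sketch. The function names are hypothetical, and the one-byte-per-character default is an assumption (it is what makes "8 characters" come out); with two-byte characters the same overhead corresponds to 4 characters.

```javascript
// Hypothetical helpers illustrating the estimate above; not engine code.

// Two offsets (start, end) are stored per capture.
function offsetOverheadBytes(captureCount, bytesPerInt = 4) {
  return captureCount * 2 * bytesPerInt;
}

// How many extra characters of string storage the per-capture overhead equals.
// Assumption: bytesPerChar = 1 models a one-byte string representation.
function equivalentExtraChars(bytesPerInt = 4, bytesPerChar = 1) {
  return (2 * bytesPerInt) / bytesPerChar;
}

console.log(offsetOverheadBytes(3));     // 24 bytes for three captures
console.log(equivalentExtraChars());     // 8 characters (one-byte chars)
console.log(equivalentExtraChars(4, 2)); // 4 characters (two-byte chars)
```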
To be clear, it was cautioned against by the same people who are now asking to revisit this. We now believe the overhead could be minimal if implemented carefully. It seems like there's agreement this would be strictly better than having to add an options bag to multiple existing methods, so let's do it! schuay@'s suggestion applies regardless of whether we go with |
I'm looking into putting together a PR for this change using an Would that be sufficient to address the memory concerns? If so I can put up a PR by tomorrow. |
@ljharb If we don't have |
Sounds reasonable.
Why not? cc @hashseed |
@rbuckton I'm not sure what you mean; matchAll could easily create a getter function for each match object that accesses internal slots to build up the info. |
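A rough userland sketch of that getter idea follows. It is only an illustration under stated assumptions: a WeakMap stands in for internal slots, attachLazyIndices is a hypothetical helper, and only the whole-match offsets are computed, since capture-group offsets are not derivable from an ordinary result object without engine support.

```javascript
// A WeakMap stands in for the engine's internal slots (assumption for this sketch).
const internalSlots = new WeakMap();

// Hypothetical helper: attaches a lazy `indices` getter to a match result.
function attachLazyIndices(result) {
  internalSlots.set(result, { cached: undefined });
  Object.defineProperty(result, "indices", {
    configurable: true,
    enumerable: false,
    get() {
      const slot = internalSlots.get(this);
      if (slot.cached === undefined) {
        // Built on first access only; whole-match offsets are all that
        // can be derived here from `index` and the matched string.
        slot.cached = [[this.index, this.index + this[0].length]];
      }
      return slot.cached;
    },
  });
  return result;
}

// Each matchAll result gets its own getter; nothing is computed until read.
const results = Array.from("a1b22".matchAll(/\d+/g), (m) => attachLazyIndices(m));
console.log(results[0].indices); // [ [ 1, 2 ] ]
console.log(results[1].indices); // [ [ 3, 5 ] ]
```

The point of the getter is that results which never read the property pay only for the slot entry, not for the offsets array.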
While we are putting alternatives on the table: we could also introduce a new regexp flag to prescribe getting capture indices. I'm not saying that this is what I would prefer though. I'm not sure adding indices to |
@hashseed: We've already discussed and generally disliked the idea of making this a regexp flag. @ljharb: I think I misread part of
|
Yes and no. |
|
A major motivation behind this proposal (changing back to unconditionally including offsets on results) was that we would not have to modify any regexp builtins other than |
@ljharb right, thanks for clarifying. My point was that this happens automatically, without modifying any of the logic in |
@ljharb: For the current version of the proposal (that uses an |
I think that the API should remain consistent; so yes, it would be good to include the options object on |
Is there a tool for searching JavaScript GitHub projects for usages of |
@rbuckton If I might offer one opinion from the wild… I personally explored a lot of subclassing patterns to augment Also, re argumenting the Maybe worthwhile considering rolling out
|
Subclassing RegExp properly requires overriding 1 or more of the Symbol hooks, or |
Yes, overriding without making breaking assumptions about the interface. Here is roughly how I go about it personally:

    // The class name is a placeholder; the original snippet left the class anonymous.
    class BetterRegExp extends RegExp {
      betterExec(...args) {
        // Somehow get the result; returning null falls back to super.exec below.
        return null;
      }
      capture(result, ...args) {
        // Somehow augment the result.
        return result;
      }
      exec(...args) {
        const result = this.betterExec(...args) || super.exec(...args);
        if (result) this.capture(result, ...args);
        return result;
      }
    }

@ljharb, I override |
Switching to |
Revisiting feedback from #1 (comment):

Currently, the proposed changes add an optional bag of args to RegExp.p.exec and multiple other RegExp and String methods. One can pass { capture: "indices" } to tell exec to return capture indices instead of the captured substrings. Indices are returned as a 2-element array [start, end].

I don't think this is the way to go: it doesn't feel like clean API, it opens us up to future complexity ('we already have the options object, why not add more there'), it modifies 7+ functions just to pass through matchOptions, it adds conditional functionality to exec (as the choke-point for all RegExp functions, IMO we should keep exec simple).

It may be cleaner to remove the option args and just expose indices on regexp results unconditionally (result.captureStart(i) / result.captureEnd(i)).

So, my proposal would be: expose indices unconditionally through captureStart(index) and captureEnd(index) on the result object. The only overhead would be that the captures array (which we already have) is kept alive by the result object.
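To make the proposed shape concrete, here is a minimal sketch of what reading offsets through captureStart(index) / captureEnd(index) could look like. It is an emulation, not the proposal's implementation: execWithOffsets is a hypothetical wrapper, the offsets are sourced from the `d` (hasIndices) flag, so it needs an engine that supports that flag, and returning undefined for a group that did not participate is an assumption the issue does not settle.

```javascript
// Hypothetical wrapper: captureStart/captureEnd are the names proposed in this
// issue, not a shipped API. Offsets come from the `d` flag's `indices` array.
function execWithOffsets(pattern, input) {
  // Re-create the pattern with the `d` flag so the engine records offsets.
  const flags = pattern.flags.includes("d") ? pattern.flags : pattern.flags + "d";
  const re = new RegExp(pattern.source, flags);
  re.lastIndex = pattern.lastIndex;
  const result = re.exec(input);
  pattern.lastIndex = re.lastIndex; // keep global/sticky state in sync
  if (result === null) return null;

  const indices = result.indices;
  // Assumption: a group that did not participate yields undefined.
  result.captureStart = (i) => (indices[i] ? indices[i][0] : undefined);
  result.captureEnd = (i) => (indices[i] ? indices[i][1] : undefined);
  return result;
}

const m = execWithOffsets(/a(b+)(c)?/, "xxabbb");
console.log(m.captureStart(0), m.captureEnd(0)); // 2 6
console.log(m.captureStart(1), m.captureEnd(1)); // 3 6
console.log(m.captureStart(2));                  // undefined
```

Note that no options argument is threaded through exec or any other method here, which is the simplification the proposal is after.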