initial commit maintaining match_idx vars #53

zedlopez · 2022-05-29T18:31:55Z

This is inessential and you may well be disinterested in modifying the core to include it. But it's also harmless, a no-op for existing code. It creates two global variables that maintain the start and end indices of the most recent plain text match search, equivalent to RE_DATA1 and RE_DATA2 in the RE_Subexpressions array for regexps. And it's inspired by a need in an extension of mine, but I expect other people could come up with uses for them.

This is far less relevant than neroden's wn change: TEXT_TY_Replace_REI is far more tractable to replace than Parser__parse.

But enough with the hard sell; please do with the PR as you will.

ganelson · 2022-05-30T20:24:20Z

It's fairly harmless, though it does use two more globals, which is largely a non-issue for Glulx but may be a slight concern for Z. Hmm: I'm certainly not against, but sell me a bit more on this. What's the use case for it, and why can't it be done other ways?

zedlopez · 2022-05-30T21:06:35Z

My Textile extension provides phrases to return the starting character and length of matches of both plain text searches and regexp searches (by specified match number in the regexp case). In turn, that info is the basis for other useful things:

A phrase to replace the first occurrence of a plain text match with other text: I7's built-ins only offer replacing all of the matches. (This parallels a the extension's offering to replace just the first occurrence of a regexp.)

A phrase to split a text on a given substring, returning a list of texts.

It could be done other ways, by ignoring the existing implementation and writing from scratch, or by using a regexp match on a regexp that was a regexp-character-escaped version of the plain text string. Given that there's already a working implementation that has the data and discards it, and ties into the existing substring match phrases' use without additional effort (so that I can say this data is available after looking for a match, same as with regexps), I thought this was the obvious parsimonious implementation.

It wouldn't be the first feature to be glulx-only if these lines were to be wrapped in #ifdefs.

But it's a small thing that really isn't worth much time considering: if my case for it doesn't seem persuasive, feel free to close the PR and my extension will continue to replace TEXT_TY_Replace_REI.

ganelson · 2022-05-30T21:19:57Z

That all sounds quite reasonable. If BasicInformKit is providing these two variables, though, maybe Basic Inform ought to have matching affordances to read them. In Textile, what would be the phrase definitions for reading these two variables? I'm just wondering if it makes sense to move those (but not the rest of Textile) to BI.

zedlopez · 2022-05-30T21:31:58Z

Textile aliases the vars:

match0-index is a number variable.
The match0-index variable translates into I6 as "match0_idx".
match0-index2 is a number variable.
The match0-index2 variable translates into I6 as "match0_idx2".

I'm wholly open to less ugly names.

ganelson · 2022-05-30T21:39:55Z

Hmm, I'm not keen on exposing these as variables: that makes them possible to write to, which seems unsafe. I was thinking more of phrases to decide numbers, which returned the values of these variables, but didn't allow them to be written.

zedlopez · 2022-05-30T22:02:25Z

fair point; the original was written with casual disregard for third parties making direct use of those. Pretend I said

To decide what number is the matched text initial index: (- match0_idx -).
To decide what number is the matched text final index: (- match0_idx2 -).
To decide what number is the matched text length: (- (1 + match0_idx2 - match0_idx) -).

ganelson · 2022-05-30T22:05:27Z

If so, maybe the "no match" state of the variables should have match0_idx = 0 and match0_idx2 = -1, so that "matched text length" returns 0 if no match has yet been made?

curiousdannii · 2022-05-30T22:06:10Z

The I6 code would use 0-based indexing, but Inform usually uses 1-based indexing. Should those phrases add 1?

If Option types were in basic inform that would allow for better handling of no matches ;) Maybe next major release.

zedlopez · 2022-05-30T22:26:20Z

they're already made to be 1-based, and already 0-ed at the start (and stay that way if there's not a match).

        match0_idx = RE_PACKET_space-->RE_DATA1 + 1;
        match0_idx2 = RE_PACKET_space-->RE_DATA2 + 1;

Initializing match0_idx2 to -1 instead of 0 would be a clever way of having the length work out to 0 without additional fuss.

zedlopez · 2022-05-30T22:38:45Z

...but now I'm seeing Dannii's point that maybe the I6 versions should be 0-indexed and it's appropriate for the 1-indexification to happen at the I7 wrapper. My existing implementation is needlessly confusing if anyone's ever manipulating them in I6.

so maybe initialize them both to -1, add 1 in the I7 phrases for the indices, and let the I7 length phrase be a function..

To decide what number is the matched text length:
  if the matched text final index < 0, decide on 0;
  decide 1 + the matched text final index - the matched text initial index.

zedlopez · 2022-05-31T13:23:52Z

at this point, we've rewritten this patch in the comments. Shall I submit a different pull request?

ganelson · 2022-05-31T18:28:47Z

Why not initialise them to 0 and -1, then, which is conceptually right for 0-indexed variables, and then define the phrases by:

To decide what number is the matched text initial index: (- match0_idx + 1 -).
To decide what number is the matched text final index: (- match0_idx2 + 1 -).
To decide what number is the matched text length: (- (1 + match0_idx2 - match0_idx) -).

It's fractionally more efficient to compute the matched text length that way, rather than requiring a function call and a branch, I guess.

zedlopez · 2022-05-31T18:40:32Z

That's fine. The motivation for -1/-1 was to ensure both were obviously invalid in I6 in the no-match case, but the answer to the validity question is right there in match0_idx2, so it isn't all that compelling.

So would you like a revised pull request? or given that it's like 3 lines of code that can be cut and pasted from this comment, is it easier to not bother?

ganelson · 2022-05-31T20:09:09Z

Well, I think we're not there yet: we need to revise the PR to change the conventions for what's in the variables, and to add these phrases to Basic Inform, and to put in a test verifying the behaviour (probably adding some lines to BIP-TextReplacement, BIP-TextReplacement-C and BIP-TextReplacement-G would do it?), and also to add a few lines to Writing with Inform documenting the three new phrases...

zedlopez · 2022-06-06T06:06:44Z

Um, were you thinking that I would do those things and submit another PR? I realized I wasn't clear on this...

zedlopez · 2023-04-23T20:12:04Z

I'm about ready to finally submit this, but there's one more case I'd like to solicit thoughts on, @ganelson and @curiousdannii :

I7 thinks every text matches the text "". That's not unreasonable, but what should the reported start index, end index, and length be? Well, surely the length is 0. But how about start and end? The same arithmetic that makes everything else work out would say that the start index is 1 and the end index is 0. This is weird, but possibly less weird than adjusting the end index to 1, for which one could also make an argument.

zedlopez · 2023-04-23T23:53:14Z

Also, xrefs.txt would need updating after merging this. Since it's a generated file, I thought I shouldn't commit a changed version.

zedlopez · 2023-07-02T21:12:28Z

This is ready. As noted previously, xrefs.txt needs regenerating; I did not commit a new one.

initial commit maintaining match_idx vars

7de7b21

Merge branch 'master' into static_idx

f4f2b3c

zedlopez added 2 commits April 23, 2023 15:01

working version

a0f0d59

adding tests ; not enough yet

6a18ea6

zedlopez added 3 commits April 23, 2023 21:49

correcting documentation

e59bc7c

Merge branch 'master' into static_idx

12ae115

Merge branch 'master' into static_idx

09ab3d9

zedlopez mentioned this pull request Jun 11, 2023

IE-0023: Retaining text/regexp match start/end locations ganelson/inform-evolution#23

Merged

zedlopez added 5 commits June 26, 2023 08:20

more tests, improved docs

7d32f14

fixes to phrases, kit

598daef

syncing master

13fdb2a

finalizing match index

9f944ee

patch for edge case; revised docs

61058e8

zedlopez and others added 3 commits July 2, 2023 12:40

syncing from upstream

5ac954d

finalizing tests

a3f455f

Merge branch 'ganelson:master' into static_idx

fd7f51b

ganelson merged commit 19ad547 into ganelson:master Jul 3, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

initial commit maintaining match_idx vars #53

initial commit maintaining match_idx vars #53

zedlopez commented May 29, 2022

ganelson commented May 30, 2022

zedlopez commented May 30, 2022

ganelson commented May 30, 2022

zedlopez commented May 30, 2022

ganelson commented May 30, 2022

zedlopez commented May 30, 2022

ganelson commented May 30, 2022

curiousdannii commented May 30, 2022 •

edited

Loading

zedlopez commented May 30, 2022 •

edited

Loading

zedlopez commented May 30, 2022

zedlopez commented May 31, 2022

ganelson commented May 31, 2022

zedlopez commented May 31, 2022

ganelson commented May 31, 2022

zedlopez commented Jun 6, 2022

zedlopez commented Apr 23, 2023

zedlopez commented Apr 23, 2023

zedlopez commented Jul 2, 2023 •

edited

Loading

initial commit maintaining match_idx vars #53

initial commit maintaining match_idx vars #53

Conversation

zedlopez commented May 29, 2022

ganelson commented May 30, 2022

zedlopez commented May 30, 2022

ganelson commented May 30, 2022

zedlopez commented May 30, 2022

ganelson commented May 30, 2022

zedlopez commented May 30, 2022

ganelson commented May 30, 2022

curiousdannii commented May 30, 2022 • edited Loading

zedlopez commented May 30, 2022 • edited Loading

zedlopez commented May 30, 2022

zedlopez commented May 31, 2022

ganelson commented May 31, 2022

zedlopez commented May 31, 2022

ganelson commented May 31, 2022

zedlopez commented Jun 6, 2022

zedlopez commented Apr 23, 2023

zedlopez commented Apr 23, 2023

zedlopez commented Jul 2, 2023 • edited Loading

curiousdannii commented May 30, 2022 •

edited

Loading

zedlopez commented May 30, 2022 •

edited

Loading

zedlopez commented Jul 2, 2023 •

edited

Loading