Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

initial commit maintaining match_idx vars #53

Merged
merged 15 commits into from
Jul 3, 2023

Conversation

zedlopez
Copy link
Contributor

This is inessential and you may well be disinterested in modifying the core to include it. But it's also harmless, a no-op for existing code. It creates two global variables that maintain the start and end indices of the most recent plain text match search, equivalent to RE_DATA1 and RE_DATA2 in the RE_Subexpressions array for regexps. And it's inspired by a need in an extension of mine, but I expect other people could come up with uses for them.

This is far less relevant than neroden's wn change: TEXT_TY_Replace_REI is far more tractable to replace than Parser__parse.

But enough with the hard sell; please do with the PR as you will.

@ganelson
Copy link
Owner

It's fairly harmless, though it does use two more globals, which is largely a non-issue for Glulx but may be a slight concern for Z. Hmm: I'm certainly not against, but sell me a bit more on this. What's the use case for it, and why can't it be done other ways?

@zedlopez
Copy link
Contributor Author

My Textile extension provides phrases to return the starting character and length of matches of both plain text searches and regexp searches (by specified match number in the regexp case). In turn, that info is the basis for other useful things:

A phrase to replace the first occurrence of a plain text match with other text: I7's built-ins only offer replacing all of the matches. (This parallels a the extension's offering to replace just the first occurrence of a regexp.)

A phrase to split a text on a given substring, returning a list of texts.

It could be done other ways, by ignoring the existing implementation and writing from scratch, or by using a regexp match on a regexp that was a regexp-character-escaped version of the plain text string. Given that there's already a working implementation that has the data and discards it, and ties into the existing substring match phrases' use without additional effort (so that I can say this data is available after looking for a match, same as with regexps), I thought this was the obvious parsimonious implementation.

It wouldn't be the first feature to be glulx-only if these lines were to be wrapped in #ifdefs.

But it's a small thing that really isn't worth much time considering: if my case for it doesn't seem persuasive, feel free to close the PR and my extension will continue to replace TEXT_TY_Replace_REI.

@ganelson
Copy link
Owner

That all sounds quite reasonable. If BasicInformKit is providing these two variables, though, maybe Basic Inform ought to have matching affordances to read them. In Textile, what would be the phrase definitions for reading these two variables? I'm just wondering if it makes sense to move those (but not the rest of Textile) to BI.

@zedlopez
Copy link
Contributor Author

Textile aliases the vars:

match0-index is a number variable.
The match0-index variable translates into I6 as "match0_idx".
match0-index2 is a number variable.
The match0-index2 variable translates into I6 as "match0_idx2".

I'm wholly open to less ugly names.

@ganelson
Copy link
Owner

Hmm, I'm not keen on exposing these as variables: that makes them possible to write to, which seems unsafe. I was thinking more of phrases to decide numbers, which returned the values of these variables, but didn't allow them to be written.

@zedlopez
Copy link
Contributor Author

fair point; the original was written with casual disregard for third parties making direct use of those. Pretend I said

To decide what number is the matched text initial index: (- match0_idx -).
To decide what number is the matched text final index: (- match0_idx2 -).
To decide what number is the matched text length: (- (1 + match0_idx2 - match0_idx) -).

@ganelson
Copy link
Owner

If so, maybe the "no match" state of the variables should have match0_idx = 0 and match0_idx2 = -1, so that "matched text length" returns 0 if no match has yet been made?

@curiousdannii
Copy link
Contributor

curiousdannii commented May 30, 2022

The I6 code would use 0-based indexing, but Inform usually uses 1-based indexing. Should those phrases add 1?

If Option types were in basic inform that would allow for better handling of no matches ;) Maybe next major release.

@zedlopez
Copy link
Contributor Author

zedlopez commented May 30, 2022

they're already made to be 1-based, and already 0-ed at the start (and stay that way if there's not a match).

        match0_idx = RE_PACKET_space-->RE_DATA1 + 1;
        match0_idx2 = RE_PACKET_space-->RE_DATA2 + 1;

Initializing match0_idx2 to -1 instead of 0 would be a clever way of having the length work out to 0 without additional fuss.

@zedlopez
Copy link
Contributor Author

...but now I'm seeing Dannii's point that maybe the I6 versions should be 0-indexed and it's appropriate for the 1-indexification to happen at the I7 wrapper. My existing implementation is needlessly confusing if anyone's ever manipulating them in I6.

so maybe initialize them both to -1, add 1 in the I7 phrases for the indices, and let the I7 length phrase be a function..

To decide what number is the matched text length:
  if the matched text final index < 0, decide on 0;
  decide 1 + the matched text final index - the matched text initial index.

@zedlopez
Copy link
Contributor Author

at this point, we've rewritten this patch in the comments. Shall I submit a different pull request?

@ganelson
Copy link
Owner

Why not initialise them to 0 and -1, then, which is conceptually right for 0-indexed variables, and then define the phrases by:

To decide what number is the matched text initial index: (- match0_idx + 1 -).
To decide what number is the matched text final index: (- match0_idx2 + 1 -).
To decide what number is the matched text length: (- (1 + match0_idx2 - match0_idx) -).

It's fractionally more efficient to compute the matched text length that way, rather than requiring a function call and a branch, I guess.

@zedlopez
Copy link
Contributor Author

That's fine. The motivation for -1/-1 was to ensure both were obviously invalid in I6 in the no-match case, but the answer to the validity question is right there in match0_idx2, so it isn't all that compelling.

So would you like a revised pull request? or given that it's like 3 lines of code that can be cut and pasted from this comment, is it easier to not bother?

@ganelson
Copy link
Owner

Well, I think we're not there yet: we need to revise the PR to change the conventions for what's in the variables, and to add these phrases to Basic Inform, and to put in a test verifying the behaviour (probably adding some lines to BIP-TextReplacement, BIP-TextReplacement-C and BIP-TextReplacement-G would do it?), and also to add a few lines to Writing with Inform documenting the three new phrases...

@zedlopez
Copy link
Contributor Author

zedlopez commented Jun 6, 2022

Um, were you thinking that I would do those things and submit another PR? I realized I wasn't clear on this...

@zedlopez
Copy link
Contributor Author

I'm about ready to finally submit this, but there's one more case I'd like to solicit thoughts on, @ganelson and @curiousdannii :

I7 thinks every text matches the text "". That's not unreasonable, but what should the reported start index, end index, and length be? Well, surely the length is 0. But how about start and end? The same arithmetic that makes everything else work out would say that the start index is 1 and the end index is 0. This is weird, but possibly less weird than adjusting the end index to 1, for which one could also make an argument.

@zedlopez
Copy link
Contributor Author

Also, xrefs.txt would need updating after merging this. Since it's a generated file, I thought I shouldn't commit a changed version.

@zedlopez
Copy link
Contributor Author

zedlopez commented Jul 2, 2023

This is ready. As noted previously, xrefs.txt needs regenerating; I did not commit a new one.

@ganelson ganelson merged commit 19ad547 into ganelson:master Jul 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants