-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
initial commit maintaining match_idx vars #53
Conversation
It's fairly harmless, though it does use two more globals, which is largely a non-issue for Glulx but may be a slight concern for Z. Hmm: I'm certainly not against, but sell me a bit more on this. What's the use case for it, and why can't it be done other ways? |
My Textile extension provides phrases to return the starting character and length of matches of both plain text searches and regexp searches (by specified match number in the regexp case). In turn, that info is the basis for other useful things: A phrase to replace the first occurrence of a plain text match with other text: I7's built-ins only offer replacing all of the matches. (This parallels a the extension's offering to replace just the first occurrence of a regexp.) A phrase to split a text on a given substring, returning a list of texts. It could be done other ways, by ignoring the existing implementation and writing from scratch, or by using a regexp match on a regexp that was a regexp-character-escaped version of the plain text string. Given that there's already a working implementation that has the data and discards it, and ties into the existing substring match phrases' use without additional effort (so that I can say this data is available after looking for a match, same as with regexps), I thought this was the obvious parsimonious implementation. It wouldn't be the first feature to be glulx-only if these lines were to be wrapped in #ifdefs. But it's a small thing that really isn't worth much time considering: if my case for it doesn't seem persuasive, feel free to close the PR and my extension will continue to replace TEXT_TY_Replace_REI. |
That all sounds quite reasonable. If BasicInformKit is providing these two variables, though, maybe Basic Inform ought to have matching affordances to read them. In Textile, what would be the phrase definitions for reading these two variables? I'm just wondering if it makes sense to move those (but not the rest of Textile) to BI. |
Textile aliases the vars:
I'm wholly open to less ugly names. |
Hmm, I'm not keen on exposing these as variables: that makes them possible to write to, which seems unsafe. I was thinking more of phrases to decide numbers, which returned the values of these variables, but didn't allow them to be written. |
fair point; the original was written with casual disregard for third parties making direct use of those. Pretend I said
|
If so, maybe the "no match" state of the variables should have match0_idx = 0 and match0_idx2 = -1, so that "matched text length" returns 0 if no match has yet been made? |
The I6 code would use 0-based indexing, but Inform usually uses 1-based indexing. Should those phrases add 1? If Option types were in basic inform that would allow for better handling of no matches ;) Maybe next major release. |
they're already made to be 1-based, and already 0-ed at the start (and stay that way if there's not a match).
Initializing match0_idx2 to -1 instead of 0 would be a clever way of having the length work out to 0 without additional fuss. |
...but now I'm seeing Dannii's point that maybe the I6 versions should be 0-indexed and it's appropriate for the 1-indexification to happen at the I7 wrapper. My existing implementation is needlessly confusing if anyone's ever manipulating them in I6. so maybe initialize them both to -1, add 1 in the I7 phrases for the indices, and let the I7 length phrase be a function..
|
at this point, we've rewritten this patch in the comments. Shall I submit a different pull request? |
Why not initialise them to 0 and -1, then, which is conceptually right for 0-indexed variables, and then define the phrases by: To decide what number is the matched text initial index: (- match0_idx + 1 -). It's fractionally more efficient to compute the matched text length that way, rather than requiring a function call and a branch, I guess. |
That's fine. The motivation for -1/-1 was to ensure both were obviously invalid in I6 in the no-match case, but the answer to the validity question is right there in match0_idx2, so it isn't all that compelling. So would you like a revised pull request? or given that it's like 3 lines of code that can be cut and pasted from this comment, is it easier to not bother? |
Well, I think we're not there yet: we need to revise the PR to change the conventions for what's in the variables, and to add these phrases to Basic Inform, and to put in a test verifying the behaviour (probably adding some lines to BIP-TextReplacement, BIP-TextReplacement-C and BIP-TextReplacement-G would do it?), and also to add a few lines to Writing with Inform documenting the three new phrases... |
Um, were you thinking that I would do those things and submit another PR? I realized I wasn't clear on this... |
I'm about ready to finally submit this, but there's one more case I'd like to solicit thoughts on, @ganelson and @curiousdannii : I7 thinks every text matches the text "". That's not unreasonable, but what should the reported start index, end index, and length be? Well, surely the length is 0. But how about start and end? The same arithmetic that makes everything else work out would say that the start index is 1 and the end index is 0. This is weird, but possibly less weird than adjusting the end index to 1, for which one could also make an argument. |
Also, xrefs.txt would need updating after merging this. Since it's a generated file, I thought I shouldn't commit a changed version. |
This is ready. As noted previously, xrefs.txt needs regenerating; I did not commit a new one. |
This is inessential and you may well be disinterested in modifying the core to include it. But it's also harmless, a no-op for existing code. It creates two global variables that maintain the start and end indices of the most recent plain text match search, equivalent to RE_DATA1 and RE_DATA2 in the RE_Subexpressions array for regexps. And it's inspired by a need in an extension of mine, but I expect other people could come up with uses for them.
This is far less relevant than neroden's wn change: TEXT_TY_Replace_REI is far more tractable to replace than Parser__parse.
But enough with the hard sell; please do with the PR as you will.