stdlib: add support regexp match and replace for strings #107
Comments
Hard in Jsonnet, easy as a builtin, but the trick is to make sure every language that someone might want to implement Jsonnet in (and therefore have to provide an implementation for each of the builtins) has a native regular expression library with exactly the same regex syntax and semantics. |
PCRE seems to be implemented in many languages. Perhaps it would be better to implement this as a native extension (#108) to the language and not part of the core. |
Does it support unicode typically? |
+1 |
It appears that PCRE does typically support unicode: http://man7.org/linux/man-pages/man3/pcreunicode.3.html |
Would be great if the 3 of you could offer some real use cases for this functionality so I can figure out how to prioritize it. |
For our use case, we need to strip all non-alphanumeric chars from a string variable. I am thinking this can be done currently by splitting the string into an array of chars, checking each char, and then rejoining, but that seems very ugly. This may be easier with a new function like std.toAlpha(x), but I would think full-on regex capabilities would be a more complete solution. |
It's not too bad but I can see why you'd rather write it with 0-9A-Za-z type ranges and on one line.
|
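For reference, the split/check/rejoin approach described above might look roughly like the sketch below. `isAlnum` and `stripNonAlnum` are hypothetical helper names, not part of the standard library, and the check only covers ASCII letters and digits.

```jsonnet
// Hypothetical helper (not part of std): true for ASCII letters and digits only.
local isAlnum(c) =
  local cp = std.codepoint(c);
  (cp >= 48 && cp <= 57) ||  // 0-9
  (cp >= 65 && cp <= 90) ||  // A-Z
  (cp >= 97 && cp <= 122);  // a-z

// Split into single-character strings, keep the alphanumeric ones, rejoin.
local stripNonAlnum(s) =
  std.join('', std.filter(isAlnum, std.stringChars(s)));

stripNonAlnum('foo-bar_42!')  // evaluates to "foobar42"
```

A regex-based version would collapse the three range checks into a single character class, which is the one-line form mentioned above.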
I have a case where I'd like to be able to replace all instances of On a nearer-term note, can the stdlib/built-ins be composed to produce a 'replace each instance of character x with an instance of character y' behavior? I'm not coming up with it, though I'm quite new to Jsonnet. |
Here's an example implementation of the proposed 'replace each instance of character x with an instance of character y'
produces: The implementation above also supports deleting characters or replacing them with multiple characters: It may be useful generally enough to add it to stdlib. @sparkprime what do you think? |
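The snippet from that comment is not reproduced here. As a rough stand-in, a character-by-character replacement with the same three behaviours (replace, delete, expand) could be sketched as follows. `replaceChars` and its `table` parameter are hypothetical names, and this is not necessarily the implementation that was posted.

```jsonnet
// Hypothetical helper (not part of std): map each character through `table`.
// Characters that have no entry in `table` are kept unchanged.
local replaceChars(s, table) =
  std.join('', [
    if std.objectHas(table, c) then table[c] else c
    for c in std.stringChars(s)
  ]);

{
  swapped: replaceChars('a.b.c', { '.': '_' }),  // "a_b_c"
  deleted: replaceChars('a.b.c', { '.': '' }),  // "abc"
  expanded: replaceChars('a.b.c', { '.': '::' }),  // "a::b::c"
}
```

Using an object as the lookup table keeps the interface close to `tr`: one entry per source character, with the empty string standing for deletion.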
@sbarzowski thank you, that is fantastic. Not only a clean interface to accomplish what I'm looking to get done, but also a good bit of insight about how to approach programming jsonnet. I'd be in favor of adding this to stdlib, but I'm not on the hook for maintenance, so perhaps merely adding to the documentation would be sufficient to help future seekers like myself. Whatever the decision about adding to stdlib or docs, thank you for the help @sbarzowski. |
tr-like functionality is definitely a good candidate for stdlib. |
Since this has come up again - do we have compatible implementations of PCRE in Go and C++ that will work with unicode? |
Well, there is this thing: https://github.com/glenn-brown/golang-pkg-pcre. This is an interface to libpcre. It seems to hardcode assumptions about where libpcre is installed, though... I couldn't find anything else. Probably using libpcre directly with cgo would be a better option. |
My guess is that that package defeats part of the purpose of go-jsonnet, which is to allow go programs to use jsonnet without cgo. Could be wrong though. :-) |
Yeah I think unless we can find a library that has native Go and C++ support (for exactly the same regex syntax) we'll have to leave regexes as something that people add with native extensions. |
Coming back full circle, would RE2 along with Go's built-in regexp package not be a good fit? They're Unicode-aware and syntax-compatible. From Go's regexp package documentation:
(The |
In that case I guess RE2 is the way forward after all :) |
I'm currently prototyping RE2 regexp support in my master...dcoles:re2 branch. Boolean matches can be implemented pretty trivially, but positional and named captures are going to require a bit more thought. The current plan is to have a match return an object upon successful match or
$ jsonnet -e 'std.regexFullMatch("hello", "h(?P<mid>.*)o")'
{
   "captures": [
      "ell"
   ],
   "namedCaptures": {
      "mid": "ell"
   },
   "string": "hello"
}
This way you can still do things like |
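To illustrate how a result of that shape could be consumed, here is a small sketch that works on a literal copy of the object printed above. It deliberately does not call `std.regexFullMatch`, since that function is only a prototype in this thread, and it makes no assumption about what a failed match returns.

```jsonnet
// Literal copy of the match result shown above (assumed shape, taken from the prototype output).
local m = {
  captures: ['ell'],
  namedCaptures: { mid: 'ell' },
  string: 'hello',
};

{
  firstCapture: m.captures[0],  // positional capture: "ell"
  mid: m.namedCaptures.mid,  // named capture: "ell"
  wholeInput: m.string,  // the string that was matched: "hello"
}
```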
I see the PR, which I eagerly anticipate, but just to summarize the points and questions about RE2:
|
It's nice to see this move along; the discussion on the PR is promising. I have a use case involving JSON Schema: I'm building a validator in Jsonnet, and it turns out JSON Schema has a few features that use regular expressions. I don't know much about the different implementations of regex in the wild, but the schema spec depends on the ECMA 262 implementation. I think it would be safe to provide one native implementation in stdlib, and if users need a different one for their use case they can leverage the native functions feature (or if they feel adventurous, they can implement one in Jsonnet). |
Just had a quick look in other projects as I was curious: Kubernetes uses ogen has an interface with a fallback from |
Would be easy to implement as a builtin.
https://github.com/google/re2