Test GraalVM / TRegex #16

Open
tarsa opened this issue Jun 22, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@tarsa

tarsa commented Jun 22, 2024

some docs on how to run tregex are here: https://github.com/oracle/graal/blob/master/regex/README.md

probably graalvm enterprise (aka oracle graalvm) will run faster than graalvm ce (community edition).

@BurntSushi
Owner

BurntSushi commented Jun 22, 2024

Either someone will need to contribute this, or there will need to be far more detailed instructions on how to set up a project and run it with GraalVM/TRegex. The README has scant mention of this. It says something about using it via "Truffle's interop mechanisms." But that means nothing to me.

Also:

> Unlike most regular expression engines which use backtracking, TRegex uses an automaton-based approach. The regex is parsed and then translated into a nondeterministic finite-state automaton (NFA). A powerset construction is then used to expand the NFA into a deterministic finite-state automaton (DFA).

Unless they're doing powerset construction lazily, their regex compile times will be worst-case exponential in the size of the regex pattern.
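
To make the blow-up concrete, here is a minimal sketch of eager powerset construction on a hand-built NFA for `(a|b)*a(a|b){n}`, the textbook pattern whose DFA needs 2^(n+1) states. This is not TRegex's code and says nothing about whether TRegex determinizes lazily; `buildNfa` and `countDfaStates` are names made up for this illustration.

```javascript
// Hand-built NFA for (a|b)*a(a|b){n} over the alphabet {a, b}.
// State 0 loops on both symbols and also moves to state 1 on 'a';
// states 1..n advance on either symbol; state n+1 is accepting.
function buildNfa(n) {
  const transitions = [{ a: [0, 1], b: [0] }];
  for (let s = 1; s <= n; s++) {
    transitions.push({ a: [s + 1], b: [s + 1] });
  }
  transitions.push({ a: [], b: [] }); // accepting state, no outgoing edges
  return { transitions, start: 0, accept: n + 1 };
}

// Eager powerset (subset) construction: every DFA state is a set of NFA
// states. Returns how many distinct DFA states are reachable.
function countDfaStates(nfa) {
  const key = (set) => [...set].sort((x, y) => x - y).join(',');
  const startSet = new Set([nfa.start]);
  const seen = new Set([key(startSet)]);
  const worklist = [startSet];
  while (worklist.length > 0) {
    const current = worklist.pop();
    for (const symbol of ['a', 'b']) {
      const next = new Set();
      for (const s of current) {
        for (const t of nfa.transitions[s][symbol]) next.add(t);
      }
      const k = key(next);
      if (!seen.has(k)) {
        seen.add(k);
        worklist.push(next);
      }
    }
  }
  return seen.size;
}

for (const n of [1, 3, 10]) {
  // Prints 4, 16 and 2048 DFA states respectively.
  console.log(`n=${n}: ${countDfaStates(buildNfa(n))} DFA states`);
}
```

Each extra repetition doubles the number of DFA states, while the pattern itself only grows by a constant amount.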

BurntSushi added the enhancement label Jun 22, 2024
@tarsa
Author

tarsa commented Jun 23, 2024

> Either someone will need to contribute this, or there will need to be far more detailed instructions on how to set up a project and run it with GraalVM/TRegex. The README has scant mention of this. It says something about using it via "Truffle's interop mechanisms." But that means nothing to me.

i've tried to put together an example, but as you've said, the documentation is totally insufficient for quick experiments, so i think i'll give up on using tregex directly.

...but there's another clue in those (overall very unsatisfactory) docs:

> TRegex originated as part of the Graal JavaScript implementation, but is now standalone so implementers of other languages can use it.

maybe running tregex through graaljs would be a sensible option? graalvm offers (some degree of) node.js compatibility: https://github.com/oracle/graaljs?tab=readme-ov-file#nodejs-support . i'll try to experiment with that.

@tarsa
Author

tarsa commented Jun 23, 2024

ok, running graaljs in node.js compatibility mode looks simple. i hope it will be easy enough for you to include it in your benchmarks.

what you need is to download graalnodejs from https://github.com/oracle/graaljs/releases . there are many variants:

  • graaljs doesn't offer node.js compatibility, only graalnodejs offers it
  • -jvm- variants spin up a whole jvm, which takes time, but maybe the jvm jit makes for higher peak performance?
  • -community- variants use graal ce (community edition) instead of oracle graal (known as 'graal enterprise edition' in the past). graal ce is probably slower than the (semi-)commercial edition.

it seems that you don't need to download the whole graalvm distribution, or anything other than a single graalnodejs variant, to be able to run node.js scripts (but i haven't tested that fully, i just ran the main.js from this repo with empty input json).

graalnodejs is meant to be drop-in compatible with node.js, so it ships binaries named 'node', 'npm', etc., which would clash with other node.js installations if you put everything on $PATH.
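
one way to sidestep the $PATH clash is to call each binary by its absolute path. a minimal sketch, assuming the archives are unpacked somewhere and their `bin/node` paths are passed on the command line; `timeStartup` is a made-up helper name, nothing from rebar or graalnodejs:

```javascript
const { execFileSync } = require('child_process');

// Run `<nodeBin> --version` by absolute path (nothing is added to $PATH)
// and report the version string plus the wall-clock time of the whole run.
function timeStartup(nodeBin) {
  const start = process.hrtime.bigint();
  const version = execFileSync(nodeBin, ['--version'], { encoding: 'utf8' }).trim();
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { nodeBin, version, elapsedMs };
}

// Every command-line argument is treated as a path to a `node` binary.
for (const bin of process.argv.slice(2)) {
  const { version, elapsedMs } = timeStartup(bin);
  console.log(`${bin}: ${version} (${elapsedMs.toFixed(1)} ms)`);
}
```

saved under a hypothetical name like `startup.js`, this could be run with any node and given the `bin/node` path of each unpacked variant as an argument.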

my example invocations:

$ echo {} | time ~/devel/graalnodejs-jvm-24.0.1-linux-amd64/bin/node main.js
/dev/shm/rebar/main.js:287
    throw new Error(`invalid KLV item: could not find first ':'`);
          ^

Error: invalid KLV item: could not find first ':'
    at parseOneKLV (/dev/shm/rebar/main.js:287:11)
    at parseConfig (/dev/shm/rebar/main.js:248:17)
    at main (/dev/shm/rebar/main.js:17:18)
    at Object.<anonymous> (/dev/shm/rebar/main.js:359:1)
    at Object._compile (node:internal/modules/cjs/loader:1356:14)
    at Object.<anonymous> (node:internal/modules/cjs/loader:1414:10)
    at Object.load (node:internal/modules/cjs/loader:1197:32)
    at Function._load (node:internal/modules/cjs/loader:1013:12)
    at Function.executeUserEntryPoint (node:internal/modules/run_main:128:12)
    at node:internal/main/run_main_module:28:49

Node.js v18.19.1
Command exited with non-zero status 1
11.09user 0.32system 0:03.43elapsed 332%CPU (0avgtext+0avgdata 553780maxresident)k
0inputs+64outputs (0major+140687minor)pagefaults 0swaps
$ echo {} | time ~/devel/graalnodejs-nativeimage-24.0.1-linux-amd64/bin/node main.js
/dev/shm/rebar/main.js:287
    throw new Error(`invalid KLV item: could not find first ':'`);
          ^

Error: invalid KLV item: could not find first ':'
    at parseOneKLV (/dev/shm/rebar/main.js:287:11)
    at parseConfig (/dev/shm/rebar/main.js:248:17)
    at main (/dev/shm/rebar/main.js:17:18)
    at Object.<anonymous> (/dev/shm/rebar/main.js:359:1)
    at Object._compile (node:internal/modules/cjs/loader:1356:14)
    at Object.<anonymous> (node:internal/modules/cjs/loader:1414:10)
    at Object.load (node:internal/modules/cjs/loader:1197:32)
    at Function._load (node:internal/modules/cjs/loader:1013:12)
    at Function.executeUserEntryPoint (node:internal/modules/run_main:128:12)
    at node:internal/main/run_main_module:28:49

Node.js v18.19.1
Command exited with non-zero status 1
1.14user 0.06system 0:00.46elapsed 259%CPU (0avgtext+0avgdata 358724maxresident)k
0inputs+0outputs (0major+44710minor)pagefaults 0swaps
$ ~/devel/graalnodejs-jvm-24.0.1-linux-amd64/bin/node --version
v18.19.1
$ ~/devel/graalnodejs-nativeimage-24.0.1-linux-amd64/bin/node --version
v18.19.1
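
note that the `time` numbers above mostly cover startup plus an early error exit, not regex matching. a rough sketch for comparing regex throughput once the runtime has warmed up: run the same search for several rounds and look at the later ones. the haystack, pattern and round count are arbitrary choices for illustration, and the check for a global `Graal` object is an assumption about what graaljs exposes (the script runs fine without it):

```javascript
// Count all matches of a global regex in a haystack.
// matchAll requires the 'g' flag and handles empty matches safely.
function countMatches(regex, haystack) {
  let count = 0;
  for (const _match of haystack.matchAll(regex)) count++;
  return count;
}

// Repeat the same search several times and record the time of each round,
// so that startup and JIT warm-up can be told apart from later rounds.
function timeRounds(regex, haystack, rounds) {
  const timings = [];
  for (let round = 0; round < rounds; round++) {
    const start = process.hrtime.bigint();
    const count = countMatches(regex, haystack);
    const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
    timings.push({ round, count, elapsedMs });
  }
  return timings;
}

// Assumption: graaljs defines a global `Graal` object; plain node does not.
const runtime = typeof Graal !== 'undefined' ? 'graal (assumed)' : 'not graal';
console.log(`node ${process.version}, runtime: ${runtime}`);

const haystack = 'foo bar baz '.repeat(100000);
for (const { round, count, elapsedMs } of timeRounds(/\w+/g, haystack, 10)) {
  console.log(`round ${round}: ${count} matches in ${elapsedMs.toFixed(2)} ms`);
}
```

if the -jvm- variant really benefits from the jit, the later rounds should get faster relative to the first ones; this is only a smoke test, not a substitute for a proper rebar benchmark.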

@BurntSushi
Owner

Thanks. I appreciate the legwork. I am still pretty unlikely to work on this any time soon personally.

It would also be helpful to know who is using tregex. Like, is it being used anywhere in a consequential manner? Because it's a non-goal to include literally every regex engine in rebar.

@tarsa
Author

tarsa commented Jun 23, 2024

> It would also be helpful to know who is using tregex. Like, is it being used anywhere in a consequential manner?

since tregex is a part of graaljs (and graalnodejs), i'll look at the situation with graaljs.

i don't know how to measure how widely the graalvm-based node.js fork is used, but the javascript engine itself is probably already used in commercial scenarios.

graalvm-based js engine is integrated with oracle database and with java applications.

info about graaljs integration with oracle database:

https://www.graalvm.org/js/mle-oracle-db/

> This page describes how to run JavaScript in an Oracle Database using the Oracle Database Multilingual Engine (MLE). MLE is powered by GraalVM: it can run JavaScript code in Oracle Database 23ai (and later) on Linux x64.

https://labs.oracle.com/pls/apex/f?p=94065:12:32018698560791:15

info about graaljs integration with java applications:

look at the 'usages' column on https://mvnrepository.com/artifact/org.graalvm.js/js . the numbers in that column are clickable and show which other maven artifacts depend on a particular version of the graaljs engine.
