backtrack stack overflow (current limit is 400) #64
Steve, the change you want is to a constant that limits the amount of backtracking that a single pattern is able to do. I will create a v0.99k-mod5 containing a change of this value from 400 to 800 (later today, when I get back to the office). Meanwhile, if you want to do it yourself, instructions are below. I do want to understand, at some point, how your patterns combine to need more than 400 possible backtracking points. It's certainly possible with a lot of patterns that are combined into a grand "this or that or the next or the next..." pattern, and I'm interested in the use case. If you want to make the change yourself, you can edit as shown below and then follow the 2 steps below. Here, I've changed it from 400 to 800 in
After editing as shown above, and while in the directory containing
Note that running rosie's
Ok, I managed to create release v0.99k-mod5 from my remote location. 😄 I'm going to close this issue for now, but of course let me know if you have any problems with the fix. Btw, to upgrade, you'll need to:
Hi, Jamie,
Thank you very much for your prompt response and for creating v0.99k-mod5 for me. After the limit is raised, I ran into another problem further down. The following are in the MANIFEST: Each of the vmware.*.rpl files contains the following:
The vmware.rpl contains the following: When the limit was 400, I was getting the following: Traceback (most recent call last): After the limit has been raised to 800, I am now getting the following: If I remove the very last entry with 'vmware.rpl', the MANIFEST loads OK. Appreciate your advice on whether this can be worked around. Many thanks.
The use case is to extract unique 'message types' (messages with the same pattern belong to a message type) from a log file, so that a specific action can be coded for each message type. The above limit would probably not be encountered for most log files. But in this case, we have a log which is relatively complex (rich in patterns), and that causes us this problem.
-Steve
So the final pattern that is used for matching is an OR of around 1200 smaller (but nontrivial) patterns? I think I see what's happening internally to cause the "too many Lua values" error in that case. I'll work up a fix.
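To make the scale of this concrete, here is a minimal sketch (in Python, with invented pattern names, not actual RPL source) of composing ~1200 sub-patterns into one grand ordered choice. RPL, being PEG-based, uses `/` for ordered choice, and each alternative that can still fail contributes a potential backtrack point:

```python
def grand_or(names):
    # RPL's '/' is PEG ordered choice; each alternative that fails adds a
    # backtrack point before the next one is tried.
    return " / ".join(names)

# 1200 hypothetical sub-pattern names, mirroring the use case above
subpatterns = ["msg%04d" % i for i in range(1200)]
combined = grand_or(subpatterns)
print("alternatives:", combined.count(" / ") + 1)  # → alternatives: 1200
```

The point is only that the number of potential backtrack points grows with the number of alternatives, which is why a fixed limit of 400 can be exceeded by a large enough grand OR.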
Steve, I have not been able to replicate this, even by generating 16,000 patterns, each of which references the previous one. However, it's possible that I am missing something about your use case. One thought: Is it possible there is a typo that is creating a circular set of definitions? Rosie v1.0.0 detects such situations, but v0.99 does not. In theory, that could cause this error. Another thought: When the python module for Rosie v1.0.0 is released, will you consider moving to that? There will be two kinds of changes you will have to make (but many benefits):
There are many benefits, though, including built-in unit tests, better tracing, and a significant increase in speed. Not to mention a lot of new features. And you'll only have to do those changes (above) once: I designed v1.0.0 to keep a stable RPL syntax and data format even as all the planned new features are added.
Hi, Jamie,
NOTE: the backtrack limit is at 800 when this testcase is run.
typeset -i i=0
The following is the Python testcase (to be placed in ${ROSIE_HOME}/ffi/samples/python):
The following are added to ${ROSIE_HOME}/MANIFEST:
To reproduce the problem,
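The generator script itself was lost from this transcript (only the `typeset -i i=0` fragment survives). A rough reconstruction of the general idea, with all file names and pattern shapes invented for illustration, is a script that emits many definitions, each referencing the previous one, so the compiled pattern accumulates backtrack points:

```python
import os
import tempfile

def write_chained_patterns(path, n):
    # Each definition is an ordered choice over the previous one, so the
    # combined pattern accumulates backtrack points as n grows.
    with open(path, "w") as f:
        f.write('p0 = "base"\n')
        for i in range(1, n):
            f.write('p%d = p%d / "alt%d"\n' % (i, i - 1, i))

path = os.path.join(tempfile.gettempdir(), "stress.rpl")
write_chained_patterns(path, 1000)
print(sum(1 for _ in open(path)))  # → 1000 definitions
```

This is not Steve's actual script; it only sketches how a stress testcase of this shape can be generated mechanically.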
By the way, given all the goodies you have mentioned, I am looking forward to migrating to Rosie v1.0.0 when it's released.
Steve, thanks for the script! It is exactly what I needed to reproduce the problem. I converted your patterns to Rosie v1.0.0 to confirm that the problem exists there as well, and I fixed it. The fix is in 1.0.0-alpha-3. Backporting the fix to v0.99k is possible, but will require a little work. I should be able to get to it this weekend.
FYI, in the process of replicating the bug, I updated your pattern to Rosie 1.0.0, and made one simplifying change. The result was this script that writes to stdout:
And an example of using it: (note how fast it compiles in Rosie 1.0.0)
Great! Now I don't need to break the pattern definitions into a number of RPL files anymore. I did that because it seemed to help working around the back track limit at one point. And the speedup in compilation time! I will start looking into migrating to 1.0.0. |
Hi, Jamie,
Hi Steve, Performance with v1.0.0 is much better. There are many permutations to consider, so I'll give you a few data points and you can let me know which is closest to your situation. I'm using this input file (
And my input data is a file of 2M log entries, all different, and taken from a running IBM Bluemix application. A sample log line looks like this:
My measurements are being done on an Early 2015 MacBook Pro with the specs below. And I'm running each test just 3 times and showing here the one that had the middle User time. (So, not scientific, but indicative of what you can expect.)
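The "run 3 times, keep the middle" methodology can be sketched as a small harness (this is an illustration, not Jamie's actual measurement script, which used shell `time` on whole commands):

```python
import time

def middle_of_three(fn, *args):
    # Run fn three times and return the middle (median) wall-clock time,
    # mirroring the "run 3 times, keep the middle" methodology above.
    times = []
    for _ in range(3):
        t0 = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - t0)
    return sorted(times)[1]

print("%.4fs" % middle_of_three(sorted, list(range(100_000))))
```

Taking the middle of three runs discards one warm-cache outlier in either direction without requiring many repetitions.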
Case 1: Rosie CLI
2,000,000 entries in 20.58sec, or just under 100k entries/sec. Note that if you do not need the JSON output, it is much faster. In this example, the input line is returned when there is a match (like grep does):
2,000,000 entries in 5.47s, or around 363k entries/sec

Case 2: Python calling Rosie one line at a time, no JSON decoding

Here is file
Processing the same data file and printing the resulting matches:
2,000,000 entries in 31.96sec, or around 62k entries/sec (Python is slow.) Processing the same data file and NOT printing the matches saves a small amount of time, but not much. Note: If you only need to know if the line matched or not, you can change
2,000,000 entries in 16sec, or around 125k entries/sec

Case 3: Python calling Rosie one line at a time, and JSON decoding each match

Here's where Python loses big time. Its JSON decoding is horrible. I can recommend a way to avoid it, though -- let me know if you are interested. Also, Python's ability to print a small data structure to the console is slow as well. In this run, Python is using
2,000,000 entries in 186sec, or around 10.7k entries/sec. Finally, Python processing 2M entries, JSON-decoding each match, and printing the match. This is not a realistic use case, since a real application would do something with the data that is presumably faster than asking Python to create and write a human-readable representation of the resulting data structure (a representation that takes ~1500 chars/entry).
2,000,000 entries in 343sec, or around 5800 entries/sec

Conclusion

Python is slow. I suspect Go or C/C++ would be significantly faster here. However, it depends on what you are doing with the data. If you mostly need to know whether there was a match or not, then you can get around 125k entries/sec from Python (under conditions similar to my test, of course). If you need to decode the JSON to get the details for every match, you are looking at a ceiling of around 10.7k entries/sec according to my tests, or around 650k/min. It is likely possible to get this figure over 1M/min by avoiding Python's
If you are interested in this approach, let me know and I'll make it a priority to implement the decoder in Python, and we can each try it on our own data to see how it performs. There is also the option of moving away from Python (which, although popular, has many challenges for production use). For Rosie v0.99, I created a sample Go program and a sample C program, and my TODO list for v1.0.0 includes writing new and better ones for v1.0.0. If C or Go would help you, let me know and I'll make it a priority to write the librosie client.
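The JSON-decoding cost that dominates Case 3 is easy to measure in isolation. A minimal sketch (the sample document is invented; a real Rosie match result is shaped differently, but the per-call decode cost is what matters):

```python
import json
import time

# Invented sample shaped loosely like a JSON match result; only the
# decode cost per call is of interest here.
sample = json.dumps({"basic.matchall": {"pos": 1, "text": "x" * 200,
                                        "subs": [{"datetime": {"pos": 1}}]}})

t0 = time.perf_counter()
for _ in range(100_000):
    json.loads(sample)
print("stdlib json: %.2fs per 100k decodes" % (time.perf_counter() - t0))

# A faster drop-in replacement such as ujson keeps the same call shape:
#   import ujson; ujson.loads(sample)
```

Swapping the decoder is a one-line change because `ujson.loads` accepts the same string argument, which is what makes this an attractive first optimization before restructuring any code.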
Hi, Jamie,
Thank you very much for your prompt reply and the comprehensive information/advice! To get a very rough estimate based on the v1.0.0 information provided, I tested case #1 using Rosie v0.99k-mod5 with a test file containing 2 million lines of your sample log line:
time rosie -encode json basic.matchall testfile | wc -l
My hardware configuration is roughly the same as yours. So, leaving aside the difference between syslog.rpl and basic.rpl, it looks like v1.0.0 needs only one third of the time v0.99 requires in this scenario. I did some profiling on json.loads(); it takes around 30s for one million calls.
I also tested case #2 with the following program 'test1': But it took ages... still running after 10 minutes! Using the above program with 'testfile' changed to 10000 lines (i.e. instead of 2000000 lines):
time ./test1
real	0m13.259s
I profiled the execution as follows: Using the following Python script 'prtstat' to print out the statistics:
prtstat
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
Therefore, the bulk of the time was spent in _match_or_eval().
Now with a similar program 'test2', using match_file() instead:
time ./test2
real	0m0.765s
prtstat
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
Could you explain why there is such a big difference? This doesn't look right. When I first started using Rosie, I just had to deal with log files, so I could use match_file() instead of match(). But now that I have to deal with streaming data, this will become an issue that has to be resolved. Appreciate your advice. Many thanks.
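A 'prtstat'-style helper of the kind described above can be written with the standard library's `cProfile` and `pstats` (this is a sketch of the idea, not Steve's actual script): it profiles one call and prints the familiar columns — ncalls, tottime, percall, cumtime, percall, filename:lineno(function).

```python
import cProfile
import io
import pstats

def prtstat(fn, *args, top=5):
    # Profile a single call and return the report text, sorted by total
    # internal time (the 'tottime' column quoted in the output above).
    pr = cProfile.Profile()
    pr.runcall(fn, *args)
    buf = io.StringIO()
    pstats.Stats(pr, stream=buf).sort_stats("tottime").print_stats(top)
    return buf.getvalue()

print(prtstat(sorted, list(range(100_000))))
```

Sorting by `tottime` (time spent in the function itself, excluding callees) is what makes a hotspot like `_match_or_eval()` stand out at the top of the report.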
Btw, my case is the worst case (i.e. case #3). At this point, I would like to get it to work and meet the user requirement (or close to it) ASAP, without rewriting the whole thing.
Steve, the Python API is available now in the current release on the master branch. Doc for this is still needed, although it's a much simpler API than v0.99 -- see test.py. Thanks for pointing me to ujson, btw. The Python one is so slow!

Why is matchfile() so much faster than match() in v0.99? The implementation of matchfile() is the same code that the Rosie CLI uses to process a file. As far as performance goes, most of the low-hanging fruit has been picked. Without some non-trivial effort, it won't go much faster. By contrast, match() has a lot of overhead per line of input data: (1) a Python call, (2) marshaling across the FFI interface, which includes a lot of copying, and (3) allocation/deallocation of memory for the input string and the match data.

Why is v1.0.0 much faster than v0.99? The relationship between matchfile() and match() is the same, in that matchfile() is all done in Rosie in a tight loop, and the easy performance improvements have been made in that loop. But there are significant improvements in v1.0.0, and, crucially, these apply to both matchfile() and match():
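The per-call overhead Jamie describes can be illustrated without Rosie at all. In this pure-Python sketch (stand-in functions, not Rosie internals), the same work is done once per line through a call boundary versus once for the whole file in a tight loop; in the real case the per-line path additionally pays FFI marshaling, copying, and allocation on every call:

```python
import time

# Invented sample input; 200k identical lines are enough to see the effect.
lines = ["Aug 22 12:00:01 host sshd[42]: accepted"] * 200_000

def match(line):              # stand-in for engine.match(line), one call/line
    return "sshd" in line

def match_file(all_lines):    # stand-in for matchfile(): one tight loop
    return sum(1 for l in all_lines if "sshd" in l)

t0 = time.perf_counter()
n_per_line = sum(match(l) for l in lines)
per_line = time.perf_counter() - t0

t0 = time.perf_counter()
n_bulk = match_file(lines)
bulk = time.perf_counter() - t0

assert n_per_line == n_bulk == len(lines)
print("per-line %.3fs  bulk %.3fs" % (per_line, bulk))
```

Both paths find the same matches; the difference is purely where the loop lives, which is the same structural distinction as match() versus matchfile().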
Jamie, As I need to return a list of line patterns to the user in the input order, I need to find out the line numbers of the lines that have not been matched, so that I can insert error indicators back in at the end. match_file() will write the unmatched lines to the error file only. Is it possible to have the line number written together with each unmatched line in the error file?
I can add this to v0.99 for you. An alternative would be to create a "no match" pattern that is a last resort, e.g.
This way you are guaranteed to get a match, and you can use the name of the match to determine whether it was one you wanted (e.g.
Let me know if this workaround is sufficient. If it is not, then I'll create a v0.99k-mod6 for you with line numbers in the error output.
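The last-resort "no match" idea can be mimicked with Python's `re` module (an illustration in place of the lost RPL snippet; pattern names are invented). Alternation tries the real pattern first; the catch-all `.*` always succeeds, so every input line produces a match, line numbers stay aligned, and the group name tells you which case fired:

```python
import re

# First alternative is the "real" pattern; 'nomatch' is the last resort.
# re tries alternatives left to right, like PEG ordered choice.
pattern = re.compile(r"(?P<sshd>sshd\[\d+\]: .*)|(?P<nomatch>.*)")

for lineno, line in enumerate(["sshd[42]: accepted", "garbage line"], 1):
    m = pattern.match(line)                     # guaranteed to succeed
    kind = "nomatch" if m.group("sshd") is None else "sshd"
    print(lineno, kind)
# → 1 sshd
# → 2 nomatch
```

Because no line is ever rejected, unmatched lines never leave the main output stream, which is exactly why this workaround removes the need for line numbers in a separate error file.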
The workaround will do. Thanks, Jamie.
Hi, Jamie, |
With 1.0.0 about to enter Beta, we have no plans to fix v0.99 issues. (Also, the suggested work-around was acceptable for this issue.) |
Hi, Jamie,
I encountered the following error when loading MANIFEST:
```
Traceback (most recent call last):
  File "/opt/log-analytics/bin/mytest", line 33, in <module>
    extract()
  File "/opt/log-analytics/bin/mytest", line 31, in extract
    r = engine.load_manifest("$sys/vmware.MANIFEST")
  File "/opt/log-analytics/module/rosie.py", line 154, in load_manifest
    retvals = self.rosie.get_retvals(r)
  File "/opt/log-analytics/module/rosie.py", line 102, in get_retvals
    return self._get_retvals(messages, self.rosie.rosieL_free_stringArray)
  File "/opt/log-analytics/module/rosie.py", line 113, in _get_retvals
    raise RuntimeError(retvals[1])  # exception indicating that the call failed
RuntimeError: src/core/engine.lua:121: backtrack stack overflow (current limit is 400)
```
It seems this happens when I have a large number of patterns defined in a number of rpl files.
Could you advise whether there is a way to workaround this?
Appreciate your advice. Many thanks.
-Steve