-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Serialized ATN const array generates horrible machine code fix #3009
Conversation
Hello. Is there something wrong with my PR? Should I modify anything? |
So, is this project dead? |
@mike-lischke any view on this proposed change? |
It's always a risk to conclude just because something doesn't affect you (or doesn't happen to you etc) means it doesn't affect anyone else. The template code is necessary to split huge ATN code arrays into pieces that are still valid. You probably have a pretty simple grammar, where no splitting is necessary, but try one of the more serious grammars like C++ or, even better, MySQL and see what comes out with your patch. Can the generated files still be compiled? If you really know more "horrible" things about the C++ code generation I'm eager to hear them. Maybe you can open a bug report for them here (please prefix its title with |
What I did is I made sure that it always generates generates split ATN code arrays, instead of the particular case where it generates one inline. And I checked on every major compiler that it generates sane code for amd64. Maybe I'm missing your point? How would this impact badly other usages? Because the old version does impact badly mine. And this is a real issue because it generates suboptiomal and oversized code that results in performance losses. It will behave correctly, sure, but my opinion is that we better do something about it. I could try one of the bigger grammars to see if it still compiles though, if that's what you mean. |
Oh, somehow I misread the patch, but stg code is not one of my favourite. You have to fix the conflict yet before this can be merged. @parrt This is ready to be merged, once the conflict is resolved. |
@mike-lischke @ericvergnaud looks like we have a build failure, but in Swift...should we pull this C++ PR in anyway? |
@parrt Yes we should |
Ugh, Github uses your pesonal machine @ericvergnaud? That's an odd set up and I don't know if I would rely on it if I were the repo owner. |
@mike-lischke it's the only way we found to build on Mac at an affordable cost. Linux and Windows CI remain in the cloud. |
(it's not my personal machine, it's a dedicated one, funded by our favorite BDFL) |
Hello. I recently found a way to see how much space each symbol takes in an executable. To my surprise, I found that the winner for most space taken is
Lexer::Initializer::Initializer
, with more than 100kb of code.The interesting code for that function looks like:
_serializedATN = { <big constant array> }
In my case, the ATN has about 8000 values of uint16.
While gcc and clang do the right think and place the array in .rdata(aka 16 kb), and then memcpy() it in the vector; msvc does not.
At a language level, that code:
Msvc does exactly that, plus the part where it sets each value individually in code. Example of how 2 uint16 get set:
Each uint16 gets a fantastic 13 bytes of x86 code to be set up. 13 bytes x 8000 values = 104000 bytes = 104kb. That's where my huge function size was coming for. You could say that's the compiler fault that it inlines this and you'd be right, but until msvc gets this fixed I think the sensible things to do is to generate C++ code that doesn't confuse any of the major compilers.
Now, that's what I was trying to do here, but I'm not sure that how I modified is the best way that can be done(I have minimal experience with java and .stg files). What I did is to remove the
<if (rest(model.segments))>
which was always false in my case and always generate "segments", and now it always generates code like:which works as expected on every compiler. This change makes my executable 84 kb smaller, and I'm pretty sure that 16kb data + memcpy() is faster than 100 kb of x86 code.
There are few more horrible things in what antlr generates for C++, like duplicating that ATN numbers in both the lexer and parser cpps, but the compilers seem to figure that out and not generate duplicate code. This is the most important change that I could find at this moment.