RecursiveASTVisitor slow to compile #93462
cc @AaronBallman @Endilll in case you have any ideas. |
@llvm/issue-subscribers-clang-frontend Author: Nikita Popov (nikic)
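Compiling SourceCodeTest.cpp currently takes >50s (with clang 18 as host compiler). The reason is that it has 12 different subclasses of RecursiveASTVisitor. I believe this is also the root cause for a few other clang files with long compile times.
The basic problem is that RecursiveASTVisitor is a CRTP construction, which means that every subclass requires a completely separate instantiation of the visitor implementation -- which is huge. LLVM takes a long time to chew through that. I'm not entirely sure how this can be fixed -- I'd expect that using virtual methods would make it too slow. Maybe there is some kind of middle ground. |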
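For readers less familiar with the trade-off being discussed, here is a minimal sketch of the two dispatch styles. All names (`Node`, `CrtpVisitor`, `VirtualVisitor`, `countBoth`) are hypothetical toys, not Clang types. In the CRTP flavour the traversal code is a template and gets instantiated again for every subclass; in the virtual flavour it is compiled once and subclasses only contribute overrides.

```cpp
// Hypothetical toy AST node; not a Clang type.
struct Node { int Kind; };

// CRTP flavour: the traversal is a template, so every Derived
// gets its own separate instantiation of all of it.
template <typename Derived>
struct CrtpVisitor {
  Derived &getDerived() { return static_cast<Derived &>(*this); }
  bool Traverse(Node &N) { return getDerived().Visit(N); } // static dispatch
  bool Visit(Node &) { return true; }                      // default: do nothing
};

// Virtual flavour: the traversal is ordinary code, compiled once.
struct VirtualVisitor {
  virtual ~VirtualVisitor() = default;
  bool Traverse(Node &N) { return Visit(N); } // dynamic dispatch
  virtual bool Visit(Node &) { return true; }
};

struct CountCrtp : CrtpVisitor<CountCrtp> {
  int Count = 0;
  bool Visit(Node &) { ++Count; return true; }
};

struct CountVirtual : VirtualVisitor {
  int Count = 0;
  bool Visit(Node &) override { ++Count; return true; }
};

// Runs both visitors over one node and returns the combined count.
inline int countBoth() {
  Node N{0};
  CountCrtp A;
  CountVirtual B;
  A.Traverse(N);
  B.Traverse(N);
  return A.Count + B.Count;
}
```

In this toy the difference is negligible, but the real visitor has far more traversal code per instantiation, which is where the compile-time cost described above would come from.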
Imo it might be worth exploring this since it is the more or less ‘obvious’ solution; I can look into refactoring the AST visitor to use virtual functions instead to see how much of a difference it would make, if that would help. @nikic Also, iirc, you’re the one who maintains the compile-time tracker, so I’d appreciate it if you could help w/ the benchmarking in that case because I’m candidly not entirely sure how to do that ‘properly’... |
Sure! I added your fork to the server, so you can now run experiments by pushing a branch that starts with |
Thanks! |
Ok, I’ve been making some progress, but I’ve run into a few issues that I’m not 100% sure how to solve:
|
(which simply just checks if two member function pointers are equal) |
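As a rough illustration of what a "same member function pointer" check can look like in a CRTP setting, here is a sketch under assumptions. The helper and class names are illustrative rather than copied from Clang, and it needs C++17 for `if constexpr`. The idea is that if the derived class does not redeclare the function, `&Derived::TraverseStmt` names the base member and compares equal; if it does, the pointer types differ and the check reports "not the same".

```cpp
#include <type_traits>

// Returns true only if both member pointers have the same type and
// point at the same function (illustrative helper, not Clang's code).
template <typename A, typename B>
bool isSameMethod(A First, B Second) {
  if constexpr (std::is_same_v<A, B>)
    return First == Second;
  else
    return false; // different pointer types: cannot be the same member
}

template <typename Derived>
struct Visitor {
  bool TraverseStmt() { return true; }

  // True if Derived declares its own TraverseStmt.
  bool derivedOverridesTraverseStmt() {
    return !isSameMethod(&Derived::TraverseStmt, &Visitor::TraverseStmt);
  }
};

struct Plain : Visitor<Plain> {};

struct Custom : Visitor<Custom> {
  bool TraverseStmt() { return false; }
};
```

Note that this relies on the base class knowing the `Derived` type at compile time, which is exactly what a purely virtual-function-based visitor would not have.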
I think that Post-order traversal will have to stay, I guess. I don't think it can be replaced with anything else. |
Yeah, I’m not really planning on removing it; it’s just that I’m not entirely sure how to refactor that particular part of it. |
@Sirraide are you exploring an idea to replace CRTP with dynamic inheritance in a single commit? (I suspect there's some prototype in your fork, but I didn't get a chance to poke at it yet, it'd be nice if you could provide a link to it) I suspect that using virtual functions instead of CRTP will be too slow in some cases, but would work nicely for most cases (and compile-time wins will make it a good trade-off overall). This test you mention is a good example of the latter, I suspect. Maybe a hybrid approach is what we need here?
Benefits:
Cons:
Open questions:
What do people think about that approach? |
I am just trying to see how it would go; I haven’t pushed anything yet because I have yet to get it to compile because of the issues mentioned above.
Yeah, I’ve been thinking about that as well: One idea would be to do something like this:
template <typename>
class RecursiveASTVisitor : RecursiveASTVisitorImpl {};
I.e. accept the template parameter but do nothing with it. However, there are a few other subtle issues; e.g. if you define bool VisitCallExpr(const CallExpr* CE); then that will break, because the base class function doesn’t have
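One general C++ point that may be related (this is an assumption, since the sentence above is cut off): a derived-class function whose parameter type differs from the base's virtual function hides it instead of overriding it. With CRTP the mismatch is harmless because the call goes through the derived type; with virtual dispatch the derived function is silently never called. A small sketch with hypothetical stand-in types:

```cpp
struct CallExpr {}; // hypothetical stand-in, not clang::CallExpr

// Virtual flavour.
struct VirtualBase {
  virtual ~VirtualBase() = default;
  virtual bool VisitCallExpr(CallExpr *) { return true; }
  bool Traverse(CallExpr *CE) { return VisitCallExpr(CE); }
};

struct ConstParamVirtual : VirtualBase {
  bool Called = false;
  // Different parameter type: hides the base function, does NOT override it.
  bool VisitCallExpr(const CallExpr *) { Called = true; return true; }
};

// CRTP flavour.
template <typename Derived>
struct CrtpBase {
  bool VisitCallExpr(CallExpr *) { return true; }
  bool Traverse(CallExpr *CE) {
    // Name lookup starts in Derived, so its version is found and
    // CallExpr* converts implicitly to const CallExpr*.
    return static_cast<Derived *>(this)->VisitCallExpr(CE);
  }
};

struct ConstParamCrtp : CrtpBase<ConstParamCrtp> {
  bool Called = false;
  bool VisitCallExpr(const CallExpr *) { Called = true; return true; }
};
```

Marking the derived function `override` would turn the silent mismatch into a compile error, which is one way a migration could catch such cases.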
Yeah, the fact of the matter is that the current pattern entails instantiating this class, which is over 10 000 LOC after macro expansion, over 100 times, so the theory that this is what’s causing at least some of the long compile times seems very plausible to me. The vast majority of AST visitors just override one or maybe two functions, and instantiating the entire class in those cases just seems really unnecessary...
Yeah, honestly, I’ve also been thinking about having two implementations: one using virtual functions, one that uses CRTP (and the former can just delegate to the latter by default). The downside is that now every time we add a function to
Yeah, I agree with pretty much all of this.
That’s true, but then again, the current diff is 160 files changed, and 5000 lines added and removed, so this would probably be much less complex in the short term at least.
We’re already using macros etc. for declaring the vast majority of the How about this then as a new approach: First, we introduce
Honestly, constructor parameters (or maybe just one parameter taking some
I personally agree that this is probably the way to go—there are a number of benefits to doing it this way, and moreover, this way we actually have a gradual migration path instead of requiring us (and downstream users) to change everything at once. I’ll experiment with this to see how it would go; the upside of this approach also is that I’ll be able to move much more quickly because I won’t have to refactor 200 AST visitors... |
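To make the "virtual visitor that delegates to the CRTP one" idea concrete, here is a minimal sketch. It is a guess at the general shape, not the code from any branch or PR; every name in it (`Tree`, `CrtpVisitor`, `DynamicVisitor`, `Bridge`, `StmtCounter`) is hypothetical. The point is that the CRTP template is instantiated exactly once, for the bridge type, and all user visitors just override virtual functions.

```cpp
// Hypothetical toy nodes; not Clang types.
struct Decl { int Id; };
struct Stmt { int Id; };
struct Tree { Decl D; Stmt S; };

// Existing CRTP-style visitor (stand-in for the large template).
template <typename Derived>
struct CrtpVisitor {
  Derived &getDerived() { return static_cast<Derived &>(*this); }
  bool VisitDecl(Decl &) { return true; }
  bool VisitStmt(Stmt &) { return true; }
  bool TraverseTree(Tree &T) {
    return getDerived().VisitDecl(T.D) && getDerived().VisitStmt(T.S);
  }
};

// Dynamic visitor: users subclass this and override virtual functions.
class DynamicVisitor {
public:
  virtual ~DynamicVisitor() = default;
  virtual bool VisitDecl(Decl &) { return true; }
  virtual bool VisitStmt(Stmt &) { return true; }
  bool TraverseTree(Tree &T);
};

// The one and only CRTP instantiation: forwards every callback
// to the virtual functions of the dynamic visitor.
struct Bridge : CrtpVisitor<Bridge> {
  DynamicVisitor &V;
  explicit Bridge(DynamicVisitor &V) : V(V) {}
  bool VisitDecl(Decl &D) { return V.VisitDecl(D); }
  bool VisitStmt(Stmt &S) { return V.VisitStmt(S); }
};

inline bool DynamicVisitor::TraverseTree(Tree &T) {
  return Bridge(*this).TraverseTree(T);
}

// A user visitor: no template instantiation involved.
struct StmtCounter : DynamicVisitor {
  int Count = 0;
  bool VisitStmt(Stmt &) override { ++Count; return true; }
};
```

In a real codebase the bridge and the definition of `TraverseTree` would presumably live in a single .cpp file, so that the template is only instantiated in that one translation unit.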
Not just me actually because reviewing 5000+ LOC worth of changes across 160 files sounds fairly unreasonable... |
I’ve looked into it, and it seems at least the naive approach that uses no template instantiation whatsoever in I’ve only migrated some of the visitors (one of them being |
@Sirraide Thanks! Those results look quite promising to me. DynamicRecursiveASTVisitor has some overhead, but it's honestly less than I expected. I thought this might end up being only suitable for tests, but it looks like it may be applicable to many visitors that aren't particularly hot. (If you're wondering why your last commit had no impact on compile-time, it's because CLANG_ENABLE_ARCMT is disabled in the build here: |
I wasn’t entirely sure as to what constitutes an acceptable overhead, so that’s good to know.
That’s what I was thinking as well, yeah (that said, I have migrated at least one visitor that I know gets called rather often—that being
Ah, that explains it; I was rather confused by that one. I’ll work on migrating some more visitors to using On that note, I’m also not familiar with all of Clang, so while I know that some visitors (e.g. My plan currently is to refactor as many of them as we can get away with, and, if that ends up incurring too much overhead, which it well might, then we can move some of the more commonly used ones back to the CRTP approach. |
Also for the record, with the current approach, every call to a |
I also think that the overhead of less than 1% for significant compile-time wins would probably be acceptable. That being said, if people agree that having something like |
Yeah, as I pointed out, I’m still working on that to see how it’d go.
Currently, it seems like we’ll have to keep So far, we’ve landed on not allowing users to override the I’ve already implemented this change (that is, making (This week has also been unfortunately busy for me, but I will fortunately have a lot more time starting later next week.) |
Making those functions non-virtual does seem like a good idea given that few visitors actually care. It's a one-way road, so even if we have to take it, doing it later rather than earlier seems appropriate.
No rush. And sorry if I sounded like it's urgent. Definitely take your time on this, I was just pointing out that it's probably ok to land this even before we switch all the visitors (even if we don't switch them). Having a way to trade-off compile-time for runtime seems like a good change in general, given that you seem to have found an approach that does not incur a lot of maintenance costs (be it TableGen or mostly-mechanical changes). |
No problem! Just wanted to mention that real quick.
I’m definitely planning to land the However, we can definitely split up the actual visitor implementation and migrating the visitors into separate prs if need be or if that would be preferable. I personally don’t really have strong opinions on this.
Yeah, with the current approach, most of the implementation of the dynamic visitor is just done via X-macros, so if e.g. a new AST node is added, then just implementing traversal for the CRTP-based |
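For anyone unfamiliar with the X-macro technique mentioned above, here is a small self-contained sketch. The node list and all names (`TOY_NODE_LIST`, `ToyDynamicVisitor`, `toyNodeNames`, `StmtOnly`) are made up for illustration and do not reflect how Clang's node lists are actually spelled. A single list macro is expanded several times with different per-entry macros, so adding a node to the list updates every expansion.

```cpp
#include <string>
#include <vector>

// Single source of truth for the (hypothetical) node kinds.
#define TOY_NODE_LIST(X) \
  X(Decl) \
  X(Stmt) \
  X(Type)

// Expansion 1: declare one empty struct per node kind.
#define TOY_DECLARE_NODE(NAME) struct NAME {};
TOY_NODE_LIST(TOY_DECLARE_NODE)
#undef TOY_DECLARE_NODE

// Expansion 2: declare one virtual Visit function per node kind.
struct ToyDynamicVisitor {
  virtual ~ToyDynamicVisitor() = default;
#define TOY_DECLARE_VISIT(NAME) \
  virtual bool Visit##NAME(NAME &) { return true; }
  TOY_NODE_LIST(TOY_DECLARE_VISIT)
#undef TOY_DECLARE_VISIT
};

// Expansion 3: the node names as strings, in list order.
inline std::vector<std::string> toyNodeNames() {
  return {
#define TOY_NODE_NAME(NAME) #NAME,
      TOY_NODE_LIST(TOY_NODE_NAME)
#undef TOY_NODE_NAME
  };
}

// A visitor that only cares about one node kind.
struct StmtOnly : ToyDynamicVisitor {
  bool Seen = false;
  bool VisitStmt(Stmt &) override { Seen = true; return true; }
};
```

Adding, say, `X(Attr)` to the list would produce a new struct, a new `VisitAttr` virtual function, and a new name entry without touching the other code.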
#101305 is the newest RecursiveASTVisitor victim. Those 50 lines are 0.5% of total clang binary size... |
Yeah, I think Clang’s stage 2 binary size is down by like 7% on my RAV refactor branch as a result of replacing a bunch of them. I’m still working on that, but it’s taking some time because there are just that many of them... I’ll see if I can open a pr for what I have so far soon; if the implementation I have seems reasonable, then maybe people can start using the dynamic one instead of adding more CRTP-based ones... I definitely want to finish migrating all (or most) of the RAV unit tests to use the dynamic visitor (which I’m almost done w/) just as a sanity check to make sure I didn’t get something horribly wrong somewhere (I do run all of the clang tests regularly to check I didn’t break anything, but still...) |
Thanks a lot for driving this! I think we should definitely land some incremental PRs here. It would relieve the pressure from you personally and potentially make things much more parallel. Especially if we have a convincing case for a few visitors that migrated without a noticeable or significant performance drop. (E.g. I'm definitely happy to help migrate some visitors myself) |
Yeah, I’ve been migrating a lot of them mostly to figure out what we should and shouldn’t support. E.g. there are less than 5 or so visitors in the entire codebase (that I could find) that either override I’m currently just doing some testing to try and figure out what visitors are called most often etc. and I’ll open a pr for what I have so far when I’m done with that. |
Just created a pr for this: #105195 |
…o use DynamicRecursiveASTVisitor (#115144) This pr refactors all recursive AST visitors in `Sema`, `Analyze`, and `StaticAnalysis` to inherit from DRAV instead. This is over half of the visitors that inherit from RAV directly. See also #115132, #110040, #93462 LLVM Compile-Time Tracker link for this branch: https://llvm-compile-time-tracker.com/compare.php?from=5adb5c05a2e9f31385fbba8b0436cbc07d91a44d&to=b58e589a86c06ba28d4d90613864d10be29aa5ba&stat=instructions%3Au
Compiling SourceCodeTest.cpp currently takes >50s (with clang 18 as host compiler). The reason is that it has 12 different subclasses of RecursiveASTVisitor. I believe this is also the root cause for a few other clang files with long compile times.
The basic problem is that RecursiveASTVisitor is a CRTP construction, which means that every subclass requires a completely separate instantiation of the visitor implementation -- which is huge. LLVM takes a long time to chew through that.
I'm not entirely sure how this can be fixed -- I'd expect that using virtual methods would make it too slow. Maybe there is some kind of middle ground.