Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RecursiveASTVisitor slow to compile #93462

Open
nikic opened this issue May 27, 2024 · 26 comments
Open

RecursiveASTVisitor slow to compile #93462

nikic opened this issue May 27, 2024 · 26 comments
Assignees
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema"

Comments

@nikic
Copy link
Contributor

nikic commented May 27, 2024

Compiling SourceCodeTest.cpp currently takes >50s (with clang 18 as host compiler). The reason is that it has 12 different subclasses of RecursiveASTVisitor. I believe this is also the root cause for a few other clang files with long compile times.

The basic problem is that RecursiveASTVisitor is a CRTP construction, which means that every subclass requires a completely separate instantiation of the visitor implementation -- which is huge. LLVM takes a long time to chew through that.

I'm not entirely sure how this can be fixed -- I'd expect that using virtual methods would make it too slow. Maybe there is some kind of middle ground.

@nikic nikic added clang Clang issues not falling into any other category and removed new issue labels May 27, 2024
@nikic
Copy link
Contributor Author

nikic commented May 27, 2024

cc @AaronBallman @Endilll in case you have any ideas.

@EugeneZelenko EugeneZelenko added clang:frontend Language frontend issues, e.g. anything involving "Sema" and removed clang Clang issues not falling into any other category labels May 27, 2024
@llvmbot
Copy link
Member

llvmbot commented May 27, 2024

@llvm/issue-subscribers-clang-frontend

Author: Nikita Popov (nikic)

Compiling SourceCodeTest.cpp currently takes >50s (with clang 18 as host compiler). The reason is that it has 12 different subclasses of RecursiveASTVisitor. I believe this is also the root cause for a new other clang files with long compile times.

The basic problem is that RecursiveASTVisitor is a CRTP construction, which means that every subclass requires a completely separate instantiation of the visitor implementation -- which is huge. LLVM takes a long time to chew through that.

I'm not entirely sure how this can be fixed -- I'd expect that using virtual methods would make it too slow. Maybe there is some kind of middle ground.

@Sirraide
Copy link
Member

I'd expect that using virtual methods would make it too slow

Imo it might be worth exploring this since it is the more or less ‘obvious’ solution; I can look into refactoring the AST visitor to use virtual functions instead to see how much of a difference it would make, if that would help.

@nikic Also, iirc, you’re the one who maintains the compile-time tracker, so I’d appreciate it if you could help w/ the benchmarking in that case because I’m candidly not entirely sure how to do that ‘properly’...

@nikic
Copy link
Contributor Author

nikic commented May 27, 2024

I'd expect that using virtual methods would make it too slow

Imo it might be worth exploring this since it is the more or less ‘obvious’ solution; I can look into refactoring the AST visitor to use virtual functions instead to see how much of a difference it would make, if that would help.

@nikic Also, iirc, you’re the one who maintains the compile-time tracker, so I’d appreciate it if you could help w/ the benchmarking in that case because I’m candidly not entirely sure how to do that ‘properly’...

Sure! I added your fork to the server, so you can now run experiments by pushing a branch that starts with perf/.

@Sirraide
Copy link
Member

Sure! I added your fork to the server, so you can now run experiments by pushing a branch that starts with perf/.

Thanks!

@Sirraide
Copy link
Member

Ok, I’ve been making some progress, but I’ve run into a few issues that I’m not 100% sure how to solve:

  1. TRAVERSE_STMT_BASE from what I can tell checks whether a derived class’s function accepts a DataRecursionQueue and passes it along if it does. The thing that confuses me a bit about this is that I couldn’t find a single function in any derived class that does anything with this queue whatsoever. It’s only the RecursiveASTVisitor itself that seems to be using it. Seeing as this isn’t really possible with virtual functions, can we get away with just storing the current queue in the visitor and pushing/popping a new one in some places (e.g. in TraverseStmt)?
  2. Somewhat relatedly, PostVisitStmt uses isSameMethod to check if a derived class overrides a method... which doesn’t work at all with this approach. I’m honestly not sure what the best solution to this would be. That said, postorder traversal seems to be relatively uncommon (I could only find about 5 visitors that use it, as opposed to the 100+ visitors using preorder traversal), so maybe imposing additional constraints on postorder traversal would be an option if need be.

@Sirraide
Copy link
Member

CC @AaronBallman

@Sirraide
Copy link
Member

PostVisitStmt uses isSameMethod

(which simply just checks if two member function pointers are equal)

@Endilll
Copy link
Contributor

Endilll commented May 29, 2024

I think that RecursiveASTVisitor is one of the interfaces we offer users. While we don't promise stability, the conservative assumption is that everything there is used by someone even if it's not used in our repository. So if you're going to remove anything, we should offer a transition path.

Post-order traversal will have to stay, I guess. I don't think it can be replaced with anything else.

@Sirraide
Copy link
Member

Post-order traversal will have to stay, I guess. I don't think it can be replaced with anything else.

Yeah, I’m not really planning on removing it; it’s just that I’m not entierly sure how to refactor that particular part of it.

@ilya-biryukov
Copy link
Contributor

@Sirraide are you exploring an idea to replace CRTP with dynamic inheritance in a single commit? (I suspect there's some prototype in your fork, but I didn't get a chance to poke at it yet, it'd be nice if you could provide a link to it)
I am not sure that's feasible as there are many downstream consumers of RAV that won't be able to switch quickly.

I suspect that using virtual functions instead of CRTP will be too slow in some cases, but would work nicely for most cases (and compile-time wins will make it a good trade-off overall). This test you mention is a good example of the latter, I suspect.

Maybe a hybrid approach is what we need here?

  • Create a new class DynamicRecursiveASTVisitor : RecursiveASTVisitor<DynamicRecursiveASTVisitor> { ... }.
  • All functions inside it are virtual and by default call into RecursiveASTVisitor.
  • Places that care about performance keep using RecursiveASTVisitor, places that don't derive from DynamicRecursiveASTVisitor and get better compile times, don't explode code and error checking.

Benefits:

  • we can switch uses of RAV one by one. Downstream users get a choice.
  • there is an explicit trade-off between compile-time and runtime performance.
  • if we ever find ourselves using DynamicRecursiveASTVisitor in all places, we can remove the original RAV with CRTP or make it a private implementation detail.

Cons:

  • extra complexity in implementation of DynamicRecursiveASTVisitor in addition to RAV (I think it's mostly manageable if we use .inc files for generating signatures and avoid exposing methods we don't necessarily need).

Open questions:

  • how to expose all the various configuration knobs that RAV has in the dynamic version (I think for the most part they could be parameters to the constructor and stored as fields, but also picking some specific default and sticking to them could make sense)

What do people think about that approach?

@Sirraide
Copy link
Member

Sirraide commented May 29, 2024

@Sirraide are you exploring an idea to replace CRTP with dynamic inheritance in a single commit? (I suspect there's some prototype in your fork, but I didn't get a chance to poke at it yet, it'd be nice if you could provide a link to it)

I am just to see how it would go; I haven’t pushed anything yet because I have yet to get it to compile because of issues mentioned above.

I am not sure that's feasible as there are many downstream consumers of RAV that won't be able to switch quickly.

Yeah, I’ve been thinking about that as well: One idea would be to do something like this:

template <typename>
class RecursiveASTVisitor : RecursiveASTVisitorImpl {};

I.e. accept the template parameter but do nothing with it. However, there are a few other subtle issues; e.g. if you define VisitCallExpr like so in your derived class

bool VisitCallExpr(const CallExpr* CE);

then that will break, because the base class function doesn’t have const there, so this isn’t overriding anything at all, it’s just hiding the base class function (which works with CRTP, but not with virtual functions). This is fairly error-prone imo, and I don’t think there is a good way of avoiding it... (yes, adding override does avoid this problem, but downstream users would have to know about that and refactor all their code)

I suspect that using virtual functions instead of CRTP will be too slow in some cases, but would work nicely for most cases (and compile-time wins will make it a good trade-off overall).

Yeah, the fact of the matter is that the current pattern entails instantiating this class, which is over 10 000 LOC after macro expansion, over 100 times, so the theory that this is what’s causing at least some of the long compile times seems very plausible to me. The vast majority of AST visitors just override one or maybe two functions, and instantiating the entire class in those cases just seems really unnecessary...

Maybe a hybrid approach is what we need here?

  • Create a new class DynamicRecursiveASTVisitor : RecursiveASTVisitor<DynamicRecursiveASTVisitor> { ... }.
  • All functions inside it are virtual and by default call into RecursiveASTVisitor.
  • Places that care about performance keep using RecursiveASTVisitor, places that don't derive from DynamicRecursiveASTVisitor and get better compile times, don't explode code and error checking.

Yeah, honestly, I’ve also been thinking about having two implementations: one using virtual functions, one that uses CRTP (and the former can just delegate to the latter by default). The downside is that now every time we add a function to (D)RAV we need to remember to add it in two places (though maybe not really because for AST nodes at least, the declarations are and the definitions could be generated via X macros, and that should cover the vast majority of functions we’ll ever need), but also, this won’t break everyone’s code and would allow us to migrate visitors selectively rather than all of them at once.

Benefits:

  • we can switch uses of RAV one by one. Downstream users get a choice.
  • there is an explicit trade-off between compile-time and runtime performance.
  • if we ever find ourselves using DynamicRecursiveASTVisitor in all places, we can remove the original RAV with CRTP or make it a private implementation detail.

Yeah, I agree with pretty much all of this.

Cons:

  • extra complexity in implementation of DynamicRecursiveASTVisitor in addition to RAV

That’s true, but then again, the current diff is 160 files changed, and 5000 lines added and removed, so this would probably be much less complex in the short term at least.

(I think it's mostly manageable if we use .inc files for generating signatures and avoid exposing methods we don't necessarily need).

We’re already using macros etc. for declaring the vast majority of the VisitX/TraverseX/etc. functions, so that should be fairly straight-forward.

How about this then as a new approach: First, we introduce DynamicRecursiveASTVisitor or whatever we decide calling it and add only the VisitX/TraverseX/WalkUpFromX functions, and also things like shouldVisitImplicitCode; the same pr that does that can also rewrite maybe 5–10 visitors (maybe all the ones in that test case) to see if doing so actually helps improve compile times. Then, we can gradually migrate AST visitors over to the new model. I’d also not support postorder walking for now until we’ve figured out if there are any problems with it.

Open questions:

  • how to expose all the various configuration knobs that RAV has in the dynamic version (I think for the most part they could be parameters to the constructor and stored as fields, but also picking some specific default and sticking to them could make sense)

Honestly, constructor parameters (or maybe just one parameter taking some Options object) sounds like a better idea than requiring users to override 10 different functions that return a bool; that said, most visitors only override one or two of them—if at all—and one more thing to consider is that the current approach allows users to do more complex things in those functions than just return true or false—that said, I can’t recall seeing a visitor that does that (and I’ve seen most of them by now at least in this codebase), except for in some tests where the visitor stores a bool member that is then returned from that function, but that’s pretty much it.

What do people think about that approach?

I personally agree that this is probably the way to go—there are a number of benefits to doing it this way, and moreover, this way we actually have a gradual migration path instead of requiring us (and downstream users) to change everything at once.

I’ll experiment with this to see how it would go; the upside of this approach also is that I’ll be able to move much more quickly because I won’t have to refactor 200 AST visitors...

@Sirraide Sirraide self-assigned this May 30, 2024
@Sirraide
Copy link
Member

Sirraide commented May 30, 2024

I’ll be able to move much more quickly

Not just me actually because reviewing 5000+ LOC worth of changes across 160 files sounds fairly unreasonable...

@Sirraide
Copy link
Member

Sirraide commented Jun 6, 2024

I’ve looked into it, and it seems at least the naive approach that uses no template instantiation whatsoever in DynamicRecursiveASTVisitor does make compiling Clang faster, but as expected, compiling other things seems to be a bit slower as a result.

I’ve only migrated some of the visitors (one of them being CollectUnexpandedParameterPacksVisitor or whatever it was called), but in case anyone’s curious, these are the results so far: https://llvm-compile-time-tracker.com/compare.php?from=e1c3e16d24b5cc097ff08e9283f53319acd3f245&to=ac15b6d005788cff0bbb5e393d43de916caead7f&stat=instructions%3Au

@nikic
Copy link
Contributor Author

nikic commented Jun 7, 2024

@Sirraide Thanks! Those results look quite promising to me. DynamicRecursiveASTVisitor has some overhead, but it's honestly less than I expected. I thought this might end up being only suitable for tests, but it looks like it may be applicable to many visitors that aren't particularly hot.

(If you're wondering why your last commit had no impact on compile-time, it's because CLANG_ENABLE_ARCMT is disabled in the build here:
https://github.com/nikic/llvm-compile-time-tracker/blob/2222d44c6be2cf490b88ae88f4da30b625aa9bb1/cmake_llvm_project_stage2.sh#L17-L18).

@Sirraide
Copy link
Member

Sirraide commented Jun 7, 2024

Those results look quite promising to me.

I wasn’t entirely sure as to what constitutes an acceptable overhead, so that’s good to know.

it looks like it may be applicable to many visitors that aren't particularly hot.

That’s what I was thinking as well, yeah (that said, I have migrated at least one visitor that I know gets called rather often—that being CollectUnexpandedParameterPacksVisitor—but that was mostly to see how much overhead that would cause).

(If you're wondering why your last commit had no impact on compile-time, it's because
CLANG_ENABLE_ARCMT is disabled in the build here:

Ah, that explains it; I was rather confused by that one.

I’ll work on migrating some more visitors to using DynamicRecursiveASTVisitor—especially the ones that are used for testing shouldn’t be an issue (the only exception are tests that test the AST visitor itself; my current plan for those is to test both RecursiveASTVisitor and DynamicRecursiveASTVisitor).

On that note, I’m also not familiar with all of Clang, so while I know that some visitors (e.g. CollectUnexpandedParameterPacksVisitor) are called more often than others, I would appreciate it if people that are more familiar w/ other parts of Clang (e.g. codegen or whatever ARCMigrate is actually for—I don’t really know Objective-C...) could let me know as to what visitors tend to be called more often and which we thus might not want to refactor to use DynamicRecursiveASTVisitor.

My plan currently is to refactor as many of them as we can get away with, and, if that ends up incurring too much overhead, which it well might, then we can move some of the more commonly used ones back to the CRTP approach.

@Sirraide
Copy link
Member

Sirraide commented Jun 7, 2024

Also for the record, with the current approach, every call to a Traverse, Visit, or WalkUpFrom function ends up being a virtual call, but I don’t really see a way around that...

@ilya-biryukov
Copy link
Contributor

I also think that the overhead of less than 1% for significant compile-time wins would probably be acceptable.
At the same time, I am worried we might see larger overhead if we migrate all known visitors.
I did expect larger compile-time wins, but we probably need to migrate more visitors for that.

That being said, if people agree that having something like DynamicRecursiveASTVisitor as an alternative to RecursiveASTVisitor in the Clang codebase is a good idea, we should probably add it now and give people it as an alternative. (That would mean subscribing to potentially support both visitor types forever, at least until someone makes a compelling case that removing the original CRTP visitor has more benefits than the overhead it causes).
But I would personally be in favour of that as ergonomics of virtual functions would be a win for stuff that I often do, i.e. writing clang-based tools rather than the compiler itself.

@Sirraide
Copy link
Member

At the same time, I am worried we might see larger overhead if we migrate all known visitors.
I did expect larger compile-time wins, but we probably need to migrate more visitors for that.

Yeah, as I pointed out, I’m still working on that to see how it’d go.

(That would mean subscribing to potentially support both visitor types forever, at least until someone makes a compelling case that removing the original CRTP visitor has more benefits than the overhead it causes)

Currently, it seems like we’ll have to keep RecursiveASTVisitor around for a while, partially because downstream users might be reliant on it, seeing it is part of the public API. Additionally, I talked to @Endilll about this recently, and we agree that restricting what the dynamic visitor can do should help reduce the overhead a bit.

So far, we’ve landed on not allowing users to override the WalkUpFromX functions, as well as not supporting post-order traversal—both of which are something that are used only by about 5 or so visitors combined across the entire code base; we also discussed whether we could use TableGen for some of this, but the fact that a lot of visitors customise traversal makes it hard to take anything other than an ‘all or nothing’ approach wrt how DynamicRecursiveASTVisitor—and visitors using it—could be implemented...

I’ve already implemented this change (that is, making WalkUpFromX non-virtual and disabling post-order traversal), and while I didn’t see much of a difference, that might also be because there are a lot of visitors I haven’t migrated yet, and there remains some more experimenting to be done here from what I can tell.

(This week has also been unfortunately busy for me, but I will fortunately have a lot more time starting later next week.)

@ilya-biryukov
Copy link
Contributor

Making those functions non-virtual does seem like a good idea given that few visitors actually care. It's a one-way road, so even if we have to take it, doing it later rather than earlier seems appropriate.

(This week has also been unfortunately busy for me, but I will fortunately have a lot more time starting later next week.)

No rush. And sorry if I sounded like it's urgent. Definitely take your time on this, I was just pointing out that it's probably ok to land this even before we switch all the visitors (even if we don't switch them). Having a way to trade-off compile-time for runtime seems like a good change in general, given that you seem to have found an approach that does not incur a lot of maintenance costs (be it TableGen or mostly-mechanical changes).

@Sirraide
Copy link
Member

No rush. And sorry if I sounded like it's urgent.

No problem! Just wanted to mention that real quick.

I was just pointing out that it's probably ok to land this even before we switch all the visitors (even if we don't switch them)

I’m definitely planning to land the DynamicRecursiveASTVisitor implementation, it’s mostly just a matter of figuring out what visitors we want to migrate to it—but also, at the same time, what visitors we do end up migrating to it may influence the implementation (e.g. WalkUpFrom and post-order traversal), so even though it would be possible to add it separately from migrating the visitors.

However, we can definitely split up the actual visitor implementation and migrating the visitors into separate prs if need be or if that would be preferrable. I personally don’t really have strong opinions on this.

you seem to have found an approach that does not incur a lot of maintenance costs

Yeah, with the current approach, most of the implementation of the dynamic visitor is just done via X-macros, so if e.g. a new AST node is added, then just implementing traversal for the CTRP-based RecursiveASTVisitor is the only thing you should have to do.

@nikic
Copy link
Contributor Author

nikic commented Aug 19, 2024

#101305 is the newest RecursiveASTVisitor victim. Those 50 lines are 0.5% of total clang binary size...

@Sirraide
Copy link
Member

Yeah, I think Clang’s stage 2 binary size is down by like 7% on my RAV refactor branch as a result of replacing a bunch of them. I’m still working on that, but it’s taking some time because there are just that many of them...

I’ll see if can to open a pr for what I have so far soon; if the implementation I have seems reasonable, then maybe people can start using the dynamic one instead of adding more CRTP-based ones...

I definitely want to finish migrating all (or most) of the RAV unit tests to use the dynamic visitor (which I’m almost done w/) just as a sanity check to make sure I didn’t get something horribly wrong somewhere (I do run all of the clang tests regularly to check I didn’t break anything, but still...)

@ilya-biryukov
Copy link
Contributor

Thanks a lot for driving this! I think we should definitely land some incremental PRs here.
If we have a dynamic AST visitor available, we could also ask other people in the community to migrate their visitors, where appropriate. (E.g. on Discourse)

It would relieve the pressure from you personally and potentially make things much more parallel. Especially if we have a convincing case for a few visitors that migrated without a noticeable or significant performance drop.

(E.g. I'm definitely happy to help migrate some visitors myself)

@Sirraide
Copy link
Member

Yeah, I’ve been migrating a lot of them mostly to figure out what we should and shouldn’t support. E.g. there are less than 5 or so visitors in the entire codebase (that I could find) that either override WalkUpFromX or use post-order traversal, so the dynamic visitor currently doesn’t support either.

I’m currently just doing some testing to try and figure out what visitors are called most often etc. and I’ll open a pr for what I have so far when I’m done with that.

@Sirraide
Copy link
Member

Just created a pr for this: #105195

Sirraide added a commit that referenced this issue Nov 15, 2024
…o use DynamicRecursiveASTVisitor (#115144)

This pr refactors all recursive AST visitors in `Sema`, `Analyze`, and
`StaticAnalysis` to inherit from DRAV instead. This is over half of the
visitors that inherit from RAV directly.

See also #115132, #110040, #93462

LLVM Compile-Time Tracker link for this branch:
https://llvm-compile-time-tracker.com/compare.php?from=5adb5c05a2e9f31385fbba8b0436cbc07d91a44d&to=b58e589a86c06ba28d4d90613864d10be29aa5ba&stat=instructions%3Au
akshayrdeodhar pushed a commit to akshayrdeodhar/llvm-project that referenced this issue Nov 18, 2024
…o use DynamicRecursiveASTVisitor (llvm#115144)

This pr refactors all recursive AST visitors in `Sema`, `Analyze`, and
`StaticAnalysis` to inherit from DRAV instead. This is over half of the
visitors that inherit from RAV directly.

See also llvm#115132, llvm#110040, llvm#93462

LLVM Compile-Time Tracker link for this branch:
https://llvm-compile-time-tracker.com/compare.php?from=5adb5c05a2e9f31385fbba8b0436cbc07d91a44d&to=b58e589a86c06ba28d4d90613864d10be29aa5ba&stat=instructions%3Au
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema"
Projects
None yet
Development

No branches or pull requests

6 participants