[clang-repl] Keep the first llvm::Module empty to avoid invalid memory access. #89031
@llvm/pr-subscribers-clang
Author: Vassil Vassilev (vgvassilev)
Changes
Clang's CodeGen is designed to work with a single llvm::Module. For convenience, many CodeGen components hold a reference to the llvm::Module (TheModule or Module), and that reference does not change when a new module is pushed. However, the execution engine wants to take ownership of the module, which does not map well to CodeGen's design. To work around this, we clone the module and pass the clone down.
With some effort it is possible to teach CodeGen to ask the CodeGenModule for its current module. That would have an overall positive impact on CodeGen by improving the encapsulation of various parts, but it is not resilient to future regressions.
This patch takes a more conservative approach and clones the llvm::Module before passing it to the Jit. That is also not bulletproof, because we have to guarantee that CodeGen does not write to the blueprint. At this stage it seems more consistent with what clang-repl already does to map each partial translation unit to a new Module.
This change fixes a long-standing invalid memory access reported by valgrind when we enable the TBAA optimization passes. It also unblocks progress on llvm/llvm-project#84758.
Full diff: https://github.com/llvm/llvm-project/pull/89031.diff
1 file affected:
diff --git a/clang/lib/Interpreter/IncrementalExecutor.cpp b/clang/lib/Interpreter/IncrementalExecutor.cpp
index 6f036107c14a9c..e87f43f077f379 100644
--- a/clang/lib/Interpreter/IncrementalExecutor.cpp
+++ b/clang/lib/Interpreter/IncrementalExecutor.cpp
@@ -28,6 +28,7 @@
#include "llvm/IR/Module.h"
#include "llvm/Support/ManagedStatic.h"
#include "llvm/Support/TargetSelect.h"
+#include "llvm/Transforms/Utils/Cloning.h"
// Force linking some of the runtimes that helps attaching to a debugger.
LLVM_ATTRIBUTE_USED void linkComponents() {
@@ -73,7 +74,15 @@ llvm::Error IncrementalExecutor::addModule(PartialTranslationUnit &PTU) {
Jit->getMainJITDylib().createResourceTracker();
ResourceTrackers[&PTU] = RT;
- return Jit->addIRModule(RT, {std::move(PTU.TheModule), TSCtx});
+ // Clang's CodeGen is designed to work with a single llvm::Module. In many
+ // cases for convenience various CodeGen parts have a reference to the
+ // llvm::Module (TheModule or Module) which does not change when a new module
+ // is pushed. However, the execution engine wants to take ownership of the
+ // module which does not map well to CodeGen's design. To work this around
+ // we clone the module and pass it down.
+ std::unique_ptr<llvm::Module> ModuleClone = llvm::CloneModule(*PTU.TheModule);
+
+ return Jit->addIRModule(RT, {std::move(ModuleClone), TSCtx});
}
llvm::Error IncrementalExecutor::removeModule(PartialTranslationUnit &PTU) {
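As a rough illustration of the ownership problem this diff addresses, here is a minimal standalone C++ sketch. It does not use LLVM at all: Module, CodeGenPart, Jit and cloneModule are hypothetical stand-ins for llvm::Module, a CodeGen component that caches a module reference, the execution engine, and llvm::CloneModule. The point is only that handing the JIT a clone leaves the object that CodeGen still references alive.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::Module.
struct Module {
  std::string Name;
  std::vector<std::string> Functions;
};

// Hypothetical stand-in for a CodeGen component that caches a reference to
// the module it was created with and never updates it.
struct CodeGenPart {
  Module &TheModule;
  explicit CodeGenPart(Module &M) : TheModule(M) {}
};

// Hypothetical stand-in for the JIT: it owns every module it is given and is
// free to destroy them whenever it likes.
struct Jit {
  std::vector<std::unique_ptr<Module>> Owned;
  void addIRModule(std::unique_ptr<Module> M) { Owned.push_back(std::move(M)); }
};

// Deep copy, playing the role that llvm::CloneModule plays in the diff.
std::unique_ptr<Module> cloneModule(const Module &M) {
  return std::make_unique<Module>(M);
}

// Returns true if the module referenced by CodeGen is still intact after the
// JIT has destroyed everything it owns.
bool blueprintSurvivesJitTeardown() {
  auto Blueprint = std::make_unique<Module>();
  Blueprint->Name = "ptu_0";
  Blueprint->Functions.push_back("f");

  CodeGenPart Part(*Blueprint); // CodeGen keeps pointing at the blueprint.

  Jit J;
  J.addIRModule(cloneModule(*Blueprint)); // The JIT only ever sees the clone.
  J.Owned.clear();                        // The JIT destroys its modules.

  // The reference held by CodeGen is still valid because the blueprint was
  // never moved into the JIT.
  return Part.TheModule.Name == "ptu_0" && Part.TheModule.Functions.size() == 1;
}
```

Had the sketch passed std::move(Blueprint) to addIRModule instead, Part.TheModule would dangle once the JIT cleared its modules, which is the kind of invalid access the PR description attributes to the original code.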
@@ -73,7 +74,15 @@ llvm::Error IncrementalExecutor::addModule(PartialTranslationUnit &PTU) {
Jit->getMainJITDylib().createResourceTracker();
ResourceTrackers[&PTU] = RT;
return Jit->addIRModule(RT, {std::move(PTU.TheModule), TSCtx});
Even if we are attaching a resource tracker (RT), the Jit decides to destroy the ThreadSafeModule.
4805b4b to d396330
d396330 to e5aae5f
After an offline discussion with @lhames, we have a simpler approach that should consume less memory. We now keep the first llvm::Module empty and make sure it is used only for read-only purposes, such as computing the llvm::DataLayout from it in CodeGen.
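The idea of an empty, read-only first module can be sketched in standalone C++ as follows. This is only an illustration under stated assumptions, not the code from the patch: Module, IncrementalCodeGen, isPristine and all member names are hypothetical, and the emptiness check covers only functions and globals.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::Module.
struct Module {
  std::string DataLayout;
  std::vector<std::string> Functions;
  std::vector<std::string> Globals;
};

// A module counts as pristine if nothing was emitted into it.
bool isPristine(const Module &M) {
  return M.Functions.empty() && M.Globals.empty();
}

// Hypothetical incremental code generator. The first module (the canary) is
// never handed out; it is only read, e.g. for its data layout.
class IncrementalCodeGen {
  std::unique_ptr<Module> Canary;
  std::unique_ptr<Module> Current;

  // Every new module inherits the data layout from the canary.
  std::unique_ptr<Module> freshModule() const {
    auto M = std::make_unique<Module>();
    M->DataLayout = Canary->DataLayout;
    return M;
  }

public:
  explicit IncrementalCodeGen(std::string DL)
      : Canary(std::make_unique<Module>()) {
    Canary->DataLayout = std::move(DL);
    Current = freshModule();
  }

  // Read-only use of the canary.
  const std::string &dataLayout() const { return Canary->DataLayout; }

  bool canaryIsPristine() const { return isPristine(*Canary); }

  // All emission goes into the current module, never into the canary.
  void emitFunction(std::string Name) {
    Current->Functions.push_back(std::move(Name));
  }

  // Hands the finished module to the caller (the JIT in the real setting) and
  // starts a new one. The assertion catches accidental writes to the canary.
  std::unique_ptr<Module> takeModule() {
    assert(isPristine(*Canary) && "canary module must stay empty");
    std::unique_ptr<Module> Done = std::move(Current);
    Current = freshModule();
    return Done;
  }
};
```

In this sketch a caller would construct an IncrementalCodeGen, call emitFunction for each definition, and pass the result of takeModule to the execution engine. The canary outlives every module handed out, so any component that cached a reference to it keeps a valid object to read from.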
if (getCodeGen()) {
PTU->TheModule = std::move(GenModule());
Just to confirm my understanding of the change: this is an independent deduplication cleanup, right?
Yes, independent clean up.
// of the module which does not map well to CodeGen's design. To work this
// around we clone the module and pass it down.
I believe this comment needs an update?
Fixed.
e5aae5f to 3c9698f
If there's a trivial reproducer for the original issue it may be worth adding it as a regression test.
The issue manifests in #84758, which has a test. This PR will unblock that one, which in turn will make the test more explicit by turning the valgrind invalid memory access into a proper crash.
…y access.
Clang's CodeGen is designed to work with a single llvm::Module. For convenience, many CodeGen components hold a reference to the llvm::Module (TheModule or Module), and that reference does not change when a new module is pushed. However, the execution engine wants to take ownership of the module, which does not map well to CodeGen's design. To work around this, we clone the module and pass it down.
With some effort it is possible to teach CodeGen to ask the CodeGenModule for its current module. That would have an overall positive impact on CodeGen by improving the encapsulation of various parts, but it is not resilient to future regressions.
This patch takes a more conservative approach: it intentionally keeps the first llvm::Module empty and does not pass it to the Jit. That is also not bulletproof, because we have to guarantee that CodeGen does not write to the blueprint. However, we have inserted some assertions to catch accidental additions to that canary module.
This change fixes a long-standing invalid memory access reported by valgrind when we enable the TBAA optimization passes. It also unblocks progress on llvm#84758.
3c9698f to 751a3da
Looks like the Windows builders have been dead for a day or two...
…id memory access. (#89031)"
Original commit message:
"
Clang's CodeGen is designed to work with a single llvm::Module. For convenience, many CodeGen components hold a reference to the llvm::Module (TheModule or Module), and that reference does not change when a new module is pushed. However, the execution engine wants to take ownership of the module, which does not map well to CodeGen's design. To work around this, we clone the module and pass it down.
With some effort it is possible to teach CodeGen to ask the CodeGenModule for its current module. That would have an overall positive impact on CodeGen by improving the encapsulation of various parts, but it is not resilient to future regressions.
This patch takes a more conservative approach: it intentionally keeps the first llvm::Module empty and does not pass it to the Jit. That is also not bulletproof, because we have to guarantee that CodeGen does not write to the blueprint. However, we have inserted some assertions to catch accidental additions to that canary module.
This change fixes a long-standing invalid memory access reported by valgrind when we enable the TBAA optimization passes. It also unblocks progress on #84758.
"
This patch reverts adc4f62 and removes the `named_metadata_empty` check on the first llvm::Module, because on Darwin clang inserts some harmless metadata that we can ignore.
…id memory access. (llvm#89031)"
This reverts commits ca09045 and 1faf314 because they broke a Darwin bot.
…id memory access. (llvm#89031)"
Original commit message:
"
Clang's CodeGen is designed to work with a single llvm::Module. For convenience, many CodeGen components hold a reference to the llvm::Module (TheModule or Module), and that reference does not change when a new module is pushed. However, the execution engine wants to take ownership of the module, which does not map well to CodeGen's design. To work around this, we clone the module and pass it down.
With some effort it is possible to teach CodeGen to ask the CodeGenModule for its current module. That would have an overall positive impact on CodeGen by improving the encapsulation of various parts, but it is not resilient to future regressions.
This patch takes a more conservative approach: it intentionally keeps the first llvm::Module empty and does not pass it to the Jit. That is also not bulletproof, because we have to guarantee that CodeGen does not write to the blueprint. However, we have inserted some assertions to catch accidental additions to that canary module.
This change fixes a long-standing invalid memory access reported by valgrind when we enable the TBAA optimization passes. It also unblocks progress on llvm#84758.
"
This patch reverts adc4f62 and removes the `named_metadata_empty` check on the first llvm::Module, because on Darwin clang inserts some harmless metadata that we can ignore.
(cherry picked from commit a3f07d3)
Clang's CodeGen is designed to work with a single llvm::Module. For
convenience, many CodeGen components hold a reference to the llvm::Module
(TheModule or Module), and that reference does not change when a new module is
pushed. However, the execution engine wants to take ownership of the module,
which does not map well to CodeGen's design. To work around this, we clone the
module and pass it down.
With some effort it is possible to teach CodeGen to ask the CodeGenModule for
its current module. That would have an overall positive impact on CodeGen by
improving the encapsulation of various parts, but it is not resilient to future
regressions.
This patch takes a more conservative approach: it intentionally keeps the first
llvm::Module empty and does not pass it to the Jit. That is also not
bulletproof, because we have to guarantee that CodeGen does not write to the
blueprint. However, we have inserted some assertions to catch accidental
additions to that canary module.
This change fixes a long-standing invalid memory access reported by valgrind
when we enable the TBAA optimization passes. It also unblocks progress on
#84758.