-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[R-package] Fix R memory leaks (fixes #4282, fixes #3462) #4597
Conversation
@jameslamb I see the linter complains about system headers being included before R headers, but the R extensions manual recommends doing it like that (e.g. in section 6) and in the past it has happened that some compiler updates cause strange breakage in packages when R headers are included before system ones like the linter demands (and it will BTW have an issue with clang13 if any of those LGBM headers include omp.h). Was this for some code aesthetic reason, or is there some issue with the order of header includes that the linter tries to prevent? |
Including I think it is enough with just declaring the function prototypes from |
🎉 thanks for this awesome contribution, really exciting!
After a little investigation, I see that you're referring to these specific errors from build https://github.com/microsoft/LightGBM/runs/3528466185?check_suite_focus=true.
Yes, Since it seems you've removed the header-reordering stuff from this PR, it seems those changes are not strictly related to this one, and could be proposed in a separate PR. If you think cpplint's decision here could lead to issues, we'd welcome a separate PR with the header-reordering for the R package, a link to the specific advice you're referring to from "Writing R Extensions", and a link to any other evidence you can provide that a different ordering is preferable. It's possible to add line-level comments to to suppress false positives from
I'll have to investigate a bit to see if just including the template for |
/gha run r-valgrind Workflow R valgrind tests has been triggered! 🚀 Status: failure ❌. |
/gha run r-solaris Workflow Solaris CRAN check has been triggered! 🚀 solaris-x86-patched: https://builder.r-hub.io/status/lightgbm_3.2.1.99.tar.gz-6b3be38f3c0c4923ae65d045fd766ae6 |
/gha run r-valgrind |
@david-cortes thanks for looking into the valgrind issues. Right now only maintainers can trigger new builds of that check (and other comment-triggered actions in this repo), so I'll do that for you. Alternatively, you can try running it manually using Docker. You can see the relevant setup at https://github.com/microsoft/LightGBM/blob/64f15005040aafa528bbec7f873364d94b7e36a4/.github/workflows/r_valgrind.yml |
/gha run r-valgrind Workflow R valgrind tests has been triggered! 🚀 Status: failure ❌. |
Well, valgrind this time does not report any "definitely lost" or any "indirectly lost" memory. The only thing that I see is a "possibly lost" which is the memory allocated by the C++ exception objects like There's also other "possibly lost" coming from GNU's openmp, and I'd guess those are false alarms since valgrind and libgomp don't play along. The RDvalgrind build is BTW not supposed to use openmp in packages. It additionally has some additional warnings coming from outside the package, such as from R's
|
After some more investigation, it seems in theory a standards-compliant compiler should not produce any memory leak from I've pushed a workaround which should fix it by using a global variable with a string buffer, which I guess should solve the "issue" but it's perhaps not as ideal (now it copies messages from a fixed buffer, to an exception, to another fixed buffer). |
@jameslamb Another unrelated thing: the R examples and tests are not supposed to use more than 1 or 2 threads. One of the tests currently calls function #pragma omp parallel for schedule(static) ... which would make it run with the maximum available number of threads. I guess the proper solution would be to modify that C function to take the number of threads as a parameter, but that'd be bigger then this PR as it's also used by the other interfaces. |
thanks for opening #4598 to document this outside of this PR!
wow awesome, thanks! I'll trigger another valgrind build to test that. |
/gha run r-valgrind Workflow R valgrind tests has been triggered! 🚀 Status: success ✔️. |
Looks like the memory leaks are fixed now. What remains from the valgrind logs (outside the scope of this PR) is to avoid multithreading in the examples (#4598) and to investigate the R string that's being used uninitialized (no idea about it and don't know which example is triggering it). The other check that is failing is unrelated to this PR since it's about the Python version. |
Yep, excellent! We've already gotten CRAN's agreement in the past that remaining "possibly lost" warning is allowable based on their policies, so it's intentionally allowed in our tests here.
LightGBM/.ci/test_r_package_valgrind.sh Lines 46 to 71 in 64f1500
yep, you are right! We have some ongoing issues in CI with a build that tries to emulate ARM64 architectures using QEMU. #4594 tracks it, if you're curious. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great to me, thanks so much for this awesome contribution!!!!!
I'm extra happy that it looks like we'll be able to get this into the next release (#4310) 😍
How I tested this
-
Checked CI logs for new warnings or other...didn't see any.
-
Tried running one of the tests that triggers an error from the C++ side (the tests whose
testthat::skip()
were removed in this PR). Confirmed that the expected output is printed to the R console (both in RStudio and running a script from a terminal withRscript
), and that an R exception is raised.
- triggered Solaris and
valgrind
CI checks. Both passed.
Given all that, I'm very confident in this fix and really happy to accept it! But I'd like one more review (@StrikerRUS, whenever you have time) before we merged since this change touches the core of the R package.
Extra links to help in reviewing:
R_FlushConsole()
description: https://cran.r-project.org/doc/manuals/R-exts.html#index-R_005fFlushConsole- "Writing R Extensions" recommendations for unwind protection: https://cran.r-project.org/doc/manuals/R-exts.html#Condition-handling-and-cleanup-code
@david-cortes sorry for the inconvenience, but could you please merge the latest |
/gha run r-valgrind Workflow R valgrind tests has been triggered! 🚀 Status: success ✔️. |
/gha run r-solaris Workflow Solaris CRAN check has been triggered! 🚀 solaris-x86-patched: https://builder.r-hub.io/status/lightgbm_3.2.1.99.tar.gz-a657d172ea9f425f875bc5fdfecf0eb7 |
On second thought, these changes would still leave open a very small chance of leaking memory if a continuation token fails to allocate, which should be very improbable but is in theory possible. I’ve modified it now to avoid potentially leaking if that happens, would be good to run valgrind again just in case. |
Nvermind, found a way around it by avoiding C++-type code. |
4c75753
to
adf3339
Compare
/gha run r-valgrind Workflow R valgrind tests has been triggered! 🚀 Status: success ✔️. |
/gha run r-solaris Workflow Solaris CRAN check has been triggered! 🚀 solaris-x86-patched: https://builder.r-hub.io/status/lightgbm_3.2.1.99.tar.gz-fc601db5613e465cadbf604135889147 |
R-package/src/lightgbm_R.cpp
Outdated
catch(std::exception& ex) { LGBM_R_save_exception_msg(ex); goto throw_R_errormsg; } \ | ||
catch(std::string& ex) { LGBM_R_save_exception_msg(ex); goto throw_R_errormsg; } \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to avoid using goto
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed. Was added just for clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for my own knowledge @StrikerRUS , what is problematic about goto
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jameslamb I'm one of those who are against goto
due to it leads to spaghetti code in general.
https://stackoverflow.com/a/3517765
https://en.wikipedia.org/wiki/Goto#Criticism
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah got it, thanks!
@Laurae2 Are you able to provide a review for this PR? |
/gha run r-valgrind Workflow R valgrind tests has been triggered! 🚀 Status: success ✔️. |
/gha run r-solaris Workflow Solaris CRAN check has been triggered! 🚀 solaris-x86-patched: https://builder.r-hub.io/status/lightgbm_3.2.1.99.tar.gz-e240edd218b3453a883d13cf510fdda7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️ LGTM, thanks a lot for the great help! I don't have any comments or suggestions for this PR, but I think that formal approval should be done by R-package maintainer.
thanks! I'll review the new changes tonight |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just re-reviewed the changes and the all look good to me!
@david-cortes thank you SO MUCH for picking up this work and for working with us to keep it compliant with CRAN's requirements around Solaris and valgrind. I'm very excited that we'll be able to get this one into the upcoming 3.3.0 release, and I definitely couldn't have figured this out by myself.
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this. |
Fixes #4282
This PR addresses a couple sources of potential memory leaks in the R interface:
* Moves the order of header includes so as to have the R ones at the end as suggested in the R extensions manual, in order to avoid any possible misinitialization of non-R code.