Warning on CRAN: cannot form a team with 24 threads, using 2 instead #3300
My reply: How do we know when we are being run by CRAN? Hence I rely on you setting OMP_THREAD_LIMIT=2. What's wrong with that? There is no way that I know of for the package to know its tests are being run by CRAN. There might be some nested parallelism for the first time in the latest data.table release (again in the new froll.c code), and I can see how that might end up requesting more than OMP_THREAD_LIMIT within a nested parallel region by calling getDTthreads() twice. So I can look into that. But I don't see any warning locally; I tried just now. You say it's just some compilers. Either way, it's going to be some work for me to reproduce and understand, and I need time. Given all this, I don't think I deserve the one-month-or-else treatment. It's not even as if these packages really are using 24 cores. The limit is working: they're using 2 cores and CRAN is healthy. There's a new warning. That's all. It's new, right? If it's old then you haven't told us before, and data.table has been using OpenMP in this way for quite a while now. |
Looks like
echo $OMP_THREAD_LIMIT
R --vanilla
library(data.table)
Sys.setenv("OMP_THREAD_LIMIT"=1)
Sys.getenv("OMP_THREAD_LIMIT")
getDTthreads(verbose=TRUE)
q("no")
echo $OMP_THREAD_LIMIT
export OMP_THREAD_LIMIT=1
R --vanilla
library(data.table)
Sys.getenv("OMP_THREAD_LIMIT")
getDTthreads(verbose=TRUE)
q("no")
same behavior between solaris and master branches
I checked
echo $OMP_NUM_THREADS
R --vanilla
library(data.table)
Sys.setenv("OMP_NUM_THREADS"=1)
Sys.getenv("OMP_NUM_THREADS")
getDTthreads(verbose=TRUE)
q("no")
echo $OMP_NUM_THREADS
export OMP_NUM_THREADS=1
R --vanilla
library(data.table)
Sys.getenv("OMP_NUM_THREADS")
getDTthreads(verbose=TRUE)
q("no") |
Hm. I see the same. That means I've read the OpenMP specification before when I was writing getDTthreads() w.r.t. the R-ext manual §1.2.1.1. Section 3.2.3 of the OpenMP 4.5 specification states:
Which I had read as "it returns the upper bound"; i.e. if the user has limited threads then this routine would return that limit. But that's not what it actually says, is it? It says "it returns an upper bound". What I thought I was getting there was the user-limited number. It's more like any old upper bound. Seems like getDTthreads needs to use
Why it's useful for OpenMP to have both omp_get_max_threads() has a note:
But equally, omp_get_thread_limit() could be used to allocate storage. It would be more appropriate too when omp_get_thread_limit() < omp_get_max_threads(), otherwise storage would be allocated wastefully. Maybe we should use
We could also detect and warn about ineffective attempts to use Sys.setenv(). Even applying their values so that they take effect would probably be more convenient for the user. |
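The idea of detecting ineffective Sys.setenv() calls could be sketched in plain R roughly as follows. This is only an illustration, not data.table's implementation: `omp_env_snapshot()` and `omp_env_changed()` are hypothetical helpers, and the sketch assumes, as the experiments above suggest, that the OpenMP runtime only sees the values that were present when it initialised.

```r
# Hypothetical helpers (not part of data.table): remember the OpenMP-related
# environment variables at load time and report any that were changed later.
omp_vars <- c("OMP_THREAD_LIMIT", "OMP_NUM_THREADS")

omp_env_snapshot <- function(vars = omp_vars) {
  # unset = NA distinguishes "unset" from "set to an empty string"
  Sys.getenv(vars, unset = NA, names = TRUE)
}

omp_env_changed <- function(snapshot) {
  now <- Sys.getenv(names(snapshot), unset = NA, names = TRUE)
  same <- (is.na(snapshot) & is.na(now)) |
    (!is.na(snapshot) & !is.na(now) & snapshot == now)
  names(snapshot)[!same]
}

# Demo: a value changed after the snapshot is reported as a late change.
Sys.setenv(OMP_THREAD_LIMIT = "2")
snap <- omp_env_snapshot()
Sys.setenv(OMP_THREAD_LIMIT = "1")   # assumed too late for a running OpenMP runtime
changed <- omp_env_changed(snap)
if (length(changed) > 0) {
  message("Changed after load, may have no effect: ",
          paste(changed, collapse = ", "))
}
```

A package could take such a snapshot when it is loaded and run the comparison whenever the thread count is queried, but whether that is worth the complexity is a design question for the maintainers.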
Hi @mattdowle and @jangorecki, following your discussion, I did a small experiment to check the behavior of
thread_limit <- function(omp_thread_limit) {
  Sys.setenv(OMP_THREAD_LIMIT = omp_thread_limit)
  Rcpp::cppFunction('
#include <omp.h>
int get_thread_limit() {
return omp_get_thread_limit();}
', plugins = "openmp")
  get_thread_limit()
}
thread_limit(1) # ok
#> [1] 1
thread_limit(2) # ok
#> [1] 2
thread_limit(-1) # INT_MAX
#> [1] 2147483647
thread_limit("A") # INT_MAX
#> libgomp: Invalid value for environment variable OMP_THREAD_LIMIT
#> [1] 2147483647
thread_limit("20000") # ok (but inefficiently high)
#> libgomp: Invalid value for environment variable OMP_THREAD_LIMIT
#> [1] 20000
thread_limit(0) # INT_MAX
#> libgomp: Invalid value for environment variable OMP_THREAD_LIMIT
#> [1] 2147483647
So, if OMP_THREAD_LIMIT is set to a string, zero or a negative value before compiling the OMP code, INT_MAX will be returned upon calling std::min(omp_get_max_threads(), omp_get_thread_limit()) for calculating the number of threads is definitely the safest solution! There are use-cases for having more threads than (logical) cores (for example if the threads are doing a lot of waiting) but I don't think that applies to I hope you have some use for this info, thanks for the great work on |
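To see why taking the minimum copes with the INT_MAX case from the experiment above, here is a small plain-R sketch. `safe_thread_count()` is a hypothetical helper; in real code the two inputs would come from omp_get_max_threads() and omp_get_thread_limit() in C or C++, whereas here they are ordinary arguments.

```r
# Hypothetical helper: combine the two values reported by the OpenMP runtime.
# The inputs stand in for omp_get_max_threads() and omp_get_thread_limit().
safe_thread_count <- function(max_threads, thread_limit) {
  stopifnot(max_threads >= 1, thread_limit >= 1)
  as.integer(min(max_threads, thread_limit))
}

int_max <- .Machine$integer.max   # same value as the INT_MAX results above

limited   <- safe_thread_count(24L, 2L)       # a valid limit of 2 wins
unlimited <- safe_thread_count(8L, int_max)   # invalid or unset limit: max_threads wins
```

The minimum never exceeds the limit when one is set, and falls back to the usual maximum when the limit is effectively infinite.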
We're having trouble with this too (see quanteda/quanteda#1581 and the linked issue) but it seems to happen on macOS and not Linux. I got the same "30-day or else" email from BDR a few days ago too. Our fix quanteda/quanteda@492cb4c to the package default settings (where Sys.setenv("OMP_THREAD_LIMIT" = value) did not make the problem disappear. |
@kbenoit The fact you can reproduce is good. quanteda uses data.table iirc. So we'll make the change to data.table and then ask you to rerun to see if that fixes it. We don't have macOS in CI so it's hard for us to reproduce although we can have a good guess. |
@mattdowle I did even better than that, I set up a reproducible example package for you to try out, see https://github.com/kbenoit/testomp. Interesting that not every function triggers this. I don't know if it was Thanks!! ps and yes of course we use data.table! 😄 |
Actually, |
Most of this discussion is over my head. I just want to report that in R 3.6 on Windows 10 I can set the max number of threads correctly for my hardware: but "OMP_THREAD_LIMIT" is blank, not the default (2) mentioned in the 'openmp-utils.html' help.
library(data.table,verbose=TRUE) |
@rferrisx I see what you mean. Where 'openmp-utils.html' says "Please note again that CRAN sets OMP_THREAD_LIMIT to 2", it should be more clear that by "CRAN" it means "CRAN checks" i.e. https://cran.r-project.org/web/checks/check_results_data.table.html. As a user, it's correct that OMP_THREAD_LIMIT is blank for you. I'll modify the help page ... |
Matt:
Do I need to modify that variable to some optimum number?
Ryan
…On Thu, May 16, 2019 at 2:24 PM Matt Dowle ***@***.***> wrote:
@rferrisx <https://github.com/rferrisx> done: 3f9358f
--
"Source code is real. Our lives are just manifestations of the source."
adapted from Dr. Martin Maechler
|
@rferrisx The optimum depends on your data, your queries, and how busy your machine is. But I think the new default from v1.12.2 of half the number of logical CPUs should be good for most cases. Experiment for yourself. Use setDTthreads() or OMP_NUM_THREADS to choose the number of threads. OMP_THREAD_LIMIT is intended more for constraining threading, like by daily CRAN checks where those CRAN machines are very busy checking many packages in parallel. On those machines, more than 2 threads can be slow so 2 is used to still test that threading works. On a dedicated server that nobody else uses, it may be optimal to set the number of threads to all logical CPUs. data.table's default of half the logical CPUs is intended to be a reasonable compromise. |
Thank You!
|
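The "experiment for yourself" advice above could look something like the sketch below. It assumes data.table is installed; the table size, the grouping query and the thread counts tried are arbitrary choices for illustration, and the timings will differ from machine to machine.

```r
library(data.table)

old <- getDTthreads()   # remember the current setting so it can be restored

# Arbitrary example data: one million rows in about a thousand groups.
DT <- data.table(g = sample(1e3, 1e6, TRUE), v = runif(1e6))

# Time the same grouped sum with different thread counts.
thread_counts <- c(1L, 2L)
timings <- sapply(thread_counts, function(n) {
  setDTthreads(n)
  system.time(DT[, sum(v), by = g])[["elapsed"]]
})

setDTthreads(old)       # put the previous setting back
data.frame(threads = thread_counts, elapsed = timings)
```

Repeating each timing a few times and using your own data and queries gives a more trustworthy picture than a single run like this.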
@mattdowle I am trying to submit a version of my package but I still get a NOTE saying "Examples with CPU time > 2.5 times elapsed time", and it fails the CRAN submission checks. It seems that data.table still uses more than 2 cores in examples. Why did data.table fail to catch the examples in my case? Thanks. |
Any reason why you think it is related to a) data.table, b) multithreading? |
@jangorecki Actually, the CRAN team suspected this NOTE is caused by using more than 2 cores and emailed me about it. Please see below.
I did not get this NOTE on travis, appveyor or my local computer. To be honest, I do not know how to reproduce this NOTE on my side. Since CRAN machines limit multicore usage and, to my knowledge, data.table is the only package I use that takes full advantage of multithreading, I guess this may relate to data.table. Please correct me if I am wrong. |
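One rough way to look for the same symptom locally is to compare CPU time with elapsed time for a piece of code. The sketch below uses base R's system.time(); `cpu_elapsed_ratio()` is a hypothetical helper, and it is an assumption that this ratio approximates what the check computes. The 2.5 threshold is the one quoted in the NOTE.

```r
# Hypothetical helper: ratio of CPU time (user + system) to elapsed time.
# A single-threaded computation should give a ratio near 1 or below;
# a ratio well above 1 suggests that several threads were busy.
cpu_elapsed_ratio <- function(expr) {
  t <- system.time(expr)
  cpu <- sum(t[c("user.self", "sys.self")])
  elapsed <- t[["elapsed"]]
  if (elapsed > 0) cpu / elapsed else NA_real_
}

# Single-threaded example workload.
ratio <- cpu_elapsed_ratio({
  x <- 0
  for (i in 1:2e6) x <- x + sqrt(i)
})
ratio
```

Wrapping the body of a suspicious example in this helper, with OMP_THREAD_LIMIT set as on the check machine, may help narrow down which call is using extra threads.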
@hongyuanjia I see your package |
@mattdowle Thanks for checking that. As I continuously got this NOTE when submitting my package to CRAN, I have to put most of my examples in As I do not know how to reproduce it on travis or my local computer, I will keep you posted when I submit a new version to CRAN. |
@hongyuanjia Ok I see. If I uncomment one of your |
Yes, that would be immensely helpful! Below is where the NOTE comes from:
As a starting point, it would be great if you could test the examples of |
@mattdowle I am pretty sure this is a CRAN specific thing. I am not sure if this can be reproduced from RHub using the same platform. I will try it now. |
@hongyuanjia CRAN sets OMP_THREAD_LIMIT to 2. Set that in your environment before running |
@mattdowle Thanks for this hint! Will test it now. So if I can reproduce this, how could I make sure this environment variable takes effect for every example code block? |
If you can reproduce this with current version of data.table, then it's data.table's problem to fix in data.table. |
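Because the variable has to be in the environment before R starts, one way to try this from an existing session is to launch a fresh R process with the variable already set. The sketch below uses only base R (system2(), R.home(), shQuote()) and simply echoes the value the child process sees; replacing the echoed expression with the code you want to check is left as an assumption about your workflow.

```r
# Start a fresh, vanilla R process with OMP_THREAD_LIMIT=2 already set
# and ask it to print the value it sees.
rscript <- file.path(R.home("bin"), "Rscript")

out <- system2(
  rscript,
  args = c("--vanilla", "-e",
           shQuote("cat(Sys.getenv('OMP_THREAD_LIMIT'))")),
  env = "OMP_THREAD_LIMIT=2",
  stdout = TRUE
)
out   # the value as seen by the child process
```

Unlike Sys.setenv() inside a running session, the child process here has the limit in place before any OpenMP runtime is initialised.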
@hongyuanjia. When you got those NOTEs from CRAN, was it all CRAN check machines, or just some? The example output you showed above has |
@mattdowle You are right. This NOTE only comes from the Debian CRAN machine.
The previous version got the same NOTE on Debian, but the CRAN team somehow ignored it and let it land on CRAN. I recall that the NOTE behavior for the previous submission was exactly as you described: most checks were OK, and only a few (probably only Linux) had that NOTE. |
@hongyuanjia Good. Around that time, that CRAN machine used a value of 4 for OMP_THREAD_LIMIT. I discovered that and agreed with CRAN maintainers that it should be 2. It is now 2. That one machine (linux-debian) handles 4 lines of the CRAN checks matrix: devel-gcc, devel-clang, patched-linux and release-linux, which is why those 4 were affected. So you should be able to just turn on all your |
Great! Thank you for this insight! I will try to reproduce the note on my Linux machine and post the results here. |
Great. Closing for now then as that's almost surely it. Can still comment on closed issues and if need be we can reopen. |
It looks like CRAN have changed their minds about the above. I got the same submission NOTE about
I have seen that a few people have had this issue recently, and seen some workarounds suggested for getting past the CRAN checks, but wanted to raise it here in case this is something the |
Thanks for the update on that. It's quite surprising they want the default to be set like that. A percentage of cores seems a better fit than a fixed number of cores. It sets the whole R ecosystem behind, in terms of machine utilization and performance, compared to modern languages which use multiple cores by default. It doesn't feel like a proper way to go for the R ecosystem. |
Reporting the same issue with CRAN's Debian machine (checks passing on Windows) with the R package scoringutils:
Our current solution is to simply call |
@nikosbosse You might want to check out #5658 (Jan also provided an example of how to fix examples, tests and vignettes) |
@ben-schwen that's perfect, thank you! |
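For readers looking for the general shape of such a workaround, capping threads around a block of example or test code and then restoring the previous setting might be sketched as below. `with_dt_threads()` is a hypothetical helper, not something provided by data.table or taken from #5658; it relies only on setDTthreads() returning the previous value and on getDTthreads(), and it assumes data.table is installed.

```r
# Hypothetical helper: run an expression with data.table limited to n threads,
# restoring the previous setting afterwards, even if the expression errors.
with_dt_threads <- function(n, expr) {
  old <- data.table::setDTthreads(n)
  on.exit(data.table::setDTthreads(old), add = TRUE)
  force(expr)
}

before <- data.table::getDTthreads()
inside <- with_dt_threads(2L, data.table::getDTthreads())
after  <- data.table::getDTthreads()
c(before = before, inside = inside, after = after)
```

Restoring the old value matters in tests and examples because the thread setting is global to the R session.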
From Prof Ripley :
The CRAN policy says:
'If running a package uses multiple threads/cores it must never use more
than two simultaneously: ...'
Packages
RNiftyReg Rfast data.table quanteda sentometrics sylcount
attempt to use more, often all the cores of a 24-core machine and this
shows under clang's or Intel's OMP runtime with messages like
OMP: Warning #96: Cannot form a team with 24 threads, using 2 instead.
when we have set OMP_THREAD_LIMIT=2 as a defensive measure. See the
clang-fedora log or clang-UBSAN additional issue on the CRAN result page
for your package.
How to control OpenMP thread usage is discussed in 'Writing R
Extensions' §1.2.1.1.
It is possible this comes from another package you use, in which case
take it up with the maintainer of that package and set OMP_THREAD_LIMIT
yourself.
Please correct ASAP and before Feb 20 to safely retain the package on CRAN.