Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warning on CRAN: cannot form a team with 24 threads, using 2 instead #3300

Closed
mattdowle opened this issue Jan 21, 2019 · 35 comments
Closed

Warning on CRAN: cannot form a team with 24 threads, using 2 instead #3300

mattdowle opened this issue Jan 21, 2019 · 35 comments
Milestone

Comments

@mattdowle
Copy link
Member

From Prof Ripley :


The CRAN policy says:

'If running a package uses multiple threads/cores it must never use more
than two simultaneously: ...'

Packages

RNiftyReg Rfast data.table quanteda sentometrics sylcount

attempt to use more, often all the cores of a 24-core machine and this
shows under clang's or Intel's OMP runtime with messages like

OMP: Warning #96: Cannot form a team with 24 threads, using 2 instead.

when we have set OMP_THREAD_LIMIT=2 as a defensive measure. See the
clang-fedora log or clang-UBSAN additional issue on the CRAN result page
for your package.

How to control OpenMP thread usage is discussed in 'Writing R
Extensions' §1.2.1.1.

It is possible this comes from another package you use, in which case
take it up with the maintainer of that package and set OMP_THREAD_LIMIT
yourself.

Please correct ASAP and before Feb 20 to safely retain the package on CRAN.


@mattdowle mattdowle added this to the 1.12.2 milestone Jan 21, 2019
@mattdowle
Copy link
Member Author

My reply :


How do we know when we are being run by CRAN? Hence I rely on you setting OMP_THREAD_LIMIT=2. What's wrong with that? There is no way that I know of for the package to know its tests are being run by CRAN.
The full sentence from CRAN policy is "If running a package uses multiple threads/cores it must never use more than two simultaneously: the check farm is a shared resource and will typically be running many checks simultaneously."
Well, that's true, because I know you set OMP_THREAD_LIMIT=2. Now it's only a defensive mechanism? That's news to me.
data.table is careful to follow §1.2.1.1: it always uses num_threads() and passes its own getDTthreads() to it. I have greps in CRAN_release.cmd procedures which ensure each and every parallel regions does this. Just as §1.2.1.1 advises. This getDTthreads() limits the user request by omp_get_max_threads() which respects OMP_THREAD_LIMIT.

There might be some nested parallelism for the first time in the latest data.table release (again in the froll.c new code) and I can see how that might possibly end up requesting more than OMP_THREAD_LIMIT within a nested parallel region by calling getDTthreads() twice. So I can look into that. But I don't see any warning locally. I tried just now. You say it's just some compilers. Either way, it's going to be some work for me to reproduce, understand and I need time.

Given all this, I don't think I deserve the one-month-or-else treatment. It's not even as if these packages really are using 24 cores. The limit is working, they're using 2 cores and CRAN is healthy. There's a new warning. That's all. It's new, right? If it's old then you haven't told us before and data.table has been using OpenMP in this way for quite a while now.


@jangorecki
Copy link
Member

jangorecki commented Jan 22, 2019

Looks like getDTthreads ignores OMP_THREAD_LIMIT env var. Try

echo $OMP_THREAD_LIMIT
R --vanilla
library(data.table)
Sys.setenv("OMP_THREAD_LIMIT"=1)
Sys.getenv("OMP_THREAD_LIMIT")
getDTthreads(verbose=TRUE)
q("no")

echo $OMP_THREAD_LIMIT
export OMP_THREAD_LIMIT=1
R --vanilla
library(data.table)
Sys.getenv("OMP_THREAD_LIMIT")
getDTthreads(verbose=TRUE)
q("no")

same behavior between solaris and master branches

I checked OMP_NUM_THREADS env var also, and here it does work, but only when it is being set before starting R, if it was set from inside R then it is ignored.

echo $OMP_NUM_THREADS
R --vanilla
library(data.table)
Sys.setenv("OMP_NUM_THREADS"=1)
Sys.getenv("OMP_NUM_THREADS")
getDTthreads(verbose=TRUE)
q("no")

echo $OMP_NUM_THREADS
export OMP_NUM_THREADS=1
R --vanilla
library(data.table)
Sys.getenv("OMP_NUM_THREADS")
getDTthreads(verbose=TRUE)
q("no")

@mattdowle
Copy link
Member Author

mattdowle commented Jan 22, 2019

Hm. I see the same. That means omp_get_max_threads() doesn't account for OMP_THREAD_LIMIT.

I've read the OpenMP specification before when I was writing getDTthreads() w.r.t. to the R-ext manual §1.2.1.1. Section 3.2.3 of OpenMP 4.5 specification states :

Summary: The omp_get_max_threads routine returns an upper bound on the number of threads that could be used to form a new team if a parallel construct without a num_threads clause were encountered after execution returns from this routine.
Binding: The binding task set for an omp_get_max_threads region is the generating task.
Effect: The value returned by omp_get_max_threads is the value of the first element of the nthreads-var ICV of the current task. This value is also an upper bound on the number of threads that could be used to form a new team if a parallel region without a num_threads clause were encountered after execution returns from this routine.

Which I had read as "it returns the upper bound"; i.e. if user has limited threads then this routine would return that limit. But that's not what it actually says is it. It says "it returns an upper bound". What I thought I was getting there was the user limited number. It's more like any old upper bound.

Seems like getDTthreads needs to use omp_get_thread_limit() not omp_get_max_threads(). Section 3.2.14 of the spec says :

Summary: The omp_get_thread_limit routine returns the maximum number of OpenMP threads available to participate in the current contention group.
Binding: The binding thread set for an omp_get_thread_limit region is all threads on the device. The effect of executing this routine is not related to any specific region corresponding to any construct or API routine.
Effect: The omp_get_thread_limit routine returns the value of the thread-limit-var ICV.
Cross References:

  • thread-limit-var ICV, see Section 2.3 on page 36.
  • OMP_THREAD_LIMIT environment variable, see Section 4.10 on page 300.

Why it's useful for OpenMP to have both omp_get_max_threads() as well as omp_get_thread_limit(), beats me. The cross references for neither one link to the other directly in version 4.5 of the spec that I had read. I checked the recent version 5.0 of the OpenMP Spec (Nov 2018) and the cross-references of both functions have now been updated to include links to each other but I don't see any other clarification as to why both exist. Algorithm 2.1 in section 2.5.1 of OpenMP 4.5 ( section 2.6.1 of OpenMP 5.0) alludes to more complexity, so maybe it's something to do with that.

omp_get_max_threads() has a note:

Note: The return value of the omp_get_max_threads routine can be used to dynamically allocate sufficient storage for all threads in the team formed at the subsequent active parallel region.

But equally, omp_get_thread_limit() could be used to allocate storage. It would be more appropriate too when omp_get_thread_limit() < omp_get_max_threads() otherwise storage would be allocated wastefully.

Maybe we should use MIN(omp_get_max_threads(), omp_get_thread_limit()) but doing that seems a bit onerous and inconvenient. At least for us it would be only once in the central getDTthreads().

We could also detect and warn about ineffective attempts to use Sys.setenv(). And even apply their values so they were effective would probably be more convenient to user.

@MarcusKlik
Copy link
Contributor

MarcusKlik commented Jan 24, 2019

Hi @mattdowle and @jangorecki,

following your discussion, I did a small experiment to check the behavior of omp_get_thread_limit() for different settings of OMP_THREAD_LIMIT. It turns out that for invalid settings of OMP_THREAD_LIMIT, a value of INT_MAX is returned. An equivalent example:

thread_limit <- function(omp_thread_limit) {

  Sys.setenv(OMP_THREAD_LIMIT = omp_thread_limit)

  Rcpp::cppFunction('

      #include <omp.h>
      int get_thread_limit() {
      return omp_get_thread_limit();}

    ', plugins = "openmp")

  get_thread_limit()
}


thread_limit(1)  # ok
#> [1] 1

thread_limit(2)  # ok
#> [1] 2

thread_limit(-1)  # INT_MAX
#> [1] 2147483647

thread_limit("A")  # INT_MAX
#> libgomp: Invalid value for environment variable OMP_THREAD_LIMIT
#> [1] 2147483647

thread_limit("20000")  # ok (but inefficiently high)
#> libgomp: Invalid value for environment variable OMP_THREAD_LIMIT
#> [1] 20000

thread_limit(0)  # INT_MAX
#> libgomp: Invalid value for environment variable OMP_THREAD_LIMIT
#> [1] 2147483647

So, if OMP_THREAD_LIMIT is set to a string, zero or a negative value before compiling the OMP code, INT_MAX will be returned upon calling omp_get_thread_limit(). I believe @mattdowle's suggestion to use

std::min(omp_get_max_threads(), omp_get_thread_limit())

for calculating the number of threads is definitely the safest solution! There are use-cases for having more threads than (logical) cores (for example if the threads are doing a lot of waiting) but I don't think that applies to data.table.

I hope you have some use for this info, thanks for the great work on data.table!

@kbenoit
Copy link

kbenoit commented Jan 28, 2019

We're having trouble with this too (see quanteda/quanteda#1581 and the linked issue) but it seems to happen on macOS and not Linux. I got the same "30-day or else" email from BDR a few days ago too.

Our fix quanteda/quanteda@492cb4c to the package default settings (where value = 2)

Sys.setenv("OMP_THREAD_LIMIT" = value)

did not make the problem disappear.

@mattdowle
Copy link
Member Author

mattdowle commented Jan 29, 2019

@kbenoit The fact you can reproduce is good. quanteda uses data.table iirc. So we'll make the change to data.table and then ask you to rerun to see if that fixes it. We don't have macOS in CI so it's hard for us to reproduce although we can have a good guess.

@kbenoit
Copy link

kbenoit commented Jan 29, 2019

@mattdowle I did even better than that, I set up a reproducible example package for you to try out, see https://github.com/kbenoit/testomp. Interesting that not every function triggers this. I don't know if it was := or the sample that was doing it.

Thanks!!

ps and yes of course we use data.table! 😄

@mattdowle
Copy link
Member Author

Actually, test.data.table() already does call setDTthreads(2) and that's long-standing.
https://github.com/Rdatatable/data.table/blob/master/R/test.data.table.R#L52
So maybe it was something in vignettes or examples that is now parallel in 1.12.0 that is causing the warning on CRAN.
So it wasn't like we didn't set 2 threads at all. We did and still do in tests.
Now #3390 is merged, that should catch the vignettes and examples.

@rferrisx
Copy link

Most of this discussion in over my head. I just want to report that in R 3.6 on Windows 10 I can set the max numbers of threads correctly for my hardware:
setDTthreads(threads = 0)
getDTthreads()
[1] 8

but "OMP_THREAD_LIMIT" is blank not the default (2) mentioned in 'openmp-utils.html' help.

library(data.table,verbose=TRUE)
data.table 1.12.2 using 8 threads (see ?getDTthreads). Latest news: r-datatable.com
Sys.getenv("OMP_THREAD_LIMIT")
[1] ""

@mattdowle
Copy link
Member Author

mattdowle commented May 16, 2019

@rferrisx I see what you mean. Where 'openmp-utils.html' says "Please note again that CRAN sets OMP_THREAD_LIMIT to 2", it should be more clear that by "CRAN" it means "CRAN checks" i.e. https://cran.r-project.org/web/checks/check_results_data.table.html. As a user, it's correct that OMP_THREAD_LIMIT is blank for you. I'll modify the help page ...

@mattdowle
Copy link
Member Author

@rferrisx done: 3f9358f

@rferrisx
Copy link

rferrisx commented May 16, 2019 via email

@mattdowle
Copy link
Member Author

@rferrisx The optimum depends on your data, your queries, and how busy your machine is. But I think the new default from v1.12.2 of half the number of logical CPUs should be good for most cases. Experiment for yourself. Use setDTthreads() or OMP_NUM_THREADS to choose the number of threads. OMP_THREAD_LIMIT is intended more for constraining threading, like by daily CRAN checks where those CRAN machines are very busy checking many packages in parallel. On those machines, more than 2 threads can be slow so 2 is used to still test that threading works. On a dedicated server that nobody else uses, it may optimal to set the number of threads to all logical CPUs. data.table's default of half the logical CPUs is intended to be a reasonable compromise.

@rferrisx
Copy link

rferrisx commented May 17, 2019 via email

@hongyuanjia
Copy link
Contributor

Actually, test.data.table() already does call setDTthreads(2) and that's long-standing.
https://github.com/Rdatatable/data.table/blob/master/R/test.data.table.R#L52
So maybe it was something in vignettes or examples that is now parallel in 1.12.0 that is causing the warning on CRAN.
So it wasn't like we didn't set 2 threads at all. We did and still do in tests.
Now #3390 is merged, that should catch the vignettes and examples.

@mattdowle I am trying to submit a version of my package but I still get a NOTE saying "Examples with CPU time > 2.5 times elapsed time" and fails to pass CRAN submission checking. It seems like data.table still uses more than 2 cores in examples. Why data.table failed to catch the examples in my case? Thanks.

@jangorecki
Copy link
Member

Any reason why you think it is related to a) data.table, b) multithreading?

@hongyuanjia
Copy link
Contributor

@jangorecki Actually CRAN team suspended this note is caused by using more than 2 cores and emailed me on this. Please see below.

On Wed, Jul 3, 2019 at 3:57 PM Uwe Ligges ligges@statistik.tu-dortmund.de
wrote:

Timing suggests: you are using more than 2 cores? Is this correct?

Best,
Uwe Ligges

On Wed, Jul 3, 2019 at 10:16 PM Uwe Ligges ligges@statistik.tu-dortmund.de wrote:
The default should always be to use at most 2 cores (for demonstration
purposes) and allow the user to ask for more.

If the defalt is wrong in data.table, pelase discuss with its maintainer.

Best,
Uwe Ligges

I did not get this NOTE on travis, appveyor and my local computer. To be honest, I did not know how I can reproduce this NOTE on my side.

Based on the fact that CRAN machines limit multicore usage and to my knowledge, data.table is the only package I use that takes full advantage of multithreading, I guess this may relate to data.table. Please correct if I was wrong.

@mattdowle
Copy link
Member Author

mattdowle commented Oct 7, 2019

@hongyuanjia I see your package eplusr is currently fully OK status on CRAN with no notes, last released July 8th. So it looks like you resolved everything. Please let us know if anything is outstanding.

@hongyuanjia
Copy link
Contributor

@mattdowle Thanks for checking that.

As I continuously got this NOTE when submitting my package to CRAN, I have to put most of my examples in \donotrun{} to bypass it in order to let the CRAN team satisfied and land it on CRAN. I am not sure if this issue still exists in the latest version of data.table.

As I did not know how to reproduce it on travis and my local computer, I will get you posted when I submit it a new version to CRAN.

@mattdowle
Copy link
Member Author

@hongyuanjia Ok I see. If I uncomment one of your donotrun do you think I should be able to reproduce it? If so, can you suggest a particular donotrun I could try? I already have your package in my revdep library and test it in release procedures.

@mattdowle mattdowle reopened this Oct 8, 2019
@hongyuanjia
Copy link
Contributor

Yes, that would be immensely helpful!

Below is what the NOTE comes from:

Flavor: r-devel-linux-x86_64-debian-gcc
Check: examples, Result: NOTE
  Examples with CPU time > 2.5 times elapsed time
             user system elapsed ratio
  Idf       2.294  0.049   0.839 2.793
  IdfObject 4.228  0.050   1.593 2.685
  IddObject 1.157  0.011   0.463 2.523

For a start point, it would be great if you could test the examples of IdfObject:
https://github.com/hongyuanjia/eplusr/blob/59a3140224b254f64098b52f60e038009d9b74dd/R/idf_object.R#L474-L614.

@hongyuanjia
Copy link
Contributor

@mattdowle I am pretty sure this is a CRAN specific thing. I am not sure if this can be reproduced from RHub using the same platform. I will try it now.

@mattdowle
Copy link
Member Author

@hongyuanjia CRAN sets OMP_THREAD_LIMIT to 2. Set that in your environment before running R CMD check --as-cran to mimic CRAN.

@hongyuanjia
Copy link
Contributor

@mattdowle Thanks for this hint! Will test it now.

So if I can reproduce this, how could I make sure this environment variable take effect for every example code block?

@mattdowle
Copy link
Member Author

If you can reproduce this with current version of data.table, then it's data.table's problem to fix in data.table.

@mattdowle
Copy link
Member Author

@hongyuanjia. When you got those NOTEs from CRAN, was it all CRAN check machines, or just some? The example output you showed above has linux-debian at the top. This implies your package passed win-builder with no notes, was accepted on CRAN, but then emitted those notes on that CRAN instance and perhaps a few others but mostly all OK status on CRAN? If so, I think I know why. But please confirm first that was the situation.

@hongyuanjia
Copy link
Contributor

@mattdowle You are right. This note only comes from Debian CRAN machine.

package eplusr_0.10.3.tar.gz does not pass the incoming checks automatically, please see the following pre-tests:
Windows: <https://win-builder.r-project.org/incoming_pretest/eplusr_0.10.3_20190702_154053/Windows/00check.log>
Status: OK
Debian: <https://win-builder.r-project.org/incoming_pretest/eplusr_0.10.3_20190702_154053/Debian/00check.log>
Status: 1 NOTE

The previous version gets the same note on Debian, but CRAN team somehow ignore it and let it land on CRAN. I recalled that the NOTE behavior in the previous CRAN follows exactly as what you described. Most of them are OK, only a few (probably only linux) have that NOTE.

@mattdowle
Copy link
Member Author

mattdowle commented Oct 8, 2019

@hongyuanjia Good. Around that time, that CRAN machine used a value of 4 for OMP_THREAD_LIMIT. I discovered that and agreed with CRAN maintainers that it should be 2. It is now 2. That one machine (linux-debian) handles 4 lines of the CRAN checks matrix: devel-gcc, devel-clang, patched-linux and release-linux, which is why those 4 were affected.
You should be able to reproduce the note with export OMP_THREAD_LIMIT=4, but not with 2.
There was a problem in data.table not respecting OMP_THREAD_LIMIT but that was fixed in v1.12.2 (7 Apr 2019); news item 3. Then when data.table started to correctly respect OMP_THREAD_LIMIT it took a while to discover that one CRAN machine used a value of 4.

So you should be able to just turn on all your donotrun again and resubmit to CRAN.

@hongyuanjia
Copy link
Contributor

Great! Thank you for this insight! I will try to reproduce the note on my Linux machine and post the results here.

@mattdowle
Copy link
Member Author

mattdowle commented Oct 8, 2019

Great. Closing for now then as that's almost surely it. Can still comment on closed issues and if need be we can reopen.

@phuongquan
Copy link

phuongquan commented Jul 14, 2023

@hongyuanjia Good. Around that time, that CRAN machine used a value of 4 for OMP_THREAD_LIMIT. I discovered that and agreed with CRAN maintainers that it should be 2. It is now 2. That one machine (linux-debian) handles 4 lines of the CRAN checks matrix: devel-gcc, devel-clang, patched-linux and release-linux, which is why those 4 were affected. You should be able to reproduce the note with export OMP_THREAD_LIMIT=4, but not with 2. There was a problem in data.table not respecting OMP_THREAD_LIMIT but that was fixed in v1.12.2 (7 Apr 2019); news item 3. Then when data.table started to correctly respect OMP_THREAD_LIMIT it took a while to discover that one CRAN machine used a value of 4.

So you should be able to just turn on all your donotrun again and resubmit to CRAN.

It looks like CRAN have changed their minds about the above.

I got the same submission NOTE about Examples with CPU time > 2.5 times elapsed time (only on Debian, not on Windows) this week for the daiquiri package (which uses data.table but does not employ any parallelism explicitly itself), and when I asked if the OMP_THREAD_LIMIT on the Debian machine was no longer set to 2 I got this response:

We do not set the flag anymore:
Users may be unaware that parallelism is used and that they have to set such an env var to avoid it.
The package shoudl make sure that not more than 2 cores are used unless expicitly requested by the user.

Best,
Uwe Ligges

I have seen that a few people have had this issue recently, and seen some workarounds suggested for getting past the CRAN checks, but wanted to raise it here in case this is something the data.table team want to consider again and/or provide definitive advice on.

@jangorecki
Copy link
Member

jangorecki commented Jul 14, 2023

Thanks for update on that. It's quite surprising they want the default to be set like that. Percentage of cores seems to be better fit rather than just some number of cores. It sets whole R ecosystem behind, in terms of machine utilization/performance, comparing to modern languages which use multiple cores by default. It doesn't feel like a proper way to go for R ecosystem.

@nikosbosse
Copy link

nikosbosse commented Nov 29, 2023

Reporting the same issue with CRAN's Debian machine (checks passing on Windows) with the R package scoringutils:

* checking examples ... [26s/17s] NOTE
Examples with CPU (user + system) or elapsed time > 5s
              user system elapsed
add_coverage 6.914  0.313   0.638
Examples with CPU time > 2.5 times elapsed time
              user system elapsed  ratio
add_coverage 6.914  0.313   0.638 11.328

* Running ‘testthat.R’ [211s/18s]
Running R code in ‘testthat.R’ had CPU time 11.8 times elapsed time

* checking re-building of vignette outputs ... [168s/18s] NOTE
Re-building vignettes had CPU time 9.1 times elapsed time

* DONE
Status: 3 NOTEs

Our current solution is to simply call data.table::setDTthreads(1) in examples and tests and the vignette. I'm not sure I completely understood what others did to solve the issue - Is there a solution that would be preferable to what we're doing?
Thanks a lot for all your work on this!

@ben-schwen
Copy link
Member

@nikosbosse You might want to check out #5658 (Jan also provided an example of how to fix examples, tests and vignettes)

@nikosbosse
Copy link

@ben-schwen that's perfect, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants