
throttle threads for iterated small data tasks #4484

Merged: 10 commits merged into master on Jun 18, 2020
Conversation

@jangorecki (Member) commented May 24, 2020

Closes #3175, see #3175 (comment)


Closes #3205 (the 2nd part which was outstanding)


Closes #3438: rerun repeated uniqueN test

before patch

# 20 threads
   user  system elapsed 
478.054   1.334  24.300 

# 1 thread
   user  system elapsed 
  7.968   0.348   8.317 

# 40 threads
    user   system  elapsed 
1353.752    1.711   34.369 

after patch

# 20 threads
   user  system elapsed 
 10.021   0.382   8.518 

# 1 thread
   user  system elapsed 
  8.008   0.323   8.331 

# 40 threads
   user  system elapsed 
 12.542   0.376   8.279 

Partially addresses #3735:

library(data.table)
allIterations <- data.frame(v1 = runif(1e5), v2 = runif(1e5))
DoSomething <- function(row) someCalculation <- row[["v1"]] + 1
system.time(for (r in 1:nrow(allIterations)) DoSomething(allIterations[r, ]))
allIterations <- as.data.table(allIterations)
setDTthreads(1)
system.time(for (r in 1:nrow(allIterations)) DoSomething(allIterations[r, ]))
setDTthreads(40)
system.time(for (r in 1:nrow(allIterations)) DoSomething(allIterations[r, ]))

output before patch

# DF
   user  system elapsed 
  3.143   0.005   3.150 
# DT: setDTthreads(1)
   user  system elapsed 
 12.958   0.134  13.093 
# DT: setDTthreads(40)
    user   system  elapsed 
1098.068    0.747   27.851 

after patch

# DF
   user  system elapsed 
  3.129   0.020   3.149
# DT: setDTthreads(1)
   user  system elapsed 
 12.505   0.103  12.610 
# DT: setDTthreads(40)
   user  system elapsed 
 12.663   0.088  12.751 

partially address "slow uniqueN" #3739

library(data.table)
set.seed(1000)
# small vector, uniqueN() is slower
x <- sample(1:1e5, size = 1e2, replace = TRUE)
microbenchmark::microbenchmark(
  uniqueN(x),
  length(unique(x))
)

before patch

Unit: microseconds
              expr    min      lq     mean  median      uq     max neval
        uniqueN(x) 42.426 45.1665 55.85915 46.0855 47.1960 953.592   100
 length(unique(x))  6.498  7.7030  9.03348  8.6160  9.9365  28.824   100

after patch

Unit: microseconds
              expr    min      lq     mean  median      uq    max neval
        uniqueN(x) 14.978 15.7665 17.01930 16.2505 16.7810 77.220   100
 length(unique(x))  5.011  6.0760  6.84729  6.7140  7.3435 22.196   100

slow uniqueN by group #1120

library(data.table)
irisdt <- setDT(iris[sample(1:150, size = 10000, replace = TRUE), ])
irisdt[, Sepal.Width := Sepal.Width + sample(0:50, size = 10000, replace = TRUE)]
irisdt[, Sepal.Length := Sepal.Width + sample(0:5000, size = 10000, replace = TRUE)]
microbenchmark::microbenchmark(
  irisdt[, uniqueN(Sepal.Width), Sepal.Length],
  irisdt[, length(unique(Sepal.Width)), Sepal.Length],
  times = 2
)

before patch

Unit: milliseconds
                                                expr      min       lq     mean   median       uq      max neval
        irisdt[, uniqueN(Sepal.Width), Sepal.Length] 343.0611 343.0611 391.6741 391.6741 440.2871 440.2871     2
 irisdt[, length(unique(Sepal.Width)), Sepal.Length] 127.9050 127.9050 128.5702 128.5702 129.2355 129.2355     2

after patch

Unit: milliseconds
                                                expr       min        lq      mean    median        uq       max neval
        irisdt[, uniqueN(Sepal.Width), Sepal.Length] 103.63772 103.63772 107.19231 107.19231 110.74689 110.74689     2
 irisdt[, length(unique(Sepal.Width)), Sepal.Length]  49.09571  49.09571  52.49689  52.49689  55.89808  55.89808     2

codecov bot commented May 24, 2020

Codecov Report

Merging #4484 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##           master    #4484   +/-   ##
=======================================
  Coverage   99.61%   99.61%           
=======================================
  Files          73       73           
  Lines       14103    14108    +5     
=======================================
+ Hits        14048    14053    +5     
  Misses         55       55           
Impacted Files Coverage Δ
src/between.c 100.00% <ø> (ø)
src/cj.c 100.00% <ø> (ø)
src/coalesce.c 100.00% <ø> (ø)
src/fifelse.c 100.00% <ø> (ø)
src/froll.c 100.00% <ø> (ø)
src/frollR.c 100.00% <ø> (ø)
src/frolladaptive.c 100.00% <ø> (ø)
src/nafill.c 100.00% <ø> (ø)
src/reorder.c 100.00% <ø> (ø)
src/subset.c 100.00% <ø> (ø)
... and 6 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 12586af...5485d59.

(Resolved review comment on src/subset.c)
@jangorecki changed the title from "subsetDT escape openmp, #3735" to "escape openmp: subsetDT, between, forder" on May 25, 2020
@jangorecki jangorecki added this to the 1.12.9 milestone May 26, 2020
@mattdowle (Member) commented:

Related: #3205 (comment)


@jangorecki changed the title from "escape openmp: subsetDT, between, forder" to "escape openmp" on Jun 2, 2020
(Resolved review comments on R/openmp-utils.R, src/data.table.h, src/openmp-utils.c)
@mattdowle changed the title from "escape openmp" to "throttle threads for iterated small data tasks" on Jun 13, 2020
@mattdowle (Member) commented Jun 13, 2020

I just couldn't grasp the types, so I changed this PR to what clicks in my head, at least; see what you think. The names OMP_ROWS and OMP_BATCH didn't click because in some cases that are not directly to do with rows, you do still want to throttle. I just find it easier to see a bool flag (true/false) telling me directly whether the throttle should be applied in that region, without the mental hop via the type codes.
While I was changing the getDTthreads() calls, I also changed some of them to throttle=false where Jan had OMP_ROWS; i.e., I'm aware I didn't just change the API. Those ones seemed more like batches to me.
The default throttle of 1024 I just guesstimated. The throttle value means one thread for nrow<=1024, two threads for nrow<=2048, etc., subject to the same limits as before (percent of num_procs, etc.).

@jangorecki (Member, Author) left a review comment:

nice! much simpler approach than mine

(Resolved review comments on src/openmp-utils.c, src/types.c, src/frollR.c, src/nafill.c)
@jangorecki (Member, Author) commented:

The PR in action, just today: https://stackoverflow.com/questions/62347788/multi-threaded-data-table-in-r-much-slower-than-one-using-single-thread?noredirect=1#comment110287890_62347788

@jangorecki (Member, Author) commented Jun 13, 2020

@mattdowle I updated the timings in the first post. It is interesting that length(unique(...)) in the last example sped up as well; probably the forder on Sepal.Length to find the groups escapes some overhead too.
One issue is definitely resolved: #3438.
The three others I marked as "partially addressed"; I think we should define further actions to be taken in each, and if we have no ideas for those, we could eventually close them with the status "not perfect but much better ~ good enough".
Does this PR already cover #3205? If not, maybe avoiding the stress-test threading/batchSize could give more speed?

@jangorecki jangorecki linked an issue Jun 13, 2020 that may be closed by this pull request
@jangorecki (Member, Author) commented Jun 17, 2020

Just to confirm that this PR addresses the problem on the Windows platform as well.

before patch

# 10 threads
   user  system elapsed
 264.71  257.06  291.49

# 1 thread
   user  system elapsed
  25.14   23.88   49.01

# 20 threads
   user  system elapsed
 609.39  609.09  621.81

after patch

# 10 threads
   user  system elapsed
  22.77   23.63   46.50

# 1 thread
   user  system elapsed
  25.35   24.16   49.50

# 20 threads
   user  system elapsed
  25.81   26.43   52.75

Note that issue #4527 is about single-threaded overhead, which on Windows seems to be much bigger than on Linux. The problem here is the overhead caused by many threads.

@mattdowle mattdowle merged commit 89830e9 into master Jun 18, 2020
@mattdowle mattdowle deleted the subsetDT1 branch June 18, 2020 08:56
@jangorecki mentioned this pull request Aug 25, 2020