Performance drop with 1.12.0 (Selection + assignment) #3395
Thanks for reporting.

```r
library(data.table)
fread("id colA colB colC colD colE colF colG colH colI colJ colK colL colM colN
1 text text text text text text text text text text text text text text
2 text text text text text text text text text text text text text text
3 text text text text text text text text text text text text text text
4 text text text text text text text text text text text text text text", sep=" ") -> myDataTable
quote({
  myDataTable[colJ != "production", colK := na.omit(colK), by = .(colA)]
  myDataTable[colJ != "production", colH := na.omit(colH), by = .(colA)]
  myDataTable[colJ != "production", colL := na.omit(colL), by = .(colA)]
  myDataTable[colJ == "code", colM := colE, by = .(colA)]
  myDataTable[, colM := na.omit(colM), by = .(colA)]
  myDataTable[is.na(colM), colM := ""]
  myDataTable[colJ == "test", colN := colE, by = .(colA)]
  myDataTable[, colN := na.omit(colN), by = .(colA)]
}) -> q
setNames(as.list(q[-1L]), paste0("q", seq.int(length(q) - 1L))) -> l
packageVersion("data.table")
rbindlist(lapply(l, function(expr) as.list(system.time(eval(expr)))), idcol = "q")
```

and the timings I am getting are as follows:

1.11.8

Timings are so tiny that it is hard to draw any conclusions, but at first glance the latest master appears to be faster. Could you please run the above code on your 1.11.8 and on 1.12.0/master and provide the results? Also provide the output of
Thank you for raising this issue. It identified a massive performance drop in a package being tested on CircleCI: the automatic thread setting resulted in a tenfold slowdown. Perhaps multithreading should be disabled by default, or in the presence of an environment variable. The problem with
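A minimal sketch of the environment-variable suggestion above, assuming the environment is set before R starts (e.g. in a CI job or Docker entrypoint). `R_DATATABLE_NUM_THREADS` is documented in `?setDTthreads` in recent data.table; `OMP_THREAD_LIMIT` is the standard OpenMP-level cap honored by any OpenMP program:

```shell
# Pin data.table (and any other OpenMP code) to a single thread before R launches.
export R_DATATABLE_NUM_THREADS=1   # read by data.table on load (see ?setDTthreads)
export OMP_THREAD_LIMIT=1          # generic OpenMP cap
Rscript -e 'library(data.table); getDTthreads(verbose = TRUE)'
```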
We need good reproducible examples which highlight the speed decrease, so we have something to debug. There are already some, but none of them shows as much as a 10x slowdown. As of now the default is to use all cores. We could try to run revdeps using all threads vs
Here you go: https://circleci.com/gh/HughParsonage/grattan/tree/population-forecast
It turned out to be a 300x increase in runtime(!!). With multithreading: and without:
Looks like what I observed on my systems (#3298). @HughParsonage, would you mind taking a look at how
Using https://circleci.com/gh/HughParsonage/grattan/259#tests/containers/3
@HughParsonage are you able to run the test script from the console? Something like the following. I am getting a not well described error:

```r
devtools::test(filter = "pension")
# Loading grattan
# Error: Command failed (1)
```

I think it is because of missing other dependencies of grattan.
Can you run
I set 8 threads in the second try, but it actually uses 4.
Have you tried reversing the order of your commands? I.e. test many threads first, and then a single thread.
Thank you for the quick answer. I have run your test, @jangorecki.
Number of cores/threads:
@JeremyBesson thanks, could you try
Sorry @jangorecki, I'm not sure if I've answered your question, but reversing the commands did not seem to make any difference.
@HughParsonage OK, the problem seems to be environment specific, and manifests when other services are running alongside R. Could you also please check whether limiting to the number of physical cores (not just 2, unless you are testing on a 2-core machine) resolves the speed issue? If it does, then probably the safest option will be to change the default, as suggested by @renkun-ken
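A minimal sketch of the suggestion above, assuming the `parallel` package that ships with R (`detectCores(logical = FALSE)` returns the physical core count, ignoring hyper-threading):

```r
library(data.table)

# Cap data.table at the number of physical cores rather than
# logical (hyper-threaded) ones.
setDTthreads(parallel::detectCores(logical = FALSE))
getDTthreads()
```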
With
With
Here is another example for this issue. On an otherwise idle AWS r4.16xlarge EC2 instance (64 vCPUs) I get:

```r
library(data.table)

N_X <- 1e6
n_day <- 60
n_clientid <- 1e5
n_Platform <- 7
X <- data.table(
  day = sample(1:n_day, N_X, TRUE),
  clientid = as.character(sample(1:n_clientid, N_X, TRUE)),
  Platform = as.character(sample(1:n_Platform, N_X, TRUE))
)

setDTthreads()
getDTthreads(verbose = TRUE)
# omp_get_max_threads() = 64
# omp_get_thread_limit() = 2147483647
# DTthreads = 0
# RestoreAfterFork = true
# [1] 64

system.time(
  X[, .(x = uniqueN(day) - 1L,
        first_active_day = min(day),
        last_active_day = max(day)),
    by = .(Platform, clientid)]
)
#     user   system  elapsed
# 3958.998   10.023   64.454

setDTthreads(1)
system.time(
  X[, .(x = uniqueN(day) - 1L,
        first_active_day = min(day),
        last_active_day = max(day)),
    by = .(Platform, clientid)]
)
#    user  system elapsed
#   7.903   0.277   8.184

setDTthreads(32)
system.time(
  X[, .(x = uniqueN(day) - 1L,
        first_active_day = min(day),
        last_active_day = max(day)),
    by = .(Platform, clientid)]
)
#    user  system elapsed
# 611.456   1.192  19.571
```

For the real data, where X is about 1e8 rows, the difference in run time is even larger. Overall, one larger part of our code which (with data.table < 1.12.0) took about 30 minutes now takes about 10 hours.
Also facing the issue, using:
Here are more details using @akersting's reproducible example. The issue is linked to the presence (or absence) of `uniqueN` with a high-cardinality group by; `min` and `max` are negligible. Does `uniqueN` use extra parallelism or cause any side effect? It starts from the following line:

Line 147 in ca43a47
With Intel VTune, the time "seems" to be mostly spent creating threads (this might not be true, as I don't have debugging symbols). Since I don't have the debugging symbols for OpenMP, I cannot find out exactly where it happens.

Some examples:

```r
library(data.table)

N_X <- 1e6
n_day <- 60
n_clientid <- 1e5
n_Platform <- 7
X <- data.table(day = sample(1:n_day, N_X, TRUE),
                clientid = as.character(sample(1:n_clientid, N_X, TRUE)),
                Platform = as.character(sample(1:n_Platform, N_X, TRUE)))

n_cores <- parallel::detectCores()

# Reported slowdown: uniqueN
results <- list()
for (i in seq_len(n_cores)) {
  cat(sprintf(paste0("%0", floor(log10(i) + 1), "d"), i), ": ", sep = "")
  setDTthreads(i)
  results[[i]] <- system.time(X[, .(x = uniqueN(day) - 1L,
                                    first_active_day = min(day),
                                    last_active_day = max(day)),
                                by = .(Platform, clientid)])
  cat("[user=", sprintf("%06.03f", results[[i]][1]),
      ", system=", sprintf("%06.03f", results[[i]][2]),
      ", elapsed=", sprintf("%06.03f", results[[i]][3]), "]\n", sep = "")
}

# No slowdown: length(unique()) workaround
results2 <- list()
for (i in seq_len(n_cores)) {
  cat(sprintf(paste0("%0", floor(log10(i) + 1), "d"), i), ": ", sep = "")
  setDTthreads(i)
  results2[[i]] <- system.time(X[, .(x = length(unique(day)) - 1L,
                                     first_active_day = min(day),
                                     last_active_day = max(day)),
                                 by = .(Platform, clientid)])
  cat("[user=", sprintf("%06.03f", results2[[i]][1]),
      ", system=", sprintf("%06.03f", results2[[i]][2]),
      ", elapsed=", sprintf("%06.03f", results2[[i]][3]), "]\n", sep = "")
}

# Slowdown: uniqueN only
results3 <- list()
for (i in seq_len(n_cores)) {
  cat(sprintf(paste0("%0", floor(log10(i) + 1), "d"), i), ": ", sep = "")
  setDTthreads(i)
  results3[[i]] <- system.time(X[, .(x = uniqueN(day) - 1L),
                                 by = .(Platform, clientid)])
  cat("[user=", sprintf("%06.03f", results3[[i]][1]),
      ", system=", sprintf("%06.03f", results3[[i]][2]),
      ", elapsed=", sprintf("%06.03f", results3[[i]][3]), "]\n", sep = "")
}

# No slowdown: no uniqueN
results4 <- list()
for (i in seq_len(n_cores)) {
  cat(sprintf(paste0("%0", floor(log10(i) + 1), "d"), i), ": ", sep = "")
  setDTthreads(i)
  results4[[i]] <- system.time(X[, .(first_active_day = min(day),
                                     last_active_day = max(day)),
                                 by = .(Platform, clientid)])
  cat("[user=", sprintf("%06.03f", results4[[i]][1]),
      ", system=", sprintf("%06.03f", results4[[i]][2]),
      ", elapsed=", sprintf("%06.03f", results4[[i]][3]), "]\n", sep = "")
}

# No slowdown: uniqueN, low cardinality
results5 <- list()
for (i in seq_len(n_cores)) {
  cat(sprintf(paste0("%0", floor(log10(i) + 1), "d"), i), ": ", sep = "")
  setDTthreads(i)
  results5[[i]] <- system.time(X[, .(x = uniqueN(day) - 1L),
                                 by = .(Platform)])
  cat("[user=", sprintf("%06.03f", results5[[i]][1]),
      ", system=", sprintf("%06.03f", results5[[i]][2]),
      ", elapsed=", sprintf("%06.03f", results5[[i]][3]), "]\n", sep = "")
}

# Slowdown: uniqueN, high cardinality
results6 <- list()
for (i in seq_len(n_cores)) {
  cat(sprintf(paste0("%0", floor(log10(i) + 1), "d"), i), ": ", sep = "")
  setDTthreads(i)
  results6[[i]] <- system.time(X[, .(x = uniqueN(day) - 1L),
                                 by = .(clientid)])
  cat("[user=", sprintf("%06.03f", results6[[i]][1]),
      ", system=", sprintf("%06.03f", results6[[i]][2]),
      ", elapsed=", sprintf("%06.03f", results6[[i]][3]), "]\n", sep = "")
}
```

Debugger output for branching:

```
> debugonce(uniqueN)
> X[, .(x = uniqueN(day),
+       first_active_day = min(day),
+       last_active_day = max(day))]
debugging in: uniqueN(day)
debug: {
    if (missing(by) && is.data.table(x) && isTRUE(getOption("datatable.old.unique.by.key"))) {
        by = key(x)
        warning(warning_oldUniqueByKey)
    }
    if (is.null(x))
        return(0L)
    if (!is.atomic(x) && !is.data.frame(x))
        stop("x must be an atomic vector or data.frames/data.tables")
    if (is.atomic(x)) {
        if (is.logical(x))
            return(.Call(CuniqueNlogical, x, na.rm = na.rm))
        x = as_list(x)
    }
    if (is.null(by))
        by = seq_along(x)
    o = forderv(x, by = by, retGrp = TRUE, na.last = if (!na.rm)
        FALSE
    else NA)
    starts = attr(o, "starts")
    if (!na.rm) {
        length(starts)
    }
    else {
        sum((if (length(o)) o[starts] else starts) != 0L)
    }
}
Browse[2]> n
debug: if (missing(by) && is.data.table(x) && isTRUE(getOption("datatable.old.unique.by.key"))) {
    by = key(x)
    warning(warning_oldUniqueByKey)
}
Browse[2]> n
debug: if (is.null(x)) return(0L)
Browse[2]> n
debug: if (!is.atomic(x) && !is.data.frame(x)) stop("x must be an atomic vector or data.frames/data.tables")
Browse[2]> n
debug: if (is.atomic(x)) {
    if (is.logical(x))
        return(.Call(CuniqueNlogical, x, na.rm = na.rm))
    x = as_list(x)
}
Browse[2]> n
debug: if (is.logical(x)) return(.Call(CuniqueNlogical, x, na.rm = na.rm))
Browse[2]> n
debug: x = as_list(x)
Browse[2]> as_list
function (x)
{
    lx = vector("list", 1L)
    .Call(Csetlistelt, lx, 1L, x)
    lx
}
<bytecode: 0x55555bc452a8>
<environment: namespace:data.table>
Browse[2]> n
debug: if (is.null(by)) by = seq_along(x)
Browse[2]> n
debug: seq_along(x)
Browse[2]> n
debug: o = forderv(x, by = by, retGrp = TRUE, na.last = if (!na.rm) FALSE else NA)
Browse[2]> n
debug: [1] FALSE
Browse[2]> n
debug: starts = attr(o, "starts")
Browse[2]> n
debug: if (!na.rm) {
    length(starts)
} else {
    sum((if (length(o)) o[starts] else starts) != 0L)
}
Browse[2]> n
debug: length(starts)
Browse[2]> n
exiting from: uniqueN(day)
```

Test function for Intel VTune:

```r
library(data.table)

N_X <- 1e6
n_day <- 60
n_clientid <- 1e5
n_Platform <- 7
X <- data.table(day = sample(1:n_day, N_X, TRUE),
                clientid = as.character(sample(1:n_clientid, N_X, TRUE)),
                Platform = as.character(sample(1:n_Platform, N_X, TRUE)))

setDTthreads(parallel::detectCores())

results <- list()
for (i in seq_len(5)) {
  cat(sprintf(paste0("%0", floor(log10(i) + 1), "d"), i), ": ", sep = "")
  results[[i]] <- system.time(X[, .(x = uniqueN(day) - 1L,
                                    first_active_day = min(day),
                                    last_active_day = max(day)),
                                by = .(clientid)])
  cat("[user=", sprintf("%06.03f", results[[i]][1]),
      ", system=", sprintf("%06.03f", results[[i]][2]),
      ", elapsed=", sprintf("%06.03f", results[[i]][3]), "]\n", sep = "")
}
```
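The VTune observation above (time apparently spent creating threads) can be illustrated with a language-agnostic sketch. This is not data.table's actual C code; it only demonstrates why paying parallel-setup cost once per tiny group can be far slower than a single sequential pass over the same groups:

```python
# Conceptual analogue only: many small groups (high-cardinality `by`),
# trivial work per group, compared sequentially vs. with a fresh thread
# pool created per group (mimicking entering a parallel region per task).
from concurrent.futures import ThreadPoolExecutor
import random
import time

random.seed(42)
# 300 small groups of 10 values each.
groups = {g: [random.randrange(60) for _ in range(10)] for g in range(300)}

def n_unique(xs):
    return len(set(xs))

# One sequential pass: no per-group machinery.
t0 = time.perf_counter()
seq = {g: n_unique(xs) for g, xs in groups.items()}
t_seq = time.perf_counter() - t0

# A fresh pool per group: thread creation dwarfs the useful work.
t0 = time.perf_counter()
par = {}
for g, xs in groups.items():
    with ThreadPoolExecutor(max_workers=2) as pool:
        par[g] = pool.submit(n_unique, xs).result()
t_par = time.perf_counter() - t0

assert seq == par  # identical answers; only the overhead differs
print(f"sequential: {t_seq:.4f}s  per-group pool: {t_par:.4f}s")
```

On typical hardware the per-group-pool variant is orders of magnitude slower, despite computing exactly the same result.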
I'm not sure if it's the same issue as #3330, because I haven't set the DTthreads parameter to 1.
But I could imagine that the default value is 1.
This list of operations takes less than 1 second to execute with data.table v1.11.8.
With data.table v1.12.0 it takes more than 4 seconds!
And there are only 4 rows in my data table:
(You can replace "text" with anything)
We see this problem only in our production environment and not in other environments. This production server is really busy, which may explain why it is so slow. But the test with v1.11.8 was done in the same environment without performance issues (we ran this test many times with both versions).
We run R in a Docker container on Ubuntu 16.04.
I hope this will be fixed in a future data.table release, because we will be stuck on v1.11.8 until this issue is resolved.
Thank you for your support.