[WIP]optimize for the sparse feature #216
Conversation
@wxchan can you help test this on the Higgs and Yahoo datasets?
@guolinke I have several issues; I used the Bosch dataset, for example.
I use this call:

lgb.train(params = list(num_threads = 12,
                        max_depth = 6,
                        min_hessian = 0,
                        min_data_in_leaf = 1,
                        learning_rate = 0.20,
                        objective = "binary",
                        metric = "binary_logloss",
                        sparse_aware = TRUE),
          data = lgb_data,
          nrounds = 50,
          verbose = 0)

If it is the wrong call to use, tell me so I can edit my benchmark appropriately. Edit: back with one benchmark:
My machine is running some other task; it may take a while. I will test it later (if it is still needed).
@Laurae2 The caching is single-threaded for now; I will try to make it multi-threaded.
@Laurae2 can you share the scripts you used to convert the Bosch dataset? It seems to have some non-numerical fields in the data file.
I'll run with your latest commit (3c0d1e2) now. Script used (I collated parts of my scripts to have everything in one place; tell me if something is wrong):

# Libraries
library(data.table)
library(Matrix)
library(recommenderlab)  # provides dropNA()
library(feather)         # provides write_feather(), used below
library(xgboost)
# SET YOUR WORKING DIRECTORY
setwd("E:/")
gc()
# Read 1183747 rows and 970 (of 970) columns from 1.993 GB file in 00:02:52
train_numeric <- fread("train_numeric.csv", header = TRUE, sep = ",", stringsAsFactors = FALSE, colClasses = rep("numeric", 970))
# Delete if not needed
gc()
saveRDS(train_numeric, file = "train_numeric.rds", compress = TRUE) # For fast R load
write_feather(train_numeric, "train_numeric.feather") # Allows fast loading in Python
# Coerce to matrix
gc()
train_numeric <- as.matrix(train_numeric) # much faster: train_numeric <- Laurae::DT2mat(train_numeric)
# Sparse = NAs
gc()
train_numeric <- dropNA(train_numeric)
# Delete if not needed
gc()
saveRDS(train_numeric, file = "train_numeric_sparse.rds", compress = TRUE) # For fast R load. 721,720,806 bytes, CRC32 = 50D25879
# Save xgboost format
gc()
train_data <- xgb.DMatrix(data = train_numeric[1:1183747, 1:969], label = train_numeric[1:1183747, 970])
gc(verbose = FALSE)
xgb.DMatrix.save(train_data, "bosch.train_xgb")
# Save svmlight/libsvm format, does not load properly in xgboost for unknown reasons but works in LightGBM
library(sparsity) # Requires: devtools::install_github("Laurae2/sparsity")
write.svmlight(train_numeric[1:1183747, 1:969], train_numeric[1:1183747, 970], "bosch.train")
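For completeness, a minimal sketch, assuming the lightgbm R package is installed and that column 970 is the label (as in the xgb.DMatrix call above), of how the lgb_data object used in the training calls could be built:

# Sketch: build the LightGBM dataset object from the sparse matrix above
library(lightgbm)
gc()
lgb_data <- lgb.Dataset(data = train_numeric[1:1183747, 1:969],
                        label = train_numeric[1:1183747, 970])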
Some simple benchmarks:
@Laurae2 I changed the optimization option from O2 to O3 in the R package.
@guolinke Still a very large overhead. Edit: after restarting the server, still the same issue. Edit 2: re-ran with sparse_aware = false. I updated the table results after running 10 times on 6-12 threads + 50 iterations. Commit used: 3e61395. Bosch set, 50 iterations:
Bosch set, 200 iterations:
P.S.: my params for xgboost/LightGBM:

xgb.train(params = list(nthread = i,
max_depth = 0,
max_leaves = 255,
max_bin = 255,
eta = 0.05,
objective = "binary:logistic",
booster = "gbtree",
tree_method = "hist",
grow_policy = "lossguide",
eval_metric = "auc",
gamma = 1,
min_child_weight = 100),
data = xgb_data,
verbose = FALSE,
#early_stopping_rounds = 20,
nrounds = 200)
temp_model <- lgb.train(params = list(num_threads = i,
max_depth = -1,
num_leaves = 255,
max_bin = 255,
learning_rate = 0.05,
objective = "binary",
metric = "auc",
min_gain_to_split = 1,
min_hessian = 100,
min_data_in_leaf = 0,
sparse_aware = FALSE), # or TRUE
data = lgb_data,
verbose = 0,
#early_stopping_rounds = 20,
nrounds = 200)
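As a side note, a minimal sketch of the kind of loop that could produce per-thread timings like the ones above; it is hypothetical, assumes lgb_data is already built, and uses base R's system.time for wall-clock measurement:

# Hypothetical loop over thread counts, timing one full LightGBM training run each
timings <- sapply(c(1, 2, 6, 12), function(i) {
  system.time(
    lgb.train(params = list(num_threads = i,
                            num_leaves = 255,
                            max_bin = 255,
                            learning_rate = 0.05,
                            objective = "binary",
                            metric = "auc",
                            min_gain_to_split = 1,
                            min_hessian = 100,
                            min_data_in_leaf = 0,
                            sparse_aware = TRUE),  # or FALSE for the dense baseline
              data = lgb_data,
              verbose = 0,
              nrounds = 200)
  )[["elapsed"]]
})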
I'm getting a slightly faster LightGBM without static->guided (commit 3e61395). But first I'll test all commits one by one.
Edit: tested all commits, no good results for sparse_aware = true on Bosch. I'll change the dataset.
Edit 2: faster on another set when using sparse_aware = true (private data). Wondering why LightGBM is this slow on Bosch.
Edit 3: private data set: 855,290 observations, 413 features, 17.1% sparse, binary classification: 689710(0) and 165580(1), 50 boosting iterations.
Time (seconds) | 1 thread | 2 threads
---|---|---
sparse_aware = false | 102.59 | 97.80
sparse_aware = true | 60.01 | 41.51
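For reference, a one-line sketch of how a sparsity figure like the 17.1% above can be computed for a dgCMatrix via Matrix::nnzero; the object name private_matrix is hypothetical:

# Fraction of stored entries in a dgCMatrix (the rest are the implicit sparse values)
library(Matrix)
nnzero(private_matrix) / prod(dim(private_matrix))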
@Laurae2 it seems it is caused by negative values in the sparse features.
@guolinke here are the corresponding xgboost results:
@Laurae2 Fixed; you can try on the Bosch dataset now.
Commit used: 522a4d4. Still slower overall, but it can sometimes be much faster than the non-sparse version (see 1 thread for 50 iterations, 6 threads for 200 iterations). Speed for 50 iterations:
Speed for 200 iterations (will add 1 thread later):
@guolinke Environment: Windows Server 2012 R2, Rtools compiler. More details about the 50-iteration runs; each run is warmed up by training once beforehand (caching + training).
I use a small warm-up helper for this.
I have a large inflated overhead before training starts.
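(A hypothetical sketch of such a warm-up helper, not the exact code used here, but consistent with the warmup() calls and the "LightGBM Warmup time" lines in the log further below; it times one full training run, binning/caching included, with base R's system.time.)

# Hypothetical warm-up helper: one timed training run, caching included
warmup <- function(lgb_data, sparse, threads = 12, rounds = 50) {
  t <- system.time(
    model <- lgb.train(params = list(num_threads = threads,
                                     num_leaves = 255,
                                     max_bin = 255,
                                     learning_rate = 0.05,
                                     objective = "binary",
                                     metric = "auc",
                                     sparse_aware = sparse),
                       data = lgb_data,
                       nrounds = rounds)
  )
  cat(sprintf("LightGBM Warmup time: %f\n", t[["elapsed"]]))
  model
}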
@Laurae2 I see.
@guolinke CPU usage is maxed (per thread) during the cache time. The only issue is the work done before training even starts; otherwise the performance is very good (and faster than the master branch). With this PR, CPU usage is never maxed during training when using many threads on Bosch (6 threads => 90% CPU usage, not 100%; diminishing returns reached, cf. benchmarks). It seems that if I want to max out CPU usage with this PR I need significantly larger datasets, because even my private dataset (855K x 413) can't run faster with more than 1 thread. This is both a good and a bad thing, because it shows:
@guolinke I ran 500 iterations on Bosch (changed to a lower learning rate).
CPU usage is still an issue (cores only about 50% busy, even with 6 threads on a 6-core CPU), but it's very fast.
@guolinke New run with commit e35095f. Excellent performance so far, including sparse performance. Also, I reached 29 seconds wall-clock time for 50 iterations (12 threads, learning rate = 0.05), including both all the work before training and the training itself.
500 iterations (learning rate = 0.02)
Timings:
=> Sparse is great for long training jobs. Excellent single-thread performance.
Approximate average CPU usage (reported by glances and scaled):
=> The data no longer seems big enough to keep the cores 100% busy. I need a bigger dataset; I will try Higgs later when I have time.
50 iterations (learning rate = 0.20, invalidates old runs)
Timings:
=> Sparse is better with more iterations.
50 iterations (learning rate = 0.05, same as old runs)
Timings:
=> Better performance than previously. Sparse is better with more iterations. Dense gets a 3 s boost on each run (?). Sparse timing reaches 26-29 s (12 threads) excluding all the work before training.
@Laurae2 sorry, there are some bugs in the previous version; they will affect the results.
@guolinke With your fixes, a small conclusion follows below. Note: remove about 6 seconds (binning/caching) if you want the training-only time.
500 iterations (learning rate = 0.02)
Timings:
=> Sparse multithreaded is faster than non-sparse multithreaded.
Approximate average CPU usage (reported by glances, scaled):
=> Sparse keeps the cores busier.
50 iterations (learning rate = 0.20)
Timings, run 5 times when a standard deviation is provided:
=> Sparse is slightly slower: not enough iterations and too much overhead (approx. 6 s) for the binning/cache step that happens before training (but I don't think one would train for only 50 iterations for serious modeling anyway). => If I remove the binning/caching time, sparse comes out ahead.
50 iterations (learning rate = 0.05)
Timings, run 5 times when a standard deviation is provided:
=> Sparse is slower here, because the training itself is very fast (the trees are not deep). => Even if I remove the binning/caching time, it is not fast enough to catch the non-sparse version.
@guolinke Very odd behavior since commit 66bf4c2. It seems to train trees in a very odd (and non-identical) way. I'm training for 50 iterations and I'm getting this:

> temp_model <- warmup(lgb_data = lgb_data, sparse = FALSE)
[LightGBM] [Info] Number of postive: 6879, number of negative: 1176868
[LightGBM] [Info] Number of data: 1183747, number of features: 960
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 31
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 31
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 30
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 31
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 29
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 30
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 29
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 29
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 29
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 29
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 30
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 29
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 29
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 29
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 29
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 28
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 28
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 29
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 29
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 29
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 29
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 29
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 32
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 28
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 28
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 30
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 28
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 28
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 31
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 30
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 30
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 30
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 32
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 31
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 33
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 31
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 32
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 32
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 33
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 31
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 33
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 33
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 33
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 32
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 34
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 33
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 33
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 32
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 32
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 33
LightGBM Warmup time: 27.529066
> temp_model <- warmup(lgb_data = lgb_data, sparse = TRUE) # where did the printing go!?
[LightGBM] [Info] Number of postive: 6879, number of negative: 1176868
[LightGBM] [Info] Number of data: 1183747, number of features: 960
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 29
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 30
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 33
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 32
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 31
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 32
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 29
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 44
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 38
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 201
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 211
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 205
LightGBM Warmup time: 103.549416
@Laurae2 sorry. Fixed.
@guolinke Re-did all benchmarks using your bug fix 60f6c9f. Small fluctuations aside, here are the updated results.
Higgs 500 iterations (learning rate = 0.10)
Timings:
AUC check for 50 iterations:
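(The AUC numbers themselves are not reproduced here. A hypothetical sketch of how such a check could be computed from predictions, using a rank-based AUC in base R; model, features and labels are stand-ins for whichever run is being checked:)

# Rank-based (Mann-Whitney) AUC; scores = predicted probabilities, labels in {0, 1}
auc <- function(scores, labels) {
  r <- rank(scores)                   # average ranks handle ties correctly
  n1 <- sum(labels == 1)
  n0 <- sum(labels == 0)
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
preds <- predict(model, features)     # lgb.Booster predictions on the feature matrix
auc(preds, labels)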
Bosch 500 iterations (learning rate = 0.02)
Timings:
Bosch 50 iterations (learning rate = 0.20)
Timings, run 5 times when a standard deviation is provided:
Bosch 50 iterations (learning rate = 0.05)
Timings, run 5 times when a standard deviation is provided:
Closing this because I cannot find a good solution for the multi-threading optimization.
Remove OrderedSparseBin.
Combine many sparse bins into a pool and construct them simultaneously.