
[WIP] optimize for the sparse feature #216

Closed · wants to merge 18 commits

Conversation

@guolinke (Collaborator) commented Jan 15, 2017

Remove OrderedSparseBin.
Combine many sparse bins into a pool and construct them simultaneously.

  • Improves multi-threading performance (a rough sketch of the idea follows).
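
A rough sketch of the pooling idea as described above (illustrative only; `SparseBinPool`, `Push`, and `Construct` are hypothetical names, not the actual LightGBM classes):

```cpp
// Illustrative sketch: group several sparse features into one pool and fill
// all of their bins in a single pass over the rows, so many sparse bins are
// constructed simultaneously instead of one feature at a time.
#include <cstdint>
#include <utility>
#include <vector>

struct SparseBin {
  // (row index, bin value) pairs; only non-default entries are stored.
  std::vector<std::pair<int, std::uint8_t>> entries;
  void Push(int row, std::uint8_t bin) { entries.emplace_back(row, bin); }
};

struct SparseBinPool {
  std::vector<SparseBin> bins;  // one bin per pooled sparse feature

  explicit SparseBinPool(int num_features) : bins(num_features) {}

  // row_values[row] holds that row's non-default (feature, bin) pairs.
  // One pass over the rows constructs every pooled bin at once.
  void Construct(
      const std::vector<std::vector<std::pair<int, std::uint8_t>>>& row_values) {
    for (int row = 0; row < static_cast<int>(row_values.size()); ++row) {
      for (const auto& fv : row_values[row]) {
        bins[fv.first].Push(row, fv.second);
      }
    }
  }
};
```

If this reading of the description is right, the payoff is that the per-row iteration cost is shared by every feature in the pool rather than paid once per sparse feature.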

@guolinke (Collaborator, Author):

@wxchan can you help test this on the Higgs and Yahoo datasets?
You can try sparse_aware=true as well.

@Laurae2 (Contributor) commented Jan 15, 2017

@guolinke I have several issues; I used the Bosch dataset, for example:

  • Is the caching done on a single thread? I have a very long wait (about 1 minute?) before the model starts training, and the CPU sits at 100% usage on only one core during that time.
  • The model is very slow to train. It takes 3 minutes for only 50 iterations (while it should take only about 35 seconds in my case), caching included.

I use this call:

```r
lgb.train(params = list(num_threads = 12,
                        max_depth = 6,
                        min_hessian = 0,
                        min_data_in_leaf = 1,
                        learning_rate = 0.20,
                        objective = "binary",
                        metric = "binary_logloss",
                        sparse_aware = TRUE),
          data = lgb_data,
          nrounds = 50,
          verbose = 0)
```

If it is the wrong call to use, tell me so I can edit appropriately for my bench.

Edit: back with one benchmark:

| Mode | Threads | Speed (s) |
|---|---|---|
| Previous bench | 1 | 202.05 |
| New: sparse_aware | 1 | 709.42 |

@wxchan (Contributor) commented Jan 15, 2017

My machine is running another task at the moment, so it may take a while; I will test it later (if it is still needed).

@guolinke (Collaborator, Author):

@Laurae2 The caching is single-threaded now; I will try to make it multi-threaded.
Very slow? Did you try sparse_aware = false?
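
For what it's worth, a minimal sketch of what a multi-threaded caching step could look like with OpenMP (a guess at the direction only, not the actual code; `BuildCacheForGroup` is a placeholder):

```cpp
// Hypothetical sketch: spread cache construction over feature groups so the
// pre-training step no longer pins a single core.
#include <cstdio>

// Placeholder for the per-group binning/caching work.
void BuildCacheForGroup(int group) {
  std::printf("caching feature group %d\n", group);
}

void BuildCache(int num_groups) {
  // Each thread caches a different group; dynamic scheduling helps when
  // groups have very different sparsity and therefore different costs.
  #pragma omp parallel for schedule(dynamic)
  for (int group = 0; group < num_groups; ++group) {
    BuildCacheForGroup(group);
  }
}

int main() { BuildCache(8); }
```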

@guolinke (Collaborator, Author):

@Laurae2 can you share the scripts you used to convert the Bosch dataset? It seems to have some non-numerical fields in the data file.

@Laurae2 (Contributor) commented Jan 15, 2017

@guolinke

With sparse_aware = FALSE (double = my previous benchmark):

[image: benchmark results table with sparse_aware = FALSE]

I'll run with your latest commit (3c0d1e2) now.

Script used (I collated parts of my scripts into one file; tell me if something is wrong):

```r
# Libraries
library(data.table)
library(Matrix)
library(recommenderlab)
library(feather) # needed for write_feather below
library(xgboost)

# SET YOUR WORKING DIRECTORY
setwd("E:/")

gc()
# Read 1183747 rows and 970 (of 970) columns from 1.993 GB file in 00:02:52
train_numeric <- fread("train_numeric.csv", header = TRUE, sep = ",",
                       stringsAsFactors = FALSE, colClasses = rep("numeric", 970))

# Delete if not needed
gc()
saveRDS(train_numeric, file = "train_numeric.rds", compress = TRUE) # For fast R load
write_feather(train_numeric, "train_numeric.feather") # Allows fast loading in Python

# Coerce to matrix
gc()
train_numeric <- as.matrix(train_numeric) # much faster: train_numeric <- Laurae::DT2mat(train_numeric)

# Sparse = NAs (drop the NAs into sparse storage)
gc()
train_numeric <- dropNA(train_numeric)

# Delete if not needed
gc()
saveRDS(train_numeric, file = "train_numeric_sparse.rds", compress = TRUE) # For fast R load. 721,720,806 bytes, CRC32 = 50D25879

# Save xgboost format
gc()
train_data <- xgb.DMatrix(data = train_numeric[1:1183747, 1:969], label = train_numeric[1:1183747, 970])
gc(verbose = FALSE)
xgb.DMatrix.save(train_data, "bosch.train_xgb")

# Save svmlight/libsvm format; does not load properly in xgboost for unknown reasons, but works in LightGBM
library(sparsity) # Requires: devtools::install_github("Laurae2/sparsity")
write.svmlight(train_numeric[1:1183747, 1:969], train_numeric[1:1183747, 970], "bosch.train")
```

@guolinke (Collaborator, Author):

Some simple benchmarks:
100 iterations on the Yahoo ranking dataset (time in seconds):

| Mode | 1 thread | 4 threads | 8 threads | 16 threads |
|---|---|---|---|---|
| master version | 369 | 128 | 63 | 48 |
| sparse_aware = false | 355 | 100 | 62 | 52 |
| sparse_aware = true | 265 | 86 | 60 | 75 |

@Laurae2 I changed the optimization option from O2 to O3 in the R package.
Can you try it again?

@Laurae2 (Contributor) commented Jan 15, 2017

@guolinke There is still a very large overhead for sparse_aware = true when preparing data. sparse_aware = false is slower than the master version multi-threaded (the overhead cost is too large once Intel hyperthreading comes into play, but low enough to still provide a boost). I'll restart the server to check whether it is just a sporadic issue.

Edit: after restarting the server, same issue. sparse_aware = true is very slow.

Edit 2: re-ran sparse_aware = false. I updated the table results after running 10 times with 6 and 12 threads at 50 iterations.

Commit used: 3e61395

Bosch set, 50 iterations:

| Mode | 1 thread | 6 threads | 12 threads |
|---|---|---|---|
| master version | 167.14 (5x run avg) | 41.87 (5x run avg) | 35.87 |
| sparse_aware = false | 161.75 (10x run avg) | 43.89 (10x run avg) | 41.13 |
| sparse_aware = true | (not run) (not a typo) | 351.125 (not a typo) | 299.83 |

Bosch set, 200 iterations:

| Mode | 1 thread | 2 threads | 3 threads | 4 threads | 6 threads | 12 threads |
|---|---|---|---|---|---|---|
| xgboost (fast, lossguide) | 875.73 | 531.13 | 398.90 | 391.15 | 374.33 | 337.78 |
| sparse_aware = false | 1198.34 | 735.62 | 532.07 | 460.93 | not run | not run |
| sparse_aware = true | cancelled manually after 1 hour | not run | not run | not run | not run | not run |

P.S.: my params for xgboost/LightGBM:

```r
xgb.train(params = list(nthread = i,
                        max_depth = 0,
                        max_leaves = 255,
                        max_bin = 255,
                        eta = 0.05,
                        objective = "binary:logistic",
                        booster = "gbtree",
                        tree_method = "hist",
                        grow_policy = "lossguide",
                        eval_metric = "auc",
                        gamma = 1,
                        min_child_weight = 100),
          data = xgb_data,
          verbose = FALSE,
          #early_stopping_rounds = 20,
          nrounds = 200)

temp_model <- lgb.train(params = list(num_threads = i,
                                      max_depth = -1,
                                      num_leaves = 255,
                                      max_bin = 255,
                                      learning_rate = 0.05,
                                      objective = "binary",
                                      metric = "auc",
                                      min_gain_to_split = 1,
                                      min_hessian = 100,
                                      min_data_in_leaf = 0,
                                      sparse_aware = FALSE), # or TRUE
                        data = lgb_data,
                        verbose = 0,
                        #early_stopping_rounds = 20,
                        nrounds = 200)
```

@Laurae2 (Contributor) left a review comment:

LightGBM is a bit faster for me without static->guided (commit 3e61395). But first I'll test all commits one by one.

Edit: tested all commits; no good results for sparse_aware = true on Bosch. I'll change datasets.

Edit 2: faster on another set when using sparse_aware = true (private data). Wondering why LightGBM is this slow on Bosch.

Edit 3: private dataset: 855,290 observations, 413 features, 17.1% sparse, binary classification with 689,710 negatives (0) and 165,580 positives (1), 50 boosting iterations.

| Mode | 1 thread | 2 threads |
|---|---|---|
| sparse_aware = false | 102.59 | 97.80 |
| sparse_aware = true | 60.01 | 41.51 |

@guolinke (Collaborator, Author) commented Jan 16, 2017

@Laurae2 it seems to be caused by negative values in sparse features.
I will try to fix it.
BTW, can you also add the xgboost hist result on your private dataset?

@Laurae2 (Contributor) commented Jan 16, 2017

@guolinke here are the xgboost results:

| Mode | 1 thread | 2 threads |
|---|---|---|
| xgboost hist lossguide | 226.85 | 110.86 |
| sparse_aware = false | 102.59 | 97.80 |
| sparse_aware = true | 60.01 | 41.51 |

@guolinke (Collaborator, Author):

@Laurae2 fixed; you can try the Bosch dataset now.

@Laurae2 (Contributor) commented Jan 16, 2017

Commit used: 522a4d4

Still slower overall, but it can sometimes be much faster than sparse_aware = false (see 1 thread for 50 iterations, 6 threads for 200 iterations); at least the gap is now much smaller.

Speed for 50 iterations:

| Mode | 1 thread | 6 threads | 12 threads |
|---|---|---|---|
| master | 167.14 | 41.87 | 35.87 |
| sparse_aware = false | 165.56 | 41.43 | 33.70 |
| sparse_aware = true | 144.75 | 53.69 | 63.75 |

Speed for 200 iterations (will add 1 thread later):

| Mode | 6 threads | 12 threads |
|---|---|---|
| sparse_aware = false | 371.73 | 307.22 |
| sparse_aware = true | 284.81 | 394.15 |

@guolinke (Collaborator, Author) commented Jan 16, 2017

@Laurae2
Weird...
It is about 1x faster in my test on Bosch when setting sparse_aware=true (4 threads).

@Laurae2 what is your environment? Windows / Linux? And which compiler did you use?

@Laurae2 (Contributor) commented Jan 16, 2017

@guolinke Environment: Windows Server 2012 R2, Rtools compiler.

More details for 50 iterations; each run was warmed up by training once beforehand (timings shown as caching + training):

| Mode | 4 threads | 6 threads |
|---|---|---|
| sparse_aware = false | negligible + 53.16 | negligible + 43.09 |
| sparse_aware = true | 20 + 33.09 | 16 + 37.01 |

I use:

  • R for the sparse_aware = false timing (caching is negligible, < 0.5 s)
  • bash for the sparse_aware = true timing, plus a wall clock for the time it takes to cache before training starts (I can't get a timing in R that excludes the caching part :( )

The sparse_aware = true timings are heavily inflated if I take into account the caching that happens before training even starts (approx. 20 s for 4 threads, approx. 16 s for 6 threads). This is why I see a larger difference when using 200 iterations, and also why I hit diminishing returns so quickly when using too many threads.

@guolinke (Collaborator, Author):

@Laurae2 I see.
It seems the multi-threading speed of sparse_aware is still not good.
Did you check the CPU usage for sparse_aware?
Thanks, I will keep optimizing it.

@Laurae2 (Contributor) commented Jan 16, 2017

@guolinke CPU usage is maxed (per thread) during the cache time. The only issue is the work done before training even starts; otherwise the performance is very good (and faster than the master branch).

With this PR, CPU usage is never maxed during training on Bosch when using many threads (6 threads => 90% CPU usage, not 100%; diminishing returns reached, cf. benchmarks).

It seems that if I want to max out CPU usage with this PR, I need significantly larger datasets, because even my private dataset (855K x 413) can't run faster with more than 1 thread on the sparse_aware branch (whether the parameter is false or true) - but it is still MUCH faster, single- or multi-threaded, than master-branch LightGBM.

This is both a good and a bad thing, because it shows:

  • You can train multiple models in parallel (e.g., to run cross-validation quickly) instead of training them sequentially; even single-threaded the speed is very good, faster than multi-threaded xgboost exact as long as we are talking about "low-end" server CPUs such as 6-core CPUs (see the sketch after this list).
  • The overhead cost of multi-threading increased dramatically, so there is no longer linear scaling with the number of threads (xgboost approx/hist experiences the same thing, but the impact is higher for xgboost than for LightGBM).
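
A minimal sketch of the first point, assuming a hypothetical `TrainFold` that stands in for one single-threaded LightGBM training:

```cpp
// Hypothetical sketch: run k independent single-threaded trainings (e.g. CV
// folds) concurrently, instead of one multi-threaded training at a time.
#include <cstdio>
#include <thread>
#include <vector>

// Placeholder for "train one CV fold with num_threads = 1".
void TrainFold(int fold) {
  std::printf("training fold %d on its own core\n", fold);
}

int main() {
  const int kFolds = 5;
  std::vector<std::thread> workers;
  workers.reserve(kFolds);
  for (int fold = 0; fold < kFolds; ++fold) {
    workers.emplace_back(TrainFold, fold);  // one thread per fold
  }
  for (auto& w : workers) w.join();  // wait for all folds to finish
}
```

This trades per-model speed for throughput: with good single-thread performance, k folds finish in roughly the time of one fold.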

@Laurae2 (Contributor) commented Jan 16, 2017

@guolinke I ran 500 iterations on Bosch (changed to learning_rate = 0.02). The speed difference is very good, especially with 1 thread (more than 2x faster).

| Mode | 1 thread | 6 threads | 12 threads |
|---|---|---|---|
| sparse_aware = false | 2919.92 | 885.02 | 771.00 |
| sparse_aware = true | 1348.84 | 684.70 | 953.10 |

CPU usage is still an issue (cores are only 50% busy, even with 6 threads on a 6-core CPU), but it's very fast. Both sparse_aware = false and sparse_aware = true exhibit this issue.

@guolinke changed the title from "optimize for the sparse feature" to "[WIP] optimize for the sparse feature" on Jan 17, 2017
@Laurae2 (Contributor) commented Jan 23, 2017

@guolinke New run with commit e35095f. Excellent performance so far. Sparse performance (sparse_aware = true) scales worse multi-threaded, but is still faster than sparse_aware = false. Also, dense performance seems to have improved significantly.

Also, I reached 29 seconds of wall clock time for 50 iterations (12 threads, learning rate = 0.05), including everything before training plus the training itself.

  • 522a4d4 = previous benchmark
  • e35095f = fastest performance (last commit)

500 iterations (learning rate = 0.02)

Timings:

| Mode | Commit | 1 thread | 6 threads | 12 threads |
|---|---|---|---|---|
| sparse_aware = false | e35095f | 2153.28 | 569.54 | 456.17 |
| sparse_aware = false | 522a4d4 | 2919.92 | 885.02 | 771.00 |
| sparse_aware = true | e35095f | 1125.24 | 415.29 | 507.43 |
| sparse_aware = true | 522a4d4 | 1348.84 | 684.70 | 953.10 |

=> Sparse is great for long training jobs. Excellent single-thread performance.

Approximate average CPU usage (reported by glances; in parentheses, scaled by the number of threads used):

| Mode | 1 thread | 6 threads | 12 threads |
|---|---|---|---|
| sparse_aware = false | 7% (82%) | 40% (80%) | 78% (78%) |
| sparse_aware = true | 7% (82%) | 39% (78%) | 76% (76%) |

=> The data no longer seems big enough to keep the cores 100% busy. I need a bigger dataset; I will try Higgs later when I have time.


50 iterations (learning rate = 0.20, invalidates old runs)

Timings:

| Mode | Commit | 1 thread | 6 threads | 12 threads |
|---|---|---|---|---|
| sparse_aware = false | e35095f | 214.95 | 57.84 | 45.05 |
| sparse_aware = false | 522a4d4 | ? | ? | ? |
| sparse_aware = true | e35095f | 176.96 | 58.16 | 64.41 |
| sparse_aware = true | 522a4d4 | ? | ? | ? |

=> Sparse is better with more iterations


50 iterations (learning rate = 0.05, same as old runs)

Timings:

| Mode | Commit | 1 thread | 6 threads | 12 threads |
|---|---|---|---|---|
| sparse_aware = false | e35095f | 162.70 | 38.79 | 29.86 |
| sparse_aware = false | 522a4d4 | 165.56 | 41.43 | 33.70 |
| sparse_aware = true | e35095f | 143.99 | 49.05 | 50.33 |
| sparse_aware = true | 522a4d4 | 144.75 | 53.69 | 63.75 |

=> Better performance than previously. Sparse is better with more iterations. Dense gets a ~3 s boost on each run (?). Sparse timing reaches 26-29 s (12 threads) excluding all the pre-training work.

@guolinke (Collaborator, Author):

@Laurae2 sorry, there were some bugs in the previous version. They affect the results for sparse_aware = false.

@Laurae2 (Contributor) commented Jan 24, 2017

@guolinke With your fixes for sparse_aware = false, I re-did all benchmarks.

Small conclusion: I suppose sparse_aware = true should shine for long trainings on (very) large datasets (bigger than Bosch, I suppose), while sparse_aware = false is more "model-testing ready".

Note: subtract 6 seconds if you want the sparse_aware = true timings without binning/caching.


500 iterations (learning rate = 0.02)

Timings:

| Mode | Commit | 1 thread | 6 threads | 12 threads |
|---|---|---|---|---|
| sparse_aware = false | d085c16 | 1022.78 | 454.63 | 512.42 |
| sparse_aware = false | e35095f | 2153.28 | 569.54 | 456.17 |
| sparse_aware = false | 522a4d4 | 2919.92 | 885.02 | 771.00 |
| sparse_aware = true | d085c16 | 1096.04 | 415.92 | 504.73 |
| sparse_aware = true | e35095f | 1125.24 | 415.29 | 507.43 |
| sparse_aware = true | 522a4d4 | 1348.84 | 684.70 | 953.10 |

=> Sparse multithread is faster than non-sparse multithread.

Approximate average CPU usage (reported by glances; in parentheses, scaled by the number of threads used):

| Mode | Commit | 1 thread | 6 threads | 12 threads |
|---|---|---|---|---|
| sparse_aware = false | d085c16 | 10% (93.1%) | 40% (70.8%) | 51% (59.3%) |
| sparse_aware = true | d085c16 | 9% (91.1%) | 38% (88.4%) | 75% (78.2%) |

=> Sparse keeps the cores busier.


50 iterations (learning rate = 0.20)

Timings, run 5 times where a standard deviation (±) is provided:

| Mode | Commit | 1 thread | 6 threads | 12 threads |
|---|---|---|---|---|
| sparse_aware = false | d085c16 | 109.68 ± 1.08 | 48.28 ± 0.29 | 53.29 ± 0.45 |
| sparse_aware = false | e35095f | 214.95 | 57.84 | 45.05 |
| sparse_aware = false | 522a4d4 | ? | ? | ? |
| sparse_aware = true | d085c16 | 136.37 ± 8.95 | 51.62 ± 1.13 | 58.02 ± 2.00 |
| sparse_aware = true | e35095f | 176.96 | 58.16 | 64.41 |
| sparse_aware = true | 522a4d4 | ? | ? | ? |

=> Sparse is slightly slower: not enough iterations, and too much overhead (approx. 6 s) for the binning/caching that happens before training (but I don't think one would train for only 50 iterations for serious modeling anyway).

=> If I remove the binning/caching time, sparse_aware = true is faster than sparse_aware = false when multi-threaded.


50 iterations (learning rate = 0.05)

Timings, run 5 times where a standard deviation (±) is provided:

| Mode | Commit | 1 thread | 6 threads | 12 threads |
|---|---|---|---|---|
| sparse_aware = false | d085c16 | 67.83 ± 0.96 | 31.12 ± 0.51 | 34.56 ± 0.15 |
| sparse_aware = false | e35095f | 162.70 | 38.79 | 29.86 |
| sparse_aware = false | 522a4d4 | 165.56 | 41.43 | 33.70 |
| sparse_aware = true | d085c16 | 102.19 ± 1.47 | 39.53 ± 1.90 | 42.97 ± 1.12 |
| sparse_aware = true | e35095f | 143.99 | 49.05 | 50.33 |
| sparse_aware = true | 522a4d4 | 144.75 | 53.69 | 63.75 |

=> Sparse is slower, because the training is very fast (the trees are not deep).

=> If I remove the binning/caching time, it is still not fast enough to catch sparse_aware = false, but this is expected (training is too fast).

@Laurae2 (Contributor) commented Jan 25, 2017

@guolinke Very odd behavior since commit 66bf4c2. It seems to train trees in a very odd (and non-identical) way when sparse = true. It is also significantly slower, but that must be due to the strange trees.

I'm training for 50 iterations and I'm getting this:

```
> temp_model <- warmup(lgb_data = lgb_data, sparse = FALSE)
[LightGBM] [Info] Number of postive: 6879, number of negative: 1176868
[LightGBM] [Info] Number of data: 1183747, number of features: 960
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 31
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 31
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 30
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 31
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 29
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 30
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 29
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 29
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 29
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 29
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 30
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 29
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 29
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 29
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 29
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 28
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 28
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 29
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 29
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 29
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 29
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 29
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 32
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 28
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 28
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 30
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 28
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 28
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 31
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 30
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 30
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 30
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 32
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 31
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 33
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 31
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 32
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 32
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 33
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 31
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 33
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 33
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 33
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 32
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 34
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 33
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 33
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 32
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 32
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 33
LightGBM Warmup time: 27.529066  
> temp_model <- warmup(lgb_data = lgb_data, sparse = TRUE) # where did the printing go!?
[LightGBM] [Info] Number of postive: 6879, number of negative: 1176868
[LightGBM] [Info] Number of data: 1183747, number of features: 960
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 29
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 30
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 33
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 32
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 31
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 32
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 29
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 44
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 38
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 201
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 211
[LightGBM] [Info] No further splits with positive gain, best gain: -inf, leaves: 205
LightGBM Warmup time: 103.549416
```

@guolinke (Collaborator, Author):

@Laurae2 sorry. Fixed.

@Laurae2 (Contributor) commented Jan 25, 2017

@guolinke I re-did all benchmarks using your bug fix 60f6c9f. Small fluctuations, but sparse_aware = true is slower. Also added Higgs at 500 iterations, an AUC check, and a master-branch comparison. Single-thread performance is excellent.


Higgs 500 iterations (learning rate = 0.10)

Timings:

| Mode | Commit | 1 thread | 6 threads | 12 threads |
|---|---|---|---|---|
| sparse_aware = false | 60f6c9f | 2935.89 | 848.85 | 790.65 |
| sparse_aware = false | d085c16 | 2412.94 | 763.14 | 744.15 |
| sparse_aware = true | 60f6c9f | 2180.98 | 760.63 | 838.84 |
| sparse_aware = true | d085c16 | 2123.12 | 760.17 | 822.43 |
| none | master | 3003.66 | 820.09 | 730.93 |

AUC check for 50 iterations:

| Mode | Commit | AUC |
|---|---|---|
| sparse_aware = false | 60f6c9f | 0.821919 |
| sparse_aware = false | d085c16 | 0.821919 |
| sparse_aware = true | 60f6c9f | 0.821919 |
| sparse_aware = true | d085c16 | 0.821919 |
| none | master | 0.821919 |

Bosch 500 iterations (learning rate = 0.02)

Timings:

| Mode | Commit | 1 thread | 6 threads | 12 threads |
|---|---|---|---|---|
| sparse_aware = false | 60f6c9f | 1016.10 | 432.66 | 512.29 |
| sparse_aware = false | d085c16 | 1022.78 | 454.63 | 512.42 |
| sparse_aware = true | 60f6c9f | 1141.91 | 422.74 | 507.54 |
| sparse_aware = true | d085c16 | 1096.04 | 415.92 | 504.73 |
| none | master | 1534.08 | 673.57 | 560.71 |

Bosch 50 iterations (learning rate = 0.20)

Timings, run 5 times where a standard deviation (±) is provided:

| Mode | Commit | 1 thread | 6 threads | 12 threads |
|---|---|---|---|---|
| sparse_aware = false | 60f6c9f | 107.33 ± 1.07 | 46.53 ± 0.83 | 53.39 ± 1.44 |
| sparse_aware = false | d085c16 | 109.68 ± 1.08 | 48.28 ± 0.29 | 53.29 ± 0.45 |
| sparse_aware = true | 60f6c9f | 132.08 ± 2.25 | 51.31 ± 1.59 | 57.75 ± 0.34 |
| sparse_aware = true | d085c16 | 136.37 ± 8.95 | 51.62 ± 1.13 | 58.02 ± 2.00 |
| none | master | 153.83 ± 4.03 | 68.73 ± 0.50 | 54.90 ± 0.85 |

Bosch 50 iterations (learning rate = 0.05)

Timings, run 5 times where a standard deviation (±) is provided:

| Mode | Commit | 1 thread | 6 threads | 12 threads |
|---|---|---|---|---|
| sparse_aware = false | 60f6c9f | 67.07 ± 0.36 | 30.14 ± 0.69 | 33.90 ± 1.05 |
| sparse_aware = false | d085c16 | 67.83 ± 0.96 | 31.12 ± 0.51 | 34.56 ± 0.15 |
| sparse_aware = true | 60f6c9f | 100.25 ± 0.71 | 37.96 ± 0.78 | 43.95 ± 0.46 |
| sparse_aware = true | d085c16 | 102.19 ± 1.47 | 39.53 ± 1.90 | 42.97 ± 1.12 |
| none | master | 99.73 ± 2.28 | 45.25 ± 0.72 | 33.63 ± 0.47 |

@guolinke (Collaborator, Author):

Closing this since I cannot find a good solution for the multi-threading optimization.
You are welcome to leave comments if you have any ideas.
Thanks.

@guolinke closed this on Feb 14, 2017
@guolinke deleted the sparse-aware branch on February 20, 2017
The lock bot locked this as resolved and limited the conversation to collaborators on Mar 12, 2020