
Tree is not input scale invariant for simple X transformation? #4017

Closed
joegaotao opened this issue Dec 23, 2018 · 11 comments

Comments

@joegaotao

Theoretically, a tree model should be invariant to a simple monotone transformation of X, such as "a * X - b". However, I ran some simple tests and was surprised to find that different versions of xgboost show different odd behavior: a simple transformation leads to different results. Here is the R code:

xgboost 0.71.2, changing X to X - 8:

library(xgboost)
set.seed(111)
N <- 80000
p <- 50
X <- matrix(runif(N * p, 0, 1), ncol = p)
colnames(X) <- paste0("x", 1:p)
beta <- runif(p)
y <- X %*% beta #+ rnorm(N, mean = 0, sd  = 0.1)

tr <- sample.int(N, N * 0.75)

###

param1 <- list(nrounds = 10, num_parallel_tree = 1, nthread = 10L, eta = 0.3, max_depth = 30,
  seed = 2018, colsample_bytree = 1, subsample = 1,  min_child_weight = 10,
  tree_method = "exact")
param1$data <- X[tr,]
param1$label <- y[tr]

set.seed(2019)
bst1 <- do.call(xgboost::xgboost, param1)
test_pred1 <- predict(bst1, newdata = X[-tr,])

newX <- X - 8


param2 <- list(nrounds = 10, num_parallel_tree = 1, nthread = 10L, eta = 0.3, max_depth = 30,
  seed = 2018, colsample_bytree = 1, subsample = 1,  min_child_weight = 10,
  tree_method = "exact")
param2$data <- newX[tr,]
param2$label <- y[tr]

set.seed(2019)
bst2 <- do.call(xgboost::xgboost, param2)
test_pred2 <- predict(bst2, newdata = newX[-tr,])

summary(test_pred1 - test_pred2)
#     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
#-1.784631 -0.316670 -0.001692  0.002795  0.321040  1.831196

R sessionInfo()

> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so

locale:
 [1] LC_CTYPE=C                 LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8
 [8] LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] xgboost_0.71.2

loaded via a namespace (and not attached):
[1] compiler_3.5.2    magrittr_1.5      Matrix_1.2-15     tools_3.5.2       stringi_1.2.4     grid_3.5.2        data.table_1.11.8 lattice_0.20-38

xgboost compiled from master, 0.81.0.1, changing X - 8 to X - 1 (or X / 10):

library(xgboost)
set.seed(111)
N <- 80000
p <- 50
X <- matrix(runif(N * p, 0, 1), ncol = p)
colnames(X) <- paste0("x", 1:p)
beta <- runif(p)
y <- X %*% beta #+ rnorm(N, mean = 0, sd  = 0.1)

tr <- sample.int(N, N * 0.75)

###

param1 <- list(nrounds = 10, num_parallel_tree = 1, nthread = 10L, eta = 0.3, max_depth = 30,
  seed = 2018, colsample_bytree = 1, subsample = 1,  min_child_weight = 10,
  tree_method = "exact")
param1$data <- X[tr,]
param1$label <- y[tr]

set.seed(2019)
bst1 <- do.call(xgboost::xgboost, param1)
test_pred1 <- predict(bst1, newdata = X[-tr,])

newX <- X - 1


param2 <- list(nrounds = 10, num_parallel_tree = 1, nthread = 10L, eta = 0.3, max_depth = 30,
  seed = 2018, colsample_bytree = 1, subsample = 1,  min_child_weight = 10,
  tree_method = "exact")
param2$data <- newX[tr,]
param2$label <- y[tr]

set.seed(2019)
bst2 <- do.call(xgboost::xgboost, param2)
test_pred2 <- predict(bst2, newdata = newX[-tr,])

summary(test_pred1 - test_pred2)
#     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
#-0.714748 -0.109097 -0.003238 -0.002930  0.105858  0.726057
joegaotao changed the title Tree is not invariant for simple X transformation? → Tree is not input scale invariant for simple X transformation? Dec 24, 2018
@trivialfis
Member

I will try to reproduce it in Python when time allows.

@joegaotao
Author

I tested in Python 3.5.2 with xgboost 0.81:

import xgboost as xgb
import numpy as np

np.random.seed(111)
N = 80000
p = 50
X = np.random.uniform(0, 1, (N, p))
beta = np.random.uniform(0, 1, p)
y = X.dot(beta)

tr = int(N * 0.75)

trainX = X[:tr, :]
trainy = y[:tr]
dtest = xgb.DMatrix(X[tr:, :])

param = {"num_parallel_tree":1, "nthread":10, "eta":0.3, "max_depth":30, "seed":2018, 
	"colsample_bytree":1, "subsample":1, "min_child_weight":10, "tree_method":"exact"}

dtrain1 = xgb.DMatrix(trainX, label = trainy)
bst1 = xgb.train(param, dtrain1, num_boost_round = 10)
pred1 = bst1.predict(dtest)

trainX_new = trainX / 10
dtrain2 = xgb.DMatrix(trainX_new, label = trainy)
bst2 = xgb.train(param, dtrain2, num_boost_round = 10)
pred2 = bst2.predict(dtest)

np.cov(pred1 - pred2)
# array(0.56649683)

@trivialfis
Member

trivialfis commented Dec 28, 2018

@joegaotao Thanks! Will look into it this weekend. :)

@trivialfis
Member

@joegaotao You need to scale the prediction dataset.

@trivialfis
Member

@joegaotao
Here:
pred2 = bst2.predict(dtest)
dtest needs to be scaled accordingly.
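
For instance, a minimal sketch of that fix, reusing the names from the snippet above (dtest_scaled is a new name introduced here for illustration):

# Apply the same X / 10 transformation to the held-out rows,
# so bst2 sees features on the scale it was trained on.
dtest_scaled = xgb.DMatrix(X[tr:, :] / 10)
pred2 = bst2.predict(dtest_scaled)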

@joegaotao
Author

@trivialfis Sorry, I made a mistake. I ran some tests again, with X - 10:

import xgboost as xgb
import numpy as np

np.random.seed(111)
N = 80000
p = 50
X = np.random.uniform(0, 1, (N, p))
beta = np.random.uniform(0, 1, p)
y = X.dot(beta)

tr = int(N * 0.75)

param = {"num_parallel_tree":1, "nthread":10, "eta":0.3, "max_depth":30, "seed":2018, 
	"colsample_bytree":1, "subsample":1, "min_child_weight":10, "tree_method":"exact"}

trainX = X[:tr, :]
trainy = y[:tr]
dtest = xgb.DMatrix(X[tr:, :])

dtrain1 = xgb.DMatrix(trainX, label = trainy)
bst1 = xgb.train(param, dtrain1, num_boost_round = 10)
pred1 = bst1.predict(dtest)

newX = X - 10
trainX_new = newX[:tr, :]
dtest_new = xgb.DMatrix(newX[tr:, :])

dtrain2 = xgb.DMatrix(trainX_new, label = trainy)
bst2 = xgb.train(param, dtrain2, num_boost_round = 10)
pred2 = bst2.predict(dtest_new)

np.cov(pred1 - pred2)
# array(0.12094639)

@trivialfis
Member

trivialfis commented Jan 2, 2019

@joegaotao Beats me. I tested multiple configurations, including different numbers of rows and different transformations. The situation with multiplication is much better than with addition, and the problem on the GPU side seems even worse.

My guess is the usual funny floating-point issue, but I will look more closely. Debugging an issue that only occurs with more than 10000 rows of data is quite messy...
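
As a rough illustration of why multiplication would behave better than addition under this hypothesis (a sketch, not xgboost internals): scaling roughly preserves the relative float32 spacing of the feature values, while a shift collapses nearby values and perturbs their sort order.

import numpy as np

np.random.seed(111)
X = np.random.uniform(0, 1, 100000).astype('float32')

# Dividing by 10 keeps relative precision, so the sort order is
# essentially unchanged; subtracting 10 merges and reorders values.
(np.argsort(X / 10) == np.argsort(X)).sum()   # expect nearly all 100000 to match
(np.argsort(X - 10) == np.argsort(X)).sum()   # expect noticeably fewer matches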

@joegaotao
Author

joegaotao commented Jan 2, 2019

@trivialfis I also wondered whether it's a floating-point issue accumulating over the boosting iterations, because with smaller max_depth and nrounds it's hard to reproduce the difference. But I find it weird that a location shift or scale change of X affects the tree splits even with tree_method = "exact".
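
One concrete way a pure shift can change even exact splits (a sketch under the floating-point hypothesis, with two hand-picked values): feature values that are distinct in float32 inside [0, 1) can round to the same number after the shift, so the set of candidate split points itself changes.

import numpy as np

a = np.float32(0.5000001)
b = np.float32(0.5000002)
a == b                # False: distinct float32 values before the shift
(a - 10) == (b - 10)  # True: after the shift both round to float32 -9.5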

@khotilov
Member

khotilov commented Jan 2, 2019

While theoretically there should be invariance, the scaling is not exact because of finite float precision. As for the shifts, keep in mind that the number of distinct representable float values within a unit interval decreases significantly when shifting away from [0,1). E.g., few float values survive the following round trip:

In [51]: np.random.seed(111)
    ...: X = np.random.uniform(0, 1, 1000000).astype('float32')
    ...: X_10 = X - 10
    ...: X_ = X_10 + 10
    ...: (X == X_).sum()
Out[51]: 41725

With a sizeable data sample and such deep trees with hundreds of splits, the chances of hitting some imprecision points after the transformation in one or a few splits get higher. While I'm not ruling out potential causes within xgboost that might also contribute to what we see here, it very much looks to me like a floating-point issue.

@joegaotao
Author

@khotilov I think you may be right, because float32 slightly changes the order of values due to limited precision, especially within a small interval.

In [19]: X = np.random.uniform(0, 1, 100000).astype("float64")
    ...: newX = X - 10
    ...: X = X.astype("float32")
    ...: newX = newX.astype("float32")
    ...: (np.argsort(newX) == np.argsort(X)).sum()
Out[19]: 95265

@trivialfis
Member

@joegaotao I looked into gpu_hist and gpu_exact, and I still think floating point is the culprit here.

lock bot locked as resolved and limited conversation to collaborators Apr 4, 2019