
Tree is not input scale invariant for simple X transformation? #4017

Closed
joegaotao opened this issue Dec 23, 2018 · 11 comments

Comments

@joegaotao

Theoretically, a tree model should be invariant to a simple monotone transformation of X, such as "a * X - b". However, I ran some simple tests and was surprised to find that different versions of xgboost show different odd behavior: a simple transformation leads to different results. Here is the R code:

xgboost 0.71.2, changing X to X - 8:

library(xgboost)
set.seed(111)
N <- 80000
p <- 50
X <- matrix(runif(N * p, 0, 1), ncol = p)
colnames(X) <- paste0("x", 1:p)
beta <- runif(p)
y <- X %*% beta #+ rnorm(N, mean = 0, sd  = 0.1)

tr <- sample.int(N, N * 0.75)

###

param1 <- list(nrounds = 10, num_parallel_tree = 1, nthread = 10L, eta = 0.3, max_depth = 30,
  seed = 2018, colsample_bytree = 1, subsample = 1,  min_child_weight = 10,
  tree_method = "exact")
param1$data <- X[tr,]
param1$label <- y[tr]

set.seed(2019)
bst1 <- do.call(xgboost::xgboost, param1)
test_pred1 <- predict(bst1, newdata = X[-tr,])

newX <- X - 8


param2 <- list(nrounds = 10, num_parallel_tree = 1, nthread = 10L, eta = 0.3, max_depth = 30,
  seed = 2018, colsample_bytree = 1, subsample = 1,  min_child_weight = 10,
  tree_method = "exact")
param2$data <- newX[tr,]
param2$label <- y[tr]

set.seed(2019)
bst2 <- do.call(xgboost::xgboost, param2)
test_pred2 <- predict(bst2, newdata = newX[-tr,])

summary(test_pred1 - test_pred2)
#     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
#-1.784631 -0.316670 -0.001692  0.002795  0.321040  1.831196

R sessionInfo()

> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so

locale:
 [1] LC_CTYPE=C                 LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8
 [8] LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] xgboost_0.71.2

loaded via a namespace (and not attached):
[1] compiler_3.5.2    magrittr_1.5      Matrix_1.2-15     tools_3.5.2       stringi_1.2.4     grid_3.5.2        data.table_1.11.8 lattice_0.20-38

xgboost compiled from master, 0.81.0.1, changing X - 8 to X - 1 (or X / 10):

library(xgboost)
set.seed(111)
N <- 80000
p <- 50
X <- matrix(runif(N * p, 0, 1), ncol = p)
colnames(X) <- paste0("x", 1:p)
beta <- runif(p)
y <- X %*% beta #+ rnorm(N, mean = 0, sd  = 0.1)

tr <- sample.int(N, N * 0.75)

###

param1 <- list(nrounds = 10, num_parallel_tree = 1, nthread = 10L, eta = 0.3, max_depth = 30,
  seed = 2018, colsample_bytree = 1, subsample = 1,  min_child_weight = 10,
  tree_method = "exact")
param1$data <- X[tr,]
param1$label <- y[tr]

set.seed(2019)
bst1 <- do.call(xgboost::xgboost, param1)
test_pred1 <- predict(bst1, newdata = X[-tr,])

newX <- X - 1


param2 <- list(nrounds = 10, num_parallel_tree = 1, nthread = 10L, eta = 0.3, max_depth = 30,
  seed = 2018, colsample_bytree = 1, subsample = 1,  min_child_weight = 10,
  tree_method = "exact")
param2$data <- newX[tr,]
param2$label <- y[tr]

set.seed(2019)
bst2 <- do.call(xgboost::xgboost, param2)
test_pred2 <- predict(bst2, newdata = newX[-tr,])

summary(test_pred1 - test_pred2)
#     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
#-0.714748 -0.109097 -0.003238 -0.002930  0.105858  0.726057
joegaotao changed the title Tree is not invariant for simple X transformation? → Tree is not input scale invariant for simple X transformation? Dec 24, 2018
@trivialfis
Member

I will try to reproduce it in Python when time allows.

@joegaotao
Author

I tested in Python 3.5.2 with xgboost 0.81:

import xgboost as xgb
import numpy as np

np.random.seed(111)
N = 80000
p = 50
X = np.random.uniform(0, 1, (N, p))
beta = np.random.uniform(0, 1, p)
y = X.dot(beta)

tr = int(N * 0.75)

trainX = X[:tr, :]
trainy = y[:tr]
dtest = xgb.DMatrix(X[tr:, :])

param = {"num_parallel_tree":1, "nthread":10, "eta":0.3, "max_depth":30, "seed":2018, 
	"colsample_bytree":1, "subsample":1, "min_child_weight":10, "tree_method":"exact"}

dtrain1 = xgb.DMatrix(trainX, label = trainy)
bst1 = xgb.train(param, dtrain1, num_boost_round = 10)
pred1 = bst1.predict(dtest)

trainX_new = trainX / 10
dtrain2 = xgb.DMatrix(trainX_new, label = trainy)
bst2 = xgb.train(param, dtrain2, num_boost_round = 10)
pred2 = bst2.predict(dtest)

np.cov(pred1 - pred2)
# array(0.56649683)

@trivialfis
Member

trivialfis commented Dec 28, 2018

@joegaotao Thanks! Will look into it this weekend. :)

@trivialfis
Member

@joegaotao You need to scale the prediction dataset.

@trivialfis
Member

@joegaotao
Here:
pred2 = bst2.predict(dtest)
dtest needs to be scaled accordingly.
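
For instance, a minimal sketch of that fix, reusing the names from the snippet above (dtest_scaled is a new name introduced here for illustration):

# Apply the same X / 10 transformation to the held-out rows,
# so bst2 sees features on the scale it was trained on.
dtest_scaled = xgb.DMatrix(X[tr:, :] / 10)
pred2 = bst2.predict(dtest_scaled)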

@joegaotao
Author

@trivialfis Sorry, I made a mistake. I ran some tests again, with X - 10:

import xgboost as xgb
import numpy as np

np.random.seed(111)
N = 80000
p = 50
X = np.random.uniform(0, 1, (N, p))
beta = np.random.uniform(0, 1, p)
y = X.dot(beta)

tr = int(N * 0.75)

param = {"num_parallel_tree":1, "nthread":10, "eta":0.3, "max_depth":30, "seed":2018, 
	"colsample_bytree":1, "subsample":1, "min_child_weight":10, "tree_method":"exact"}

trainX = X[:tr, :]
trainy = y[:tr]
dtest = xgb.DMatrix(X[tr:, :])

dtrain1 = xgb.DMatrix(trainX, label = trainy)
bst1 = xgb.train(param, dtrain1, num_boost_round = 10)
pred1 = bst1.predict(dtest)

newX = X - 10
trainX_new = newX[:tr, :]
dtest_new = xgb.DMatrix(newX[tr:, :])

dtrain2 = xgb.DMatrix(trainX_new, label = trainy)
bst2 = xgb.train(param, dtrain2, num_boost_round = 10)
pred2 = bst2.predict(dtest_new)

np.cov(pred1 - pred2)
# array(0.12094639)

@trivialfis
Member

trivialfis commented Jan 2, 2019

@joegaotao Beats me. I tested multiple configurations, including different numbers of rows and different transformations. The situation with multiplication is much better than with addition, and the problem on the GPU side seems even worse.

My guess is the usual funny floating-point issue, but I will look more closely. Debugging an issue that only occurs with more than 10000 rows of data is quite messy...
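
As a rough illustration of why multiplication would behave better than addition under this hypothesis (a sketch, not xgboost internals): scaling roughly preserves the relative float32 spacing of the feature values, while a shift collapses nearby values and perturbs their sort order.

import numpy as np

np.random.seed(111)
X = np.random.uniform(0, 1, 100000).astype('float32')

# Dividing by 10 keeps relative precision, so the sort order is
# essentially unchanged; subtracting 10 merges and reorders values.
(np.argsort(X / 10) == np.argsort(X)).sum()   # expect nearly all 100000 to match
(np.argsort(X - 10) == np.argsort(X)).sum()   # expect noticeably fewer matches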

@joegaotao
Author

joegaotao commented Jan 2, 2019

@trivialfis I also wondered whether it's a floating-point issue accumulating over the boosting iterations, because with smaller max_depth and nrounds it's hard to reproduce the difference. But I find it weird that a location shift or scale change of X affects the tree splits even with tree_method = "exact".
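
One concrete way a pure shift can change even exact splits (a sketch under the floating-point hypothesis, with two hand-picked values): feature values that are distinct in float32 inside [0, 1) can round to the same number after the shift, so the set of candidate split points itself changes.

import numpy as np

a = np.float32(0.5000001)
b = np.float32(0.5000002)
a == b                # False: distinct float32 values before the shift
(a - 10) == (b - 10)  # True: after the shift both round to float32 -9.5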

@khotilov
Member

khotilov commented Jan 2, 2019

While theoretically there should be invariance, the scaling is not exact because of finite float precision. As for the shifts, keep in mind that the number of distinct representable float values within a unit interval decreases significantly when shifting away from [0,1). E.g., few float values survive the following round trip:

In [51]: np.random.seed(111)
    ...: X = np.random.uniform(0, 1, 1000000).astype('float32')
    ...: X_10 = X - 10
    ...: X_ = X_10 + 10
    ...: (X == X_).sum()
Out[51]: 41725

With a sizeable data sample and such deep trees with hundreds of splits, the chances of hitting some imprecision points after the transformation in one or a few splits get higher. While I'm not ruling out potential causes within xgboost that might also contribute to what we see here, it very much looks to me like a floating-point issue.

@joegaotao
Author

@khotilov I think you may be right, because float32 slightly changes the order of values due to limited precision, especially within a small interval.

In [19]: X = np.random.uniform(0, 1, 100000).astype("float64")
    ...: newX = X - 10
    ...: X = X.astype("float32")
    ...: newX = newX.astype("float32")
    ...: (np.argsort(newX) == np.argsort(X)).sum()
Out[19]: 95265

@trivialfis
Member

@joegaotao I looked into gpu_hist and gpu_exact, and I still think floating point is the culprit here.

lock bot locked as resolved and limited conversation to collaborators Apr 4, 2019