
XGBoost has trouble modeling multiplication/division #4069

Closed
ledmaster opened this issue Jan 20, 2019 · 8 comments · Fixed by #4233

@ledmaster

Hi

I am using Python 3.6 and XGBoost 0.81. In a simple experiment, I create a matrix X of numbers between -1 and 1 and set Y = X1 * X2 or Y = X1 / X2. XGBoost can't learn the function and just predicts a constant value.

[Figure: multiplication_noise0]

Now, if I add Gaussian noise, it can model the function:

[Figure: multiplication_noise05]

I tried changing the range, tuning hyperparameters and base_score, and using the native xgb.train instead of XGBRegressor, but I couldn't make it learn.

Is this a known issue? Do you know why it happens?

Thanks

@trivialfis
Member

@ledmaster I tried generating the following dataset; is it the right one?

import numpy as np

x = np.random.rand(64, 2)
x = x * 2 - 1.0  # rescale from [0, 1) to [-1, 1)
y_true = x[:, 0] * x[:, 1]

Following the above script:

import xgboost as xgb
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 -- registers the 3d projection

dtrain = xgb.DMatrix(x, label=y_true)

params = {
    'tree_method': 'gpu_hist'
}

bst = xgb.train(params, dtrain, evals=[(dtrain, "train")], num_boost_round=10)

y_pred = bst.predict(dtrain)

X = x[:, 0]
Y = x[:, 1]

# Predicted surface on top, true surface below.
fig = plt.figure(figsize=plt.figaspect(2.))
ax = fig.add_subplot(2, 1, 1, projection='3d')
ax.plot_trisurf(X, Y, y_pred, cmap='viridis')

ax = fig.add_subplot(2, 1, 2, projection='3d')
ax.plot_trisurf(X, Y, y_true, cmap='viridis')

plt.show()

I got:

[Figure: predicted surface (top) vs. true surface (bottom)]

Seems pretty reasonable. Did I generate the wrong dataset?

@ledmaster
Author

ledmaster commented Jan 21, 2019

@trivialfis
This is the code I used:

import numpy as np
import xgboost as xgb
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 -- registers the 3d projection

size = 10000
X = np.zeros((size, 2))

# 100 x 100 grid over [-1, 1] x [-1, 1]
Z = np.meshgrid(np.linspace(-1, 1, 100), np.linspace(-1, 1, 100))

X[:, 0] = Z[0].flatten()
X[:, 1] = Z[1].flatten()

y_mul = X[:, 0] * X[:, 1]
y_div = X[:, 0] / X[:, 1]

# Plot the target surfaces.
ops = [('MULTIPLICATION', y_mul), ('DIVISION', y_div)]
for name, op in ops:
    fig = plt.figure(figsize=(15, 10))
    ax = fig.add_subplot(projection='3d')
    ax.set_title(name)
    ax.plot_trisurf(X[:, 0], X[:, 1], op, cmap=plt.cm.viridis, linewidth=0.2)
    plt.savefig("{}.jpg".format(name))

# Fit a default XGBRegressor on each target and plot its predictions.
for name, op in ops:
    mdl = xgb.XGBRegressor()
    mdl.fit(X, op)

    fig = plt.figure(figsize=(15, 10))
    ax = fig.add_subplot(projection='3d')
    ax.set_title("{} - NOISE = 0".format(name))
    ax.plot_trisurf(X[:, 0], X[:, 1], mdl.predict(X), cmap=plt.cm.viridis, linewidth=0.2)
    plt.savefig("{}_noise0.jpg".format(name))

Figure for the plot of the target itself (not the predictions):

[Figure: MULTIPLICATION target surface]

@trivialfis
Member

trivialfis commented Jan 21, 2019

I give up. Adding small normal noise (np.random.randn()) to the label makes everything work, but without such noise XGBoost just jumps into some sort of local minimum.

noise = np.random.randn(size)
noise = noise / 1000
y = y + noise  # somehow this helps XGBoost leave the local minimum
dtrain = xgb.DMatrix(X, label=y)

I'm not sure this is a bug. XGBoost relies on a greedy algorithm, after all. Would love to hear some other opinions. @RAMitchell

@khotilov
Member

I agree with @trivialfis that it's not a bug. But it's a nice example of data with "perfect symmetry" in an unstable balance. With such a perfect dataset, when the algorithm looks for a split in, say, variable X1, the sums of residuals at each X1 location (taken over X2) are always zero, so it cannot find any split and can only approximate the overall average. Any random disturbance, e.g., some noise or subsample=0.8, helps to kick it off the equilibrium and start learning.
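
A minimal sketch of that subsample suggestion, assuming the 100x100 grid dataset from the script above (subsample=0.8 is the only non-default setting):

import numpy as np
import xgboost as xgb

# Rebuild the symmetric grid from the original script.
g = np.meshgrid(np.linspace(-1, 1, 100), np.linspace(-1, 1, 100))
X = np.stack([g[0].ravel(), g[1].ravel()], axis=1)
y = X[:, 0] * X[:, 1]

# Row subsampling perturbs the per-split residual sums, breaking the
# perfect cancellation that otherwise rules out every candidate split.
mdl = xgb.XGBRegressor(subsample=0.8)
mdl.fit(X, y)

print("prediction std:", mdl.predict(X).std())  # nonzero once splits are found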

@trivialfis
Member

@khotilov That's a very interesting example. I wouldn't have come up with it myself. Maybe we can document it in a tutorial?

@trivialfis trivialfis reopened this Jan 22, 2019
@hcho3
Collaborator

hcho3 commented Jan 22, 2019

@ledmaster
Author

Thanks @khotilov and @trivialfis for the answers and investigation.

As @hcho3 cited above, I wrote an article about GBMs and arithmetic operations, which is how I ended up finding this issue. I added your answer there; feel free to link to it.

@RAMitchell
Member

Decision trees can only look at one feature at a time. In your example, if I take any 1D feature range such as 0.25 < f0 < 0.5 and average all the samples in that slice, I suspect you will always get exactly 0 due to the symmetry of your problem. So XGBoost can't find anything when it looks from the perspective of a single feature.
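
A quick check of that claim, assuming the grid dataset from the original script:

import numpy as np

# Rebuild the symmetric grid from the original script.
g = np.meshgrid(np.linspace(-1, 1, 100), np.linspace(-1, 1, 100))
X = np.stack([g[0].ravel(), g[1].ravel()], axis=1)
y = X[:, 0] * X[:, 1]

# Average the target over all samples falling in a 1D slice of the first feature.
mask = (X[:, 0] > 0.25) & (X[:, 0] < 0.5)
print(y[mask].mean())  # ~0: each x1 in the slice pairs with a symmetric range of x2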

@lock lock bot locked as resolved and limited conversation to collaborators Jun 6, 2019