
Sklearn's GradientBoostingRegressor not yielding the correct results #80

Closed
viggotw opened this issue Jun 19, 2022 · 2 comments
Labels: bug (Something isn't working), good first issue (Good for newcomers)

Comments


viggotw commented Jun 19, 2022

Hi!
I have an issue with the GradientBoostingRegressor not outputting the expected values after being formulated as part of my Pyomo model. The deviation seems to be a constant offset, not random noise.

I'm pretty new to both Pyomo and OMLT, so this might be human error. I've also looked for a list of supported models without luck, so that could also be the issue.

I've attached a minimal reproducible example (MRE) below:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
import pyomo.environ as pyo
from omlt import OmltBlock
from omlt.gbt import GBTBigMFormulation, GradientBoostedTreeModel
import matplotlib.pyplot as plt

# Train simple model
x = np.array(range(101)).reshape(-1,1)
y = np.array(range(101))

model_gbt = GradientBoostingRegressor(n_estimators=5)
model_gbt.fit(x, y)

# Get predictions directly from model
y_pred = model_gbt.predict(np.array(range(100)).reshape(-1,1))

# Get predictions from pyomo formulation
y_opt = []
for i in range(100):
    initial_type = [('float_input', FloatTensorType([None, 1]))]
    model_onx = convert_sklearn(model_gbt, initial_types=initial_type)

    m = pyo.ConcreteModel('Random GradientBoostingRegressor')

    m.gbt = OmltBlock()
    input_bounds = {0: (i, i)}  # Forces the input to equal the input from the previous prediction
    gbt_model = GradientBoostedTreeModel(model_onx, scaled_input_bounds=input_bounds)

    formulation = GBTBigMFormulation(gbt_model)
    m.gbt.build_formulation(formulation)

    m.obj = pyo.Objective(expr=0)  # dummy objective; we only need a feasible point

    solver = pyo.SolverFactory('cbc')
    status = solver.solve(m, tee=False)

    y_opt.append(m.gbt.outputs[0].value)

# Plot predictions
plt.plot(y_pred)
plt.plot(y_opt)
plt.title('Predictions')
plt.legend(['Original model', 'Pyomo formulation'])

Output: [plot "Predictions" comparing the original model with the Pyomo formulation; the two curves differ by a constant offset]
ThebTron (Collaborator) commented

Hi Viggo,

Thanks a lot for raising this issue! I think the problem is related to the base score of the gradient-boosted regression tree (GBRT) training. For more information, see https://stackoverflow.com/questions/47596486/xgboost-the-meaning-of-the-base-score-parameter

We need to read this constant value from the ONNX representation and add it to the output. I'll try to look at this in the coming days if I have some time.
In the meantime, most GBRT algorithms seem to use the mean of the training targets as the base score. So as a quick fix, adding the mean of y to the output of the block should give the correct result.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
import pyomo.environ as pyo
from omlt import OmltBlock
from omlt.gbt import GBTBigMFormulation, GradientBoostedTreeModel
import matplotlib.pyplot as plt

# Train simple model
x = np.array(range(101)).reshape(-1,1)
y = np.array(range(101))

model_gbt = GradientBoostingRegressor(n_estimators=5)
model_gbt.fit(x, y)

# Get predictions directly from model
y_pred = model_gbt.predict(np.array(range(100)).reshape(-1,1))

# Get predictions from pyomo formulation
y_opt = []
for i in range(100):
    initial_type = [('float_input', FloatTensorType([None, 1]))]
    model_onx = convert_sklearn(model_gbt, initial_types=initial_type)

    m = pyo.ConcreteModel('Random GradientBoostingRegressor')

    m.gbt = OmltBlock()
    input_bounds = {0: (i, i)}  # Forces the input to equal the input from the previous prediction
    gbt_model = GradientBoostedTreeModel(model_onx, scaled_input_bounds=input_bounds)

    formulation = GBTBigMFormulation(gbt_model)
    m.gbt.build_formulation(formulation)

    m.obj = pyo.Objective(expr=0)  # dummy objective; we only need a feasible point

    solver = pyo.SolverFactory('cbc')
    status = solver.solve(m, tee=False)

    y_opt.append(m.gbt.outputs[0].value + np.mean(y))  # quick fix: add the base score (training mean)

# Plot predictions
plt.plot(y_pred)
plt.plot(y_opt)
plt.title('Predictions')
plt.legend(['Original model', 'Pyomo formulation'])

Output: [plot "Predictions" comparing the original model with the corrected Pyomo formulation]
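As a quick sanity check of the "mean as base score" observation (my own sketch, not part of the original comment): scikit-learn exposes the initial estimator as init_, and for the default squared-error loss its prediction is exactly the training mean.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Same toy data as the MRE above
x = np.arange(101).reshape(-1, 1)
y = np.arange(101, dtype=float)

model = GradientBoostingRegressor(n_estimators=5).fit(x, y)

# `init_` holds the initial estimator (a DummyRegressor for the
# default squared-error loss); it predicts the training mean.
base = model.init_.predict(x[:1])
print(base[0], y.mean())  # both 50.0
```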

ThebTron added the labels bug and good first issue on Jun 20, 2022
rmisener (Member) commented Jul 6, 2022

@ThebTron fixed the issue. The problem is that each of the different libraries (LightGBM, scikit-learn, etc.) handles the constant value slightly differently. For example, LightGBM adds the constant value to each node of the first tree, whereas scikit-learn keeps a separate base score that is added to the model's prediction.

We may see issues like this in the future for other tree libraries; each of them seems to pass the constant value to ONNX slightly differently. ONNX tries to stay faithful to the formulations of the individual libraries, so the representations are all a bit different.
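To make the scikit-learn side of this concrete, here is an illustrative sketch (my own, assuming the default squared-error loss) showing that its prediction decomposes into the base score plus the scaled tree outputs, with the base score kept outside the trees:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

x = np.arange(101).reshape(-1, 1)
y = np.arange(101, dtype=float)
model = GradientBoostingRegressor(n_estimators=5).fit(x, y)

# scikit-learn: prediction = init_.predict(x) + sum_k lr * tree_k(x),
# so the base score lives outside the trees. LightGBM instead folds
# the constant into the leaves of the first tree.
base = model.init_.predict(x)
tree_sum = sum(
    model.learning_rate * stage[0].predict(x)
    for stage in model.estimators_
)
manual = base + tree_sum
print(np.allclose(manual, model.predict(x)))  # True
```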
