
Backward compatibility with v.1.7.6 #9624

Closed
iftg opened this issue Oct 3, 2023 · 7 comments

Comments

iftg commented Oct 3, 2023

Hi,
When training a model with v.2.0.0 I'm getting substantially different results from v.1.7.6. Could you please clarify why this may be happening even though my code remains unchanged?
I'm using:
```python
import xgboost as xgb

paramIn = {
    'disable_default_eval_metric': False,
    'objective': 'reg:squarederror',
    'eval_metric': 'rmse',
    'max_depth': 3,
    'base_score': 0.,
    'max_leaves': 0,
    'min_child_weight': 1,
    'max_delta_step': 0,
    'subsample': 1,
    'colsample_bytree': 1,
    'lambda': 0,
    'alpha': 0,
    'eta': 0.05,
    'gamma': 0,
}

# X is a TxN numpy array, y is a Tx1 numpy array
dtrain = xgb.DMatrix(X, feature_names=feature_names, label=y, nthread=-1)

evals_result = {}
bst = xgb.train(paramIn,
                dtrain,
                num_boost_round=500,
                early_stopping_rounds=1000,  # i.e. no early stopping
                obj=custom_obj,
                custom_metric=custom_eval,
                evals=[(dtrain, 'train')],
                evals_result=evals_result,
                verbose_eval=False)
```
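(`custom_obj` and `custom_eval` are defined elsewhere in my code. As a hypothetical stand-in — not my actual functions — with the signatures `xgb.train` expects for a squared-error problem:)

```python
import numpy as np
import xgboost as xgb

def custom_obj(preds: np.ndarray, dtrain: xgb.DMatrix):
    # Hypothetical placeholder: squared-error objective.
    # xgb.train's `obj` must return per-row gradient and hessian.
    labels = dtrain.get_label()
    grad = preds - labels       # d/dp [0.5 * (p - y)^2]
    hess = np.ones_like(preds)  # second derivative is constant 1
    return grad, hess

def custom_eval(preds: np.ndarray, dtrain: xgb.DMatrix):
    # Hypothetical placeholder: xgb.train's `custom_metric` must
    # return a (name, value) pair.
    labels = dtrain.get_label()
    return 'my_rmse', float(np.sqrt(np.mean((preds - labels) ** 2)))
```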

xbanke commented Oct 4, 2023

I met a similar problem. When I train a model with rank:pairwise, the results are quite different from the previous version. It looks like base_score defaults to 0.5 in 1.7.6 but to 0.0 in 2.0.0. Also, the 2.0.0 documentation says "rank:pairwise: Use LambdaRank to perform pair-wise ranking using the ranknet objective.", whereas 1.7.0 says "rank:pairwise: Use LambdaMART to perform pairwise ranking where the pairwise loss is minimized."

hcho3 (Collaborator) commented Oct 4, 2023

Some defaults have changed in version 2.0.

  • The tree_method parameter now defaults to hist. Previously (in 1.7) it defaulted to approx or exact.
  • The base_score parameter is no longer fixed at 0.5. When unspecified, base_score is now estimated from the input labels.
  • The learning-to-rank algorithm has a brand-new implementation. We chose to re-implement it for a number of reasons: 1) better alignment with the current academic literature on learning-to-rank; 2) the previous implementation was non-deterministic (multiple runs would yield different results); 3) we wanted to support unbiased learning-to-rank for biased data sources such as click data.

See the Release Note for the full list of changes.
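In practice, if you need a 2.x run to line up with a 1.7-era run, pin the moved defaults explicitly rather than relying on them. A minimal sketch (the parameter names are real XGBoost parameters; the values shown are simply the old 1.7 behavior, and the random data is only for illustration):

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = rng.standard_normal(100)
dtrain = xgb.DMatrix(X, label=y)

params = {
    'objective': 'reg:squarederror',
    'tree_method': 'exact',  # 2.0 defaults to 'hist'; 1.7 used 'exact'/'approx'
    'base_score': 0.5,       # 2.0 otherwise estimates this from the labels
}
bst = xgb.train(params, dtrain, num_boost_round=10)
```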

In general, the developers of XGBoost do not guarantee that different versions of XGBoost will behave identically. (Making such a guarantee would prevent us from making necessary improvements.) Instead, we make the following guarantees:

  • Reproducible execution, defined as follows: if you run the same training script on the same machine with the same version of XGBoost, you will get identical results.
  • You can train a model with a previous version of XGBoost and save it. Later versions of XGBoost can load the saved model and produce predictions identical to those of the previous version (a sketch follows below).
  • Any changes will be documented in the release notes.

However, if you observe a significant degradation in model accuracy, or training taking significantly longer, please file a new GitHub issue.
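To illustrate the second guarantee, a sketch of the cross-version round trip (the file name is arbitrary; the two halves run under different installed versions, hence the comments):

```python
import xgboost as xgb

# Under the old version (e.g. 1.7.6): train and save in the JSON format.
#   bst_old = xgb.train(params, dtrain, num_boost_round=500)
#   bst_old.save_model('model.json')

# Under the new version (e.g. 2.0.0): load and predict.
bst_new = xgb.Booster()
bst_new.load_model('model.json')
preds = bst_new.predict(dtrain)  # expected to match the 1.7.6 predictions
```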

iftg (Author) commented Oct 4, 2023

Thank you for the explanation. In my case 1.7.6 used 'tree_method': 'exact', and this was the only source of the difference. I explicitly specify base_score, so that point is irrelevant in my case. I don't think I use learning-to-rank, as I am running a regression ('objective': 'reg:squarederror').

iftg (Author) commented Oct 4, 2023

Ah, and BTW, for my problem 'exact' works better than 'hist', hands down.

iftg closed this as completed Oct 5, 2023
nomagic commented Oct 31, 2023

Is base_score := F_0 (as in, e.g., the Friedman paper)?

iftg (Author) commented Oct 31, 2023

I am explicitly using 0 in my problem. However, in other applications other settings may work better.
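For what it's worth, reading hcho3's explanation above together with Friedman's formulation: base_score plays the role of the constant initial model

$$F_0(x) = \arg\min_\gamma \sum_{i=1}^{N} L(y_i, \gamma),$$

which for squared error is the label mean. So 2.0's behavior of estimating base_score from the input labels is consistent with F_0, while 1.7's fixed 0.5 was simply a hard-coded starting point. (This is an interpretation of the comments above, not an official statement from the maintainers.)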

mvalley21 commented
Adding to this: setting base_score to 0.5 in xgboost 2.x resolved the differences I saw between 2.x and 1.7.
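As a related sketch, in 2.x you can check which intercept the booster actually ended up with via Booster.save_config(); the JSON path below is what we observe in 2.0 and may differ in other versions:

```python
import json
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = rng.standard_normal(100) + 3.0  # deliberately non-centered labels
dtrain = xgb.DMatrix(X, label=y)

# No base_score given: 2.x estimates it from the labels.
bst = xgb.train({'objective': 'reg:squarederror'}, dtrain, num_boost_round=1)

config = json.loads(bst.save_config())
print(config['learner']['learner_model_param']['base_score'])  # ~3.0, not 0.5
```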
