XGBoost Python API ignoring random state #10997

Closed
mbarsacchi opened this issue Nov 13, 2024 · 2 comments · Fixed by #10998

mbarsacchi commented Nov 13, 2024

Hello!
I've noticed that, since version 2.1.0, the XGBoost Python API has been ignoring the random seed: results are identical regardless of random_state, even when colsample_bynode is applied. (This issue does not affect subsample.)

You can reproduce it by running this:

from numpy import loadtxt
from xgboost import XGBClassifier


# Data available at: https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=",")


X = dataset[:,0:8]
Y = dataset[:,8]

model1 = XGBClassifier(n_estimators=2, learning_rate=0.04, max_depth=9,tree_method="hist", colsample_bynode=0.5, random_state=42)
model1.fit(X, Y)

model2 = XGBClassifier(n_estimators=2, learning_rate=0.04, max_depth=9,tree_method="hist", colsample_bynode=0.5, random_state=43)
model2.fit(X, Y)

assert list(model1.feature_importances_) != list(model2.feature_importances_)

The feature importances differ on version 2.0.3 but not on 2.1.0. No matter what random_state I use, I get the same results.
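To check more than two seeds at once, the comparison above can be generalized into a small helper. This is only a sketch: `distinct_results` and `xgb_importances` are hypothetical names introduced here, not part of XGBoost, and the model parameters are simply copied from the reproduction script.

```python
def distinct_results(fit_fn, seeds):
    """Count how many distinct result vectors fit_fn produces across seeds.

    fit_fn is any callable taking a seed and returning a sequence of numbers
    (for example, feature importances). A return value of 1 means every seed
    produced the same result.
    """
    results = {tuple(round(float(v), 12) for v in fit_fn(seed)) for seed in seeds}
    return len(results)


def xgb_importances(seed, X, y):
    """Hypothetical helper: fit the model from the report with the given seed."""
    # Imported lazily so the helper above can be used without xgboost installed.
    from xgboost import XGBClassifier

    model = XGBClassifier(
        n_estimators=2,
        learning_rate=0.04,
        max_depth=9,
        tree_method="hist",
        colsample_bynode=0.5,
        random_state=seed,
    )
    model.fit(X, y)
    return model.feature_importances_
```

With X and Y loaded as in the script above, `distinct_results(lambda s: xgb_importances(s, X, Y), range(42, 47))` returning 1 would indicate that the seed is being ignored, matching the behavior described for 2.1.0.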

I've managed to narrow it down to this specific commit: fedd967

Checking out the commit before it (178cfe7) and compiling still works as intended (the two importances differ). Once you check out the commit linked above, or any later one, and compile, the assert fails.


This is the process I followed every time:

git checkout <commit-hash>
git submodule update
cmake -B build -S . -GNinja
cmake --build build
# Install as editable installation
cd ./python-package
pip install -e .

@trivialfis
Member

Thank you for sharing, will look into it. Perhaps the sampler is initialized before the global RNG is seeded.
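For readers unfamiliar with this kind of bug, here is a toy illustration of the initialization-order hypothesis. It is not XGBoost's code: `GlobalRng`, `ColumnSampler`, `train_buggy`, and `train_fixed` are hypothetical names, and the sketch assumes a sampler that derives its own generator from the global one at construction time.

```python
import random

DEFAULT_SEED = 0  # assumed default used before the user's seed is applied


class GlobalRng:
    """Toy stand-in for a process-wide random number generator."""

    def __init__(self):
        self.rng = random.Random(DEFAULT_SEED)

    def seed(self, value):
        self.rng.seed(value)


class ColumnSampler:
    """Toy sampler that snapshots randomness from the global RNG when built."""

    def __init__(self, global_rng, n_features):
        # The sampler's private generator is fixed at construction time.
        self.rng = random.Random(global_rng.rng.getrandbits(32))
        self.n_features = n_features

    def sample(self, k):
        return self.rng.sample(range(self.n_features), k)


def train_buggy(seed):
    g = GlobalRng()
    sampler = ColumnSampler(g, 8)  # built BEFORE the user's seed is applied
    g.seed(seed)                   # too late: the sampler never sees it
    return sampler.sample(4)


def train_fixed(seed):
    g = GlobalRng()
    g.seed(seed)                   # seed first
    sampler = ColumnSampler(g, 8)  # sampler now depends on the seed
    return sampler.sample(4)
```

In this toy, `train_buggy(42)` and `train_buggy(43)` always pick the same columns, while `train_fixed` responds to the seed, which is the symptom reported above.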

@trivialfis
Member

trivialfis commented Nov 14, 2024

Opened a PR for the fix: #10998. It will be part of the next patch release.
