
Results depend on feature scaling #4189

Closed
Denisevi4 opened this issue Feb 27, 2019 · 1 comment
Denisevi4 commented Feb 27, 2019

Is there a reason why the output of xgboost would depend on feature scaling?

I don't have any NaNs, and all features are numeric. If I scale all features by some constant, say 100.0, I get slightly different output from xgboost. I tried both the exact and hist tree methods.

I tried xgboost 0.81 and 0.72.

I see that the split conditions are slightly different: the trees start out similar but diverge toward the end.

For example, here is the first tree's output:
{ "nodeid": 0, "depth": 0, "split": "f7", "split_condition": 0.0167161003, "yes": 1, "no": 2, "missing": 1, "children": [
{ "nodeid": 1, "depth": 1, "split": "f7", "split_condition": 0.0104448413, "yes": 3, "no": 4, "missing": 3, "children": [
{ "nodeid": 3, "depth": 2, "split": "f196", "split_condition": 0.258707613, "yes": 7, "no": 8, "missing": 7, "children": [
{ "nodeid": 7, "leaf": -0.00222015986 },
{ "nodeid": 8, "leaf": 0.011970697 }
]},
{ "nodeid": 4, "depth": 2, "split": "f195", "split_condition": 0.481782585, "yes": 9, "no": 10, "missing": 9, "children": [
{ "nodeid": 9, "leaf": -0.0198136196 },
{ "nodeid": 10, "leaf": -0.00636780029 }
]}
]},
{ "nodeid": 2, "depth": 1, "split": "f17", "split_condition": 0.00645940099, "yes": 5, "no": 6, "missing": 5, "children": [
{ "nodeid": 5, "depth": 2, "split": "f199", "split_condition": 0.48925361, "yes": 11, "no": 12, "missing": 11, "children": [
{ "nodeid": 11, "leaf": -0.035415493 },
{ "nodeid": 12, "leaf": 0.0152317053 }
]},
{ "nodeid": 6, "depth": 2, "split": "f195", "split_condition": 1.06237507, "yes": 13, "no": 14, "missing": 13, "children": [
{ "nodeid": 13, "leaf": -0.0514464751 },
{ "nodeid": 14, "leaf": 0.00441155536 }
]}
]}
]}

After scaling all features by 100.0, I get this tree:
{ "nodeid": 0, "depth": 0, "split": "f7", "split_condition": 1.67161012, "yes": 1, "no": 2, "missing": 1, "children": [
{ "nodeid": 1, "depth": 1, "split": "f7", "split_condition": 1.04448414, "yes": 3, "no": 4, "missing": 3, "children": [
{ "nodeid": 3, "depth": 2, "split": "f196", "split_condition": 25.8707619, "yes": 7, "no": 8, "missing": 7, "children": [
{ "nodeid": 7, "leaf": -0.00222015986 },
{ "nodeid": 8, "leaf": 0.011970697 }
]},
{ "nodeid": 4, "depth": 2, "split": "f195", "split_condition": 48.1782608, "yes": 9, "no": 10, "missing": 9, "children": [
{ "nodeid": 9, "leaf": -0.0198136196 },
{ "nodeid": 10, "leaf": -0.00636780029 }
]}
]},
{ "nodeid": 2, "depth": 1, "split": "f17", "split_condition": 0.645940125, "yes": 5, "no": 6, "missing": 5, "children": [
{ "nodeid": 5, "depth": 2, "split": "f199", "split_condition": 48.9253616, "yes": 11, "no": 12, "missing": 11, "children": [
{ "nodeid": 11, "leaf": -0.035415493 },
{ "nodeid": 12, "leaf": 0.0152317053 }
]},
{ "nodeid": 6, "depth": 2, "split": "f195", "split_condition": 106.237503, "yes": 13, "no": 14, "missing": 13, "children": [
{ "nodeid": 13, "leaf": -0.0514464751 },
{ "nodeid": 14, "leaf": 0.00441155536 }
]}
]}
]}
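Comparing the two dumps, each threshold in the scaled tree is approximately, but not exactly, 100 times the original; the residual is consistent with 32-bit float rounding. A quick check of the thresholds quoted above (values copied from the dumps; plain Python):

```python
# Pairs of (original, scaled-by-100) split thresholds copied from the two dumps above.
pairs = [
    (0.0167161003, 1.67161012),    # f7, root
    (0.0104448413, 1.04448414),    # f7, depth 1
    (0.258707613, 25.8707619),     # f196
    (0.481782585, 48.1782608),     # f195
    (0.00645940099, 0.645940125),  # f17
    (0.48925361, 48.9253616),      # f199
    (1.06237507, 106.237503),      # f195, depth 2
]

# Undo the scaling and compare: the pairs agree only to about 7 significant
# digits, which is the precision of a 32-bit float.
rel_diffs = [abs(scaled / 100.0 - orig) / orig for orig, scaled in pairs]
for (orig, scaled), rel in zip(pairs, rel_diffs):
    print(f"{orig:.11g} vs {scaled / 100.0:.11g}  rel. diff = {rel:.1e}")
```

The relative differences are all below 1e-5 but nonzero, so once the thresholds land between different data points, the greedy split search can pick a different split and the trees diverge from there.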

@trivialfis (Member) commented:

@Denisevi4 We have seen this a few times before; see #4017 for example, and another recent report is #4069. Our conclusion is that such transformations can cause floating-point precision loss.
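The precision loss is easy to reproduce outside of xgboost. Feature values are stored as 32-bit floats, and multiplying by a constant that is not a power of two (such as 100) rounds many of them; a minimal sketch with NumPy, assuming only standard float32 semantics:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(100_000).astype(np.float32)

# Scale by 100 and undo it, entirely in float32 arithmetic.
round_trip = (a * np.float32(100.0)) / np.float32(100.0)

# Because 100 is not a power of two, both the multiply and the divide round,
# and a nonzero fraction of values do not survive the round trip.
mismatches = int(np.count_nonzero(round_trip != a))
print(f"{mismatches} of {a.size} values changed after scaling by 100 and back")
```

By contrast, scaling by a power of two (e.g. 128) is exact in binary floating point, so it should not perturb the split search at all.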

hcho3 closed this as completed on Mar 13, 2019.
The lock bot locked this as resolved and limited conversation to collaborators on Jun 11, 2019.