
Results depend on feature scaling #4189

Closed
Denisevi4 opened this issue Feb 27, 2019 · 1 comment
Denisevi4 commented Feb 27, 2019

Is there a reason why the output of xgboost would depend on feature scaling?

I don't have any NaNs, and all features are numeric. If I scale all features by some constant, say 100.0, I get slightly different output from xgboost. I tried both the exact and hist tree methods.

I tried xgboost 0.81 and 0.72.

I see that the split conditions are slightly different: the trees start out similar but diverge toward the end.

For example, here is the first tree's output:
{ "nodeid": 0, "depth": 0, "split": "f7", "split_condition": 0.0167161003, "yes": 1, "no": 2, "missing": 1, "children": [
{ "nodeid": 1, "depth": 1, "split": "f7", "split_condition": 0.0104448413, "yes": 3, "no": 4, "missing": 3, "children": [
{ "nodeid": 3, "depth": 2, "split": "f196", "split_condition": 0.258707613, "yes": 7, "no": 8, "missing": 7, "children": [
{ "nodeid": 7, "leaf": -0.00222015986 },
{ "nodeid": 8, "leaf": 0.011970697 }
]},
{ "nodeid": 4, "depth": 2, "split": "f195", "split_condition": 0.481782585, "yes": 9, "no": 10, "missing": 9, "children": [
{ "nodeid": 9, "leaf": -0.0198136196 },
{ "nodeid": 10, "leaf": -0.00636780029 }
]}
]},
{ "nodeid": 2, "depth": 1, "split": "f17", "split_condition": 0.00645940099, "yes": 5, "no": 6, "missing": 5, "children": [
{ "nodeid": 5, "depth": 2, "split": "f199", "split_condition": 0.48925361, "yes": 11, "no": 12, "missing": 11, "children": [
{ "nodeid": 11, "leaf": -0.035415493 },
{ "nodeid": 12, "leaf": 0.0152317053 }
]},
{ "nodeid": 6, "depth": 2, "split": "f195", "split_condition": 1.06237507, "yes": 13, "no": 14, "missing": 13, "children": [
{ "nodeid": 13, "leaf": -0.0514464751 },
{ "nodeid": 14, "leaf": 0.00441155536 }
]}
]}
]}

After scaling all features by 100.0, I get this tree:
{ "nodeid": 0, "depth": 0, "split": "f7", "split_condition": 1.67161012, "yes": 1, "no": 2, "missing": 1, "children": [
{ "nodeid": 1, "depth": 1, "split": "f7", "split_condition": 1.04448414, "yes": 3, "no": 4, "missing": 3, "children": [
{ "nodeid": 3, "depth": 2, "split": "f196", "split_condition": 25.8707619, "yes": 7, "no": 8, "missing": 7, "children": [
{ "nodeid": 7, "leaf": -0.00222015986 },
{ "nodeid": 8, "leaf": 0.011970697 }
]},
{ "nodeid": 4, "depth": 2, "split": "f195", "split_condition": 48.1782608, "yes": 9, "no": 10, "missing": 9, "children": [
{ "nodeid": 9, "leaf": -0.0198136196 },
{ "nodeid": 10, "leaf": -0.00636780029 }
]}
]},
{ "nodeid": 2, "depth": 1, "split": "f17", "split_condition": 0.645940125, "yes": 5, "no": 6, "missing": 5, "children": [
{ "nodeid": 5, "depth": 2, "split": "f199", "split_condition": 48.9253616, "yes": 11, "no": 12, "missing": 11, "children": [
{ "nodeid": 11, "leaf": -0.035415493 },
{ "nodeid": 12, "leaf": 0.0152317053 }
]},
{ "nodeid": 6, "depth": 2, "split": "f195", "split_condition": 106.237503, "yes": 13, "no": 14, "missing": 13, "children": [
{ "nodeid": 13, "leaf": -0.0514464751 },
{ "nodeid": 14, "leaf": 0.00441155536 }
]}
]}
]}
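Comparing the two dumps, each threshold in the scaled tree is approximately, but not exactly, 100 times the original; the residual is consistent with 32-bit float rounding. A quick check of the thresholds quoted above (values copied from the dumps; plain Python):

```python
# Pairs of (original, scaled-by-100) split thresholds copied from the two dumps above.
pairs = [
    (0.0167161003, 1.67161012),    # f7, root
    (0.0104448413, 1.04448414),    # f7, depth 1
    (0.258707613, 25.8707619),     # f196
    (0.481782585, 48.1782608),     # f195
    (0.00645940099, 0.645940125),  # f17
    (0.48925361, 48.9253616),      # f199
    (1.06237507, 106.237503),      # f195, depth 2
]

# Undo the scaling and compare: the pairs agree only to about 7 significant
# digits, which is the precision of a 32-bit float.
rel_diffs = [abs(scaled / 100.0 - orig) / orig for orig, scaled in pairs]
for (orig, scaled), rel in zip(pairs, rel_diffs):
    print(f"{orig:.11g} vs {scaled / 100.0:.11g}  rel. diff = {rel:.1e}")
```

The relative differences are all below 1e-5 but nonzero, so once the thresholds land between different data points, the greedy split search can pick a different split and the trees diverge from there.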

@trivialfis (Member) commented:

@Denisevi4 We have seen this a few times before; see #4017 for example, and another recent report is #4069. Our conclusion is that such transformations can cause floating-point precision loss.
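The precision loss is easy to reproduce outside of xgboost. Feature values are stored as 32-bit floats, and multiplying by a constant that is not a power of two (such as 100) rounds many of them; a minimal sketch with NumPy, assuming only standard float32 semantics:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(100_000).astype(np.float32)

# Scale by 100 and undo it, entirely in float32 arithmetic.
round_trip = (a * np.float32(100.0)) / np.float32(100.0)

# Because 100 is not a power of two, both the multiply and the divide round,
# and a nonzero fraction of values do not survive the round trip.
mismatches = int(np.count_nonzero(round_trip != a))
print(f"{mismatches} of {a.size} values changed after scaling by 100 and back")
```

By contrast, scaling by a power of two (e.g. 128) is exact in binary floating point, so it should not perturb the split search at all.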

hcho3 closed this as completed on Mar 13, 2019.
The lock bot locked this as resolved and limited conversation to collaborators on Jun 11, 2019.