Is there a reason why the output of xgboost would depend on feature scaling?
I don't have any NaNs and all features are numeric, yet if I scale all features by some number, say 100.0, I get slightly different output from xgboost. I tried both the exact and hist tree methods.
I tried xgboost 0.81 and 0.72.
I see that the split conditions are slightly different: they start out similar, but eventually the trees diverge.
For example, here is the output of the first tree:
{ "nodeid": 0, "depth": 0, "split": "f7", "split_condition": 0.0167161003, "yes": 1, "no": 2, "missing": 1, "children": [
{ "nodeid": 1, "depth": 1, "split": "f7", "split_condition": 0.0104448413, "yes": 3, "no": 4, "missing": 3, "children": [
{ "nodeid": 3, "depth": 2, "split": "f196", "split_condition": 0.258707613, "yes": 7, "no": 8, "missing": 7, "children": [
{ "nodeid": 7, "leaf": -0.00222015986 },
{ "nodeid": 8, "leaf": 0.011970697 }
]},
{ "nodeid": 4, "depth": 2, "split": "f195", "split_condition": 0.481782585, "yes": 9, "no": 10, "missing": 9, "children": [
{ "nodeid": 9, "leaf": -0.0198136196 },
{ "nodeid": 10, "leaf": -0.00636780029 }
]}
]},
{ "nodeid": 2, "depth": 1, "split": "f17", "split_condition": 0.00645940099, "yes": 5, "no": 6, "missing": 5, "children": [
{ "nodeid": 5, "depth": 2, "split": "f199", "split_condition": 0.48925361, "yes": 11, "no": 12, "missing": 11, "children": [
{ "nodeid": 11, "leaf": -0.035415493 },
{ "nodeid": 12, "leaf": 0.0152317053 }
]},
{ "nodeid": 6, "depth": 2, "split": "f195", "split_condition": 1.06237507, "yes": 13, "no": 14, "missing": 13, "children": [
{ "nodeid": 13, "leaf": -0.0514464751 },
{ "nodeid": 14, "leaf": 0.00441155536 }
]}
]}
]}
After scaling all features by 100.0, I get this tree:
{ "nodeid": 0, "depth": 0, "split": "f7", "split_condition": 1.67161012, "yes": 1, "no": 2, "missing": 1, "children": [
{ "nodeid": 1, "depth": 1, "split": "f7", "split_condition": 1.04448414, "yes": 3, "no": 4, "missing": 3, "children": [
{ "nodeid": 3, "depth": 2, "split": "f196", "split_condition": 25.8707619, "yes": 7, "no": 8, "missing": 7, "children": [
{ "nodeid": 7, "leaf": -0.00222015986 },
{ "nodeid": 8, "leaf": 0.011970697 }
]},
{ "nodeid": 4, "depth": 2, "split": "f195", "split_condition": 48.1782608, "yes": 9, "no": 10, "missing": 9, "children": [
{ "nodeid": 9, "leaf": -0.0198136196 },
{ "nodeid": 10, "leaf": -0.00636780029 }
]}
]},
{ "nodeid": 2, "depth": 1, "split": "f17", "split_condition": 0.645940125, "yes": 5, "no": 6, "missing": 5, "children": [
{ "nodeid": 5, "depth": 2, "split": "f199", "split_condition": 48.9253616, "yes": 11, "no": 12, "missing": 11, "children": [
{ "nodeid": 11, "leaf": -0.035415493 },
{ "nodeid": 12, "leaf": 0.0152317053 }
]},
{ "nodeid": 6, "depth": 2, "split": "f195", "split_condition": 106.237503, "yes": 13, "no": 14, "missing": 13, "children": [
{ "nodeid": 13, "leaf": -0.0514464751 },
{ "nodeid": 14, "leaf": 0.00441155536 }
]}
]}
]}
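For reference, a minimal script along these lines reproduces the comparison described above; the synthetic data, parameters, and boosting round count are assumptions, not the original setup:

import numpy as np
import xgboost as xgb

# Synthetic stand-in for the data set; any dense numeric matrix will do.
rng = np.random.RandomState(0)
X = rng.rand(1000, 200)
y = rng.rand(1000)

params = {"tree_method": "hist", "max_depth": 2, "eta": 0.1}

# Train once on the raw features and once on the same features scaled by 100.
bst_raw = xgb.train(params, xgb.DMatrix(X, label=y), num_boost_round=50)
bst_scaled = xgb.train(params, xgb.DMatrix(X * 100.0, label=y), num_boost_round=50)

# With exact arithmetic the scaled model's split conditions would be exactly
# 100x the raw ones and the predictions identical; in float32 the thresholds
# can drift by a few ULPs, a different split can win, and from that point on
# the trees (and predictions) may no longer match.
pred_raw = bst_raw.predict(xgb.DMatrix(X))
pred_scaled = bst_scaled.predict(xgb.DMatrix(X * 100.0))
print("max |difference| in predictions:", np.abs(pred_raw - pred_scaled).max())

# The per-tree dumps (as in the JSON above) can be inspected the same way.
print(bst_raw.get_dump(dump_format="json")[0])
print(bst_scaled.get_dump(dump_format="json")[0])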
@Denisevi4 We have seen this a few times before; see #4017 for example. Another recent finding is #4069. Our conclusion is that the transformation can cause precision loss.
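To illustrate the precision-loss point, here is a small sketch (the sample values are arbitrary) showing that multiplying by 100 is not an exact, bit-for-bit invertible transformation of the float32 values XGBoost actually works with:

import numpy as np

rng = np.random.RandomState(0)
x64 = rng.rand(1_000_000)                 # raw features, stored as float64

# XGBoost stores feature values and split thresholds as float32.
a = np.float32(x64) * np.float32(100.0)   # cast to float32 first, then scale
b = np.float32(x64 * 100.0)               # scale in float64 first, then cast
print("elements whose float32 value differs:", int(np.sum(a != b)))

# Scaling is not exactly invertible either: dividing back by 100 does not
# always recover the original float32 value bit for bit.
print("elements not recovered exactly:",
      int(np.sum(np.float32(b / 100.0) != np.float32(x64))))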