Do I need to scale/normalize the features (input variables)? #193
Replies: 3 comments
-
Dear @ferrenlove , thanks for opening this discussion. Generally, you don't have to normalize the data before using DoubleML. Many learners (like cv.glmnet() or its mlr3 interface) normalize the data internally. One example is the CV Lasso in Python (scikit-learn) and R (glmnet), where glmnet normalizes by default but scikit-learn doesn't. You can have a look at the learners in the 401(k) example with the Python package <https://docs.doubleml.org/stable/examples/py_double_ml_pension.html> and the R package <https://docs.doubleml.org/stable/examples/R_double_ml_pension.html>. In Python, we had to define a pipeline that implements the normalization for the lasso.
Regarding your second question: I think it's very difficult to give general advice on this. I'd recommend you have a look at the predictive performance of each of these learners, i.e., the first-stage prediction errors for the nuisance components ml_l and ml_m (see the sketches after this comment). Our experience tells us that the choice and the performance of the learners play an important role for the resulting causal estimate. See for example our working paper for more information: https://arxiv.org/abs/2402.04674
Maybe that helps you to better understand which learner works well and which doesn't in your setting. Also, you might consider the hyperparameter choice of your learner. Whereas the CV Lasso adjusts the value for $\lambda$ internally, you'd have to tune the parameters for the gradient boosting learner. We experienced that the default parameters of XGBoost do not seem appropriate in several settings, and tuning changes XGBoost's performance quite a bit.
I hope that helps you. Feel free to give an update on your causal modeling task :)
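[Editor's sketch] As a minimal illustration of the glmnet default mentioned above (the toy data here are made up for the example): cv.glmnet() standardizes the design matrix internally before applying the penalty, so no manual scaling is needed on the R side:
library(glmnet)
# Toy data: 100 observations, 10 features, one deliberately mis-scaled
set.seed(123)
n <- 100; p <- 10
x <- matrix(rnorm(n * p), n, p)
x[, 2] <- 1000 * x[, 2]  # feature on a much larger scale
y <- x[, 1] + rnorm(n)
# standardize = TRUE is already the default in cv.glmnet(),
# so the lasso penalty acts on a common scale across features
fit <- cv.glmnet(x, y, alpha = 1, standardize = TRUE)
fit$lambda.min  # the CV-selected penalty level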
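[Editor's sketch] For checking the first-stage prediction errors of ml_l and ml_m, something along the following lines may help. It assumes a recent DoubleML version where fit() accepts store_predictions, that data_ml is an existing DoubleMLData object, and that the outcome is continuous; the XGBoost hyperparameters are illustrative, not a recommendation:
library(DoubleML)
library(mlr3)
library(mlr3learners)
# XGBoost with non-default hyperparameters (defaults are often not appropriate)
learner_xgb <- lrn("regr.xgboost", nrounds = 300, eta = 0.05, max_depth = 3)
dml_plr <- DoubleMLPLR$new(data_ml, ml_l = learner_xgb, ml_m = learner_xgb)
dml_plr$fit(store_predictions = TRUE)
# Out-of-sample RMSE for the outcome nuisance ml_l (predicting Y from X);
# the treatment nuisance ml_m can be checked analogously
y <- data_ml$data[[data_ml$y_col]]
rmse_ml_l <- sqrt(mean((y - dml_plr$predictions$ml_l)^2))
rmse_ml_l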
-
Hi PhilippBach:
Thank you for your reply! It is super helpful! I did check your training material, but it seems I only have time for the Europe and Asia sessions, not for Pacific time. Will there be any recordings available? Regarding the contents, will you cover something like causal forests? I am interested in that as well.
Secondly, thanks for the detailed explanation. I have a follow-up question. I found that using ml_l = lm, lasso, or random forest gives me relatively low MAE (or RMSE). However, ml_l = xgb usually produces a model that is about 10%-15% larger than the other models. For the second step, with ml_m = xgb, it produces the largest treatment effect coefficient, which can be a few times that from ml_l = lm, lasso, or random forest. What is your opinion?
--
Sincerely,
Xue Liu (Sheryl)
-
Hi @ferrenlove , thanks for following up on this...
Regarding the estimation with XGBoost etc.: I'm not sure if I fully understand. Do you refer to a 10%-15% larger coefficient or a 10%-15% larger RMSE for the predictive tasks? Generally, I'd recommend you double-check the predictive performance, e.g., by repeating the estimation several times (a sketch of one way to do this follows below).
Regarding the trainings: We generally don't record the sessions... Sorry to hear that you won't be able to join the Pacific-time training in June. Hopefully it works out on another occasion. We do not cover causal forests in much detail, but we do have a session on heterogeneous treatment effects and also some materials on the relation of DML for CATEs & GATEs to causal forests.
I hope this helps! Best, Philipp
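[Editor's sketch] One way to repeat the estimation is the n_rep argument of the DoubleML model classes, which repeats the cross-fitting sample splits; a sketch, assuming data_ml and learner_xgb are defined as in the question below:
# Repeat the sample splitting and estimation 5 times and check
# how stable the coefficient and standard error are across splits
dml_plr_rep <- DoubleMLPLR$new(data_ml, ml_l = learner_xgb, ml_m = learner_xgb,
                               n_rep = 5)
dml_plr_rep$fit()
dml_plr_rep$coef
dml_plr_rep$se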
-
I am a data scientist who does some research without A/B tests. I am trying DoubleML in R and have a few questions:
XGBoost
library(DoubleML)
library(mlr3)
library(mlr3learners)
# data_ml is assumed to be a DoubleMLData object built from the data
learner_xgb <- lrn("regr.xgboost", objective = "reg:squarederror")
dml_plr_xgb <- DoubleMLPLR$new(data_ml, ml_l = learner_xgb, ml_m = learner_xgb)
dml_plr_xgb$fit()
GLMNET
# Cross-validated lasso; glmnet standardizes the features internally
learner_glmnet <- lrn("regr.cv_glmnet")
dml_plr_glmnet <- DoubleMLPLR$new(data_ml, ml_l = learner_glmnet, ml_m = learner_glmnet)
dml_plr_glmnet$fit()
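[Editor's sketch] A minimal way to compare the two fits afterwards, assuming both calls above ran through (summary(), coef, and se are available on fitted DoubleML objects):
# Estimated treatment effect, standard error, and significance
dml_plr_xgb$summary()
dml_plr_glmnet$summary()
# Direct comparison of the coefficients
dml_plr_xgb$coef
dml_plr_glmnet$coef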
Hope these questions make sense to you!
Thank you!