Inverted validation curve #51

Open
aabk-bkaa opened this issue Aug 25, 2020 · 1 comment

Comments


aabk-bkaa commented Aug 25, 2020

After fitting our model it appears that our validation curve is inverted:

[screenshot: validation curve with the validation RMSE plotted below the training RMSE]

The validation RMSE is systematically lower than the training RMSE, which does not make intuitive sense to us.

The results were produced with the following code:

```python
import numpy as np
import pandas as pd
from tqdm import tqdm
from sklearn.model_selection import train_test_split, KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error as mse

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=1)

lambdas = np.logspace(0, 8, 12)

folds = KFold(n_splits=5)
MSE_list = []

for _lambda in tqdm(lambdas):
    pipe_preproc = make_pipeline(PolynomialFeatures(2),
                                 StandardScaler(),
                                 Lasso(alpha=_lambda, max_iter=1000))
    MSE_train = []
    MSE_list_intermediate = []

    for train_index, val_index in folds.split(X_train, y_train):
        X_tr, y_tr = X_train.iloc[train_index], y_train.iloc[train_index]
        X_val, y_val = X_train.iloc[val_index], y_train.iloc[val_index]

        pipe_preproc.fit(X_tr, y_tr)
        # validation RMSE on the held-out fold
        MSE_list_intermediate.append(mse(y_val, pipe_preproc.predict(X_val)) ** 0.5)
        # "training" RMSE -- note this is evaluated on the full X_train,
        # not only on the fold's training part X_tr
        MSE_train.append(mse(y_train, pipe_preproc.predict(X_train)) ** 0.5)

    MSE_list.append([_lambda] + MSE_list_intermediate
                    + [np.mean(MSE_list_intermediate)] + [np.mean(MSE_train)])

MSE = pd.DataFrame(MSE_list)
MSE.columns = ["Lambda", "Fold 1", "Fold 2", "Fold 3", "Fold 4", "Fold 5",
               "Mean_RMSE", "Mean_RMSE_Evaluation"]

MSE.to_excel("LASSO_output.xlsx")
```

Can anybody help us?

Kind regards Anton and Søren

jsr-p (Collaborator) commented Aug 25, 2020

Hi @aabk-bkaa,
Assuming you did not mislabel the curves when plotting, there are several legitimate reasons why the RMSE can be lower on the validation data than on the training data. See:
https://stats.stackexchange.com/questions/187335/validation-error-less-than-training-error
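As a cross-check against the hand-rolled loop, scikit-learn's built-in `validation_curve` computes per-fold training and validation scores in one call, which rules out bookkeeping mistakes (e.g. evaluating the training RMSE on the wrong subset). A minimal sketch, using synthetic data from `make_regression` since the original `X`, `y` are not shown in the issue:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, validation_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the issue's data (the real X, y are not shown).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=1)

pipe = make_pipeline(PolynomialFeatures(2),
                     StandardScaler(),
                     Lasso(max_iter=10_000))

lambdas = np.logspace(0, 8, 12)

# validation_curve fits the pipeline once per (alpha, fold) pair and
# scores it on both the training part and the held-out fold.
train_scores, val_scores = validation_curve(
    pipe, X, y,
    param_name="lasso__alpha", param_range=lambdas,
    cv=KFold(n_splits=5), scoring="neg_root_mean_squared_error",
)

# Scores are negative RMSE; flip the sign before plotting.
train_rmse = -train_scores.mean(axis=1)
val_rmse = -val_scores.mean(axis=1)
```

Plotting `train_rmse` and `val_rmse` against `lambdas` (log scale) from this output should make it clear whether the inversion comes from the data or from the manual cross-validation loop.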
