
Running fit method on LGBMRegressor kills Jupyter Kernel #4301

Closed
VadimLopatin opened this issue May 19, 2021 · 16 comments · Fixed by #4325
VadimLopatin commented May 19, 2021

Hello team,

I'm facing weird behavior from LGBMRegressor. On some datasets, it causes the Jupyter kernel to die.

I was able to minimize the test case.

The code itself is pretty simple:
import pandas as pd
import numpy as np
import lightgbm
test = pd.read_pickle('weird.pkl')
lightgbm.LGBMRegressor().fit(test.drop(columns=['y']), test['y'])

To reproduce, you will need to run it against the attached dataset:
weird.zip

Versions:
pandas 1.1.5, numpy 1.19.5, lightgbm 3.2.1
I was able to reproduce on both Windows and Ubuntu.

Command(s) you used to install LightGBM
pip install lightgbm

Could you please advise how it would be possible to tackle this?

Thank you!

@jameslamb added the bug label May 19, 2021
@jameslamb (Collaborator)

Thanks for using LightGBM! We will look at this as soon as possible.

Are you able to try this example while monitoring memory usage, maybe with a tool like htop? That would help us to understand whether the kernel is dying because training is requesting more memory than available.
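For anyone who wants a quick in-process check to run alongside htop, here is a hedged stdlib-only sketch (the `resource` module is Unix-only, and the `ru_maxrss` units differ between Linux and macOS); it is just an alternative to external monitoring, not part of LightGBM:

```python
import resource
import sys

def peak_rss_mib():
    """Return this process's peak resident set size in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in KiB on Linux but in bytes on macOS.
    return peak / 1024 if sys.platform != "darwin" else peak / (1024 ** 2)

before = peak_rss_mib()
# ... lightgbm.LGBMRegressor().fit(X, y) would run here ...
after = peak_rss_mib()
print(f"peak RSS: {before:.1f} -> {after:.1f} MiB")
```

If peak RSS stays well below available memory right up to the crash, that points away from an out-of-memory kill.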

@VadimLopatin (Author)

Hi @jameslamb ,

Thank you for the feedback.

In fact, I've checked top/htop and didn't spot anything suspicious. Moreover, when I have a memory issue, the system usually raises a MemoryError without killing the Jupyter kernel. That doesn't seem to be the case this time.

We have a hypothesis that the issue is caused by the large number of duplicated rows in the dataset. There are about 15k duplicates, while the total number of rows is around 16k.

Anyway, I've tried to add a try/except block and catch LightGBMError, but with no luck.
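As a side note on why such a try/except cannot help here: a Python `except` clause only sees exceptions raised back through the Python layer (like LightGBMError), while a fault inside native code, such as glibc's "double free or corruption" abort, kills the whole process first. A small stdlib-only illustration, with `os.abort()` standing in for the native crash:

```python
import subprocess
import sys

# os.abort() simulates a crash inside a C extension: the process dies on
# SIGABRT before the `except` clause can run, so "caught" is never printed.
code = """
try:
    import os
    os.abort()
except Exception:
    print("caught")
"""
proc = subprocess.run([sys.executable, "-c", code],
                      capture_output=True, text=True)
print("exit code:", proc.returncode)  # non-zero: killed by a signal
print("stdout:", repr(proc.stdout))   # empty: the except clause never ran
```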

Best Regards,
Vadim Lopatin

Thank you.

@jameslamb (Collaborator)

OK, I pulled this down today and was able to reproduce the issue. This is all I see in my Jupyter logs, and no output is printed in the notebook itself.

[I 2021-05-20 20:47:03.852 ServerApp] Saving file at /Untitled1.ipynb
double free or corruption (!prev)
[I 2021-05-20 20:47:32.333 ServerApp] AsyncIOLoopKernelRestarter: restarting kernel (1/5), keep random ports
kernel 5cf6c697-99f7-40ca-8dea-1e4e8656f813 restarted
kernel 5cf6c697-99f7-40ca-8dea-1e4e8656f813 restarted
[I 2021-05-20 20:47:32.358 ServerApp] Starting buffering for 5cf6c697-99f7-40ca-8dea-1e4e8656f813:4f8387c6-fa72-4445-b288-b90cb9ba7574
[I 2021-05-20 20:47:32.384 ServerApp] Restoring connection for 5cf6c697-99f7-40ca-8dea-1e4e8656f813:4f8387c6-fa72-4445-b288-b90cb9ba75

I reproduced this with the following code, run in a notebook created from the steps at https://github.com/jameslamb/lightgbm-dask-testing/blob/ededb8491c999fa4d9babfbb6749c8e399da2b76/README.md (based on the daskdev/dask-notebook:latest image).

import lightgbm
import numpy as np
import pandas as pd
import requests
import zipfile
from io import BytesIO

data_url = "https://github.com/microsoft/LightGBM/files/6508547/weird.zip"

zipdata = BytesIO()
zipdata.write(requests.get(data_url, headers={"Accept": "application/octet-stream"}).content)
zip_contents = zipfile.ZipFile(zipdata)
data_file = zip_contents.open('weird.pkl')

test = pd.read_pickle(data_file)

lightgbm.LGBMRegressor().fit(test.drop(columns=['y']), test['y'])

I installed lightgbm 3.2.1 from source with the following commands

git fetch --tags
git checkout v3.2.1
cd python-package
python setup.py install

We will look into this!

Here are a few theories I tested:

"Maybe the error is happening in Dataset construction" --> probably not

I tried constructing an lgb.Dataset object to see if we could simplify this example further (to narrow down the possibilities), but that ran successfully.

ds = lightgbm.Dataset(
    data=test.drop(columns=['y']),
    label=test['y']
)
ds.construct()

"Maybe the error is specific to regression" --> no

I also tried switching the problem to classification, to see if we could narrow the problem down to code paths related to regression specifically. That ended up killing the kernel exactly like the LGBMRegressor example did.

lightgbm.LGBMClassifier(verbose=1).fit(
    test.drop(columns=['y']),
    test['y'] > np.median(test['y'])
)

"Maybe the error is specific to GBDT boosting" --> no

I tried switching to random forest boosting...that ended up killing the kernel exactly like the original example code did.

lightgbm.LGBMRegressor(verbose=1, boosting='rf', bagging_freq=1, bagging_fraction=0.5).fit(
    test.drop(columns=['y']),
    test['y']
)

Tonight or tomorrow, I'll try adding more debugging log statements to the library to see if we can narrow this down.

@jameslamb (Collaborator)

I just tried these examples on latest master (f076ca5) and got the same results.

@jameslamb (Collaborator)

Some observations about the data that might help in debugging.

  1. Most features have very few (<=6) unique values. For features with 2 unique values, at most 1 split will be possible.
test.nunique()

(screenshot of test.nunique() output omitted)

  2. y=0.0 for almost 98% of the observations
np.sum(test["y"] == 0.0) / float(test.shape[0])

(screenshot of output omitted)

@jameslamb mentioned this issue May 20, 2021
@VadimLopatin (Author)

Hi @jameslamb,

Thank you for your thorough analysis.

Some observations from my side:

  1. Adding a random feature leads to no crash
    test['x8'] = np.random.randn(test.shape[0])

  2. Adding a constant feature leads to the same crash
    test['x8'] = 1

  3. Even though the 'x7' feature has a lot of non-unique values, removing it from the dataset leads to no crash
    test = test[['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'y']]

  4. As I've mentioned previously, there are a lot of duplicated entries
    test.duplicated().sum() returns 15259, while the total number of entries is 16269

I've cut the dataset:
test = test.iloc[:8000, :]

There are still a lot of duplicates:
test.duplicated().sum() shows 7359 duplicates out of 8000 entries

However, there is no crash.

This is very strange behavior, but indeed I also strongly suspect a split-related problem.
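Since duplicates seem central here, one hypothetical workaround while this bug is open (a sketch, not something the maintainers have endorsed) is to collapse exact duplicate rows into one row plus a sample weight; the weighted objective stays equivalent while LightGBM sees far fewer records:

```python
import pandas as pd

# Toy stand-in for the real dataset: 5 rows, only 2 distinct (features, y) combos.
df = pd.DataFrame({
    "x1": [1, 1, 1, 2, 2],
    "y":  [0.0, 0.0, 0.0, 1.0, 1.0],
})

# Group identical rows together; the group size becomes the sample weight.
dedup = (
    df.groupby(list(df.columns), as_index=False)
      .size()
      .rename(columns={"size": "weight"})
)
print(dedup)  # 2 rows, with weights 3 and 2

# Training would then look like (hypothetical):
# lightgbm.LGBMRegressor().fit(
#     dedup.drop(columns=["y", "weight"]),
#     dedup["y"],
#     sample_weight=dedup["weight"],
# )
```

Whether this avoids the crash on weird.pkl is untested; it mainly shrinks the data in exactly the dimension the hypothesis points at.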

@VadimLopatin (Author)

Hi @jameslamb ,

I've also noticed that the issue is not specific to Jupyter Notebook.

I put everything in a Weird.py file:

import pandas as pd
import numpy as np
import lightgbm
import sys

test = pd.read_pickle('weird.pkl')
print('before')
lightgbm.LGBMRegressor().fit(test.drop(columns=['y']), test['y'])
print('after')
sys.exit(0)

Then I launched it from the command line:
python Weird.py

And my program ended up with only "before" being printed out.

Thanks!


jameslamb commented May 25, 2021

I've tried a few more things and have narrowed this down a bit further, but I'm still not sure exactly where the problem is.

I created a branch with a LOT more logging on my fork. You can see the logs I've added at https://github.com/microsoft/LightGBM/compare/master...jameslamb:louder-logs?expand=1.

Given this script (notice I've added verbose=1)...

# test.py
import zipfile
from io import BytesIO

import lightgbm
import pandas as pd
import requests

data_url = "https://github.com/microsoft/LightGBM/files/6508547/weird.zip"

zipdata = BytesIO()
zipdata.write(requests.get(data_url, headers={"Accept": "application/octet-stream"}).content)
zip_contents = zipfile.ZipFile(zipdata)
data_file = zip_contents.open("weird.pkl")

test = pd.read_pickle(data_file)

lightgbm.LGBMRegressor(verbose=1).fit(test.drop(columns=["y"]), test["y"])

and LightGBM built from that branch I linked to above, I see several different outcomes running python test.py.

Outcome 1: best split results in a leaf node with 0 records

Sometimes, that script ends with this error.

[LightGBM] [Info] Start training from score -1432.250421
[LightGBM] [Fatal] Check failed: (best_split_info.left_count) > (0) at /Users/jlamb/repos/lightgbm-dask-testing/LightGBM/python-package/compile/src/treelearner/serial_tree_learner.cpp, line 655 .

Traceback (most recent call last):
  File "test.py", line 17, in <module>
    lightgbm.LGBMRegressor(verbose=1).fit(test.drop(columns=["y"]), test["y"])
  File "/Users/jlamb/miniconda3/lib/python3.8/site-packages/lightgbm/sklearn.py", line 842, in fit
    super().fit(X, y, sample_weight=sample_weight, init_score=init_score,
  File "/Users/jlamb/miniconda3/lib/python3.8/site-packages/lightgbm/sklearn.py", line 705, in fit
    self._Booster = train(params, train_set,
  File "/Users/jlamb/miniconda3/lib/python3.8/site-packages/lightgbm/engine.py", line 253, in train
    booster.update(fobj=fobj)
  File "/Users/jlamb/miniconda3/lib/python3.8/site-packages/lightgbm/basic.py", line 2644, in update
    _safe_call(_LIB.LGBM_BoosterUpdateOneIter(
  File "/Users/jlamb/miniconda3/lib/python3.8/site-packages/lightgbm/basic.py", line 110, in _safe_call
    raise LightGBMError(_LIB.LGBM_GetLastError().decode('utf-8'))
lightgbm.basic.LightGBMError: Check failed: (best_split_info.left_count) > (0) at /Users/jlamb/repos/lightgbm-dask-testing/LightGBM/python-package/compile/src/treelearner/serial_tree_learner.cpp, line 655 .

which comes from

CHECK_GT(best_split_info.left_count, 0);

Full log:
LGBMModel.fit() - begin
LGBMModel.fit() - line 574
LGBMModel.fit() - line 576
LGBMModel.fit() - line 607
LGBMModel.fit() - line 610
LGBMModel.fit() - line 631
LGBMModel.fit() - line 638
LGBMModel.fit() - line 648
LGBMModel.fit() - line 658
LGBMModel.fit() - line 661
LGBMModel.fit() - line 704
[LightGBM] [Info] LGBM_DatasetCreateFromMat()
[LightGBM] [Info] LGBM_DatasetCreateFromMats() - begin
[LightGBM] [Info] line 1113
[LightGBM] [Info] line 1115
[LightGBM] [Info] line 1117
[LightGBM] [Info] line 1119
[LightGBM] [Info] line 1125
[LightGBM] [Info] line 1127
[LightGBM] [Info] line 1129
[LightGBM] [Info] line 1133
[LightGBM] [Info] line 1136
[LightGBM] [Info] RowFunctionFromDenseMatric() - begin
[LightGBM] [Info] RowFunctionFromDenseMatric() - float64
[LightGBM] [Info] RowFunctionFromDenseMatric_helper() - begin
[LightGBM] [Info] RowFunctionFromDenseMatric_helper() - is_row_major
[LightGBM] [Info] line 1140
[LightGBM] [Info] line 1143
[LightGBM] [Info] line 1146
[LightGBM] [Info] line 1148
[LightGBM] [Info] line 1150
[LightGBM] [Info] line 1152
[LightGBM] [Info] line 1154
[LightGBM] [Info] line 1156
[LightGBM] [Info] line 1193
[LightGBM] [Info] line 1209
[LightGBM] [Info] line 1211
[LightGBM] [Info] LGBM_DatasetCreateFromMats() - end
[LightGBM] [Info] LGBM_DatasetSetField() - begin
[LightGBM] [Info] LGBM_DatasetSetField() - end
[LightGBM] [Info] LGBM_DatasetGetField() - begin
[LightGBM] [Info] LGBM_DatasetGetField() - end
[LightGBM] [Info] LGBM_DatasetGetNumFeature() - begin
[LightGBM] [Info] LGBM_DatasetGetNumFeature() - end
[LightGBM] [Info] LGBM_DatasetSetFeatureNames() - begin
[LightGBM] [Info] LGBM_DatasetSetFeatureNames() - end
[LightGBM] [Info] LGBM_BoosterCreate() - begin
[LightGBM] [Info] Booster::Booster() - begin
[LightGBM] [Info] Booster::CreateObjectiveAndMetrics() - begin
[LightGBM] [Info] Booster::CreateObjectiveAndMetrics() - end
[LightGBM] [Info] SerialTreeLearner::Init() - begin
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001183 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 309
[LightGBM] [Info] Number of data points in the train set: 16269, number of used features: 7
[LightGBM] [Info] SerialTreeLearner::Init() - end
[LightGBM] [Info] Booster::Booster() - end
[LightGBM] [Info] LGBM_BoosterCreate() - end
[LightGBM] [Info] LGBM_BoosterGetNumClasses() - begin
[LightGBM] [Info] LGBM_BoosterGetNumClasses() - end
[LightGBM] [Info] LGBM_BoosterGetEvalCounts() - begin
[LightGBM] [Info] Booster::GetEvalCounts() - begin
[LightGBM] [Info] Booster::GetEvalCounts() - end
[LightGBM] [Info] LGBM_BoosterGetEvalCounts() - end
[LightGBM] [Info] LGBM_BoosterGetEvalNames() - begin
[LightGBM] [Info] Booster::GetEvalNames() - begin
[LightGBM] [Info] Booster::GetEvalNames() - end
[LightGBM] [Info] LGBM_BoosterGetEvalNames() - end
[LightGBM] [Info] LGBM_BoosterUpdateOneIter() - begin
[LightGBM] [Info] Booster::TrainOneIter()
[LightGBM] [Info] Start training from score -1432.250421
[LightGBM] [Fatal] Check failed: (best_split_info.left_count) > (0) at /Users/jlamb/repos/lightgbm-dask-testing/LightGBM/python-package/compile/src/treelearner/serial_tree_learner.cpp, line 655 .

Traceback (most recent call last):
  File "test.py", line 17, in <module>
    lightgbm.LGBMRegressor(verbose=1).fit(test.drop(columns=["y"]), test["y"])
  File "/Users/jlamb/miniconda3/lib/python3.8/site-packages/lightgbm/sklearn.py", line 842, in fit
    super().fit(X, y, sample_weight=sample_weight, init_score=init_score,
  File "/Users/jlamb/miniconda3/lib/python3.8/site-packages/lightgbm/sklearn.py", line 705, in fit
    self._Booster = train(params, train_set,
  File "/Users/jlamb/miniconda3/lib/python3.8/site-packages/lightgbm/engine.py", line 253, in train
    booster.update(fobj=fobj)
  File "/Users/jlamb/miniconda3/lib/python3.8/site-packages/lightgbm/basic.py", line 2644, in update
    _safe_call(_LIB.LGBM_BoosterUpdateOneIter(
  File "/Users/jlamb/miniconda3/lib/python3.8/site-packages/lightgbm/basic.py", line 110, in _safe_call
    raise LightGBMError(_LIB.LGBM_GetLastError().decode('utf-8'))
lightgbm.basic.LightGBMError: Check failed: (best_split_info.left_count) > (0) at /Users/jlamb/repos/lightgbm-dask-testing/LightGBM/python-package/compile/src/treelearner/serial_tree_learner.cpp, line 655 .

[LightGBM] [Info] LGBM_BoosterFree() - begin
[LightGBM] [Info] LGBM_BoosterFree() - end
[LightGBM] [Info] LGBM_DatasetFree() - begin
[LightGBM] [Info] LGBM_DatasetFree() - end

Outcome 2: training succeeds

Sometimes, training succeeds without error.

Full log:
LGBMModel.fit() - begin
LGBMModel.fit() - line 574
LGBMModel.fit() - line 576
LGBMModel.fit() - line 607
LGBMModel.fit() - line 610
LGBMModel.fit() - line 631
LGBMModel.fit() - line 638
LGBMModel.fit() - line 648
LGBMModel.fit() - line 658
LGBMModel.fit() - line 661
LGBMModel.fit() - line 704
[LightGBM] [Info] LGBM_DatasetCreateFromMat()
[LightGBM] [Info] LGBM_DatasetCreateFromMats() - begin
[LightGBM] [Info] line 1113
[LightGBM] [Info] line 1115
[LightGBM] [Info] line 1117
[LightGBM] [Info] line 1119
[LightGBM] [Info] line 1125
[LightGBM] [Info] line 1127
[LightGBM] [Info] line 1129
[LightGBM] [Info] line 1133
[LightGBM] [Info] line 1136
[LightGBM] [Info] RowFunctionFromDenseMatric() - begin
[LightGBM] [Info] RowFunctionFromDenseMatric() - float64
[LightGBM] [Info] RowFunctionFromDenseMatric_helper() - begin
[LightGBM] [Info] RowFunctionFromDenseMatric_helper() - is_row_major
[LightGBM] [Info] line 1140
[LightGBM] [Info] line 1143
[LightGBM] [Info] line 1146
[LightGBM] [Info] line 1148
[LightGBM] [Info] line 1150
[LightGBM] [Info] line 1152
[LightGBM] [Info] line 1154
[LightGBM] [Info] line 1156
[LightGBM] [Info] line 1193
[LightGBM] [Info] line 1209
[LightGBM] [Info] line 1211
[LightGBM] [Info] LGBM_DatasetCreateFromMats() - end
[LightGBM] [Info] LGBM_DatasetSetField() - begin
[LightGBM] [Info] LGBM_DatasetSetField() - end
[LightGBM] [Info] LGBM_DatasetGetField() - begin
[LightGBM] [Info] LGBM_DatasetGetField() - end
[LightGBM] [Info] LGBM_DatasetGetNumFeature() - begin
[LightGBM] [Info] LGBM_DatasetGetNumFeature() - end
[LightGBM] [Info] LGBM_DatasetSetFeatureNames() - begin
[LightGBM] [Info] LGBM_DatasetSetFeatureNames() - end
[LightGBM] [Info] LGBM_BoosterCreate() - begin
[LightGBM] [Info] Booster::Booster() - begin
[LightGBM] [Info] Booster::CreateObjectiveAndMetrics() - begin
[LightGBM] [Info] Booster::CreateObjectiveAndMetrics() - end
[LightGBM] [Info] SerialTreeLearner::Init() - begin
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001381 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 309
[LightGBM] [Info] Number of data points in the train set: 16269, number of used features: 7
[LightGBM] [Info] SerialTreeLearner::Init() - end
[LightGBM] [Info] Booster::Booster() - end
[LightGBM] [Info] LGBM_BoosterCreate() - end
[LightGBM] [Info] LGBM_BoosterGetNumClasses() - begin
[LightGBM] [Info] LGBM_BoosterGetNumClasses() - end
[LightGBM] [Info] LGBM_BoosterGetEvalCounts() - begin
[LightGBM] [Info] Booster::GetEvalCounts() - begin
[LightGBM] [Info] Booster::GetEvalCounts() - end
[LightGBM] [Info] LGBM_BoosterGetEvalCounts() - end
[LightGBM] [Info] LGBM_BoosterGetEvalNames() - begin
[LightGBM] [Info] Booster::GetEvalNames() - begin
[LightGBM] [Info] Booster::GetEvalNames() - end
[LightGBM] [Info] LGBM_BoosterGetEvalNames() - end
[LightGBM] [Info] LGBM_BoosterUpdateOneIter() - begin
[LightGBM] [Info] Booster::TrainOneIter()
[LightGBM] [Info] Start training from score -1432.250421
[LightGBM] [Info] LGBM_BoosterUpdateOneIter() - end
[LightGBM] [Info] LGBM_BoosterUpdateOneIter() - begin
[LightGBM] [Info] Booster::TrainOneIter()
[LightGBM] [Info] LGBM_BoosterUpdateOneIter() - end
[... the LGBM_BoosterUpdateOneIter() - begin / Booster::TrainOneIter() / LGBM_BoosterUpdateOneIter() - end triplet repeats identically for each of the remaining boosting iterations ...]
[LightGBM] [Info] LGBM_BoosterSaveModelToString() - begin
[LightGBM] [Info] Booster::SaveModelToString()
[LightGBM] [Info] LGBM_BoosterSaveModelToString() - end
[LightGBM] [Info] LGBM_BoosterFree() - begin
[LightGBM] [Info] LGBM_BoosterFree() - end
[LightGBM] [Info] LGBM_BoosterLoadModelFromString() - begin
[LightGBM] [Info] Booster::LoadModelFromString() - begin
[LightGBM] [Info] Booster::LoadModelFromString() - end
[LightGBM] [Info] LGBM_BoosterLoadModelFromString() - end
[LightGBM] [Info] LGBM_BoosterGetNumClasses() - begin
[LightGBM] [Info] LGBM_BoosterGetNumClasses() - end
LGBMModel.fit() - line 711
LGBMModel.fit() - line 724
LGBMModel.fit() - line 726
[LightGBM] [Info] LGBM_DatasetFree() - begin
[LightGBM] [Info] LGBM_DatasetFree() - end
[LightGBM] [Info] LGBM_BoosterFree() - begin
[LightGBM] [Info] LGBM_BoosterFree() - end

Outcome 3: Unable to initialize a SerialTreeLearner

[LightGBM] [Info] SerialTreeLearner::Init() - begin
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.008382 seconds.
You can set force_col_wise=true to remove the overhead.
python(3997,0x10c7825c0) malloc: Incorrect checksum for freed object 0x7ff3d9e22600: probably modified after being freed.
Corrupt value: 0x0
python(3997,0x10c7825c0) malloc: *** set a breakpoint in malloc_error_break to debug
Abort trap: 6

This comes from

void SerialTreeLearner::Init(const Dataset* train_data, bool is_constant_hessian) {
  train_data_ = train_data;
  num_data_ = train_data_->num_data();
  num_features_ = train_data_->num_features();
  int max_cache_size = 0;
  // Get the max size of pool
  if (config_->histogram_pool_size <= 0) {
    max_cache_size = config_->num_leaves;
  } else {
    size_t total_histogram_size = 0;
    for (int i = 0; i < train_data_->num_features(); ++i) {
      total_histogram_size += kHistEntrySize * train_data_->FeatureNumBin(i);
    }
    max_cache_size = static_cast<int>(config_->histogram_pool_size * 1024 * 1024 / total_histogram_size);
  }
  // at least need 2 leaves
  max_cache_size = std::max(2, max_cache_size);
  max_cache_size = std::min(max_cache_size, config_->num_leaves);
  // push split information for all leaves
  best_split_per_leaf_.resize(config_->num_leaves);
  constraints_.reset(LeafConstraintsBase::Create(config_, config_->num_leaves, train_data_->num_features()));
  // initialize splits for leaf
  smaller_leaf_splits_.reset(new LeafSplits(train_data_->num_data(), config_));
  larger_leaf_splits_.reset(new LeafSplits(train_data_->num_data(), config_));
  // initialize data partition
  data_partition_.reset(new DataPartition(num_data_, config_->num_leaves));
  col_sampler_.SetTrainingData(train_data_);
  // initialize ordered gradients and hessians
  ordered_gradients_.resize(num_data_);
  ordered_hessians_.resize(num_data_);
  GetShareStates(train_data_, is_constant_hessian, true);
  histogram_pool_.DynamicChangeSize(train_data_,
                                    share_state_->num_hist_total_bin(),
                                    share_state_->feature_hist_offsets(),
                                    config_, max_cache_size, config_->num_leaves);
  Log::Info("Number of data points in the train set: %d, number of used features: %d", num_data_, num_features_);
  if (CostEfficientGradientBoosting::IsEnable(config_)) {
    cegb_.reset(new CostEfficientGradientBoosting(this));
    cegb_->Init();
  }
}

full log
LGBMModel.fit() - begin
LGBMModel.fit() - line 574
LGBMModel.fit() - line 576
LGBMModel.fit() - line 607
LGBMModel.fit() - line 610
LGBMModel.fit() - line 631
LGBMModel.fit() - line 638
LGBMModel.fit() - line 648
LGBMModel.fit() - line 658
LGBMModel.fit() - line 661
LGBMModel.fit() - line 704
[LightGBM] [Info] LGBM_DatasetCreateFromMat()
[LightGBM] [Info] LGBM_DatasetCreateFromMats() - begin
[LightGBM] [Info] line 1113
[LightGBM] [Info] line 1115
[LightGBM] [Info] line 1117
[LightGBM] [Info] line 1119
[LightGBM] [Info] line 1125
[LightGBM] [Info] line 1127
[LightGBM] [Info] line 1129
[LightGBM] [Info] line 1133
[LightGBM] [Info] line 1136
[LightGBM] [Info] RowFunctionFromDenseMatric() - begin
[LightGBM] [Info] RowFunctionFromDenseMatric() - float64
[LightGBM] [Info] RowFunctionFromDenseMatric_helper() - begin
[LightGBM] [Info] RowFunctionFromDenseMatric_helper() - is_row_major
[LightGBM] [Info] line 1140
[LightGBM] [Info] line 1143
[LightGBM] [Info] line 1146
[LightGBM] [Info] line 1148
[LightGBM] [Info] line 1150
[LightGBM] [Info] line 1152
[LightGBM] [Info] line 1154
[LightGBM] [Info] line 1156
[LightGBM] [Info] line 1193
[LightGBM] [Info] line 1209
[LightGBM] [Info] line 1211
[LightGBM] [Info] LGBM_DatasetCreateFromMats() - end
[LightGBM] [Info] LGBM_DatasetSetField() - begin
[LightGBM] [Info] LGBM_DatasetSetField() - end
[LightGBM] [Info] LGBM_DatasetGetField() - begin
[LightGBM] [Info] LGBM_DatasetGetField() - end
[LightGBM] [Info] LGBM_DatasetGetNumFeature() - begin
[LightGBM] [Info] LGBM_DatasetGetNumFeature() - end
[LightGBM] [Info] LGBM_DatasetSetFeatureNames() - begin
[LightGBM] [Info] LGBM_DatasetSetFeatureNames() - end
[LightGBM] [Info] LGBM_BoosterCreate() - begin
[LightGBM] [Info] Booster::Booster() - begin
[LightGBM] [Info] Booster::CreateObjectiveAndMetrics() - begin
[LightGBM] [Info] Booster::CreateObjectiveAndMetrics() - end
[LightGBM] [Info] SerialTreeLearner::Init() - begin
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.008382 seconds.
You can set `force_col_wise=true` to remove the overhead.
python(3997,0x10c7825c0) malloc: Incorrect checksum for freed object 0x7ff3d9e22600: probably modified after being freed.
Corrupt value: 0x0
python(3997,0x10c7825c0) malloc: *** set a breakpoint in malloc_error_break to debug
Abort trap: 6

@jameslamb
Copy link
Collaborator

Ok I added some additional fine-grained logs to SerialTreeLearner::Init(), and found that when the example for this issue fails in that function, it happens here:

GetShareStates(train_data_, is_constant_hessian, true);

Tomorrow (if no one else gets to it sooner), I can try adding more logs to narrow it down further.

@jameslamb
Copy link
Collaborator

Ok, I have some more specific logs. Sometimes training fails here:

LightGBM/src/io/dataset.cpp

Lines 644 to 645 in da3465c

row_wise_state->SetMultiValBin(GetMultiBinFromAllFeatures(row_wise_offsets), num_data_,
                               feature_groups_, false, false);

full logs
LGBMModel.fit() - begin
LGBMModel.fit() - line 574
LGBMModel.fit() - line 576
LGBMModel.fit() - line 607
LGBMModel.fit() - line 610
LGBMModel.fit() - line 631
LGBMModel.fit() - line 638
LGBMModel.fit() - line 648
LGBMModel.fit() - line 658
LGBMModel.fit() - line 661
LGBMModel.fit() - line 704
[LightGBM] [Info] LGBM_DatasetCreateFromMat()
[LightGBM] [Info] LGBM_DatasetCreateFromMats() - begin
[LightGBM] [Info] line 1113
[LightGBM] [Info] line 1115
[LightGBM] [Info] line 1117
[LightGBM] [Info] line 1119
[LightGBM] [Info] line 1125
[LightGBM] [Info] line 1127
[LightGBM] [Info] line 1129
[LightGBM] [Info] line 1133
[LightGBM] [Info] line 1136
[LightGBM] [Info] RowFunctionFromDenseMatric() - begin
[LightGBM] [Info] RowFunctionFromDenseMatric() - float64
[LightGBM] [Info] RowFunctionFromDenseMatric_helper() - begin
[LightGBM] [Info] RowFunctionFromDenseMatric_helper() - is_row_major
[LightGBM] [Info] line 1140
[LightGBM] [Info] line 1143
[LightGBM] [Info] line 1146
[LightGBM] [Info] line 1148
[LightGBM] [Info] line 1150
[LightGBM] [Info] line 1152
[LightGBM] [Info] line 1154
[LightGBM] [Info] line 1156
[LightGBM] [Info] line 1193
[LightGBM] [Info] line 1209
[LightGBM] [Info] line 1211
[LightGBM] [Info] LGBM_DatasetCreateFromMats() - end
[LightGBM] [Info] LGBM_DatasetSetField() - begin
[LightGBM] [Info] LGBM_DatasetSetField() - end
[LightGBM] [Info] LGBM_DatasetGetField() - begin
[LightGBM] [Info] LGBM_DatasetGetField() - end
[LightGBM] [Info] LGBM_DatasetGetNumFeature() - begin
[LightGBM] [Info] LGBM_DatasetGetNumFeature() - end
[LightGBM] [Info] LGBM_DatasetSetFeatureNames() - begin
[LightGBM] [Info] LGBM_DatasetSetFeatureNames() - end
[LightGBM] [Info] LGBM_BoosterCreate() - begin
[LightGBM] [Info] Booster::Booster() - begin
[LightGBM] [Info] Booster::CreateObjectiveAndMetrics() - begin
[LightGBM] [Info] Booster::CreateObjectiveAndMetrics() - end
[LightGBM] [Info] SerialTreeLearner::Init() - begin
[LightGBM] [Info] SerialTreeLearner::Init() - line 31
[LightGBM] [Info] SerialTreeLearner::Init() - line 33
[LightGBM] [Info] SerialTreeLearner::Init() - line 35
[LightGBM] [Info] SerialTreeLearner::Init() - line 39
[LightGBM] [Info] SerialTreeLearner::Init() - line 41
[LightGBM] [Info] SerialTreeLearner::Init() - line 55
[LightGBM] [Info] SerialTreeLearner::Init() - line 59
[LightGBM] [Info] SerialTreeLearner::Init() - line 60
[LightGBM] [Info] SerialTreeLearner::Init() - line 64
[LightGBM] [Info] SerialTreeLearner::Init() - line 66
[LightGBM] [Info] SerialTreeLearner::Init() - line 68
[LightGBM] [Info] SerialTreeLearner::Init() - line 72
[LightGBM] [Info] SerialTreeLearner::Init() - line 74
[LightGBM] [Info] SerialTreeLearner::Init() - line 77
[LightGBM] [Info] SerialTreeLearner::Init() - line 79
[LightGBM] [Info] SerialTreeLearner::GetShareStates() - line 99
[LightGBM] [Info] Dataset::GetShareStates() - line 593
[LightGBM] [Info] Dataset::GetShareStates() - line 609
[LightGBM] [Info] Dataset::GetShareStates() - line 644
[LightGBM] [Info] Dataset::GetShareStates() - line 646
[LightGBM] [Info] Dataset::GetShareStates() - line 648
[LightGBM] [Info] Dataset::GetShareStates() - line 650
[LightGBM] [Info] Dataset::GetShareStates() - line 652
[LightGBM] [Info] Dataset::GetShareStates() - line 654
[LightGBM] [Info] Dataset::GetShareStates() - line 656
[LightGBM] [Info] Dataset::GetShareStates() - line 659
[LightGBM] [Info] Dataset::GetShareStates() - line 660
[LightGBM] [Info] Dataset::GetShareStates() - line 663
[LightGBM] [Info] Dataset::GetShareStates() - line 665
[LightGBM] [Info] Dataset::GetShareStates() - line 668
[LightGBM] [Info] Dataset::GetShareStates() - line 670
[LightGBM] [Info] Dataset::GetShareStates() - line 673
[LightGBM] [Info] Dataset::GetShareStates() - line 675
[LightGBM] [Info] Dataset::GetShareStates() - line 677
Segmentation fault: 11

And sometimes it fails at

return col_wise_state.release();

[LightGBM] [Info] Dataset::GetShareStates() - line 714
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.002272 seconds.
You can set force_col_wise=true to remove the overhead.
python(5126,0x10a6875c0) malloc: Incorrect checksum for freed object 0x7f84d523da00: probably modified after being freed.
Corrupt value: 0x0
python(5126,0x10a6875c0) malloc: *** set a breakpoint in malloc_error_break to debug
Abort trap: 6

full logs
LGBMModel.fit() - begin
LGBMModel.fit() - line 574
LGBMModel.fit() - line 576
LGBMModel.fit() - line 607
LGBMModel.fit() - line 610
LGBMModel.fit() - line 631
LGBMModel.fit() - line 638
LGBMModel.fit() - line 648
LGBMModel.fit() - line 658
LGBMModel.fit() - line 661
LGBMModel.fit() - line 704
[LightGBM] [Info] LGBM_DatasetCreateFromMat()
[LightGBM] [Info] LGBM_DatasetCreateFromMats() - begin
[LightGBM] [Info] line 1113
[LightGBM] [Info] line 1115
[LightGBM] [Info] line 1117
[LightGBM] [Info] line 1119
[LightGBM] [Info] line 1125
[LightGBM] [Info] line 1127
[LightGBM] [Info] line 1129
[LightGBM] [Info] line 1133
[LightGBM] [Info] line 1136
[LightGBM] [Info] RowFunctionFromDenseMatric() - begin
[LightGBM] [Info] RowFunctionFromDenseMatric() - float64
[LightGBM] [Info] RowFunctionFromDenseMatric_helper() - begin
[LightGBM] [Info] RowFunctionFromDenseMatric_helper() - is_row_major
[LightGBM] [Info] line 1140
[LightGBM] [Info] line 1143
[LightGBM] [Info] line 1146
[LightGBM] [Info] line 1148
[LightGBM] [Info] line 1150
[LightGBM] [Info] line 1152
[LightGBM] [Info] line 1154
[LightGBM] [Info] line 1156
[LightGBM] [Info] line 1193
[LightGBM] [Info] line 1209
[LightGBM] [Info] line 1211
[LightGBM] [Info] LGBM_DatasetCreateFromMats() - end
[LightGBM] [Info] LGBM_DatasetSetField() - begin
[LightGBM] [Info] LGBM_DatasetSetField() - end
[LightGBM] [Info] LGBM_DatasetGetField() - begin
[LightGBM] [Info] LGBM_DatasetGetField() - end
[LightGBM] [Info] LGBM_DatasetGetNumFeature() - begin
[LightGBM] [Info] LGBM_DatasetGetNumFeature() - end
[LightGBM] [Info] LGBM_DatasetSetFeatureNames() - begin
[LightGBM] [Info] LGBM_DatasetSetFeatureNames() - end
[LightGBM] [Info] LGBM_BoosterCreate() - begin
[LightGBM] [Info] Booster::Booster() - begin
[LightGBM] [Info] Booster::CreateObjectiveAndMetrics() - begin
[LightGBM] [Info] Booster::CreateObjectiveAndMetrics() - end
[LightGBM] [Info] SerialTreeLearner::Init() - begin
[LightGBM] [Info] SerialTreeLearner::Init() - line 31
[LightGBM] [Info] SerialTreeLearner::Init() - line 33
[LightGBM] [Info] SerialTreeLearner::Init() - line 35
[LightGBM] [Info] SerialTreeLearner::Init() - line 39
[LightGBM] [Info] SerialTreeLearner::Init() - line 41
[LightGBM] [Info] SerialTreeLearner::Init() - line 55
[LightGBM] [Info] SerialTreeLearner::Init() - line 59
[LightGBM] [Info] SerialTreeLearner::Init() - line 60
[LightGBM] [Info] SerialTreeLearner::Init() - line 64
[LightGBM] [Info] SerialTreeLearner::Init() - line 66
[LightGBM] [Info] SerialTreeLearner::Init() - line 68
[LightGBM] [Info] SerialTreeLearner::Init() - line 72
[LightGBM] [Info] SerialTreeLearner::Init() - line 74
[LightGBM] [Info] SerialTreeLearner::Init() - line 77
[LightGBM] [Info] SerialTreeLearner::Init() - line 79
[LightGBM] [Info] SerialTreeLearner::GetShareStates() - line 99
[LightGBM] [Info] Dataset::GetShareStates() - line 593
[LightGBM] [Info] Dataset::GetShareStates() - line 609
[LightGBM] [Info] Dataset::GetShareStates() - line 644
[LightGBM] [Info] Dataset::GetShareStates() - line 646
[LightGBM] [Info] Dataset::GetShareStates() - line 648
[LightGBM] [Info] Dataset::GetShareStates() - line 650
[LightGBM] [Info] Dataset::GetShareStates() - line 652
[LightGBM] [Info] Dataset::GetShareStates() - line 654
[LightGBM] [Info] Dataset::GetShareStates() - line 656
[LightGBM] [Info] Dataset::GetShareStates() - line 659
[LightGBM] [Info] Dataset::GetShareStates() - line 660
[LightGBM] [Info] Dataset::GetShareStates() - line 663
[LightGBM] [Info] Dataset::GetShareStates() - line 665
[LightGBM] [Info] Dataset::GetShareStates() - line 668
[LightGBM] [Info] Dataset::GetShareStates() - line 670
[LightGBM] [Info] Dataset::GetShareStates() - line 673
[LightGBM] [Info] Dataset::GetShareStates() - line 675
[LightGBM] [Info] Dataset::GetShareStates() - line 677
[LightGBM] [Info] Dataset::GetShareStates() - line 680
[LightGBM] [Info] Dataset::GetShareStates() - line 682
[LightGBM] [Info] Dataset::GetShareStates() - line 686
[LightGBM] [Info] Dataset::GetShareStates() - line 689
[LightGBM] [Info] Dataset::GetShareStates() - line 703
[LightGBM] [Info] Dataset::GetShareStates() - line 707
[LightGBM] [Info] Dataset::GetShareStates() - line 710
[LightGBM] [Info] Dataset::GetShareStates() - line 714
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.002272 seconds.
You can set `force_col_wise=true` to remove the overhead.
python(5126,0x10a6875c0) malloc: Incorrect checksum for freed object 0x7f84d523da00: probably modified after being freed.
Corrupt value: 0x0
python(5126,0x10a6875c0) malloc: *** set a breakpoint in malloc_error_break to debug
Abort trap: 6

@shiyu1994 could you look at the discussion in this issue and let me know if you have any ideas for things to test? I'm not sure what to test next.

@shiyu1994
Copy link
Collaborator

@jameslamb The error happens in SerialTreeLearner::GetShareStates, which has been the source of several previous bugs. The error occurs starting from 3.0.0. I'll look into this further to fix it.

@shiyu1994
Copy link
Collaborator

shiyu1994 commented May 26, 2021

Actually, though the problem occurs in SerialTreeLearner::GetShareStates, I found the root cause is in BinMapper::FindBin. Since the dataset contains feature values that are extremely close to each other (e.g., -0.00112051336219505895 vs. -0.00112051336219505852), when calculating the upper bounds for the bins in

LightGBM/src/io/bin.cpp

Lines 146 to 152 in 346f883

      auto val = Common::GetDoubleUpperBound((upper_bounds[i] + lower_bounds[i + 1]) / 2.0);
      if (bin_upper_bound.empty() || !Common::CheckDoubleEqualOrdered(bin_upper_bound.back(), val)) {
        bin_upper_bound.push_back(val);
      }
    }
    // last bin upper bound
    bin_upper_bound.push_back(std::numeric_limits<double>::infinity());

we intend to find something strictly between the two numbers upper_bounds[i]=-0.00112051336219505895 and lower_bounds[i+1]=-0.00112051336219505852, so that the two numbers are divided into two different bins. Note that both upper_bounds[i] and lower_bounds[i+1] here come from the distinct feature values of a feature.

But unfortunately, due to the discrete nature of floating point numbers, after calling Common::GetDoubleUpperBound we actually get val=-0.00112051336219505852, which is exactly the (slightly) greater feature value. Thus it is possible to get an empty bin here, because both distinct values now fall at or below val. (Originally, we wanted one below val and the other above it.)
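This can be reproduced in a few lines of Python (a minimal sketch with illustrative values; `math.nextafter` stands in for Common::GetDoubleUpperBound):

```python
import math

# Illustrative stand-ins for two adjacent representable doubles, like the two
# extremely close feature values in this dataset.
a = 1.0
b = math.nextafter(a, math.inf)  # the very next double above a

# No double exists strictly between a and b, so their midpoint rounds to one of them.
mid = (a + b) / 2.0
print(mid == a)  # True: the midpoint rounds (ties-to-even) back down to a

# Rounding the midpoint up to the next double (a stand-in for
# Common::GetDoubleUpperBound) lands exactly on b, the greater feature value,
# so both a and b fall at or below the bin boundary and the next bin is empty.
val = math.nextafter(mid, math.inf)
print(val == b)  # True
```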

Now consider the logic that calculates the number of sampled data points in each bin.

LightGBM/src/io/bin.cpp

Lines 411 to 418 in 346f883

    cnt_in_bin.resize(num_bin_, 0);
    int i_bin = 0;
    for (int i = 0; i < num_distinct_values; ++i) {
      if (distinct_values[i] > bin_upper_bound_[i_bin]) {
        ++i_bin;
      }
      cnt_in_bin[i_bin] += counts[i];
    }

The code above assumes that if distinct_values[i] > bin_upper_bound_[i_bin], then we must have distinct_values[i] <= bin_upper_bound_[i_bin + 1]; in other words, that incrementing i_bin by 1 always reaches the correct bin for distinct_values[i]. But, as mentioned above, some bins can be empty, so this assumption is wrong and we may get a wrong count in cnt_in_bin.
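The wrong count can be sketched in Python (illustrative values and names, not LightGBM source; the fixed variant replaces the if with a while so that empty bins are skipped):

```python
import math

a = 1.0
b = math.nextafter(a, math.inf)  # two extremely close feature values
distinct_values = [a, b, 2.0]
counts = [5, 4, 3]

# Bin 0 ends exactly at b (the midpoint that rounded up to the greater value),
# so bin 1, the range (b, 1.5], contains no distinct value at all: it is empty.
bin_upper_bound = [b, 1.5, math.inf]

def count_with_if(values, value_counts, bounds):
    """The buggy logic: i_bin is advanced at most once per value."""
    cnt_in_bin = [0] * len(bounds)
    i_bin = 0
    for v, c in zip(values, value_counts):
        if v > bounds[i_bin]:
            i_bin += 1
        cnt_in_bin[i_bin] += c
    return cnt_in_bin

def count_with_while(values, value_counts, bounds):
    """The fixed logic: keep advancing i_bin until the right bin is found."""
    cnt_in_bin = [0] * len(bounds)
    i_bin = 0
    for v, c in zip(values, value_counts):
        while v > bounds[i_bin] and i_bin < len(bounds) - 1:
            i_bin += 1
        cnt_in_bin[i_bin] += c
    return cnt_in_bin

print(count_with_if(distinct_values, counts, bin_upper_bound))     # [9, 3, 0]: 2.0 lands in the empty bin
print(count_with_while(distinct_values, counts, bin_upper_bound))  # [9, 0, 3]
```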

So far the problem does not seem very serious, because cnt_in_bin is only used for calculating the feature sparse rate and determining most_freq_bin_.

LightGBM/src/io/bin.cpp

Lines 506 to 507 in 346f883

    most_freq_bin_ =
        static_cast<uint32_t>(ArrayArgs<int>::ArgMax(cnt_in_bin));

But if we wrongly calculate most_freq_bin_, even if it is off by only 1, trouble may arise, because the sparse rate is used to estimate the total number of feature bins other than most_freq_bin_ (estimate_total_entries) in the whole training data,

LightGBM/src/io/bin.cpp

Lines 696 to 735 in 346f883

MultiValBin* MultiValBin::CreateMultiValSparseBin(data_size_t num_data,
                                                  int num_bin,
                                                  double estimate_element_per_row) {
  size_t estimate_total_entries =
      static_cast<size_t>(estimate_element_per_row * 1.1 * num_data);
  if (estimate_total_entries <= std::numeric_limits<uint16_t>::max()) {
    if (num_bin <= 256) {
      return new MultiValSparseBin<uint16_t, uint8_t>(
          num_data, num_bin, estimate_element_per_row);
    } else if (num_bin <= 65536) {
      return new MultiValSparseBin<uint16_t, uint16_t>(
          num_data, num_bin, estimate_element_per_row);
    } else {
      return new MultiValSparseBin<uint16_t, uint32_t>(
          num_data, num_bin, estimate_element_per_row);
    }
  } else if (estimate_total_entries <= std::numeric_limits<uint32_t>::max()) {
    if (num_bin <= 256) {
      return new MultiValSparseBin<uint32_t, uint8_t>(
          num_data, num_bin, estimate_element_per_row);
    } else if (num_bin <= 65536) {
      return new MultiValSparseBin<uint32_t, uint16_t>(
          num_data, num_bin, estimate_element_per_row);
    } else {
      return new MultiValSparseBin<uint32_t, uint32_t>(
          num_data, num_bin, estimate_element_per_row);
    }
  } else {
    if (num_bin <= 256) {
      return new MultiValSparseBin<size_t, uint8_t>(
          num_data, num_bin, estimate_element_per_row);
    } else if (num_bin <= 65536) {
      return new MultiValSparseBin<size_t, uint16_t>(
          num_data, num_bin, estimate_element_per_row);
    } else {
      return new MultiValSparseBin<size_t, uint32_t>(
          num_data, num_bin, estimate_element_per_row);
    }
  }
}

which in turn decides the integer type (uint16_t, uint32_t, or uint64_t) used in the row pointer of the sparse data representation of MultiValSparseBin. (Note that in the sparse representation, we treat most_freq_bin_ as missing to minimize the memory cost.)

In the example above, estimate_total_entries is 58554, so uint16_t would in fact be enough. But since we wrongly decided the most_freq_bin_ of some feature, the actual number of feature bins other than most_freq_bin_ is larger and exceeds the range of uint16_t. This finally results in a memory error.
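The type selection above can be sketched like this (hypothetical helper name, not a LightGBM API):

```python
def row_ptr_type(estimate_total_entries: int) -> str:
    # Pick the narrowest integer type whose range covers the estimated
    # number of entries, mirroring the branching in CreateMultiValSparseBin.
    if estimate_total_entries <= 0xFFFF:        # std::numeric_limits<uint16_t>::max()
        return "uint16_t"
    elif estimate_total_entries <= 0xFFFFFFFF:  # std::numeric_limits<uint32_t>::max()
        return "uint32_t"
    return "size_t"

# With the estimate of 58554 from this issue, uint16_t is chosen; when the
# true entry count then exceeds 65535, the row pointers overflow.
print(row_ptr_type(58554))          # uint16_t
print(row_ptr_type(100_000))        # uint32_t
print(row_ptr_type(5_000_000_000))  # size_t
```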

This is a very rare case: it only happens when there are two extremely close feature values, and those two values happen to be near the most frequent bin. But it is still worth addressing for the robustness of LightGBM. Thanks @VadimLopatin for reporting this!

Because BinMapper::FindBin is a very fundamental method in dataset construction and involves many boundary conditions, I suggest a minimal modification to fix this: replace

if (distinct_values[i] > bin_upper_bound_[i_bin]) {

with

while (distinct_values[i] > bin_upper_bound_[i_bin] && i_bin < num_bin_ - 1) {

Sorry for the long description, but I think it is valuable to make the problem clear for future reference.

@jameslamb
Copy link
Collaborator

@VadimLopatin thank you again for the bug report and reproducible example. This is now fixed on this project's master branch, and will be included in the next release.

@VadimLopatin
Copy link
Author

Hello @jameslamb , @shiyu1994 ,

Many thanks for your analysis and quick fix!

@dhirajpatra
Copy link

Visual Studio Code (1.79.2, undefined, desktop)
Jupyter Extension Version: 2023.5.1101742258.
Python Extension Version: 2023.10.1.
Platform: darwin (arm64).
Workspace folder /Desktop/python/jupyter_notebooks, Home = /Users/Admin
07:58:19.463 [info] Start refreshing Interpreter Kernel Picker (1687660099463)
07:58:19.540 [info] Using Pylance
07:58:20.915 [warn] Failed to get activated env vars for /opt/homebrew/bin/python3 in 733ms
07:58:20.918 [info] Process Execution: /opt/homebrew/bin/python3 -c "import site;print("USER_BASE_VALUE");print(site.USER_BASE);print("USER_BASE_VALUE");"
07:58:21.808 [info] Process Execution: /opt/homebrew/bin/python3 -m pip list
07:58:23.649 [info] Starting Kernel startUsingPythonInterpreter, .jvsc74a57bd0c0f14c7cf8eecca91c0ab3dfcbec784b05523d454463c7f1685f826f3896c3e8./opt/homebrew/Caskroom/miniconda/base/envs/py38/python./opt/homebrew/Caskroom/miniconda/base/envs/py38/python.-m#ipykernel_launcher (Python Path: /opt/homebrew/Caskroom/miniconda/base/envs/py38/bin/python, Conda, py38, 3.8.16) for '
/Desktop/python/jupyter_notebooks/DataScienceProjects/XGB vs LightGBM vs CatBoost/CatBoost Vs XGBoost Vs LightGBM - Part 1.ipynb' (disableUI=true)
07:58:24.615 [info] End refreshing Interpreter Kernel Picker (1687660099463)
07:58:24.736 [info] Process Execution: /opt/homebrew/Caskroom/miniconda/base/envs/py38/bin/python -m pip list
07:58:24.737 [info] Process Execution: /opt/homebrew/Caskroom/miniconda/base/envs/py38/bin/python -c "import ipykernel; print(ipykernel.__version__); print("5dc3a68c-e34e-4080-9c3e-2a532b2ccb4d"); print(ipykernel.__file__)"
07:58:24.738 [info] Process Execution: /opt/homebrew/Caskroom/miniconda/base/envs/py38/bin/python -m ipykernel_launcher --ip=127.0.0.1 --stdin=9003 --control=9001 --hb=9000 --Session.signature_scheme="hmac-sha256" --Session.key=b"84a59208-2529-411a-a140-42d2e8c559f1" --shell=9002 --transport="tcp" --iopub=9004 --f=~/Library/Jupyter/runtime/kernel-v2-16206XF4bhRPEBs3o.json
> cwd: ~/Desktop/python/jupyter_notebooks/DataScienceProjects/XGB vs LightGBM vs CatBoost
07:58:24.938 [info] ipykernel version & path 6.19.2, /opt/homebrew/Caskroom/miniconda/base/envs/py38/lib/python3.8/site-packages/ipykernel/__init__.py for /opt/homebrew/Caskroom/miniconda/base/envs/py38/bin/python
07:58:25.576 [warn] StdErr from Kernel Process Unable to load extension: pydevd_plugins.extensions.types.pydevd_plugin_pandas_types
07:58:25.815 [info] Started Kernel py38 (Python 3.8.16) (pid: 16359)
07:58:25.815 [info] Started new session 75416e13-d823-47f8-b83b-b75bba4500dd
07:58:25.918 [info] Process Execution: /opt/homebrew/Caskroom/miniconda/base/envs/py38/bin/python /.vscode/extensions/ms-toolsai.jupyter-2023.5.1101742258-darwin-arm64/pythonFiles/printJupyterDataDir.py
07:58:25.956 [warn] Got a non-existent Jupyer Data Dir file://
/.local/share/jupyter
07:58:26.609 [info] Handle Execution of Cells 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,16 for ~/Desktop/python/jupyter_notebooks/DataScienceProjects/XGB vs LightGBM vs CatBoost/CatBoost Vs XGBoost Vs LightGBM - Part 1.ipynb
07:58:26.634 [info] Kernel acknowledged execution of cell 0 @ 1687660106618
07:58:26.648 [info] End cell 0 execution @ 1687660106632, started @ 1687660106618, elapsed time = 0.014s
07:58:26.656 [info] Kernel acknowledged execution of cell 1 @ 1687660106654
07:58:29.235 [info] End cell 1 execution @ 1687660109233, started @ 1687660106654, elapsed time = 2.579s
07:58:29.239 [info] Kernel acknowledged execution of cell 2 @ 1687660109236
07:58:29.241 [info] End cell 2 execution @ 1687660109240, started @ 1687660109236, elapsed time = 0.004s
07:58:29.244 [info] Kernel acknowledged execution of cell 3 @ 1687660109243
07:58:29.254 [info] End cell 3 execution @ 1687660109252, started @ 1687660109243, elapsed time = 0.009s
07:58:29.259 [info] Kernel acknowledged execution of cell 4 @ 1687660109255
07:58:29.262 [info] End cell 4 execution @ 1687660109260, started @ 1687660109255, elapsed time = 0.005s
07:58:29.265 [info] Kernel acknowledged execution of cell 5 @ 1687660109263
07:58:29.270 [info] End cell 5 execution @ 1687660109269, started @ 1687660109263, elapsed time = 0.006s
07:58:29.276 [info] Kernel acknowledged execution of cell 6 @ 1687660109271
07:58:29.280 [info] End cell 6 execution @ 1687660109278, started @ 1687660109271, elapsed time = 0.007s
07:58:29.283 [info] Kernel acknowledged execution of cell 7 @ 1687660109280
07:58:29.285 [info] End cell 7 execution @ 1687660109283, started @ 1687660109280, elapsed time = 0.003s
07:58:29.289 [info] Kernel acknowledged execution of cell 8 @ 1687660109287
07:58:29.328 [info] End cell 8 execution @ 1687660109326, started @ 1687660109287, elapsed time = 0.039s
07:58:29.333 [info] Kernel acknowledged execution of cell 9 @ 1687660109329
07:58:29.340 [info] End cell 9 execution @ 1687660109338, started @ 1687660109329, elapsed time = 0.009s
07:58:29.345 [info] Kernel acknowledged execution of cell 10 @ 1687660109342
07:58:29.477 [info] End cell 10 execution @ 1687660109476, started @ 1687660109342, elapsed time = 0.134s
07:58:29.481 [info] Kernel acknowledged execution of cell 11 @ 1687660109478
07:58:29.483 [info] End cell 11 execution @ 1687660109482, started @ 1687660109478, elapsed time = 0.004s
07:58:29.491 [info] Kernel acknowledged execution of cell 12 @ 1687660109484
07:58:29.493 [info] End cell 12 execution @ 1687660109491, started @ 1687660109484, elapsed time = 0.007s
07:58:29.501 [info] Kernel acknowledged execution of cell 13 @ 1687660109495
07:58:29.750 [info] End cell 13 execution @ 1687660109748, started @ 1687660109495, elapsed time = 0.253s
07:58:29.757 [info] Kernel acknowledged execution of cell 14 @ 1687660109751
07:58:29.872 [error] Disposing session as kernel process died ExitCode: undefined, Reason: Unable to load extension: pydevd_plugins.extensions.types.pydevd_plugin_pandas_types

07:58:29.872 [info] Dispose Kernel process 16359.
07:58:29.872 [error] Raw kernel process exited code: undefined
07:58:29.873 [error] Error in waiting for cell to complete [Error: Canceled future for execute_request message before replies were done
at t.KernelShellFutureHandler.dispose (/.vscode/extensions/ms-toolsai.jupyter-2023.5.1101742258-darwin-arm64/out/extension.node.js:2:32375)
at /.vscode/extensions/ms-toolsai.jupyter-2023.5.1101742258-darwin-arm64/out/extension.node.js:2:51427
at Map.forEach ()
at y._clearKernelState (
/.vscode/extensions/ms-toolsai.jupyter-2023.5.1101742258-darwin-arm64/out/extension.node.js:2:51412)
at y.dispose (
/.vscode/extensions/ms-toolsai.jupyter-2023.5.1101742258-darwin-arm64/out/extension.node.js:2:44894)
at /.vscode/extensions/ms-toolsai.jupyter-2023.5.1101742258-darwin-arm64/out/extension.node.js:24:112498
at ne (
/.vscode/extensions/ms-toolsai.jupyter-2023.5.1101742258-darwin-arm64/out/extension.node.js:2:1586779)
at cy.dispose (/.vscode/extensions/ms-toolsai.jupyter-2023.5.1101742258-darwin-arm64/out/extension.node.js:24:112474)
at uy.dispose (
/.vscode/extensions/ms-toolsai.jupyter-2023.5.1101742258-darwin-arm64/out/extension.node.js:24:119757)
at process.processTicksAndRejections (node:internal/process/task_queues:96:5)]
07:58:29.874 [warn] Cell completed with errors {
message: 'Canceled future for execute_request message before replies were done'
}
07:58:29.875 [info] End cell 14 execution @ 1687660109875, started @ 1687660109751, elapsed time = 0.124s
07:58:29.875 [warn] Cancel all remaining cells due to cancellation or failure in execution
07:58:29.875 [info] End cell 16 execution @ undefined, started @ undefined, elapsed time = 0s
07:58:29.906 [info] End cell 14 execution @ undefined, started @ undefined, elapsed time = 0s
07:58:56.661 [info] Restart requested /Desktop/python/jupyter_notebooks/DataScienceProjects/XGB vs LightGBM vs CatBoost/CatBoost Vs XGBoost Vs LightGBM - Part 1.ipynb
07:58:56.666 [info] Process Execution: /opt/homebrew/Caskroom/miniconda/base/envs/py38/bin/python -c "import ipykernel; print(ipykernel.__version__); print("5dc3a68c-e34e-4080-9c3e-2a532b2ccb4d"); print(ipykernel.__file__)"
07:58:56.678 [info] Process Execution: /opt/homebrew/Caskroom/miniconda/base/envs/py38/bin/python -m ipykernel_launcher --ip=127.0.0.1 --stdin=9003 --control=9001 --hb=9000 --Session.signature_scheme="hmac-sha256" --Session.key=b"807c9de1-d73d-47c0-bc58-1694c815d704" --shell=9002 --transport="tcp" --iopub=9004 --f=
/Library/Jupyter/runtime/kernel-v2-16206ACRT1Fm4USne.json
> cwd: ~/Desktop/python/jupyter_notebooks/DataScienceProjects/XGB vs LightGBM vs CatBoost
07:58:56.906 [info] ipykernel version & path 6.19.2, /opt/homebrew/Caskroom/miniconda/base/envs/py38/lib/python3.8/site-packages/ipykernel/__init__.py for /opt/homebrew/Caskroom/miniconda/base/envs/py38/bin/python
07:58:57.489 [warn] StdErr from Kernel Process Unable to load extension: pydevd_plugins.extensions.types.pydevd_plugin_pandas_types
07:58:57.702 [info] Started Kernel py38 (Python 3.8.16) (pid: 16419)
07:58:57.702 [info] Started new session 2e1230fd-b62f-40f5-ac81-6119db59d473
07:58:57.702 [info] Shutdown old session undefined
07:58:59.342 [info] Handle Execution of Cells 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,16 for ~/Desktop/python/jupyter_notebooks/DataScienceProjects/XGB vs LightGBM vs CatBoost/CatBoost Vs XGBoost Vs LightGBM - Part 1.ipynb
07:58:59.350 [info] Kernel acknowledged execution of cell 0 @ 1687660139349
07:58:59.352 [info] End cell 0 execution @ 1687660139351, started @ 1687660139349, elapsed time = 0.002s
07:58:59.355 [info] Kernel acknowledged execution of cell 1 @ 1687660139353
07:59:01.063 [info] End cell 1 execution @ 1687660141061, started @ 1687660139353, elapsed time = 1.708s
07:59:01.067 [info] Kernel acknowledged execution of cell 2 @ 1687660141064
07:59:01.069 [info] End cell 2 execution @ 1687660141068, started @ 1687660141064, elapsed time = 0.004s
07:59:01.073 [info] Kernel acknowledged execution of cell 3 @ 1687660141070
07:59:01.080 [info] End cell 3 execution @ 1687660141079, started @ 1687660141070, elapsed time = 0.009s
07:59:01.085 [info] Kernel acknowledged execution of cell 4 @ 1687660141081
07:59:01.087 [info] End cell 4 execution @ 1687660141086, started @ 1687660141081, elapsed time = 0.005s
07:59:01.090 [info] Kernel acknowledged execution of cell 5 @ 1687660141088
07:59:01.094 [info] End cell 5 execution @ 1687660141093, started @ 1687660141088, elapsed time = 0.005s
07:59:01.101 [info] Kernel acknowledged execution of cell 6 @ 1687660141095
07:59:01.103 [info] End cell 6 execution @ 1687660141102, started @ 1687660141095, elapsed time = 0.007s
07:59:01.105 [info] Kernel acknowledged execution of cell 7 @ 1687660141104
07:59:01.108 [info] End cell 7 execution @ 1687660141107, started @ 1687660141104, elapsed time = 0.003s
07:59:01.111 [info] Kernel acknowledged execution of cell 8 @ 1687660141109
07:59:01.145 [info] End cell 8 execution @ 1687660141143, started @ 1687660141109, elapsed time = 0.034s
07:59:01.149 [info] Kernel acknowledged execution of cell 9 @ 1687660141146
07:59:01.154 [info] End cell 9 execution @ 1687660141153, started @ 1687660141146, elapsed time = 0.007s
07:59:01.157 [info] Kernel acknowledged execution of cell 10 @ 1687660141155
07:59:01.291 [info] End cell 10 execution @ 1687660141290, started @ 1687660141155, elapsed time = 0.135s
07:59:01.295 [info] Kernel acknowledged execution of cell 11 @ 1687660141292
07:59:01.297 [info] End cell 11 execution @ 1687660141295, started @ 1687660141292, elapsed time = 0.003s
07:59:01.303 [info] Kernel acknowledged execution of cell 12 @ 1687660141297
07:59:01.305 [info] End cell 12 execution @ 1687660141303, started @ 1687660141297, elapsed time = 0.006s
07:59:01.306 [info] Kernel acknowledged execution of cell 13 @ 1687660141306
07:59:01.403 [info] End cell 13 execution @ 1687660141402, started @ 1687660141306, elapsed time = 0.096s
07:59:01.408 [info] Kernel acknowledged execution of cell 14 @ 1687660141404
07:59:01.409 [info] End cell 14 execution @ 1687660141408, started @ 1687660141404, elapsed time = 0.004s
07:59:01.410 [info] End cell 16 execution @ undefined, started @ undefined, elapsed time = 0s
07:59:16.946 [info] Handle Execution of Cells 14 for ~/Desktop/python/jupyter_notebooks/DataScienceProjects/XGB vs LightGBM vs CatBoost/CatBoost Vs XGBoost Vs LightGBM - Part 1.ipynb
07:59:16.952 [info] Kernel acknowledged execution of cell 14 @ 1687660156948
07:59:16.955 [info] End cell 14 execution @ 1687660156954, started @ 1687660156948, elapsed time = 0.006s
07:59:26.549 [info] Handle Execution of Cells 14 for ~/Desktop/python/jupyter_notebooks/DataScienceProjects/XGB vs LightGBM vs CatBoost/CatBoost Vs XGBoost Vs LightGBM - Part 1.ipynb
07:59:26.558 [info] Kernel acknowledged execution of cell 14 @ 1687660166551
07:59:26.560 [info] End cell 14 execution @ 1687660166558, started @ 1687660166551, elapsed time = 0.007s
07:59:34.139 [info] Handle Execution of Cells 14 for ~/Desktop/python/jupyter_notebooks/DataScienceProjects/XGB vs LightGBM vs CatBoost/CatBoost Vs XGBoost Vs LightGBM - Part 1.ipynb
07:59:34.147 [info] Kernel acknowledged execution of cell 14 @ 1687660174142
07:59:34.255 [error] Disposing session as kernel process died ExitCode: undefined, Reason: Unable to load extension: pydevd_plugins.extensions.types.pydevd_plugin_pandas_types

07:59:34.256 [info] Dispose Kernel process 16419.
07:59:34.256 [error] Raw kernel process exited code: undefined
07:59:34.257 [error] Error in waiting for cell to complete [Error: Canceled future for execute_request message before replies were done
at t.KernelShellFutureHandler.dispose (/.vscode/extensions/ms-toolsai.jupyter-2023.5.1101742258-darwin-arm64/out/extension.node.js:2:32375)
at /.vscode/extensions/ms-toolsai.jupyter-2023.5.1101742258-darwin-arm64/out/extension.node.js:2:51427
at Map.forEach ()
at y._clearKernelState (
/.vscode/extensions/ms-toolsai.jupyter-2023.5.1101742258-darwin-arm64/out/extension.node.js:2:51412)
at y.dispose (
/.vscode/extensions/ms-toolsai.jupyter-2023.5.1101742258-darwin-arm64/out/extension.node.js:2:44894)
at /.vscode/extensions/ms-toolsai.jupyter-2023.5.1101742258-darwin-arm64/out/extension.node.js:24:112498
at ne (
/.vscode/extensions/ms-toolsai.jupyter-2023.5.1101742258-darwin-arm64/out/extension.node.js:2:1586779)
at cy.dispose (/.vscode/extensions/ms-toolsai.jupyter-2023.5.1101742258-darwin-arm64/out/extension.node.js:24:112474)
at uy.dispose (
/.vscode/extensions/ms-toolsai.jupyter-2023.5.1101742258-darwin-arm64/out/extension.node.js:24:119757)
at process.processTicksAndRejections (node:internal/process/task_queues:96:5)]
07:59:34.257 [warn] Cell completed with errors {
message: 'Canceled future for execute_request message before replies were done'
}
07:59:34.257 [info] End cell 14 execution @ 1687660174257, started @ 1687660174142, elapsed time = 0.115s
07:59:34.257 [warn] Cancel all remaining cells due to cancellation or failure in execution
07:59:34.287 [info] End cell 14 execution @ undefined, started @ undefined, elapsed time = 0s

@jameslamb
Collaborator

@dhirajpatra There is nothing we can do with just a dump of a ton of logs like that. I'm not even sure how you drew the conclusion that your issue is the same as this one.

I'm going to lock this issue.

If you'd like some help, please open a new issue at https://github.com/microsoft/LightGBM/issues, including a minimal, reproducible example with enough information for someone to help you. That would include:

  • version of LightGBM you're running and how you installed it
  • operating system
  • the smallest possible self-contained code which causes the issue you're seeing
  • anything else you've tried to resolve it, which might help us to narrow down the root cause
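The environment details in the checklist above can be gathered with a short snippet like the following (a sketch only, not an official LightGBM utility; the guarded import keeps it runnable even where `lightgbm` is not installed):

```python
import platform
import sys

# Collect the environment details a bug report should include.
info = {
    "os": platform.platform(),
    "python": sys.version.split()[0],
}

# Guard the lightgbm import so the snippet still runs where the
# package is absent; report the version when it is available.
try:
    import lightgbm
    info["lightgbm"] = lightgbm.__version__
except ImportError:
    info["lightgbm"] = "not installed"

for key, value in info.items():
    print(f"{key}: {value}")
```

Pasting this output into the issue, together with the smallest failing script, usually gives maintainers enough to start reproducing the problem.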

@microsoft microsoft locked as resolved and limited conversation to collaborators Jun 25, 2023