segfault during predict #4156

Closed
Tracked by #5153
pseudotensor opened this issue Apr 2, 2021 · 15 comments
@pseudotensor

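import pickle

# Load the fitted model, the input data, and the predict() keyword arguments.
with open("lgbsegfault.pkl", "rb") as f:
    model, X, kwargs = pickle.load(f)
model.predict(X, **kwargs)  # segfaults here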

The input data can be anything: predict fails with a single row or with any other input. The only unusual setting is extra_trees=True.

Backtrace:
/home/jon/minicondadai/lib/python3.6/site-packages/lightgbm_gpu/lib_lightgbm.so(_ZNK8LightGBM4GBDT10PredictRawEPKdPdPKNS_27PredictionEarlyStopInstanceE+0x730)[0x7f4d47691880]
/home/jon/minicondadai/lib/python3.6/site-packages/lightgbm_gpu/lib_lightgbm.so(_ZNK8LightGBM4GBDT7PredictEPKdPdPKNS_27PredictionEarlyStopInstanceE+0x15)[0x7f4d47692585]
/home/jon/minicondadai/lib/python3.6/site-packages/lightgbm_gpu/lib_lightgbm.so(_ZNSt17_Function_handlerIFvRKSt6vectorISt4pairIidESaIS2_EEPdEZN8LightGBM9PredictorC4EPNS9_8BoostingEiibbbbidEUlS6_S7_E3_E9_M_invokeERKSt9_Any_dataS6_OS7_+0x1fa)[0x7f4d4798ce6a]
/home/jon/minicondadai/lib/python3.6/site-packages/lightgbm_gpu/lib_lightgbm.so(+0x40c05c)[0x7f4d4798405c]
/home/jon/minicondadai/lib/python3.6/site-packages/numpy/core/../../../.././libgomp.so.1(GOMP_parallel+0x42)[0x7f4e3d716e8c]
/home/jon/minicondadai/lib/python3.6/site-packages/lightgbm_gpu/lib_lightgbm.so(_ZNK8LightGBM7Booster7PredictEiiiiiSt8functionIFSt6vectorISt4pairIidESaIS4_EEiEERKNS_6ConfigEPdPl+0x205)[0x7f4d47990ce5]
/home/jon/minicondadai/lib/python3.6/site-packages/lightgbm_gpu/lib_lightgbm.so(LGBM_BoosterPredictForMat+0xd1)[0x7f4d479800c1]
/home/jon/minicondadai/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c)[0x7f4e40b02630]
/home/jon/minicondadai/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call+0x22d)[0x7f4e40b01fed]
/home/jon/minicondadai/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(_ctypes_callproc+0x2ce)[0x7f4e3f438f9e]
/home/jon/minicondadai/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(+0x139d5)[0x7f4e3f4399d5]

lgbsegfault.pkl.zip
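
The frames in the backtrace above are mangled C++ symbols, which makes them hard to read. As a rough sketch, the snippet below splits one frame into library, symbol and offset, and demangles the symbol with c++filt if that tool happens to be installed. The regular expression and the helper names parse_frame and demangle are assumptions made for illustration; they are not part of LightGBM.

```python
import re
import shutil
import subprocess

# Matches glibc-style backtrace lines: path(symbol+0xOFFSET)[0xADDRESS]
FRAME_RE = re.compile(
    r"^(?P<lib>[^()]+)\((?P<symbol>[^+()]*)\+0x(?P<offset>[0-9a-f]+)\)"
    r"\[0x(?P<addr>[0-9a-f]+)\]$"
)


def parse_frame(line):
    """Return (library, symbol, offset) for one backtrace line, or None."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    return match["lib"], match["symbol"], int(match["offset"], 16)


def demangle(symbol):
    """Demangle with c++filt when it is available; otherwise return the input."""
    tool = shutil.which("c++filt")
    if tool is None or not symbol.startswith("_Z"):
        return symbol
    result = subprocess.run([tool, symbol], capture_output=True, text=True, check=False)
    return result.stdout.strip() or symbol


# First frame of the backtrace above.
frame = (
    "/home/jon/minicondadai/lib/python3.6/site-packages/lightgbm_gpu/lib_lightgbm.so"
    "(_ZNK8LightGBM4GBDT10PredictRawEPKdPdPKNS_27PredictionEarlyStopInstanceE+0x730)"
    "[0x7f4d47691880]"
)
lib, symbol, offset = parse_frame(frame)
print(lib.rsplit("/", 1)[-1], hex(offset), demangle(symbol))
```

Read this way, the innermost frame is GBDT::PredictRaw in the LightGBM namespace, reached through LGBM_BoosterPredictForMat inside an OpenMP parallel region.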

@btrotta
Collaborator

btrotta commented Apr 8, 2021

Is it possible to provide a self-contained reproducible example? I tried to reproduce the problem from your description as follows, but this runs without error.

import numpy as np
import lightgbm as lgb
import pickle

x = np.random.random([10, 2])
y = np.random.choice([0, 1], 10)

lgb_data = lgb.Dataset(x, label=y)
est = lgb.train({'objective': 'binary', 'extra_trees': True}, lgb_data, num_boost_round=5)

with open('pickled_model', 'wb') as f:
    pickle.dump([est, x, {'num_iteration': 3}], f, pickle.HIGHEST_PROTOCOL)

with open('pickled_model', 'rb') as f:
    model, X, kw_args = pickle.load(f)

model.predict(X, **kw_args)

@pseudotensor
Author

Sorry, did you try the script and pickle I provided? That is quite contained.

@btrotta
Collaborator

btrotta commented Apr 9, 2021

I meant could you provide a script that includes the model training and pickle steps (ideally using random data or a built-in dataset from sklearn etc). That would help isolate the source of the problem.

As a workaround, you could try using LightGBM's save_model and load_model (https://lightgbm.readthedocs.io/en/latest/Python-Intro.html#training) rather than pickle.

@pseudotensor
Author

The failing predict call comes at the end of a non-trivial series of steps, so I can't easily share the training code. The problem has nothing to do with pickle itself; lightgbm simply segfaults with this particular model.

@jameslamb jameslamb added the bug label Apr 19, 2021
@StrikerRUS StrikerRUS mentioned this issue Jul 12, 2021
21 tasks
@peixin-lin

peixin-lin commented Jan 13, 2022

I encountered a similar issue. I am deploying a LightGBM model within the Tornado framework.
The model is loaded from a pickle file in the init() function, where it works normally. But when I call the model (stored as an instance variable) from an instance method, predict() or predict_proba() causes a segmentation fault.
I used faulthandler to trace the exact line that caused the segfault. The result is:
python3.6/site-packages/lightgbm/basic.py", line 656 in inner_predict
which is:
preds.ctypes.data_as(ctypes.POINTER(ctypes.c_double))))

Is there a way to solve or work around this issue? Please advise.
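
For anyone wanting a similar trace, here is a small sketch using the standard-library faulthandler module. The wrapper predict_with_trace is a hypothetical helper for illustration; only faulthandler.enable() does the actual work.

```python
import faulthandler

# Dump the Python traceback of every thread to stderr if the process receives
# a fatal signal such as SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL.
faulthandler.enable(all_threads=True)


def predict_with_trace(model, data, **kwargs):
    """Call model.predict(); a crash in native code now leaves a Python traceback."""
    return model.predict(data, **kwargs)
```

The same effect is available without code changes by starting the interpreter with python -X faulthandler or by setting the PYTHONFAULTHANDLER environment variable. The trace only shows where the Python side was when the crash happened; it does not show the native frames.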

@jameslamb
Collaborator

@peixin-lin if you could provide some additional information, we'd be happy to investigate:

  • version of lightgbm you're using
  • text-format version of the model (can be obtained with Booster.save_model())
  • input data that causes this .predict() to segfault
  • exact way you're calling predict() (e.g., are you using predict() or predict_proba()? are you passing additional parameters?)

@peixin-lin

Thanks for the reply. The details are as below:

  • I first encountered the segfault with v3.3.2, so I downgraded to v2.3.1, but the problem persists.
  • I tried both the pickled model saved through the sklearn API (LGBMClassifier) and the .txt model generated by Booster.save_model(). Each was generated and loaded with the same version of lightgbm, and the model format makes no difference.
  • I tried different input shapes, values, dtypes and data structures (list, ndarray and matrix) and got the same segfault every time.
  • I tried both predict() and predict_proba(), with no additional parameters.

@jameslamb
Collaborator

Thanks for that information!

I might not have been clear... I'm asking if you can actually attach here the text-format model file and a sample of input data that causes prediction to segfault.

That way, we could try experimenting with a heavily-instrumented version of LightGBM to try to find the source of the segfault.

@guolinke
Collaborator

guolinke commented Mar 2, 2022

@peixin-lin can you provide a reproducible example?

@github-actions

This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!

@pseudotensor
Author

This shouldn't be closed. I gave an MRE for the fitted state, which should be good enough.

@guolinke
Collaborator

@shiyu1994 can the new CUDA version run this example successfully?

@StrikerRUS StrikerRUS mentioned this issue Apr 15, 2022
60 tasks
@shiyu1994
Collaborator

@guolinke I ran the example provided by @pseudotensor successfully with both the gpu and cuda_exp versions. Both versions produce the same output:

[LightGBM] [Warning] Unknown parameter: silent
[LightGBM] [Warning] Unknown parameter: predict_batching
[LightGBM] [Warning] seed is set=20070863, random_state=42 will be ignored. Current value: seed=20070863
[LightGBM] [Warning] num_threads is set=8, n_jobs=8 will be ignored. Current value: num_threads=8
[402720. 402720. 402720. ... 402720. 402720. 402720.]

@pseudotensor Could you please take a look at this output and check whether it is identical to what you saw in your previous trials?

@jameslamb
Collaborator

Closing this due to lack of response to @shiyu1994's post in May 2022: #4156 (comment)


github-actions bot commented Dec 6, 2023

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 6, 2023