Training on GPU fails (OSError: exception: access violation) #1717
Comments
Exactly the same problem here with the GPU version built with MinGW (Windows 10). The build goes fine and the CLI also works on the GPU with the test examples, but the Python wrapper raises exactly the same error (OSError: exception: access violation writing 0xFFFFFFFF95A80000) on Booster init. This command was used to install the Python wrapper:
Got a similar error on my Win10 machine too; it works okay on the GPU with the test examples. Windows 10, Traceback (most recent call last):
ping @huanzhang12
@funkindy @marcualin7412 Could you check whether GPU Caps Viewer works on your system? For debugging this kind of issue I suggest using the CLI version of LightGBM instead of Python. Could you please run LightGBM using the CLI (command line interface) and get a full output log? That would really help me investigate this issue.
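For anyone collecting the requested CLI log, here is a minimal sketch of one way to capture the full output to a file from Python. The executable name, config file name, and log file name are hypothetical placeholders, not paths from this thread; it only assumes the CLI's usual `key=value` argument style.

```python
import subprocess
from pathlib import Path


def build_cli_command(exe, config, overrides=None):
    """Return the argument list for a LightGBM CLI run.

    The CLI takes key=value arguments; `overrides` are appended after
    the config file argument.
    """
    cmd = [str(exe), f"config={config}"]
    for key, value in (overrides or {}).items():
        cmd.append(f"{key}={value}")
    return cmd


def run_and_log(cmd, log_path, cwd=None):
    """Run cmd, write combined stdout/stderr to log_path, return the exit code."""
    result = subprocess.run(
        cmd,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep errors in the same log, in order
        text=True,
    )
    Path(log_path).write_text(result.stdout)
    return result.returncode


# Example usage (hypothetical paths, adjust to your own build directory):
# cmd = build_cli_command("lightgbm.exe", "train.conf", {"device": "gpu"})
# print("exit code:", run_and_log(cmd, "lightgbm_gpu.log"))
```

A non-zero exit code with a log that simply stops is itself useful information, since it shows the last step reached before the crash.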
@huanzhang12
@huanzhang12 Attached is the output of this command: The OpenCL page of GPU Caps Viewer is okay, like @marcualin7412's. The log looks good, so the issue may be specific to the Python interface.
ping @huanzhang12
gentle ping @huanzhang12
Any information on this issue yet?
@Mtale
Any solution to this issue? I compiled LightGBM (using VS 2017) on my Windows 10 machine with a 1060 6GB GPU. It runs well on the CPU but always fails with the error message OSError: exception: access violation reading 0x0000000000000020 when using the GPU. I checked all the discussion regarding this issue but have found no useful information so far. Any ideas?
I have a similar problem:
When I use it with the basic env, it works well.
It seems the problem mainly happens on Windows, and one comment says that disabling the Intel GPU can help.
You can try this solution. Gentle ping @huanzhang12 for a better workaround. We have a new CUDA implementation (#3160), which does not depend on OpenCL, and it should fix this.
I have a similar OSError, reading 0x0000000000000038.
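A less invasive alternative to disabling the Intel GPU may be to tell LightGBM explicitly which OpenCL platform and device to use, via the `gpu_platform_id` and `gpu_device_id` parameters. The sketch below only builds the parameter dictionary; the helper name `gpu_params` is made up for illustration, and which IDs map to the NVIDIA card is machine-specific (GPU Caps Viewer or a tool like clinfo lists them), so the values shown are assumptions.

```python
def gpu_params(platform_id=None, device_id=None, **extra):
    """Build LightGBM parameters for GPU training.

    platform_id / device_id select the OpenCL platform and device.
    Leaving them as None omits the keys, so LightGBM uses its defaults.
    """
    params = {"device": "gpu"}
    if platform_id is not None:
        params["gpu_platform_id"] = platform_id
    if device_id is not None:
        params["gpu_device_id"] = device_id
    params.update(extra)  # any other training parameters
    return params


# Example: try each platform in turn to find the one backed by the NVIDIA card.
# The IDs 0 and 1 are assumptions; your machine may enumerate differently.
candidates = [gpu_params(platform_id=p, device_id=0, objective="binary") for p in (0, 1)]
```

Each dictionary can be passed as `params` to `lightgbm.train` or unpacked into the sklearn estimators. If training succeeds on one platform ID and crashes on the other, that points at the OpenCL platform selection rather than the Python wrapper.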
I have been trying to run LightGBM GPU for some time without success. The software works well on CPU.
I've compiled LightGBM using MinGW following the instructions here, and using MSVC as instructed here. I used Visual Studio 2017 to compile.
Whichever way I compile, when I try to train a model in Jupyter with Python I get the same error message:
OSError: exception: access violation reading 0x0000000000000020
More details on the error are below. The referenced error is from the sklearn API, but the error stays the same if I use the lightgbm.cv API.
When I tried to run the CLI example from the MinGW compilation instructions, the program failed silently. I have the MSVC build installed right now and can't reproduce it, but if you refer to the image in the instructions, the silent failure occurs after the line Total bins 6143.
output of CLI example
I've run TensorFlow on the GPU earlier, so the GPU does work. However, GPU Caps Viewer fails silently on startup. This is probably related, but I wasn't able to find anything on that problem online.
I've tried suggestions in the following issues:
#836
#1028
Environment info
Operating System: Windows 10 Home
CPU Model: i7 7700
GPU model: GeForce GTX 1060 6GB
CUDA: 9.0.176.2
OpenCL: 1.2
C++/Python/R version: Python 3.6
Error message
in model(features, test_features, encoding, n_folds)
125 eval_set = [(valid_features, valid_labels), (train_features, train_labels)],
126 eval_names = ['valid', 'train'], categorical_feature = cat_indices,
--> 127 early_stopping_rounds = 100, verbose = 200)
128
129 # Record the best iteration
C:\Anaconda3\lib\site-packages\lightgbm\sklearn.py in fit(self, X, y, sample_weight, init_score, eval_set, eval_names, eval_sample_weight, eval_class_weight, eval_init_score, eval_metric, early_stopping_rounds, verbose, feature_name, categorical_feature, callbacks)
697 verbose=verbose, feature_name=feature_name,
698 categorical_feature=categorical_feature,
--> 699 callbacks=callbacks)
700 return self
701
C:\Anaconda3\lib\site-packages\lightgbm\sklearn.py in fit(self, X, y, sample_weight, init_score, group, eval_set, eval_names, eval_sample_weight, eval_class_weight, eval_init_score, eval_group, eval_metric, early_stopping_rounds, verbose, feature_name, categorical_feature, callbacks)
500 verbose_eval=verbose, feature_name=feature_name,
501 categorical_feature=categorical_feature,
--> 502 callbacks=callbacks)
503
504 if evals_result:
C:\Anaconda3\lib\site-packages\lightgbm\engine.py in train(params, train_set, num_boost_round, valid_sets, valid_names, fobj, feval, init_model, feature_name, categorical_feature, early_stopping_rounds, evals_result, verbose_eval, learning_rates, keep_training_booster, callbacks)
188 # construct booster
189 try:
--> 190 booster = Booster(params=params, train_set=train_set)
191 if is_valid_contain_train:
192 booster.set_train_data_name(train_data_name)
C:\Anaconda3\lib\site-packages\lightgbm\basic.py in __init__(self, params, train_set, model_file, silent)
1474 train_set.construct().handle,
1475 c_str(params_str),
-> 1476 ctypes.byref(self.handle)))
1477 # save reference to data
1478 self.train_set = train_set
OSError: exception: access violation reading 0x0000000000000020