key not found #406
@tanjiaxin Sorry about the issue you are having; I believe this has been fixed here: --packages The fix should be in the next v0.15 release.
@imatiach-msft Thanks for your help, I will try it again.
@imatiach-msft
@tanjiaxin Could you try disabling autoscale? It is currently not supported; I wonder if that is causing the errors.
@imatiach-msft I will change the setting and run it again.
@tanjiaxin OK, that is the autoscale setting, so it is disabled. It sounds like you are encountering another issue, then. It is not clear to me from the logs what the issue is, because the connection-refused error is a red herring; there should be another error on one of the workers that is the real exception.
@imatiach-msft I'm going to check all my Spark logs, thanks for your patience.
@imatiach-msft I did find an OOM error on one node; the logs are below:
Hi @tanjiaxin, sorry, this is an issue with LightGBM: the dataset on each partition is replicated in native memory (so the native LightGBM code can run), so at a minimum LightGBM takes about 2x the dataset size to train.
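As a rough back-of-envelope sketch of what that 2x replication implies for sizing (the helper name and the even-partitioning assumption are mine, not from the project):

```python
def min_native_bytes_per_partition(dataset_bytes: int, num_partitions: int) -> int:
    """Rough lower bound on extra native (off-heap) memory LightGBM needs
    per partition, assuming the dataset is evenly partitioned.

    LightGBM copies each partition's data into native memory before
    training, so budget roughly 2x the per-partition data size on top
    of Spark's own memory usage.
    """
    per_partition = -(-dataset_bytes // num_partitions)  # ceiling division
    return 2 * per_partition


# Example: a 40 GiB dataset across 8 partitions needs ~10 GiB of extra
# native memory on each executor hosting one partition.
print(min_native_bytes_per_partition(40 * 1024**3, 8) / 1024**3)
```

This is only a floor, not a guarantee; tree construction adds further overhead on top of the raw data copy.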
@imatiach-msft I will try it, thanks for your help.
@imatiach-msft I have found an incremental training example: https://gist.github.com/goraj/6df8f22a49534e042804a299d81bf2d6
@tanjiaxin Assuming you are using PySpark based on the example above, you can use modelString (from this source in Scala): Otherwise, you can always rescale the cluster to a larger size, which should handle the full dataset.
Hi @imatiach-msft, I have gotten an error: "Model file doesn't specify the number of classes".
@tanjiaxin Can you please send me the model file? Usually I get this error when the file is invalid (e.g. a blank string). You shouldn't have to set num_class.
@imatiach-msft Sorry for the trouble; I think it's my problem: the model was saved as a directory, so of course I can't get num_class. part-00000-02b3b4dd-d082-45e0-8463-55bed1d177e2-c000.zip
@imatiach-msft I still can't fix the problem. I have viewed the model file, and it does have a line saying "num_class=1". I create the instance of LightGBMClassifier with
@tanjiaxin Sorry, I must have confused you: the model string is the actual string contents, not the file path. You would have to read from the file and then pass the string contents to the learner. That is probably why you are getting the error. Also, if you prefer, maybe we can try to resolve this over a Skype call? You can email mmlspark-support@microsoft.com and I can invite you to a meeting.
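A minimal PySpark-side sketch of the fix described above: read the text of the model that Spark saved as part-* files inside a directory, and pass the string contents (not the path) to the learner. The helper names are mine; `modelString` is the parameter mentioned in this thread, and the mmlspark import assumes the package is installed on the cluster.

```python
from pathlib import Path


def read_model_string(model_dir: str) -> str:
    """Return the text of the LightGBM model that Spark saved as part-* files
    inside model_dir."""
    part_file = next(Path(model_dir).glob("part-*"))
    return part_file.read_text()


def make_incremental_learner(model_dir: str):
    # Assumes mmlspark is available on the cluster; import is kept local so
    # read_model_string can be used without it.
    from mmlspark import LightGBMClassifier

    # Pass the model *contents*, not the file path, to continue training
    # from the previously saved model.
    return LightGBMClassifier(modelString=read_model_string(model_dir))
```

The key point is that `modelString` expects the full text of the model file (including lines like `num_class=1`), which is why passing a path produces the "Model file doesn't specify the number of classes" error.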
@imatiach-msft I have solved the problem according to your answer, but I have a doubt: what is the difference between training the model with the whole dataset and incremental training with part of the dataset?
@tanjiaxin I think the post here from the main developer of LightGBM might be relevant; this would apply to any partial dataset. There might be other reasons that accuracy could drop as well.
@imatiach-msft Thanks very much for your help and patience; I have incrementally trained a model on the whole dataset.
Hello, I'm trying to use LightGBM on a standalone-mode Spark cluster.
I have some data on HDFS. LightGBMClassifier works fine when I use part of the data to train the model, but when I use all the data it produces the error below.
error.log
I also tried using the same part of the data to run cross-validation; it sometimes hits the same error as above and sometimes returns a result.
I'm working on:
Spark 2.3.1, Python 3.6.5
adclick-Copy1.zip
Above is the notebook file I used to submit the application.
Could you please help me find out what the problem is?