mmlspark-LightGBM gcc version problem #539

Closed
mrchor opened this issue Apr 15, 2019 · 7 comments

@mrchor

mrchor commented Apr 15, 2019

I have a question: mmlspark-LightGBM fails with a GCC version problem when running on Spark. The current build appears to require GCC 5.8. Could you support a lower version, such as GCC 4.8?

@mrchor
Author

mrchor commented Apr 16, 2019

java.lang.UnsatisfiedLinkError: /data7/yarn1/local/usercache/mart_mobile/appcache/application_1533628320510_19885747/container_e31_1533628320510_19885747_01_000013/tmp/mml-natives7074460575321003512/lib_lightgbm.so: /lib64/libm.so.6: version `GLIBC_2.23' not found (required by /data7/yarn1/local/usercache/mart_mobile/appcache/application_1533628320510_19885747/container_e31_1533628320510_19885747_01_000013/tmp/mml-natives7074460575321003512/lib_lightgbm.so)
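
The error above indicates that the bundled lib_lightgbm.so needs the GLIBC_2.23 symbol version, which the /lib64/libm.so.6 on the executor host does not provide. A minimal sketch for comparing the two on a node, assuming Python is available there. The helper names are hypothetical and not part of MMLSpark, and the error string is abridged from the trace:

```python
import platform
import re


def required_glibc(error_message):
    """Extract the GLIBC version named in a linker error, as a tuple of ints."""
    match = re.search(r"GLIBC_(\d+(?:\.\d+)*)", error_message)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def local_glibc():
    """Return the glibc version seen by this interpreter, or None if undetected."""
    lib, version = platform.libc_ver()
    parts = re.findall(r"\d+", version)
    if lib != "glibc" or not parts:
        return None
    return tuple(int(part) for part in parts)


def is_compatible(required, available):
    """True when the available glibc is at least the required version."""
    return available is not None and available >= required


# Abridged from the stack trace; the full container path is omitted.
error = "/lib64/libm.so.6: version `GLIBC_2.23' not found (required by lib_lightgbm.so)"
needed = required_glibc(error)
print("required:", needed, "local:", local_glibc())
print("compatible:", is_compatible(needed, local_glibc()))
```

Running this on an executor host rather than the driver matters, since the native library is unpacked and loaded inside the YARN container.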

@imatiach-msft
Contributor

@mrchor Very sorry about this. Yes, I am working on a fix in my spare time. Please see here for more info:
microsoft/LightGBM#1945
Basically, I need to add the Java SWIG wrapper generation to the LightGBM Docker build process, which should hopefully fix the GCC issues.

imatiach-msft self-assigned this Apr 16, 2019

@mrchor
Author

mrchor commented Apr 18, 2019

Thank you for your answer. I would also like to ask two other questions about mmlspark-lightgbm:
1. Do you have any suggestions for supporting batch prediction in mmlspark-lightgbm?
2. LightGBM online prediction has a serious timeout problem. Is there any optimization for this, such as writing the prediction code in Scala instead of calling the C++ DLL?

@imatiach-msft
Contributor

imatiach-msft commented Apr 18, 2019

Hi @mrchor, besides using the MMLSpark API, you can export the LightGBM model to a native file and load it inside the Python Booster or the R-based learner. You can also export it to PMML and then use any of the PMML evaluators.
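
As a rough illustration of the native-file route, the sketch below loads an exported model with the `lightgbm` Python package and scores rows in fixed-size batches. It assumes the model has already been written to a local text file and that `lightgbm` is installed. The helpers `batches`, `predict_in_batches`, and `load_booster` are hypothetical and not part of either library:

```python
def batches(rows, batch_size):
    """Yield consecutive slices of rows, each with at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def predict_in_batches(predict, rows, batch_size=1000):
    """Call predict on each batch and concatenate the results in input order."""
    results = []
    for batch in batches(rows, batch_size):
        results.extend(predict(batch))
    return results


def load_booster(model_path):
    """Load a LightGBM model saved in the native text format."""
    # Imported lazily so the batching helpers work without lightgbm installed.
    import lightgbm as lgb
    return lgb.Booster(model_file=model_path)
```

With a model file at a hypothetical path such as `model.txt`, usage would look like `booster = load_booster("model.txt")` followed by `predict_in_batches(lambda b: booster.predict(numpy.asarray(b)), feature_rows)`. Taking the predict function as an argument keeps the batching logic independent of how the model is loaded.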
Specifically for (1), you can do both batch predictions and online predictions with the current API. @eisber checked in some big prediction performance improvements to both the lightgbm repo and mmlspark, which haven't been released yet.
You could try one of the PR builds, which include his changes, e.g.:
--packages com.microsoft.ml.spark:mmlspark_2.11:0.16.dev15+2.g2d494cb
and
--repositories https://mmlspark.azureedge.net/maven
(created from the last build in this PR: #537)
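
For reference, the two flags above are passed to spark-submit together with the application. A minimal sketch that assembles such a command line, where the application name `my_job.py` and the helper `spark_submit_command` are hypothetical:

```python
import shlex


def spark_submit_command(app, package, repository):
    """Build a spark-submit argument list that pulls a package from an extra Maven repository."""
    return [
        "spark-submit",
        "--packages", package,
        "--repositories", repository,
        app,
    ]


cmd = spark_submit_command(
    "my_job.py",  # hypothetical application script
    "com.microsoft.ml.spark:mmlspark_2.11:0.16.dev15+2.g2d494cb",
    "https://mmlspark.azureedge.net/maven",
)
# Print a shell-safe version of the command for copy and paste.
print(shlex.join(cmd))
```

The same coordinates can be swapped for a later build by changing only the `package` argument.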

@mrchor
Author

mrchor commented Apr 18, 2019

Thank you very much. PMML does not seem to be a good solution for us. Since our service is currently built on Java, is there any other way to call a Spark-trained LightGBM model from Java?

@imatiach-msft
Contributor

@mrchor Not that I know of currently. What problems are you having with the Spark-based API specifically? What is the timeout problem, and do you have specific performance numbers that we could try to improve on? Also, does the build I sent you help with performance?

--packages com.microsoft.ml.spark:mmlspark_2.11:0.16.dev15+2.g2d494cb
and
--repositories https://mmlspark.azureedge.net/maven

@imatiach-msft
Contributor

Closing, as we now use the official Linux .so files produced by the Microsoft/LightGBM build, which runs in an Ubuntu 14.04 Docker image and does not have the glibc issue.
This was fixed in PR #526, which updates the LightGBM version to:
"com.microsoft.ml.lightgbm" % "lightgbmlib" % "2.2.350"
It should be available in the next release. For now, you can use the latest builds from master, e.g. the build for that PR was:

--packages com.microsoft.ml.spark:mmlspark_2.11:0.17.dev1+1.g5e0b2a0
and
--repositories https://mmlspark.azureedge.net/maven
