[BUG] model deployment fails -- Could not initialize class ai.djl.onnxruntime.engine.OrtNDManager #3207
Comments
@jovanovic-milos, could you please share the command you used to register the model? We need it to reproduce the issue. Please also let us know which model type you used. Thanks.
Hey @mingshl, thanks for replying! I haven't been able to reproduce the issue since last week, and the deployment now seems to be working again. I didn't change anything in my project, and I'm still using the latest Docker image of OpenSearch. But just in case you want to try it out:
After the registration finished, I simply called:
What is the bug?
Model deployment fails with what appears to be an exception in ml-commons.
How can one reproduce the bug?
Steps to reproduce the behavior:
What is the expected behavior?
Successful deployment of the model
What is your host/environment?
OpenSearch 2.18 running in Docker
Do you have any additional context?
org.opensearch.ml.common.exception.MLException: Failed to deploy model w1BJEpMBbOORGaoAR7h5
2024-11-09T19:29:46.547698532Z at org.opensearch.ml.engine.algorithms.DLModel.lambda$loadModel$1(DLModel.java:300) ~[?:?]
2024-11-09T19:29:46.547704056Z at java.base/java.security.AccessController.doPrivileged(AccessController.java:571) ~[?:?]
2024-11-09T19:29:46.547708040Z at org.opensearch.ml.engine.algorithms.DLModel.loadModel(DLModel.java:252) ~[?:?]
2024-11-09T19:29:46.547723453Z at org.opensearch.ml.engine.algorithms.DLModel.initModel(DLModel.java:142) ~[?:?]
2024-11-09T19:29:46.547727230Z at org.opensearch.ml.engine.MLEngine.deploy(MLEngine.java:125) ~[?:?]
2024-11-09T19:29:46.547730758Z at org.opensearch.ml.model.MLModelManager.lambda$deployModel$52(MLModelManager.java:1083) ~[?:?]
2024-11-09T19:29:46.547734525Z at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.17.0.jar:2.17.0]
2024-11-09T19:29:46.547738193Z at org.opensearch.ml.model.MLModelManager.lambda$retrieveModelChunks$73(MLModelManager.java:1703) [opensearch-ml-2.17.0.0.jar:2.17.0.0]
2024-11-09T19:29:46.547741754Z at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.17.0.jar:2.17.0]
2024-11-09T19:29:46.547745270Z at org.opensearch.action.support.ThreadedActionListener$1.doRun(ThreadedActionListener.java:78) [opensearch-2.17.0.jar:2.17.0]
2024-11-09T19:29:46.547748852Z at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1005) [opensearch-2.17.0.jar:2.17.0]
2024-11-09T19:29:46.547752467Z at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.17.0.jar:2.17.0]
2024-11-09T19:29:46.547755951Z at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
2024-11-09T19:29:46.547759414Z at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
2024-11-09T19:29:46.547762898Z at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
2024-11-09T19:29:46.547766339Z Caused by: java.lang.NoClassDefFoundError: Could not initialize class ai.djl.onnxruntime.engine.OrtNDManager
2024-11-09T19:29:46.547769823Z at ai.djl.onnxruntime.engine.OrtEngine.newBaseManager(OrtEngine.java:134) ~[?:?]
2024-11-09T19:29:46.547773286Z at ai.djl.onnxruntime.engine.OrtEngine.newModel(OrtEngine.java:122) ~[?:?]
2024-11-09T19:29:46.547779006Z at ai.djl.Model.newInstance(Model.java:99) ~[?:?]
2024-11-09T19:29:46.547782609Z at ai.djl.repository.zoo.BaseModelLoader.createModel(BaseModelLoader.java:196) ~[?:?]
2024-11-09T19:29:46.547786115Z at ai.djl.repository.zoo.BaseModelLoader.loadModel(BaseModelLoader.java:159) ~[?:?]
2024-11-09T19:29:46.547789621Z at ai.djl.repository.zoo.Criteria.loadModel(Criteria.java:174) ~[?:?]
2024-11-09T19:29:46.547795624Z at org.opensearch.ml.engine.algorithms.DLModel.doLoadModel(DLModel.java:217) ~[?:?]
2024-11-09T19:29:46.547801105Z at org.opensearch.ml.engine.algorithms.DLModel.lambda$loadModel$1(DLModel.java:286) ~[?:?]
2024-11-09T19:29:46.547804633Z ... 14 more
2024-11-09T19:29:46.547808106Z Caused by: java.lang.ExceptionInInitializerError: Exception ai.djl.engine.EngineException: Failed to save pytorch index file [in thread "opensearch[opensearch-node][opensearch_ml_deploy][T#7]"]
2024-11-09T19:29:46.547813577Z at ai.djl.pytorch.jni.LibUtils.downloadPyTorch(LibUtils.java:429) ~[?:?]
2024-11-09T19:29:46.547822391Z at ai.djl.pytorch.jni.LibUtils.findNativeLibrary(LibUtils.java:314) ~[?:?]
2024-11-09T19:29:46.547826200Z at ai.djl.pytorch.jni.LibUtils.getLibTorch(LibUtils.java:93) ~[?:?]
2024-11-09T19:29:46.547829717Z at ai.djl.pytorch.jni.LibUtils.loadLibrary(LibUtils.java:81) ~[?:?]
2024-11-09T19:29:46.547833234Z at ai.djl.pytorch.engine.PtEngine.newInstance(PtEngine.java:53) ~[?:?]
2024-11-09T19:29:46.547836783Z at ai.djl.pytorch.engine.PtEngineProvider.getEngine(PtEngineProvider.java:41) ~[?:?]
2024-11-09T19:29:46.547840279Z at ai.djl.engine.Engine.getEngine(Engine.java:190) ~[?:?]
2024-11-09T19:29:46.547843698Z at ai.djl.engine.Engine.getInstance(Engine.java:145) ~[?:?]
2024-11-09T19:29:46.547847149Z at ai.djl.onnxruntime.engine.OrtEngine.getAlternativeEngine(OrtEngine.java:75) ~[?:?]
2024-11-09T19:29:46.547850623Z at ai.djl.ndarray.BaseNDManager.<init>(BaseNDManager.java:64) ~[?:?]
2024-11-09T19:29:46.547854324Z at ai.djl.onnxruntime.engine.OrtNDManager.<init>(OrtNDManager.java:42) ~[?:?]
2024-11-09T19:29:46.547858210Z at ai.djl.onnxruntime.engine.OrtNDManager.<init>(OrtNDManager.java:35) ~[?:?]
2024-11-09T19:29:46.547861911Z at ai.djl.onnxruntime.engine.OrtNDManager$SystemManager.<init>(OrtNDManager.java:177) ~[?:?]
2024-11-09T19:29:46.547865450Z at ai.djl.onnxruntime.engine.OrtNDManager.<clinit>(OrtNDManager.java:37) ~[?:?]
2024-11-09T19:29:46.547869043Z at ai.djl.onnxruntime.engine.OrtEngine.newBaseManager(OrtEngine.java:134) ~[?:?]
2024-11-09T19:29:46.547872635Z at ai.djl.onnxruntime.engine.OrtEngine.newModel(OrtEngine.java:122) ~[?:?]
2024-11-09T19:29:46.547876120Z at ai.djl.Model.newInstance(Model.java:99) ~[?:?]
2024-11-09T19:29:46.547879582Z at ai.djl.repository.zoo.BaseModelLoader.createModel(BaseModelLoader.java:196) ~[?:?]
2024-11-09T19:29:46.547884022Z at ai.djl.repository.zoo.BaseModelLoader.loadModel(BaseModelLoader.java:159) ~[?:?]
2024-11-09T19:29:46.547887604Z at ai.djl.repository.zoo.Criteria.loadModel(Criteria.java:174) ~[?:?]
2024-11-09T19:29:46.547891131Z at org.opensearch.ml.engine.algorithms.DLModel.doLoadModel(DLModel.java:217) ~[?:?]
2024-11-09T19:29:46.547894789Z at org.opensearch.ml.engine.algorithms.DLModel.lambda$loadModel$1(DLModel.java:286) ~[?:?]
2024-11-09T19:29:46.547898415Z ... 14 more
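Reading the trace bottom-up: the innermost cause is an EngineException ("Failed to save pytorch index file") thrown from LibUtils.downloadPyTorch while OrtNDManager's static initializer (<clinit>) was running. The JVM reports the first such failure as ExceptionInInitializerError and every later use of the class as NoClassDefFoundError: Could not initialize class. The sketch below reproduces only that general JVM behavior. The Fragile class and its message are hypothetical stand-ins, not DJL or ml-commons code.

```java
import java.util.ArrayList;
import java.util.List;

class StaticInitFailureDemo {

    // Hypothetical stand-in for a class whose static initializer depends on
    // something that can fail at runtime (for example, writing a file).
    static class Fragile {
        static final Object SYSTEM_MANAGER = create();

        private static Object create() {
            // Simulated failure; the real trace shows an EngineException here.
            throw new IllegalStateException("simulated: failed to save index file");
        }

        static void touch() {
            // Calling any static method triggers class initialization.
        }
    }

    // Uses Fragile twice and records which error type each attempt raises.
    static List<String> run() {
        List<String> seen = new ArrayList<>();
        for (int attempt = 0; attempt < 2; attempt++) {
            try {
                Fragile.touch();
                seen.add("ok");
            } catch (LinkageError e) {
                // Both ExceptionInInitializerError and NoClassDefFoundError
                // are subclasses of LinkageError.
                seen.add(e.getClass().getSimpleName());
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // In a fresh JVM: [ExceptionInInitializerError, NoClassDefFoundError]
        System.out.println(run());
    }
}
```

Once a class has failed initialization it stays unusable for the lifetime of its class loader, so after fixing the underlying cause a retry would typically need a restart of the process that loaded it (here, presumably the OpenSearch node).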