Not able to initialize HuggingFaceTokenizer in Vespa environment #2224

Closed
dnmca opened this issue Dec 14, 2022 · 11 comments
Labels
bug Something isn't working

Comments

@dnmca

dnmca commented Dec 14, 2022

Description

I'm trying to test a Vespa application with a custom Embedder that uses DJL's HuggingFaceTokenizer under the hood.

It is initialized internally in a straightforward manner:

tokenizer = HuggingFaceTokenizer.newInstance(Paths.get(config.tokenizerPath().toString()));

Local testing of this code was successful, but when it runs inside the Vespa Docker image, I get the following error:

com.yahoo.container.di.componentgraph.core.ComponentNode$ComponentConstructorException: Error constructing 'xlmRoberta' of type 'com.product.search.embedding.XlmRobertaEmbedder': null
Caused by: java.lang.AssertionError: No tokenizers version found in property file.
	at ai.djl.util.Platform.detectPlatform(Platform.java:85)
	at ai.djl.huggingface.tokenizers.jni.LibUtils.copyJniLibraryFromClasspath(LibUtils.java:76)
	at ai.djl.huggingface.tokenizers.jni.LibUtils.loadLibrary(LibUtils.java:66)
	at ai.djl.huggingface.tokenizers.jni.LibUtils.<clinit>(LibUtils.java:41)
	at ai.djl.huggingface.tokenizers.HuggingFaceTokenizer.newInstance(HuggingFaceTokenizer.java:146)
	at ai.djl.huggingface.tokenizers.HuggingFaceTokenizer.newInstance(HuggingFaceTokenizer.java:132)
	at ai.djl.huggingface.tokenizers.HuggingFaceTokenizer.newInstance(HuggingFaceTokenizer.java:115)

LibUtils is trying to load the tokenizers library from the classpath, but it seems to be missing.

Before running my application in the Docker image, I build a Maven project that contains the following dependency:

<dependency>
    <groupId>ai.djl.huggingface</groupId>
    <artifactId>tokenizers</artifactId>
    <version>0.20.0</version>
</dependency>

As far as I can see from the build.gradle of the tokenizers package, it relies on some external library files.
Do I understand correctly that these libraries are not part of the dependency above and must be installed manually?
If that's not the case, what am I doing wrong in applying the tokenizers package to my use case?

Expected Behavior

I expect HuggingFaceTokenizer to initialize without errors.

Error Message

Please look in the first section.

How to Reproduce?

Unfortunately, I cannot share the code base.

Steps to reproduce

What have you tried to solve it?

Environment Info

@dnmca dnmca added the bug Something isn't working label Dec 14, 2022
@frankfliu
Contributor

@dnmca
How do you package your application? Did you repackage our .jar file?

We expect native/lib/tokenizer.properties to be on the classpath. Can you check whether this file exists after you repackage the .jar?

@dnmca
Author

dnmca commented Dec 14, 2022

I am packaging the application as a JAR file.
I've checked its contents, and it seems that all necessary files are present (tokenizer.properties as well).

@frankfliu
Contributor

What OS are you running on? Is it x86_64 or aarch64?

The exception is thrown here: https://github.com/deepjavalibrary/djl/blob/master/api/src/main/java/ai/djl/util/Platform.java#L85

If you have a native/lib/tokenizer.properties file, it should return here: https://github.com/deepjavalibrary/djl/blob/master/api/src/main/java/ai/djl/util/Platform.java#L77

Can you copy the Platform code, add a few more print statements, and see why it falls through to the exception line?

@dnmca
Author

dnmca commented Dec 15, 2022

I've printed the values of the variables urls and systemPlatform in the method Platform.detectPlatform(String engine):

urls turns out to be empty, and
systemPlatform is "cpu-linux-x86_64:null".

Initialization fails because the following condition is satisfied:

if (systemPlatform.version == null) {
    throw new AssertionError("No " + engine + " version found in property file.");
}

It looks like the following resource is not being found:

String engineProp = engine + "-engine.properties";

Hence, systemPlatform.version is set to null.

@frankfliu
Contributor

frankfliu commented Dec 15, 2022

@dnmca
A systemPlatform of "cpu-linux-x86_64:null" should be OK. The expected behavior is that detectPlatform() returns at line 77: https://github.com/deepjavalibrary/djl/blob/master/api/src/main/java/ai/djl/util/Platform.java#L77

The problem is that urls is empty, which means native/lib/tokenizer.properties was not found on the classpath.
Is tokenizer.properties in the right location?

You might need to set a proper context class loader if you are using a customized ClassLoader in your application.
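
One way to check this from inside the application is to ask both class loaders for the resource and compare the results. The following is only a minimal sketch using standard JDK calls: the class name ResourceVisibilityCheck and its find helper are made up for illustration, and the default resource name is simply the one mentioned in this thread.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper, not part of DJL or Vespa.
final class ResourceVisibilityCheck {

    private ResourceVisibilityCheck() {}

    /** Lists every URL under which the given loader can see the resource. */
    static List<URL> find(ClassLoader loader, String resourceName) throws IOException {
        if (loader == null) {
            // A null loader means the bootstrap loader; the system loader is used as a stand-in.
            loader = ClassLoader.getSystemClassLoader();
        }
        return Collections.list(loader.getResources(resourceName));
    }

    public static void main(String[] args) throws IOException {
        // Resource name as quoted in this thread; pass another name as the first argument.
        String resourceName = args.length > 0 ? args[0] : "native/lib/tokenizer.properties";

        // Loader that is consulted when code looks resources up via the thread context.
        ClassLoader contextLoader = Thread.currentThread().getContextClassLoader();
        // Loader that loaded this class (in a plugin or bundle, the plugin's own loader).
        ClassLoader ownLoader = ResourceVisibilityCheck.class.getClassLoader();

        System.out.println("context loader: " + contextLoader
                + " -> " + find(contextLoader, resourceName));
        System.out.println("own loader:     " + ownLoader
                + " -> " + find(ownLoader, resourceName));
    }
}
```

If the second list contains the file and the first one is empty, the resource is packaged correctly and only the context class loader is the problem. Inside a container you would call find from the component itself rather than from main.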

@dnmca
Author

dnmca commented Dec 26, 2022

Hello @frankfliu, and sorry for the late reply.

I've investigated a bit further, and it turns out that the Vespa application is built as an OSGi bundle.
That seems to be the reason why the resource file tokenizer.properties could not be located with

urls = ClassLoaderUtils.getContextClassLoader().getResources(nativeProp);

Do you think that resetting the context class loader would fix this?
And if not, is it possible to make the tokenizers package OSGi-compatible?

@dnmca
Author

dnmca commented Dec 26, 2022

Thank you for the hint!
I managed to make it work with the following piece of code:

ClassLoader tccl = Thread.currentThread().getContextClassLoader();
try {
    Thread.currentThread().setContextClassLoader(getClass().getClassLoader());
    tokenizer = HuggingFaceTokenizer.newInstance(Paths.get(config.tokenizerPath().toString()));
} finally {
    Thread.currentThread().setContextClassLoader(tccl);
}
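
If the same swap is needed in more than one place, it could be pulled into a small helper. This is just a sketch of the pattern above built on the standard Thread API; the class ContextClassLoaders and the method callWith are hypothetical names, not something DJL or Vespa provides.

```java
import java.util.concurrent.Callable;

// Hypothetical utility wrapping the save / set / restore pattern shown above.
final class ContextClassLoaders {

    private ContextClassLoaders() {}

    /**
     * Runs the action with the given loader installed as the thread context class loader,
     * then restores the previous loader even if the action throws.
     */
    static <T> T callWith(ClassLoader loader, Callable<T> action) throws Exception {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        thread.setContextClassLoader(loader);
        try {
            return action.call();
        } finally {
            // Always put the original loader back so the container is unaffected.
            thread.setContextClassLoader(previous);
        }
    }
}
```

With that in place, the call above would become something like tokenizer = ContextClassLoaders.callWith(getClass().getClassLoader(), () -> HuggingFaceTokenizer.newInstance(path)); where path is the tokenizer path from the config.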

@frankfliu
Contributor

@dnmca

You are facing a common plugin issue. It looks like your plugin class loader differs from the class loader in effect at execution time (the context class loader), which assumes that all resources were loaded at plugin initialization time. You can either move the HuggingFaceTokenizer creation to plugin initialization or set the correct context class loader, as your code does.

@dnmca dnmca closed this as completed Jan 4, 2023
@carlos-aguayo

Hi @dnmca, thank you for posting a solution to your problem. I ran into the same issue and your solution worked for me as well. I think we are doing exactly the same thing.
I have a new issue, though. If I unload the plugin and load it again, I get this error:

2023-03-15 21:44:30,076 [Acme Plugin Hot Deploy] ERROR com.atlassian.plugin.manager.DefaultPluginManager - There was an error loading the descriptor 'Similarity' of plugin 'com.acme'. Disabling.
com.atlassian.plugin.PluginParseException: java.lang.UnsatisfiedLinkError: Native Library /usr/local/acme/tomcat/temp/.djl.ai/tokenizers/0.13.2-0.21.0-linux-x86_64/libtokenizers.so already loaded in another classloader
	at com.atlassian.plugin.module.LegacyModuleFactory.getModuleClass(LegacyModuleFactory.java:43)

I'm curious whether you ran into this as well and whether you solved it. Otherwise, I'll take a look and post whatever solution I find here. Thanks!

@frankfliu
Contributor

You have run into the same issue as #179.

Currently this only works for the PyTorch native library. I can make it available for Hugging Face as well.

@tjcarroll11

@frankfliu I am trying to get the native helper working for Hugging Face as well. I'm finding that the helper works, but an error still gets thrown at this line: 9106f95#diff-83b8cdd89c2c087ef69b441f0f73b423e93601cca9d8feed7bb711064c239951R306

Just looking at the code, it seems like this line defeats the purpose of the native helper.
