Embed the native libraries in the hadoop-lzo jar #73

julienledem · 2013-07-03T16:11:57Z

The snappy-java library has this cool feature: it embeds the native libraries in the jar and loads the correct one depending on the OS. That would be a cool feature to add to hadoop-lzo and would make testing easier.
https://github.com/xerial/snappy-java

in particular:
https://github.com/xerial/snappy-java/blob/develop/src/main/java/org/xerial/snappy/SnappyLoader.java
https://github.com/xerial/snappy-java/tree/develop/src/main/resources/org/xerial/snappy/native

sjlee · 2013-07-08T16:30:03Z

It sounds like a good idea. I can see how it can make testing easier.

There are a couple of caveats:
First, unlike snappy, hadoop-lzo depends on another native library (lzo itself) being present on the machine. The lzo library needs to be installed on the machine and added to the path environment variable before hadoop-lzo can run successfully, so it would not be "zero-configuration" even if we embed the hadoop-lzo native lib in the jar. In that sense, the value of this may be somewhat limited.

Second, it seems like snappy checks the built native libs back into git! So both the native source and the built libraries are checked into git. That seems a little bit yucky to me. It also probably implies that the release process would be a two-step process. But if we want to embed libraries for more than one platform, this may be an inevitable conclusion (there never will be a single build that will build the whole thing anyway).

At any rate, a pull request is always welcome! :)

julienledem · 2013-07-08T17:08:48Z

Hi @sjlee
The goal is really to get to a self-contained jar with no other dependency, so maybe the native JNI library could statically link the lzo library to avoid that dependency. I agree that checking in the binaries is not great, especially as it hides how those binaries were built. The Java code that decides which library should be loaded based on the OS, then puts that library somewhere and loads it, is interesting though.
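To make the "decide which library to load based on the OS" step concrete, here is a minimal sketch. It is not the snappy-java or hadoop-lzo code: the class name `NativeResourcePath`, the `/native/<os>-<arch>/` layout, and the library name `demo` are all assumptions made up for illustration. Only the `os.name` and `os.arch` system properties are standard.

```java
import java.util.Locale;

/** Hypothetical helper: maps os.name/os.arch values to a resource path inside the jar. */
final class NativeResourcePath {
    private NativeResourcePath() {}

    /** Reduces an os.name value such as "Mac OS X" to a short folder name. */
    static String normalizeOs(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        // Check macOS first: "darwin" contains the substring "win".
        if (os.contains("mac") || os.contains("darwin")) return "mac";
        if (os.contains("win")) return "windows";
        if (os.contains("linux")) return "linux";
        return os.replaceAll("\\s+", "_");
    }

    /** Collapses common aliases for the same architecture into one name. */
    static String normalizeArch(String osArch) {
        String arch = osArch.toLowerCase(Locale.ROOT);
        if (arch.equals("amd64") || arch.equals("x86_64")) return "x86_64";
        if (arch.equals("arm64") || arch.equals("aarch64")) return "aarch64";
        return arch;
    }

    /** Builds the platform-specific file name for a library (assumed naming scheme). */
    static String libraryFileName(String os, String libName) {
        switch (os) {
            case "windows": return libName + ".dll";
            case "mac":     return "lib" + libName + ".dylib";
            default:        return "lib" + libName + ".so";
        }
    }

    /** Assumed layout: /native/<os>-<arch>/<file name>. */
    static String resourcePath(String osName, String osArch, String libName) {
        String os = normalizeOs(osName);
        return "/native/" + os + "-" + normalizeArch(osArch) + "/" + libraryFileName(os, libName);
    }

    public static void main(String[] args) {
        // Prints the path that would be looked up on the current machine.
        System.out.println(resourcePath(
            System.getProperty("os.name"), System.getProperty("os.arch"), "demo"));
    }
}
```

Taking the OS and architecture as parameters instead of reading the system properties inside the method keeps the mapping easy to test for platforms you are not running on.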

sjlee · 2013-07-08T21:10:01Z

Yeah, I agree there are some nifty ideas in the Java code that loads the library. As for static linking, I think it would make the jar self-contained. However, I do want us to think through the implications of statically linking lzo vs. dynamically linking it (upgrade implications, any resource usage implications, etc.).

sjlee · 2013-10-04T23:03:38Z

I'd like to restart this discussion. I think there are at least a couple of different ways of embedding the native binaries, each of which has its pros and cons.

One approach is to generate and embed the native binaries into the jar at build time. This approach is lightweight and avoids the burden of maintaining separate native binaries under source control, yet it would deliver most of the benefit and make the jar more self-contained. When loading the native library, the code could check for the embedded native library and load it from the jar if found. If it is not found, it could simply fall back on the current behavior (i.e. finding it in the library path). So none of the existing use cases would be disturbed; they would only gain convenience.

The other approach is to create an area to check in the native libraries (like snappy). While the benefit is that a single jar can support multiple OSes and platforms, the drawback is that this could incur a more significant maintenance burden (every time the native source changes, one needs to generate the native libraries for all "supported" platforms and check them in).

I would favor going with the former approach. It's lightweight, unintrusive, and still adds good value.

Thoughts?
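
For illustration, the first approach (load the embedded library if present, otherwise fall back to the library path) could be sketched roughly as below. This is a sketch under assumptions, not the actual hadoop-lzo loader: the class name `EmbeddedNativeLoader`, the resource path argument, and the returned marker strings are hypothetical. The JDK calls used (`Class.getResourceAsStream`, `Files.createTempFile`, `System.load`, `System.loadLibrary`) are standard.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical loader: prefers a library embedded in the jar, else uses the library path. */
final class EmbeddedNativeLoader {
    private EmbeddedNativeLoader() {}

    /** Copies a classpath resource to a temp file; returns null if the resource is absent. */
    static Path extract(String resourcePath) {
        try (InputStream in = EmbeddedNativeLoader.class.getResourceAsStream(resourcePath)) {
            if (in == null) {
                return null; // nothing embedded for this platform
            }
            // Keep the original extension (.so, .dylib, .dll) on the temp file.
            int dot = resourcePath.lastIndexOf('.');
            String suffix = dot >= 0 ? resourcePath.substring(dot) : ".tmp";
            Path tmp = Files.createTempFile("embedded-native-", suffix);
            tmp.toFile().deleteOnExit();
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Tries the embedded copy first, then the library path.
     * Returns which route succeeded; throws UnsatisfiedLinkError if both fail.
     */
    static String load(String resourcePath, String libName) {
        try {
            Path extracted = extract(resourcePath);
            if (extracted != null) {
                System.load(extracted.toAbsolutePath().toString());
                return "embedded";
            }
        } catch (UncheckedIOException | UnsatisfiedLinkError e) {
            // Extraction failed or the embedded binary does not match this platform:
            // fall through to the existing behavior.
        }
        System.loadLibrary(libName); // current behavior: search java.library.path
        return "library-path";
    }
}
```

Because the fallback is the plain `System.loadLibrary` call, a jar without an embedded binary (or with one built for a different platform) behaves the same as before.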

jrottinghuis · 2013-10-05T02:25:52Z

Shipping native libraries in the jar already means generating multiple jars, right? Or are you thinking we would have one jar with native binaries for many platforms all built in?

How would this work across Linux flavors and even Windows?

Thanks,

Joep

sjlee · 2013-10-07T21:43:14Z

With approach (1), we will not create a single jar that has native libraries for many platforms all built into one. When you build, the native libraries for the platform you're building on (and that only) will be added to the jar. The goal is more of added convenience than creating a single jar that officially supports multiple platforms out of the box.

However, if your deployment environment is of a single platform, then you could build the jar once on that environment, and the jar will be self-contained.

On the other hand, if the jar is deployed onto a platform that the embedded native libraries do not match, it would simply fall back to the current behavior and look for the appropriate native libraries in the library path.

sjlee · 2013-10-08T20:18:02Z

I'm going to create a pull request for this shortly...

sjlee · 2014-08-29T04:53:15Z

This was resolved with pull request #81.

julienledem · 2014-08-29T23:51:21Z

thanks!
@sjlee which version do I use to try this?

sjlee · 2014-08-30T00:33:37Z

It seems like we didn't do a release after this was merged. It would be 0.4.20.

zman0900 · 2016-08-16T18:26:55Z

Sorry to bring this back from the dead, but are there any plans to do a release with this? If so, what platform will the jar in the maven repo be built for?

sjlee · 2016-08-16T20:43:10Z

Sorry this fell through the cracks. I was hoping to close PR #90 before cutting a release, but that's stalled. I could do a release before that as it's been some time.

My baseline thinking is to have x86_64 built into the jar for Maven Central, as that would probably be the largest user base. Thoughts?

zman0900 · 2016-08-16T21:07:01Z

We are running on x86_64 Linux here, so that would be perfect for me.

zman0900 · 2016-09-09T17:28:09Z

Any news on that release? If you do decide to release, any chance PR #117 could be included?

sjlee · 2016-09-13T03:29:10Z

My apologies, things got delayed for a while. Hopefully I can pick this up next week. Yes, I'll look at #117 before a release. Thanks for your patience.

harperjiang · 2018-02-20T02:14:43Z

I realize this is a pretty old post. But may I know the current situation?

sjlee · 2018-02-21T03:29:30Z

AFAIK, the 0.4.20 release has this: https://github.com/twitter/hadoop-lzo/releases/tag/release-0.4.20

sjlee closed this as completed Aug 29, 2014
