Add native Lz4, Snappy, and Zstd #201

Merged
merged 12 commits on Jul 2, 2024

dain
Member

@dain dain commented Jun 8, 2024

Add support for native Lz4, Snappy, and Zstd using the java.lang.foreign APIs
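
For readers unfamiliar with the API, here is a minimal sketch of a `java.lang.foreign` downcall. It is not code from this PR: it binds the C library's `strlen`, chosen only because it is available from the linker's default lookup, and the class name `StrlenSketch` is hypothetical. It assumes a Java 22+ runtime on a 64-bit platform where `size_t` maps to a Java `long`.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

// Hypothetical illustration class; not part of the PR.
public final class StrlenSketch
{
    private static final MethodHandle STRLEN;

    static {
        Linker linker = Linker.nativeLinker();
        // Look up the symbol in the platform's default libraries (libc).
        MemorySegment address = linker.defaultLookup().find("strlen").orElseThrow();
        // size_t strlen(const char *s), assuming a 64-bit size_t
        STRLEN = linker.downcallHandle(
                address,
                FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));
    }

    private StrlenSketch() {}

    public static long strlen(String value)
    {
        // The confined arena frees the off-heap copy when the block exits.
        try (Arena arena = Arena.ofConfined()) {
            // Copies the string off-heap as a NUL-terminated UTF-8 C string.
            MemorySegment cString = arena.allocateFrom(value);
            return (long) STRLEN.invokeExact(cString);
        }
        catch (Throwable t) {
            throw new AssertionError("strlen downcall failed", t);
        }
    }

    public static void main(String[] args)
    {
        System.out.println(strlen("Hello"));
    }
}
```

Without `--enable-native-access`, a run like this is expected to print a restricted-method warning, which is what the manifest discussion later in this thread is about.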

@dain dain requested review from electrum and martint June 8, 2024 04:53
@dain dain force-pushed the native-compression branch from 8724469 to df97db8 Compare June 8, 2024 05:07
bin/download.sh (outdated, resolved)
@wendigo
Contributor

wendigo commented Jun 15, 2024

That's awesome!

@wendigo
Contributor

wendigo commented Jun 15, 2024

I think that you could add Enable-Native-Access: ALL-UNNAMED to the MANIFEST.MF entries (https://docs.oracle.com/en/java/javase/22/core/restricted-methods.html)

@wendigo
Contributor

wendigo commented Jun 17, 2024

@dain we should apply for ARM runners so we can add coverage here as well

@wendigo
Contributor

wendigo commented Jun 17, 2024

Right now some of the tests are not passing on ARM (due to the lack of libgplcompression for aarch64)

@wendigo
Contributor

wendigo commented Jun 18, 2024

Some benchmarks:

jmh-result.json
benchmarks.log

@dain
Member Author

dain commented Jun 19, 2024

@wendigo

> I think that you could add Enable-Native-Access: ALL-UNNAMED to the MANIFEST.MF entries (https://docs.oracle.com/en/java/javase/22/core/restricted-methods.html)

Ya, I think we should make this a proper module, not that most people use module capable systems.... follow up work

> @dain we should apply for ARM runners so we can add coverage here as well

Looks like the beta is open now https://github.blog/2024-06-03-arm64-on-github-actions-powering-faster-more-efficient-build-systems/

> Some benchmarks:

For benchmarks, you'll want to look at these algorithms:

  • airlift_lz4
  • airlift_lz4_native
  • airlift_lzo
  • airlift_snappy
  • airlift_snappy_native
  • airlift_zstd
  • airlift_zstd_native

Then in the DataSet you'll want to narrow down to one of the collections, or it will take forever.

@dain dain force-pushed the native-compression branch from 5257c8a to 586d481 Compare June 26, 2024 07:14
Contributor

@wendigo wendigo left a comment

Overall LGTM

src/main/java/io/airlift/compress/Compressor.java (outdated, resolved)
src/main/java/io/airlift/compress/snappy/SnappyNative.java (4 outdated threads, resolved)
private static final MethodHandle IS_ERROR_METHOD;
private static final MethodHandle GET_ERROR_NAME_METHOD;

// TODO should we just hardcode this to 3?
Contributor

should we? or should this be configurable?

Member Author

Since we include our own libraries, loading this value from the library seems redundant.

dain added 3 commits June 26, 2024 14:47
The underlying format for zstd in Hadoop is the standard framed format.
The Hadoop JNI code for Zstd corrupts the process symbol table on Linux
by loading the Zstd library into the global process symbol table.

# Requirements

- This library requires a Java 1.8+ virtual machine containing the `sun.misc.Unsafe` interface running on a little endian platform.
+ This library requires a Java 22+ virtual machine containing the `sun.misc.Unsafe` interface running on a little endian platform.

Would you consider building a multi-release JAR, so us miserable folks stuck on Java 1.8 can still use the pure Java bits of this library?

Member Author

The 2.x branch (master) uses the new java.lang.foreign APIs that are only present in Java 22+. The release-0.x branch contains the code that works with Java 1.8.

@dain dain force-pushed the native-compression branch 2 times, most recently from d5637e8 to 8e35673 Compare June 27, 2024 04:59
@wendigo
Contributor

wendigo commented Jun 27, 2024

  compress    airlift_lz4             calgary/book2                  333,498   375.5MB/s ±    29.4MB/s ( 7.82%) (N = 3, α = 99.9%)
  compress    airlift_lz4_native      calgary/book2                  333,498   461.9MB/s ±    47.4MB/s (10.27%) (N = 3, α = 99.9%)
  compress    airlift_snappy          calgary/book2                  334,111   357.4MB/s ±    34.5MB/s ( 9.64%) (N = 3, α = 99.9%)
  compress    airlift_snappy_native   calgary/book2                  334,941   529.0MB/s ±   139.7MB/s (26.41%) (N = 3, α = 99.9%)
  compress    airlift_zstd            calgary/book2                  205,814   149.4MB/s ±    49.8MB/s (33.30%) (N = 3, α = 99.9%)
  compress    airlift_zstd_native     calgary/book2                  203,941   236.8MB/s ±    63.9MB/s (26.98%) (N = 3, α = 99.9%)
  decompress  airlift_lz4             calgary/book2                  333,498  2713.4MB/s ±   616.6MB/s (22.73%) (N = 3, α = 99.9%)
  decompress  airlift_lz4_native      calgary/book2                  333,498  3553.0MB/s ±   959.0MB/s (26.99%) (N = 3, α = 99.9%)
  decompress  airlift_snappy          calgary/book2                  334,111   735.0MB/s ±    26.7MB/s ( 3.64%) (N = 3, α = 99.9%)
  decompress  airlift_snappy_native   calgary/book2                  334,941  2225.0MB/s ±   105.1MB/s ( 4.72%) (N = 3, α = 99.9%)
  decompress  airlift_zstd            calgary/book2                  205,814   817.0MB/s ±    16.6MB/s ( 2.04%) (N = 3, α = 99.9%)
  decompress  airlift_zstd_native     calgary/book2                  203,941  1115.3MB/s ±   169.5MB/s (15.19%) (N = 3, α = 99.9%)
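
To put the native and pure-Java rows side by side, here is a small sketch that divides each native throughput by its Java counterpart. The figures are copied from the table above; the class and method names are hypothetical. With N = 3 and wide error margins, the ratios are rough indications only.

```java
// Hypothetical helper for reading the benchmark table; not part of the PR.
public final class SpeedupSketch
{
    private SpeedupSketch() {}

    // Ratio of native throughput to pure-Java throughput (both in MB/s).
    public static double speedup(double nativeMbPerSec, double javaMbPerSec)
    {
        return nativeMbPerSec / javaMbPerSec;
    }

    public static void main(String[] args)
    {
        // calgary/book2, mean throughput values from the table above
        System.out.printf("compress   lz4:    %.2fx%n", speedup(461.9, 375.5));
        System.out.printf("compress   snappy: %.2fx%n", speedup(529.0, 357.4));
        System.out.printf("compress   zstd:   %.2fx%n", speedup(236.8, 149.4));
        System.out.printf("decompress lz4:    %.2fx%n", speedup(3553.0, 2713.4));
        System.out.printf("decompress snappy: %.2fx%n", speedup(2225.0, 735.0));
        System.out.printf("decompress zstd:   %.2fx%n", speedup(1115.3, 817.0));
    }
}
```

On these means, Snappy decompression shows the largest gap (about 3x); the other pairs fall roughly between 1.2x and 1.6x.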

Contributor

@wendigo wendigo left a comment

LGTM

@dain dain force-pushed the native-compression branch 2 times, most recently from a51471a to 62ef599 Compare June 28, 2024 05:06
@wendigo
Contributor

wendigo commented Jun 28, 2024

Can you move the code to the io.airlift.compressor package before merging, so it doesn't clash with the old aircompressor? That way we will be able to add a bridge that glues the two APIs together.

@dain dain force-pushed the native-compression branch from 3d47864 to 69849dc Compare June 29, 2024 02:13
@electrum
Member

electrum commented Jul 1, 2024

> you can specify the JAR-file manifest attribute Enable-Native-Access: ALL-UNNAMED in an executable JAR to enable warning-free use by all code on the class path

I think the key wording here is "an executable JAR". So this only works for the application's JAR being run with java -jar foo.jar. Otherwise, if any JAR could set this, it wouldn't protect the application (which is their goal with all of this).
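
As a concrete illustration of the attribute under discussion, here is a sketch that builds such a manifest with `java.util.jar.Manifest`. The main class `com.example.Main` and the class name `NativeAccessManifest` are hypothetical. Per the documentation quoted above, the attribute only has an effect in the executable JAR launched with `java -jar`, not in a library JAR on the class path.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical illustration class; not part of the PR.
public final class NativeAccessManifest
{
    private NativeAccessManifest() {}

    // Builds the manifest an application (executable) JAR would carry.
    public static Manifest create(String mainClass)
    {
        Manifest manifest = new Manifest();
        Attributes attributes = manifest.getMainAttributes();
        // Manifest-Version must be set, or Manifest.write emits nothing.
        attributes.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attributes.put(Attributes.Name.MAIN_CLASS, mainClass);
        attributes.putValue("Enable-Native-Access", "ALL-UNNAMED");
        return manifest;
    }

    // Renders the manifest as it would appear in META-INF/MANIFEST.MF.
    public static String render(Manifest manifest)
    {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            manifest.write(out);
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString(StandardCharsets.UTF_8);
    }

    public static void main(String[] args)
    {
        System.out.print(render(create("com.example.Main")));
    }
}
```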

@electrum
Member

electrum commented Jul 1, 2024

Typo in commit message Add MessageSegment support ... should be MemorySegment


private static String getPlatform()
{
String name = System.getProperty("os.name");
Member

I actually like this idea. We could convert the OS names to macos and linux, which would make the directories cleaner:

String name = System.getProperty("os.name");
name = switch (name) {
    case "Linux" -> "linux";
    case "Mac OS X" -> "macos";
    default -> throw new LinkageError("Unsupported OS platform: " + name);
};

This will require changing the previous commit that downloads the native libraries.

README.md (7 outdated threads, resolved)
@dain dain force-pushed the native-compression branch from 69849dc to 1618e85 Compare July 2, 2024 04:18
@dain dain merged commit 3b4a4b1 into airlift:master Jul 2, 2024
1 check passed
@dain dain deleted the native-compression branch July 2, 2024 05:10
4 participants