[Meta][Discussion] k-NN lib building process next approaches to continue support AL2 after CentOS7 deprecation #4687

Closed
peterzhuamazon opened this issue May 6, 2024 · 12 comments

Comments

@peterzhuamazon
Member

peterzhuamazon commented May 6, 2024

[Discussion] k-NN lib building process next approaches

Background

The k-NN plugin uses a couple of JNI libs that depend on external sources, namely nmslib and faiss. These libs are written in native code, and they have dependencies on the glibc version of the Linux operating system (OS) where the code is compiled.
https://github.com/opensearch-project/k-NN/tree/main/jni/external

Since the first public release of the OpenSearch Project, the build team has used CentOS7 as the base OS for both the OpenSearch and Dashboards build images. CentOS7 includes glibc version 2.17, which serves as the baseline version. JNI libs compiled on CentOS7 load correctly, without crashing on k-NN-specific API calls, on any Linux OS with a glibc version >= 2.17, for example Ubuntu 20.04 (glibc 2.31) and AmazonLinux2 (glibc 2.26).
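
One way to check which glibc version a compiled library actually needs is to look at the versioned symbols it imports (for example, in the output of `objdump -T`). The following is a minimal sketch, assuming the dump text has already been captured as a string. The function names are illustrative and are not part of the k-NN build scripts.

```python
import re

# Matches versioned glibc symbol tags such as GLIBC_2.17 or GLIBC_2.2.5.
# Tags like GLIBCXX_3.4 or GLIBC_PRIVATE are intentionally not matched.
_GLIBC_TAG = re.compile(r"GLIBC_(\d+(?:\.\d+)+)")


def required_glibc(symbol_dump: str):
    """Return the highest GLIBC_x.y version referenced in the dump, or None."""
    versions = [
        tuple(int(part) for part in match.split("."))
        for match in _GLIBC_TAG.findall(symbol_dump)
    ]
    return max(versions) if versions else None


def can_load(host_glibc: str, symbol_dump: str) -> bool:
    """True if the host glibc is at least the version the library needs."""
    needed = required_glibc(symbol_dump)
    host = tuple(int(part) for part in host_glibc.split("."))
    return needed is None or host >= needed
```

Versions are compared as integer tuples rather than strings, so that 2.17 sorts above 2.2.5.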

In 2023, before the OpenSearch 2.10.0 release, the build team began deprecating CentOS7, which reaches its EOL on 2024/06/30, as the OpenSearch build image. Another reason was that we had started migrating Dashboards from Node.js 14 to 18, which requires glibc version 2.28. After discussions with both the project and the k-NN team, we decided to switch only the Dashboards build image from CentOS7 to RockyLinux8 (later AlmaLinux8), keeping the OpenSearch build image unchanged for the time being.

This decision was made to ensure continued support for a wider range of operating systems running the k-NN plugin without compatibility issues. If we had switched OpenSearch from CentOS7 to RockyLinux8 at that time, the baseline glibc version would have increased from 2.17 to 2.28. Any Linux OS with a glibc version < 2.28 (such as AmazonLinux2 with glibc 2.26) would then no longer be able to make k-NN API calls, and could potentially crash its cluster if a user issued such calls.
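
The compatibility rule behind this decision reduces to a version comparison. Here is a minimal sketch using only the glibc versions quoted in this issue. The dictionary and function names are illustrative.

```python
# glibc versions as quoted in this issue.
HOST_GLIBC = {
    "CentOS7": (2, 17),
    "AmazonLinux2": (2, 26),
    "RockyLinux8": (2, 28),
    "Ubuntu 20.04": (2, 31),
}


def unsupported_hosts(build_image: str) -> list:
    """Hosts whose glibc is older than the build image's glibc baseline."""
    baseline = HOST_GLIBC[build_image]
    return sorted(name for name, ver in HOST_GLIBC.items() if ver < baseline)
```

Building on CentOS7 excludes none of the listed hosts, while building on RockyLinux8 excludes both AmazonLinux2 and CentOS7.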

Current Situation

A CentOS7 deprecation notice has been included in the release notes since 2.12.0: https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.12.0.md#deprecation-notice

As we approach the release of OpenSearch 2.14.0 in 2024/05, with CentOS7 reaching EOL in just two months, we have revisited the discussion with the project and the k-NN team about switching from CentOS7 to RockyLinux8 (later AlmaLinux8). The decision remains the same: keep CentOS7 for the 2.14.0 release, but set up a plan to transition from CentOS7 to AmazonLinux2 instead, either fully or partially, for the compilation of the k-NN JNI libs. This decision is based on the fact that AmazonLinux2 has a lower glibc version than RockyLinux8/AlmaLinux8, and AmazonLinux2 still has another year of support, until 2025/06/30.

Approaches

Option 1️⃣: Fully switch from CentOS7 to AmazonLinux2 in 2.15.0, before switching to AlmaLinux8 on 2025/06/30.

  • Pros:
    • Maintains k-NN support on AmazonLinux2 until its EOL.
    • Provides a one-year transition period for users to migrate from AmazonLinux2 or other OSes with glibc < 2.28 to other supported OSes early on.
    • Requires minimal changes from the build team side. More confirmation is needed from the k-NN team regarding their efforts.
  • Cons:
    • The k-NN JNI lib building process is closely coupled with the OpenSearch distribution build process.
    • Previous attempts by the build team to use AmazonLinux2 instead of CentOS7 encountered issues with packages such as gfortran installation and others. Further research is needed to confirm if AmazonLinux2 is a suitable option.
    • Requires another switch from AmazonLinux2 to AlmaLinux8 due to AmazonLinux2's deprecation in one year.
    • AmazonLinux2 does not support architectures outside of x64 and arm64.

Option 2️⃣ (Recommended): Switch from CentOS7 to AlmaLinux8 in 2.15.0 for the distribution build pipeline, but separate the k-NN JNI libs compilation into its own stage/pipeline and use AmazonLinux2 to compile them for a year.

  • Pros:
    • Has all the pros of Option 1.
    • The JNI lib process is now standalone and independent of the distribution build process, with more flexibility.
    • AlmaLinux8 supports more architectures beyond x64 and arm64, such as ppc64le.
  • Cons:
    • Has all the cons related to AmazonLinux2 from Option 1.
    • Requires more work from the build team and complicates the distribution build pipeline.
    • Jenkins also has limits on the number of stages/file sizes in pipeline files, with the distribution build pipeline Jenkinsfile already at the limit.

Option 3️⃣: Copy the compiled JNI lib files from the 2.14.0 release into an archive and store it on S3. Disable the compilation process and replace it with downloading the archive for future releases for a year, until Amazon Linux 2 is deprecated.

  • Pros:
    • Simple change for both the build team and k-NN team.
    • Provides wider compatibility support until Amazon Linux 2 is deprecated.
    • May speed up the k-NN plugin building process by removing the JNI lib compilation step.
  • Cons:
    • The k-NN team cannot update JNI lib external dependencies or source code tags for a year, meaning no new features or bug fixes from upstream changes.
    • Process lacks transparency, as users are unclear about where the lib is coming from or how it is compiled.
    • Requires manual setup to create the archive, which is not a sustainable long-term solution.

Option 4️⃣: Move directly to AlmaLinux8 in 2.15.0, without staying on CentOS7 or making an interim switch to AmazonLinux2.

  • Pros:
    • AlmaLinux8 supports more architectures beyond x64 and arm64, such as ppc64le.
    • Minimal effort required from the team for the change.
  • Cons:
    • No k-NN support on AL2 starting with 2.15.0.

Please let us know your thoughts on this, and vote for the approaches you agree with.

Thanks,
Peter

@peterzhuamazon peterzhuamazon self-assigned this May 6, 2024
@github-actions github-actions bot added the untriaged Issues that have not yet been triaged label May 6, 2024
@peterzhuamazon peterzhuamazon removed the untriaged Issues that have not yet been triaged label May 6, 2024
@peterzhuamazon
Member Author

peterzhuamazon commented May 6, 2024

@jmazanec15
Member

@peterzhuamazon option 3 is a non-starter, as these libs will change for sure.

Option 2 sounds good, but would need to verify feasibility for compiling on AL2. I remember issues on arm.

@navneet1v
Contributor

@peterzhuamazon can you also add another option where we move directly to AlmaLinux, without any interim switch to AL2?

@peterzhuamazon
Member Author

@peterzhuamazon can you also add another option where we directly move to Almalinux without doing any switch in between to AL2.

Added now, thanks.

@dblock
Member

dblock commented May 20, 2024

The JNI bindings don't change often and rebuilding the code every time means that old platforms cannot be supported easily. There's possibly a cleaner option that would move the problem to the k-nn repo and could enable more platforms, including legacy ones.

  1. Add support to knn for loading multiple versions of the nmslib/faiss .so files, e.g. linux/nmslib.so, win32/nmslib.so, glibc-2.31/nmslib.so, etc.
  2. Separate the build pipeline for those binaries. They can be built on GHA or Jenkins infrastructure for OpenSearch, or anywhere else.
  3. Commit the binaries into the knn repo. This should be ideally done by build automation so that no human is involved to avoid supply chain attacks.
  4. Package all the binaries inside the JAR, attempt to extract it at runtime if it doesn't exist.
  5. For the full distribution, include the binaries in the .tar.gz/.zip.
  6. Choose the path of the binary at runtime.

This is roughly what JNA does for the many cross-platform binaries.
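
Step 6 above (choosing the binary at runtime) could look roughly like the following. This is a language-neutral sketch written in Python rather than the plugin's Java. The `glibc-<version>/` directory layout follows the example in step 1, and the function name is hypothetical.

```python
def pick_library_dir(host_glibc: str, available: list):
    """Pick the newest glibc-specific build that the host can still load.

    `available` holds the glibc baselines that prebuilt binaries exist for,
    e.g. ["2.17", "2.28"]. Returns a directory name, or None if no build fits.
    """
    def parse(version: str):
        return tuple(int(part) for part in version.split("."))

    host = parse(host_glibc)
    # A build is usable only if its baseline is not newer than the host glibc.
    usable = [v for v in available if parse(v) <= host]
    if not usable:
        return None
    return "glibc-" + max(usable, key=parse)
```

For example, a host with glibc 2.26 and prebuilt baselines 2.17 and 2.28 would get `glibc-2.17`, while a host with glibc 2.31 would get `glibc-2.28`. How the host version is detected is left out here and would depend on the runtime.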

@prudhvigodithi
Collaborator

3. Commit the binaries into the knn repo.

Instead of committing into the knn repo we should consider uploading the libraries to central maven. Looks to me like maven supports .so files with <type>so</type>. Here is an example.

@dblock
Member

dblock commented May 21, 2024

  3. Commit the binaries into the knn repo.

Instead of committing into the knn repo we should consider uploading the libraries to central maven. Looks to me like maven supports .so files with <type>so</type>. Here is an example.

I like that even better because it creates more traceability!

@peterzhuamazon
Member Author

Newer options:

  1. Move to AlmaLinux8 and drop CentOS7 / AL2.
  2. Move to AlmaLinux8 but keep the knnlib build specifically on CentOS7.
  3. Keep using CentOS7 for everything for another year, until AL2 is deprecated.
  4. Keep using AL2 for everything for another year, until AL2 is deprecated (still testing, not confirmed).

@peterzhuamazon
Member Author

peterzhuamazon commented May 29, 2024

We are still pushing for an AL2 compilation fix.

Current goals:

  • Get knn support for AL2 in 2025
  • Get knn build process standalone and fine-tuned.

@peterzhuamazon
Member Author

Just synced up with @junqiu-lei again. Updates:

  • Junqiu will send PR to update the dockerfile with 0.3.27 openblas changes
  • Junqiu will test in knn repo whether build script/cmake.txt needs to be updated accordingly
  • Peter will merge Junqiu's PR and build another copy of the AL2 images
  • Peter will trigger a 2.15.0 build with the currently available plugins for testing (need to confirm whether knn has already updated 2.x to 2.15.0)
  • Peter will update get image workflow to push all related repos to use AL2 docker image for their github ci checks
  • Peter will update the compatibility chart of the website to reflect changes.

Thanks.

@peterzhuamazon peterzhuamazon changed the title [Discussion] k-NN lib building process next approaches [Discussion] k-NN lib building process next approaches to continue support AL2 after CentOS7 deprecation May 29, 2024
@peterzhuamazon
Member Author

peterzhuamazon commented May 30, 2024

Next step:

  1. Junqiu confirms that knnlib compiles on AL2 and runs on both x64 and arm64: [Deprecation] Properly deprecate CentOS7 as CI build image/Supported OS and switch to Almalinux8 #4379 (comment)
  2. Peter starts looking at [Enhancement] Move knn lib building process to its standalone stage or pipeline #4737 and whether we can bring it up for 2.15.0.

@peterzhuamazon peterzhuamazon changed the title [Discussion] k-NN lib building process next approaches to continue support AL2 after CentOS7 deprecation [Meta][Discussion] k-NN lib building process next approaches to continue support AL2 after CentOS7 deprecation Jun 3, 2024
@peterzhuamazon
Member Author

peterzhuamazon commented Jul 1, 2024

Will use #4737 as the main discussion thread.

Thanks.

Projects
Status: ✅ Done

5 participants