Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-10477. Make Rocksdb tools native lib compatible with all chipset with the same arch #6341

Merged
merged 2 commits into from
Mar 21, 2024

Conversation

swamirishi
Copy link
Contributor

@swamirishi swamirishi commented Mar 6, 2024

What changes were proposed in this pull request?

While building rocksdb tools lib, makefile of rocksdb adds native optmization gcc flags when compiling rocksdb code. This makes the built code incompatible with other chipsets, where it fails with illegal instruction when loading the native lib on ozone manager jvm.

Setting PORTABLE=1 while building rocksdb tools will fix this particular issue, and also adding -O0 flag while compiling jni on ozone will prevent the same on ozone jnilib side.
https://github.com/facebook/rocksdb/blob/v7.7.3/INSTALL.md#:~:text=By%20default%20the,make%20static_lib.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-10477

How was this patch tested?

Successful build and checking the native lib manually on various instance

@swamirishi swamirishi requested review from smengcl, kerneltime, umamaheswararao, hemantk-12 and swagle and removed request for swagle March 6, 2024 18:41
Copy link
Contributor

@prashantpogde prashantpogde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. LGTM.

Copy link
Contributor

@swagle swagle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -22,8 +22,8 @@

cmake_minimum_required(VERSION 2.8)
add_definitions(-D_GLIBCXX_USE_CXX11_ABI=0)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fPIC")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why no optimization at all? This is different that specifying the instruction sets. Can you look into add_compile_options(-msse2) ? I am not sure no optimization == chipset instructions to be used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not really aware of this, but let me check this out and get back on this.

@prashantpogde
Copy link
Contributor

should we force USE_SSE=0 or -DFORCE_SSE42=OFF ?

@swamirishi
Copy link
Contributor Author

should we force USE_SSE=0 or -DFORCE_SSE42=OFF ?
This is off by default
Makefile doesn’t set USE_SSE by default. In the code the it checks if it has been set https://github.com/facebook/rocksdb/blob/eb9a80fe1f18017b4d7f4084e8f2554f12234822/build_tools/build_detect_platform#L663

@kerneltime
Copy link
Contributor

SSE2 is pretty common based on the documentation in that makefile

  # Includes POPCNT for BitsSetToOne, BitParity
  TRY_SSE42="-msse4.2"

@kerneltime
Copy link
Contributor

@swamirishi the main issue is due to the use of AVX related instructions, we can skip those explicitly and we can target an instruction set that is more common.

@@ -235,6 +235,7 @@
<arg line="${project.build.directory}/rocksdb/rocksdb-${rocksdb.version}"/>
</exec>
<exec executable="make" dir="${project.build.directory}/rocksdb/rocksdb-${rocksdb.version}" failonerror="true">
<arg line="PORTABLE=1"/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RocksDB does not use PORTABLE=0 so there is still some information missing as to what is the build setup.
For now pushing this in makes sense as we are seeing crashes for missing AVX4 but we still need to continue looking into what is the lowest common set of instruction set the build should support or at least quantify the performance impact. @swamirishi can you open a follow up Jira? Also, how was this change tested?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .so from the publicly distributed rocksdbjni-7.7.3.jar doesn't seem to be AVX or AVX2 optimized:

$ elfx86exts librocksdbjni-linux64.so 
File format and CPU architecture: Elf, X86_64
MODE64 (call)
CMOV (cmovbe)
SSE1 (movups)
SSE2 (movdqa)
BMI (tzcnt)
BMI2 (shlx)
Instruction set extensions used: BMI, BMI2, CMOV, MODE64, SSE1, SSE2
CPU Generation: Haswell

@kerneltime
Copy link
Contributor

@swamirishi we can merge this in once you specify the testing done for this.

@swamirishi
Copy link
Contributor Author

swamirishi commented Mar 20, 2024

@swamirishi we can merge this in once you specify the testing done for this.

I tested out this particular patch by building the code in an environment which supports avx2. I built the shared library and tried to load the library on another machine which doesn't support this instruction set without the flag PORTABLE=1 and with the flag PORTABLE=1.
The library load failed with the error in the first run:

Screen Shot 2024-03-20 at 15 15 19

The second time the load succeeded.

@smengcl
Copy link
Contributor

smengcl commented Mar 20, 2024

@swamirishi we can merge this in once you specify the testing done for this.

I tested out this particular patch by building the code in an environment which supports avx2. I built the shared library and tried to load the library on another machine which doesn't support this instruction set without the flag PORTABLE=1 and with the flag PORTABLE=1. The library load failed with the error in the first run:

Screen Shot 2024-03-20 at 15 15 19 The second time the load succeeded.

Yup fyi PORTABLE flag is used here in CMakeLists.txt to disable avx/avx2 optimization attempts here:

https://github.com/facebook/rocksdb/blob/v7.7.3/CMakeLists.txt#L252-L277

@swamirishi swamirishi merged commit e3a7224 into apache:master Mar 21, 2024
19 checks passed
smitajoshi12 pushed a commit to smitajoshi12/ozone that referenced this pull request Mar 27, 2024
xichen01 pushed a commit to xichen01/ozone that referenced this pull request Apr 17, 2024
xichen01 pushed a commit to xichen01/ozone that referenced this pull request Apr 18, 2024
xichen01 pushed a commit to xichen01/ozone that referenced this pull request Apr 23, 2024
swamirishi added a commit to swamirishi/ozone that referenced this pull request Jun 10, 2024
… all chipset with the same arch (apache#6341)

(cherry picked from commit e3a7224)
Change-Id: If5ff48cc5d38f7a0256015e5731675be8a4ed9a4
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants