Improve dependency install and build #10920

Closed
wants to merge 7 commits into from
2 changes: 2 additions & 0 deletions .github/workflows/macos.yml
@@ -54,6 +54,7 @@ jobs:
CCACHE_DIR: '${{ github.workspace }}/.ccache'
# The arm runners have only 7GB RAM
BUILD_TYPE: "${{ matrix.os == 'macos-14' && 'Release' || 'Debug' }}"
INSTALL_PREFIX: "/tmp/deps-install"
steps:
- name: Checkout
uses: actions/checkout@v4
@@ -66,6 +67,7 @@ jobs:
source scripts/setup-macos.sh
install_build_prerequisites
install_velox_deps_from_brew
install_double_conversion

echo "NJOBS=`sysctl -n hw.ncpu`" >> $GITHUB_ENV
brew unlink protobuf || echo "protobuf not installed"
2 changes: 2 additions & 0 deletions .gitignore
@@ -49,6 +49,8 @@ projects/*
!projects/*.*
!projects/Makefile
.venv
deps-install
deps-download

#==============================================================================#
# Autotools artifacts
8 changes: 8 additions & 0 deletions CMakeLists.txt
@@ -45,6 +45,14 @@ if(DEFINED ENV{CONDA_PREFIX})
endif()
endif()

if(DEFINED ENV{INSTALL_PREFIX})
message(STATUS "Dependency install directory set to: $ENV{INSTALL_PREFIX}")
list(APPEND CMAKE_PREFIX_PATH "$ENV{INSTALL_PREFIX}")
# Allow installed package headers to be picked up before brew/system package
# headers
include_directories(BEFORE "$ENV{INSTALL_PREFIX}/include")
Collaborator

I don't love that. I think adding this on a per-target basis to all targets with BEFORE (via velox_add_library) should also do the job. I am pretty sure this will not take precedence over default system include paths, as they don't even appear in the compiler invocation, but it should work for any added brew paths.

Collaborator

this will not take precedence over default system include paths, as they don't even appear in the compiler invocation

Ah, actually not true. According to the GCC docs, default system paths are searched last, so this should also work in non-brew cases (or linked brew installs):

  1. For the quote form of the include directive, the directory of the current file is searched first.
  2. For the quote form of the include directive, the directories specified by -iquote options are searched in left-to-right order, as they appear on the command line.
  3. Directories specified with -I options are scanned in left-to-right order.
  4. Directories specified with -isystem options are scanned in left-to-right order.
  5. Standard system directories are scanned.
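
The left-to-right rule in step 3 can be illustrated with a toy shell model. This is only a sketch: `first_include_hit` is a hypothetical helper that mimics how an ordered list of `-I` directories is scanned. It is not how GCC is implemented, and it ignores the quote-form, `-iquote` and `-isystem` steps.

```shell
#!/usr/bin/env bash
# Toy model of step 3: scan the given directories left to right and print
# the first one that contains the requested header.
# first_include_hit is a hypothetical helper, not part of GCC or Velox.
first_include_hit() {
  local header="$1"
  shift
  local dir
  for dir in "$@"; do
    if [[ -f "${dir}/${header}" ]]; then
      echo "${dir}"
      return 0
    fi
  done
  # No directory in the list provides the header.
  return 1
}
```

For example, `first_include_hit fmt/core.h "$INSTALL_PREFIX/include" /opt/homebrew/include` prints the `INSTALL_PREFIX` directory whenever both locations contain the header, which is the effect `include_directories(BEFORE ...)` is meant to achieve.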

Collaborator Author

@majetideepak majetideepak Sep 10, 2024

@assignUser I agree that per target is cleaner. But INSTALL_PREFIX also has the dependencies installed via the setup script. There seems to be no standard way to get the INCLUDE_DIR path for each dependency. Some packages like fmt add INTERFACE_INCLUDE_DIRECTORIES to the target (fmt::fmt) properties; folly, on the other hand, defines the FOLLY_INCLUDE_DIRS variable. Protobuf and CUDA currently call include_directories after resolving the dependency. Should we do that for all the dependencies and add BEFORE?

Collaborator

Ah, you mean for the compilation of the dependencies themselves, not when we are using them. Hm, yeah, include_directories seems to be the best way to do that due to its global nature... though if we have INSTALL_PREFIX, are we even building any deps from source that are not self-contained/header-only?

Collaborator Author

I meant when Velox is using the dependencies. I thought about this further and removed the global include_directories for INSTALL_PREFIX. This will only work if we always install dependencies. If some of the dependencies come from brew, this could cause issues if an older version of a package is present in INSTALL_PREFIX.

I added this to fix the fmt header issue. I now explicitly get the fmt include path and add it to include_directories with the BEFORE option. That is what you were suggesting too. Can you take another look?

Collaborator

This will only work if we always install dependencies.

That was my assumption: INSTALL_PREFIX exists == all deps installed there. And in that case the solution with include_directories (or per target) would work well.

If some of the dependencies come from brew, this could cause issues if an older version of a package is present in INSTALL_PREFIX.

Hm, maybe, but that also seems like a mis-configuration then? Either use the install prefix with all required deps or don't?

I think I liked the previous version more but 🤷 no strong preferences

Collaborator Author

@majetideepak majetideepak Sep 17, 2024

Hm, maybe, but that also seems like a mis-configuration then?

This is a good reason to use the previous approach. I reverted to that :)

Contributor

@zuyu zuyu Sep 21, 2024

This CMake statement for the header search order breaks the duckdb build, as fmt (a custom copy) and duckdb re2 (using a different namespace) now search for headers in ${INSTALL_PREFIX} instead of their local directories, as mentioned in #11058.

It also seems to be causing a build issue for Arrow, mentioned in #11052.

We might need to think of a better way to handle the header include order, if it is really needed.
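
One way to spot candidates for this kind of shadowing is to list the headers that exist both under an install prefix and inside a bundled copy. The following is only a diagnostic sketch under that assumption: `list_shadowed_headers` is a hypothetical helper, not part of Velox, and a match only means the header could be picked up from the wrong place, depending on how the sources write their `#include` lines.

```shell
#!/usr/bin/env bash
# list_shadowed_headers is a hypothetical diagnostic helper.
# Arguments: <prefix include root> <bundled include root>
# Prints the relative path of every header found under both roots.
list_shadowed_headers() {
  local prefix_include="$1"
  local bundled_include="$2"
  local header
  # Enumerate headers of the bundled copy as paths relative to its root.
  (cd "${bundled_include}" && find . -type f \( -name '*.h' -o -name '*.hpp' \)) |
    sed 's|^\./||' |
    sort |
    while IFS= read -r header; do
      # Report the header if the prefix provides a file at the same path.
      if [[ -f "${prefix_include}/${header}" ]]; then
        echo "${header}"
      fi
    done
}
```

The first argument would typically be `${INSTALL_PREFIX}/include`, and the second the include root of the bundled library being checked.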

endif()

list(PREPEND CMAKE_MODULE_PATH "${PROJECT_SOURCE_DIR}/CMake"
"${PROJECT_SOURCE_DIR}/CMake/third-party")

24 changes: 16 additions & 8 deletions README.md
@@ -85,19 +85,27 @@ dependencies for a given platform.
### Setting up dependencies

The following setup scripts use the `DEPENDENCY_DIR` environment variable to set the
location of the build packages. If you do not set this variable, it will default to
the current working directory.
location to download and build packages. This defaults to `deps-download` in the current
working directory.

```shell
$ export DEPENDENCY_DIR=/path/to/your/dependencies
```
Use `INSTALL_PREFIX` to set the install directory of the packages. This defaults to
`deps-install` in the current working directory on macOS and to the default install
location (e.g. `/usr/local`) on Linux.
Using the default install location `/usr/local` on macOS is discouraged since this
location is used by certain Homebrew versions.

Manually add the `INSTALL_PREFIX` value in the IDE or bash environment,
say `export INSTALL_PREFIX=/Users/$USERNAME/velox/deps-install` to `~/.zshrc` so that
subsequent Velox builds can use the installed packages.

*You can reuse `DEPENDENCY_DIR` and `INSTALL_PREFIX` for Velox clients such as Prestissimo
by specifying a common shared directory.*
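
The defaults described above can be sketched as two small shell functions. This is an illustration only: the function names `default_dependency_dir` and `default_install_prefix` are hypothetical and do not exist in the setup scripts, and the Linux fallback assumes `/usr/local` as stated above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that mirror the documented defaults.

# DEPENDENCY_DIR falls back to deps-download in the current directory.
default_dependency_dir() {
  echo "${DEPENDENCY_DIR:-$(pwd)/deps-download}"
}

# INSTALL_PREFIX wins when set. Otherwise macOS uses deps-install in the
# current directory and other systems use /usr/local.
# The optional argument overrides $OSTYPE, which is handy for testing.
default_install_prefix() {
  local ostype="${1:-$OSTYPE}"
  if [[ -n "${INSTALL_PREFIX:-}" ]]; then
    echo "${INSTALL_PREFIX}"
  elif [[ "${ostype}" == darwin* ]]; then
    echo "$(pwd)/deps-install"
  else
    echo "/usr/local"
  fi
}
```

Because both defaults depend on the current working directory, exporting explicit absolute paths is the more predictable choice when several projects share one dependency directory.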

### Setting up on macOS

On a MacOS machine (either Intel or Apple silicon) you can setup and then build like so:
On a macOS machine (either Intel or Apple silicon) you can set up and then build like so:

```shell
$ export INSTALL_PREFIX=/Users/$USERNAME/velox/velox_dependency_install
$ ./scripts/setup-macos.sh
$ make
```
@@ -136,7 +144,7 @@ $ ./scripts/setup-adapters.sh
$ make
```

Note that `setup-adapters.sh` supports MacOS and Ubuntu 20.04 or later.
Note that `setup-adapters.sh` supports macOS and Ubuntu 20.04 or later.

### Using Clang on Linux

1 change: 1 addition & 0 deletions scripts/adapters.dockerfile
@@ -15,6 +15,7 @@
ARG image=ghcr.io/facebookincubator/velox-dev:centos9
FROM $image

COPY scripts/setup-helper-functions.sh /
COPY scripts/setup-adapters.sh /
RUN mkdir build && ( cd build && source /opt/rh/gcc-toolset-12/enable && \
bash /setup-adapters.sh ) && rm -rf build && dnf remove -y conda && dnf clean all
60 changes: 38 additions & 22 deletions scripts/setup-adapters.sh
@@ -21,10 +21,14 @@ set -eufx -o pipefail

SCRIPTDIR=$(dirname "${BASH_SOURCE[0]}")
source $SCRIPTDIR/setup-helper-functions.sh
DEPENDENCY_DIR=${DEPENDENCY_DIR:-$(pwd)}
DEPENDENCY_DIR=${DEPENDENCY_DIR:-$(pwd)/deps-download}
CMAKE_BUILD_TYPE="${BUILD_TYPE:-Release}"
MACHINE=$(uname -m)

if [[ "$OSTYPE" == darwin* ]]; then
export INSTALL_PREFIX=${INSTALL_PREFIX:-"$(pwd)/deps-install"}
fi

function install_aws_deps {
local AWS_REPO_NAME="aws/aws-sdk-cpp"
local AWS_SDK_VERSION="1.11.321"
@@ -40,14 +44,16 @@ function install_aws_deps {
MINIO_ARCH="amd64"
fi
local MINIO_BINARY="minio-2022-05-26"
local MINIO_OS="linux"
if [[ "$OSTYPE" == darwin* ]]; then
# minio will have to approved under the Privacy & Security on MacOS on first use.
MINIO_OS="darwin"
if [[ ! -f /usr/local/bin/${MINIO_BINARY} ]]; then
local MINIO_OS="linux"
if [[ "$OSTYPE" == darwin* ]]; then
# minio will have to be approved under Privacy & Security on macOS on first use.
MINIO_OS="darwin"
fi
wget https://dl.min.io/server/minio/release/${MINIO_OS}-${MINIO_ARCH}/archive/minio.RELEASE.2022-05-26T05-48-41Z -O ${MINIO_BINARY}
chmod +x ./${MINIO_BINARY}
mv ./${MINIO_BINARY} /usr/local/bin/
fi
wget https://dl.min.io/server/minio/release/${MINIO_OS}-${MINIO_ARCH}/archive/minio.RELEASE.2022-05-26T05-48-41Z -O ${MINIO_BINARY}
chmod +x ./${MINIO_BINARY}
mv ./${MINIO_BINARY} /usr/local/bin/
}

function install_gcs-sdk-cpp {
@@ -117,41 +123,51 @@ function install_azure-storage-sdk-cpp {
sed -i "s/\"version-string\"/\"builtin-baseline\": \"$vcpkg_commit_id\",\"version-string\"/" $azure_core_dir/vcpkg.json
sed -i "s/\"version-string\"/\"overrides\": [{ \"name\": \"openssl\", \"version-string\": \"$openssl_version\" }],\"version-string\"/" $azure_core_dir/vcpkg.json
fi
cmake_install $azure_core_dir -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE} -DBUILD_SHARED_LIBS=OFF

(
cd $azure_core_dir
cmake_install -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE} -DBUILD_SHARED_LIBS=OFF
)
# install azure-storage-common
cmake_install sdk/storage/azure-storage-common -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE} -DBUILD_SHARED_LIBS=OFF
(
cd sdk/storage/azure-storage-common
cmake_install -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE} -DBUILD_SHARED_LIBS=OFF
)

# install azure-storage-blobs
cmake_install sdk/storage/azure-storage-blobs -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE} -DBUILD_SHARED_LIBS=OFF

(
cd sdk/storage/azure-storage-blobs
cmake_install -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE} -DBUILD_SHARED_LIBS=OFF
)
# install azure-storage-files-datalake
cmake_install sdk/storage/azure-storage-files-datalake -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE} -DBUILD_SHARED_LIBS=OFF
(
cd sdk/storage/azure-storage-files-datalake
cmake_install -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE} -DBUILD_SHARED_LIBS=OFF
)
}

function install_hdfs_deps {
github_checkout apache/hawq master
libhdfs3_dir=$DEPENDENCY_DIR/hawq/depends/libhdfs3
libhdfs3_dir=hawq/depends/libhdfs3
if [[ "$OSTYPE" == darwin* ]]; then
sed -i '' -e "/FIND_PACKAGE(GoogleTest REQUIRED)/d" $libhdfs3_dir/CMakeLists.txt
sed -i '' -e "s/dumpversion/dumpfullversion/" $libhdfs3_dir/CMakeLists.txt
sed -i '' -e "/FIND_PACKAGE(GoogleTest REQUIRED)/d" $DEPENDENCY_DIR/$libhdfs3_dir/CMakeLists.txt
sed -i '' -e "s/dumpversion/dumpfullversion/" $DEPENDENCY_DIR/$libhdfs3_dir/CMakeLists.txt
fi

if [[ "$OSTYPE" == linux-gnu* ]]; then
sed -i "/FIND_PACKAGE(GoogleTest REQUIRED)/d" $libhdfs3_dir/CMakeLists.txt
sed -i "s/dumpversion/dumpfullversion/" $libhdfs3_dir/CMake/Platform.cmake
sed -i "/FIND_PACKAGE(GoogleTest REQUIRED)/d" $DEPENDENCY_DIR/$libhdfs3_dir/CMakeLists.txt
sed -i "s/dumpversion/dumpfullversion/" $DEPENDENCY_DIR/$libhdfs3_dir/CMake/Platform.cmake
# Dependencies for Hadoop testing
wget_and_untar https://archive.apache.org/dist/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz hadoop
cp -a hadoop /usr/local/
cp -a ${DEPENDENCY_DIR}/hadoop /usr/local/
wget -P /usr/local/hadoop/share/hadoop/common/lib/ https://repo1.maven.org/maven2/junit/junit/4.11/junit-4.11.jar

yum install -y java-1.8.0-openjdk-devel

fi
cmake_install $libhdfs3_dir
cmake_install_dir $libhdfs3_dir
}

cd "${DEPENDENCY_DIR}" || exit
(mkdir -p "${DEPENDENCY_DIR}") || exit
# aws-sdk-cpp missing dependencies

if [[ "$OSTYPE" == "linux-gnu"* ]]; then
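
The hunks above replace `cmake_install <dir> ...` with either a subshell that changes into the directory first or a call to `cmake_install_dir <dir> ...`. The helper definitions in `scripts/setup-helper-functions.sh` are not part of this diff, so the following is only a sketch of the pattern and may differ from the real helpers. `run_in_dependency_dir` is a hypothetical name, and the sketch assumes the directory argument is relative to `DEPENDENCY_DIR`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run a command inside ${DEPENDENCY_DIR}/<subdir>.
# This is not the actual cmake_install_dir from the Velox helper script.
run_in_dependency_dir() {
  local subdir="$1"
  shift
  (
    # The subshell keeps the caller's working directory unchanged.
    cd "${DEPENDENCY_DIR}/${subdir}" || exit 1
    "$@"
  )
}
```

A call such as `run_in_dependency_dir hawq/depends/libhdfs3 cmake_install -DCMAKE_BUILD_TYPE=Release` would then build from the checked-out sources without the caller having to `cd` first, which keeps the install functions independent of the directory the script was started from.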
75 changes: 38 additions & 37 deletions scripts/setup-centos9.sh
@@ -36,6 +36,8 @@ export CFLAGS=${CXXFLAGS//"-std=c++17"/} # Used by LZO.
CMAKE_BUILD_TYPE="${BUILD_TYPE:-Release}"
BUILD_DUCKDB="${BUILD_DUCKDB:-true}"
USE_CLANG="${USE_CLANG:-false}"
export INSTALL_PREFIX=${INSTALL_PREFIX:-"/usr/local"}
DEPENDENCY_DIR=${DEPENDENCY_DIR:-$(pwd)/deps-download}

FB_OS_VERSION="v2024.05.20.00"
FMT_VERSION="10.1.1"
@@ -85,19 +87,19 @@ function install_gflags {
# Remove an older version if present.
dnf remove -y gflags
wget_and_untar https://github.com/gflags/gflags/archive/v2.2.2.tar.gz gflags
cmake_install gflags -DBUILD_SHARED_LIBS=ON -DBUILD_STATIC_LIBS=ON -DBUILD_gflags_LIB=ON -DLIB_SUFFIX=64
cmake_install_dir gflags -DBUILD_SHARED_LIBS=ON -DBUILD_STATIC_LIBS=ON -DBUILD_gflags_LIB=ON -DLIB_SUFFIX=64
}

function install_glog {
wget_and_untar https://github.com/google/glog/archive/v0.6.0.tar.gz glog
cmake_install glog -DBUILD_SHARED_LIBS=ON
cmake_install_dir glog -DBUILD_SHARED_LIBS=ON
}

function install_lzo {
wget_and_untar http://www.oberhumer.com/opensource/lzo/download/lzo-2.10.tar.gz lzo
(
cd lzo
./configure --prefix=/usr --enable-shared --disable-static --docdir=/usr/share/doc/lzo-2.10
cd ${DEPENDENCY_DIR}/lzo
./configure --prefix=${INSTALL_PREFIX} --enable-shared --disable-static --docdir=/usr/share/doc/lzo-2.10
make "-j$(nproc)"
make install
)
@@ -106,36 +108,36 @@ function install_lzo {
function install_boost {
wget_and_untar https://github.com/boostorg/boost/releases/download/${BOOST_VERSION}/${BOOST_VERSION}.tar.gz boost
(
cd boost
cd ${DEPENDENCY_DIR}/boost
if [[ ${USE_CLANG} != "false" ]]; then
./bootstrap.sh --prefix=/usr/local --with-toolset="clang-15"
./bootstrap.sh --prefix=${INSTALL_PREFIX} --with-toolset="clang-15"
# Switch the compiler from the clang-15 toolset which doesn't exist (clang-15.jam) to
# clang of version 15 when toolset clang-15 is used.
# This reconciles the project-config.jam generation with what the b2 build system allows for customization.
sed -i 's/using clang-15/using clang : 15/g' project-config.jam
${SUDO} ./b2 "-j$(nproc)" -d0 install threading=multi toolset=clang-15 --without-python
else
./bootstrap.sh --prefix=/usr/local
./bootstrap.sh --prefix=${INSTALL_PREFIX}
${SUDO} ./b2 "-j$(nproc)" -d0 install threading=multi --without-python
fi
)
}

function install_snappy {
wget_and_untar https://github.com/google/snappy/archive/1.1.8.tar.gz snappy
cmake_install snappy -DSNAPPY_BUILD_TESTS=OFF
cmake_install_dir snappy -DSNAPPY_BUILD_TESTS=OFF
}

function install_fmt {
wget_and_untar https://github.com/fmtlib/fmt/archive/${FMT_VERSION}.tar.gz fmt
cmake_install fmt -DFMT_TEST=OFF
cmake_install_dir fmt -DFMT_TEST=OFF
}

function install_protobuf {
wget_and_untar https://github.com/protocolbuffers/protobuf/releases/download/v21.8/protobuf-all-21.8.tar.gz protobuf
(
cd protobuf
./configure --prefix=/usr
cd ${DEPENDENCY_DIR}/protobuf
./configure --prefix=${INSTALL_PREFIX}
make "-j${NPROC}"
make install
ldconfig
@@ -144,61 +146,60 @@ function install_protobuf {

function install_fizz {
wget_and_untar https://github.com/facebookincubator/fizz/archive/refs/tags/${FB_OS_VERSION}.tar.gz fizz
cmake_install fizz/fizz -DBUILD_TESTS=OFF
cmake_install_dir fizz/fizz -DBUILD_TESTS=OFF
}

function install_folly {
wget_and_untar https://github.com/facebook/folly/archive/refs/tags/${FB_OS_VERSION}.tar.gz folly
cmake_install folly -DBUILD_TESTS=OFF -DFOLLY_HAVE_INT128_T=ON
cmake_install_dir folly -DBUILD_TESTS=OFF -DFOLLY_HAVE_INT128_T=ON
}

function install_wangle {
wget_and_untar https://github.com/facebook/wangle/archive/refs/tags/${FB_OS_VERSION}.tar.gz wangle
cmake_install wangle/wangle -DBUILD_TESTS=OFF
cmake_install_dir wangle/wangle -DBUILD_TESTS=OFF
}

function install_fbthrift {
wget_and_untar https://github.com/facebook/fbthrift/archive/refs/tags/${FB_OS_VERSION}.tar.gz fbthrift
cmake_install fbthrift -Denable_tests=OFF -DBUILD_TESTS=OFF -DBUILD_SHARED_LIBS=OFF
cmake_install_dir fbthrift -Denable_tests=OFF -DBUILD_TESTS=OFF -DBUILD_SHARED_LIBS=OFF
}

function install_mvfst {
wget_and_untar https://github.com/facebook/mvfst/archive/refs/tags/${FB_OS_VERSION}.tar.gz mvfst
cmake_install mvfst -DBUILD_TESTS=OFF
cmake_install_dir mvfst -DBUILD_TESTS=OFF
}

function install_duckdb {
if $BUILD_DUCKDB ; then
echo 'Building DuckDB'
wget_and_untar https://github.com/duckdb/duckdb/archive/refs/tags/v0.8.1.tar.gz duckdb
cmake_install duckdb -DBUILD_UNITTESTS=OFF -DENABLE_SANITIZER=OFF -DENABLE_UBSAN=OFF -DBUILD_SHELL=OFF -DEXPORT_DLL_SYMBOLS=OFF -DCMAKE_BUILD_TYPE=Release
cmake_install_dir duckdb -DBUILD_UNITTESTS=OFF -DENABLE_SANITIZER=OFF -DENABLE_UBSAN=OFF -DBUILD_SHELL=OFF -DEXPORT_DLL_SYMBOLS=OFF -DCMAKE_BUILD_TYPE=Release
fi
}

function install_arrow {
wget_and_untar https://archive.apache.org/dist/arrow/arrow-${ARROW_VERSION}/apache-arrow-${ARROW_VERSION}.tar.gz arrow
(
cd arrow/cpp
cmake_install \
-DARROW_PARQUET=OFF \
-DARROW_WITH_THRIFT=ON \
-DARROW_WITH_LZ4=ON \
-DARROW_WITH_SNAPPY=ON \
-DARROW_WITH_ZLIB=ON \
-DARROW_WITH_ZSTD=ON \
-DARROW_JEMALLOC=OFF \
-DARROW_SIMD_LEVEL=NONE \
-DARROW_RUNTIME_SIMD_LEVEL=NONE \
-DARROW_WITH_UTF8PROC=OFF \
-DARROW_TESTING=ON \
-DCMAKE_INSTALL_PREFIX=/usr/local \
-DCMAKE_BUILD_TYPE=Release \
-DARROW_BUILD_STATIC=ON \
-DThrift_SOURCE=BUNDLED
cmake_install_dir arrow/cpp \
-DARROW_PARQUET=OFF \
-DARROW_WITH_THRIFT=ON \
-DARROW_WITH_LZ4=ON \
-DARROW_WITH_SNAPPY=ON \
-DARROW_WITH_ZLIB=ON \
-DARROW_WITH_ZSTD=ON \
-DARROW_JEMALLOC=OFF \
-DARROW_SIMD_LEVEL=NONE \
-DARROW_RUNTIME_SIMD_LEVEL=NONE \
-DARROW_WITH_UTF8PROC=OFF \
-DARROW_TESTING=ON \
-DCMAKE_INSTALL_PREFIX=${INSTALL_PREFIX} \
-DCMAKE_BUILD_TYPE=Release \
-DARROW_BUILD_STATIC=ON \
-DThrift_SOURCE=BUNDLED

(
# Install thrift.
cd _build/thrift_ep-prefix/src/thrift_ep-build
cmake --install ./ --prefix /usr/local/
cd ${DEPENDENCY_DIR}/arrow/cpp/_build/thrift_ep-prefix/src/thrift_ep-build
cmake --install ./ --prefix ${INSTALL_PREFIX}
)
}
