Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prolog of hermetic build with GCC 11 and Clang 13. #7712

Merged
merged 4 commits into from
Jan 21, 2022

Conversation

amosbird
Copy link
Contributor

@amosbird amosbird commented Jan 11, 2022

Proposed changes

Prepare to generate hermetic build using GCC 11 and Clang 13. The ideal toolchain would be ldb toolchain generated by ldb_toolchain_gen.sh

To kick off a clang build, set DORIS_TOOLCHAIN=clang before running any build scripts.

Types of changes

What types of changes does your code introduce to Doris?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)
  • Code refactor (Modify the code structure, format the code, etc...)
  • Optimization. Including functional usability improvements and performance improvements.
  • Dependency. Such as changes related to third-party components.
  • Other.

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • I have created an issue on (Fix #ISSUE) and described the bug/feature there in detail
  • Compiling and unit tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • If these changes need document changes, I have updated the document
  • Any dependent changes have been merged

Further comments

The experience of compiling Doris can be improved in many ways (I failed to compile it several times over the last year). The non-intrusive way of doing it will be providing a dedicated toolchain to build both thirdparty deps and the backend. This PR contains such proposal along with some improvements for building the thirdparty on unclean environment (some environment provides system shared libs which should not be found and used).

Benefits:

  1. Generate Byte-identical binaries. Good for quality control, testing, debugging, redistributing and forking.
  2. No docker dependency (your dev environment will appreciate that).
  3. Extremely friendly to new developers and potentially attracting more contributors
  4. Much easier to correctly maintain all build deps.

With the ldb toolchain, a minimal centos-7 dev machine (without web-ui things) will be the following:

    yum install -y \
        byacc \
        patch \
        automake \
        libtool \
        make \
        which \
        file \
        ncurses-devel \
        gettext-devel \
        bison \
        java-11-openjdk-devel \
        maven \
        unzip \
        bzip2 \
        zip

Why centos 7? Because it's the still maintained linux distribution with oldest GLIBC (2.17). https://gist.github.com/wagenet/35adca1a032cec2999d47b6c40aa45b1

It's also possible to enrich the LDB toolchain so that no external dependency is required.

It's worth mentioning if LDB toolchain is used, env ASAN_SYMBOLIZER_PATH should be set to the path of llvm-symbolizer explicitly.

With alternative toolchains, we can catch undefined behaviors like https://github.com/apache/incubator-doris/blob/a034c20d169d65b500bd984240eacd5c89c68571/be/src/util/simd/vstring_function.h#L93-L112

The undefined behavior of __builtin_ctz is described here https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html cc @HappenLee

New toolchain also indicates that sm3_test has potential mem leak which comes from openssl (maybe false positive).

Related PRs #7631 #7569

@morningman morningman added kind/chore Issues or PRs related to compile and install kind/improvement labels Jan 12, 2022
@yiguolei
Copy link
Contributor

Great feature

@yiguolei
Copy link
Contributor

LGTM

@morningman
Copy link
Contributor

Link to #7590

{
int r = __syscall(SYS_eventfd2, count, flags);
#ifdef SYS_eventfd
if (r==-ENOSYS && !flags) r = __syscall(SYS_eventfd, count);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tab to space

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we should change code style from external projects.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see, the dir has been added to the .clang-format-ignore.
LGTM

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So before merging this PR, we need to update the current docker dev image, so that developer can still compile from source, am I right?

Copy link
Contributor Author

@amosbird amosbird Jan 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to update the current docker dev image

Lemme check if it introduces any kinds of regressions. In general, this PR is for gcc11 and clang13 but should also play well with slightly older toolchains.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems apache/incubator-doris:build-env-latest has prebuilt thirparty blobs. This PR introduces some tweaks along with a new lib libidn. Perhaps it's a good time to rebuilt the dev image with new toolchains.

thirdparty/patches/googletest-release-1.11.0.patch |  71 ++++++++++
thirdparty/patches/libevent.patch                  |  11 ++
thirdparty/patches/rocksdb-5.14.2.patch            |  11 ++

@morningman
Copy link
Contributor

LGTM except for some small problem

./configure --prefix=$TP_INSTALL_DIR --disable-shared --enable-static \
--without-librtmp --with-ssl=${TP_INSTALL_DIR} --without-libidn2 --disable-ldap --enable-ipv6 \
--without-libssh2
make -j $PARALLEL && make install
--without-libssh2 --without-brotli
Copy link
Member

@yangzhg yangzhg Jan 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to add --without-libpsl to avoid undefined reference to psl_free in some env

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can refine the thirdparty builds gradually in subsequent PRs. After we encourage hermetic build, there will be a broad range of environment for testing. Corner cases are definitely expected.

set(ASAN_LIBS -static-libasan)
set(LSAN_LIBS -static-liblsan)
set(UBSAN_LIBS -static-libubsan tcmalloc)
set(TSAN_LIBS -static-libtsan)
Copy link
Member

@yangzhg yangzhg Jan 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure which lib of asan, tsan,lasn,usan will be used, if the license of that lib is LGPL or other, GPL like license we may only allow to link the dynamic lib

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we may only allow to link the dynamic lib

That's not the case. We already provide everything that allow the user to relink the application. Even our toolchain is open sourced :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be more specific, there is nothing in LGPL that mandates dynamic linking/loading. The requirement is that the end user should be able to relink the application to a substitute set of libraries if he so chooses.

@amosbird
Copy link
Contributor Author

amosbird commented Jan 18, 2022

Now the PR supports CentOS 6 compilation.

NOTE: higher version of bison and autoconf is required.

NOTE2: plugin_loader_test ships a prebuilt shared object which contains GLIBC 2.14 memcpy. It cannot be loaded on CentOS 6 (GLIBC 2.12). This PR updates the blob (compiled with ldb-toolchain-v0.3 clang-13, linked with ld.bfd).

Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jan 21, 2022
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@morningman morningman merged commit 800a363 into apache:master Jan 21, 2022
zuochunwei pushed a commit to zuochunwei/incubator-doris that referenced this pull request Jan 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by one committer. kind/chore Issues or PRs related to compile and install kind/improvement reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants