GH-41922: [CI][C++] Update Minio version #44225

Merged (10 commits) on Oct 1, 2024

pitrou
Member

@pitrou pitrou commented Sep 25, 2024

We were using a rather old version of Minio for CI tests.

Update the Minio version and ensure the S3FileSystem implementation passes all the tests with it.

⚠️ GitHub issue #41922 has been automatically assigned in GitHub to PR creator.

@pitrou
Member Author

pitrou commented Sep 25, 2024

@github-actions crossbow submit -g cpp -g python -g wheel

This comment was marked as outdated.

@pitrou
Member Author

pitrou commented Sep 25, 2024

So, I don't understand why all Crossbow conda-based jobs above have failed with a weird CMake error, while the GHA conda-based jobs succeed. The error message doesn't ring a bell to me:

CMake Error at cmake_modules/ThirdpartyToolchain.cmake:952 (find_program):
  Could not find EP_CMAKE_RANLIB using the following names: :
Call Stack (most recent call first):
  CMakeLists.txt:545 (include)

@kou Would you be able to advise?

@kou
Member

kou commented Sep 25, 2024

It seems that conda stopped setting AR and RANLIB:

https://github.com/ursacomputing/crossbow/actions/runs/11037675823/job/30659122209#step:7:1151

+ export 'ARROW_CMAKE_ARGS= -DCMAKE_AR= -DCMAKE_RANLIB='
+ ARROW_CMAKE_ARGS=' -DCMAKE_AR= -DCMAKE_RANLIB='

How about the following?

diff --git a/ci/scripts/cpp_build.sh b/ci/scripts/cpp_build.sh
index bc2bba915f..ac0ba2bf72 100755
--- a/ci/scripts/cpp_build.sh
+++ b/ci/scripts/cpp_build.sh
@@ -34,7 +34,13 @@ if [ ! -z "${CONDA_PREFIX}" ] && [ "${ARROW_EMSCRIPTEN:-OFF}" = "OFF" ]; then
   echo -e "===\n=== Conda environment for build\n==="
   conda list
 
-  export ARROW_CMAKE_ARGS="${ARROW_CMAKE_ARGS} -DCMAKE_AR=${AR} -DCMAKE_RANLIB=${RANLIB}"
+  if [ -n "${AR}" ]; then
+    ARROW_CMAKE_ARGS+=" -DCMAKE_AR=${AR}"
+  fi
+  if [ -n "${RANLIB}" ]; then
+    ARROW_CMAKE_ARGS+=" -DCMAKE_RANLIB=${RANLIB}"
+  fi
+  export ARROW_CMAKE_ARGS
   export ARROW_GANDIVA_PC_CXX_FLAGS=$(echo | ${CXX} -E -Wp,-v -xc++ - 2>&1 | grep '^ ' | awk '{print "-isystem;" substr($1, 1)}' | tr '\n' ';')
 elif [ -x "$(command -v xcrun)" ]; then
   export ARROW_GANDIVA_PC_CXX_FLAGS="-isysroot;$(xcrun --show-sdk-path)"

The GHA conda-based jobs use a cached conda environment, so they are not affected by the conda change for now.
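To illustrate the difference between the old unconditional export and the guarded version suggested above, here is a minimal standalone sketch. It is not part of `cpp_build.sh`: the helper name `append_cmake_flag` and the `/opt/conda/bin/...` paths are hypothetical and chosen only for the demonstration.

```shell
#!/bin/sh
# Hypothetical helper (not in cpp_build.sh): append -D<name>=<value> to a
# flag string only when the value is non-empty.
append_cmake_flag() {
  # $1 = flags so far, $2 = CMake variable name, $3 = value (may be empty)
  if [ -n "$3" ]; then
    printf '%s -D%s=%s' "$1" "$2" "$3"
  else
    printf '%s' "$1"
  fi
}

# Simulate a conda environment that no longer exports AR and RANLIB.
AR=""
RANLIB=""

# Old behaviour: the flags are emitted even though their values are empty.
old_args=" -DCMAKE_AR=${AR} -DCMAKE_RANLIB=${RANLIB}"

# Guarded behaviour: empty values add nothing, so no empty -D flags are passed.
new_args=$(append_cmake_flag "" CMAKE_AR "${AR}")
new_args=$(append_cmake_flag "${new_args}" CMAKE_RANLIB "${RANLIB}")

# When the tools are set (illustrative paths), both flags are passed as before.
set_args=$(append_cmake_flag "" CMAKE_AR "/opt/conda/bin/ar")
set_args=$(append_cmake_flag "${set_args}" CMAKE_RANLIB "/opt/conda/bin/ranlib")

echo "old: [${old_args}]"
echo "new: [${new_args}]"
echo "set: [${set_args}]"
```

With both variables empty, the old form reproduces the ` -DCMAKE_AR= -DCMAKE_RANLIB=` string seen in the Crossbow log, while the guarded form leaves the argument list untouched.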

@pitrou
Member Author

pitrou commented Sep 26, 2024

@github-actions crossbow submit -g cpp -g python

Revision: 90c07b3

Submitted crossbow builds: ursacomputing/crossbow @ actions-299f61e3ea

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
example-python-minimal-build-fedora-conda GitHub Actions
example-python-minimal-build-ubuntu-venv GitHub Actions
test-alpine-linux-cpp GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-conda-python-3.10 GitHub Actions
test-conda-python-3.10-cython2 GitHub Actions
test-conda-python-3.10-hdfs-2.9.2 GitHub Actions
test-conda-python-3.10-hdfs-3.2.1 GitHub Actions
test-conda-python-3.10-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.10-substrait GitHub Actions
test-conda-python-3.11 GitHub Actions
test-conda-python-3.11-dask-latest GitHub Actions
test-conda-python-3.11-dask-upstream_devel GitHub Actions
test-conda-python-3.11-hypothesis GitHub Actions
test-conda-python-3.11-pandas-latest-numpy-1.26 GitHub Actions
test-conda-python-3.11-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.11-pandas-nightly-numpy-nightly GitHub Actions
test-conda-python-3.11-pandas-upstream_devel-numpy-nightly GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
test-conda-python-3.12 GitHub Actions
test-conda-python-3.12-cpython-debug GitHub Actions
test-conda-python-3.9 GitHub Actions
test-conda-python-3.9-pandas-1.1.3-numpy-1.19.5 GitHub Actions
test-conda-python-emscripten GitHub Actions
test-cuda-cpp-ubuntu-20.04-cuda-11.2.2 GitHub Actions
test-cuda-cpp-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-cuda-python-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-debian-12-python-3-amd64 GitHub Actions
test-debian-12-python-3-i386 GitHub Actions
test-fedora-39-cpp GitHub Actions
test-fedora-39-python-3 GitHub Actions
test-ubuntu-20.04-cpp GitHub Actions
test-ubuntu-20.04-cpp-bundled GitHub Actions
test-ubuntu-20.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-22.04-python-3 GitHub Actions
test-ubuntu-22.04-python-313-freethreading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions
test-ubuntu-24.04-python-3 GitHub Actions

@pitrou pitrou marked this pull request as ready for review September 26, 2024 09:14
@pitrou
Member Author

pitrou commented Sep 26, 2024

@github-actions crossbow submit -g wheel

@pitrou
Member Author

pitrou commented Sep 26, 2024

Your suggested fix looks good, @kou !

Revision: 90c07b3

Submitted crossbow builds: ursacomputing/crossbow @ actions-66b8892afd

Task Status
python-sdist GitHub Actions
wheel-macos-monterey-cp310-cp310-arm64 GitHub Actions
wheel-macos-monterey-cp311-cp311-arm64 GitHub Actions
wheel-macos-monterey-cp312-cp312-arm64 GitHub Actions
wheel-macos-monterey-cp313-cp313-arm64 GitHub Actions
wheel-macos-monterey-cp313-cp313t-arm64 GitHub Actions
wheel-macos-monterey-cp39-cp39-arm64 GitHub Actions
wheel-macos-ventura-cp310-cp310-amd64 GitHub Actions
wheel-macos-ventura-cp311-cp311-amd64 GitHub Actions
wheel-macos-ventura-cp312-cp312-amd64 GitHub Actions
wheel-macos-ventura-cp313-cp313-amd64 GitHub Actions
wheel-macos-ventura-cp313-cp313t-amd64 GitHub Actions
wheel-macos-ventura-cp39-cp39-amd64 GitHub Actions
wheel-manylinux-2-28-cp310-cp310-amd64 GitHub Actions
wheel-manylinux-2-28-cp310-cp310-arm64 GitHub Actions
wheel-manylinux-2-28-cp311-cp311-amd64 GitHub Actions
wheel-manylinux-2-28-cp311-cp311-arm64 GitHub Actions
wheel-manylinux-2-28-cp312-cp312-amd64 GitHub Actions
wheel-manylinux-2-28-cp312-cp312-arm64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313-amd64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313-arm64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313t-amd64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313t-arm64 GitHub Actions
wheel-manylinux-2-28-cp39-cp39-amd64 GitHub Actions
wheel-manylinux-2-28-cp39-cp39-arm64 GitHub Actions
wheel-manylinux-2014-cp310-cp310-amd64 GitHub Actions
wheel-manylinux-2014-cp310-cp310-arm64 GitHub Actions
wheel-manylinux-2014-cp311-cp311-amd64 GitHub Actions
wheel-manylinux-2014-cp311-cp311-arm64 GitHub Actions
wheel-manylinux-2014-cp312-cp312-amd64 GitHub Actions
wheel-manylinux-2014-cp312-cp312-arm64 GitHub Actions
wheel-manylinux-2014-cp313-cp313-amd64 GitHub Actions
wheel-manylinux-2014-cp313-cp313-arm64 GitHub Actions
wheel-manylinux-2014-cp313-cp313t-amd64 GitHub Actions
wheel-manylinux-2014-cp313-cp313t-arm64 GitHub Actions
wheel-manylinux-2014-cp39-cp39-amd64 GitHub Actions
wheel-manylinux-2014-cp39-cp39-arm64 GitHub Actions
wheel-windows-cp310-amd64 GitHub Actions
wheel-windows-cp311-amd64 GitHub Actions
wheel-windows-cp312-amd64 GitHub Actions
wheel-windows-cp313-amd64 GitHub Actions
wheel-windows-cp39-amd64 GitHub Actions

Comment on lines 18 to 19
-clang>=11
-llvmdev>=11
+clang>=11,<19
+llvmdev>=11,<19
Member

FYI: We can remove this by merging GH-44233.

Member Author

@kou Which one would you rather merge first?

Member

I've merged GH-44233.
I'll update this branch.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Sep 26, 2024
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Sep 27, 2024
Member

@kou kou left a comment

+1

@github-actions github-actions bot removed the awaiting change review Awaiting change review label Sep 27, 2024
 template <typename Error>
-void SaveBackend(const Aws::Client::AWSError<Error>& error) {
+S3Backend GetOrSetBackend(const Aws::Client::AWSError<Error>& error) {
Contributor

GetOrSet is a very weird name. What about SaveBackendIfAbsent or SaveBackendFromErrorIfAbsent?

Member Author

I don't know, I got the impression that it was a common name in caching patterns:
https://www.google.com/search?client=firefox-b-d&q=%22get_or_set%22

Member Author

But more importantly, I wanted to stress that this method is also used to retrieve the cached backend. The "Save" proposal doesn't do that (which is why I renamed the method in this PR).

@@ -1005,7 +1005,8 @@ TEST_F(TestS3FS, CreateDir) {
   AssertFileInfo(fs_.get(), "bucket/newdir/newsub/newsubsub", FileType::Directory);
 
   // Existing "file", should fail
-  ASSERT_RAISES(IOError, fs_->CreateDir("bucket/somefile"));
+  ASSERT_RAISES(IOError, fs_->CreateDir("bucket/somefile", /*recursive=*/false));
+  ASSERT_RAISES(IOError, fs_->CreateDir("bucket/somefile", /*recursive=*/true));
Contributor

Someday the filesystem tests could be unified. I was very careful with directory semantics in the Azure fs implementations, testing all cases to ensure they behave like Linux system calls.

Member Author

Yes, we could open a new issue to improve the generic filesystem tests. Would you like to do that?

Contributor

> Would you like to do that?

No, because I know that the other filesystems diverge and these details have leaked into user code (i.e. users rely on the broken semantics).

Member Author

Fair enough. Usually there's a tension between enforcing consistent semantics across filesystems and trying to preserve performance when consistency implies more IO round trips to a remote host. If there are inconsistencies that are not motivated by performance, they should be investigated.

Contributor

S3 finally supports Conditional Writes. That means a lot of the check-before-create sequences you need to implement the filesystem semantics can be made much faster now.

https://aws.amazon.com/about-aws/whats-new/2024/08/amazon-s3-conditional-writes/

Member Author

Ok, I opened #44281 for it

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting merge Awaiting merge labels Sep 30, 2024
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Oct 1, 2024
@pitrou
Member Author

pitrou commented Oct 1, 2024

@github-actions crossbow submit -g cpp -g python

github-actions bot commented Oct 1, 2024

Revision: ca25668

Submitted crossbow builds: ursacomputing/crossbow @ actions-1eca55fdde

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
example-python-minimal-build-fedora-conda GitHub Actions
example-python-minimal-build-ubuntu-venv GitHub Actions
test-alpine-linux-cpp GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-conda-python-3.10 GitHub Actions
test-conda-python-3.10-cython2 GitHub Actions
test-conda-python-3.10-hdfs-2.9.2 GitHub Actions
test-conda-python-3.10-hdfs-3.2.1 GitHub Actions
test-conda-python-3.10-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.10-substrait GitHub Actions
test-conda-python-3.11 GitHub Actions
test-conda-python-3.11-dask-latest GitHub Actions
test-conda-python-3.11-dask-upstream_devel GitHub Actions
test-conda-python-3.11-hypothesis GitHub Actions
test-conda-python-3.11-pandas-latest-numpy-1.26 GitHub Actions
test-conda-python-3.11-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.11-pandas-nightly-numpy-nightly GitHub Actions
test-conda-python-3.11-pandas-upstream_devel-numpy-nightly GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
test-conda-python-3.12 GitHub Actions
test-conda-python-3.12-cpython-debug GitHub Actions
test-conda-python-3.9 GitHub Actions
test-conda-python-3.9-pandas-1.1.3-numpy-1.19.5 GitHub Actions
test-conda-python-emscripten GitHub Actions
test-cuda-cpp-ubuntu-20.04-cuda-11.2.2 GitHub Actions
test-cuda-cpp-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-cuda-python-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-debian-12-python-3-amd64 GitHub Actions
test-debian-12-python-3-i386 GitHub Actions
test-fedora-39-cpp GitHub Actions
test-fedora-39-python-3 GitHub Actions
test-ubuntu-20.04-cpp GitHub Actions
test-ubuntu-20.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-22.04-python-3 GitHub Actions
test-ubuntu-22.04-python-313-freethreading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions
test-ubuntu-24.04-python-3 GitHub Actions

@pitrou
Member Author

pitrou commented Oct 1, 2024

@github-actions crossbow submit example-python-minimal-build-fedora-conda

github-actions bot commented Oct 1, 2024

Revision: cce7a92

Submitted crossbow builds: ursacomputing/crossbow @ actions-d81681459f

Task Status
example-python-minimal-build-fedora-conda GitHub Actions

@pitrou
Member Author

pitrou commented Oct 1, 2024

The CI failures are unfortunate but they are also unrelated. I'll merge.

@pitrou pitrou merged commit 6b59098 into apache:main Oct 1, 2024
37 of 40 checks passed
@pitrou pitrou removed the awaiting change review Awaiting change review label Oct 1, 2024
@pitrou pitrou deleted the gh41922-update-minio branch October 1, 2024 14:50
@github-actions github-actions bot added the awaiting changes Awaiting changes label Oct 1, 2024

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 6b59098.

There was 1 benchmark result indicating a performance regression:

The full Conbench report has more details. It also includes information about 10 possible false positives for unstable benchmarks that are known to sometimes produce them.
