GH-40079: [CI][Packaging] Enable Azure in more tests and builds #40080

Merged

Conversation

Tom-Newton
Contributor

@Tom-Newton Tom-Newton commented Feb 14, 2024

Rationale for this change

We want the Python-side tests of AzureFileSystem to run in CI.

What changes are included in this PR?

  • Add missing export to enable Azure pyarrow tests
  • Enable Azure in sdist tests
  • Enable Azure on macOS Python builds
  • Enable Azure in conda builds and install dependencies (Azure C++ SDK and Azurite)
  • Enable retries on C++ tests to mitigate "[CI][FS][Azure] Azurite tests are flaking on main" (#40121)
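
To illustrate why a missing `export` can silently disable tests, here is a minimal sketch. A shell variable that is assigned but not exported is visible in the current script but is not inherited by child processes such as a test runner. The variable name `EXAMPLE_TEST_FLAG` is hypothetical and chosen only for this illustration; it is not the name used in Arrow's CI scripts.

```shell
#!/usr/bin/env bash
# EXAMPLE_TEST_FLAG is a hypothetical name used only for illustration.
unset EXAMPLE_TEST_FLAG

# Assigned but not exported: visible in this shell, invisible to child processes.
EXAMPLE_TEST_FLAG=ON
seen_without_export=$(bash -c 'echo "${EXAMPLE_TEST_FLAG:-unset}"')

# After export, child processes (such as a test runner) inherit the value.
export EXAMPLE_TEST_FLAG
seen_with_export=$(bash -c 'echo "${EXAMPLE_TEST_FLAG:-unset}"')

echo "without export: ${seen_without_export}"
echo "with export:    ${seen_with_export}"
```

In the first case the child process sees the variable as unset, so any test that is gated on it would be skipped rather than fail, which makes this kind of omission easy to miss.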

Probably all of this should have been included in #39971

Are these changes tested?

There is no new functionality to test

Are there any user-facing changes?

No

⚠️ GitHub issue #40079 has been automatically assigned in GitHub to PR creator.

@Tom-Newton
Contributor Author

Most of the CI failures are actually reproducible on main with PYTHON=3.9 docker-compose build conda-python.

I will need to investigate the AVX2 build failure.

Comment on lines 25 to 30
# Azurite requires npm
RUN export DEBIAN_FRONTEND=noninteractive && \
    apt-get update -y -q && \
    apt-get install -y -q npm \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
Member

Can we install npm (Node.js) by conda?

@github-actions bot added the "awaiting changes" label and removed the "awaiting review" label Feb 14, 2024
@kou
Member

kou commented Feb 14, 2024

The failing test is AzureFileSystem.InitializeWithDefaultCredential:

https://github.com/apache/arrow/actions/runs/7902101938/job/21566998998?pr=40080#step:6:6540

[ RUN      ] AzureFileSystem.InitializeWithDefaultCredential

TEST(AzureFileSystem, InitializeWithDefaultCredential) {
  AzureOptions options;
  options.account_name = "dummy-account-name";
  ARROW_EXPECT_OK(options.ConfigureDefaultCredential());
  EXPECT_OK_AND_ASSIGN(auto fs, AzureFileSystem::Make(options));
}

Important backtrace:

https://github.com/apache/arrow/actions/runs/7902101938/job/21566998998?pr=40080#step:6:6568

Thread 1 (Thread 0x7f5c8e6d2cc0 (LWP 26205)):
#0  0x00007f5c92ddae4c in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string (this=0x7f5c45596010, __str=...) at /home/conda/feedstock_root/build_artifacts/gcc_compilers_1706816862910/work/build/x86_64-conda-linux-gnu/libstdc++-v3/include/bits/basic_string.h:541
#1  0x000055e62c73586c in std::_Construct<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&> () at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/12.3.0/bits/stl_construct.h:119
#2  0x000055e62c7292c5 in std::__do_uninit_copy<__gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*> () at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/12.3.0/bits/stl_uninitialized.h:120
#3  0x00007f5c91da6f70 in Azure::Storage::_internal::StorageBearerTokenAuthenticationPolicy::Clone() const () from /opt/conda/envs/arrow/lib/./libazure-storage-common.so
#4  0x00007f5c9307cfc9 in Azure::Storage::Blobs::BlobServiceClient::BlobServiceClient(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::shared_ptr<Azure::Core::Credentials::TokenCredential>, Azure::Storage::Blobs::BlobClientOptions const&) () from /opt/conda/envs/arrow/lib/libazure-storage-blobs.so
#5  0x00007f5c95141a35 in std::make_unique<Azure::Storage::Blobs::BlobServiceClient, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::shared_ptr<Azure::Core::Credentials::TokenCredential>&> () at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/12.3.0/bits/unique_ptr.h:1065
#6  0x00007f5c95126c04 in arrow::fs::AzureOptions::MakeBlobServiceClient () at /arrow/cpp/src/arrow/filesystem/azurefs.cc:184

return std::make_unique<Blobs::BlobServiceClient>(AccountBlobUrl(account_name),
                                                  token_credential_);

@github-actions bot added the "awaiting change review" label and removed the "awaiting changes" label Feb 14, 2024
@Tom-Newton
Contributor Author

Pinning to the same versions we get from vcpkg solves some of the CI failures. I think the only remaining failures are ones that also exist on main. Specifically, installing the GCS testbench on Python 3.9 seems to fail.

@kou
Member

kou commented Feb 17, 2024

> Pinning to the same versions we get from vcpkg solves some of the CI failures.

Does it mean that we may need to improve our Azure filesystem implementation to use a more recent azure-sdk-for-cpp? (If it's needed, we should work on it as a separate task.)
Or does it mean that we have a symbol conflict with the azure-sdk-for-cpp installed by vcpkg and conda?

Anyway, could you add a comment explaining why we need to pin azure-sdk-for-cpp?

> I think the only remaining failures are ones that also exist on main. Specifically, installing the GCS testbench on Python 3.9 seems to fail.

Could you open an issue for it so that we can work on it as a separate task?

@Tom-Newton
Contributor Author

Tom-Newton commented Feb 18, 2024

Created an issue for the Python 3.9 GCS testbench problem: #40112

For the segfault, it looks like going from azure-core-cpp 1.10.3 to 1.11.0 causes the problem. I cannot reproduce it when using the same version built from source. I also found a couple of GitHub issues which look very relevant:
conda-forge/azure-core-cpp-feedstock#10 (comment)
Azure/azure-sdk-for-cpp#5322

It also looks like 1.11.0 has been marked as broken on the conda website:
https://anaconda.org/conda-forge/azure-core-cpp/labels

If I understand this correctly, it seems to be a conda issue, not an Azure SDK or Arrow problem.

@Tom-Newton Tom-Newton marked this pull request as ready for review February 18, 2024 12:25
@Tom-Newton
Contributor Author

I'm going to call this ready for review, but I'm not sure if we can merge it while #40112 is unresolved.

@Tom-Newton
Contributor Author

Hmm... it looks like the Azure tests are slightly flaky. I only changed a comment, and the new CI run has an Azure failure.
Previous successful build: https://github.com/apache/arrow/actions/runs/7945396747/job/21692015614
Failure in most recent build: https://github.com/apache/arrow/actions/runs/7949050250/job/21699789831?pr=40080

C++ exception with description "Connection closed before getting full response or response is less than expected. Expected response length = 254. Read until now = 231" thrown in the test body.
2024-02-18T12:50:20.039Z ada6933e-9c33-47d2-86f6-29e9aa01f713 info: BlobStorageContextMiddleware: RequestMethod=DELETE RequestURL=http://127.0.0.1/devstoreaccount1/container?restype=container RequestHeaders:{"authorization":"SharedKey devstoreaccount1:hYh+JRj5cBYqdqOyM2wB3EZizQ/s2DiIoDI0CIF2EXM=","host":"127.0.0.1:10000","user-agent":"azsdk-cpp-storage-blobs/12.10.0-beta.1 (Linux 6.2.0-1019-azure x86_64 #19~22.04.1-Ubuntu SMP Wed Jan 10 22:57:03 UTC 2024)","x-ms-client-request-id":"be6819a2-72b8-4630-8eb0-4a88e7cb3061","x-ms-date":"Sun, 18 Feb 2024 12:50:20 GMT","x-ms-version":"2022-11-02"} ClientIP=127.0.0.1 Protocol=http HTTPVersion=1.1

It looks like Azurite returned an invalid response.

I've run this multiple times locally but so far have not been able to reproduce it.

@Tom-Newton Tom-Newton force-pushed the tomnewton/more_azure_builds_and_tests/GH-40079 branch from 69ed3a5 to 7ed0910 Compare February 18, 2024 20:42
@Tom-Newton
Contributor Author

Tom-Newton commented Feb 18, 2024

Retrying got the same error in a different test case on a different build https://github.com/apache/arrow/actions/runs/7951594516/job/21705210845?pr=40080

@Tom-Newton
Contributor Author

I think #40120 should resolve the Python 3.9 and GCS testbench problem.

I also found an example of the Azure tests flaking on main: https://github.com/apache/arrow/actions/runs/7915689559/job/21608061673. I'm unsure whether it's fair to merge this while this flakiness exists. This PR enables Azure tests in more builds, so the flakes will become more frequent.

I created a separate issue to fix this flakiness: #40121
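
As a rough illustration of the kind of retry mitigation discussed here, the sketch below re-runs a command until it passes or a maximum number of attempts is reached. The `retry` helper and the `flaky_test` stand-in are hypothetical and written only for this example; the change actually made in this PR lives in ci/scripts/cpp_test.sh and is not reproduced in this thread. CTest also has a built-in `--repeat until-pass:<n>` option (CMake 3.17 and later) that serves a similar purpose.

```shell
#!/usr/bin/env bash
# retry MAX_ATTEMPTS COMMAND [ARGS...]
# Hypothetical helper: re-run a command until it succeeds or attempts run out.
retry() {
  local max_attempts=$1
  shift
  local attempt=1
  until "$@"; do
    if [ "${attempt}" -ge "${max_attempts}" ]; then
      echo "failed after ${attempt} attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    echo "retrying (attempt ${attempt} of ${max_attempts}): $*" >&2
  done
}

# Stand-in for a flaky test: fails on the first two runs, passes on the third.
counter_file=$(mktemp)
echo 0 > "${counter_file}"
flaky_test() {
  local runs
  runs=$(( $(cat "${counter_file}") + 1 ))
  echo "${runs}" > "${counter_file}"
  [ "${runs}" -ge 3 ]
}

retry 3 flaky_test && echo "passed after $(cat "${counter_file}") runs"
```

Retries of this kind hide intermittent failures rather than fix them, which is why the underlying flakiness is tracked separately in #40121.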

@Tom-Newton Tom-Newton force-pushed the tomnewton/more_azure_builds_and_tests/GH-40079 branch from 7ed0910 to 7c27978 Compare February 19, 2024 15:01
@Tom-Newton Tom-Newton force-pushed the tomnewton/more_azure_builds_and_tests/GH-40079 branch from 31e4920 to 5e3f846 Compare February 19, 2024 19:23
@@ -16,6 +16,12 @@
# under the License.

aws-sdk-cpp=1.11.68
# There is a problem with the 1.11.0 conda release of azure-core-cpp https://github.com/conda-forge/admin-requests/pull/911
Member

I am not sure that is a relevant link. AFAIU there is not really a problem with the 1.11.0 conda package itself. The problem is that conda did not pin azure-core-cpp in other packages (e.g. tiledb was built against azure-core-cpp 1.10 and then run in an env with 1.11 installed, which gave segfaults).

But for the purpose of our CI, we are both building and running against the same azure-core-cpp version, right? In that case, I would expect building and running with azure-core-cpp 1.11 to work just fine. Or if not, that means our azurefs implementation might not yet be compatible with the latest azure-core-cpp?
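
The build-versus-runtime mismatch described above can be sketched as a simple guard that compares the MAJOR.MINOR series of two version strings. The helpers `minor_series` and `same_series` are hypothetical and are not part of Arrow, conda, or the Azure SDK. Whether a minor-version bump actually breaks binary compatibility depends on the library; this thread only reports the 1.10 to 1.11 case for azure-core-cpp.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of Arrow, conda, or the Azure SDK.

# minor_series VERSION: print the MAJOR.MINOR prefix of a dotted version string.
# Assumes the version has at least MAJOR.MINOR components.
minor_series() {
  local version=$1
  local major=${version%%.*}
  local rest=${version#*.}
  local minor=${rest%%.*}
  echo "${major}.${minor}"
}

# same_series BUILD_VERSION RUNTIME_VERSION: succeed only if both share MAJOR.MINOR.
same_series() {
  [ "$(minor_series "$1")" = "$(minor_series "$2")" ]
}

build_version="1.10.3"
runtime_version="1.11.0"
if same_series "${build_version}" "${runtime_version}"; then
  echo "build and runtime versions are in the same series"
else
  echo "mismatch: built against ${build_version} but running with ${runtime_version}" >&2
fi
```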

Contributor Author

@Tom-Newton Tom-Newton Feb 20, 2024

Hmm... I tried using 1.11.0 built from source with cmake and everything worked. With ~~1.10.0~~ 1.11.0 from conda we got segfaults.

Member

You mean 1.11 from conda? (not 1.10) Do you remember if there was more information in the logs about the crash?

Contributor Author

#40080 (comment) summarises the info we have.

It looks like 1.11.1 is now available from conda though and that works, so I can remove this version restriction.

@github-actions bot added the "awaiting changes" label and removed the "awaiting change review" label Feb 20, 2024
@github-actions bot added the "awaiting change review" label and removed the "awaiting changes" label Feb 21, 2024
@assignUser
Member

@github-actions crossbow submit -g python

Revision: 53b9d2c

Submitted crossbow builds: ursacomputing/crossbow @ actions-47658c1ea8

| Task | Status |
|---|---|
| test-conda-python-3.10 | GitHub Actions |
| test-conda-python-3.10-cython2 | GitHub Actions |
| test-conda-python-3.10-hdfs-2.9.2 | GitHub Actions |
| test-conda-python-3.10-hdfs-3.2.1 | GitHub Actions |
| test-conda-python-3.10-pandas-latest | GitHub Actions |
| test-conda-python-3.10-pandas-nightly | GitHub Actions |
| test-conda-python-3.10-spark-v3.5.0 | GitHub Actions |
| test-conda-python-3.10-substrait | GitHub Actions |
| test-conda-python-3.11 | GitHub Actions |
| test-conda-python-3.11-dask-latest | GitHub Actions |
| test-conda-python-3.11-dask-upstream_devel | GitHub Actions |
| test-conda-python-3.11-hypothesis | GitHub Actions |
| test-conda-python-3.11-pandas-upstream_devel | GitHub Actions |
| test-conda-python-3.11-spark-master | GitHub Actions |
| test-conda-python-3.12 | GitHub Actions |
| test-conda-python-3.8 | GitHub Actions |
| test-conda-python-3.8-pandas-1.0 | GitHub Actions |
| test-conda-python-3.8-spark-v3.5.0 | GitHub Actions |
| test-conda-python-3.9 | GitHub Actions |
| test-conda-python-3.9-pandas-latest | GitHub Actions |
| test-cuda-python | GitHub Actions |
| test-debian-11-python-3 | Azure |
| test-fedora-39-python-3 | Azure |
| test-ubuntu-20.04-python-3 | Azure |
| test-ubuntu-22.04-python-3 | GitHub Actions |

@Tom-Newton
Contributor Author

Tom-Newton commented Feb 22, 2024

Member

@assignUser assignUser left a comment

I agree the CI failures seem unrelated, but I'll let @raulcd have the last word on that. Thanks :)

ci/scripts/cpp_test.sh (resolved)
@kou
Member

kou commented Feb 22, 2024

Those failures are unrelated because they happen on main too.

Member

@kou kou left a comment

+1

@kou kou merged commit b089c6a into apache:main Feb 22, 2024
53 of 55 checks passed
@kou removed the "awaiting change review" label Feb 22, 2024
@github-actions bot added the "awaiting merge" label Feb 22, 2024
After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit b089c6a.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them.

zanmato1984 pushed a commit to zanmato1984/arrow that referenced this pull request Feb 28, 2024
…apache#40080)

### Rationale for this change
We want python side tests of `AzureFileSystem` to run in CI. 

### What changes are included in this PR?
- Add missing `export` to enable Azure pyarrow tests
- Enable azure in sdist tests.
- Enable Azure on macos python builds
- Enable azure in conda builds and install dependencies (Azure C++ SDK and azurite)
- Enable retries on C++ tests to mitigate apache#40121

Probably all of this should have been included in apache#39971

### Are these changes tested?
There is no new functionality to test

### Are there any user-facing changes?
No

* Closes: apache#40079
* GitHub Issue: apache#40079

Authored-by: Thomas Newton <thomas.w.newton@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
thisisnic pushed a commit to thisisnic/arrow that referenced this pull request Mar 8, 2024