Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI][Python] AMD64 Conda Java C Data Interface Integration Failure building PyArrow trying to use PYARROW_PARQUET #41725

Closed
vibhatha opened this issue May 20, 2024 · 9 comments

Comments

@vibhatha
Copy link
Collaborator

Describe the bug, including details regarding any error messages, version, and platform.

This CI has been failing in a number of PRs. Needs evaluation.

Trace

Error: docker run --rm -e CI=true -e GRADLE_ENTERPRISE_ACCESS_KEY= --shm-size 2147483648 -e ARROW_ACERO=OFF -e ARROW_DATASET=OFF -e ARROW_FLIGHT=OFF -e ARROW_FLIGHT_SQL=OFF -e ARROW_GANDIVA=OFF -e ARROW_JAVA_CDATA=ON -e ARROW_ORC=OFF -e ARROW_PARQUET=OFF -e CCACHE_COMPILERCHECK=content -e CCACHE_COMPRESS=1 -e CCACHE_COMPRESSLEVEL=6 -e CCACHE_DIR=/ccache -e CCACHE_MAXSIZE=1G -e GITHUB_ACTIONS=true -e JAVA_JNI_CMAKE_ARGS=-DARROW_JAVA_JNI_ENABLE_DEFAULT=OFF -DARROW_JAVA_JNI_ENABLE_C=ON -v /home/runner/work/arrow/arrow:/arrow -v /home/runner/work/arrow/arrow/.docker/maven-cache:/root/.m2 -v /home/runner/work/arrow/arrow/.docker/debian-ccache:/ccache apache/arrow-dev:amd64-conda-python-3.8-java-integration /arrow/ci/scripts/cpp_build.sh /arrow /build && /arrow/ci/scripts/python_build.sh /arrow /build && /arrow/ci/scripts/java_jni_build.sh /arrow ${ARROW_HOME} /build /tmp/dist/java/ && /arrow/ci/scripts/java_build.sh /arrow /build /tmp/dist/java && /arrow/ci/scripts/java_cdata_integration.sh /arrow /build exited with non-zero exit code 1
Error: Process completed with exit code 1.

Component(s)

Continuous Integration, Java

@raulcd
Copy link
Member

raulcd commented May 20, 2024

Can you add the link to a failing job?
Thanks!

@raulcd
Copy link
Member

raulcd commented May 20, 2024

From what I can see from the last success is that we were building pyarrow without parquet on this job (this is the last successful job 4 days ago):
https://github.com/apache/arrow/actions/runs/9111559866/job/25048936470#step:6:2149
and now we seem to be failing to build with Parquet because Arrow CPP wasn't build with Parquet:
https://github.com/apache/arrow/actions/runs/9156580743/job/25171211272?pr=41676#step:6:2549

   CMake Error at CMakeLists.txt:376 (message):
    You must build Arrow C++ with ARROW_PARQUET=ON

@jorisvandenbossche could this be a failure on autodetecting the flags used when building Arrow CPP? The failures match when this issue was merged: #41480

@raulcd raulcd changed the title [Java] [CI] AMD64 Conda Java C Data Interface Integration Failure [CI][Python] AMD64 Conda Java C Data Interface Integration Failure building PyArrow trying to use PYARROW_PARQUET May 20, 2024
@jorisvandenbossche
Copy link
Member

That indeed seems related, starting some debugging on CI in #41751

@jorisvandenbossche
Copy link
Member

Ah, but looking at the logs in more detail, this might actually be a "correct" message now. Because the build does use export PYARROW_WITH_PARQUET_ENCRYPTION=ON, asking for pyarrow to be build with parquet encryption, and at the moment the logic is that this essentially also forces to be build with parquet, even if PYARROW_WITH_PARQUET is set to OFF

@jorisvandenbossche
Copy link
Member

It's coming from here:

export PYARROW_WITH_PARQUET=${ARROW_PARQUET:-OFF}
export PYARROW_WITH_PARQUET_ENCRYPTION=${PARQUET_REQUIRE_ENCRYPTION:-ON}

I don't know if that was done to ensure that the parquet encryption was enabled whenever parquet would be enabled, but so that is currently in an inconsistent state

@jorisvandenbossche
Copy link
Member

We could of course also follow the logic from C++ where enabling PARQUET_REQUIRE_ENCRYPTION is not sufficient to build parquet, but this option only has effect if actually ARROW_PARQUET is enabled as well.

@jorisvandenbossche
Copy link
Member

Actually, more specifically we originally had this in our setup.py, which I removed:

        self.with_parquet_encryption = (self.with_parquet_encryption and
                                        self.with_parquet)

which ensured to ignore the PYARROW_WITH_PARQUET_ENCRYPTION=ON if parquet itself was not enabled.

kou pushed a commit that referenced this issue May 22, 2024
… itself is not enabled (fix Java integration build) (#41776)

### Rationale for this change

Because of refactoring in #41480, explicitly enabling `PYARROW_WITH_PARQUET_ENCRYPTION` without enabling `PYARROW_WITH_PARQUET` (and without Arrow C++ being built with Parquet support) now raises an error, while before we checked in `setup.py` that both were enabled for enabling encryption support. This patch mimics that logic in CMakeLists.txt with a warning added.

### What changes are included in this PR?

When PyArrow with Parquet Encryption is enabled but PyArrow with Parquet itself is not, ignore the encryption setting, but warn about it.

### Are these changes tested?

Yes

* GitHub Issue: #41725

Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
@kou
Copy link
Member

kou commented May 22, 2024

Issue resolved by pull request 41776
#41776

@kou kou added this to the 17.0.0 milestone May 22, 2024
@kou kou closed this as completed May 22, 2024
vibhatha pushed a commit to vibhatha/arrow that referenced this issue May 25, 2024
…arquet itself is not enabled (fix Java integration build) (apache#41776)

### Rationale for this change

Because of refactoring in apache#41480, explicitly enabling `PYARROW_WITH_PARQUET_ENCRYPTION` without enabling `PYARROW_WITH_PARQUET` (and without Arrow C++ being built with Parquet support) now raises an error, while before we checked in `setup.py` that both were enabled for enabling encryption support. This patch mimics that logic in CMakeLists.txt with a warning added.

### What changes are included in this PR?

When PyArrow with Parquet Encryption is enabled but PyArrow with Parquet itself is not, ignore the encryption setting, but warn about it.

### Are these changes tested?

Yes

* GitHub Issue: apache#41725

Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants