Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-35140: [R] Rewrite configure script and ensure we don't use mismatched libarrow #35147

Merged
merged 28 commits into from
May 3, 2023

Conversation

nealrichardson
Copy link
Member

@nealrichardson nealrichardson commented Apr 14, 2023

I've significantly rewritten r/configure to make it easier to reason about and harder for issues like #34229 and #35140 to happen. I've also added a version check to make sure that we don't obviously try to use a system C++ library that doesn't match the R package version. Making sure this was applied in all of the right places and handling what to do if the versions didn't match was the impetus for the whole refactor.

configure has been broken up into some functions, and the flow of the script is, as is documented at the top of the file:

# * Find libarrow on the system. If it is present, make sure
#   that its version is compatible with the R package.
# * If no suitable libarrow is found, download it (where allowed)
#   or build it from source.
# * Determine what features this libarrow has and what other
#   flags it requires, and set them in src/Makevars for use when
#   compiling the bindings.
# * Run a test program to confirm that arrow headers are found

All of the detection of CFLAGS and -L dirs etc. happen in one place now, and they all prefer using pkg-config to read from the libarrow build what libraries and flags it requires, rather than hard-coding. (autobrew is the only remaining exception, but I didn't feel like messing with that today.) This should make the builds more future proof, should make it so more build configurations work (e.g. I suspect that a static build in ARROW_HOME wouldn't have gotten picked up correctly because it didn't add -larrow_bundled_dependencies to the libs, but now it will), and it may eliminate the redundant -l and -D setting I've observed in some builds (not harmful but definitely sloppy).

Version checking has been added in an R script for ease of testing (and for easier handling of arithmetic), and there is an accompanying test-check-versions.R added. These are run on all the builds that use ci/scripts/r_test.sh.

Behavior changes

  • If libarrow is found on the system (via ARROW_HOME, pkg-config, or brew), but the version does not match, it will not be used, and we will try a bundled build. This should mean that users installing a released version will never have libarrow version problems.

  • If both the found C++ library and R package are on matching dev versions (i.e. not identical given the x.y.z.9000 vs x+1.y.z-SNAPSHOT difference), it will proceed with a warning that you may need to rebuild if there are issues. This means that regular developers will see an extra message in the build output.

  • autobrew is only used on a release version unless you set FORCE_AUTOBREW=true. This eliminates another source of version mismatches (C++ release version, R dev version).

  • The path where you could set LIB_DIR and INCLUDE_DIR env vars has been removed. Use ARROW_HOME instead.

  • Closes: [R] configure script for R fails to set PKG_CFLAGS #35140

  • Closes: [R] Improve package installation (on Fedora 36) with arrow system packages #31989

@github-actions
Copy link

@github-actions
Copy link

⚠️ GitHub issue #35140 has been automatically assigned in GitHub to PR creator.

@github-actions
Copy link

⚠️ GitHub issue #35140 has no components, please add labels for components.

@nealrichardson
Copy link
Member Author

If this is in fact the fix for the conda package issues (conda-forge/r-arrow-feedstock#56 (comment), conda-forge/r-arrow-feedstock#62), then we would have a way to test this in the nightly conda packaging jobs. We currently build the packages but don't install them and assert anything about their capabilities.

Perhaps we can add something to https://github.com/apache/arrow/blob/main/dev/tasks/conda-recipes/r-arrow/build.sh to load the installed package and test that it has parquet, dataset, etc.

@nealrichardson
Copy link
Member Author

@github-actions crossbow submit conda-linux--r4 conda-osx--r4

@github-actions
Copy link

Revision: 1bc1a01

Submitted crossbow builds: ursacomputing/crossbow @ actions-02dfd6461d

Task Status
conda-linux-aarch64-cpu-r42 Azure
conda-linux-x64-cpu-r42 Azure
conda-osx-arm64-cpu-r42 Azure
conda-osx-x64-cpu-r42 Azure

r/configure Outdated
if [ "$LIB_DIR" = "" ]; then
# In case LIB_DIR wasn't previously set, infer from PKG_DIRS
LIB_DIR=`echo $PKG_DIRS | sed -e 's/^-L//'`
fi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the following approach (we must set LIB_DIR when PKG_* are set) is better than this approach:

diff --git a/r/configure b/r/configure
index 65528bdc3..f452fffb1 100755
--- a/r/configure
+++ b/r/configure
@@ -126,10 +126,11 @@ else
     if [ "$UNAME" = "Darwin" ] && [ "$FORCE_BUNDLED_BUILD" != "true" ]; then
       if [ "$FORCE_AUTOBREW" != "true" ] && [ "`command -v brew`" ] && [ "`brew ls --versions ${PKG_BREW_NAME}`" != "" ]; then
         echo "*** Using Homebrew ${PKG_BREW_NAME}"
-        BREWDIR=`brew --prefix`
+        BREWDIR=`brew --prefix ${PKG_BREW_NAME}`
+        LIB_DIR="$BREWDIR/lib"
         PKG_LIBS="-larrow -larrow_bundled_dependencies"
-        PKG_DIRS="-L$BREWDIR/opt/$PKG_BREW_NAME/lib $PKG_DIRS"
-        PKG_CFLAGS="-I$BREWDIR/opt/$PKG_BREW_NAME/include $PKG_CFLAGS"
+        PKG_DIRS="-L$LIB_DIR $PKG_DIRS"
+        PKG_CFLAGS="-I$BREWDIR/include $PKG_CFLAGS"
       else
         echo "*** Downloading ${PKG_BREW_NAME}"
         if [ -f "autobrew" ]; then
@@ -144,7 +145,9 @@ else
         if [ $? -ne 0 ]; then
           echo "Failed to retrieve binary for ${PKG_BREW_NAME}"
         fi
-        # autobrew sets `PKG_LIBS`, `PKG_DIRS`, and `PKG_CFLAGS`
+        # autobrew sets `PKG_LIBS`, `PKG_DIRS`, `PKG_CFLAGS`, and `BREWDIR`
+        # but `LIB_DIR` isn't set.
+        LIB_DIR="$BREWDIR/lib"
       fi
     else
       if [ "${NOT_CRAN}" = "true" ]; then

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reason I didn't take this approach is that r/configure has gotten so complex that I didn't think I could reason about all of the different branches of the code to know that I had found them all (see also #35049). Other, more concrete issues:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kou is this possibly relevant? PKG_LIBS and PKG_DIRS seem to be set correctly but for some reason LIB_DIR isn't.

diff --git a/r/configure b/r/configure
index 65528bdc38..8e130c53b4 100755
--- a/r/configure
+++ b/r/configure
@@ -110,7 +110,7 @@ else
     PKG_CFLAGS="$PKGCONFIG_CFLAGS $PKG_CFLAGS"
     PKG_LIBS="${PKGCONFIG_LIBS}"
     PKG_DIRS="${PKGCONFIG_DIRS}"
-    LIB_DIR=${FOUND_LIB_DIR}
+    LIB_DIR="${FOUND_LIB_DIR}"
 
     # Check for version mismatch
     PC_LIB_VERSION=`pkg-config --modversion arrow`

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry. Could you explain more the context for it? Can I see a build log for the case? This case #35140 ? Or other case?

If this is for #35140, I think that the isn't related because FOUND_LIB_DIR doesn't have any space:

*** Arrow C++ libraries found via pkg-config at /opt/homebrew/Cellar/apache-arrow/11.0.0_3/lib

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, #35140 uses Apache Arrow 11.0.0.
I was seeing the main branch.

Yes. The change #35147 (comment) will fix #35140. And it's already in the main branch: #34229

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the conda-side, we were able to fix it by specifying LIB_DIR=$PREFIX/lib in our build script, see conda-forge/r-arrow-feedstock#63

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 my mistake, sorry I missed #34229, that explains some of the confusion. I'll tidy this PR up with any remaining useful changes and get it merged.

@@ -14,3 +14,5 @@ fi

# ${R_ARGS} necessary to support cross-compilation
${R} CMD INSTALL --build r/. ${R_ARGS}
# Ensure that features are enabled in the R build (feel free to add others)
${R} -s -e 'library(arrow); stopifnot(arrow_with_dataset(), arrow_with_parquet(), arrow_with_s3())'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@h-vetinari Could you review conda part?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That part is fine and already working (as in: shows the build fails) in conda-forge/r-arrow-feedstock#63.

However, I don't see how the BREW-specific patch above would apply to conda-forge (we'd need $PREFIX), nor do I know what changed between 10 & 11. While taking a first look, my comment in that conda-forge PR was:

Do we need to do something like in the upstream R Makefile to discover the deps correctly?

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Apr 24, 2023
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Apr 25, 2023
@nealrichardson nealrichardson marked this pull request as draft April 25, 2023 18:29
@nealrichardson
Copy link
Member Author

I've applied the patches and added a bunch of comments to r/configure to help with any upcoming refactoring. I want to do a bit more testing of the various branches locally before we move forward, so I've marked this as draft for now.

@nealrichardson nealrichardson changed the title GH-35140: [R] configure script needs LIB_DIR to find ArrowOptions.cmake GH-35140: [R] Rewrite configure script and ensure we don't use mismatched libarrow Apr 28, 2023
@nealrichardson nealrichardson marked this pull request as ready for review April 28, 2023 18:03
@nealrichardson nealrichardson requested a review from jonkeane April 28, 2023 18:09
@nealrichardson
Copy link
Member Author

@github-actions crossbow submit -g r

@github-actions
Copy link

Revision: 9dfa86d

Submitted crossbow builds: ursacomputing/crossbow @ actions-8ccc8bc92d

Task Status
conda-linux-aarch64-cpu-r42 Azure
conda-linux-x64-cpu-r42 Azure
conda-osx-arm64-cpu-r42 Azure
conda-osx-x64-cpu-r42 Azure
conda-win-x64-cpu-r41 Azure
homebrew-r-autobrew Github Actions
homebrew-r-brew Github Actions
r-binary-packages Github Actions
test-fedora-r-clang-sanitizer Azure
test-r-arrow-backwards-compatibility Github Actions
test-r-depsource-bundled Azure
test-r-depsource-system Github Actions
test-r-dev-duckdb Github Actions
test-r-devdocs Github Actions
test-r-gcc-11 Github Actions
test-r-gcc-12 Github Actions
test-r-install-local Github Actions
test-r-install-local-minsizerel Github Actions
test-r-library-r-base-latest Azure
test-r-linux-as-cran Github Actions
test-r-linux-rchk Github Actions
test-r-linux-valgrind Azure
test-r-minimal-build Azure
test-r-offline-maximal Github Actions
test-r-offline-minimal Azure
test-r-rhub-debian-gcc-devel-lto-latest Azure
test-r-rhub-debian-gcc-release-custom-ccache Azure
test-r-rhub-ubuntu-gcc-release-latest Azure
test-r-rstudio-r-base-4.1-opensuse153 Azure
test-r-rstudio-r-base-4.2-centos7-devtoolset-8 Azure
test-r-rstudio-r-base-4.2-focal Azure
test-r-ubuntu-22.04 Github Actions
test-r-versions Github Actions
test-ubuntu-r-sanitizer Azure

@nealrichardson
Copy link
Member Author

I will inspect the successful builds to make sure features are enabled as they should be. Of the failures:

@nealrichardson
Copy link
Member Author

@github-actions crossbow submit -g r

@kou
Copy link
Member

kou commented May 2, 2023

I think that this problem is related to -I not -L/-l: #34523 (comment)

@kou
Copy link
Member

kou commented May 2, 2023

Ah, wait. The job was passed at #35147 (comment) .
It may not be related to -I. I'll take a look at this.

@nealrichardson
Copy link
Member Author

Ah, wait. The job was passed at #35147 (comment) . It may not be related to -I. I'll take a look at this.

It passed there because I changed it not to use the bundled build. That said, it didn't pick up the wrong absl, so maybe there's still something to learn from it.

@kou
Copy link
Member

kou commented May 2, 2023

Hmm. It seems that autobrew not Homebrew was used in the passed job.

@kou
Copy link
Member

kou commented May 2, 2023

Ah, OK. Thanks.

@nealrichardson
Copy link
Member Author

In the failed build, PKG_CFLAGS=-DARROW_STATIC -DUTF8PROC_EXPORTS -I/Users/voltrondata/tmp/RtmpQAJbFi/R.INSTALLca181e5538f5/arrow/libarrow/arrow-11.0.0.100000506/include -I/opt/homebrew/include

@nealrichardson
Copy link
Member Author

autobrew does not use pkg-config to set PKG_CFLAGS, so -I/opt/homebrew/include is not present in that build

@kou
Copy link
Member

kou commented May 2, 2023

I think that this is the same problem as #34523 .
And this is a C++ part problem not an R part problem.
So I think that we can work on this as a separated pull request.

__ZN4absl12lts_2023012510FormatTimeENSt3__117basic_string_viewIcNS1_11char_traitsIcEEEENS0_4TimeENS0_8TimeZoneE is absl::lts_20230125::FormatTime(std::__1::basic_string_view<char, std::__1::char_traits<char> >, absl::lts_20230125::Time, absl::lts_20230125::TimeZone). And it's used at

return absl::FormatTime(absl::RFC3339_full, absl::FromChrono(tp),
.

Abseil was mixed when we build C++'s libarrow.a not R's arrow.so.

@nealrichardson
Copy link
Member Author

@github-actions crossbow submit -g r

@github-actions
Copy link

github-actions bot commented May 2, 2023

Revision: a769f6a

Submitted crossbow builds: ursacomputing/crossbow @ actions-674657ab73

Task Status
conda-linux-aarch64-cpu-r42 Azure
conda-linux-x64-cpu-r42 Azure
conda-osx-arm64-cpu-r42 Azure
conda-osx-x64-cpu-r42 Azure
conda-win-x64-cpu-r41 Azure
homebrew-r-autobrew Github Actions
homebrew-r-brew Github Actions
r-binary-packages Github Actions
test-fedora-r-clang-sanitizer Azure
test-r-arrow-backwards-compatibility Github Actions
test-r-depsource-bundled Azure
test-r-depsource-system Github Actions
test-r-dev-duckdb Github Actions
test-r-devdocs Github Actions
test-r-gcc-11 Github Actions
test-r-gcc-12 Github Actions
test-r-install-local Github Actions
test-r-install-local-minsizerel Github Actions
test-r-library-r-base-latest Azure
test-r-linux-as-cran Github Actions
test-r-linux-rchk Github Actions
test-r-linux-valgrind Azure
test-r-minimal-build Azure
test-r-offline-maximal Github Actions
test-r-offline-minimal Azure
test-r-rhub-debian-gcc-devel-lto-latest Azure
test-r-rhub-debian-gcc-release-custom-ccache Azure
test-r-rhub-ubuntu-gcc-release-latest Azure
test-r-rstudio-r-base-4.1-opensuse153 Azure
test-r-rstudio-r-base-4.2-centos7-devtoolset-8 Azure
test-r-rstudio-r-base-4.2-focal Azure
test-r-ubuntu-22.04 Github Actions
test-r-versions Github Actions
test-ubuntu-r-sanitizer Azure

@nealrichardson
Copy link
Member Author

I don't fully understand the devdocs failure, but it has to do with the fact that arrow 12.0.0 has been released (https://github.com/ursacomputing/crossbow/actions/runs/4862485406/jobs/8668915213#step:9:3878 vs. https://github.com/ursacomputing/crossbow/actions/runs/4833910722/jobs/8614497534#step:9:3880 where it passed before). The cflags/libs are identical between the two builds. Rebasing to pull in the new version number would probably solve it, though I'm not sure it's worth the electricity to confirm it (that's an expensive build for very little value, it seems).

@paleolimbot
Copy link
Member

Works great on Ubuntu via Docker (arm64 and amd64)!

# docker run --rm -it ghcr.io/apache/arrow-nanoarrow:ubuntu-arm64 or amd64
git clone https://github.com/nealrichardson/arrow.git
cd arrow/r
git switch lib-dir
R CMD INSTALL .

It doesn't seem to be pulling the hosted static libraries for development, which I expected on Ubuntu/x86 (but maybe it's not supposed to?).

@nealrichardson
Copy link
Member Author

It doesn't seem to be pulling the hosted static libraries for development, which I expected on Ubuntu/x86 (but maybe it's not supposed to?).

We don't host binaries for x.y.z.9000, the nightly versions are like x.y.z.1000123, so if you wanted it to pull a nightly for a .9000 dev build, we'd have to teach it how to find the most recent one. Something for you (or whoever) to consider on #35049 I suppose.

@nealrichardson
Copy link
Member Author

Made #35391 for the segfault on the ubuntu force tests job, which I see on main as well.

@nealrichardson nealrichardson merged commit ec89360 into apache:main May 3, 2023
@nealrichardson nealrichardson deleted the lib-dir branch May 3, 2023 14:37
@ursabot
Copy link

ursabot commented May 3, 2023

Benchmark runs are scheduled for baseline = f794b20 and contender = ec89360. ec89360 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️1.22% ⬆️0.0%] test-mac-arm
[Finished ⬇️0.26% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.66% ⬆️0.03%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] ec893602 ec2-t3-xlarge-us-east-2
[Finished] ec893602 test-mac-arm
[Finished] ec893602 ursa-i9-9960x
[Finished] ec893602 ursa-thinkcentre-m75q
[Finished] f794b208 ec2-t3-xlarge-us-east-2
[Finished] f794b208 test-mac-arm
[Finished] f794b208 ursa-i9-9960x
[Finished] f794b208 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@ursabot
Copy link

ursabot commented May 3, 2023

['Python', 'R'] benchmarks have high level of regressions.
test-mac-arm

liujiacheng777 pushed a commit to LoongArch-Python/arrow that referenced this pull request May 11, 2023
…mismatched libarrow (apache#35147)

I've significantly rewritten `r/configure` to make it easier to reason about and harder for issues like apache#34229 and apache#35140 to happen. I've also added a version check to make sure that we don't obviously try to use a system C++ library that doesn't match the R package version. Making sure this was applied in all of the right places and handling what to do if the versions didn't match was the impetus for the whole refactor. 

`configure` has been broken up into some functions, and the flow of the script is, as is documented at the top of the file:

```
# * Find libarrow on the system. If it is present, make sure
#   that its version is compatible with the R package.
# * If no suitable libarrow is found, download it (where allowed)
#   or build it from source.
# * Determine what features this libarrow has and what other
#   flags it requires, and set them in src/Makevars for use when
#   compiling the bindings.
# * Run a test program to confirm that arrow headers are found
```

All of the detection of CFLAGS and `-L` dirs etc. happen in one place now, and they all prefer using `pkg-config` to read from the libarrow build what libraries and flags it requires, rather than hard-coding. (autobrew is the only remaining exception, but I didn't feel like messing with that today.) This should make the builds more future proof, should make it so more build configurations work (e.g. I suspect that a static build in ARROW_HOME wouldn't have gotten picked up correctly because it didn't add `-larrow_bundled_dependencies` to the libs, but now it will), and it may eliminate the redundant `-l` and `-D` setting I've observed in some builds (not harmful but definitely sloppy).

Version checking has been added in an R script for ease of testing (and for easier handling of arithmetic), and there is an accompanying `test-check-versions.R` added. These are run on all the builds that use `ci/scripts/r_test.sh`. 

### Behavior changes

* If libarrow is found on the system (via ARROW_HOME, pkg-config, or brew), but the version does not match, it will not be used, and we will try a bundled build. This should mean that users installing a released version will never have libarrow version problems. 
* If both the found C++ library and R package are on matching dev versions (i.e. not identical given the x.y.z.9000 vs x+1.y.z-SNAPSHOT difference), it will proceed with a warning that you may need to rebuild if there are issues. This means that regular developers will see an extra message in the build output.
* autobrew is only used on a release version unless you set FORCE_AUTOBREW=true. This eliminates another source of version mismatches (C++ release version, R dev version).
* The path where you could set `LIB_DIR` and `INCLUDE_DIR` env vars has been removed. Use `ARROW_HOME` instead.

* Closes: apache#35140
* Closes: apache#31989

Lead-authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
ArgusLi pushed a commit to Bit-Quill/arrow that referenced this pull request May 15, 2023
…mismatched libarrow (apache#35147)

I've significantly rewritten `r/configure` to make it easier to reason about and harder for issues like apache#34229 and apache#35140 to happen. I've also added a version check to make sure that we don't obviously try to use a system C++ library that doesn't match the R package version. Making sure this was applied in all of the right places and handling what to do if the versions didn't match was the impetus for the whole refactor. 

`configure` has been broken up into some functions, and the flow of the script is, as is documented at the top of the file:

```
# * Find libarrow on the system. If it is present, make sure
#   that its version is compatible with the R package.
# * If no suitable libarrow is found, download it (where allowed)
#   or build it from source.
# * Determine what features this libarrow has and what other
#   flags it requires, and set them in src/Makevars for use when
#   compiling the bindings.
# * Run a test program to confirm that arrow headers are found
```

All of the detection of CFLAGS and `-L` dirs etc. happen in one place now, and they all prefer using `pkg-config` to read from the libarrow build what libraries and flags it requires, rather than hard-coding. (autobrew is the only remaining exception, but I didn't feel like messing with that today.) This should make the builds more future proof, should make it so more build configurations work (e.g. I suspect that a static build in ARROW_HOME wouldn't have gotten picked up correctly because it didn't add `-larrow_bundled_dependencies` to the libs, but now it will), and it may eliminate the redundant `-l` and `-D` setting I've observed in some builds (not harmful but definitely sloppy).

Version checking has been added in an R script for ease of testing (and for easier handling of arithmetic), and there is an accompanying `test-check-versions.R` added. These are run on all the builds that use `ci/scripts/r_test.sh`. 

### Behavior changes

* If libarrow is found on the system (via ARROW_HOME, pkg-config, or brew), but the version does not match, it will not be used, and we will try a bundled build. This should mean that users installing a released version will never have libarrow version problems. 
* If both the found C++ library and R package are on matching dev versions (i.e. not identical given the x.y.z.9000 vs x+1.y.z-SNAPSHOT difference), it will proceed with a warning that you may need to rebuild if there are issues. This means that regular developers will see an extra message in the build output.
* autobrew is only used on a release version unless you set FORCE_AUTOBREW=true. This eliminates another source of version mismatches (C++ release version, R dev version).
* The path where you could set `LIB_DIR` and `INCLUDE_DIR` env vars has been removed. Use `ARROW_HOME` instead.

* Closes: apache#35140
* Closes: apache#31989

Lead-authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
rtpsw pushed a commit to rtpsw/arrow that referenced this pull request May 16, 2023
…mismatched libarrow (apache#35147)

I've significantly rewritten `r/configure` to make it easier to reason about and harder for issues like apache#34229 and apache#35140 to happen. I've also added a version check to make sure that we don't obviously try to use a system C++ library that doesn't match the R package version. Making sure this was applied in all of the right places and handling what to do if the versions didn't match was the impetus for the whole refactor. 

`configure` has been broken up into some functions, and the flow of the script is, as is documented at the top of the file:

```
# * Find libarrow on the system. If it is present, make sure
#   that its version is compatible with the R package.
# * If no suitable libarrow is found, download it (where allowed)
#   or build it from source.
# * Determine what features this libarrow has and what other
#   flags it requires, and set them in src/Makevars for use when
#   compiling the bindings.
# * Run a test program to confirm that arrow headers are found
```

All of the detection of CFLAGS and `-L` dirs etc. happen in one place now, and they all prefer using `pkg-config` to read from the libarrow build what libraries and flags it requires, rather than hard-coding. (autobrew is the only remaining exception, but I didn't feel like messing with that today.) This should make the builds more future proof, should make it so more build configurations work (e.g. I suspect that a static build in ARROW_HOME wouldn't have gotten picked up correctly because it didn't add `-larrow_bundled_dependencies` to the libs, but now it will), and it may eliminate the redundant `-l` and `-D` setting I've observed in some builds (not harmful but definitely sloppy).

Version checking has been added in an R script for ease of testing (and for easier handling of arithmetic), and there is an accompanying `test-check-versions.R` added. These are run on all the builds that use `ci/scripts/r_test.sh`. 

### Behavior changes

* If libarrow is found on the system (via ARROW_HOME, pkg-config, or brew), but the version does not match, it will not be used, and we will try a bundled build. This should mean that users installing a released version will never have libarrow version problems. 
* If both the found C++ library and R package are on matching dev versions (i.e. not identical given the x.y.z.9000 vs x+1.y.z-SNAPSHOT difference), it will proceed with a warning that you may need to rebuild if there are issues. This means that regular developers will see an extra message in the build output.
* autobrew is only used on a release version unless you set FORCE_AUTOBREW=true. This eliminates another source of version mismatches (C++ release version, R dev version).
* The path where you could set `LIB_DIR` and `INCLUDE_DIR` env vars has been removed. Use `ARROW_HOME` instead.

* Closes: apache#35140
* Closes: apache#31989

Lead-authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
6 participants