Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-203: Python: Basic filename based Parquet read/write #83

Closed
wants to merge 28 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
0463995
ARROW-203: Python: Basic filename based Parquet read/write
xhochy May 29, 2016
7192cfb
Add const to slicing parameters
xhochy May 30, 2016
081db5f
Limit and document chunk_size
xhochy May 30, 2016
0fbed3f
Remove obsolete parquet files
xhochy May 30, 2016
be6415c
Incorportate review comments
xhochy May 31, 2016
9b06e41
Make tests templated
xhochy Jun 3, 2016
5d4929a
Add test-util.h
xhochy Jun 3, 2016
b505feb
Install parquet-cpp via conda
xhochy Jun 3, 2016
81f501e
No need to install conda in travis_script_python anymore
xhochy Jun 3, 2016
6a41d23
Re-use conda installation from C++
xhochy Jun 4, 2016
cd3b9a9
Also search for Parquet in PyArrow
xhochy Jun 4, 2016
9520c39
Use PARQUET from miniconda path
xhochy Jun 4, 2016
2006e70
Rewrite test py.test style
xhochy Jun 4, 2016
2dffc14
Fix min mistake, use equals instead of ==
xhochy Jun 4, 2016
443de8b
Add miniconda to the LD_LIBRARY_PATH
xhochy Jun 4, 2016
5706db2
Use length and offset instead of slicing
xhochy Jun 4, 2016
066c08a
Add missing functions to smart pointers
xhochy Jun 4, 2016
4a80116
Handle Python3 strings correctly
xhochy Jun 4, 2016
00c1461
Also ensure correct OSX compiler flags in PyArrow
xhochy Jun 4, 2016
f583b61
Fix rpath for libarrow_parquet
xhochy Jun 5, 2016
77bd21a
Add pandas roundtrip to tests
xhochy Jun 5, 2016
0514d01
Handle exceptions on RowGroupWriter::Close better
xhochy Jun 7, 2016
8f6010a
Linter fixes
xhochy Jun 7, 2016
000e1e3
Use unique_ptr and shared_ptr from Cython
xhochy Jun 10, 2016
8d90d3f
Do not set LD_LIBRARY_PATH in python build
xhochy Jun 10, 2016
ec07768
Set LD_LIBRARY_PATH in python build
xhochy Jun 10, 2016
38d786c
Make code more readable by using using
xhochy Jun 10, 2016
405f85d
Remove FindParquet duplication
xhochy Jun 10, 2016
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 5 additions & 1 deletion ci/travis_before_script_cpp.sh
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,10 @@

set -e

source $TRAVIS_BUILD_DIR/ci/travis_install_conda.sh
conda install -y --channel apache/channel/dev parquet-cpp
export PARQUET_HOME=$MINICONDA

: ${CPP_BUILD_DIR=$TRAVIS_BUILD_DIR/cpp-build}

mkdir $CPP_BUILD_DIR
Expand All @@ -19,7 +23,7 @@ echo $GTEST_HOME

: ${ARROW_CPP_INSTALL=$TRAVIS_BUILD_DIR/cpp-install}

CMAKE_COMMON_FLAGS="-DARROW_BUILD_BENCHMARKS=ON -DCMAKE_INSTALL_PREFIX=$ARROW_CPP_INSTALL"
CMAKE_COMMON_FLAGS="-DARROW_BUILD_BENCHMARKS=ON -DARROW_PARQUET=ON -DCMAKE_INSTALL_PREFIX=$ARROW_CPP_INSTALL"

if [ $TRAVIS_OS_NAME == "linux" ]; then
cmake -DARROW_TEST_MEMCHECK=on $CMAKE_COMMON_FLAGS -DCMAKE_CXX_FLAGS="-Werror" $CPP_DIR
Expand Down
22 changes: 1 addition & 21 deletions ci/travis_conda_build.sh
Original file line number Diff line number Diff line change
Expand Up @@ -2,27 +2,7 @@

set -e

if [ $TRAVIS_OS_NAME == "linux" ]; then
MINICONDA_URL="https://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh"
else
MINICONDA_URL="https://repo.continuum.io/miniconda/Miniconda-latest-MacOSX-x86_64.sh"
fi

wget -O miniconda.sh $MINICONDA_URL
MINICONDA=$TRAVIS_BUILD_DIR/miniconda
bash miniconda.sh -b -p $MINICONDA
export PATH="$MINICONDA/bin:$PATH"
conda update -y -q conda
conda info -a

conda config --set show_channel_urls yes
conda config --add channels conda-forge
conda config --add channels apache

conda install --yes conda-build jinja2 anaconda-client

# faster builds, please
conda install -y nomkl
source $TRAVIS_BUILD_DIR/ci/travis_install_conda.sh

# Build libarrow

Expand Down
26 changes: 26 additions & 0 deletions ci/travis_install_conda.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
#!/usr/bin/env bash

set -e

if [ $TRAVIS_OS_NAME == "linux" ]; then
MINICONDA_URL="https://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh"
else
MINICONDA_URL="https://repo.continuum.io/miniconda/Miniconda-latest-MacOSX-x86_64.sh"
fi

wget -O miniconda.sh $MINICONDA_URL
export MINICONDA=$TRAVIS_BUILD_DIR/miniconda
bash miniconda.sh -b -p $MINICONDA
export PATH="$MINICONDA/bin:$PATH"
conda update -y -q conda
conda info -a

conda config --set show_channel_urls yes
conda config --add channels conda-forge
conda config --add channels apache

conda install --yes conda-build jinja2 anaconda-client

# faster builds, please
conda install -y nomkl

21 changes: 6 additions & 15 deletions ci/travis_script_python.sh
Original file line number Diff line number Diff line change
Expand Up @@ -4,28 +4,19 @@ set -e

PYTHON_DIR=$TRAVIS_BUILD_DIR/python

# Re-use conda installation from C++
export MINICONDA=$TRAVIS_BUILD_DIR/miniconda
export PATH="$MINICONDA/bin:$PATH"
export LD_LIBRARY_PATH="$MINICONDA/lib:$LD_LIBRARY_PATH"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed? In theory this should not be necessary (handled by conda's shared library patching)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or is this only for running the C++ unit tests (which don't have the rpath of the conda environment)? If so that's OK

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to be required to load libparquet.so: https://travis-ci.org/apache/arrow/jobs/136655629

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the issue is that parquet-cpp is not being installed in the conda environment where pyarrow library is being built and tests run (https://github.com/apache/arrow/blob/master/ci/travis_script_python.sh#L32).

Two related issues, then:

  1. parquet-cpp needs to be added to the build and runtime requirements in python/conda.recipe/meta.yaml

  2. parquet-cpp must be installed after conda create is installed -- it's only being installed in the primary / top level environment

I think if you do this you can get rid of the LD_LIBRARY_PATH hack

export PARQUET_HOME=$MINICONDA

# Share environment with C++
pushd $CPP_BUILD_DIR
source setup_build_env.sh
popd

pushd $PYTHON_DIR

# Bootstrap a Conda Python environment

if [ $TRAVIS_OS_NAME == "linux" ]; then
MINICONDA_URL="https://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh"
else
MINICONDA_URL="https://repo.continuum.io/miniconda/Miniconda-latest-MacOSX-x86_64.sh"
fi

curl $MINICONDA_URL > miniconda.sh
MINICONDA=$TRAVIS_BUILD_DIR/miniconda
bash miniconda.sh -b -p $MINICONDA
export PATH="$MINICONDA/bin:$PATH"
conda update -y -q conda
conda info -a

python_version_tests() {
PYTHON_VERSION=$1
CONDA_ENV_NAME="pyarrow-test-${PYTHON_VERSION}"
Expand Down
2 changes: 2 additions & 0 deletions cpp/src/arrow/column.h
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,8 @@ class Column {

int64_t null_count() const { return data_->null_count(); }

const std::shared_ptr<Field>& field() const { return field_; }

// @returns: the column's name in the passed metadata
const std::string& name() const { return field_->name; }

Expand Down
7 changes: 7 additions & 0 deletions cpp/src/arrow/parquet/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,13 @@ add_library(arrow_parquet SHARED
target_link_libraries(arrow_parquet ${PARQUET_LIBS})
SET_TARGET_PROPERTIES(arrow_parquet PROPERTIES LINKER_LANGUAGE CXX)

if (APPLE)
set_target_properties(arrow_parquet
PROPERTIES
BUILD_WITH_INSTALL_RPATH ON
INSTALL_NAME_DIR "@rpath")
endif()

ADD_ARROW_TEST(parquet-schema-test)
ARROW_TEST_LINK_LIBRARIES(parquet-schema-test arrow_parquet)

Expand Down
Loading