[SPARK-19955][PySpark] Jenkins Python Conda based test. #17355

Closed
Changes from 13 commits
67 changes: 43 additions & 24 deletions dev/run-pip-tests
@@ -35,32 +35,38 @@ function delete_virtualenv() {
}
trap delete_virtualenv EXIT

PYTHON_EXECS=()
# Some systems don't have pip or virtualenv - in those cases our tests won't work.
if ! hash virtualenv 2>/dev/null; then
echo "Missing virtualenv skipping pip installability tests."
if hash virtualenv 2>/dev/null && [ ! -n "$USE_CONDA" ]; then
echo "virtualenv installed - using it. Note: if this is a conda virtualenv, you may wish to set USE_CONDA"
# Figure out which Python execs we should test pip installation with
if hash python2 2>/dev/null; then
# We do this since we are testing with virtualenv and the default virtual env python
# is in /usr/bin/python
PYTHON_EXECS+=('python2')
elif hash python 2>/dev/null; then
# If python2 isn't installed fallback to python if available
PYTHON_EXECS+=('python')
fi
if hash python3 2>/dev/null; then
PYTHON_EXECS+=('python3')
fi
elif hash conda 2>/dev/null; then
echo "Using conda virtual environments"
PYTHON_EXECS=('3.5')
USE_CONDA=1
else
echo "Missing virtualenv & conda, skipping pip installability tests"
exit 0
fi
if ! hash pip 2>/dev/null; then
echo "Missing pip, skipping pip installability tests."
exit 0
fi

# Figure out which Python execs we should test pip installation with
PYTHON_EXECS=()
if hash python2 2>/dev/null; then
# We do this since we are testing with virtualenv and the default virtual env python
# is in /usr/bin/python
PYTHON_EXECS+=('python2')
elif hash python 2>/dev/null; then
# If python2 isn't installed fallback to python if available
PYTHON_EXECS+=('python')
fi
if hash python3 2>/dev/null; then
PYTHON_EXECS+=('python3')
fi

set -x
Contributor

Is this just here for debugging? If so, please remove it before merging; otherwise, consider moving it to the beginning of the script.
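For reference, `set -x` makes the shell echo every command (to stderr, prefixed with `+`) before running it; a minimal sketch of the tracing behavior, using a hypothetical `trace_demo` helper:

```shell
trace_demo() {
  set -x            # echo each subsequent command to stderr before running it
  echo "packaging"
  set +x            # turn tracing back off
}
out=$(trace_demo 2>/dev/null)   # discard the trace, keep normal stdout
echo "$out"
```

Because the trace goes to stderr, redirecting it away leaves only the script's normal output.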

# Determine which version of PySpark we are building for archive name
PYSPARK_VERSION=$(python -c "exec(open('python/pyspark/version.py').read());print __version__")
PYSPARK_VERSION=$(python3 -c "exec(open('python/pyspark/version.py').read());print(__version__)")
PYSPARK_DIST="$FWDIR/python/dist/pyspark-$PYSPARK_VERSION.tar.gz"
# The pip install options we use for all the pip commands
PIP_OPTIONS="--upgrade --no-cache-dir --force-reinstall "
@@ -75,18 +81,24 @@ for python in "${PYTHON_EXECS[@]}"; do
echo "Using $VIRTUALENV_BASE for virtualenv"
VIRTUALENV_PATH="$VIRTUALENV_BASE"/$python
rm -rf "$VIRTUALENV_PATH"
mkdir -p "$VIRTUALENV_PATH"
virtualenv --python=$python "$VIRTUALENV_PATH"
source "$VIRTUALENV_PATH"/bin/activate
# Upgrade pip & friends
pip install --upgrade pip pypandoc wheel
pip install numpy # Needed so we can verify mllib imports
if [ -n "$USE_CONDA" ]; then
conda create -y -p "$VIRTUALENV_PATH" python=$python numpy pandas pip setuptools
Member
Setting python=$python led to "python=3", which then tried to install Python 3.6:

+ conda create -y -p /tmp/tmp.OymEZOKFzo/3 python=3 numpy pandas pip setuptools
Fetching package metadata .........
Solving package specifications: .

Package plan for installation in environment /tmp/tmp.OymEZOKFzo/3:

The following NEW packages will be INSTALLED:

    mkl:             2017.0.1-0        
    numpy:           1.12.1-py36_0     
    openssl:         1.0.2k-1          
    pandas:          0.19.2-np112py36_1
    pip:             9.0.1-py36_1      
    python:          3.6.1-0
    ...

And that led to a conflict with pypandoc:

UnsatisfiableError: The following specifications were found to be in conflict:
  - pypandoc -> python 3.5* -> sqlite 3.9.*
  - pypandoc -> python 3.5* -> xz 5.0.*
  - python 3.6*

Manually setting "python=3.5" seemed to clear things up, so the test could complete.

Contributor Author

Sounds reasonable; for packaging I've made it explicitly request Python 3.5. (At some point, if pypandoc doesn't make it into 3.6 on conda-forge, we should ping them, but no rush.)
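The pinned invocation discussed above can be sketched as follows (the env path is hypothetical, and the command is only echoed here since conda may not be on PATH):

```shell
PY_PIN="3.5"                          # pin the exact minor version, not a bare "3"
VIRTUALENV_PATH="/tmp/pip-test-env"   # hypothetical env location
CMD="conda create -y -p $VIRTUALENV_PATH python=$PY_PIN numpy pandas pip setuptools"
echo "$CMD"
```

Pinning `python=3.5` keeps the solver from drifting to 3.6, which pypandoc's conda package did not yet support.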

source activate "$VIRTUALENV_PATH"
Member

I had to add this line after `source activate ..` to get pypandoc installed:

conda install -y -c conda-forge pypandoc

Otherwise I got this error:

Could not import pypandoc - required to package PySpark

Contributor Author

So it's not a hard error, and since the workers don't have pandoc installed (a separate binary), leaving it out for now seems like the easiest path. Once we're all dockerized and happy, we can add pandoc & pypandoc to the docker image.

else
mkdir -p "$VIRTUALENV_PATH"
virtualenv --python=$python "$VIRTUALENV_PATH"
source "$VIRTUALENV_PATH"/bin/activate
fi
# Upgrade pip & friends if using virtual env
if [ ! -n "$USE_CONDA" ]; then
pip install --upgrade pip pypandoc wheel numpy
fi

echo "Creating pip installable source dist"
cd "$FWDIR"/python
# Delete the egg info file if it exists, this can cache the setup file.
rm -rf pyspark.egg-info || echo "No existing egg info file, skipping deletion"
$python setup.py sdist
python setup.py sdist


echo "Installing dist into virtual env"
@@ -112,6 +124,13 @@ for python in "${PYTHON_EXECS[@]}"; do

cd "$FWDIR"

# conda / virtualenv environments need to be deactivated differently
if [ -n "$USE_CONDA" ]; then
source deactivate
else
deactivate
fi

done
done

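The pip-upgrade guard in `run-pip-tests` should only fire when `$USE_CONDA` is empty (note the `$` — testing the literal string `USE_CONDA` would always be non-empty); a standalone sketch of that branch, with placeholder messages standing in for the real commands:

```shell
USE_CONDA=""                          # empty: virtualenv path; set to 1 for conda
if [ ! -n "$USE_CONDA" ]; then
  MSG="virtualenv path: upgrading pip-managed packages"
else
  MSG="conda path: packages provisioned by conda create"
fi
echo "$MSG"
```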
3 changes: 2 additions & 1 deletion dev/run-tests-jenkins
@@ -22,7 +22,8 @@
# Environment variables are populated by the code here:
#+ https://github.com/jenkinsci/ghprb-plugin/blob/master/src/main/java/org/jenkinsci/plugins/ghprb/GhprbTrigger.java#L139

FWDIR="$(cd "`dirname $0`"/..; pwd)"
FWDIR="$( cd "$( dirname "$0" )/.." && pwd )"
cd "$FWDIR"

export PATH=/home/anaconda/bin:$PATH
exec python -u ./dev/run-tests-jenkins.py "$@"
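The quoted `$( dirname "$0" )` form in the new `FWDIR` line matters when the checkout path contains spaces, which the old backtick form mishandled; a sketch with a hypothetical script path:

```shell
script="/tmp/spark dir/dev/run.sh"            # hypothetical path containing a space
mkdir -p "$(dirname "$script")"               # ensure the directory exists for the demo
FWDIR="$( cd "$( dirname "$script" )/.." && pwd )"
echo "$FWDIR"
```

An unquoted `dirname` result would split `/tmp/spark dir` into two words and the `cd` would fail.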
6 changes: 3 additions & 3 deletions python/run-tests.py
@@ -111,9 +111,9 @@ def run_individual_python_test(test_name, pyspark_python):


def get_default_python_executables():
python_execs = [x for x in ["python2.6", "python3.4", "pypy"] if which(x)]
if "python2.6" not in python_execs:
LOGGER.warning("Not testing against `python2.6` because it could not be found; falling"
python_execs = [x for x in ["python2.7", "python3.4", "pypy"] if which(x)]
Member
Does this mean we are not supporting 2.6 anymore!?!

Contributor Author

Indeed, we've been talking about removing it, but it's been blocked on Jenkins work.

if "python2.7" not in python_execs:
LOGGER.warning("Not testing against `python2.7` because it could not be found; falling"
" back to `python` instead")
python_execs.insert(0, "python")
return python_execs
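The same probe-and-fallback logic from `get_default_python_executables` can be sketched in shell, using `command -v` in place of the script's `which` helper (which interpreters turn up depends on the host):

```shell
EXECS=""
for cand in python2.7 python3.4 pypy; do
  command -v "$cand" >/dev/null 2>&1 && EXECS="$EXECS $cand"
done
case " $EXECS " in
  *" python2.7 "*) ;;                 # python2.7 found; keep the list as-is
  *) EXECS="python$EXECS" ;;          # fall back to plain `python` first
esac
echo "$EXECS"
```

Either way the list is non-empty: a missing `python2.7` just means plain `python` is tried first.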
1 change: 0 additions & 1 deletion python/setup.py
@@ -167,7 +167,6 @@ def _supports_symlinks():
'pyspark.ml',
'pyspark.ml.linalg',
'pyspark.ml.param',
'pyspark.ml.stat',
'pyspark.sql',
'pyspark.streaming',
'pyspark.bin',