[SPARK-19955][PySpark] Jenkins Python Conda based test. #17355
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -35,32 +35,38 @@ function delete_virtualenv() { | |
} | ||
trap delete_virtualenv EXIT | ||
|
||
PYTHON_EXECS=() | ||
# Some systems don't have pip or virtualenv - in those cases our tests won't work. | ||
if ! hash virtualenv 2>/dev/null; then | ||
echo "Missing virtualenv skipping pip installability tests." | ||
if hash virtualenv 2>/dev/null && [ ! -n "$USE_CONDA" ]; then | ||
echo "virtualenv installed - using. Note if this is a conda virtual env you may wish to set USE_CONDA" | ||
# Figure out which Python execs we should test pip installation with | ||
if hash python2 2>/dev/null; then | ||
# We do this since we are testing with virtualenv and the default virtual env python | ||
# is in /usr/bin/python | ||
PYTHON_EXECS+=('python2') | ||
elif hash python 2>/dev/null; then | ||
# If python2 isn't installed fallback to python if available | ||
PYTHON_EXECS+=('python') | ||
fi | ||
if hash python3 2>/dev/null; then | ||
PYTHON_EXECS+=('python3') | ||
fi | ||
elif hash conda 2>/dev/null; then | ||
echo "Using conda virtual environments" | ||
PYTHON_EXECS=('3.5') | ||
USE_CONDA=1 | ||
else | ||
echo "Missing virtualenv & conda, skipping pip installability tests" | ||
exit 0 | ||
fi | ||
if ! hash pip 2>/dev/null; then | ||
echo "Missing pip, skipping pip installability tests." | ||
exit 0 | ||
fi | ||
|
||
# Figure out which Python execs we should test pip installation with | ||
PYTHON_EXECS=() | ||
if hash python2 2>/dev/null; then | ||
# We do this since we are testing with virtualenv and the default virtual env python | ||
# is in /usr/bin/python | ||
PYTHON_EXECS+=('python2') | ||
elif hash python 2>/dev/null; then | ||
# If python2 isn't installed fallback to python if available | ||
PYTHON_EXECS+=('python') | ||
fi | ||
if hash python3 2>/dev/null; then | ||
PYTHON_EXECS+=('python3') | ||
fi | ||
|
||
set -x | ||
# Determine which version of PySpark we are building for archive name | ||
PYSPARK_VERSION=$(python -c "exec(open('python/pyspark/version.py').read());print __version__") | ||
PYSPARK_VERSION=$(python3 -c "exec(open('python/pyspark/version.py').read());print(__version__)") | ||
PYSPARK_DIST="$FWDIR/python/dist/pyspark-$PYSPARK_VERSION.tar.gz" | ||
# The pip install options we use for all the pip commands | ||
PIP_OPTIONS="--upgrade --no-cache-dir --force-reinstall " | ||
|
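The hunk above swaps a Python 2 only `print __version__` one-liner for a `python3` invocation. As a rough sketch of the same idea that does not depend on print syntax, the version file can be executed into a private namespace and the `__version__` name read back. The helper name `read_version` and the throwaway version string are assumptions made for illustration; they are not part of the PR.

```python
import os
import tempfile


def read_version(path):
    """Execute a version file in an isolated namespace and return __version__.

    Hypothetical helper: mirrors the exec(open(...).read()) trick from the
    script, but avoids the Python 2 `print` statement entirely.
    """
    namespace = {}
    with open(path) as f:
        exec(f.read(), namespace)
    return namespace["__version__"]


# Demo against a throwaway file; the version string below is made up.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "version.py")
    with open(demo_path, "w") as f:
        f.write('__version__ = "0.0.0.dev0"\n')
    demo_version = read_version(demo_path)
```

In the real script the path would presumably be `python/pyspark/version.py`, as in the diff line above.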
@@ -75,18 +81,24 @@ for python in "${PYTHON_EXECS[@]}"; do | |
echo "Using $VIRTUALENV_BASE for virtualenv" | ||
VIRTUALENV_PATH="$VIRTUALENV_BASE"/$python | ||
rm -rf "$VIRTUALENV_PATH" | ||
mkdir -p "$VIRTUALENV_PATH" | ||
virtualenv --python=$python "$VIRTUALENV_PATH" | ||
source "$VIRTUALENV_PATH"/bin/activate | ||
# Upgrade pip & friends | ||
pip install --upgrade pip pypandoc wheel | ||
pip install numpy # Needed so we can verify mllib imports | ||
if [ -n "$USE_CONDA" ]; then | ||
conda create -y -p "$VIRTUALENV_PATH" python=$python numpy pandas pip setuptools | ||
Review comment: Setting
And that led to a conflict with pypandoc:
Manually setting "python=3.5" seemed to clear things up so the test could complete.
Reply: Sounds reasonable. For packaging I've made it explicitly request Python 3.5 (at some point, if PyPandoc doesn't make it into 3.6 on conda-forge, we should ping them, but no rush).
||
source activate "$VIRTUALENV_PATH" | ||
Review comment: I had to add this line after
Otherwise I got this error:
Reply: So it's not a hard error, and since the workers don't have pandoc installed (a separate binary), leaving it out for now seems like the easiest path. Once we're all dockerized and happy we can add pandoc and pypandoc to the Docker image.
||
else | ||
mkdir -p "$VIRTUALENV_PATH" | ||
virtualenv --python=$python "$VIRTUALENV_PATH" | ||
source "$VIRTUALENV_PATH"/bin/activate | ||
fi | ||
# Upgrade pip & friends if using virtualenv | ||
if [ ! -n "$USE_CONDA" ]; then | ||
pip install --upgrade pip pypandoc wheel numpy | ||
fi | ||
|
||
echo "Creating pip installable source dist" | ||
cd "$FWDIR"/python | ||
# Delete the egg info file if it exists, this can cache the setup file. | ||
rm -rf pyspark.egg-info || echo "No existing egg info file, skipping deletion" | ||
$python setup.py sdist | ||
python setup.py sdist | ||
|
||
|
||
echo "Installing dist into virtual env" | ||
|
@@ -112,6 +124,13 @@ for python in "${PYTHON_EXECS[@]}"; do | |
|
||
cd "$FWDIR" | ||
|
||
# conda / virtualenv environments need to be deactivated differently | ||
if [ -n "$USE_CONDA" ]; then | ||
source deactivate | ||
else | ||
deactivate | ||
fi | ||
|
||
done | ||
done | ||
|
||
|
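The branching at the top of the script (virtualenv unless `USE_CONDA` is set, else conda, else skip) is easy to misread in flattened diff form. The sketch below restates that decision as a pure Python function so the three outcomes can be checked in isolation. The function name, its parameters, and the `(backend, targets)` return shape are assumptions made for illustration, not code from the PR.

```python
def choose_env_backend(has_virtualenv, has_conda, use_conda, available_pythons):
    """Mirror the script's selection logic (hypothetical restatement).

    Returns (backend, targets); backend is None when the pip
    installability tests would be skipped.
    """
    if has_virtualenv and not use_conda:
        targets = []
        # Prefer python2; fall back to plain `python` if python2 is absent.
        if "python2" in available_pythons:
            targets.append("python2")
        elif "python" in available_pythons:
            targets.append("python")
        if "python3" in available_pythons:
            targets.append("python3")
        return "virtualenv", targets
    if has_conda:
        # The conda branch passes a version number, not an executable name.
        return "conda", ["3.5"]
    return None, []
```

Note that with `use_conda` set and conda missing, this falls through to the skip case, which matches the `elif hash conda` ordering in the shell version.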
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -111,9 +111,9 @@ def run_individual_python_test(test_name, pyspark_python): | |
|
||
|
||
def get_default_python_executables(): | ||
python_execs = [x for x in ["python2.6", "python3.4", "pypy"] if which(x)] | ||
if "python2.6" not in python_execs: | ||
LOGGER.warning("Not testing against `python2.6` because it could not be found; falling" | ||
python_execs = [x for x in ["python2.7", "python3.4", "pypy"] if which(x)] | ||
Review comment: Does this mean we are not supporting 2.6 anymore!?!
Reply: Indeed, we've been talking about removing it, but it's been blocked on Jenkins work.
||
if "python2.7" not in python_execs: | ||
LOGGER.warning("Not testing against `python2.7` because it could not be found; falling" | ||
" back to `python` instead") | ||
python_execs.insert(0, "python") | ||
return python_execs | ||
|
Review comment: Is this just there for debugging? If so, please remove it before merging. Otherwise, consider sticking it at the beginning of the script.