[SPARK-19955][PySpark] Jenkins Python Conda based test. #17355

@holdenk
Contributor

holdenk commented Mar 20, 2017

What changes were proposed in this pull request?

Allow Jenkins Python tests to use the installed conda to test Python 2.7 support & test pip installability.

How was this patch tested?

Updated shell scripts, ran tests locally with installed conda, ran tests in Jenkins.

@SparkQA

SparkQA commented Mar 20, 2017

Test build #74853 has started for PR 17355 at commit 267837c.

@holdenk
Contributor Author

holdenk commented Mar 20, 2017

Jenkins retest this please

@SparkQA

SparkQA commented Mar 20, 2017

Test build #74871 has finished for PR 17355 at commit 267837c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@holdenk
Contributor Author

holdenk commented Mar 21, 2017

Jenkins retest this please

@SparkQA

SparkQA commented Mar 21, 2017

Test build #74998 has finished for PR 17355 at commit 267837c.

  • This patch fails PySpark pip packaging tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 22, 2017

Test build #75025 has finished for PR 17355 at commit bc4f673.

  • This patch fails PySpark pip packaging tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@holdenk
Contributor Author

holdenk commented Mar 22, 2017

Jenkins retest this please.

@SparkQA

SparkQA commented Mar 22, 2017

Test build #75033 has started for PR 17355 at commit bc4f673.

@SparkQA

SparkQA commented Mar 22, 2017

Test build #75057 has finished for PR 17355 at commit a722140.

  • This patch fails PySpark pip packaging tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 23, 2017

Test build #75063 has finished for PR 17355 at commit 57a1f6e.

  • This patch fails PySpark pip packaging tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 23, 2017

Test build #75073 has finished for PR 17355 at commit 6f33633.

  • This patch fails PySpark pip packaging tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 23, 2017

Test build #75081 has started for PR 17355 at commit 16d2773.

@holdenk
Contributor Author

holdenk commented Mar 23, 2017

Jenkins retest this please

@SparkQA

SparkQA commented Mar 23, 2017

Test build #75090 has finished for PR 17355 at commit 16d2773.

  • This patch fails PySpark pip packaging tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@BryanCutler
Member

BryanCutler left a comment

@holdenk, I ran run-tests, which worked fine, and tried out run-pip-tests with USE_CONDA set. I ran into some of the above issues and then finally got this error: error: package directory 'pyspark/ml/stat' does not exist
It looks like this comes from https://github.com/apache/spark/blob/master/python/setup.py#L170, and I don't see that the module exists. Is that right?
After I removed the ml.stat module from there, the tests ran.

pip install --upgrade pip pypandoc wheel
pip install numpy # Needed so we can verify mllib imports
if [ -n "$USE_CONDA" ]; then
  conda create -y -p "$VIRTUALENV_PATH" python=$python numpy pandas pip setuptools
Member

Setting python=$python expanded to "python=3", which then tried to install Python 3.6:

+ conda create -y -p /tmp/tmp.OymEZOKFzo/3 python=3 numpy pandas pip setuptools
Fetching package metadata .........
Solving package specifications: .

Package plan for installation in environment /tmp/tmp.OymEZOKFzo/3:

The following NEW packages will be INSTALLED:

    mkl:             2017.0.1-0        
    numpy:           1.12.1-py36_0     
    openssl:         1.0.2k-1          
    pandas:          0.19.2-np112py36_1
    pip:             9.0.1-py36_1      
    python:          3.6.1-0
    ...

And that led to a conflict with pypandoc:

UnsatisfiableError: The following specifications were found to be in conflict:
  - pypandoc -> python 3.5* -> sqlite 3.9.*
  - pypandoc -> python 3.5* -> xz 5.0.*
  - python 3.6*

Manually setting "python=3.5" seemed to clear things up, so the test could complete.

Contributor Author

Sounds reasonable. For packaging I've made it explicitly request Python 3.5 (at some point, if PyPandoc doesn't make it onto conda-forge for 3.6, we should ping them, but no rush).
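
A minimal sketch of the pinning discussed in this thread, written in Python for illustration only; the actual change lives in the shell script. It builds the same `conda create` argument list shown in the log above, but maps a bare "3" request to "3.5" so that conda does not resolve it to 3.6. `PINNED_VERSIONS`, `DEFAULT_PACKAGES`, and `conda_create_args` are hypothetical names, not part of the PR.

```python
# Hypothetical helper, not taken from the PR: pin loose Python requests to a
# specific minor version before handing them to `conda create`.
PINNED_VERSIONS = {"3": "3.5"}  # assumption: only the "3" -> "3.5" pin from this thread

# Package list copied from the conda create line quoted in the review.
DEFAULT_PACKAGES = ("numpy", "pandas", "pip", "setuptools")


def conda_create_args(env_path, python_version, packages=DEFAULT_PACKAGES):
    """Return the argv list for creating a conda env at env_path."""
    # Fall back to the requested version unchanged when no pin is recorded.
    pinned = PINNED_VERSIONS.get(python_version, python_version)
    return ["conda", "create", "-y", "-p", env_path,
            "python=" + pinned, *packages]


if __name__ == "__main__":
    print(" ".join(conda_create_args("/tmp/example-env", "3")))
```

Keeping the pin in one table means it can be dropped in a single place once pypandoc is available for 3.6.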

pip install numpy # Needed so we can verify mllib imports
if [ -n "$USE_CONDA" ]; then
  conda create -y -p "$VIRTUALENV_PATH" python=$python numpy pandas pip setuptools
  source activate "$VIRTUALENV_PATH"
Member

I had to add this line after the source activate line to get pypandoc installed:

conda install -y -c conda-forge pypandoc

Otherwise I got this error:

Could not import pypandoc - required to package PySpark

Contributor Author

So it's not a hard error, and since the workers don't have pandoc installed (a separate binary), leaving it out for now seems like the easiest path. Once we're all dockerized and happy we can add pandoc & pypandoc to the docker image.
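
The "not a hard error" behaviour can be sketched as an optional import that warns on stderr and carries on. This is an illustrative sketch rather than the code in setup.py: `optional_import` is a hypothetical helper, and the warning text is modeled on the message quoted above.

```python
import importlib
import sys


def optional_import(module_name, purpose):
    """Import module_name if available; otherwise warn on stderr and return None."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # A missing optional dependency is reported but does not abort the run.
        print("Could not import %s - %s" % (module_name, purpose), file=sys.stderr)
        return None


if __name__ == "__main__":
    pypandoc = optional_import("pypandoc", "required to package PySpark")
    if pypandoc is None:
        print("continuing without pypandoc")
```

The caller decides what to do with a `None` result, which matches the approach of leaving pypandoc out on workers that lack the pandoc binary.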

- python_execs = [x for x in ["python2.6", "python3.4", "pypy"] if which(x)]
- if "python2.6" not in python_execs:
-     LOGGER.warning("Not testing against `python2.6` because it could not be found; falling"
+ python_execs = [x for x in ["python2.7", "python3.4", "pypy"] if which(x)]
Member

Does this mean we are not supporting 2.6 anymore!?!

Contributor Author

Indeed, we've been talking about removing it, but it's been blocked on Jenkins work.
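
For reference, the interpreter detection in the changed line can be sketched as a standalone function. This assumes `which` behaves like the standard library's `shutil.which`; the `which` helper the PR itself uses is not shown here. `find_python_execs` and its `lookup` parameter are hypothetical names, added so the filter can be exercised without the interpreters installed.

```python
from shutil import which

# Candidate list taken from the changed line in the diff.
CANDIDATES = ("python2.7", "python3.4", "pypy")


def find_python_execs(candidates=CANDIDATES, lookup=which):
    """Return the candidate interpreters that lookup can find on PATH."""
    # lookup returns a path string when found and None otherwise.
    return [name for name in candidates if lookup(name)]


if __name__ == "__main__":
    print(find_python_execs())
```

Passing a fake `lookup` makes it easy to check the behaviour on a machine that has none of these interpreters.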

@SparkQA

SparkQA commented Mar 24, 2017

Test build #75138 has finished for PR 17355 at commit 8fe8ada.

  • This patch fails PySpark pip packaging tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 26, 2017

Test build #75224 has finished for PR 17355 at commit f99f222.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@holdenk
Contributor Author

holdenk commented Mar 26, 2017

.@bryanxutler so I left out pypandoc because there isn't pandoc on the machines and it's optional (prints a warning to stderr - but should work fine). I get back from vacation next week so let's chat then :)

@holdenk
Contributor Author

holdenk commented Mar 26, 2017

Oops @BryanCutler damn phone keyboard.

@holdenk changed the title from "[SPARK-19955][WIP][PySpark] Jenkins Python Conda based test." to "[SPARK-19955][PySpark] Jenkins Python Conda based test." on Mar 27, 2017
@holdenk
Contributor Author

holdenk commented Mar 27, 2017

cc @JoshRosen & @shaneknapp : this PR allows us to keep our existing Jenkins worker setup while still moving away from 2.6 to 2.7 & enables pip packaging tests in Jenkins.

@SparkQA

SparkQA commented Mar 27, 2017

Test build #75274 has finished for PR 17355 at commit 5db1bc7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shaneknapp
Contributor

shaneknapp commented Mar 27, 2017 via email

@holdenk
Contributor Author

holdenk commented Mar 28, 2017

Hope you feel better soon @shaneknapp :)

PYTHON_EXECS+=('python3')
fi

set -x
Contributor

is this just there for debugging? if so, pls remove before merging. otherwise, consider sticking it at the beginning of the script.

@SparkQA

SparkQA commented Mar 29, 2017

Test build #75333 has finished for PR 17355 at commit a7bf53f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shaneknapp
Contributor

lgtm++

@holdenk
Contributor Author

holdenk commented Mar 29, 2017

Great, yay 2.6 deprecation adventures :)

@asfgit closed this in d6ddfdf on Mar 29, 2017
@holdenk
Contributor Author

holdenk commented Mar 29, 2017

Merged to master. Please do not backport.
