SPARK-1428: MLlib should convert non-float64 NumPy arrays to float64 instead of complaining #356

Closed
wants to merge 1 commit into from

Conversation

techaddict
Contributor

No description provided.

@mengxr
Contributor

mengxr commented Apr 8, 2014

@techaddict Is it easy to add a test to verify that it works?

@techaddict
Contributor Author

@mengxr OK, I'll try adding some tests.

@mateiz
Contributor

mateiz commented Apr 8, 2014

Jenkins, this is ok to test

@mateiz
Contributor

mateiz commented Apr 8, 2014

You should also check that the vector does not contain complex numbers, since that is the one NumPy data type we can't convert to floats. You can do

if numpy.issubdtype(v.dtype, numpy.complex):
    raise TypeError(...)
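
A minimal sketch of the check suggested above, written against current NumPy. The helper name `reject_complex` and the error message are hypothetical and are not taken from the patch. Recent NumPy releases no longer provide the `numpy.complex` alias, so this sketch tests against the abstract type `numpy.complexfloating` instead.

```python
import numpy as np


def reject_complex(v):
    """Raise TypeError if v has a complex dtype; otherwise return v unchanged.

    Hypothetical helper for illustration only.
    """
    # np.complexfloating is the abstract parent of complex64 and complex128.
    if np.issubdtype(v.dtype, np.complexfloating):
        raise TypeError("cannot convert complex vector of dtype %s to float64" % v.dtype)
    return v


# Integer input passes through untouched.
ints = reject_complex(np.array([1, 2, 3]))

# Complex input is rejected before any conversion is attempted.
try:
    reject_complex(np.array([1 + 2j, 3 + 0j]))
except TypeError as exc:
    print("rejected:", exc)
```

Doing the check before calling `astype` matters because casting complex values to a real dtype discards the imaginary part rather than failing.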

@techaddict
Contributor Author

@mateiz Updated. Should I add tests too? IMHO there's no need, because it's a trivial patch.

@mateiz
Contributor

mateiz commented Apr 9, 2014

Actually, one more thing: can you set copy=True in astype (or just not set it)? I'm not sure that it preserves the right byte order and such, and I know that this works with copy=True. Also, we don't want to mess with the user's input data.
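
To illustrate the point about not touching the caller's data, here is a small sketch of how `astype` behaves with and without a copy. The variable names are illustrative only and do not come from the patch.

```python
import numpy as np

original = np.array([1, 2, 3], dtype=np.int32)

# copy=True is the default, so the result is always a fresh array.
converted = original.astype(np.float64)
converted[0] = 99.0  # modify the converted array only

print(original)         # the caller's integers are untouched
print(converted.dtype)

# With copy=False, an array that already has the requested dtype is
# returned as-is, so later writes would reach the caller's data.
already_float = np.array([1.0, 2.0])
same = already_float.astype(np.float64, copy=False)
print(same is already_float)
```

Leaving `copy` unset therefore gives the safe behavior for every input dtype, at the cost of one extra copy when the input is already float64.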

@mateiz
Contributor

mateiz commented Apr 9, 2014

Sorry, I meant true, not false.

@techaddict
Contributor Author

@mateiz Which do you mean:
v = v.astype(float64)
or
v = v.astype(float64, copy=True)

@techaddict
Contributor Author

@mateiz Is there any other problem?

@mateiz
Contributor

mateiz commented Apr 10, 2014

This seems to be failing tests unfortunately -- click through to the Jenkins log. This is what it says:

=========================================================================
Running PySpark tests
=========================================================================
**********************************************************************
File "pyspark/mllib/_common.py", line 76, in __main__._deserialize_double_vector
Failed example:
    array_equal(x, _deserialize_double_vector(_serialize_double_vector(x)))
Exception raised:
    Traceback (most recent call last):
      File "/usr/local/lib/python2.7/doctest.py", line 1289, in __run
        compileflags, 1) in test.globs
      File "<doctest __main__._deserialize_double_vector[1]>", line 1, in <module>
        array_equal(x, _deserialize_double_vector(_serialize_double_vector(x)))
      File "pyspark/mllib/_common.py", line 55, in _serialize_double_vector
        if numpy.issubdtype(v.dtype, numpy.complex):
    NameError: global name 'numpy' is not defined
**********************************************************************
   1 of   2 in __main__._deserialize_double_vector
***Test Failed*** 1 failures.
Had test failures; see logs.
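
The `NameError` in the log is what Python raises when code refers to `numpy.something` but the bare name `numpy` was never bound in that module. One plausible cause, stated here as an assumption because the module's import lines are not shown in this thread, is that only individual names were imported with `from numpy import ...`. The sketch below uses a hypothetical helper, `names_bound_by`, to show which names each import style binds.

```python
def names_bound_by(import_statement):
    """Run an import statement in an empty namespace and report the names it binds.

    Hypothetical helper for illustration only.
    """
    namespace = {}
    exec(import_statement, namespace)
    # exec adds __builtins__ to the namespace; ignore dunder entries.
    return {name for name in namespace if not name.startswith("__")}


# Importing individual names does not bind the package name itself.
print(names_bound_by("from numpy import array, float64"))

# Importing the package binds `numpy`, so numpy.issubdtype(...) resolves.
print(names_bound_by("import numpy"))
```

Under that assumption, either adding `import numpy` or importing the specific names used by the check would make the failing doctest resolve the name.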

@mateiz
Contributor

mateiz commented Apr 10, 2014

This is why it would be good to add a test, actually. For instance, you can add a test with _deserialize_double_vector(_serialize_double_vector(array([1,2,3]))) and check that it returns array([1.0,2.0,3.0]).
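
A sketch of the kind of round-trip test described above. The functions `serialize_vector` and `deserialize_vector` are hypothetical stand-ins that simply pack raw float64 bytes. They do not reproduce PySpark's `_serialize_double_vector` and `_deserialize_double_vector` or their wire format, and only show the shape of the assertion.

```python
import numpy as np


def serialize_vector(v):
    # Hypothetical stand-in: convert to float64 (copying), then dump raw bytes.
    return np.asarray(v).astype(np.float64).tobytes()


def deserialize_vector(data):
    # Hypothetical stand-in: reinterpret the raw bytes as float64 values.
    return np.frombuffer(data, dtype=np.float64)


# An integer array should survive the round trip as float64 values.
x = np.array([1, 2, 3])
round_tripped = deserialize_vector(serialize_vector(x))

assert np.array_equal(round_tripped, np.array([1.0, 2.0, 3.0]))
assert round_tripped.dtype == np.float64
```

A test with integer input exercises the new conversion path, which the existing float64 doctest would not reach.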

@techaddict
Contributor Author

@mateiz Done, it's working now 👍

@mateiz
Contributor

mateiz commented Apr 10, 2014

Thanks Sandeep. Merged into master and 1.0.

@techaddict techaddict closed this Apr 10, 2014
@techaddict techaddict reopened this Apr 10, 2014
@AmplabJenkins

Can one of the admins verify this patch?

@asfgit asfgit closed this in 3bd3129 Apr 10, 2014
asfgit pushed a commit that referenced this pull request Apr 10, 2014
…instead of complaining

Author: Sandeep <sandeep@techaddict.me>

Closes #356 from techaddict/1428 and squashes the following commits:

3bdf5f6 [Sandeep] SPARK-1428: MLlib should convert non-float64 NumPy arrays to float64 instead of complaining

(cherry picked from commit 3bd3129)
Signed-off-by: Matei Zaharia <matei@databricks.com>
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
…instead of complaining

Author: Sandeep <sandeep@techaddict.me>

Closes apache#356 from techaddict/1428 and squashes the following commits:

3bdf5f6 [Sandeep] SPARK-1428: MLlib should convert non-float64 NumPy arrays to float64 instead of complaining
@techaddict techaddict deleted the 1428 branch July 3, 2016 04:59
tangzhankun pushed a commit to tangzhankun/spark that referenced this pull request Jul 21, 2017
erikerlandson pushed a commit to erikerlandson/spark that referenced this pull request Jul 28, 2017
mccheah pushed a commit to mccheah/spark that referenced this pull request Oct 3, 2018
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
…ndpoint

Enable legacy endpoint format for docker-machine
4 participants