[SPARK-7383][ML] Feature Parity in PySpark for ml.features #5991

Closed
brkyvz wants to merge 4 commits


brkyvz
Contributor

@brkyvz brkyvz commented May 7, 2015

Implemented Python wrappers for the Scala feature transformers that did not yet exist in PySpark's `ml.feature` module.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 7, 2015

Test build #32156 has started for PR 5991 at commit bd39fd2.

@SparkQA

SparkQA commented May 8, 2015

Test build #32156 has finished for PR 5991 at commit bd39fd2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Binarizer(JavaTransformer, HasInputCol, HasOutputCol):
    • class IDF(JavaEstimator, HasInputCol, HasOutputCol):
    • class IDFModel(JavaModel):
    • class Normalizer(JavaTransformer, HasInputCol, HasOutputCol):
    • class OneHotEncoder(JavaTransformer, HasInputCol, HasOutputCol):
    • class PolynomialExpansion(JavaTransformer, HasInputCol, HasOutputCol):
    • class StandardScaler(JavaEstimator, HasInputCol, HasOutputCol):
    • class StandardScalerModel(JavaModel):
    • class StringIndexer(JavaEstimator, HasInputCol, HasOutputCol):
    • class StringIndexerModel(JavaModel):
    • class Tokenizer(JavaTransformer, HasInputCol, HasOutputCol):
    • class VectorIndexer(JavaEstimator, HasInputCol, HasOutputCol):
    • class Word2Vec(JavaEstimator, HasStepSize, HasMaxIter, HasSeed, HasInputCol, HasOutputCol):
    • class Word2VecModel(JavaModel):
    • class HasSeed(Params):
    • class HasTol(Params):
    • class HasStepSize(Params):

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32156/

Traceback (most recent call last):
...
TypeError: Method setParams forces keyword arguments.
>>> df = sc.parallelize([Row(values=0.5)]).toDF()

minor: I'm not sure which one is the recommended approach to create a DataFrame. @rxin

df = sc.parallelize([Row(values=0.5)]).toDF()

vs.

df = sqlContext.createDataFrame([(0.5,)], ["values"]) # don't need to import Row

I prefer the 2nd approach
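
The doctest excerpt quoted at the top of this review thread ends with `TypeError: Method setParams forces keyword arguments.` As a rough illustration of how such a guard can work, here is a minimal pure-Python sketch. The names `keyword_only_sketch` and `BinarizerSketch` are hypothetical stand-ins invented for this example, and the code is not taken from the patch; only the error message text comes from the doctest above.

```python
from functools import wraps


def keyword_only_sketch(func):
    """Hypothetical decorator: reject positional arguments other than `self`."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        # args[0] is `self`; anything after it was passed positionally.
        if len(args) > 1:
            raise TypeError("Method %s forces keyword arguments." % func.__name__)
        return func(*args, **kwargs)
    return wrapper


class BinarizerSketch(object):
    """Stand-in for a transformer with params; not the PySpark class."""

    def __init__(self):
        self.params = {"threshold": 0.0, "inputCol": None, "outputCol": None}

    @keyword_only_sketch
    def setParams(self, threshold=0.0, inputCol=None, outputCol=None):
        # Store the supplied values and return self so calls can be chained.
        self.params.update(threshold=threshold, inputCol=inputCol, outputCol=outputCol)
        return self


# Keyword arguments are accepted.
b = BinarizerSketch().setParams(threshold=1.0, inputCol="values", outputCol="features")

# A positional argument is rejected with the message seen in the doctest.
try:
    b.setParams(2.0)
    rejected = False
    message = None
except TypeError as exc:
    rejected = True
    message = str(exc)
```

Forcing keyword arguments keeps call sites readable when a method takes many optional parameters, and it lets new parameters be added later without silently shifting positional ones.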

@mengxr
Contributor

mengxr commented May 8, 2015

@brkyvz Thanks for working on this! It looks good except the variable naming in the doctests. It seems that RegexTokenizer is missing from the list. Could you add it as well?

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 8, 2015

Test build #32240 has started for PR 5991 at commit adcca55.

@mengxr
Contributor

mengxr commented May 8, 2015

LGTM pending Jenkins.

@SparkQA

SparkQA commented May 8, 2015

Test build #32240 has finished for PR 5991 at commit adcca55.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Binarizer(JavaTransformer, HasInputCol, HasOutputCol):
    • class IDF(JavaEstimator, HasInputCol, HasOutputCol):
    • class IDFModel(JavaModel):
    • class Normalizer(JavaTransformer, HasInputCol, HasOutputCol):
    • class OneHotEncoder(JavaTransformer, HasInputCol, HasOutputCol):
    • class PolynomialExpansion(JavaTransformer, HasInputCol, HasOutputCol):
    • class RegexTokenizer(JavaTransformer, HasInputCol, HasOutputCol):
    • class StandardScaler(JavaEstimator, HasInputCol, HasOutputCol):
    • class StandardScalerModel(JavaModel):
    • class StringIndexer(JavaEstimator, HasInputCol, HasOutputCol):
    • class StringIndexerModel(JavaModel):
    • class Tokenizer(JavaTransformer, HasInputCol, HasOutputCol):
    • class VectorIndexer(JavaEstimator, HasInputCol, HasOutputCol):
    • class Word2Vec(JavaEstimator, HasStepSize, HasMaxIter, HasSeed, HasInputCol, HasOutputCol):
    • class Word2VecModel(JavaModel):
    • class HasSeed(Params):
    • class HasTol(Params):
    • class HasStepSize(Params):

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32240/

@mengxr
Contributor

mengxr commented May 8, 2015

Merged into master and branch-1.4. Thanks!

@asfgit asfgit closed this in f5ff4a8 May 8, 2015
asfgit pushed a commit that referenced this pull request May 8, 2015
Implemented python wrappers for Scala functions that don't exist in `ml.features`

Author: Burak Yavuz <brkyvz@gmail.com>

Closes #5991 from brkyvz/ml-feat-PR and squashes the following commits:

adcca55 [Burak Yavuz] add regex tokenizer to __all__
b91cb44 [Burak Yavuz] addressed comments
bd39fd2 [Burak Yavuz] remove addition
b82bd7c [Burak Yavuz] Parity in PySpark for ml.features

(cherry picked from commit f5ff4a8)
Signed-off-by: Xiangrui Meng <meng@databricks.com>