Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-5253] [ML] LinearRegression with L1/L2 (ElasticNet) using OWLQN #4259

Closed
wants to merge 6 commits into from
Closed

Conversation

dbtsai
Copy link
Member

@dbtsai dbtsai commented Jan 29, 2015

No description provided.

@SparkQA
Copy link

SparkQA commented Jan 29, 2015

Test build #26284 has finished for PR 4259 at commit db8efff.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class LinearRegression extends Estimator[LinearRegressionModel] with LinearRegressionParams

@SparkQA
Copy link

SparkQA commented Jan 29, 2015

Test build #26283 has finished for PR 4259 at commit 179f4b3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 29, 2015

Test build #26291 has finished for PR 4259 at commit 5b75cda.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 29, 2015

Test build #26292 has finished for PR 4259 at commit aaaaa29.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 29, 2015

Test build #26296 has started for PR 4259 at commit dc7dcd7.

  • This patch merges cleanly.

@shaneknapp
Copy link
Contributor

i will kick this off again once i restart jenkins.

@shaneknapp
Copy link
Contributor

jenkins, test this please.

@SparkQA
Copy link

SparkQA commented Jan 29, 2015

Test build #26298 has finished for PR 4259 at commit dc7dcd7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -23,6 +23,27 @@ private[ml] trait HasRegParam extends Params {
def getRegParam: Double = get(regParam)
}

private[ml] trait HasElasticNetParam extends HasRegParam {
/** param for elastic net regularization parameter */
val alphaParam: DoubleParam = new DoubleParam(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move it to LinearRegression and rename it to alpha. This is not a shared param.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOR will have it eventually as well. Do you mean we should with HasAlpha with HasRegParam instead of with HasElasticNetParam?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd vote for not using HasRegParam at all and using elastic net-specific terminology such as l1RegParam, l2RegParam or something else which makes it clear which param applies to which part of the regularization.

@mengxr
Copy link
Contributor

mengxr commented Feb 4, 2015

@dbtsai Thanks for contributing elastic-net! It may be hard to merge this into 1.3 now. There are two pending 1.3 features that are relevant: DataFrame API and Prediction APIs, both of which will conflict with this PR. We may want to wait for them merged first and update this PR to avoid resolve conflicts frequently.

@dbtsai
Copy link
Member Author

dbtsai commented Feb 5, 2015

I just rebased and addressed the api changes.

I was thinking about that when we do standardization on the feature, we're actually rescaling on the data which is basically rescaling on the covariant position of the equation. However, we can achieve the same scaling on rescaling on the graidentSum which is contravariant position. Thus, we don't need to apply the scaler on the data which will be much cheaper. This also works for the logistic regression as well. For intercept in LOR, we can deal with it in gradient function instead of using applyBias, so combining both of those two techniques, we don't have to create new dataset.

@SparkQA
Copy link

SparkQA commented Feb 5, 2015

Test build #26812 has finished for PR 4259 at commit 9066e4c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class LinearRegression extends Estimator[LinearRegressionModel] with LinearRegressionParams

@jkbradley
Copy link
Member

@dbtsai I'd like to make a pass over this, but I realized that it has conflicts because of the developer api PR committed last week: [https://github.com//pull/3637] Could you please rebase? I don't think there are any more big PRs coming up which will make you rebase again. Thank you!

@debasish83
Copy link

@dbtsai we are close to merge this PR which brings OWLQN and PQN under the umbrella of proximal algorithms to support most of the interesting ML related constraints scalanlp/breeze#364

I also have another Breeze PR that merges OWLQN and PQN through Proximal Quasi Newton method but that will take time to stabilize..

I would like to add more tests for OWLQN and specifically comparisons with liblinear...Most likely if I find some interesting testcases in your PR, I will move them to NonlinearMinimizerTest...

I will open up a mllib PR that uses NonlinearMinimizer and tests a family of consensus algorithms (Block Coordinate Descent and ADMM)..Our main focus is ElasticNet as well so it will be great if this PR is merged so that I can build upon it...

@debasish83
Copy link

this is linear regression...what happened to the logistic regression elastic net? We are more interested in that one...

@dbtsai
Copy link
Member Author

dbtsai commented Mar 3, 2015

@jkbradley I will rebase soon. @debasish83 I'll add MLOR with elastic-net when we stabilize the new ML api. Doing this in old codebase will be huge effort, and I will like to make them self-contain as much as possible instead of sharing the same generalized linear algorithm base class. We already have enough if else statement to deal with the differences in the base class.

@SparkQA
Copy link

SparkQA commented Mar 24, 2015

Test build #29042 has finished for PR 4259 at commit 66a4dc3.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dbtsai
Copy link
Member Author

dbtsai commented Mar 24, 2015

@jkbradley and @mengxr I just rebased it. Will do couple optimizations to avoid the scaling on the datasets which can be done in the optimization process instead. You guys can start to give me feedback so we have ample time to address issues before 1.4. Thanks.

@SparkQA
Copy link

SparkQA commented Mar 24, 2015

Test build #29043 has finished for PR 4259 at commit ea3e1dc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 24, 2015

Test build #29044 has finished for PR 4259 at commit ec3572f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -34,6 +34,43 @@ private[ml] trait HasRegParam extends Params {
def getRegParam: Double = get(regParam)
}

private[ml] trait HasElasticNetParam extends HasRegParam {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to call this "alphaParam?" That name assumes people have read elastic net papers. What about something like:

  • "elasticNetParam" (very explicit) (I assume this was your original name.)
  • "regMixing" (also explicit)
  • "regAlpha" (at least will be grouped next to "regParam")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I voted for regAlphaParam: Double for the variable name, and keep HasElasticNetParam as the trait name. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the names should definitely match, so we should pick one. I'm OK with "elasticNetParam" or "regAlpha." (I think "regAlpha" doesn't need "Param" attached to it since "alpha" is the name of the parameter.) Since "elastic net" is more easily recognized than "alpha," I vote for "elasticNetParam."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds great! I'm convinced to do something like

private[ml] trait HasElasticNetParam extends HasRegParam {
  /**
   * param for elastic net regularization parameter
   * @group param
   */
  val elasticNetParam: DoubleParam =
    new DoubleParam(this, "elasticNetParam", "the ElasticNet mixing parameter")

  /** @group getParam */
  def getElasticNetParam: Double = get(elasticNetParam)
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I like that it extends HasRegParam. When you update it, can you please make the doc specify more about the parameter (range, plus which end of the range corresponds to L1 vs. L2)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely. It seems that there is no easy way to specify the valid parameter range or requirement in this framework. Do you think it's a good idea that I add the check like the following?

  def getElasticNetParam: Double = {
    val elasticNetParam = get(elasticNetParam)
    require(elasticNetParam >= 0)
    require(elasticNetParam <= 1)
    elasticNetParam
  }

@jkbradley
Copy link
Member

@dbtsai Thanks for the update! I'll make a more detailed pass soon.

One bigger question: What is the plan for extensions? Based on this PR, it sounds like we expect "LinearRegression" to only support no regularization, L1, L2, or elastic net. I assume that, in the future, users who want other regularization can write their own extensions of GeneralizedLinearAlgorithm (once that is in spark.ml). But would we ever want to add more built-in regularizers? Or are we deciding now that we never will?

@dbtsai
Copy link
Member Author

dbtsai commented Mar 26, 2015

@jkbradley I think we should only support basic regularization in spark.ml first which is what python scikit-learn does. If users have the need of different type of regularization, they can implement it based on the code we have.

It will be hard to implement GeneralizedLinearAlgorithm with regularization without using a lot of if-else statement to handle the special case. I implemented logistic regression, linear regression, and cox proportional-hazards regression with elasticnet regularization at Alpine, and our customers are asking for precise accuracy compared with R's glmnet package. As a result, I spent some time to research the original R's glmnet code, and I found that there is no generic way to handle different linear models. There are special cases here and there.

For example, in logistic regression, the intercept is computed by adding extra one dimension in the data with constant one, but in linear regression, the intercept is computed by val intercept = yMean - dot(weights, scaler.mean).

As a result, I would like to implement them separately and make sure we have the same accuracy compared with R with proper tests first, and then we can abstract out the common part. I have another PR trying to do this, #1518 and I will continuous on that after this PR is merged.

@jkbradley
Copy link
Member

@dbtsai That sounds reasonable. I think we could do a general implementation based on abstractions for regularization and loss functions, but it might not be as efficient as specializations (such as closed-form computation of the intercept for linear regression). I'm OK with this for now, and I definitely support comparisons with R for correctness.

Let me know when it's all ready so I can make a detailed pass!

@SparkQA
Copy link

SparkQA commented Mar 27, 2015

Test build #29327 has finished for PR 4259 at commit 14fd603.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 10, 2015

Test build #29990 has finished for PR 4259 at commit ea8d1a4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@SparkQA
Copy link

SparkQA commented Apr 18, 2015

Test build #30508 has finished for PR 4259 at commit 825ca02.

  • This patch fails Scala style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.
  • This patch removes the following dependencies:
    • RoaringBitmap-0.4.5.jar
    • activation-1.1.jar
    • akka-actor_2.10-2.3.4-spark.jar
    • akka-remote_2.10-2.3.4-spark.jar
    • akka-slf4j_2.10-2.3.4-spark.jar
    • aopalliance-1.0.jar
    • arpack_combined_all-0.1.jar
    • avro-1.7.7.jar
    • breeze-macros_2.10-0.11.2.jar
    • breeze_2.10-0.11.2.jar
    • chill-java-0.5.0.jar
    • chill_2.10-0.5.0.jar
    • commons-beanutils-1.7.0.jar
    • commons-beanutils-core-1.8.0.jar
    • commons-cli-1.2.jar
    • commons-codec-1.10.jar
    • commons-collections-3.2.1.jar
    • commons-compress-1.4.1.jar
    • commons-configuration-1.6.jar
    • commons-digester-1.8.jar
    • commons-httpclient-3.1.jar
    • commons-io-2.1.jar
    • commons-lang-2.5.jar
    • commons-lang3-3.3.2.jar
    • commons-math-2.1.jar
    • commons-math3-3.4.1.jar
    • commons-net-2.2.jar
    • compress-lzf-1.0.0.jar
    • config-1.2.1.jar
    • core-1.1.2.jar
    • curator-client-2.4.0.jar
    • curator-framework-2.4.0.jar
    • curator-recipes-2.4.0.jar
    • gmbal-api-only-3.0.0-b023.jar
    • grizzly-framework-2.1.2.jar
    • grizzly-http-2.1.2.jar
    • grizzly-http-server-2.1.2.jar
    • grizzly-http-servlet-2.1.2.jar
    • grizzly-rcm-2.1.2.jar
    • groovy-all-2.3.7.jar
    • guava-14.0.1.jar
    • guice-3.0.jar
    • hadoop-annotations-2.2.0.jar
    • hadoop-auth-2.2.0.jar
    • hadoop-client-2.2.0.jar
    • hadoop-common-2.2.0.jar
    • hadoop-hdfs-2.2.0.jar
    • hadoop-mapreduce-client-app-2.2.0.jar
    • hadoop-mapreduce-client-common-2.2.0.jar
    • hadoop-mapreduce-client-core-2.2.0.jar
    • hadoop-mapreduce-client-jobclient-2.2.0.jar
    • hadoop-mapreduce-client-shuffle-2.2.0.jar
    • hadoop-yarn-api-2.2.0.jar
    • hadoop-yarn-client-2.2.0.jar
    • hadoop-yarn-common-2.2.0.jar
    • hadoop-yarn-server-common-2.2.0.jar
    • ivy-2.4.0.jar
    • jackson-annotations-2.4.0.jar
    • jackson-core-2.4.4.jar
    • jackson-core-asl-1.8.8.jar
    • jackson-databind-2.4.4.jar
    • jackson-jaxrs-1.8.8.jar
    • jackson-mapper-asl-1.8.8.jar
    • jackson-module-scala_2.10-2.4.4.jar
    • jackson-xc-1.8.8.jar
    • jansi-1.4.jar
    • javax.inject-1.jar
    • javax.servlet-3.0.0.v201112011016.jar
    • javax.servlet-3.1.jar
    • javax.servlet-api-3.0.1.jar
    • jaxb-api-2.2.2.jar
    • jaxb-impl-2.2.3-1.jar
    • jcl-over-slf4j-1.7.10.jar
    • jersey-client-1.9.jar
    • jersey-core-1.9.jar
    • jersey-grizzly2-1.9.jar
    • jersey-guice-1.9.jar
    • jersey-json-1.9.jar
    • jersey-server-1.9.jar
    • jersey-test-framework-core-1.9.jar
    • jersey-test-framework-grizzly2-1.9.jar
    • jets3t-0.7.1.jar
    • jettison-1.1.jar
    • jetty-util-6.1.26.jar
    • jline-0.9.94.jar
    • jline-2.10.4.jar
    • jodd-core-3.6.3.jar
    • json4s-ast_2.10-3.2.10.jar
    • json4s-core_2.10-3.2.10.jar
    • json4s-jackson_2.10-3.2.10.jar
    • jsr305-1.3.9.jar
    • jtransforms-2.4.0.jar
    • jul-to-slf4j-1.7.10.jar
    • kryo-2.21.jar
    • log4j-1.2.17.jar
    • lz4-1.2.0.jar
    • management-api-3.0.0-b012.jar
    • mesos-0.21.0-shaded-protobuf.jar
    • metrics-core-3.1.0.jar
    • metrics-graphite-3.1.0.jar
    • metrics-json-3.1.0.jar
    • metrics-jvm-3.1.0.jar
    • minlog-1.2.jar
    • netty-3.8.0.Final.jar
    • netty-all-4.0.23.Final.jar
    • objenesis-1.2.jar
    • opencsv-2.3.jar
    • oro-2.0.8.jar
    • paranamer-2.6.jar
    • parquet-column-1.6.0rc3.jar
    • parquet-common-1.6.0rc3.jar
    • parquet-encoding-1.6.0rc3.jar
    • parquet-format-2.2.0-rc1.jar
    • parquet-generator-1.6.0rc3.jar
    • parquet-hadoop-1.6.0rc3.jar
    • parquet-jackson-1.6.0rc3.jar
    • protobuf-java-2.4.1.jar
    • protobuf-java-2.5.0-spark.jar
    • py4j-0.8.2.1.jar
    • pyrolite-2.0.1.jar
    • quasiquotes_2.10-2.0.1.jar
    • reflectasm-1.07-shaded.jar
    • scala-compiler-2.10.4.jar
    • scala-library-2.10.4.jar
    • scala-reflect-2.10.4.jar
    • scalap-2.10.4.jar
    • scalatest_2.10-2.2.1.jar
    • slf4j-api-1.7.10.jar
    • slf4j-log4j12-1.7.10.jar
    • snappy-java-1.1.1.7.jar
    • spark-bagel_2.10-1.4.0-SNAPSHOT.jar
    • spark-catalyst_2.10-1.4.0-SNAPSHOT.jar
    • spark-core_2.10-1.4.0-SNAPSHOT.jar
    • spark-graphx_2.10-1.4.0-SNAPSHOT.jar
    • spark-launcher_2.10-1.4.0-SNAPSHOT.jar
    • spark-mllib_2.10-1.4.0-SNAPSHOT.jar
    • spark-network-common_2.10-1.4.0-SNAPSHOT.jar
    • spark-network-shuffle_2.10-1.4.0-SNAPSHOT.jar
    • spark-repl_2.10-1.4.0-SNAPSHOT.jar
    • spark-sql_2.10-1.4.0-SNAPSHOT.jar
    • spark-streaming_2.10-1.4.0-SNAPSHOT.jar
    • spire-macros_2.10-0.7.4.jar
    • spire_2.10-0.7.4.jar
    • stax-api-1.0.1.jar
    • stream-2.7.0.jar
    • tachyon-0.5.0.jar
    • tachyon-client-0.5.0.jar
    • uncommons-maths-1.2.2a.jar
    • unused-1.0.0.jar
    • xmlenc-0.52.jar
    • xz-1.0.jar
    • zookeeper-3.4.5.jar

@SparkQA
Copy link

SparkQA commented Apr 18, 2015

Test build #30510 has finished for PR 4259 at commit c867838.

  • This patch fails Scala style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.
  • This patch removes the following dependencies:
    • RoaringBitmap-0.4.5.jar
    • activation-1.1.jar
    • akka-actor_2.10-2.3.4-spark.jar
    • akka-remote_2.10-2.3.4-spark.jar
    • akka-slf4j_2.10-2.3.4-spark.jar
    • aopalliance-1.0.jar
    • arpack_combined_all-0.1.jar
    • avro-1.7.7.jar
    • breeze-macros_2.10-0.11.2.jar
    • breeze_2.10-0.11.2.jar
    • chill-java-0.5.0.jar
    • chill_2.10-0.5.0.jar
    • commons-beanutils-1.7.0.jar
    • commons-beanutils-core-1.8.0.jar
    • commons-cli-1.2.jar
    • commons-codec-1.10.jar
    • commons-collections-3.2.1.jar
    • commons-compress-1.4.1.jar
    • commons-configuration-1.6.jar
    • commons-digester-1.8.jar
    • commons-httpclient-3.1.jar
    • commons-io-2.1.jar
    • commons-lang-2.5.jar
    • commons-lang3-3.3.2.jar
    • commons-math-2.1.jar
    • commons-math3-3.4.1.jar
    • commons-net-2.2.jar
    • compress-lzf-1.0.0.jar
    • config-1.2.1.jar
    • core-1.1.2.jar
    • curator-client-2.4.0.jar
    • curator-framework-2.4.0.jar
    • curator-recipes-2.4.0.jar
    • gmbal-api-only-3.0.0-b023.jar
    • grizzly-framework-2.1.2.jar
    • grizzly-http-2.1.2.jar
    • grizzly-http-server-2.1.2.jar
    • grizzly-http-servlet-2.1.2.jar
    • grizzly-rcm-2.1.2.jar
    • groovy-all-2.3.7.jar
    • guava-14.0.1.jar
    • guice-3.0.jar
    • hadoop-annotations-2.2.0.jar
    • hadoop-auth-2.2.0.jar
    • hadoop-client-2.2.0.jar
    • hadoop-common-2.2.0.jar
    • hadoop-hdfs-2.2.0.jar
    • hadoop-mapreduce-client-app-2.2.0.jar
    • hadoop-mapreduce-client-common-2.2.0.jar
    • hadoop-mapreduce-client-core-2.2.0.jar
    • hadoop-mapreduce-client-jobclient-2.2.0.jar
    • hadoop-mapreduce-client-shuffle-2.2.0.jar
    • hadoop-yarn-api-2.2.0.jar
    • hadoop-yarn-client-2.2.0.jar
    • hadoop-yarn-common-2.2.0.jar
    • hadoop-yarn-server-common-2.2.0.jar
    • ivy-2.4.0.jar
    • jackson-annotations-2.4.0.jar
    • jackson-core-2.4.4.jar
    • jackson-core-asl-1.8.8.jar
    • jackson-databind-2.4.4.jar
    • jackson-jaxrs-1.8.8.jar
    • jackson-mapper-asl-1.8.8.jar
    • jackson-module-scala_2.10-2.4.4.jar
    • jackson-xc-1.8.8.jar
    • jansi-1.4.jar
    • javax.inject-1.jar
    • javax.servlet-3.0.0.v201112011016.jar
    • javax.servlet-3.1.jar
    • javax.servlet-api-3.0.1.jar
    • jaxb-api-2.2.2.jar
    • jaxb-impl-2.2.3-1.jar
    • jcl-over-slf4j-1.7.10.jar
    • jersey-client-1.9.jar
    • jersey-core-1.9.jar
    • jersey-grizzly2-1.9.jar
    • jersey-guice-1.9.jar
    • jersey-json-1.9.jar
    • jersey-server-1.9.jar
    • jersey-test-framework-core-1.9.jar
    • jersey-test-framework-grizzly2-1.9.jar
    • jets3t-0.7.1.jar
    • jettison-1.1.jar
    • jetty-util-6.1.26.jar
    • jline-0.9.94.jar
    • jline-2.10.4.jar
    • jodd-core-3.6.3.jar
    • json4s-ast_2.10-3.2.10.jar
    • json4s-core_2.10-3.2.10.jar
    • json4s-jackson_2.10-3.2.10.jar
    • jsr305-1.3.9.jar
    • jtransforms-2.4.0.jar
    • jul-to-slf4j-1.7.10.jar
    • kryo-2.21.jar
    • log4j-1.2.17.jar
    • lz4-1.2.0.jar
    • management-api-3.0.0-b012.jar
    • mesos-0.21.0-shaded-protobuf.jar
    • metrics-core-3.1.0.jar
    • metrics-graphite-3.1.0.jar
    • metrics-json-3.1.0.jar
    • metrics-jvm-3.1.0.jar
    • minlog-1.2.jar
    • netty-3.8.0.Final.jar
    • netty-all-4.0.23.Final.jar
    • objenesis-1.2.jar
    • opencsv-2.3.jar
    • oro-2.0.8.jar
    • paranamer-2.6.jar
    • parquet-column-1.6.0rc3.jar
    • parquet-common-1.6.0rc3.jar
    • parquet-encoding-1.6.0rc3.jar
    • parquet-format-2.2.0-rc1.jar
    • parquet-generator-1.6.0rc3.jar
    • parquet-hadoop-1.6.0rc3.jar
    • parquet-jackson-1.6.0rc3.jar
    • protobuf-java-2.4.1.jar
    • protobuf-java-2.5.0-spark.jar
    • py4j-0.8.2.1.jar
    • pyrolite-2.0.1.jar
    • quasiquotes_2.10-2.0.1.jar
    • reflectasm-1.07-shaded.jar
    • scala-compiler-2.10.4.jar
    • scala-library-2.10.4.jar
    • scala-reflect-2.10.4.jar
    • scalap-2.10.4.jar
    • scalatest_2.10-2.2.1.jar
    • slf4j-api-1.7.10.jar
    • slf4j-log4j12-1.7.10.jar
    • snappy-java-1.1.1.7.jar
    • spark-bagel_2.10-1.4.0-SNAPSHOT.jar
    • spark-catalyst_2.10-1.4.0-SNAPSHOT.jar
    • spark-core_2.10-1.4.0-SNAPSHOT.jar
    • spark-graphx_2.10-1.4.0-SNAPSHOT.jar
    • spark-launcher_2.10-1.4.0-SNAPSHOT.jar
    • spark-mllib_2.10-1.4.0-SNAPSHOT.jar
    • spark-network-common_2.10-1.4.0-SNAPSHOT.jar
    • spark-network-shuffle_2.10-1.4.0-SNAPSHOT.jar
    • spark-repl_2.10-1.4.0-SNAPSHOT.jar
    • spark-sql_2.10-1.4.0-SNAPSHOT.jar
    • spark-streaming_2.10-1.4.0-SNAPSHOT.jar
    • spire-macros_2.10-0.7.4.jar
    • spire_2.10-0.7.4.jar
    • stax-api-1.0.1.jar
    • stream-2.7.0.jar
    • tachyon-0.5.0.jar
    • tachyon-client-0.5.0.jar
    • uncommons-maths-1.2.2a.jar
    • unused-1.0.0.jar
    • xmlenc-0.52.jar
    • xz-1.0.jar
    • zookeeper-3.4.5.jar

* Trait for shared param elasticNetParam.
*/
@DeveloperApi
trait HasElasticNetParam extends Params {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be a shared param.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you suggest to move it to LinearRegression.scala? we will use it in LOR as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay

@SparkQA
Copy link

SparkQA commented Apr 28, 2015

Test build #31090 has finished for PR 4259 at commit b46679c.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
    • trait HasElasticNetParam extends Params
    • trait HasTol extends Params
  • This patch removes the following dependencies:
    • RoaringBitmap-0.4.5.jar
    • activation-1.1.jar
    • akka-actor_2.10-2.3.4-spark.jar
    • akka-remote_2.10-2.3.4-spark.jar
    • akka-slf4j_2.10-2.3.4-spark.jar
    • aopalliance-1.0.jar
    • arpack_combined_all-0.1.jar
    • avro-1.7.7.jar
    • breeze-macros_2.10-0.11.2.jar
    • breeze_2.10-0.11.2.jar
    • chill-java-0.5.0.jar
    • chill_2.10-0.5.0.jar
    • commons-beanutils-1.7.0.jar
    • commons-beanutils-core-1.8.0.jar
    • commons-cli-1.2.jar
    • commons-codec-1.10.jar
    • commons-collections-3.2.1.jar
    • commons-compress-1.4.1.jar
    • commons-configuration-1.6.jar
    • commons-digester-1.8.jar
    • commons-httpclient-3.1.jar
    • commons-io-2.1.jar
    • commons-lang-2.5.jar
    • commons-lang3-3.3.2.jar
    • commons-math-2.1.jar
    • commons-math3-3.4.1.jar
    • commons-net-2.2.jar
    • compress-lzf-1.0.0.jar
    • config-1.2.1.jar
    • core-1.1.2.jar
    • curator-client-2.4.0.jar
    • curator-framework-2.4.0.jar
    • curator-recipes-2.4.0.jar
    • gmbal-api-only-3.0.0-b023.jar
    • grizzly-framework-2.1.2.jar
    • grizzly-http-2.1.2.jar
    • grizzly-http-server-2.1.2.jar
    • grizzly-http-servlet-2.1.2.jar
    • grizzly-rcm-2.1.2.jar
    • groovy-all-2.3.7.jar
    • guava-14.0.1.jar
    • guice-3.0.jar
    • hadoop-annotations-2.2.0.jar
    • hadoop-auth-2.2.0.jar
    • hadoop-client-2.2.0.jar
    • hadoop-common-2.2.0.jar
    • hadoop-hdfs-2.2.0.jar
    • hadoop-mapreduce-client-app-2.2.0.jar
    • hadoop-mapreduce-client-common-2.2.0.jar
    • hadoop-mapreduce-client-core-2.2.0.jar
    • hadoop-mapreduce-client-jobclient-2.2.0.jar
    • hadoop-mapreduce-client-shuffle-2.2.0.jar
    • hadoop-yarn-api-2.2.0.jar
    • hadoop-yarn-client-2.2.0.jar
    • hadoop-yarn-common-2.2.0.jar
    • hadoop-yarn-server-common-2.2.0.jar
    • ivy-2.4.0.jar
    • jackson-annotations-2.4.0.jar
    • jackson-core-2.4.4.jar
    • jackson-core-asl-1.8.8.jar
    • jackson-databind-2.4.4.jar
    • jackson-jaxrs-1.8.8.jar
    • jackson-mapper-asl-1.8.8.jar
    • jackson-module-scala_2.10-2.4.4.jar
    • jackson-xc-1.8.8.jar
    • jansi-1.4.jar
    • javax.inject-1.jar
    • javax.servlet-3.0.0.v201112011016.jar
    • javax.servlet-3.1.jar
    • javax.servlet-api-3.0.1.jar
    • jaxb-api-2.2.2.jar
    • jaxb-impl-2.2.3-1.jar
    • jcl-over-slf4j-1.7.10.jar
    • jersey-client-1.9.jar
    • jersey-core-1.9.jar
    • jersey-grizzly2-1.9.jar
    • jersey-guice-1.9.jar
    • jersey-json-1.9.jar
    • jersey-server-1.9.jar
    • jersey-test-framework-core-1.9.jar
    • jersey-test-framework-grizzly2-1.9.jar
    • jets3t-0.7.1.jar
    • jettison-1.1.jar
    • jetty-util-6.1.26.jar
    • jline-0.9.94.jar
    • jline-2.10.4.jar
    • jodd-core-3.6.3.jar
    • json4s-ast_2.10-3.2.10.jar
    • json4s-core_2.10-3.2.10.jar
    • json4s-jackson_2.10-3.2.10.jar
    • jsr305-1.3.9.jar
    • jtransforms-2.4.0.jar
    • jul-to-slf4j-1.7.10.jar
    • kryo-2.21.jar
    • log4j-1.2.17.jar
    • lz4-1.2.0.jar
    • management-api-3.0.0-b012.jar
    • mesos-0.21.0-shaded-protobuf.jar
    • metrics-core-3.1.0.jar
    • metrics-graphite-3.1.0.jar
    • metrics-json-3.1.0.jar
    • metrics-jvm-3.1.0.jar
    • minlog-1.2.jar
    • netty-3.8.0.Final.jar
    • netty-all-4.0.23.Final.jar
    • objenesis-1.2.jar
    • opencsv-2.3.jar
    • oro-2.0.8.jar
    • paranamer-2.6.jar
    • parquet-column-1.6.0rc3.jar
    • parquet-common-1.6.0rc3.jar
    • parquet-encoding-1.6.0rc3.jar
    • parquet-format-2.2.0-rc1.jar
    • parquet-generator-1.6.0rc3.jar
    • parquet-hadoop-1.6.0rc3.jar
    • parquet-jackson-1.6.0rc3.jar
    • protobuf-java-2.4.1.jar
    • protobuf-java-2.5.0-spark.jar
    • py4j-0.8.2.1.jar
    • pyrolite-2.0.1.jar
    • quasiquotes_2.10-2.0.1.jar
    • reflectasm-1.07-shaded.jar
    • scala-compiler-2.10.4.jar
    • scala-library-2.10.4.jar
    • scala-reflect-2.10.4.jar
    • scalap-2.10.4.jar
    • scalatest_2.10-2.2.1.jar
    • slf4j-api-1.7.10.jar
    • slf4j-log4j12-1.7.10.jar
    • snappy-java-1.1.1.7.jar
    • spark-bagel_2.10-1.4.0-SNAPSHOT.jar
    • spark-catalyst_2.10-1.4.0-SNAPSHOT.jar
    • spark-core_2.10-1.4.0-SNAPSHOT.jar
    • spark-graphx_2.10-1.4.0-SNAPSHOT.jar
    • spark-launcher_2.10-1.4.0-SNAPSHOT.jar
    • spark-mllib_2.10-1.4.0-SNAPSHOT.jar
    • spark-network-common_2.10-1.4.0-SNAPSHOT.jar
    • spark-network-shuffle_2.10-1.4.0-SNAPSHOT.jar
    • spark-repl_2.10-1.4.0-SNAPSHOT.jar
    • spark-sql_2.10-1.4.0-SNAPSHOT.jar
    • spark-streaming_2.10-1.4.0-SNAPSHOT.jar
    • spire-macros_2.10-0.7.4.jar
    • spire_2.10-0.7.4.jar
    • stax-api-1.0.1.jar
    • stream-2.7.0.jar
    • tachyon-0.6.4.jar
    • tachyon-client-0.6.4.jar
    • uncommons-maths-1.2.2a.jar
    • unused-1.0.0.jar
    • xmlenc-0.52.jar
    • xz-1.0.jar
    • zookeeper-3.4.5.jar

}

axpy(-diffSum, correction, result)
scal(1.0 / totalCnt, result)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I finally found why correction effect is zero. It's because diffSum is zero in our test dataset. diffSum is sum of diff, and for a synthetic dataset generated from linear equation with noise, the average of diff will be zero. As a result, for a real non-linear dataset, diffSum will not be zero, so we need some non-linear dataset for testing correctness. I'll add famous prostate cancer dataset used in the linear regression lasso paper into the unit-test.

DB Tsai and others added 4 commits April 28, 2015 00:11
@SparkQA
Copy link

SparkQA commented Apr 28, 2015

Test build #31130 has finished for PR 4259 at commit 9fc48ed.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait HasElasticNetParam extends Params
    • trait HasTol extends Params
  • This patch does not change any dependencies.

@SparkQA
Copy link

SparkQA commented Apr 28, 2015

Test build #31133 has finished for PR 4259 at commit a81c201.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait HasElasticNetParam extends Params
    • trait HasTol extends Params
  • This patch does not change any dependencies.

@mengxr
Copy link
Contributor

mengxr commented Apr 28, 2015

LGTM. Merged into master. Thanks!!

@asfgit asfgit closed this in 6a827d5 Apr 28, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 14, 2015
Author: DB Tsai <dbt@netflix.com>
Author: DB Tsai <dbtsai@alpinenow.com>

Closes apache#4259 from dbtsai/lir and squashes the following commits:

a81c201 [DB Tsai] add import org.apache.spark.util.Utils back
9fc48ed [DB Tsai] rebase
2178b63 [DB Tsai] add comments
9988ca8 [DB Tsai] addressed feedback and fixed a bug. TODO: documentation and build another synthetic dataset which can catch the bug fixed in this commit.
fcbaefe [DB Tsai] Refactoring
4eb078d [DB Tsai] first commit
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
Author: DB Tsai <dbt@netflix.com>
Author: DB Tsai <dbtsai@alpinenow.com>

Closes apache#4259 from dbtsai/lir and squashes the following commits:

a81c201 [DB Tsai] add import org.apache.spark.util.Utils back
9fc48ed [DB Tsai] rebase
2178b63 [DB Tsai] add comments
9988ca8 [DB Tsai] addressed feedback and fixed a bug. TODO: documentation and build another synthetic dataset which can catch the bug fixed in this commit.
fcbaefe [DB Tsai] Refactoring
4eb078d [DB Tsai] first commit
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants