
[MLlib] SPARK-6348: Enable useFeatureScaling in SVMWithSGD #5055

Closed
wants to merge 7 commits

Conversation

tanyinyan

set useFeatureScaling true in SVMWithSGD, the problem describled in jira (https://issues.apache.org/jira/browse/SPARK-6348)
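For context, the kind of change being discussed is roughly the following (a sketch only, based on the SVMWithSGD source around Spark 1.3; the actual commits in this PR may differ): SVMWithSGD extends GeneralizedLinearAlgorithm, which already carries a useFeatureScaling flag, so forcing scaling amounts to flipping that flag when the algorithm is constructed, as LogisticRegressionWithLBFGS already does.

```scala
// Sketch only -- not the exact patch. SVMWithSGD as it exists upstream,
// with the one proposed addition marked below.
class SVMWithSGD private (
    private var stepSize: Double,
    private var numIterations: Int,
    private var regParam: Double,
    private var miniBatchFraction: Double)
  extends GeneralizedLinearAlgorithm[SVMModel] with Serializable {

  // ... existing gradient/updater/optimizer setup unchanged ...

  // Proposed addition: standardize features before training, mirroring what
  // LogisticRegressionWithLBFGS already does internally.
  this.setFeatureScaling(true)
}
```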

@srowen
Member

srowen commented Mar 17, 2015

ok to test

@srowen
Member

srowen commented Mar 17, 2015

It looks reasonable, but this then forces feature scaling, which sort of changes behavior.
Hm, can this class just be made instantiable so that people can set this as they like? I don't know if there's a good reason that its constructor was made private. @jkbradley

@SparkQA

SparkQA commented Mar 17, 2015

Test build #28716 has finished for PR 5055 at commit 158a766.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tanyinyan
Author

The test failed. Is there something wrong with the test case? Setting feature scaling to true changes the prediction result. @srowen @AmplabJenkins


File "pyspark/mllib/classification.py", line 232, in main.SVMModel
Failed example:
svm.predict(array([1.0]))
Expected:
1.25...
Got:
1.0107186024067978


@srowen
Member

srowen commented Mar 17, 2015

No, it's almost certainly a result of your change. The prediction of the SVM changes if you force it to scale features. This is why I'm not sure this is the right thing to do; it swaps one hard-coded behavior for the other, and changes behavior. I'd prefer to simply make this selectable for all models.

@SparkQA

SparkQA commented Mar 18, 2015

Test build #28757 has finished for PR 5055 at commit 249d36a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class SVMWithSGD (

Provide an interface in object SVMWithSGD to set useFeatureScaling
@tanyinyan
Author

Agreed. I committed a version that makes the SVMWithSGD class public, but it fails the Spark unit tests and I don't know why. Maybe providing an interface in object SVMWithSGD would also be OK? @srowen
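For illustration, such an overload might look roughly like this; the signature below is an assumption, not the committed code, and it would have to live inside Spark's existing object SVMWithSGD since the constructor and setter are not public at this point.

```scala
// Hypothetical extra train() overload inside object SVMWithSGD -- parameter
// order and naming are assumptions, not the committed change.
def train(
    input: RDD[LabeledPoint],
    numIterations: Int,
    stepSize: Double,
    regParam: Double,
    miniBatchFraction: Double,
    useFeatureScaling: Boolean): SVMModel = {
  new SVMWithSGD(stepSize, numIterations, regParam, miniBatchFraction)
    .setFeatureScaling(useFeatureScaling)
    .run(input)
}
```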

@SparkQA

SparkQA commented Mar 18, 2015

Test build #28780 has finished for PR 5055 at commit ef437cb.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 18, 2015

Test build #28782 has finished for PR 5055 at commit 26558da.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Mar 18, 2015

Hm, maybe. I am sort of reluctant to add yet another utility method overload to expose this when we could just add it as a setter. @mengxr, did you say you also favored not adding more of these methods in the objects? What do you think about making this constructor non-private, to allow direct access to the class and its setter for feature scaling?

@SparkQA

SparkQA commented Mar 19, 2015

Test build #28850 has finished for PR 5055 at commit 3c622f8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class SVMWithSGD (

make setFeatureScaling public
@SparkQA

SparkQA commented Mar 19, 2015

Test build #28854 has finished for PR 5055 at commit 2dc9cb8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class SVMWithSGD (

@tanyinyan
Author

Yes, I have made this constructor and setter public. @srowen
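For illustration, with the constructor and setter public, caller-side usage would look roughly like this (the parameter values are placeholders, not recommendations):

```scala
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Hypothetical usage once the constructor and setFeatureScaling are public.
def trainScaledSvm(training: RDD[LabeledPoint]) = {
  val svm = new SVMWithSGD(/* stepSize */ 1.0, /* numIterations */ 100,
                           /* regParam */ 0.01, /* miniBatchFraction */ 1.0)
  svm.setFeatureScaling(true)
  svm.run(training)
}
```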

@srowen
Member

srowen commented Mar 19, 2015

OK. I want to see if @mengxr is OK with this as it adds a new bit of API, technically. I think we'd want to document the params and perhaps mark this as ::Experimental:: if this is exposed.

Document the params of SVMWithSGD constructor and mark it as ::Experimental::
@SparkQA

SparkQA commented Mar 20, 2015

Test build #28902 has finished for PR 5055 at commit 32c8507.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class SVMWithSGD (

@tanyinyan
Author

I have documented the params of the SVMWithSGD constructor and marked it as ::Experimental::.

@mengxr
Contributor

mengxr commented Mar 20, 2015

Sorry for my late response! I lean toward not exposing this method, to keep the API lightweight. If we feel feature scaling is good, we can enable it by default. For example, LIBLINEAR uses feature scaling but doesn't expose it to users.

@jkbradley
Member

Apologies for my late response too. I feel like what we really need to do is clarify the intended behavior of feature scaling. Currently, feature scaling changes the optimal solution vector for regularized models since it changes each feature's relative amount of regularization. I see 2 options:

  • Keep the current behavior.
    • If we go with this behavior, then we should expose it as an option since it changes the optimal solution vector.
  • Adjust the regularization parameter for each feature such that the optimal solution vector is identical (after rescaling) to the solution for the original problem (before scaling).
    • I believe this is what libsvm (or maybe liblinear) does.
    • If we do feature scaling under the hood (and do not expose it as an option), then we should use this behavior. Otherwise, users will be confused when the optimal solution is not what they would expect.

I strongly vote for the 2nd option: It has the intended benefit of improving optimization behavior, and it is better for users since it gives them the solution they expect.
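For concreteness, here is what option 2 amounts to for plain L2 regularization, assuming each feature $j$ is divided by its standard deviation $\sigma_j$ (a sketch of the equivalence, not a statement about what MLlib currently implements):

$$
\min_w \sum_i L(w^\top x_i, y_i) + \frac{\lambda}{2}\sum_j w_j^2
\;\equiv\;
\min_{w'} \sum_i L(w'^\top x'_i, y_i) + \frac{\lambda}{2}\sum_j \frac{(w'_j)^2}{\sigma_j^2},
\qquad x'_{ij} = \frac{x_{ij}}{\sigma_j},\; w'_j = \sigma_j w_j .
$$

Dividing feature $j$'s penalty by $\sigma_j^2$ keeps the optimum identical after mapping back via $w_j = w'_j/\sigma_j$; keeping a single $\lambda$ on the scaled problem (the current behavior) changes it.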

@srowen
Member

srowen commented Mar 27, 2015

@tanyinyan are you in a position to implement @jkbradley's suggestion? It's more involved, for sure, but it would indeed not change the solution, which sounds nice. I apologize for taking this down the wrong path of "option 1", exposing it as an option.

@tanyinyan
Author

Hi @jkbradley, @srowen, I've been considering "option 2" these days and looked into what libsvm and liblinear do. I found that in both libsvm and liblinear, scaling changes the performance.

The libsvm guide (http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf) says, "We recommend linearly scaling each attribute to the range [−1, +1] or [0, 1]", and Appendix A gives examples where scaling improves accuracy.

The same holds for liblinear (http://cran.r-project.org/web/packages/LiblineaR/LiblineaR.pdf): "Classification models usually perform better if each dimension of the data is first centered and scaled."

The goal of scaling is to improve accuracy and the convergence rate; if the optimal solution vector were identical before and after scaling, scaling would be pointless.

So I suggest calling setFeatureScaling(true) internally, without exposing it as an option, as is done in class LogisticRegressionWithLBFGS.

@jkbradley
Member

@tanyinyan I think what you're arguing for is actually option (1). I propose this combination of the solutions:

Expose setFeatureScaling() as an option. Default to true.

If featureScaling is true, then we scale features and do not adjust regularization. This will change the optimal solution, but as in your references, it is generally better to do anyway. (My experience is the same.)

If featureScaling is false, then we scale features internally but also adjust regularization. This will improve optimization behavior but will not change the optimal solution.

Defaulting to true will mean the algorithm will probably do the best thing by default, but will allow informed users to get what they really want if necessary.

This proposal will also avoid an API change since the meaning of featureScaling will stay the same.
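A conceptual sketch of the proposed semantics, shown where the training data and regularization would be prepared (the helper names are hypothetical, not the MLlib API):

```scala
// Conceptual sketch only; computeColumnStd and divideByStd are assumed helpers.
val featureStd: Array[Double] = computeColumnStd(data)
val scaledData = data.map(p => divideByStd(p, featureStd))

val perFeatureRegParam: Array[Double] =
  if (featureScaling) {
    // featureScaling = true (default): scale features, keep a single regParam.
    // The optimum changes, but accuracy and convergence are usually better.
    Array.fill(featureStd.length)(regParam)
  } else {
    // featureScaling = false: still scale internally, but divide each scaled
    // weight's L2 penalty by std_j^2 so the optimum matches the unscaled problem.
    featureStd.map(s => if (s != 0.0) regParam / (s * s) else regParam)
  }
```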

@tanyinyan
Author

" If featureScaling is false, then we scale features internally but also adjust regularization. This will improve optimization behavior but will not change the optimal solution."

@jkbradley, I don't understand the meaning of "optimization behavior" here; does it mean convergence rate? If we scale features internally and also adjust regularization, then we get the same gradient for every labeled point as without scaling, so I think that if the optimal solution is unchanged, the optimization behavior is unchanged too.

Have I understood correctly?

@jkbradley
Member

@tanyinyan "Optimization behavior" means convergence rate, yes.

If we scale features internally and also adjust regularization, then:

  • The optimal solution will not change. (I agree with you on this.)
  • The optimization behavior will change. This is because we use a single step size for all features.
    • E.g., suppose we have 2 features a and b, where norm(column b) = 1000 * norm(column a). The step size we use needs to be adjusted based on the norm of the feature columns; since column b has a really big norm, we will need to use a very small step size. This means we will progress really slowly, especially if (a) is the useful feature.

Does that make sense?

We could actually adjust the step size for each feature, rather than scaling the data. Come to think of it, that might be a more efficient solution since it's cheaper than creating a new copy of the data. I'll make a JIRA for that since it belongs in another PR.
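To make the step-size point above concrete, here is a tiny standalone toy (plain Scala, not Spark code): gradient descent on a two-feature least-squares problem where one column's scale is 1000x the other's. The single shared step size has to be small enough for the large-norm feature, so the small-norm weight barely moves.

```scala
object StepSizeToy extends App {
  // f(w) = 0.5 * (1.0 * w(0) - 1)^2 + 0.5 * (1000.0 * w(1) - 1)^2
  val scales = Array(1.0, 1000.0)
  var w = Array(0.0, 0.0)
  // A shared step size must satisfy stepSize < 2 / max(scale^2) = 2e-6.
  val stepSize = 1e-6
  for (_ <- 1 to 1000) {
    val grad = scales.zip(w).map { case (s, wi) => s * (s * wi - 1.0) }
    w = w.zip(grad).map { case (wi, g) => wi - stepSize * g }
  }
  // w(1) reaches its optimum (0.001) almost immediately, but after 1000
  // iterations w(0) is still near 0.001, far from its optimum of 1.0.
  println(s"w = ${w.mkString(", ")}")
}
```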

@jkbradley
Member

After speaking with @mengxr, we might want to make some hard decisions about changing behaviors and hiding feature scaling. I noted them here: https://issues.apache.org/jira/browse/SPARK-6683

@AmplabJenkins

Can one of the admins verify this patch?

@srowen
Member

srowen commented Apr 27, 2015

I think this is WontFix in favor of SPARK-6683?

@mengxr
Contributor

mengxr commented Apr 27, 2015

Yes. @tanyinyan Do you mind closing this for now? Thanks everyone for the discussion!

@tanyinyan
Author

OK. @mengxr, @jkbradley, how is SPARK-6683 going? I'd be very glad to join and contribute to it.

@dbtsai
Member

dbtsai commented Apr 28, 2015

SPARK-6683 will be more about doing the feature scaling within the objective function. LinearRegression with L1/L2 (ElasticNet) using OWLQN (#4259) does the scaling internally, so there is no need to create a new dataset.
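Roughly, "doing the scaling within the objective function" means keeping the original dataset and folding the per-feature standard deviations into the margin and gradient computation on the fly. Here is a hedged sketch of that idea (a hypothetical helper using squared loss for simplicity, not the #4259 code):

```scala
// Sketch of scaling inside the objective: no rescaled copy of the data is
// materialized; featureStd is applied while computing the margin and gradient.
def scaledSquaredLossGradient(
    features: Array[Double],
    label: Double,
    weights: Array[Double],
    featureStd: Array[Double]): Array[Double] = {
  // Margin against virtually-scaled features x_j / std_j.
  var margin = 0.0
  var j = 0
  while (j < features.length) {
    if (featureStd(j) != 0.0) margin += weights(j) * features(j) / featureStd(j)
    j += 1
  }
  val diff = margin - label // dLoss/dMargin for squared loss
  Array.tabulate(features.length) { j =>
    if (featureStd(j) != 0.0) diff * features(j) / featureStd(j) else 0.0
  }
}
```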

asfgit closed this in 555213e Apr 28, 2015