transform_feature_names for scalers #229

kmike · 2017-07-31T15:08:06Z

I can see 3 main ways to show feature names for scalers:

1. display feature names as-is (as done in scikit-learn/scikit-learn#6431, "[MRG] ENH Add get_feature_names for various transformers");
2. show that feature names are scaled/normalized, but hide the details, e.g. scaled(x1);
3. show the complete formula, e.g. (x1*0.312 - 1.232) for StandardScaler.

This PR implements (1); at least it is more useful than doing nothing.

It seems we may want an optional "verbose mode" for feature names: in some use cases you want the whole formula, and in others you only care about the input feature names.
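The three options and the verbose switch can be sketched with a small helper. This is a minimal sketch, not eli5's implementation: `scaler_feature_names` and its `mode` values are hypothetical, and `mean`/`scale` stand in for a fitted StandardScaler's per-feature statistics, so each output is (x - mean) / scale = x*(1/scale) - mean/scale.

```python
def scaler_feature_names(names, mean, scale, mode="as_is"):
    """Return output names for a StandardScaler-like transform.

    Hypothetical helper: y = (x - mean) / scale for each feature.
    """
    if mode == "as_is":
        # Option (1): pass input names through unchanged.
        return list(names)
    if mode == "scaled":
        # Option (2): mark the feature as scaled, hide the details.
        return ["scaled(%s)" % name for name in names]
    if mode == "formula":
        # Option (3): (x - mean) / scale == x*(1/scale) - mean/scale
        result = []
        for name, m, s in zip(names, mean, scale):
            coef = 1.0 / s
            offset = m / s
            sign = "-" if offset >= 0 else "+"
            result.append("(%s*%.3f %s %.3f)" % (name, coef, sign, abs(offset)))
        return result
    raise ValueError("unknown mode: %r" % (mode,))


names = ["x1", "x2"]
print(scaler_feature_names(names, [2.0, -1.0], [4.0, 0.5]))
print(scaler_feature_names(names, [2.0, -1.0], [4.0, 0.5], mode="scaled"))
print(scaler_feature_names(names, [2.0, -1.0], [4.0, 0.5], mode="formula"))
# ['(x1*0.250 - 0.500)', '(x2*2.000 + 2.000)']
```

A single `mode` (or `verbose`) argument keeps the default output identical to option (1) while letting callers opt into the formula.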

codecov-io · 2017-07-31T15:14:39Z

Codecov Report

Merging #229 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #229      +/-   ##
==========================================
+ Coverage   97.26%   97.27%   +<.01%     
==========================================
  Files          42       42              
  Lines        2673     2682       +9     
  Branches      515      517       +2     
==========================================
+ Hits         2600     2609       +9     
  Misses         38       38              
  Partials       35       35

Impacted Files	Coverage Δ
eli5/sklearn/transform.py	`100% <100%> (ø)`	⬆️

kmike · 2017-07-31T15:42:30Z

@lopuhin @jnothman what do you think?

lopuhin

Looks good to me, this is the behaviour I would expect. Left a minor question.

lopuhin · 2017-07-31T15:47:39Z

eli5/sklearn/transform.py

+    if in_names is None:
+        in_names = _get_feature_names(est, feature_names=in_names,
+                                      num_features=est.scale_.shape[0])
+    return [name for name in in_names]


What does this list comprehension do? Is it the same as list(in_names), or did you want to add something more?

kmike

Yes, I'm converting the FeatureNames instance (which comes from _get_feature_names) to a list. It is also a leftover from my experiments with more elaborate feature names, where names are not passed through as-is.
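For any finite iterable, `[name for name in in_names]` and `list(in_names)` build the same list; the comprehension only pays off once it rewrites names. A minimal sketch with a hypothetical `ToyFeatureNames` stand-in (not eli5's FeatureNames class), which supports iteration only through `__len__` and `__getitem__`:

```python
class ToyFeatureNames:
    """Hypothetical stand-in for a FeatureNames-like container."""

    def __init__(self, n_features, prefix="x"):
        self.n_features = n_features
        self.prefix = prefix

    def __len__(self):
        return self.n_features

    def __getitem__(self, idx):
        # Raising IndexError past the end lets Python's sequence
        # iteration protocol stop cleanly.
        if not 0 <= idx < self.n_features:
            raise IndexError(idx)
        return "%s%d" % (self.prefix, idx)


names = ToyFeatureNames(3)
via_comprehension = [name for name in names]
via_list = list(names)
print(via_comprehension)  # ['x0', 'x1', 'x2']
assert via_comprehension == via_list

# Where a comprehension earns its keep: rewriting the names.
print(["scaled(%s)" % name for name in names])
```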

jnothman

I think this is right, though as I've suggested elsewhere, for TFIDF I'd like the IDF to be noted. I've also wished we could just avoid this decision by having a structured representation of feature descriptions. Something JSONable, for instance.
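A minimal sketch of what a JSONable feature description might look like, assuming a hypothetical record with `feature`, `transform` and `params` keys (not an eli5 format). One record can be rendered either as the plain input name or as the full formula, which is one way to defer the naming decision:

```python
import json


def describe_scaled_feature(name, mean, scale):
    """Hypothetical JSONable description of one StandardScaler output."""
    return {
        "feature": name,
        "transform": "standard_scaling",
        "params": {"mean": mean, "scale": scale},
    }


def render(desc, verbose=False):
    """Render a description as a plain name, or as the full formula."""
    if not verbose:
        return desc["feature"]
    mean = desc["params"]["mean"]
    sign = "-" if mean >= 0 else "+"
    return "(%s %s %g) / %g" % (desc["feature"], sign, abs(mean),
                                desc["params"]["scale"])


descriptions = [
    describe_scaled_feature("x1", 2.0, 4.0),
    describe_scaled_feature("x2", -1.0, 0.5),
]
# The records survive a JSON round trip unchanged.
assert json.loads(json.dumps(descriptions)) == descriptions
print([render(d) for d in descriptions])                # ['x1', 'x2']
print([render(d, verbose=True) for d in descriptions])  # ['(x1 - 2) / 4', '(x2 + 1) / 0.5']
```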

The tests are changed in #208, and while I dither over fixing up that PR, I wonder if we should pull the test changes into something separate.

jnothman · 2017-08-01T08:17:26Z

eli5/sklearn/transform.py

+@transform_feature_names.register(StandardScaler)
+@transform_feature_names.register(MaxAbsScaler)
+@transform_feature_names.register(RobustScaler)
+def _select_scaling(est, in_names=None):


I think you mean transform, not select
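The decorator stack in the snippet above registers one function for three scaler classes. Assuming, as the decorators suggest, that `transform_feature_names` is a `functools.singledispatch` generic function, the pattern can be sketched as below. The estimator classes are hypothetical stand-ins (not sklearn's), their `n_features_` attribute is an assumption, and `_transform_scaling` only illustrates the suggested rename; it is not necessarily the name that was merged.

```python
from functools import singledispatch


class _FakeEstimator:
    """Hypothetical stand-in for a fitted estimator (not sklearn)."""

    def __init__(self, n_features):
        self.n_features_ = n_features  # assumed attribute name


class StandardScaler(_FakeEstimator):
    pass


class MaxAbsScaler(_FakeEstimator):
    pass


class RobustScaler(_FakeEstimator):
    pass


class PCA(_FakeEstimator):
    pass


@singledispatch
def transform_feature_names(est, in_names=None):
    # Fallback for estimator types with no registered implementation.
    raise NotImplementedError(
        "no feature-name transform for %s" % type(est).__name__)


# register(cls) returns the decorated function unchanged, so the
# decorators stack and one implementation serves all three classes.
@transform_feature_names.register(StandardScaler)
@transform_feature_names.register(MaxAbsScaler)
@transform_feature_names.register(RobustScaler)
def _transform_scaling(est, in_names=None):
    if in_names is None:
        # Hypothetical default names "x0", "x1", ...
        in_names = ["x%d" % i for i in range(est.n_features_)]
    # Scaling keeps one output per input, so names pass through as-is.
    return list(in_names)


print(transform_feature_names(StandardScaler(2), ["age", "income"]))  # ['age', 'income']
print(transform_feature_names(RobustScaler(3)))  # ['x0', 'x1', 'x2']
```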

jnothman · 2017-08-01T08:26:30Z

Outputting feature descriptions as JsonLogic, perhaps??

kmike · 2017-08-01T18:21:19Z

I like the JsonLogic idea, to output expressions used for computing feature names.
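As an illustration, the StandardScaler formula (x - mean) / scale can be written with JsonLogic's `var`, `-` and `/` operators. The evaluator below is a hand-written sketch for a tiny binary-operator subset of JsonLogic, not the json-logic library, and `scaled_expr` is a hypothetical helper:

```python
import json
import operator

# Binary arithmetic operators from JsonLogic; "var" is handled separately.
_OPS = {"+": operator.add, "-": operator.sub,
        "*": operator.mul, "/": operator.truediv}


def scaled_expr(name, mean, scale):
    """(x - mean) / scale as a JsonLogic expression (hypothetical helper)."""
    return {"/": [{"-": [{"var": name}, mean]}, scale]}


def evaluate(expr, data):
    """Evaluate a tiny JsonLogic subset: var and binary + - * /."""
    if not isinstance(expr, dict):
        return expr  # numeric literal
    if len(expr) != 1:
        raise ValueError("expected a single-operator dict: %r" % (expr,))
    op, args = next(iter(expr.items()))
    if op == "var":
        return data[args]
    left, right = (evaluate(arg, data) for arg in args)
    return _OPS[op](left, right)


expr = scaled_expr("x1", 2.0, 4.0)
print(json.dumps(expr))              # {"/": [{"-": [{"var": "x1"}, 2.0]}, 4.0]}
print(evaluate(expr, {"x1": 10.0}))  # 2.0
```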

+1 to pull in test changes to make updating #208 easier, but I'm also fine with merging #208 with a few minor changes :)

jnothman · 2017-08-01T22:52:51Z

I'm sure the jsonlogic idea is more attractive to computer scientists than data scientists though ;)

jnothman · 2017-08-01T22:54:01Z

One advantage of the jsonlogic idea is that sub-expressions are objects being reused, so it's relatively memory-efficient.
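A sketch of the sharing argument, assuming downstream features (for example products of scaled inputs) are built by referencing existing expression dicts rather than copying them. The helpers `var`, `scaled` and `walk` are hypothetical:

```python
def var(name):
    return {"var": name}


def scaled(name, mean, scale):
    """(x - mean) / scale in JsonLogic syntax (hypothetical helper)."""
    return {"/": [{"-": [var(name), mean]}, scale]}


def walk(expr):
    """Yield every dict node in an expression tree, shared ones repeatedly."""
    if isinstance(expr, dict):
        yield expr
        for value in expr.values():
            items = value if isinstance(value, list) else [value]
            for item in items:
                yield from walk(item)


s1 = scaled("x1", 2.0, 4.0)
s2 = scaled("x2", -1.0, 0.5)

# Products of scaled inputs reference s1 and s2 directly instead of copying.
features = [s1, s2, {"*": [s1, s1]}, {"*": [s1, s2]}]

all_nodes = [node for feature in features for node in walk(feature)]
distinct = len({id(node) for node in all_nodes})
print(len(all_nodes), distinct)  # 20 8
```

The tree has 20 logical nodes but only 8 distinct objects in memory. Serializing with `json.dumps` would still expand every shared reference, so the saving applies in memory, not necessarily to the exported size.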

kmike · 2017-08-02T03:06:17Z

We would still provide html/text/dataframe/(simplified json?) exports if we use jsonlogic internally, so data scientists should be fine :)
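A sketch of one such export: rendering a JsonLogic expression as infix text, plus a simplified view that keeps only the input feature names. Both functions are hypothetical and handle only `var` and binary arithmetic operators:

```python
def to_text(expr):
    """Render a small JsonLogic subset (var, + - * /) as infix text."""
    if not isinstance(expr, dict):
        return "%g" % expr  # numeric literal
    op, args = next(iter(expr.items()))
    if op == "var":
        return args
    left, right = (to_text(arg) for arg in args)
    return "(%s %s %s)" % (left, op, right)


def input_names(expr):
    """Collect the input feature names an expression depends on."""
    if not isinstance(expr, dict):
        return set()
    op, args = next(iter(expr.items()))
    if op == "var":
        return {args}
    names = set()
    for arg in args:
        names |= input_names(arg)
    return names


expr = {"/": [{"-": [{"var": "x1"}, 2.0]}, 4.0]}
print(to_text(expr))              # ((x1 - 2) / 4)
print(sorted(input_names(expr)))  # ['x1']
```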

kmike added 2 commits July 31, 2017 11:26

transform_feature_names for scalers

7ea789f

fix scaler feature names when in_names is None

a0fa0d8

kmike force-pushed the transform_feature_names_scalers branch from 10bfa51 to a0fa0d8 July 31, 2017 15:27

TST fix type annotations

e5408d9

lopuhin approved these changes Jul 31, 2017

DOC mention scalers support in docs

a94fc1c

jnothman reviewed Aug 1, 2017

better function name

4fcd761

kmike merged commit ae0249e into master Aug 1, 2017

kmike deleted the transform_feature_names_scalers branch August 1, 2017 18:22

kmike added this to the 0.8 milestone Aug 17, 2017