transform_feature_names for scalers #229
Conversation
Codecov Report
@@ Coverage Diff @@
## master #229 +/- ##
==========================================
+ Coverage 97.26% 97.27% +<.01%
==========================================
Files 42 42
Lines 2673 2682 +9
Branches 515 517 +2
==========================================
+ Hits 2600 2609 +9
Misses 38 38
Partials 35 35
Force-pushed from 10bfa51 to a0fa0d8.
Looks good to me; this is the behaviour I would expect. I left a minor question.
eli5/sklearn/transform.py
Outdated

    if in_names is None:
        in_names = _get_feature_names(est, feature_names=in_names,
                                      num_features=est.scale_.shape[0])
    return [name for name in in_names]
What does this list comprehension do? Is it the same as list(in_names), or did you want to add something more?
Yes, I'm converting the FeatureNames instance (which comes from _get_feature_names) to a list. It is also a left-over from my experiments with more elaborate feature names, for cases where you don't pass names through as-is.
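To illustrate the point being discussed: iterating any iterable in a list comprehension produces the same result as calling list() on it, which is why the question arises. The FeatureNames class below is a minimal stand-in for illustration, not eli5's actual implementation:

```python
class FeatureNames:
    """Stand-in for an iterable feature-names container (illustrative only)."""
    def __init__(self, names):
        self._names = names

    def __iter__(self):
        return iter(self._names)

fn = FeatureNames(["x0", "x1", "x2"])

# The comprehension and list() are equivalent for any iterable:
assert [name for name in fn] == list(fn) == ["x0", "x1", "x2"]
```

So the comprehension is functionally interchangeable with list(in_names); the comprehension form only matters if you later want to map each name through some transformation.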
I think this is right, though as I've suggested elsewhere, for TFIDF I'd like the IDF to be noted. I've also wished we could just avoid this decision by having a structured representation of feature description. Something JSONable, for instance.
The tests are changed in #208, and while I dither over fixing up that PR, I wonder if we should pull the test changes into something separate.
eli5/sklearn/transform.py
Outdated

    @transform_feature_names.register(StandardScaler)
    @transform_feature_names.register(MaxAbsScaler)
    @transform_feature_names.register(RobustScaler)
    def _select_scaling(est, in_names=None):
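For readers unfamiliar with the pattern in this snippet: stacking several register() calls attaches one handler to multiple estimator types. Below is a generic sketch using functools.singledispatch with empty stand-in classes; it is not eli5's actual code, only the registration mechanism it relies on:

```python
from functools import singledispatch

# Stand-ins for the real sklearn scaler classes (illustrative only):
class StandardScaler: pass
class MaxAbsScaler: pass
class RobustScaler: pass

@singledispatch
def transform_feature_names(est, in_names=None):
    raise NotImplementedError(type(est).__name__)

# Stacked register() decorators bind one implementation to several types:
@transform_feature_names.register(StandardScaler)
@transform_feature_names.register(MaxAbsScaler)
@transform_feature_names.register(RobustScaler)
def _transform_scaling(est, in_names=None):
    # Scalers do not add, drop, or reorder features,
    # so the input names pass through unchanged.
    return list(in_names)

assert transform_feature_names(StandardScaler(), in_names=["a", "b"]) == ["a", "b"]
```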
I think you mean transform, not select
Outputting feature descriptions as JsonLogic, perhaps?
I'm sure the jsonlogic idea is more attractive to computer scientists than data scientists though ;)
…On 2 Aug 2017 4:21 am, "Mikhail Korobov" ***@***.***> wrote:
I like the JsonLogic idea, to output expressions used for computing
feature names.
+1 to pull in test changes to make updating #208
<#208> easier, but I'm also fine
with merging #208 <#208> with a
few minor changes :)
One advantage of the jsonlogic idea is that sub-expressions are objects being reused, so it's relatively memory efficient.
We would still provide html/text/dataframe/(simplified json?) exports if we use jsonlogic internally, so data scientists should be fine :)
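To make the idea concrete, here is a sketch of what a JsonLogic-style structured description of a standard-scaled feature might look like, with a toy renderer back to infix text. The operator vocabulary, the example coefficients, and the render helper are all assumptions for illustration, not a format eli5 defines:

```python
import json

# Hypothetical JsonLogic-style expression for a StandardScaler-ed feature,
# mirroring the (x1*0.312 - 1.232) formula mentioned in this thread:
expr = {"-": [{"*": [{"var": "x1"}, 0.312]}, 1.232]}

def render(node):
    """Render a tiny subset of JsonLogic to an infix string (illustrative)."""
    if isinstance(node, dict):
        (op, args), = node.items()
        if op == "var":
            return args  # variable reference: the feature name itself
        return "(" + f" {op} ".join(render(a) for a in args) + ")"
    return repr(node)  # numeric literal

assert render(expr) == "((x1 * 0.312) - 1.232)"

# The structure is trivially JSON-serializable, and repeated sub-expression
# objects can be shared between features for memory efficiency:
print(json.dumps(expr))
```

A text or html export would then just be one renderer over the shared structured form.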
I can see 3 main ways to show feature names for scalers:

1. x1 (pass the input name through unchanged);
2. scaled(x1);
3. (x1*0.312 - 1.232) for StandardScaler.

In this PR (1) is implemented; at least it is more useful than doing nothing.
It seems we may want an optional "verbose mode" for feature names, as there are use cases where you want the whole formula, and use cases where you only care about the input feature names.
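A minimal sketch of what such a verbose flag could look like for a StandardScaler-style transform. The helper name, signature, and the formula rendering are hypothetical, not part of eli5's API; the formula uses the fact that (x - mean)/scale equals x*(1/scale) - mean/scale:

```python
def scaler_feature_names(in_names, scale, mean, verbose=False):
    """Hypothetical helper: short names vs. full formulas (illustrative)."""
    if not verbose:
        # Terse mode: just mark the feature as scaled.
        return [f"scaled({name})" for name in in_names]
    # Verbose mode: spell out the affine formula per feature.
    return [f"({name}*{1 / s:.3f} - {m / s:.3f})"
            for name, s, m in zip(in_names, scale, mean)]

assert scaler_feature_names(["x1", "x2"], [2.0, 4.0], [1.0, 2.0]) == \
    ["scaled(x1)", "scaled(x2)"]
assert scaler_feature_names(["x1"], [2.0], [1.0], verbose=True) == \
    ["(x1*0.500 - 0.500)"]
```

A flag like this would let transform_feature_names stay cheap by default while still supporting the full-formula use case.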