issue 137/PR 145 automatically search for categorical variables #145
Change tests for newer sklearn
Sorry for the late review
@@ -223,28 +223,87 @@ def from_pipeline(cls, pipeline: dict):
target_encoder,
is_fitted=pipeline["_is_fitted"],
)

def get_continous_and_discreate_columns(
get_continuous_and_discrete_columns(
target_column_name :str
) -> tuple:
"""Filters out the continious and discreate varaibles out of a dataframe and returns a tuple containing lists of column names
It assumes that numerical comumns with less than or equal to 10 different values are categorical
continuous instead of continious
discrete instead of discreate
variables instead of varaibles
columns instead of comumns
Parameters
----------
df : pd.DataFrame
DataFrame that you want to divide in discreate and continous variables
typos
f"""Cobra automaticaly assumes that following variables are
discrete: {discrete_vars}
continuous: {continuous_vars}
If you want to change this behaviour you can specify the discrete/continuous variables yourself with the continuous_vars and discrete_vars keywords. \nIt assumes that numerical comumns with less than or equal to 10 different values are categorical"""
typo comumns -> columns
):
"""Fit the data to the preprocessing pipeline.
If you put continious_vars and target_vars equal to `None` and give the id_col_name Cobra will guess which varaibles are continious and which are not
typos:
continuous, variables, continuous
"""
if not (continuous_vars and discrete_vars):
continuous_vars, discrete_vars = self.get_continous_and_discreate_columns(
Typos in code can happen, but please always double-check your method names for typos, since this snowballs to the end user eventually.
get_continuous_and_discrete_columns(...)
"""Fit preprocessing pipeline and transform the data.
If you put continious_vars and target_vars equal to `None` and give the id_col_name Cobra will guess which varaibles are continious and which are not
typos
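Beyond the typos, the guard shown in the diff, `if not (continuous_vars and discrete_vars)`, appears broader than the description's "both parameters are equal to None". The sketch below only illustrates that difference in plain Python. Both helper names are hypothetical and are not part of Cobra.

```python
def triggers_as_written(continuous_vars, discrete_vars):
    # Mirrors the condition in the diff: true when EITHER argument is falsy
    # (None or an empty list), not only when both are None.
    return not (continuous_vars and discrete_vars)


def triggers_only_if_both_none(continuous_vars, discrete_vars):
    # Hypothetical stricter check that matches the PR description's wording.
    return continuous_vars is None and discrete_vars is None


# Both left unset: the two checks agree.
both_unset = (
    triggers_as_written(None, None),
    triggers_only_if_both_none(None, None),
)

# Only continuous_vars supplied: the condition as written still triggers,
# so a user-supplied list would be replaced by the guessed one.
one_supplied = (
    triggers_as_written(["age"], None),
    triggers_only_if_both_none(["age"], None),
)

print(both_unset)
print(one_supplied)
```

Which of the two behaviours is intended is not stated in this thread, so this is a point to confirm rather than a confirmed bug.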
Hi @patrickleonardy, it looks like Jano reviewed this after you and Pedro merged it. Still, his comments are valuable. Can you check your work against his comments?
Hi Sander,
Story Title
Automatically search for categorical variables #137
Changes made
Added the get_continous_and_discreate_columns function to the PreProcessor class.
Updated fit and fit_transform to automatically call get_continous_and_discreate_columns if both the continuous_vars and discrete_vars parameters are equal to None.
How does the solution address the problem
One can now set continuous_vars and discrete_vars in fit and fit_transform equal to None, and Cobra will guess which variables are continuous and which are discrete.
Linked issues
Resolves #137
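For readers who want to see the heuristic in isolation, here is a minimal sketch of the rule stated in the docstring (numerical columns with 10 or fewer distinct values are treated as categorical). It is not the code merged in this PR. The function name, the `max_unique` parameter, the exclusion of the id and target columns, and the choice to treat every non-numeric column as discrete are all assumptions made for illustration.

```python
import pandas as pd


def split_continuous_and_discrete(
    df: pd.DataFrame,
    id_col_name: str,
    target_column_name: str,
    max_unique: int = 10,  # threshold taken from the docstring; parameter is hypothetical
) -> tuple:
    """Return (continuous_columns, discrete_columns) as two lists of names."""
    continuous, discrete = [], []
    for col in df.columns:
        # Assumption: the id and target columns are never treated as predictors.
        if col in (id_col_name, target_column_name):
            continue
        is_numeric = pd.api.types.is_numeric_dtype(df[col])
        # Numeric columns with more than max_unique distinct values are continuous;
        # everything else (low-cardinality numeric or non-numeric) is discrete.
        if is_numeric and df[col].nunique() > max_unique:
            continuous.append(col)
        else:
            discrete.append(col)
    return continuous, discrete


example = pd.DataFrame(
    {
        "id": range(20),
        "age": range(20, 40),                       # 20 distinct numeric values
        "n_children": [i % 4 for i in range(20)],   # 4 distinct numeric values
        "city": ["a", "b"] * 10,                    # non-numeric
        "target": [0, 1] * 10,
    }
)

continuous, discrete = split_continuous_and_discrete(example, "id", "target")
print(continuous)  # ['age']
print(discrete)    # ['n_children', 'city']
```

A fixed threshold like this can misclassify a numeric column that happens to have few distinct values in a small sample, which is why the warning message in the diff tells users how to override the guess.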