Make name of new column with predictions appended to dataframe configurable #267
Conversation
Hey @janosh, thanks for the PR! One quick thought: if we are going to accept changing column output names, it would probably be worth replacing the entire output column name rather than making it a suffix. I could imagine some scenarios where people specify the output column name as …
And yes, we should definitely add a global MatPipe test for it, since it is both kind of a global option (i.e., it's the output of the entire pipeline) and an Adaptor option.
Turned the target col suffix into a full name and added a preset test for …. Where should the MatPipe test live? I don't quite understand the flow in …
Regarding the MatPipe test, it should definitely go in test_pipeline. To explain, the TestMatPipe class just defines common tests for any MatPipe configuration. The test classes that follow then run the same tests for different adaptors (Single, TPOT, etc.). One way to integrate the tests for this is to change the TestMatPipe tests to use DFMLAdaptor.target_output_name instead of target + " predicted" in the existing tests. You can add an extra test if needed.
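For illustration, the kind of change being suggested might look roughly like this. This is a hypothetical sketch only: self.pipe, self.df, and self.target stand in for whatever fixtures TestMatPipe actually defines, and the attribute is named target_output_name per the comment above (it appears as target_output_col elsewhere in this PR).

    def test_predict_output_column(self):
        # Run predict on a fitted pipe and check the configurable output
        # column instead of the hard-coded `self.target + " predicted"` string.
        df_pred = self.pipe.predict(self.df, self.target)
        self.assertIn(self.pipe.learner.target_output_name, df_pred.columns)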
automatminer/automl/adaptors.py (outdated)
@@ -83,7 +86,7 @@ def __init__(self, **tpot_kwargs):
         self.from_serialized = False
         self._best_models = None
-        super(DFMLAdaptor, self).__init__()
+        super(TPOTAdaptor, self).__init__(**tpot_kwargs)
Feel free to change TestAdaptorGood to not override DFMLAdaptor, since the DFMLAdaptor usage in the adaptor tests should remain the same. That is, we shouldn't be testing TPOTAdaptor where previously the base class was tested.
Not quite sure what you mean... Is your comment related to this line?
Also, I was wondering what's the purpose of using Python 2 super() syntax in automatminer if all your CI is Python 3.7? I.e. why not simply use super() without arguments everywhere?
- super(TPOTAdaptor, self).__init__(**tpot_kwargs)
+ super().__init__(**tpot_kwargs)
using py3 super syntax is fine by me!
Edit: never mind that comment, I was confused which file this was in. Seems fine by me
> using py3 super syntax is fine by me!

Cool. Would you accept a separate PR converting all super() calls to Py3?
Can also make it part of this PR if you prefer.
Yes, let's make that a separate PR though.
automatminer/automl/base.py (outdated)
@@ -146,9 +153,11 @@ def predict(self, df: pd.DataFrame, target: str) -> pd.DataFrame:
                 "".format(not_in_df, not_in_model)
             )
         else:
+            if target_output_col is not None:
Can we just have the target output col argument be the "{target} predicted" string? Having it be None makes it a bit harder to see what is going on.
Oh, never mind, I see it is set in the base class constructor.
automatminer_dev/tasks/bench.py (outdated)
@@ -145,7 +145,7 @@ def run_task(self, fw_spec):
         # Evaluate model
         true = result_df[target]
-        test = result_df[target + " predicted"]
+        test = result_df[learner.target_output_col]
Since you are making changes to things which affect _dev, this will take me some time to merge, since the entire infrastructure I use to run AMM benchmarks depends on dev.
No problem.
Added a test for …
Is there any way we could test this without re-fitting? Reason being, running the full tests already takes a long time. If not it's ok, just trying to brainstorm here a bit.
@janosh after thinking about this some more, you may be right that adding this as a powerup is not the right implementation. Just having it alongside "ignore_cols" in the MatPipe predict args (and additionally in the adaptors, etc.) might be the way to go.
I thought about that as well. We could combine …
As you like. I'm fine with either. Let me know what you think about the combined test and then I can revert the powerup commit alongside changing the pipeline test.
I think this is perfect! The implementation is rather simple so I'd imagine that won't be a major problem, and is preferred to testing them individually.
Yeah, I think not having it in the powerups is preferred. Mainly because you are right, it doesn't really fit the paradigm. The powerups are for applying common operations which affect the pipeline configuration, not things which can be changed after the pipeline has been fit.
It will also be nice not having target_output_col be an attribute of the adaptors. If we don't make it a powerup, it would be easy to just have it be an arg of DFMLAdaptor.predict (there are already too many damn attributes of each pipeline op). It would be quite a simple implementation in this case.
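To sketch that idea, here is a rough, hypothetical illustration of the default handling such a predict kwarg would need; the function, the output_col argument name, and the space-joined default column name are assumptions for this example, not actual automatminer code.

    from typing import Optional, Sequence

    import pandas as pd

    def attach_predictions(df: pd.DataFrame, predictions: Sequence[float],
                           target: str, output_col: Optional[str] = None) -> pd.DataFrame:
        # Fall back to the conventional "<target> predicted" column name
        # unless the caller supplies a custom one.
        if output_col is None:
            output_col = target + " predicted"
        df = df.copy()
        df[output_col] = list(predictions)
        return df

With something along these lines, DFMLAdaptor.predict could simply accept the column name as an argument and pass it through, keeping it out of the adaptor's attributes.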
Alrighty, I reset all changes to upstream master, only kept the predict kwarg (no added attribute), refactored …
I renamed …. Anything left to do here?
@janosh sounds good. I am a bit behind so I haven't gotten the chance to see if this will change things in …

Edit: turns out I didn't do this that week.
@ardunn With the quarantine in place, perhaps there's time now to revisit this and get it merged?
Haha @janosh, yes, I am starting work again on this. We are preparing a paper in parallel with this development, so that has been taking considerable time as well. You can expect some automatminer progress over the next few weeks (including this PR).
Sounds good, looking forward to it! :)
Closes #266.
@ardunn We might want to add a test that specifically calls mat_pipe.predict() with a custom value for pred_col_suffix. Right now, all I added was …

That's because the way TestAdaptorGood inherits from DFMLAdaptor and then overwrites predict makes adding a test there awkward. Should we instead have one on both TPOTAdaptor and SinglePipelineAdaptor?

Also, let me know if you'd like a different name for pred_col_suffix.
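For context, the kind of test call described above might look roughly like this; the pipe, dataframe, target, and suffix values are placeholders, and the exact way the suffix is joined to the target name is an assumption.

    # Hypothetical usage sketch, not actual automatminer test code
    df_pred = mat_pipe.predict(df_test, target="gap", pred_col_suffix=" my_suffix")
    assert "gap my_suffix" in df_pred.columns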