
Make name of new column with predictions appended to dataframe configurable #267

Merged · 2 commits into hackingmaterials:master · May 3, 2020

Conversation

@janosh (Member) commented Nov 18, 2019

Closes #266.

@ardunn We might want to add a test that specifically calls mat_pipe.predict() with a custom value for pred_col_suffix. Right now, all I added was

# automl/tests/test_base.py
self.assertTrue(hasattr(tag, "pred_col_suffix"))

That's because the way TestAdaptorGood inherits from DFMLAdaptor and then overrides predict makes adding a test there awkward. Should we instead have one on both TPOTAdaptor and SinglePipelineAdaptor?

Also, let me know if you'd like a different name for pred_col_suffix.
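For concreteness, such a test might look roughly like this (fixture names and how pred_col_suffix is passed are assumptions, not the actual implementation):

    # hypothetical sketch: assumes pred_col_suffix is accepted by predict()
    def test_custom_pred_col_suffix(self):
        df = self.mat_pipe.predict(self.df_test, self.target, pred_col_suffix=" pred")
        # the appended prediction column should carry the custom suffix
        self.assertIn(self.target + " pred", df.columns)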

@ardunn (Contributor) commented Nov 18, 2019

Hey @janosh thanks for the PR!

One quick thought: if we are going to accept changing column output names, it would probably be worth replacing the entire output column name rather than making it a suffix. I could imagine scenarios where people specify the output column name as "zT_predicted" and the appended name then comes out as "zTzT_predicted" or something, because the person was expecting the name to be replaced, not appended.
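To illustrate the pitfall (hypothetical values):

    target = "zT"
    pred_col_suffix = "zT_predicted"   # user expects this to BE the column name
    col_name = target + pred_col_suffix
    print(col_name)  # "zTzT_predicted" -- appended, not replaced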

@ardunn (Contributor) commented Nov 18, 2019

And yes, we should definitely add a global MatPipe test for it, since it is both kind of a global option (i.e., it's the output of the entire pipeline) and an Adaptor option.

@janosh (Member, Author) commented Nov 18, 2019

Turned the target column suffix into a full name and added a preset test for target_col_name as a powerup.

Where should the MatPipe test live? I don't quite understand the flow in tests/test_pipeline.py.

@ardunn (Contributor) left a comment

Regarding the MatPipe test, it should definitely go in test_pipeline. To explain: the TestMatPipe class just defines common tests for any MatPipe configuration. The test classes that follow then run the same tests for different adaptors (Single, TPOT, etc.). One way to integrate the tests for this is to change the TestMatPipe tests to just use DFMLAdaptor.target_output_name instead of target + " predicted" in the existing tests. You can add an extra test if needed.
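Roughly, the change in the existing tests would look like this (treating pipe.learner as the adaptor handle, which is an assumption here):

    df = pipe.predict(self.df_test, self.target)
    # before: self.assertIn(self.target + " predicted", df.columns)
    self.assertIn(pipe.learner.target_output_name, df.columns)  # pipe.learner is assumed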

@@ -83,7 +86,7 @@ def __init__(self, **tpot_kwargs):
 
         self.from_serialized = False
         self._best_models = None
-        super(DFMLAdaptor, self).__init__()
+        super(TPOTAdaptor, self).__init__(**tpot_kwargs)
@ardunn (Contributor)

Feel free to change TestAdaptorGood to not override DFMLAdaptor, since the DFMLAdaptor usage in the test adaptors should remain the same. I.e., we shouldn't be testing TPOTAdaptor where previously the base class was tested.

@janosh (Member, Author)

Not quite sure what you mean... Is your comment related to this line?

Also, I was wondering: what's the purpose of using Python 2 super() syntax in automatminer if all your CI is Python 3.7? I.e., why not simply use super() without arguments everywhere?

- super(TPOTAdaptor, self).__init__(**tpot_kwargs)
+ super().__init__(**tpot_kwargs)

@ardunn (Contributor) commented Nov 19, 2019

Using Py3 super syntax is fine by me!

Edit: never mind that comment, I was confused about which file this was in. Seems fine by me.

@janosh (Member, Author)

> Using Py3 super syntax is fine by me!

Cool. Would you accept a separate PR converting all super() calls to Py3?

@janosh (Member, Author)

I can also make it part of this PR if you prefer.

@ardunn (Contributor)

Yes, let's make that a separate PR though.

@@ -146,9 +153,11 @@ def predict(self, df: pd.DataFrame, target: str) -> pd.DataFrame:
                 "".format(not_in_df, not_in_model)
             )
         else:
+            if target_output_col is not None:
@ardunn (Contributor)

Can we just have the target output col argument be the "{target} predicted" string? Having it be None makes it a bit harder to see what is going on.

@ardunn (Contributor)

Oh, never mind, I see it is set in the base class constructor.
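For reference, the pattern being described is roughly this (signatures are assumptions based on the diff above, not the merged code):

    # sketch only -- constructor signature and default handling are assumptions
    class DFMLAdaptor:
        def __init__(self, target_output_col=None):
            # None means "derive the column name from the target at predict time"
            self.target_output_col = target_output_col

        def predict(self, df, target):
            output_col = self.target_output_col or target + " predicted"
            ...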

@@ -145,7 +145,7 @@ def run_task(self, fw_spec):
 
         # Evaluate model
         true = result_df[target]
-        test = result_df[target + " predicted"]
+        test = result_df[learner.target_output_col]
@ardunn (Contributor)

Since you are making changes that affect _dev, this will take me some time to merge; the entire infrastructure I use to run AMM benchmarks depends on dev.

@janosh (Member, Author)

No problem.

@janosh (Member, Author) commented Nov 19, 2019

> One way to integrate the tests for this is to change the TestMatPipe tests to just use DFMLAdaptor.target_output_name instead of target + " predicted" in the existing tests.

That part is already implemented.

@janosh (Member, Author) commented Nov 19, 2019

Added a test for target_output_col in tests/test_pipeline.py.

@ardunn (Contributor) commented Nov 19, 2019

> Added a test for target_output_col in tests/test_pipeline.py.

Is there any way we could test this without re-fitting? Reason being, running the full tests already takes a long time. If not, it's OK; just trying to brainstorm here a bit.

@ardunn (Contributor) commented Nov 19, 2019

@janosh, after thinking on this some more, you may be right that adding this as a powerup is not the right implementation. Just having it alongside "ignore_cols" in the MatPipe predict args (and additionally in the adaptors, etc.) might be the way to go.

@janosh (Member, Author) commented Nov 19, 2019

> Is there any way we could test this without re-fitting? Reason being, running the full tests already takes a long time.

I thought about that as well. We could combine test_target_output_col and test_ignore and call it test_predict_kwargs. That way we would only need to fit once, at the expense of having to work a little harder, if that test fails, to find out which kwarg is the culprit.
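Sketched out, the combined test might look like this (fixture names and the spelling of the ignore kwarg are assumptions):

    # sketch: self.fitted_pipe and the kwarg names are assumptions
    def test_predict_kwargs(self):
        pipe = self.fitted_pipe  # fit once, shared by both checks

        # custom prediction column name
        df = pipe.predict(self.df_test, self.target, target_output_col="zT_pred")
        self.assertIn("zT_pred", df.columns)

        # ignored columns should pass through untouched
        df = pipe.predict(self.df_test, self.target, ignore=["composition"])
        self.assertIn("composition", df.columns)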

@janosh (Member, Author) commented Nov 19, 2019

> @janosh, after thinking on this some more, you may be right that adding this as a powerup is not the right implementation. Just having it alongside "ignore_cols" in the MatPipe predict args (and additionally in the adaptors, etc.) might be the way to go.

As you like. I'm fine with either. Let me know what you think about the combined test and then I can revert the powerup commit alongside changing the pipeline test.

@ardunn (Contributor) commented Nov 19, 2019

> > Is there any way we could test this without re-fitting? Reason being, running the full tests already takes a long time.

> I thought about that as well. We could combine test_target_output_col and test_ignore and call it test_predict_kwargs. That way we would only need to fit once, at the expense of having to work a little harder, if that test fails, to find out which kwarg is the culprit.

I think this is perfect! The implementation is rather simple so I'd imagine that won't be a major problem, and is preferred to testing them individually.

@ardunn (Contributor) commented Nov 19, 2019

> > @janosh, after thinking on this some more, you may be right that adding this as a powerup is not the right implementation. Just having it alongside "ignore_cols" in the MatPipe predict args (and additionally in the adaptors, etc.) might be the way to go.

> As you like. I'm fine with either. Let me know what you think about the combined test and then I can revert the powerup commit alongside changing the pipeline test.

Yeah, I think not having it in the powerups is preferred, mainly because you are right: it doesn't really fit the paradigm. The powerups are for applying common operations which affect the pipeline configuration, not things which can be changed after the pipeline has been fit.

@ardunn (Contributor) commented Nov 19, 2019

It will also be nice not having target_output_col be an attribute of the adaptors. If we don't make it a powerup, it would be easy to just have it be an arg of DFMLAdaptor.predict (there are already too many damn attributes on each pipeline op). It would be quite a simple implementation in this case.
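I.e., roughly this (a sketch, not the merged code):

    import pandas as pd

    # sketch of a DFMLAdaptor method; default value and types are assumptions
    def predict(self, df: pd.DataFrame, target: str,
                target_output_col: str = None) -> pd.DataFrame:
        ...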

@janosh (Member, Author) commented Nov 20, 2019

Alrighty. I reset all changes to upstream master, kept only the predict kwarg (no added attribute), refactored test_ignore into test_predict_kwargs, and force-pushed the new changes. All that's missing now is documentation for target_output_col, I think.

@janosh (Member, Author) commented Nov 25, 2019

I renamed target_output_col to output_col for brevity since I think that name is clear enough in the context of pipe.predict(). Hope you don't mind.
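Usage now looks like this (a sketch; the DataFrame and target here are placeholders):

    # placeholder data; only the output_col kwarg is the point
    df_pred = pipe.predict(df_test, "zT", output_col="zT_pred")
    assert "zT_pred" in df_pred.columns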

Anything left to do here?

@ardunn (Contributor) commented Nov 27, 2019

@janosh sounds good. I am a bit behind, so I haven't gotten the chance to see if this will change things in -dev, but I'll be doing that this week. Hang tight in the meantime!

Edit: turns out I didn't do this that week.

@janosh (Member, Author) commented Mar 27, 2020

@ardunn With the quarantine in place, perhaps there's time now to revisit this and get it merged?

@ardunn (Contributor) commented Mar 28, 2020

Haha, @janosh, yes, I am starting work on this again. We are preparing a paper in parallel with this development, so that has been taking considerable time as well. You can expect some automatminer progress over the next few weeks (including this PR).

@janosh (Member, Author) commented Mar 28, 2020

Sounds good, looking forward to it! :)

@ardunn ardunn merged commit c7b0251 into hackingmaterials:master May 3, 2020
Linked issue: Keyword arg for predicted column name (#266)