Adding support for scikit-learn > 0.22 #936

Merged: 5 commits merged into develop from sklearn-0.22+support on Aug 3, 2020

Conversation

Neeratyoy (Contributor)

What does this PR implement/fix? Explain your changes.

Adds support for scikit-learn 0.22.2 and above by ensuring all unit tests pass.

How should this PR be tested?

By checking the Travis and AppVeyor unit tests.

Any other comments?

The changes here add support for higher scikit-learn versions (>0.22) on Python 3.6 and 3.7.

codecov-commenter commented Jul 29, 2020

Codecov Report

Merging #936 into develop will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           develop     #936      +/-   ##
===========================================
+ Coverage    87.76%   87.79%   +0.03%     
===========================================
  Files           37       37              
  Lines         4429     4433       +4     
===========================================
+ Hits          3887     3892       +5     
+ Misses         542      541       -1     
Impacted Files | Coverage Δ
openml/extensions/sklearn/extension.py | 91.34% <100.00%> (+0.16%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9c93f5b...023ac40. Read the comment docs.

@@ -223,6 +225,10 @@ def _check_fold_timing_evaluations(
"wall_clock_time_millis_testing": (0, max_time_allowed),
"wall_clock_time_millis": (0, max_time_allowed),
}
if LooseVersion(sklearn.__version__) >= "0.23":
Collaborator

Why is it that these can't/don't need to get tested?

Contributor Author

This was a placeholder check, since there was only one unit test which had a slightly different failure point.

I looked into it and had to make this change to account for the update to K-means in scikit-learn 0.23.1.

I am not sure this is the most scalable, long-term solution.
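
To illustrate the pattern being discussed, here is a minimal sketch of a version-gated test expectation. The wall-clock keys appear in the diff above; the upper bound and the extra key added for 0.23 are hypothetical placeholders, not the PR's exact change:

```python
from distutils.version import LooseVersion

import sklearn

# Expected per-fold timing measures with allowed (min, max) ranges;
# the keys below appear in the diff shown above.
max_time_allowed = 60_000  # hypothetical upper bound in milliseconds
check = {
    "wall_clock_time_millis_training": (0, max_time_allowed),
    "wall_clock_time_millis_testing": (0, max_time_allowed),
    "wall_clock_time_millis": (0, max_time_allowed),
}

# K-means changed in scikit-learn 0.23, so the expectation is only adjusted
# for recent versions instead of failing on older ones.
if LooseVersion(sklearn.__version__) >= "0.23":
    check["some_additional_measure"] = (0, max_time_allowed)  # hypothetical key
```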

@Neeratyoy (Contributor Author)

@mfeurer, @PGijsbers

For this unit test, I have currently edited it to get it to pass with scikit-learn 0.23.1.

However, this is of course not a neat solution. I think this problem stems from how the following two changes interact:

  • The change in how keyword arguments and positional arguments are handled in scikit-learn estimators' __init__
  • Updates to the inspect package, where getfullargspec seems to be deprecated (at least it does not work with sklearn 0.23, but works with 0.22); inspect.signature works across the sklearn versions I tested, but yields a different output, such that this function needs updating

In summary, for sklearn 0.23.1, inspect.getfullargspec of sklearn.ensemble.RandomForestRegressor.__init__ (and others) returns FullArgSpec(args=[], varargs='args', varkw='kwargs', defaults=None, kwonlyargs=[], kwonlydefaults=None, annotations={}).
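
As a minimal reproduction of the behaviour described above (assuming scikit-learn 0.23.x is installed; the exact output differs by version):

```python
import inspect

from sklearn.ensemble import RandomForestRegressor

init = RandomForestRegressor.__init__

# getfullargspec inspects the function object as-is; in sklearn 0.23 the
# estimator __init__ is wrapped by a decorator that only exposes
# *args/**kwargs, so the real parameter names are lost.
print(inspect.getfullargspec(init))
# FullArgSpec(args=[], varargs='args', varkw='kwargs', defaults=None, ...)

# signature() follows the wrapped function by default, so the actual
# hyperparameters and their defaults are still visible.
sig = inspect.signature(init)
for name, param in sig.parameters.items():
    if name != "self":
        print(name, param.default)
```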

Thoughts, please?

@Neeratyoy Neeratyoy marked this pull request as ready for review July 29, 2020 17:36
@PGijsbers (Collaborator)

Given the linked inspect.getfullargspec documentation:

Note that signature() and Signature Object provide the recommended API for callable introspection

and

This function is retained primarily for use in code that needs to maintain compatibility with the Python 2 inspect module API.

I propose we update our code to use signature.

@mfeurer (Collaborator) left a comment

I agree with Pieter that we should use signature here and make sure that we detect the number of hyperparameters with a default correctly. Maybe we could actually check for both (the number of hyperparameters with a default and the number without) to make that test a bit more strict.
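
A sketch of what such a stricter check could look like with inspect.signature (the estimator and the expected counts are illustrative, not taken from the PR):

```python
import inspect

from sklearn.ensemble import RandomForestRegressor

sig = inspect.signature(RandomForestRegressor.__init__)
params = [p for name, p in sig.parameters.items() if name != "self"]

# Count hyperparameters with and without a default value.
n_with_default = sum(p.default is not inspect.Parameter.empty for p in params)
n_without_default = len(params) - n_with_default

# A stricter test could then assert both counts against known values, e.g.:
# assert n_with_default == EXPECTED_WITH_DEFAULT
# assert n_without_default == EXPECTED_WITHOUT_DEFAULT
```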

.travis.yml (review thread resolved)
openml/extensions/sklearn/extension.py (review thread resolved)
@Neeratyoy Neeratyoy requested a review from mfeurer July 31, 2020 16:46
@mfeurer (Collaborator) left a comment

I'm fine with this PR. @PGijsbers what about you?

@PGijsbers PGijsbers merged commit 666ca68 into develop Aug 3, 2020
@PGijsbers PGijsbers deleted the sklearn-0.22+support branch August 3, 2020 09:01
@PGijsbers (Collaborator)

Looks good to me! Thanks! 🎉

mfeurer pushed a commit that referenced this pull request Sep 2, 2020
* Create first section: Creating Custom Flow

* Add Section: Using the Flow

It is incomplete: while trying to explain how to format the
predictions, I realized a utility function is required.

* Allow run description text to be custom

Previously the description text that accompanies the prediction file was
auto-generated with the assumption that the corresponding flow had an
extension. To support custom flows (with no extension), this behavior
had to be changed. The description can now be passed on initialization.
The description noting that it was auto-generated from run_task is now
correctly only added if the run was generated through run_flow_on_task.

* Draft for Custom Flow tutorial

* Add minimal docstring to OpenMLRun

I am not sure, for each field, what the specifications are.

* Process code review feedback

In particular:
 - text changes
 - fetch true labels from the dataset instead

* Use the format utility function in automatic runs

To format the predictions.

* Process @mfeurer feedback

* Rename arguments of list_evaluations (#933)

* list evals name change

* list evals - update

* adding config file to user guide (#931)

* adding config file to user guide

* finished requested changes

* Edit api (#935)

* version1

* minor fixes

* tests

* reformat code

* check new version

* remove get data

* code format

* review comments

* fix duplicate

* type annotate

* example

* tests for exceptions

* fix pep8

* black format

* Adding support for scikit-learn > 0.22 (#936)

* Preliminary changes

* Updating unit tests for sklearn 0.22 and above

* Triggering sklearn tests + fixes

* Refactoring to inspect.signature in extensions

* Add flake8-print in pre-commit (#939)

* Add flake8-print in pre-commit config

* Replace print statements with logging

* Fix edit api (#940)

* fix edit api

* Update subflow paragraph

* Check the ClassificationTask has class label set

* Test task is of supported type

* Add tests for format_prediction

* Adding Python 3.8 support (#916)

* Adding Python 3.8 support

* Fixing indentation

* Execute test cases for 3.8

* Testing

* Making install script fail

* Process feedback Neeratyoy

* Test Exception with Regex

Also throw NotImplementedError instead of TypeError for unsupported task
types. Added links in the example.

* change edit_api to reflect server (#941)

* change edit_api to reflect server

* change test and example to reflect rest API changes

* tutorial comments

* Update datasets_tutorial.py

Co-authored-by: Bilgecelik <38037323+Bilgecelik@users.noreply.github.com>
Co-authored-by: marcoslbueno <38478211+marcoslbueno@users.noreply.github.com>
Co-authored-by: Sahithya Ravi <44670788+sahithyaravi1493@users.noreply.github.com>
Co-authored-by: Neeratyoy Mallik <neeratyoy@gmail.com>
Co-authored-by: zikun <33176974+zikun@users.noreply.github.com>