Prepare 11.0 release #969

Merged
merged 50 commits into from
Oct 25, 2020

PGijsbers
Collaborator

No description provided.

Neeratyoy and others added 30 commits November 6, 2019 09:40
* Preliminary addition of license to source files

* Adding license to almost every source file
* add task_type to list_runs

* length of run change

* changelog

* changes in progress rst
* do not populate server base URL on startup

* update changelog

* fix pep8
* Ask users to cite us

* improve reference

* Remove linebreak from bibtex block.
* Adding option to print logs during an api call

* Adding timing to log and changing string interpolation

* Improving logging and timing of api calls

* PEP8

* PEP8
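The logging and timing added to API calls in these commits can be pictured with a minimal sketch that uses only the standard library. The helper `timed_api_call` and the logger name `openml.api_sketch` are hypothetical illustrations, not openml-python's actual code. The wrapped callable stands in for a real HTTP request.

```python
import logging
import time
from typing import Any, Callable

# Hypothetical logger name chosen for this sketch.
logger = logging.getLogger("openml.api_sketch")


def timed_api_call(name: str, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Run ``func`` and log how long the call took (illustrative helper)."""
    logger.info("Starting [%s]", name)
    start = time.perf_counter()
    try:
        return func(*args, **kwargs)
    finally:
        # The finally clause logs a duration for successful and failed calls alike.
        # %-style arguments are only interpolated if the record is actually emitted.
        duration = time.perf_counter() - start
        logger.info("%.4fs taken for [%s]", duration, name)
```

With the default WARNING level nothing is printed, so the timing output stays opt-in. Calling `logging.basicConfig(level=logging.INFO)` first makes both log lines visible.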
* improve sdist handling

* fix changelog

* fix pytest installation

* install test dependencies extra

* fix sdist
* add better error message for too-long URI

* improve error handling

* improve data download function, fix bugs

* stricter API, more private methods

* incorporate Pieter's feedback
* Initial changes to handle reproducible example from the issue

* Making tentative changes; Need to test deserialization

* Fixing deserialization when empty steps in sklearn model

* Fixing flake issues, failing test cases

* Fixing test cases

* Dropping support for 'None' as sklearn estimator

* Adding test case for None estimator
* Add support for using run_model_on_task simply

* Add unit test

* fix mypy error
* Changes proposed in #885. Don't register handlers by default.

* Delay file creation until log emit. Correctly read from config.

* Remove loading/storing log level references.

* _create_log_handlers now returns early if called a second time

* Fix type errors.

* Update changelog.

* Test removing the file log handler registration to see if CI works.

* Undo last change. Test server SSL works again.

* Bump scikit-learn version to 0.22

* Scikit-learn 0.22 does not install properly.

* Install scikit-learn through pip instead.
* init feather implementation

* sparse matrix

* test notebook

* feather pickle compare

* test arrow vs feather

* add columns condition

* Testing

* get_dataset add cache format

* add pyarrow

* sparse matrix check

* pep8 and remove files

* return type

* fix type annotation

* value check

* change feather condition

* fixes and test

* fix errors

* testing file

* feather new file for attributes

* change feather attribute file path

* delete testing file

* testing changes

* delete pkls

* fixes

* fixes

* add comments

* change default caching

* pip version

* review comment fixes

* newline

* fix if condition

* Update install.sh

* pandas version due to sparse data

* review #2

* Update appveyor.yml

* Update appveyor.yml

* rename cache dir
* remove __version__ from __all__ in init

* Add comment for flake8 test
* Removing support for pandas SparseDataFrame

* Fixing rebase loss

* Reiterating with Matthias' changes

* Rolling back setup

* Fixing PEP8

* Changing check to detect sparse dataframes

* Fixing edge case to handle server side arff issue

* Removing stray comment

* Failing test case fix

* Removing stray comment
* Fixing typos

* Rewording
* Sphinx issue fix

* Removing comment
I ran into issues when the openml server config is not exactly 'https://www.openml.org/api/v1/xml', e.g. I had 'https://www.openml.org/api/v1'.
I only noticed when getting a bad dataset url.

This edit makes the API more robust against how exactly the server URL is set in the config.
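One way to make URL handling insensitive to how the server is written in the config is to derive the server root by dropping everything from the API path onwards. The following is a minimal sketch that assumes the API path starts at a path segment named `api`. `server_base_url` and `build_url` are hypothetical helpers, not the library's functions.

```python
from urllib.parse import urlsplit, urlunsplit


def server_base_url(server: str) -> str:
    """Return the server root, whether the config ends in /api/v1/xml, /api/v1 or '/'."""
    parts = urlsplit(server.strip())
    segments = [s for s in parts.path.split("/") if s]
    # Keep only the path segments that come before the "api" segment, if present.
    if "api" in segments:
        segments = segments[: segments.index("api")]
    path = "/" + "/".join(segments) if segments else ""
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def build_url(server: str, *segments: str) -> str:
    """Join path segments onto the server root without doubled or missing slashes."""
    return "/".join([server_base_url(server), *(s.strip("/") for s in segments)])
```

Parsing with `urlsplit` rather than searching the raw string for `"/api"` avoids misfiring on hosts such as `https://api.example.org`.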
* Add Flake8 configuration

Uses the configuration from ci_scripts

* Add mypy configuration file

Based on the ci_scripts parameters.

* Pre-commit mypy flake8, add flake8 excludes

Any venv folder does not need flake8.
The example directory got flake8 warnings so I assumed it should be
excluded.

* Add Black to pre-commit

Add an ignore for E203, as Black follows PEP 8's recommendation for whitespace
around a colon if it is next to an expression.

* Set max line length to 100

* Blacken code

There are a few places where big indentation is introduced that may
warrant refactoring so it looks better.
I did not refactor anything yet, but did exclude three (?) lists (of IDs)
from formatting.

* Add unit tests to flake8 and mypy pre-commit

* Use pre-commit for flake8, mypy and black checks

This ensures it runs with the same versions and settings as developers.

* Update docs, add 'test' dependencies

Add two other developer dependencies not strictly required for unit
tests, but required for development.
I think the overlap between people who want to execute unit tests and
perform commits is (close to) 100% anyway.

* Uninstall pytest-cov on appveyor ci

It seems to cause an error on import due to a missing sqlite3 dll.
As we don't check coverage anyway, hopefully just uninstalling is
sufficient.

* Add -y to uninstall

* Sphinx issue fix (#923)

* Sphinx issue fix

* Removing comment

* More robust handling of openml_url (#921)

I ran into issues when the openml server config is not exactly 'https://www.openml.org/api/v1/xml', e.g. I had 'https://www.openml.org/api/v1'.
I only noticed when getting a bad dataset url.

This edit makes the API more robust against how exactly the server URL is set in the config.

* format for black artifacts

Co-authored-by: Neeratyoy Mallik <neeratyoy@gmail.com>
Co-authored-by: Joaquin Vanschoren <joaquin.vanschoren@gmail.com>
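The E203 remark is easier to see with an example. PEP 8 says a colon in a slice acts like a binary operator and should get equal spacing on both sides when the bounds are complex expressions. Black formats such slices that way, and flake8 then reports E203 ("whitespace before ':'") unless that check is ignored. The variable names below are illustrative only.

```python
ham = list(range(10))
lower, upper, offset = 1, 4, 2

# Black's style for a slice with expression bounds: one space on each side of ':'.
# Without an ignore for E203, flake8 flags the space before the colon here.
window = ham[lower + offset : upper + offset]

# Simple bounds keep the compact form, which neither tool complains about.
head = ham[1:3]
```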
* MAINT 918: improve error handling and error message

* incorporate feedback from Pieter
* Increase unit test stability

by waiting longer for the server to process run traces, and by
querying the server less frequently for new run traces.

* Make test stricter

Actually, we only wait for evaluations to ensure that the trace has been
processed by the server. Therefore, we can simply wait for the trace to
become available instead of relying on the availability of evaluations
as a proxy indicator.

* fix stricter test
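The stricter test waits for the resource it actually needs (the trace) and queries the server less often. A generic sketch of such a polling loop follows. `wait_until_available` is hypothetical. Signalling "not yet processed" with `LookupError` is an assumption made for illustration; a real client would catch whatever exception its server call raises.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until_available(
    fetch: Callable[[], T], timeout: float = 300.0, poll_interval: float = 10.0
) -> T:
    """Call ``fetch`` until it stops raising LookupError or ``timeout`` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return fetch()
        except LookupError as err:
            # Assumed convention: LookupError means "not processed by the server yet".
            if time.monotonic() >= deadline:
                raise TimeoutError(f"not available after {timeout} seconds") from err
        # Sleep between attempts so the server is queried less frequently.
        time.sleep(poll_interval)
```

A longer `timeout` combined with a larger `poll_interval` matches the intent of the commit: more patience overall, fewer requests to the server.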
* Mention the initialization of pre-commit

* Restructure the two contribution guidelines

The rst file will now have general contribution information, for
contributions that are related to openml-python, but not actually to the
openml-python repository.
Information for making a contribution to the openml-python repository is
in the contributing markdown file.
* improve error message for dataset upload

* fix unit test
* list evals name change

* list evals - update
* adding config file to user guide

* finished requested changes
* version1

* minor fixes

* tests

* reformat code

* check new version

* remove get data

* code format

* review comments

* fix duplicate

* type annotate

* example

* tests for exceptions

* fix pep8

* black format
Neeratyoy and others added 17 commits August 3, 2020 11:01
* Preliminary changes

* Updating unit tests for sklearn 0.22 and above

* Triggering sklearn tests + fixes

* Refactoring to inspect.signature in extensions
* Add flake8-print in pre-commit config

* Replace print statements with logging
* fix edit api
* Adding Python 3.8 support

* Fixing indentation

* Execute test cases for 3.8

* Testing

* Making install script fail
* change edit_api to reflect server

* change test and example to reflect rest API changes

* tutorial comments

* Update datasets_tutorial.py
* Create first section: Creating Custom Flow

* Add Section: Using the Flow

It is incomplete because, while trying to explain how to format the
predictions, I realized a utility function is required.

* Allow run description text to be custom

Previously the description text that accompanies the prediction file was
auto-generated with the assumption that the corresponding flow had an
extension. To support custom flows (with no extension), this behavior
had to be changed. The description can now be passed on initialization.
The description stating that it was auto-generated from run_task is now
correctly added only if the run was generated through run_flow_on_task.

* Draft for Custom Flow tutorial

* Add minimal docstring to OpenMLRun

I am not sure what the specifications are for each field.

* Process code review feedback

In particular:
 - text changes
 - fetch true labels from the dataset instead

* Use the format utility function in automatic runs

To format the predictions.

* Process @mfeurer feedback

* Rename arguments of list_evaluations (#933)

* list evals name change

* list evals - update

* adding config file to user guide (#931)

* adding config file to user guide

* finished requested changes

* Edit api (#935)

* version1

* minor fixes

* tests

* reformat code

* check new version

* remove get data

* code format

* review comments

* fix duplicate

* type annotate

* example

* tests for exceptions

* fix pep8

* black format

* Adding support for scikit-learn > 0.22 (#936)

* Preliminary changes

* Updating unit tests for sklearn 0.22 and above

* Triggering sklearn tests + fixes

* Refactoring to inspect.signature in extensions

* Add flake8-print in pre-commit (#939)

* Add flake8-print in pre-commit config

* Replace print statements with logging

* Fix edit api (#940)

* fix edit api

* Update subflow paragraph

* Check the ClassificationTask has class label set

* Test task is of supported type

* Add tests for format_prediction

* Adding Python 3.8 support (#916)

* Adding Python 3.8 support

* Fixing indentation

* Execute test cases for 3.8

* Testing

* Making install script fail

* Process feedback Neeratyoy

* Test Exception with Regex

Also throw NotImplementedError instead of TypeError for unsupported task
types. Added links in the example.

* change edit_api to reflect server (#941)

* change edit_api to reflect server

* change test and example to reflect rest API changes

* tutorial comments

* Update datasets_tutorial.py

Co-authored-by: Bilgecelik <38037323+Bilgecelik@users.noreply.github.com>
Co-authored-by: marcoslbueno <38478211+marcoslbueno@users.noreply.github.com>
Co-authored-by: Sahithya Ravi <44670788+sahithyaravi1493@users.noreply.github.com>
Co-authored-by: Neeratyoy Mallik <neeratyoy@gmail.com>
Co-authored-by: zikun <33176974+zikun@users.noreply.github.com>
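The "Test Exception with Regex" bullet (raise `NotImplementedError` rather than `TypeError` for unsupported task types, and check the exception message with a regex) can be sketched with the standard library's `unittest`. `check_task_supported` and the set of supported task types are hypothetical stand-ins, not the signature of the real prediction-formatting utility.

```python
import re
import unittest

# Hypothetical subset of task types this sketch knows how to format.
SUPPORTED_TASK_TYPES = {"classification", "regression"}


def check_task_supported(task_type: str) -> None:
    """Raise NotImplementedError (not TypeError) for task types that cannot be formatted."""
    if task_type not in SUPPORTED_TASK_TYPES:
        raise NotImplementedError(f"Formatting for {task_type} is not supported.")


class TestCheckTaskSupported(unittest.TestCase):
    def test_unsupported_type_message(self) -> None:
        # assertRaisesRegex checks both the exception type and its message.
        pattern = re.escape("clustering is not supported")
        with self.assertRaisesRegex(NotImplementedError, pattern):
            check_task_supported("clustering")
```

`assertRaisesRegex` applies `re.search` to the string form of the exception, so the pattern only needs to match part of the message. `re.escape` guards against regex metacharacters in the expected text.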
* support passthrough and drop in sklearn extension when serialized to xml dict

* make test work with sklearn==0.21

* improve PR

* Add additional unit tests

* fix test

* incorporate feedback and generalize unit tests
* Added PEP 561 compliance (#945)

* FIX: mypy test dependency

* FIX: mypy test dependency (#945)

* FIX: Added mypy to CI list of test packages
* convert TaskTypeEnum class to TaskType enum

* update docstrings for TaskType

* fix bug in examples, import TaskType directly

* use task_type instead of task_type_id
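Replacing a class of integer constants with an `enum.Enum` lets callers look members up by value and iterate over them. It also allows type-checked signatures such as `task_type: TaskType` instead of a bare `task_type_id: int`. The sketch below shows the idea with a partial member list. The numeric ids are given for illustration and are assumptions of this sketch.

```python
from enum import Enum


class TaskType(Enum):
    """Partial, illustrative list of task types; ids are assumptions for this sketch."""

    SUPERVISED_CLASSIFICATION = 1
    SUPERVISED_REGRESSION = 2
    LEARNING_CURVE = 3
    CLUSTERING = 5


def describe(task_type: TaskType) -> str:
    """Format a task type using both its readable name and its numeric id."""
    return f"{task_type.name} (id={task_type.value})"
```

`TaskType(1)` returns `TaskType.SUPERVISED_CLASSIFICATION`, while `TaskType(4)` raises `ValueError` because 4 is not a member of this partial enum.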
* Updating contribution to aid debugging

* More explicit instructions
Remove a faulty entry in the argument list of datasets.
* Improved documentation of example

* Update examples/30_extended/create_upload_tutorial.py

Co-authored-by: PGijsbers <p.gijsbers@tue.nl>

Co-authored-by: Matthias Feurer <feurerm@informatik.uni-freiburg.de>
Co-authored-by: PGijsbers <p.gijsbers@tue.nl>
* run on tasks allows dataframes

* don't force third subcomponent part to be list

* Making DataFrame default behaviour for runs; Fixing test cases for the same

* Fixing PEP8 + Adding docstring to CustomImputer()

* run on tasks allows dataframes

* Attempting rebase

* Fixing test cases

* Trying test case fixes

* run on tasks allows dataframes

* don't force third subcomponent part to be list

* Making DataFrame default behaviour for runs; Fixing test cases for the same

* Fixing PEP8 + Adding docstring to CustomImputer()

* Attempting rebase

* Fixing test cases

* Trying test case fixes

* Allowing functions in subcomponents

* Fixing test cases

* Adding dataset output param to run

* Fixing test cases

* Changes suggested by mfeurer

* Editing predict_proba function

* Test case fix

* Test case fix

* Edit unit test to bypass server issue

* Fixing unit test

* Reiterating with @PGijsbers comments

* Minor fixes to test cases

* Adding unit test and suggestions from @mfeurer

* Fixing test case for all sklearn versions

* Testing changes

* Fixing import in example

* Triggering unit tests

* Debugging failed example script

* Adding unit tests

* Push for debugging

* Push for @mfeurer to debug

* Resetting to debug

* Updating branch

* pre-commit fixes

* Handling failing examples

* Reiteration with clean ups and minor fixes

* Closing comments

* Black fixes

* feedback from @mfeurer

* Minor fix

* suggestions from @PGijsbers

Co-authored-by: neeratyoy <neeratyoy@gmail.com>
Co-authored-by: neeratyoy <de4nas@gmail.com>
* fork api

* improve docs (+1 squashed commits)

Squashed commits:

[ec5c0d10] import changes

* minor change (+1 squashed commits)

Squashed commits:

[1822c99] improve docs (+1 squashed commits)

Squashed commits:

[ec5c0d10] import changes

* docs update

* clarify example

* Update doc/progress.rst

* Fix whitespaces for docstring

* fix error

* Use id 999999 for unknown dataset

Co-authored-by: PGijsbers <p.gijsbers@tue.nl>
* Change default size for list_evaluations to 10000

* Suggestions from code review
Co-authored-by: PGijsbers <p.gijsbers@tue.nl>
@PGijsbers
Collaborator Author

I might be able to look at the merge conflicts in the morning. I am a bit confused about why there are merge conflicts in the first place, though, since the last master commit was the 10.2 release.

@mfeurer mfeurer merged commit bc87333 into master Oct 25, 2020