This document describes how to contribute to the openml-python package. If you are interested in connecting a machine learning package with OpenML (i.e. writing an openml-python extension) or want to find other ways to contribute, see this page.
The scope of the OpenML Python package is to provide a Python interface to the OpenML platform which integrates well with Python's scientific stack, most notably numpy, scipy and pandas. To reduce opportunity costs and demonstrate the usage of the package, it also implements an interface to the most popular machine learning package written in Python, scikit-learn. Thereby it will automatically be compatible with many machine learning libraries written in Python.
We aim to keep the package as light-weight as possible and we will try to keep the number of potential installation dependencies as low as possible. Therefore, the connection to other machine learning libraries such as pytorch, keras or tensorflow should not be done directly inside this package, but in a separate package using the OpenML Python connector. More information on OpenML Python connectors can be found here.
We use GitHub issues to track all bugs and feature requests; feel free to open an issue if you have found a bug or wish to see a feature implemented.
It is recommended to check that your issue complies with the following rules before submitting:
- Verify that your issue is not currently being addressed by other issues or pull requests.
- Please ensure all code snippets and error messages are formatted in appropriate code blocks. See Creating and highlighting code blocks.
- Please include your operating system type and version number, as well as your Python, openml, scikit-learn, numpy, and scipy versions. This information can be found by running the following code snippet:
import platform; print(platform.platform())
import sys; print("Python", sys.version)
import numpy; print("NumPy", numpy.__version__)
import scipy; print("SciPy", scipy.__version__)
import sklearn; print("Scikit-Learn", sklearn.__version__)
import openml; print("OpenML", openml.__version__)
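The snippet above stops at the first package that is not installed. As a minimal sketch, assuming Python 3.8 or newer, the same information can be collected with the standard library's importlib.metadata, which lets a missing package be reported instead of raising an error. The helper name collect_versions is illustrative and not part of openml-python.

```python
import platform
import sys
from importlib import metadata
from typing import Dict

# Distribution names as published on PyPI (note "scikit-learn", not "sklearn").
PACKAGES = ("numpy", "scipy", "scikit-learn", "openml")


def collect_versions() -> Dict[str, str]:
    """Return platform, Python, and package versions for a bug report."""
    info = {
        "Platform": platform.platform(),
        "Python": sys.version,
    }
    for name in PACKAGES:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report the gap rather than aborting the whole report.
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print("{}: {}".format(key, value))
```

Pasting the printed lines into the issue gives maintainers the same details as the one-liners above.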
Great! You've decided you want to help out. Now what? All contributions should be linked to issues on the GitHub issue tracker. In particular for new contributors, the good first issue label should help you find issues which are suitable for beginners. Resolving these issues allows you to start contributing to the project without much prior knowledge. Your assistance in this area will be greatly appreciated by the more experienced developers as it helps free up their time to concentrate on other issues.
If you encounter a particular part of the documentation or code that you want to improve, but there is no related open issue yet, open one first. This is important since you can first get feedback or pointers from experienced contributors.
To let everyone know you are working on an issue, please leave a comment that states you will work on the issue (or, if you have the permission, assign yourself to the issue). This avoids double work!
The preferred workflow for contributing to openml-python is to fork the main repository on GitHub, clone it, check out the develop branch, and develop on a new feature branch. Steps:
- Fork the project repository by clicking on the 'Fork' button near the top right of the page. This creates a copy of the code under your GitHub user account. For more details on how to fork a repository see this guide.
- Clone your fork of the openml-python repo from your GitHub account to your local disk:
  $ git clone git@github.com:YourLogin/openml-python.git
  $ cd openml-python
- Switch to the develop branch:
  $ git checkout develop
- Create a feature branch to hold your development changes:
  $ git checkout -b feature/my-feature
  Always use a feature branch. It's good practice to never work on the master or develop branch! To make the nature of your pull request easily visible, please prepend the name of the branch with the type of changes you want to merge, such as feature if it contains a new feature, fix for a bugfix, doc for documentation and maint for other maintenance on the package.
- Develop the feature on your feature branch. Add changed files using git add and then record your changes in Git with git commit:
  $ git add modified_files
  $ git commit
  Then push the changes to your GitHub account, using the name of the branch you created:
  $ git push -u origin feature/my-feature
- Follow these instructions to create a pull request from your fork. This will send an email to the committers.
(If any of the above seems like magic to you, please look up the Git documentation on the web, or ask a friend or another contributor for help.)
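The branch naming convention from the steps above can be checked mechanically. The following is a minimal sketch, assuming the prefix and the description are separated by a slash as in feature/my-feature; the helper has_valid_prefix is hypothetical and not part of openml-python or its tooling.

```python
from typing import Tuple

# Prefixes named in the workflow: new feature, bugfix, documentation, maintenance.
ALLOWED_PREFIXES: Tuple[str, ...] = ("feature", "fix", "doc", "maint")


def has_valid_prefix(branch: str) -> bool:
    """Check that a branch name looks like '<prefix>/<description>'."""
    prefix, separator, description = branch.partition("/")
    return prefix in ALLOWED_PREFIXES and separator == "/" and description != ""
```

For example, has_valid_prefix("fix/my-bugfix") is accepted, while a bare my-feature or the develop branch itself is rejected.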
We recommend that your contribution complies with the following rules before you submit a pull request:
- Follow the pep8 style guide, with the following exceptions or additions:
  - The max line length is 100 characters instead of 79.
  - When creating a multi-line expression with binary operators, break before the operator.
  - Add type hints to all function signatures. (note: not all functions have type hints yet, this is work in progress.)
  - Use str.format over printf style formatting. E.g. use "{} {}".format('hello', 'world') not "%s %s" % ('hello', 'world'). (note: old code may still use printf-formatting, this is work in progress.)
- If your pull request addresses an issue, please use the pull request title to describe the issue and mention the issue number in the pull request description. This will make sure a link back to the original issue is created.
- An incomplete contribution -- where you expect to do more work before receiving a full review -- should be submitted as a draft. These may be useful to: indicate you are working on something to avoid duplicated work, request broad review of functionality or API, or seek collaborators. Drafts often benefit from the inclusion of a task list in the PR description.
- Add unit tests and examples for any new functionality being introduced.
  - If a unit test contains an upload to the test server, please ensure that it is followed by a file collection for deletion, to prevent the test server from bulking up. For example, TestBase._mark_entity_for_removal('data', dataset.dataset_id), TestBase._mark_entity_for_removal('flow', (flow.flow_id, flow.name)).
  - Please ensure that the example is run on the test server by beginning with the call to openml.config.start_using_configuration_for_example().
- All tests pass when running pytest. On Unix-like systems, check with (from the toplevel source folder):
  $ pytest
  For Windows systems, execute the command from an Anaconda Prompt or add pytest to PATH before executing the command.
- Documentation and high-coverage tests are necessary for enhancements to be accepted. Bug fixes or new features should be provided with non-regression tests. These tests verify the correct behavior of the fix or feature. In this manner, further modifications to the code base are guaranteed to be consistent with the desired behavior. For bug fixes, at the time of the PR, these tests should fail for the code base in develop and pass for the PR code.
- Add your changes to the changelog in the file doc/progress.rst.
- If any source file is being added to the repository, please add the BSD 3-Clause license to it.
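Several of the rules above (type hints, str.format, breaking before a binary operator, and a non-regression test accompanying a bug fix) can be illustrated together. This is only a sketch: mean_runtime and the bug it guards against are hypothetical and do not exist in openml-python.

```python
from typing import List


def mean_runtime(runtimes: List[float], overhead: float = 0.0) -> float:
    """Return the mean runtime plus a fixed overhead (hypothetical helper)."""
    if not runtimes:
        # The imagined buggy version divided by zero here; the fix raises instead.
        raise ValueError("{} requires at least one value".format("mean_runtime"))
    # Multi-line expression: the break comes before the binary operator.
    # (A formatter may join an expression this short; it is split for illustration.)
    return (
        sum(runtimes) / len(runtimes)
        + overhead
    )


def test_mean_runtime_rejects_empty_input() -> None:
    """Non-regression test: fails on the buggy version, passes with the fix."""
    try:
        mean_runtime([])
    except ValueError as error:
        assert "at least one value" in str(error)
    else:
        raise AssertionError("expected ValueError for empty input")


def test_mean_runtime_adds_overhead() -> None:
    """Regular unit test for the intended behavior."""
    assert mean_runtime([1.0, 3.0], overhead=0.5) == 2.5
```

The first test is the kind that should fail against develop and pass against the PR branch; the second documents the expected behavior of the new code.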
First install openml with its test dependencies by running
$ pip install -e .[test]
from the repository folder. Then configure pre-commit through
$ pre-commit install
This will install dependencies to run unit tests, as well as pre-commit. To run the unit tests, and check their code coverage, run:
$ pytest --cov=. path/to/tests_for_package
Make sure your code has good unit test coverage (at least 80%).
Pre-commit is used for various style checking and code formatting. Before each commit, it will automatically run:
- black, a code formatter. This will automatically format your code. Make sure to take a second look after any formatting takes place; if the resulting code is very bloated, consider a (small) refactor. Note: if black reformats your code, the commit will automatically be aborted. Make sure to add the formatted files (back) to your commit after checking them.
- mypy, a static type checker. In particular, make sure each function you work on has type hints.
- flake8, a style guide enforcement tool. Almost all of the black-formatted code should automatically pass this check, but make sure to make adjustments if it does fail.
If you want to run the pre-commit tests without doing a commit, run:
$ pre-commit run --all-files
Make sure to do this at least once before your first commit to check your setup works.
Executing a specific unit test can be done by specifying the module, test case, and test. To obtain a hierarchical list of all tests, run
$ pytest --collect-only
<Module 'tests/test_datasets/test_dataset.py'>
  <UnitTestCase 'OpenMLDatasetTest'>
    <TestCaseFunction 'test_dataset_format_constructor'>
    <TestCaseFunction 'test_get_data'>
    <TestCaseFunction 'test_get_data_rowid_and_ignore_and_target'>
    <TestCaseFunction 'test_get_data_with_ignore_attributes'>
    <TestCaseFunction 'test_get_data_with_rowid'>
    <TestCaseFunction 'test_get_data_with_target'>
  <UnitTestCase 'OpenMLDatasetTestOnTestServer'>
    <TestCaseFunction 'test_tagging'>
You may then run a specific module, test case, or unit test respectively:
$ pytest tests/test_datasets/test_dataset.py
$ pytest tests/test_datasets/test_dataset.py::OpenMLDatasetTest
$ pytest tests/test_datasets/test_dataset.py::OpenMLDatasetTest::test_get_data
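The three commands above differ only in how many parts of the test identifier are given, joined by a double colon. As a small sketch of that structure, the hypothetical helper below (not part of pytest or openml-python) assembles such an identifier from its parts.

```python
from typing import Optional


def build_node_id(
    module: str, test_case: Optional[str] = None, test: Optional[str] = None
) -> str:
    """Join module path, test case, and test name with '::', skipping omitted parts."""
    parts = [part for part in (module, test_case, test) if part is not None]
    return "::".join(parts)
```

Passing only the module path returns it unchanged, which corresponds to running every test in that file.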
NOTE: If the examples build fails during the Continuous Integration tests online, please fix the first failing example. If the first failing example switched the server from live to test or vice versa, and the subsequent examples expect the other server, the subsequent examples will fail to build as well.
Happy testing!
We are glad to accept any sort of documentation: function docstrings, reStructuredText documents, tutorials, etc. reStructuredText documents live in the source code repository under the doc/ directory.
You can edit the documentation using any text editor and then generate the HTML output by typing make html from the doc/ directory. The resulting HTML files will be placed in build/html/ and are viewable in a web browser. See the README file in the doc/ directory for more information.
For building the documentation, you will need to install a few additional dependencies:
$ pip install -e .[docs]
When dependencies are installed, run
$ sphinx-build -b html doc YOUR_PREFERRED_OUTPUT_DIRECTORY