[DOC] Adding development docs to make sure it is clear how to do dev #150

Merged: 3 commits, Oct 19, 2023
108 changes: 104 additions & 4 deletions DEVELOPING.md
@@ -1,14 +1,97 @@
<!-- TOC -->

- [Requirements](#requirements)
- [Setting up your development environment](#setting-up-your-development-environment)
- [Building the project from source](#building-the-project-from-source)
- [Development Tasks](#development-tasks)
- [Basic Verification](#basic-verification)
- [Docsite](#docsite)
- [Details](#details)
- [Coding Style](#coding-style)
- [Lint](#lint)
- [Type checking](#type-checking)
- [Unit tests](#unit-tests)
- [Advanced Updating submodules](#advanced-updating-submodules)
- [Cython and C++](#cython-and-c)
- [Making a Release](#making-a-release)

<!-- /TOC -->

# Requirements
* Python 3.8+
* Poetry (`curl -sSL https://install.python-poetry.org | python - --version=1.2.2`)

For the other requirements, inspect the ``pyproject.toml`` file. If you have updated the dependencies, please run `poetry update` to update the
* Python 3.9+
* numpy
* scipy
* scikit-learn>=1.3.1

For the other requirements, inspect the ``pyproject.toml`` file.
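
To check an existing environment against the minimums listed above, a small script along these lines may help. It is a sketch, not part of the repository: the `MINIMUMS` table and the helper names are illustrative, only the scikit-learn bound (1.3.1) and the Python bound (3.9) come from this document, and the simple numeric version parser ignores pre-release suffixes.

```python
import sys
from importlib import metadata

# Minimum versions taken from the list above. numpy and scipy have no
# stated lower bound here, so only their presence is checked.
MINIMUMS = {"numpy": None, "scipy": None, "scikit-learn": "1.3.1"}
MIN_PYTHON = (3, 9)


def parse_version(text):
    """Turn '1.3.1' into (1, 3, 1), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if no minimum is given or `installed` is at least `minimum`."""
    return minimum is None or parse_version(installed) >= parse_version(minimum)


def check_environment():
    """Return a list of human-readable problems; empty means all good."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required")
    for name, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

The script prints one line per problem and nothing when every requirement is met.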

# Setting up your development environment

We recommend using miniconda, as Python virtual environments may not properly set up the compilers necessary for our compiled code.

<!-- Setup a conda env -->

conda create -n sktree python=3.9
conda activate sktree

**ALWAYS run commands only after you have activated your conda environment.**
Next, install necessary build dependencies. For more information, see https://scikit-learn.org/stable/developers/advanced_installation.html.

conda install -c conda-forge joblib threadpoolctl pytest compilers llvm-openmp

Assuming these steps have worked properly and you have read and followed any necessary scikit-learn advanced installation instructions, you can then install dependencies for scikit-tree.

If you are developing locally, you will need the build dependencies to compile the Cython / C++ code:

pip install -r build_requirements.txt

The other requirements can be installed as follows:

pip install -r requirements.txt
pip install -r style_requirements.txt
pip install -r test_requirements.txt
pip install -r doc_requirements.txt
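
The install steps above can also be scripted so that they always use the pip of the currently active interpreter. This is a hedged sketch rather than a project tool: it assumes the five requirement files sit in the current directory, and it relies on `CONDA_DEFAULT_ENV`, the variable conda sets on activation, to confirm that the `sktree` environment is active. The function names are illustrative.

```python
import os
import subprocess
import sys

# Build dependencies go first because the Cython / C++ code needs them.
REQUIREMENT_FILES = [
    "build_requirements.txt",
    "requirements.txt",
    "style_requirements.txt",
    "test_requirements.txt",
    "doc_requirements.txt",
]


def conda_env_is_active(expected="sktree", environ=None):
    """True if conda reports `expected` as the active environment."""
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV") == expected


def pip_install_commands(files=REQUIREMENT_FILES):
    """One `python -m pip install -r <file>` command per requirement file."""
    return [[sys.executable, "-m", "pip", "install", "-r", name] for name in files]


def install_all(dry_run=True):
    """Print each install command and, unless dry_run is set, execute it."""
    if not conda_env_is_active():
        raise RuntimeError("activate the sktree conda environment first")
    for command in pip_install_commands():
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)
```

Calling `install_all()` only prints the commands; `install_all(dry_run=False)` runs them and stops at the first failure.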

# Building the project from source

We use meson to build scikit-tree from source, together with [spin](https://github.com/scientific-python/spin), a CLI tool that wraps certain meson commands to make building easier.

For example, the following command will build the project

spin build

The following command will test the project

spin test

For other commands, see

spin --help

Note that at this stage, you will be unable to run Python commands directly. For example, ``pytest ./sktree`` will not work.

However, after installing and building the project from source using meson, you can leverage editable installs to make testing code changes much faster.

pip install --no-build-isolation --editable .

This will now link the meson build to your Python runtime. Now if you run

pytest ./sktree

the unit-tests should run.
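
One way to confirm that the editable install is visible to your interpreter is to look up the package's import spec. A minimal sketch, assuming the import name is `sktree` (as in the `pytest ./sktree` command above); `describe_install` is an illustrative helper, not part of the project.

```python
import importlib.util


def describe_install(package="sktree"):
    """Report whether `package` can be imported, without importing it."""
    spec = importlib.util.find_spec(package)
    if spec is None:
        return f"{package} is not importable; build and install it first"
    # For editable installs the origin may point at a loader shim rather
    # than the source tree, so treat it as informational only.
    return f"{package} found at {spec.origin}"


if __name__ == "__main__":
    print(describe_install())
```

Running it from outside the repository root may give a more reliable answer, since a local `sktree/` source directory could otherwise be picked up from the current directory.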

# Development Tasks
There are a series of top-level tasks available through Poetry. These can each be run via
There are a series of top-level tasks available through Poetry. If you have updated the dependencies, please run `poetry update` to update the lock file. These can each be run via

`poetry run poe <taskname>`

To do so, first install poetry and poethepoet.

pip install poetry poethepoet

You are now ready to run quick commands to format, lint and type-check the codebase.

### Basic Verification
* **format** - runs the suite of formatting tools, applying them to make the code compliant
* **format_check** - runs the suite of formatting tools checking for compliance
@@ -53,6 +136,23 @@ In order for any code to be added to the repository, we require unit tests to pa

poetry run poe unit_test

# (Advanced) Updating submodules

Scikit-tree relies on a submodule containing a forked version of scikit-learn for certain Python and Cython code that extends the ``DecisionTree*`` models. Usually, a developer who needs to change this code should go to the ``submodulev3`` branch on ``https://github.com/neurodata/scikit-learn`` and submit a PR there.

This should **ALWAYS** be supported by some use-case in scikit-tree. We want to keep the code changes in our forked version of scikit-learn minimal, so that it stays easy to merge in upstream changes, bug fixes and features for tree-based code.

Once the PR is submitted and merged, the developer can update the submodule here in scikit-tree so that we use the new commit. You **must** update the submodule commit ID and commit this change, so that the build uses the new submodule commit ID.

git submodule update --init --recursive --remote
git add -A
git commit -m "Update submodule" -s

Now, you can re-build the project using the latest submodule changes.

spin build --clean
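
To confirm that the submodule pointer was updated as described above, the output of `git submodule status` can be inspected. The sketch below parses that output, whose leading character signals the state of each submodule. The helper names are illustrative, and paths containing spaces are not handled.

```python
import subprocess

# Meaning of the one-character prefix printed by `git submodule status`.
PREFIXES = {
    " ": "in sync with the recorded commit",
    "+": "checked-out commit differs from the recorded one",
    "-": "not initialized",
    "U": "has merge conflicts",
}


def parse_submodule_status(output):
    """Parse `git submodule status` output into (path, sha, state) tuples."""
    entries = []
    for line in output.splitlines():
        if not line.strip():
            continue
        # The first character is the state prefix; the rest is
        # "<sha> <path> [(<describe>)]".
        prefix, rest = line[0], line[1:]
        fields = rest.split()
        sha, path = fields[0], fields[1]
        entries.append((path, sha, PREFIXES.get(prefix, "unknown")))
    return entries


def current_status():
    """Run git in the current repository and parse the result."""
    result = subprocess.run(
        ["git", "submodule", "status"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_submodule_status(result.stdout)
```

A `+` state after `git submodule update --remote` means the new commit has not yet been staged in scikit-tree, so the `git add` and `git commit` steps are still pending.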

# Cython and C++
The general design of scikit-tree follows that of the tree models inside scikit-learn: the tree-based models themselves are implemented in Cython or C++, and the actual forest (e.g. RandomForest or ExtraForest) is just a Python API wrapper that creates an ensemble of those trees.
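
The split between compiled trees and a Python ensemble wrapper can be illustrated with a toy, pure-Python sketch. Nothing here is scikit-tree code: `StumpTree` stands in for a Cython/C++ tree, `ToyForest` for the Python-level forest, and both names and the bootstrap-plus-majority-vote scheme are assumptions chosen for illustration.

```python
import random
from collections import Counter


def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


class StumpTree:
    """Stand-in for a compiled tree: a single-split decision stump."""

    def fit(self, X, y):
        best_errors = None
        for feature in range(len(X[0])):
            for threshold in sorted({row[feature] for row in X}):
                left = [t for row, t in zip(X, y) if row[feature] <= threshold]
                right = [t for row, t in zip(X, y) if row[feature] > threshold]
                # Samples that disagree with the majority label on their side.
                errors = sum(
                    len(side) - Counter(side)[majority(side)]
                    for side in (left, right)
                    if side
                )
                if best_errors is None or errors < best_errors:
                    best_errors = errors
                    self.feature_, self.threshold_ = feature, threshold
                    self.left_ = majority(left)
                    # An empty right side falls back to the overall majority.
                    self.right_ = majority(right or y)
        return self

    def predict(self, X):
        return [
            self.left_ if row[self.feature_] <= self.threshold_ else self.right_
            for row in X
        ]


class ToyForest:
    """Pure-Python wrapper that builds an ensemble of trees and votes."""

    def __init__(self, n_estimators=10, seed=0):
        self.n_estimators = n_estimators
        self.seed = seed

    def fit(self, X, y):
        rng = random.Random(self.seed)
        self.estimators_ = []
        for _ in range(self.n_estimators):
            # Bootstrap: sample rows with replacement for each tree.
            idx = [rng.randrange(len(X)) for _ in range(len(X))]
            tree = StumpTree().fit([X[i] for i in idx], [y[i] for i in idx])
            self.estimators_.append(tree)
        return self

    def predict(self, X):
        # Transpose per-tree predictions into per-sample vote lists.
        votes = zip(*(tree.predict(X) for tree in self.estimators_))
        return [majority(list(sample_votes)) for sample_votes in votes]
```

The wrapper never looks inside a tree; it only calls `fit` and `predict`, which is what lets the per-tree work live in compiled code while the ensemble logic stays in Python.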
