Skip to content

snad-space/coniferest

Repository files navigation

coniferest

PyPI version Documentation Status Test Workflow Build and publish wheels pre-commit.ci status

Package for active anomaly detection with isolation forests, made by SNAD collaboration.

It includes:

  • IsolationForest - reimplementation of scikit-learn's isolation forest with much better scoring performance due to the use of Cython and multi-threading (the latter is not currently available on macOS).
  • AADForest - reimplementation of Active Anomaly detection algorithm with isolation forests from Shubhomoy Das' ad_examples package with better performance, much less code and more flexible dependencies.
  • PineForest - our own active learning model based on the idea of tree filtering.

Install the package with pip install coniferest.

See the documentation for the Tutorial.

asciicast

Installation

The project is using Cython for performance and requires compilation. However, binary wheels are available for Linux, macOS and Windows, so you can install the package with pip install coniferest on these platforms with no build-time dependencies. Currently multithreading is not available in macOS ARM wheels, but you can install the package from the source to enable it, see instructions below.

If your specific platform is not supported, or you need a development version, you can install the package from the source. To do so, clone the repository and run pip install . in the root directory.

Note, that we are using OpenMP for multi-threading, which is not available on macOS with the Apple LLVM Clang compiler. You still can install the package with Apple LLVM, but it will be single-threaded. Alternatively, you can install the package with Clang from Homebrew (brew install llvm libomp) or GCC (brew install gcc), which will enable multi-threading. In this case you will need to set environment variables CC=gcc-12 (or whatever version you have installed) or CC=$(brew --preifx llvm)/bin/clang and CONIFEREST_FORCE_OPENMP_ON_MACOS=1.

Development

You can install the package in editable mode with pip install -e .[dev] to install the development dependencies.

Linters and formatters

This project makes use of pre-commit hooks, you can install them with pre-commit install. Pre-commit CI is used for continuous integration of the hooks, they are applied to every pull request, and CI is responsible for auto-updating the hooks.

Testing and benchmarking

We use tox to build and test the package in isolated environments with different Python versions. To run tests locally, install tox with pip install tox and run tox in the root directory. We configure tox to skip long tests.

The project uses pytest as a testing framework. Tests are located in the tests directory, and can be run with pytest tests in the root directory. By default, all tests are run, but you can select specific tests with -k option, e.g. pytest tests -k test_onnx.test_onnx_aadforest. You can also deselect a specific group of tests with -m option, e.g. pytest tests -m'not long', see pyproject.toml for the list of markers.

We use pytest-benchmark for benchmarking. You can run benchmarks with pytest tests --benchmark-enable -m benchmark in the root directory. Most of the benchmarks have n_jobs fixture set to 1 by default, you can change it with --n_jobs option. You can adjust the minimum number of iterations with --benchmark-min-rounds and maximum execution time per benchmark with --benchmark-max-time (note that the latter can be exceeded if the minimum number of rounds is not reached). See pyproject.toml for the default benchmarking options. You can make a snapshot the current benchmark result with --benchmark-save=NAME or with --benchmark-autosave, and compare benchmarks with pytest-benchmark compare command.