This repository contains a Python version of PLASTIC built on CapyMOA, together with the scripts for running the experiments from the paper and plotting the results. The Java code of PLASTIC can be found under github.com/heymarco/PLASTIC
Access the paper via link.springer.com/chapter/10.1007/978-3-031-70362-1_3
Commonly used incremental decision trees for mining data streams include Hoeffding Trees (HT) and Extremely Fast Decision Trees (EFDT). EFDT exhibits faster learning than HT. However, due to its split revision procedure, EFDT suffers from sudden and unpredictable accuracy decreases caused by subtree pruning. To overcome this, we propose PLASTIC, an incremental decision tree that restructures the otherwise pruned subtree. This is possible due to decision tree plasticity: one can alter a tree's structure without affecting its predictions. We conduct extensive evaluations comparing PLASTIC with state-of-the-art methods on synthetic and real-world data streams. Our results show that PLASTIC improves EFDT's worst-case accuracy by up to 50 % and outperforms the current state of the art on real-world data. We provide an open-source implementation of PLASTIC within the MOA framework for mining high-speed data streams.
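To make the idea of tree plasticity concrete, the following toy example (illustrative only, not taken from the paper or the PLASTIC implementation) shows two trees with different structure that encode exactly the same decision function. This is the degree of freedom PLASTIC exploits when it restructures a subtree that EFDT would otherwise prune.

```python
# Toy illustration of decision-tree plasticity: the same decision function can
# be represented by trees that test the features in a different order, so
# restructuring a tree need not change any prediction.
from itertools import product

# Target label for each combination of two binary features.
LABELS = {(0, 0): "A", (0, 1): "B", (1, 0): "B", (1, 1): "A"}

def tree_split_x0_first(x0: int, x1: int) -> str:
    """Tree that tests feature x0 at the root, then x1 in each branch."""
    if x0 == 0:
        return "A" if x1 == 0 else "B"
    return "B" if x1 == 0 else "A"

def tree_split_x1_first(x0: int, x1: int) -> str:
    """Restructured tree: tests x1 at the root, then x0 in each branch."""
    if x1 == 0:
        return "A" if x0 == 0 else "B"
    return "B" if x0 == 0 else "A"

# Both structures realize the same decision function.
for x0, x1 in product([0, 1], repeat=2):
    assert tree_split_x0_first(x0, x1) == tree_split_x1_first(x0, x1) == LABELS[(x0, x1)]
print("Both tree structures yield identical predictions.")
```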
To run our experiments:
- build the moa jar in the PLASTIC repository mentioned above,
- include the moa jar in this project under `src/capymoa/jar/`,
- run the scripts under `plastic/`.
If you want to cite this paper, use:

```bibtex
@inproceedings{heyden2024leveraging,
  title={Leveraging Plasticity in Incremental Decision Trees},
  author={Heyden, Marco and Gomes, Heitor Murilo and Fouch{\'e}, Edouard and Pfahringer, Bernhard and B{\"o}hm, Klemens},
  booktitle={Joint European Conference on Machine Learning and Knowledge Discovery in Databases},
  pages={38--54},
  year={2024},
  organization={Springer}
}
```
CapyMOA is a Python wrapper for MOA that allows efficient use of existing MOA algorithms through a more modern API.
- (Optional) It is recommended to use a conda environment, since it isolates the project dependencies and avoids conflicts.
  Follow the instructions at this link to install miniconda.
  Set up the conda environment by running one of the following commands:

  ```bash
  conda env create -f environment.yml      # For linux
  conda env create -f environment_wds.yml  # For windows
  ```

  Ensure the environment is activated:

  ```bash
  conda activate CapyMOA
  ```

- Use pip to install the project in editable mode with the development dependencies:

  ```bash
  pip install --editable '.[dev]'
  ```

- Make sure the environment variable `JAVA_HOME` is set.
- Run the tests to make sure everything is working:

  ```bash
  python -m pytest
  ```

- (Optional) Try the example notebooks:
  - Download extra datasets with `make download`.
  - Run the DEMO notebook with `python -m jupyter notebook notebooks/DEMO.ipynb`. The notebook must be started with the correct `JAVA_HOME` variable set; to double-check, run `echo $JAVA_HOME` in the terminal before starting the notebook (a small Python check is sketched after this list).
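As a convenience, the following plain-Python snippet (not part of CapyMOA, just a suggested sanity check) verifies that `JAVA_HOME` is visible to the Python process before launching the notebooks:

```python
# Suggested sanity check: confirm JAVA_HOME is set before starting Jupyter,
# since the MOA bridge needs a JVM to start.
import os

java_home = os.environ.get("JAVA_HOME")
if java_home:
    print(f"JAVA_HOME is set to: {java_home}")
else:
    print("JAVA_HOME is not set; starting the JVM for MOA will fail.")
```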
- Full support for classification, regression and semi-supervised classification.
- Read CSV or ARFF files, or use synthetic generators from MOA.
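As a rough illustration of that functionality, the sketch below shows the typical pattern of reading a data file as a stream and evaluating a classifier on it. The names used here (`stream_from_file`, `HoeffdingTree`, `prequential_evaluation`, the dataset path) are assumptions based on typical CapyMOA usage and may differ in this version of the wrapper; the DEMO notebook shows the actual API.

```python
# A hedged sketch of the usual workflow: stream an ARFF file and evaluate a
# classifier prequentially. Module and function names are assumptions; check
# notebooks/DEMO.ipynb for the exact API of this version.
from capymoa.stream import stream_from_file             # assumed helper
from capymoa.classifier import HoeffdingTree            # assumed import path
from capymoa.evaluation import prequential_evaluation   # assumed helper

stream = stream_from_file("data/electricity.arff")      # hypothetical dataset path
learner = HoeffdingTree(schema=stream.get_schema())     # learner bound to the stream schema

results = prequential_evaluation(stream=stream, learner=learner, max_instances=10_000)
print(results)  # inspect the returned evaluation results
```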
The following notebooks show how to use the library. Data is available in the `/data/` directory (some of it needs to be downloaded first; see the instructions there).
- DEMO.ipynb: Contains simple examples of how to execute classification and regression, using MOA objects to configure synthetic generators or classifiers/regressors.
- Evaluation_and_Data_Reading.ipynb: Many examples showing how to perform different evaluations for classification and regression using different methods (e.g. a loop or built-in functions).
- Learners_API_Examples.ipynb: Similar to the DEMO, but shows more capabilities of the evaluator and learner objects.
- Using_sklearn_pytorch.ipynb: Shows how one can use the API to run sklearn algorithms (those that implement `partial_fit`) and PyTorch models; a generic `partial_fit` loop is sketched after this list.
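To illustrate why `partial_fit` is the relevant interface, here is a generic, library-agnostic sketch (plain scikit-learn and NumPy, not CapyMOA code) of a test-then-train loop over a simulated stream:

```python
# Generic test-then-train loop with a scikit-learn estimator that supports
# partial_fit. The data is synthetic and only serves to illustrate the pattern.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(42)
classes = np.array([0, 1])
learner = SGDClassifier(random_state=0)

correct, seen = 0, 0
for _ in range(100):                        # 100 mini-batches simulate a stream
    X = rng.normal(size=(10, 5))            # 10 instances, 5 features per batch
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    if seen > 0:                            # test first (skip before the first fit) ...
        correct += int((learner.predict(X) == y).sum())
    learner.partial_fit(X, y, classes=classes)  # ... then train incrementally
    seen += len(y)

print(f"prequential accuracy: {correct / (seen - 10):.3f}")
```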
The following notebooks show how some parts of the library were developed and compare different implementation options.
- Efficient_Evaluation.ipynb: Some simple benchmarks comparing different versions of test_then_train_evaluation and prequential_evaluation. Interesting to developers looking to improve that aspect of the platform.
- Using_jpype_MOA_example.ipynb: Example of using MOA directly through jpype, without the wrapper library in between (a minimal sketch follows this list). Interesting to developers looking for a full example of how this works without the library.
- Data_Reading.ipynb: Data reading examples. More interesting to developers looking to improve the data capabilities.
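For reference, a minimal sketch of what that direct-jpype route looks like. The jar filename and path below are placeholders; the MOA class and method names are the standard ones from the MOA API.

```python
# Minimal example of driving MOA through jpype without the wrapper layer.
# Adjust the classpath to wherever the built MOA jar was placed.
import jpype
import jpype.imports

jpype.startJVM(classpath=["src/capymoa/jar/moa.jar"])  # placeholder jar name

from moa.classifiers.trees import HoeffdingTree  # Java import enabled by jpype.imports

learner = HoeffdingTree()
learner.prepareForUse()             # standard MOA initialisation
print(learner.getPurposeString())   # prints a short description of the learner
```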
Updated all the notebooks on 16/01/2024 and removed some that were outdated.