📋 About • 📦 Installation • 🚀 Usage • 🧬 Genetic Operators • 🫂 Community Guidelines • 📜 License
GATree is a Python library designed for implementing evolutionary decision trees using a standard genetic algorithm approach. The library provides functionalities for selection, mutation, and crossover operations within the decision tree structure, allowing users to evolve and optimise decision trees for various classification and clustering tasks. 🌲🧬
The library's core objective is to empower users in creating and fine-tuning decision trees through an evolutionary process, opening avenues for innovative approaches to classification and clustering problems. GATree enables the dynamic growth and adaptation of decision trees, offering a flexible and powerful tool for machine learning enthusiasts and practitioners. 🚀🌿
GATree is currently limited to classification and clustering tasks, with support for regression tasks planned for future releases. 💡
- Free software: MIT license
- Documentation: http://gatree.readthedocs.io
- Python: 3.9, 3.10, 3.11, 3.12
- Dependencies: listed in CONTRIBUTING.md
- Operating systems: Windows, Ubuntu, macOS
To install GATree
using pip, run the following command:
pip install gatree
The following example demonstrates how to perform classification of the iris dataset using GATree
. More examples can be found in the examples directory.
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from gatree.methods.gatreeclassifier import GATreeClassifier
# Load the iris dataset
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target, name='target')
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=¸10)
# Create and fit the GATree classifier
gatree = GATreeClassifier(n_jobs=16, random_state=32)
gatree.fit(X=X_train, y=y_train, population_size=100, max_iter=100)
# Make predictions on the testing set
y_pred = gatree.predict(X_test)
# Evaluate the accuracy of the classifier
print(accuracy_score(y_test, y_pred))
The genetic algorithm for decision trees in GATree
involves several key operators: selection, elitism, crossover, and mutation. Each of these operators plays a crucial role in the evolution and optimisation of the decision trees. Below is a detailed description of each operator within the context of the GATree
class.
Selection is the process of choosing parent trees from the current population to produce offspring for the next generation. By default, GATree
class uses tournament selection, a method where a subset of the population is randomly chosen, and the best individual from this subset is selected.
Elitism ensures that the best-performing individuals (trees) from the current generation are carried over to the next generation without any modification. This guarantees that the quality of the population does not decrease from one generation to the next.
Crossover is a genetic operator used to combine the genetic information of two parent trees to generate new offspring. This enables exploration, which helps in creating diversity in the population and combining good traits from both parents.
Mutation introduces random changes to a tree to maintain genetic diversity and explore new solutions. This helps in avoiding local optima by introducing new genetic structures.
To contribure to the software, please read the contributing guidelines.
If you encounter any issues with the library, please report them using the issue tracker. Include a detailed description of the problem, including the steps to reproduce the problem, the stack trace, and details about your operating system and software version.
If you need support, please first refer to the documentation. If you still require assistance, please open an issue on the issue tracker with the question
tag. For private inquiries, you can contact us via e-mail at tadej.lahovnik1@um.si or saso.karakatic@um.si.
This package is distributed under the MIT License. This license can be found online at http://www.opensource.org/licenses/MIT.
This framework is provided as-is, and there are no guarantees that it fits your purposes or that it is bug-free. Use it at your own risk!