Isolation Forest is an unsupervised machine learning algorithm that builds an ensemble of Isolation Trees (iTrees). Each iTree is a binary tree that isolates data points through random splits, which makes the ensemble well suited to detecting anomalies in a dataset. Anomalies tend to be isolated closer to the root of an iTree: the fewer splits required to isolate a point, the more likely it is to be an anomaly.
- Randomly select a feature, then a random split value between that feature's minimum and maximum, and partition the data on it.
- Recursively partition the data into subspaces until each point is isolated or a maximum depth is reached.
- Measure the average path length for each data point across all trees in the forest.
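- Anomalies are expected to have a shorter average path length because they are easier to isolate.
- The anomaly score of a point is inversely related to its average path length: shorter paths give higher scores.
- Set a threshold on the anomaly score to identify anomalies.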
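- Efficient on high-dimensional data and able to handle large datasets.
- Scalable, since each tree is built from a small random subsample of the data.
- Sensitive to parameters such as the number of trees, the subsample size, and the score threshold.
- May struggle with certain data patterns, for example anomalies that form a dense cluster of their own.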
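
The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names (`build_tree`, `build_forest`, `anomaly_score`) and the defaults (100 trees, subsample size 256) are assumptions chosen for the example. The score uses the normalization commonly given for Isolation Forest, `2 ** (-mean_path / c(n))`, where `c(n)` approximates the average path length of an unsuccessful binary search tree lookup over `n` points.

```python
import math
import random


def c(n):
    """Approximate average path length for n points, used to normalize depths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # H(n-1) via Euler's constant
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively partition points with random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}  # leaf: isolated point or depth limit
    feature = rng.randrange(len(points[0]))
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    if lo == hi:
        return {"size": len(points)}  # cannot split identical values
    split = rng.uniform(lo, hi)
    return {
        "feature": feature,
        "split": split,
        "left": build_tree([p for p in points if p[feature] < split],
                           depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[feature] >= split],
                            depth + 1, max_depth, rng),
    }


def path_length(point, node, depth=0):
    """Depth at which the point lands, plus an estimate for unsplit leaves."""
    if "size" in node:
        return depth + c(node["size"])
    branch = "left" if point[node["feature"]] < node["split"] else "right"
    return path_length(point, node[branch], depth + 1)


def build_forest(points, n_trees=100, sample_size=256, seed=0):
    """Build n_trees trees, each on a random subsample (assumes >= 2 points)."""
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(max(psi, 2)))
    trees = [build_tree(rng.sample(points, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, psi


def anomaly_score(point, trees, psi):
    """Score in (0, 1]; shorter average paths give scores closer to 1."""
    mean_path = sum(path_length(point, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / c(psi))
```

To use it, call `build_forest` on a list of equal-length numeric tuples, then compare `anomaly_score` for each point against a chosen threshold. Points that sit far from the bulk of the data should score noticeably higher than points near the center. In practice a library implementation such as scikit-learn's `sklearn.ensemble.IsolationForest` is the more usual choice.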