Releases: intel/scikit-learn-intelex
Intel(R) Extension for Scikit-learn 2021.2.3
🚨 New Features
- Added support for patching scikit-learn version 1.0. scikit-learn version 0.21.* is no longer supported.
Intel(R) Extension for Scikit-learn 2021.2
⚡️ New package - Intel(R) Extension for Scikit-learn*
- Intel(R) Extension for Scikit-learn* contains scikit-learn patching functionality originally available in daal4py package. All future updates for the patching will be available in Intel(R) Extension for Scikit-learn only. Please use the package instead of daal4py.
⚠️ Deprecations
- Scikit-learn patching functionality in daal4py was deprecated and moved to a separate package - Intel(R) Extension for Scikit-learn*. All future updates for the patching will be available in Intel(R) Extension for Scikit-learn only. Please use the package instead of daal4py for the Scikit-learn acceleration.
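As a minimal sketch of the migration described above, the snippet below prefers the new package and falls back to the deprecated daal4py location only when the new one is not installed. The helper name `enable_sklearn_acceleration` is hypothetical; `patch_sklearn` is imported from `sklearnex` (the module installed by scikit-learn-intelex) or from `daal4py.sklearn`.

```python
def enable_sklearn_acceleration():
    """Patch scikit-learn, preferring the new package over daal4py.

    Returns the name of the package that applied the patch,
    or None when neither package is installed.
    """
    try:
        # New home of the patching functionality.
        from sklearnex import patch_sklearn
        source = "sklearnex"
    except ImportError:
        try:
            # Deprecated location, kept here only as a fallback.
            from daal4py.sklearn import patch_sklearn
            source = "daal4py"
        except ImportError:
            return None
    patch_sklearn()
    return source


# Call this before importing any scikit-learn estimators,
# so that the patched versions are the ones that get imported.
backend = enable_sklearn_acceleration()
print("patched by:", backend)
```

If `backend` is `None`, stock scikit-learn is used unchanged, so the rest of a script does not need to know whether acceleration is available.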
📚 Support Materials
- Medium blogs:
- Kaggle kernels:
🛠️ Library Engineering
- Enabled new PyPI distribution channel for Intel(R) Extension for Scikit-learn and daal4py:
- Python versions 3.6, 3.7 and 3.8 are supported on Linux, Windows and macOS.
- Support of both CPU and GPU is included in the package.
- You can download daal4py using the following command:
pip install daal4py
- You can download Intel(R) Extension for Scikit-learn using the following command:
pip install scikit-learn-intelex
🚨 New Features
- Patches for four latest scikit-learn releases: 0.21.X, 0.22.X, 0.23.X and 0.24.X
- [CPU] Acceleration of roc_auc_score function
- [CPU] Bit-to-bit results reproducibility for: LinearRegression, Ridge, SVC, KMeans, PCA, Lasso, ElasticNet, tSNE, KNeighborsClassifier, KNeighborsRegressor, NearestNeighbors, RandomForestClassifier, RandomForestRegressor
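The patched release series listed above (0.21.X through 0.24.X) can be checked programmatically before relying on acceleration. The sketch below is illustrative only: `SUPPORTED_SERIES` and `is_patchable` are hypothetical names, not part of the package, and the set reflects this 2021.2 release only.

```python
# Release series named in the 2021.2 notes, as (major, minor) pairs.
SUPPORTED_SERIES = {(0, 21), (0, 22), (0, 23), (0, 24)}


def is_patchable(version: str) -> bool:
    """Return True if a scikit-learn version string falls in a patched series."""
    parts = version.split(".")
    try:
        major, minor = int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        # Not a recognizable "major.minor[.patch]" string.
        return False
    return (major, minor) in SUPPORTED_SERIES


print(is_patchable("0.23.2"))
print(is_patchable("0.20.4"))
```

In practice the version string would come from `sklearn.__version__`.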
🚀 Improved performance
- [CPU] RandomForestClassifier and RandomForestRegressor scikit-learn estimators: training and prediction
- [CPU] Principal Component Analysis (PCA) scikit-learn estimator: training
- [CPU] Support Vector Classification (SVC) scikit-learn estimators: training and prediction
- [CPU] Support Vector Classification (SVC) scikit-learn estimator with the probability=True parameter: training and prediction
🐛 Bug Fixes
- [CPU] Improved accuracy of RandomForestClassifier and RandomForestRegressor scikit-learn estimators
- [CPU] Fixed patching issues with pairwise_distances
- [CPU] Fixed the behavior of the patch_sklearn and unpatch_sklearn functions
- [CPU] Fixed unexpected behavior that made accelerated functionality unavailable through scikit-learn patching if the input was not of float32 or float64 data types. Scikit-learn patching now works with all numpy data types.
- [CPU] Fixed a memory leak that appeared when DataFrame from pandas was used as an input type
- [CPU] Fixed performance issue for interoperability with Modin
Intel® daal4py 2020 Update 3 Patch 1
What's New
- Added support of patching scikit-learn version 0.24.
Intel® daal4py 2021.1
What's New
Introduced new daal4py functionality:
- GPU:
- Batch algorithms: K-means, Covariance, PCA, Logistic Regression, Linear Regression, Random Forest Classification and Regression, Gradient Boosting Classification and Regression, kNN, SVM, DBSCAN and Low-order moments
- Online algorithms: Covariance, PCA, Linear Regression and Low-order moments
Improved daal4py performance for the following algorithms:
- CPU:
- Logistic Regression training and prediction
- k-Nearest Neighbors prediction with Brute Force method
- Logistic Loss and Cross Entropy objective functions
Introduced new functionality for scikit-learn patching through daal4py:
- CPU:
- Acceleration of NearestNeighbors and KNeighborsRegressor scikit-learn estimators with Brute Force and K-D tree methods
- Acceleration of TSNE scikit-learn estimator
- GPU:
- Intel GPU support in scikit-learn for DBSCAN, K-means, Linear and Logistic Regression
Improved performance of the following scikit-learn estimators via scikit-learn patching:
- CPU:
- LogisticRegression fit, predict and predict_proba methods
- KNeighborsClassifier predict, predict_proba and kneighbors methods with “brute” method
Known Issues
- train_test_split in daal4py patches for Scikit-learn can produce incorrect shuffling on Windows*
Installation
To install this package with conda run the following:
conda install -c intel daal4py
Intel® daal4py 2020 Update 3
What's New in Intel® daal4py 2020 Update 3:
Introduced new daal4py functionality:
- Conversion of trained XGBoost* and LightGBM* models into a daal4py Gradient Boosted Trees model for fast prediction
- Support of Modin* DataFrame as an input
- Brute Force method for k-Nearest Neighbors classification algorithm, which for datasets with more than 13 features demonstrates a better performance than the existing K-D tree method
- k-Nearest Neighbors search for K-D tree and Brute Force methods with computation of distances to nearest neighbors and their indices
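The XGBoost conversion listed above might be used roughly as follows. This is a sketch under assumptions rather than the documented workflow: the wrapper `predict_with_daal4py` is a hypothetical helper, and it assumes the converter is exposed as `daal4py.get_gbt_model_from_xgboost` and that prediction goes through `daal4py.gbt_classification_prediction`, as in the daal4py samples of this period. Names and defaults may differ between versions.

```python
def predict_with_daal4py(booster, X, n_classes):
    """Convert a trained xgboost.Booster and predict class labels with daal4py.

    booster   -- a trained xgboost.Booster (for example, the result of xgb.train)
    X         -- 2-D float array of samples to classify
    n_classes -- number of classes the booster was trained on
    """
    # Imported lazily so this helper can be defined without daal4py installed.
    import daal4py as d4p

    # Assumption: converter name as used in daal4py samples for this release.
    daal_model = d4p.get_gbt_model_from_xgboost(booster)

    # Build the prediction algorithm and run it on the converted model.
    predictor = d4p.gbt_classification_prediction(nClasses=n_classes)
    return predictor.compute(X, daal_model).prediction
```

A typical call would be `predict_with_daal4py(booster, X_test, n_classes=2)` for a binary classifier. The training step stays in XGBoost, and only inference is moved to daal4py.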
Extended existing daal4py functionality:
- Voting methods for prediction in k-Nearest Neighbors classification and search: based on inverse-distance and uniform weighting
- New parameters in Decision Forest classification and regression: minObservationsInSplitNode, minWeightFractionInLeafNode, minImpurityDecreaseInSplitNode, maxLeafNodes with best-first strategy and sample weights
- Support of Support Vector Machine (SVM) decision function for Multi-class Classifier
Improved daal4py performance for the following algorithms:
- SVM training and prediction
- Decision Forest classification training
- RBF and Linear kernel functions
Introduced new functionality for scikit-learn patching through daal4py:
- Acceleration of KNeighborsClassifier scikit-learn estimator with Brute Force and K-D tree methods
- Acceleration of RandomForestClassifier and RandomForestRegressor scikit-learn estimators
- Sparse input support for KMeans and Support Vector Classification (SVC) scikit-learn estimators
- Prediction of probabilities for SVC scikit-learn estimator
- Support of ‘normalize’ parameter for Lasso and ElasticNet scikit-learn estimators
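Both accelerated KNeighborsClassifier methods are selected through scikit-learn's `algorithm` parameter (`"brute"` or `"kd_tree"`). Since this release notes that Brute Force performs better than K-D tree on datasets with more than 13 features, a simple heuristic could pick the method from the feature count. The helper `choose_knn_algorithm` below is hypothetical and the threshold is only the rule of thumb from these notes, not a guarantee for every dataset.

```python
def choose_knn_algorithm(n_features: int, threshold: int = 13) -> str:
    """Pick a value for KNeighborsClassifier's `algorithm` parameter.

    Uses Brute Force above the feature-count threshold mentioned in the
    release notes and K-D tree otherwise.
    """
    if n_features < 1:
        raise ValueError("n_features must be a positive integer")
    return "brute" if n_features > threshold else "kd_tree"


print(choose_knn_algorithm(8))
print(choose_knn_algorithm(64))
```

The result can be passed straight to the estimator, for example `KNeighborsClassifier(algorithm=choose_knn_algorithm(X.shape[1]))`.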
Improved performance of the following functionality for scikit-learn patching through daal4py:
- train_test_split()
- Support Vector Classification (SVC) fit and prediction
To install this package with conda run the following:
conda install -c intel daal4py
daal4py 2020.2
Introduced new functionality:
- Thunder method for Support Vector Machine (SVM) training algorithm, which demonstrates better training time than the existing sequential minimal optimization method
Extended existing functionality:
- Training with the number of features greater than the number of observations for Linear Regression, Ridge Regression, and Principal Component Analysis
- New sample_weights parameter for SVM algorithm
- New parameter in K-Means algorithm, resultsToEvaluate, which controls computation of centroids, assignments, and exact objective function
Improved performance for the following:
- Support Vector Machine training and prediction, Elastic Net and LASSO training, Principal Component Analysis training and transform, K-D tree based k-Nearest Neighbors prediction
- K-Means algorithm in batch computation mode
- RBF kernel function
Deprecated 32-bit support:
- 2020 product line will be the last one to support 32-bit
Introduced improvements to daal4py library:
- Performance optimizations for pandas input format
- Scikit-learn compatible API for AdaBoost classifier, Decision Tree classifier, and Gradient Boosted Trees classifier and regressor
Improved performance of the following Intel Scikit-learn algorithms and functions:
- fit and prediction in K-Means and Support Vector Classification (SVC), fit in Elastic Net and LASSO, fit and transform in PCA
- Support Vector Classification (SVC) with non-default weights of samples and classes
- train_test_split() and assert_all_finite()
To install this package with conda run the following:
conda install -c intel daal4py
daal4py 2020.1
Introduced new functionality:
- Elastic Net algorithm with L1 and L2 regularization in batch computation mode. The algorithm supports various optimization solvers that handle non-smooth functions.
- Probabilistic classification for Decision Forest Classification algorithm with a choice of voting method to calculate probabilities.
Extended existing functionality:
- Performance optimizations for distributed Spark samples, K-means algorithm for some input dimensions, Gradient Boosted Trees training stage for large datasets on multi-core platforms and Decision Forest prediction stage for datasets with a small number of observations on processors that support Intel® Advanced Vector Extensions 2 (Intel® AVX2) and Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
- Performance optimizations across algorithms that use SOA (Structure Of Arrays) NumericTable as an input on processors that support Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
daal4py 2020.0
Added support for BrownBoost, LogitBoost as well as Stump regression and Stump classification algorithms to daal4py.
Added support for AdaBoost classification algorithm, including support for method="SAMME" or "SAMMER" for multi-class data.
"Variable Importance" feature has been added in Gradient Boosting Trees.
Ability to compute class prediction probabilities has been added to appropriate classifiers, including logistic regression, tree-based classifiers, etc.
2019.5
Single node support for DBSCAN, LASSO, Coordinate Descent (CD) solver algorithms
Distributed model support for SVD, QR, K-means init++ and parallel++ algorithms
daal4py 2019.3
Product release with Intel(R) Parallel Studio 2019 Update 3