Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

0.5.0 - 2019-11-25

Added

  • Grid search and randomized search with cross-validation
  • K-fold splitter
  • Support for Jupyter notebooks from the dislib Docker image
  • Automatic installation of dislib executable when running pip install dislib
  • Support for sparse data in PCA
  • A new notebook with more usage examples
  • jupyter command to dislib executable
  • Pointer to sklearn license in LICENSE file
  • NOTICE file

Changed

  • Estimators now extend sklearn BaseEstimator
  • Extended tutorial notebook with other examples
  • Added acknowledgements to README

Removed

  • Pandas dependency in test_als
  • CODEOWNERS file

Fixed

  • Small fixes to tutorial notebook
  • Small fixes to documentation
  • dislib executable now works even if PyCOMPSs is not installed
  • Bug fix in ALS performance test
  • Several bugs in fancy indexing of ds-arrays
  • dislib executable on macOS

0.4.0 - 2019-09-16

Added

  • Distributed array data structure
  • A basic tutorial notebook

Changed

  • Updated Docker image to PyCOMPSs 2.5
  • Modified the whole library to use distributed arrays instead of Datasets (including estimators, examples, etc.)
  • Added 'init' parameter to K-means
  • Updated the developer guide

Removed

  • Dataset and Subset data structures
  • FFT estimator
  • Methods to load from multiple files

Fixed

  • Fixed the usage of random state in K-means
  • Some issues in the performance tests
  • Other minor bug fixes

0.3.0 - 2019-06-28

Added

  • The VERSION file
  • Test for duplicate support vectors in CSVM
  • Test for GaussianMixture with random initialization
  • New types of covariances for GaussianMixture and more tests
  • Scripts for automated performance tests on MareNostrum 4
  • A small Performance section to the docs
  • Two new algorithms: PCA and LinearRegression
  • Added some tests for DBSCAN

Changed

  • Dataset no longer checks for duplicate samples (and no longer builds an array of unique IDs). This improves performance significantly.
  • CSVM now checks and removes duplicate samples generated during the fit process.
  • GaussianMixture now works with sparse data
  • GaussianMixture now removes partial results using compss_delete
  • Improved the performance of K-means' _partial_sum task
  • Improved docs of GaussianMixture and simplified the code
  • Added a check_convergence argument to GaussianMixture
  • Significant performance improvement of DBSCAN
  • Improved the performance of the shuffle method by using PyCOMPSs COLLECTIONS

Fixed

  • A bug in DBSCAN that was generating incorrect results in certain cases

0.2.0 - 2019-03-01

Added

  • This CHANGELOG file
  • Added badges to README file
  • Added tests for C-SVM and K-means
  • Created a utils module with shuffle and as_grid methods
  • Added an API reference to the documentation
  • Dataset.samples and Dataset.labels properties
  • New tests for DBSCAN
  • A first version of nearest neighbors algorithm
  • Added tests for C-SVM, K-means and DBSCAN with sparse data
  • Created a setup.py file and a pip package
  • First implementation of Gaussian mixtures and ALS
  • Implemented a StandardScaler class as part of a new preprocessing module
  • Created a resample method in the utils module
  • Dataset transpose
  • Dataset apply function

Changed

  • Refactored DBSCAN completely to make code more legible and fix several bugs
  • Fixed DBSCAN because it was producing wrong results in some scenarios. Changed the use of disjoint sets to connected components.
  • Extended the installation instructions in the README file
  • The script classifier_comparison.py now includes Random Forest classifier
  • Tests are split into modules
  • The COMPSs docker image has been reworked
  • Changed the way random_state is used in the different algorithms to ensure proper randomization and reproducibility of different executions.
  • Unified the signatures of the different algorithms to fit, predict, and fit_predict. These methods now have the same arguments in all the algorithms.
  • Changed license to Apache v2
  • Fixed some typos in README
  • load methods in the data module now accept a delimiter argument
  • Moved the quickstart guide to a separate file and included it in the documentation
  • Fixed several bugs