v0.2.0

@Innixma released this on 28 Apr 22:24

v0.2.0 introduces numerous optimizations that reduce Tabular average inference time by 4x and average disk usage by 10x compared to v0.1.0, as well as a refactored ImagePredictor API to better align with the other tasks and a 20x inference speedup in Vision tasks. This release contains 42 commits from 9 contributors.

This release is non-breaking when upgrading from v0.1.0, with four exceptions:

  1. ImagePredictor.predict and ImagePredictor.predict_proba have different output formats.
  2. TabularPredictor.evaluate and TabularPredictor.evaluate_predictions have different output formats.
  3. Custom dictionary inputs to TabularPredictor.fit's hyperparameter_tune_kwargs argument now have a different format.
  4. Models trained in v0.1.0 should only be loaded with v0.1.0. Loading models trained in different versions of AutoGluon is not supported.

See the full commit change-log here: v0.1.0...v0.2.0

Thanks to the 9 contributors who contributed to the v0.2.0 release!

Special thanks to the 3 first-time contributors! @taesup-aws, @ValerioPerrone, @lukemorrill

Full Contributor List (ordered by # of commits):

@Innixma, @zhreshold, @gradientsky, @jwmueller, @mseeger, @sxjscience, @taesup-aws, @ValerioPerrone, @lukemorrill

Major Changes

Tabular

  • Reduced overall inference time on best_quality preset by 4x (and 2x on others). @Innixma, @gradientsky
  • Reduced overall disk usage on best_quality preset by 10x. @Innixma
  • Reduced training time and inference time of K-Nearest-Neighbor models by 250x, and reduced disk usage by 10x via:
    • Efficient out-of-fold implementation (10x training & inference speedup, 10x reduced disk usage) on best_quality preset. @Innixma (#1022)
    • [Experimental] Integration of the scikit-learn-intelex package (25x training & inference speedup). @Innixma (#1049)
      • This is currently not installed by default. Try it via pip install autogluon.tabular[all,skex] or pip install "scikit-learn-intelex<2021.3". Once installed, AutoGluon will automatically use it.
  • Reduced training time, inference time, and disk usage of RandomForest and ExtraTrees models by 10x via efficient out-of-fold implementation. @Innixma (#1066, #1082)
  • Reduced training time by 30% and inference time by 75% on the FastAI neural network model. @gradientsky (#977)
  • Added quantile as a new problem_type to support quantile regression problems (a usage sketch appears after this list). @taesup-aws, @jwmueller (#1005, #1040)
  • [Experimental] Added GPU accelerated RandomForest, K-Nearest-Neighbors and Linear models via integration with NVIDIA RAPIDS. @Innixma (#995, #997, #1000)
    • This is not enabled by default. Try it out by first installing RAPIDS and then installing AutoGluon.
      • Currently, the models need to be specially passed to the .fit hyperparameters argument. Refer to the Kaggle kernel below for an example, or check out the official RAPIDS AutoGluon example.
    • See how to use AutoGluon + RAPIDS to reach the top 1% in the Otto Kaggle competition with an interactive Kaggle kernel!
  • [Experimental] Added an option to specify early stopping rounds for the LightGBM, CatBoost, and XGBoost models via a new model parameter, ag.early_stop. @Innixma (#1037)
    • Try it out via hyperparameters={'XGB': {'ag.early_stop': 500}} (a fuller sketch appears after this list).
    • The API for this may change in future releases as we try to optimize usage of early stopping in AutoGluon.
  • [Experimental] Added adaptive early stopping to LightGBM. This attempts to choose when to stop training more intelligently than a fixed early stopping rounds value. @Innixma (#1042)
  • Re-ordered model training priority to perform better when time_limit is small. For time_limit=3600 on datasets with over 100,000 rows, v0.2.0 has a 65% win-rate over v0.1.0. @Innixma (#1059, #1084)
  • Adjusted time allocation to stack layers when performing multi-layer stacking to allow for longer training on earlier layers. @Innixma (#1075)
  • Updated CatBoost to v0.25. @Innixma (#1064)
  • Added extra_metrics argument to .leaderboard (demonstrated in the sketch after this list). @Innixma (#1058)
  • Added feature group importance support to .feature_importance. @Innixma (#989)
    • Now, users can get the combined importance of a group of features.
    • predictor.feature_importance(test_data, features=['A', 'B', 'C', ('AB', ['A', 'B'])])
  • [BREAKING] Refactored .evaluate and .evaluate_predictions to be easier to use and to share the same code logic. @Innixma (#1080)
    • The output type has changed, and the sign of the metric score has been flipped in some circumstances (a sketch of the new usage appears after this list).
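
For the new quantile problem_type, a minimal sketch on synthetic data is shown below. The column names and quantile_levels values are illustrative; see the TabularPredictor documentation for the full signature.

```python
import numpy as np
import pandas as pd
from autogluon.tabular import TabularPredictor

# Synthetic regression data (column names are placeholders for illustration).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=1000)
train_data = pd.DataFrame({'x': x, 'y': 3 * x + rng.normal(scale=2, size=1000)})

# quantile_levels lists the quantiles of the target distribution to predict.
predictor = TabularPredictor(
    label='y',
    problem_type='quantile',
    quantile_levels=[0.1, 0.5, 0.9],
).fit(train_data, time_limit=60)

# predict returns one column per requested quantile for each input row.
print(predictor.predict(train_data).head())
```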
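
The experimental ag.early_stop parameter is passed per model through fit's hyperparameters argument. A minimal sketch on synthetic binary classification data follows; the feature and label names are placeholders, and the parameter itself may change in future releases.

```python
import numpy as np
import pandas as pd
from autogluon.tabular import TabularPredictor

# Synthetic binary classification data (column names are placeholders).
rng = np.random.default_rng(0)
train_data = pd.DataFrame({'f1': rng.normal(size=2000), 'f2': rng.normal(size=2000)})
train_data['class'] = (train_data['f1'] + train_data['f2'] > 0).astype(int)

# 'XGB' selects the XGBoost model; 'ag.early_stop' sets its early stopping rounds.
predictor = TabularPredictor(label='class').fit(
    train_data,
    hyperparameters={'XGB': {'ag.early_stop': 500}},
    time_limit=300,
)
```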
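
Continuing the previous sketch, the new extra_metrics argument to .leaderboard and the feature-group syntax for .feature_importance can be used as follows. The metric names and the group name are illustrative, and a held-out test set would normally be used in place of train_data.

```python
# Score extra metrics alongside the predictor's primary eval metric.
leaderboard = predictor.leaderboard(train_data, extra_metrics=['accuracy', 'f1'])
print(leaderboard)

# A (name, [features]) tuple reports the combined importance of a feature group
# in addition to the individual feature importances.
importance = predictor.feature_importance(
    train_data,
    features=['f1', 'f2', ('f1_and_f2', ['f1', 'f2'])],
)
print(importance)
```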
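
For the refactored .evaluate and .evaluate_predictions, a sketch of the new usage is below, still continuing the classification sketch above. The exact output structure depends on the problem type, and, per the item above, the sign of some metric scores may be flipped relative to v0.1.0.

```python
# Evaluate the predictor directly on labeled data.
results = predictor.evaluate(train_data)
print(results)

# evaluate_predictions shares the same logic but operates on precomputed predictions.
y_pred = predictor.predict(train_data)
print(predictor.evaluate_predictions(y_true=train_data['class'], y_pred=y_pred))
```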

Vision

  • Reduced inference time by 20x via various optimizations in inference batching. @zhreshold
  • Fixed an issue where models trained on GPU could not be loaded on CPU-only machines. @zhreshold
  • Improved model fitting performance by up to 10% for ObjectDetector when the presets argument is empty. @zhreshold
  • [BREAKING] Refactored predict and predict_proba methods in ImagePredictor to have the same output formats as TabularPredictor and TextPredictor. @zhreshold (#1044)
    • This change is BREAKING. Users upgrading from v0.1.0 who relied on the old predict and predict_proba outputs should update their code to handle the new formats (a sketch of the new usage appears after this list).
  • Added improved support for CSV and pandas DataFrame input to ImagePredictor. @zhreshold (#1010)
  • Added early stopping strategies that significantly improve training efficiency. @zhreshold (#1039)
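
A minimal sketch of the refactored ImagePredictor workflow with DataFrame input and the new output formats is shown below. The 'image' and 'label' column names and the file paths are assumptions for illustration; predictions are returned as pandas objects aligned with TabularPredictor and TextPredictor.

```python
import pandas as pd
from autogluon.vision import ImagePredictor

# DataFrame input: one row per image, with a path column and a label column
# (column names and paths here are placeholders).
train_data = pd.DataFrame({
    'image': ['/path/to/img_0.jpg', '/path/to/img_1.jpg'],
    'label': ['cat', 'dog'],
})

predictor = ImagePredictor()
predictor.fit(train_data, time_limit=600)

# predict and predict_proba now return pandas objects indexed like the input,
# matching the output formats of TabularPredictor and TextPredictor.
predictions = predictor.predict(train_data)
probabilities = predictor.predict_proba(train_data)
```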

General