
v0.3.0

@Innixma released this on 15 Aug 02:42

v0.3.0 introduces multi-modal image + text + tabular support to AutoGluon. In just a few lines of code, you can train a multi-layer stack ensemble using text, image, and tabular data! To our knowledge, this is the first publicly available implementation of a model that handles all 3 modalities at once. Check it out in our brand new multimodal tutorial! v0.3.0 also features a major model quality improvement for Tabular, with a 57.6% win-rate vs v0.2.0 on the AutoMLBenchmark, along with up to a 10x online inference speedup thanks to low-level numpy and pandas optimizations throughout the codebase. This inference optimization enables AutoGluon to achieve sub-30 millisecond end-to-end latency in real-time deployment scenarios when paired with model distillation. Finally, AutoGluon can now train PyTorch image models via integration with TIMM. Specify any TIMM model in ImagePredictor or TabularPredictor to train it with AutoGluon!
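To give a flavor of the new workflow, here is a minimal sketch of multimodal training with TabularPredictor. This is a sketch rather than the canonical tutorial code: the DataFrames train_df / test_df and the column names 'img_path' and 'label' are illustrative, and the full setup is covered in the multimodal tutorial.

```python
from autogluon.tabular import TabularPredictor, FeatureMetadata

# train_df mixes numeric/categorical columns, free text, and a column of
# image file paths ('img_path' is an illustrative column name).
feature_metadata = FeatureMetadata.from_df(train_df)
feature_metadata = feature_metadata.add_special_types({'img_path': ['image_path']})

predictor = TabularPredictor(label='label').fit(
    train_df,
    feature_metadata=feature_metadata,
    hyperparameters='multimodal',  # preset enabling text, image, and tabular models
)
predictions = predictor.predict(test_df)
```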

This release is non-breaking when upgrading from v0.2.0. As always, only load previously trained models using the same version of AutoGluon that they were originally trained on. Loading models trained in different versions of AutoGluon is not supported.

This release contains 70 commits from 10 contributors.

See the full commit change-log here: v0.2.0...v0.3.0

Thanks to the 10 contributors who made the v0.3.0 release possible!

Special thanks to the 3 first-time contributors! @rxjx, @sallypannn, @sarahyurick

Special thanks to @talhaanwarch who opened 21 GitHub issues (!) and participated in numerous discussions during v0.3.0 development. His feedback was incredibly valuable when diagnosing issues and improving the user experience throughout AutoGluon!

Full Contributor List (ordered by # of commits):

@Innixma, @zhreshold, @jwmueller, @gradientsky, @sxjscience, @ValerioPerrone, @taesup-aws, @sallypannn, @rxjx, @sarahyurick

Major Changes

Multimodal

Tutorials

Tabular

  • Overall, AutoGluon-Tabular v0.3 wins 57.6% of the time against AutoGluon-Tabular v0.2 in AutoMLBenchmark!
  • Improved online inference speed by 1.5x-10x via various low-level pandas and numpy optimizations. @Innixma (#1136)
  • Accelerated feature preprocessing speed by 100x+ for datetime and text features. @Innixma (#1203)
  • Fixed FastAI model not properly scaling regression label values, improving model quality significantly. @Innixma (#1162)
  • Fixed r2 metric having the wrong sign in FastAI model, dramatically improving performance when r2 metric is specified. @Innixma (#1159)
  • Updated XGBoost to 1.4 and set the hyperparameter tree_method='hist' by default for improved performance. @Innixma (#1239)
  • Added groups parameter. Now users can specify the exact split indices in a groups column when performing model bagging; this leverages sklearn's LeaveOneGroupOut cross-validator (see the sketch after this list). @Innixma (#1224)
  • Added option to use holdout data for final ensembling weights in multi-layer stacking via a new use_bag_holdout argument (also shown in the sketch below). @Innixma (#1105)
  • Added neural network based quantile regression models. @taesup-aws (#1047)
  • Bug fix for random forest models' out-of-fold prediction computation in quantile regression. @jwmueller, @Innixma (#1100, #1102)
  • Added predictor.features() to get the original feature names used during training. @Innixma (#1257)
  • Refactored AbstractModel code to be easier to use. @Innixma (#1151, #1216, #1245, #1266)
  • Refactored BaggedEnsembleModel code in preparation for distributed bagging. @gradientsky (#1078)
  • Updated RAPIDS version to 21.06. @sarahyurick (#1241)
  • Force dtype conversion in feature preprocessing to align with FeatureMetadata. Now users can specify the dtypes of features via FeatureMetadata rather than updating the DataFrame. @Innixma (#1212)
  • Fixed various edge cases with out-of-bounds datetime values; out-of-bounds datetime values are now treated as missing. @Innixma (#1182)
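
As a reference for the new bagging controls above, a minimal sketch (train_df and the column names 'y' and 'group' are illustrative, and exact option interactions may vary by configuration):

```python
from autogluon.tabular import TabularPredictor

# Custom bagging splits: rows sharing a value in the 'group' column always
# land in the same fold (backed by sklearn's LeaveOneGroupOut).
grouped_predictor = TabularPredictor(label='y', groups='group')
grouped_predictor.fit(train_df)

# Holdout-based ensembling: reserve holdout data to fit the final ensemble
# weights when bagging and multi-layer stacking are enabled.
stacked_predictor = TabularPredictor(label='y')
stacked_predictor.fit(
    train_df,
    num_bag_folds=5,
    num_stack_levels=1,
    use_bag_holdout=True,
)

print(stacked_predictor.features())  # original feature names used during training
```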

Vision

  • Added Torch / TIMM backend support! AutoGluon can now train any TIMM model natively, and MXNet is no longer required to train vision models (see the sketch after this list). @zhreshold (#1249)
  • Added regression problem_type support to ImagePredictor. @sallypannn (#1165)
  • Added GPU memory check to avoid going OOM during training. @Innixma (#1199)
  • Fixed error when vision models are hyperparameter tuned with forked multiprocessing. @gradientsky (#1107)
  • Fixed crash when an image is missing (during both training and inference). Use TabularPredictor's Image API to get this functionality. @Innixma (#1210)
  • Fixed error when the same image is in multiple rows when calling predict_proba. @Innixma (#1206)
  • Fixed invalid preset configurations. @Innixma (#1199)
  • Fixed a major defect that caused tuning data to not be created properly when tuning data was not provided by the user. @Innixma (#1168)
  • Upgraded Pillow version to '>=8.3.0,<8.4.0'. @gradientsky (#1262)
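
As a reference for the Torch / TIMM backend, a minimal sketch (the model name 'resnet50', the epoch count, and the DataFrame layout with 'image' and 'label' columns are illustrative; the set of accepted model names depends on the installed TIMM version):

```python
from autogluon.vision import ImagePredictor

# train_df: DataFrame with an 'image' column of file paths and a 'label' column.
# Pass problem_type='regression' for the newly supported regression targets.
predictor = ImagePredictor()
predictor.fit(
    train_df,
    hyperparameters={'model': 'resnet50', 'epochs': 5},  # any TIMM model name (assumed keys)
)
scores = predictor.predict(test_df)
```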

Text

  • Removed pyarrow as a required dependency. @Innixma (#1200)
  • Fixed crash when eval_metric='average_precision'. @rxjx (#1092)

General