v0.3.0
v0.3.0 introduces multimodal image, text, and tabular support to AutoGluon. In just a few lines of code, you can train a multi-layer stack ensemble using text, image, and tabular data! To our knowledge this is the first publicly available implementation of a model that handles all 3 modalities at once. Check it out in our brand new multimodal tutorial!
v0.3.0 also features a major model quality improvement for Tabular, with a 57.6% win rate vs v0.2.0 on the AutoMLBenchmark, along with an up to 10x online inference speedup due to low-level numpy and pandas optimizations throughout the codebase! This inference optimization enables AutoGluon to achieve sub-30-millisecond end-to-end latency for real-time deployment scenarios when paired with model distillation.
Finally, AutoGluon can now train PyTorch image models via integration with TIMM. Specify any TIMM model to `ImagePredictor` or `TabularPredictor` to train it with AutoGluon!
This release is non-breaking when upgrading from v0.2.0. As always, only load previously trained models using the same version of AutoGluon that they were originally trained on. Loading models trained in different versions of AutoGluon is not supported.
This release contains 70 commits from 10 contributors.
See the full commit change-log here: v0.2.0...v0.3.0
Thanks to the 10 contributors who made the v0.3.0 release possible!
Special thanks to the 3 first-time contributors! @rxjx, @sallypannn, @sarahyurick
Special thanks to @talhaanwarch who opened 21 GitHub issues (!) and participated in numerous discussions during v0.3.0 development. His feedback was incredibly valuable when diagnosing issues and improving the user experience throughout AutoGluon!
Full Contributor List (ordered by # of commits):
@Innixma, @zhreshold, @jwmueller, @gradientsky, @sxjscience, @ValerioPerrone, @taesup-aws, @sallypannn, @rxjx, @sarahyurick
Major Changes
Multimodal
- Added multimodal tabular, text, image functionality! See the tutorial to get started. @Innixma, @zhreshold (#1041, #1211, #1277)
Tutorials
- Added a new custom model tutorial to showcase how to easily add any model to AutoGluon! @Innixma (#1238)
- Added a new custom metric tutorial to showcase how to add custom metrics to AutoGluon! @Innixma (#1271)
- Added FairHPO tutorial. @ValerioPerrone (#1090, #1236)
Tabular
- Overall, AutoGluon-Tabular v0.3 wins 57.6% of the time against AutoGluon-Tabular v0.2 in AutoMLBenchmark!
- Improved online inference speed by 1.5x-10x via various low-level pandas and numpy optimizations. @Innixma (#1136)
- Accelerated feature preprocessing speed by 100x+ for datetime and text features. @Innixma (#1203)
- Fixed FastAI model not properly scaling regression label values, improving model quality significantly. @Innixma (#1162)
- Fixed r2 metric having the wrong sign in FastAI model, dramatically improving performance when r2 metric is specified. @Innixma (#1159)
- Updated XGBoost to 1.4, defaulted hyperparameter `tree_method='hist'` for improved performance. @Innixma (#1239)
- Added `groups` parameter. Now users can specify the exact split indices in a `groups` column when performing model bagging. This solution leverages sklearn's LeaveOneGroupOut cross-validator. @Innixma (#1224)
- Added option to use holdout data for final ensembling weights in multi-layer stacking via a new `use_bag_holdout` argument. @Innixma (#1105)
- Added neural network based quantile regression models. @taesup-aws (#1047)
- Bug fix for random forest models' out-of-fold prediction computation in quantile regression. @jwmueller, @Innixma (#1100, #1102)
- Added `predictor.features()` to get the original feature names used during training. @Innixma (#1257)
- Refactored AbstractModel code to be easier to use. @Innixma (#1151, #1216, #1245, #1266)
- Refactored BaggedEnsembleModel code in preparation for distributed bagging. @gradientsky (#1078)
- Updated RAPIDS version to 21.06. @sarahyurick (#1241)
- Force dtype conversion in feature preprocessing to align with FeatureMetadata. Now users can specify the dtypes of features via FeatureMetadata rather than updating the DataFrame. @Innixma (#1212)
- Fixed various edge cases with out-of-bounds date time values. Now out-of-bounds date time values are treated as missing. @Innixma (#1182)
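To illustrate the `groups` option from the Tabular list above: leave-one-group-out splitting holds out every row of one group per fold, so the number of folds equals the number of distinct group values. The sketch below is a dependency-free illustration of that idea only. It is not AutoGluon's or sklearn's implementation, and the function name `leave_one_group_out` and the sample group values are hypothetical.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (train_idx, val_idx) pairs, holding out one group per fold.

    Hypothetical helper for illustration; sklearn's LeaveOneGroupOut
    provides the production version of this splitting scheme.
    """
    # Collect the row indices that belong to each group value.
    by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)

    # One fold per distinct group, visited in sorted order for determinism.
    for group in sorted(by_group):
        val_idx = by_group[group]
        held_out = set(val_idx)
        train_idx = [i for i in range(len(groups)) if i not in held_out]
        yield train_idx, val_idx


# Six rows assigned to three groups: expect three folds.
groups = [0, 0, 1, 1, 2, 2]
folds = list(leave_one_group_out(groups))
for train_idx, val_idx in folds:
    print("train:", train_idx, "validation:", val_idx)
```

Because each fold's validation rows come from a single group, rows that share a group value never appear on both sides of a split, which is the property that makes a user-supplied grouping useful for bagging.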
Vision
- Added Torch / TIMM backend support! Now AutoGluon can train any TIMM model natively, and MXNet is no longer required to train vision models. @zhreshold (#1249)
- Added regression `problem_type` support to ImagePredictor. @sallypannn (#1165)
- Added GPU memory check to avoid going OOM during training. @Innixma (#1199)
- Fixed error when vision models are hyperparameter tuned with forked multiprocessing. @gradientsky (#1107)
- Fixed crash when an image is missing (both train and inference). Use TabularPredictor's Image API to get this functionality. @Innixma (#1210)
- Fixed error when the same image is in multiple rows when calling `predict_proba`. @Innixma (#1206)
- Fixed invalid preset configurations. @Innixma (#1199)
- Fixed major defect causing tuning data to not be properly created if tuning data was not provided by the user. @Innixma (#1168)
- Upgraded Pillow version to '>=8.3.0,<8.4.0'. @gradientsky (#1262)
Text
- Removed pyarrow as a required dependency. @Innixma (#1200)
- Fixed crash when `eval_metric='average_precision'`. @rxjx (#1092)
General
- Improved support for GPU on Windows. @Innixma (#1255)
- Added quadratic kappa evaluation metric. @sxjscience (#1104)
- Improved access method for `__version__`. @Innixma (#1122)
- Upgraded pandas to 1.3. @Innixma (#1258)
- Upgraded ConfigSpace to 0.4.19. @Innixma (#1265)
- Upgraded numpy, graphviz, and dill versions. @Innixma (#1275)
- Various minor doc improvements. @jwmueller, @Innixma (#1089, #1091, #1093, #1095, #1219, #1253)
- Various minor updates and fixes. @Innixma, @zhreshold, @gradientsky (#1098, #1099, #1101, #1113, #1117, #1118, #1166, #1177, #1188, #1197, #1227, #1229, #1235, #1245, #1251)
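For readers unfamiliar with the quadratic kappa metric mentioned in the General list above, here is a minimal sketch, assuming the metric refers to Cohen's kappa with quadratic weights over ordinal integer labels. The function name `quadratic_kappa` and its signature are hypothetical and written for illustration; this is not AutoGluon's implementation.

```python
def quadratic_kappa(y_true, y_pred, n_classes):
    """Cohen's kappa with quadratic weights for integer labels in [0, n_classes).

    Illustrative sketch only. Raises ZeroDivisionError when every label and
    every prediction fall in the same single class (expected disagreement is 0).
    """
    n = len(y_true)

    # Confusion matrix: observed[i][j] counts rows with true class i, predicted j.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Quadratic penalty; its normalizing constant cancels in the ratio.
            weight = (i - j) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count under independence of true and predicted labels.
            weighted_expected += weight * row_totals[i] * col_totals[j] / n

    return 1.0 - weighted_observed / weighted_expected


perfect = quadratic_kappa([0, 1, 2, 0], [0, 1, 2, 0], n_classes=3)
reversed_scores = quadratic_kappa([0, 1, 2], [2, 1, 0], n_classes=3)
print(perfect, reversed_scores)
```

The quadratic weight penalizes a prediction that is two classes away four times as much as one that is a single class away, which is why this metric suits ordinal targets such as ratings.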