v0.1.0
v0.1.0 is our largest release yet, containing 173 commits from 20 contributors over the course of 5 months.
This release is API breaking from past releases, as AutoGluon is now a namespace package. Please refer to our documentation for using v0.1.0. New GitHub issues based on versions earlier than v0.1.0 will not be addressed, and we recommend that all users upgrade to v0.1.0 as soon as possible.
See the full commit change-log here: v0.0.15...v0.1.0
Try it out yourself in 5 minutes with our Colab Tutorial.
Special thanks to the 20 contributors who made the v0.1.0 release possible! Contributor List:
@Innixma, @gradientsky, @sxjscience, @jwmueller, @zhreshold, @mseeger, @daikikatsuragawa, @Chudbrochil, @adrienatallah, @jonashaag, @songqiang, @larroy, @sackoh, @muhyun, @rschmucker, @aaronkl, @kaixinbaba, @sflender, @jojo19893, @mak-454
Major Changes
General
- MacOS is now fully supported.
- Windows is now experimentally supported. Installation instructions for Windows are still in progress.
- Python 3.8 is now supported.
- Overhauled API. APIs between TabularPredictor, TextPredictor, and ImagePredictor are now much more consistent. @Innixma, @sxjscience, @zhreshold, @jwmueller, @gradientsky
- Updated AutoGluon to a namespace package, so individual modules can now be installed separately to improve flexibility. For example, to install only HPO-related functionality, you can get a minimal install via `pip install autogluon.core`. For a full list of available submodules, see this link. @gradientsky (#694)
- Significantly improved robustness of HPO scheduling to avoid errors for users. @mseeger, @gradientsky, @rschmucker, @Innixma (#713, #735, #750, #754, #824, #920, #924)
- mxnet is no longer a required dependency in AutoGluon. @mseeger (#726)
- Various dependency version upgrades.
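
Because the submodules can now be installed independently, it can be useful to check which ones are present in an environment. The following is a minimal sketch using only the Python standard library. Only `autogluon.core` is named in these notes; the other submodule names in the list are assumptions based on the predictors mentioned above and should be checked against the submodule list in the documentation.

```python
import importlib.util
from typing import Dict, Iterable


def is_installed(module_name: str) -> bool:
    """Return True if module_name can be located, without importing the module itself."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # For dotted names, find_spec imports the parent package first and
        # raises if that parent (here, "autogluon") is not installed at all.
        return False


def installed_submodules(names: Iterable[str]) -> Dict[str, bool]:
    """Map each candidate module name to whether it is available."""
    return {name: is_installed(name) for name in names}


# "autogluon.core" comes from the release notes; the remaining names are assumed.
CANDIDATES = ["autogluon.core", "autogluon.tabular", "autogluon.text", "autogluon.vision"]

for name, present in installed_submodules(CANDIDATES).items():
    print(f"{name}: {'installed' if present else 'missing'}")
```

A check like this lets a script fail early with a clear message when it needs a submodule that a minimal install does not include.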
Tabular
- Major API refactor. @Innixma (#768, #855, #869)
- Multimodal Tabular + Text support (Tutorial). Now Tabular can train a multi-modal Tabular + Text transformer model alongside its standard models, and achieve state-of-the-art results on multi-modal tabular + text datasets with 3 lines of code. @sxjscience, @Innixma (#740, #752, #756, #770, #776, #794, #802, #848, #852, #867, #869, #871, #877)
- GPU support for LightGBM, CatBoost, XGBoost, MXNet neural network, and FastAI neural network models. Specify `ag_args_fit={'num_gpus': 1}` in `TabularPredictor.fit()` to enable. @Innixma (#896)
- `sample_weight` support. Tabular can now handle user-defined sample weights for imbalanced datasets. @jwmueller (#942, #962)
- Multi-label prediction support (Tutorial). Tabular can now predict across multiple label columns. @jwmueller (#953)
- Added student model ensembling in model distillation. @Innixma (#937)
- Generally improved accuracy and robustness due to a variety of internal improvements and the addition of new models. (v0.1.0 gets a better score on over 70% of datasets in benchmarking compared to v0.0.15!)
- New model: XGBoost. @sackoh (#691)
- New model: FastAI Tabular Neural Network. @gradientsky (#742, #748, #826, #839, #842)
- New model: TextPredictorModel (Multi-modal transformer) (Requires GPU). @sxjscience (#770)
- New experimental model: TabTransformer (Tabular transformer model (paper)). @Chudbrochil (#723)
- New experimental model: FastText. @songqiang (#580)
- View all available models in our documentation: https://auto.gluon.ai/stable/api/autogluon.tabular.models.html
- New advanced functionality: Extract out-of-fold predictions from a fit TabularPredictor (docs). @Innixma (#779)
- Greatly optimized and expanded upon feature importance calculation functionality. Now `predictor.feature_importance()` returns confidence bounds on importance values. @Innixma (#803)
- New experimental functionality: `predictor.fit_extra()` enables the fitting of additional models on top of an already fit `TabularPredictor` object (docs). @Innixma (#768)
- Per-model HPO support. Now you can specify `hyperparameter_tune_kwargs` in a model's hyperparameters via `'ag_args': {'hyperparameter_tune_kwargs': hpo_args}`. @Innixma (#883)
- Sped up preprocessing runtimes by 100x+ on large (10M+ row) datasets by subsampling data during feature duplicate resolution. @Innixma (#950)
- Added SHAP notebook tutorials. @jwmueller (#720)
- Heavily optimized CatBoost inference speed during online inference. @Innixma (#724)
- KNN models now respect time_limit. @Innixma (#845)
- Added stack ensemble visualization method. @muhyun (#786)
- Added NLP token prefiltering logic for ngram generation. @sflender (#907)
- Added initial support for compression of model files to reduce disk usage. @adrienatallah (#940, #944)
- Numerous bug fixes. @Innixma, @jwmueller, @gradientsky (many...)
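
To illustrate how the GPU and per-model HPO options above might be combined, here is a hedged sketch that assembles the keyword arguments for `TabularPredictor.fit()`. The `ag_args_fit` and `'ag_args': {'hyperparameter_tune_kwargs': ...}` shapes are taken from the notes above. The helper names `build_fit_kwargs` and `fit_predictor`, the choice of the `'GBM'` and `'CAT'` model keys, and the contents of `example_hpo_args` are illustrative assumptions, not a confirmed recipe; consult the documentation for the values your installed version accepts.

```python
from typing import Any, Dict


def build_fit_kwargs(num_gpus: int = 1, hpo_args: Any = None) -> Dict[str, Any]:
    """Assemble keyword arguments for TabularPredictor.fit() (illustrative helper)."""
    # Fit-time argument from the release notes: request GPUs for supported models.
    kwargs: Dict[str, Any] = {"ag_args_fit": {"num_gpus": num_gpus}}
    if hpo_args is not None:
        # Per-model HPO: only the 'GBM' entry carries tuning settings here,
        # while 'CAT' is trained with its defaults. Both keys are assumptions.
        kwargs["hyperparameters"] = {
            "GBM": {"ag_args": {"hyperparameter_tune_kwargs": hpo_args}},
            "CAT": {},
        }
    return kwargs


def fit_predictor(train_data, label: str, **fit_kwargs):
    """Fit a TabularPredictor with the given kwargs (requires autogluon.tabular)."""
    # Imported lazily so build_fit_kwargs can be used without AutoGluon installed.
    from autogluon.tabular import TabularPredictor

    return TabularPredictor(label=label).fit(train_data, **fit_kwargs)


# Hypothetical HPO settings; the accepted keys depend on the installed version.
example_hpo_args = {"scheduler": "local", "searcher": "random", "num_trials": 5}
fit_kwargs = build_fit_kwargs(num_gpus=1, hpo_args=example_hpo_args)
print(fit_kwargs)
```

With AutoGluon installed and a training table loaded, the result would be passed along as `fit_predictor(train_data, label="class", **fit_kwargs)`, where `"class"` stands in for your own label column.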
Text
- Major API refactor. @sxjscience (#876, #936, #972, #975)
- Support multi-GPU inference. @sxjscience (#873)
- Greatly improved user time_limit adherence. @Innixma (#877)
- Fixed bug in model deserialization. @jojo19893 (#708)
- Numerous bug fixes. @sxjscience (#836, #847, #850, #861, #865, #963, #980)
Vision
- Major API refactor. @zhreshold (#733, #828, #882, #930, #946)
- Greatly improved user time_limit adherence. @zhreshold