Neural network based quantile regression models
Tabular: Re-order NN model priority (autogluon#1059)

Tabular: Added Adaptive Early Stopping (autogluon#1042)

* Tabular: Added AdaptiveES, default adaptive to LightGBM

* ag.es -> ag.early_stop

* addressed comments

Tabular: Upgraded CatBoost to v0.25 (autogluon#1064)

Tabular: Added extra_metrics argument to leaderboard (autogluon#1058)

* Tabular: Added extra_metrics argument to leaderboard

* addressed comments

Upgrade psutil and scipy (autogluon#1072)

Tabular: Added efficient OOF functionality to RF/XT models (autogluon#1066)

* Tabular: Added efficient OOF functionality to RF/XT models

* addressed comments, disabled RF/XT use_child_oof by default

Tabular: Adjusted per-level stack time (autogluon#1075)

* Tabular: Added efficient OOF functionality to RF/XT models

* addressed comments, disabled RF/XT use_child_oof by default

* Tabular: Adjusted stack time limit allocation

Constrained Bayesian optimization (autogluon#1034)

* Constrained Bayesian optimization

* Comments from Matthias

* Fix random_seed keyword

* constraint_attribute + other comments

* Fix import

Co-authored-by: Valerio Perrone <vperrone@amazon.com>

Refactoring of FIFOScheduler, HyperbandScheduler: self.time_out better respected by stopping jobs when they run over (autogluon#1050)

* Refactoring of FIFOScheduler, HyperbandScheduler: self.time_out better respected by stopping jobs when they run over

* Added an option and warning concerning the changed meaning of 'time_out'

* Removed code to add time_this_iter to result in reporter (buggy, and not used)

update predict_proba return (autogluon#1044)

* update predict_proba return

* non-api breaking

* bump

* update format

* update label format and predict_proba

* add test

* fix d8

* remove squeeze

* fix

* fix incorrect class mapping, force it to align with label column

* fix

* fix label

* fix sorted list

* fix

* reset labels

* fix test

* address comments

* fix test

* fix

* label

* test for custom label

Vision: Limited gluoncv version (autogluon#1081)

Tabular: RF/XT Efficient OOB (autogluon#1082)

* Tabular: Enabled efficient OOB for RF/XT

* Tabular: Removed min_samples_leaf

* 300 estimators

Tabular: Refactored evaluate/evaluate_predictions (autogluon#1080)

* Tabular: Refactored evaluate/evaluate_predictions

* minor fix

Tabular: Reorder model priority (autogluon#1084)

* Tabular: Enabled efficient OOB for RF/XT

* Tabular: Removed min_samples_leaf

* 300 estimators

* Tabular: Reordered model training priority

* added memory check before training XGBoost

* minor update

* fix xgboost

Updated to v0.2.0 (autogluon#1086)

Restricted sklearn to >=0.23.2 (autogluon#1088)

Update to 0.2.1 (autogluon#1087)

TextPredictor fails if eval_metric = 'average_precision' (autogluon#1092)

* TextPredictor fails if eval_metric = 'average_precision'
Fixes autogluon#1085

* TextPredictor fails if eval_metric = 'average_precision'
Fixes autogluon#1085

Co-authored-by: Rohit Jain <rohit@thetalake.com>

upgrade SHAP notebooks (autogluon#1089)

tell users to search closed issues (autogluon#1095)

Added tutorial / API reference table to README.md (autogluon#1093)

Tabular: Added ImagePredictorModel (autogluon#1041)

* Tabular: Added ImagePredictorModel

* Added ImagePredictorModel unittest

* revert accidental minimum_cat_count change

* addressed comments

* addressed comments

* Updated after ImagePredictor refactor

* minor fix

* Addressed comments

add `tabular_nn_torch.py`
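The headline change adds a Torch-based tabular neural network with quantile regression support (`tabular_nn_torch.py`). A minimal sketch of how such a predictor might be fit, assuming the `problem_type='quantile'` and `quantile_levels` arguments used by later AutoGluon releases (they are not shown in this diff):

```python
# Hedged sketch: problem_type='quantile' and quantile_levels are assumptions
# about how the new quantile NN models are exposed, not confirmed by this diff.
from autogluon.tabular import TabularDataset, TabularPredictor

train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
test_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')

predictor = TabularPredictor(
    label='age',                      # a numeric target, since quantiles apply to continuous values
    problem_type='quantile',          # assumed problem type for quantile regression
    quantile_levels=[0.1, 0.5, 0.9],  # assumed argument: the quantiles to estimate
).fit(train_data, time_limit=120)

# One prediction column per requested quantile.
print(predictor.predict(test_data).head())
```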
Innixma authored and taesup-aws committed May 6, 2021
1 parent b257068 commit e8998ed
Showing 79 changed files with 6,269 additions and 2,892 deletions.
Jenkinsfile: 2 changes (2 additions, 0 deletions)
@@ -143,6 +143,8 @@ stage("Unit Test") {
${install_tabular_all}
${install_mxnet}
${install_text}
+ ${install_extra}
+ ${install_vision}
cd tabular/
python3 -m pytest --junitxml=results.xml --runslow tests
README.md: 12 changes (11 additions, 1 deletion)
@@ -8,6 +8,8 @@

[![Build Status](https://ci.gluon.ai/view/all/job/autogluon/job/master/badge/icon)](https://ci.gluon.ai/view/all/job/autogluon/job/master/)
[![Pypi Version](https://img.shields.io/pypi/v/autogluon.svg)](https://pypi.org/project/autogluon/#history)
[![GitHub license](docs/static/apache2.svg)](./LICENSE)
[![Downloads](https://pepy.tech/badge/autogluon)](https://pepy.tech/project/autogluon)
![Upload Python Package](https://github.com/awslabs/autogluon/workflows/Upload%20Python%20Package/badge.svg)

AutoGluon automates machine learning tasks enabling you to easily achieve strong predictive performance in your applications. With just a few lines of code, you can train and deploy high-accuracy machine learning and deep learning models on text, image, and tabular data.
@@ -19,14 +21,22 @@ AutoGluon automates machine learning tasks enabling you to easily achieve strong
# python3 -m pip install -U pip
# python3 -m pip install -U setuptools wheel
# python3 -m pip install -U "mxnet<2.0.0"
- # python3 -m pip install autogluon # autogluon==0.1.0
+ # python3 -m pip install autogluon # autogluon==0.2.0

from autogluon.tabular import TabularDataset, TabularPredictor
train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
test_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')
predictor = TabularPredictor(label='class').fit(train_data, time_limit=120) # Fit models for 120s
leaderboard = predictor.leaderboard(test_data)
```

| AutoGluon Task | Quickstart | API |
| :--- | :---: | :---: |
| TabularPredictor | [![Quick Start](https://img.shields.io/static/v1?label=&message=tutorial&color=grey)](https://auto.gluon.ai/stable/tutorials/tabular_prediction/tabular-quickstart.html) | [![API](https://img.shields.io/badge/api-reference-blue.svg)](https://auto.gluon.ai/stable/api/autogluon.predictor.html#module-0) |
| TextPredictor | [![Quick Start](https://img.shields.io/static/v1?label=&message=tutorial&color=grey)](https://auto.gluon.ai/stable/tutorials/text_prediction/beginner.html) | [![API](https://img.shields.io/badge/api-reference-blue.svg)](https://auto.gluon.ai/stable/api/autogluon.predictor.html#module-3) |
| ImagePredictor | [![Quick Start](https://img.shields.io/static/v1?label=&message=tutorial&color=grey)](https://auto.gluon.ai/stable/tutorials/image_prediction/beginner.html) | [![API](https://img.shields.io/badge/api-reference-blue.svg)](https://auto.gluon.ai/stable/api/autogluon.predictor.html#module-1) |
| ObjectDetector | [![Quick Start](https://img.shields.io/static/v1?label=&message=tutorial&color=grey)](https://auto.gluon.ai/stable/tutorials/object_detection/beginner.html) | [![API](https://img.shields.io/badge/api-reference-blue.svg)](https://auto.gluon.ai/stable/api/autogluon.predictor.html#module-2) |

## News

**Announcement for previous users:** The AutoGluon codebase has been modularized into [namespace packages](https://packaging.python.org/guides/packaging-namespace-packages/), which means you now only need those dependencies relevant to your prediction task of interest! For example, you can now work with tabular data without having to [install](https://auto.gluon.ai/dev/install.html) dependencies required for AutoGluon's computer vision tasks (and vice versa). Unfortunately this improvement required a minor API change (eg. instead of `from autogluon import TabularPrediction`, you should now do: `from autogluon.tabular import TabularPredictor`), for all versions newer than v0.0.15. Documentation/tutorials under the old API may still be viewed [for version 0.0.15](https://auto.gluon.ai/0.0.15/index.html) which is the last released version under the old API.
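Two of the bundled changes also touch the evaluation APIs shown in the quickstart above: `leaderboard` gained an `extra_metrics` argument (autogluon#1058) and the `predict_proba` output was reworked so its columns follow the label column's class values (autogluon#1044). A hedged continuation of that quickstart, assuming string metric names are accepted:

```python
# Hedged sketch: reuses the predictor from the quickstart; metric names are assumptions.
leaderboard = predictor.leaderboard(
    test_data,
    extra_metrics=['f1', 'roc_auc'],  # autogluon#1058: report additional metrics next to the eval metric
)

# autogluon#1044: prediction probabilities come back with one column per class value.
pred_proba = predictor.predict_proba(test_data)
print(pred_proba.columns.tolist())
```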
VERSION: 2 changes (1 addition, 1 deletion)
@@ -1 +1 @@
- 0.1.1
+ 0.2.1
core/src/autogluon/core/_setup_utils.py: 8 changes (4 additions, 4 deletions)
@@ -14,11 +14,11 @@

# Only put packages here that would otherwise appear multiple times across different module's setup.py files.
DEPENDENT_PACKAGES = {
- 'numpy': '==1.19.5',
+ 'numpy': '==1.19.5', # TODO: v0.3 consider upgrading
'pandas': '>=1.0.0,<2.0',
- 'scikit-learn': '>=0.22.0,<0.25',
- 'scipy': '==1.5.4',
- 'gluoncv': '>=0.10.1,<0.12.0',
+ 'scikit-learn': '>=0.23.2,<0.25', # 0.22 crashes during efficient OOB in Tabular
+ 'scipy': '>=1.5.4,<1.7',
+ 'gluoncv': '>=0.10.1.post0,<0.11',
'tqdm': '>=4.38.0',
'Pillow': '<=8.1',
'graphviz': '<0.9.0,>=0.8.1',
core/src/autogluon/core/features/feature_metadata.py: 40 changes (38 additions, 2 deletions)
@@ -24,7 +24,8 @@ class FeatureMetadata:
type_group_map_special : Dict[str, List[str]], optional
Dictionary of special types to lists of feature names.
The keys can be anything, but it is generally recommended they be one of:
- ['binned', 'datetime_as_int', 'datetime_as_object', 'text', 'text_as_category', 'text_special', 'text_ngram', 'stack']
+ ['binned', 'datetime_as_int', 'datetime_as_object', 'text', 'text_as_category', 'text_special', 'text_ngram', 'image_path', 'stack']
+ For descriptions of each special feature-type, see: `autogluon.core.features.types`
Feature names that appear in the value lists must also be keys in type_map_raw.
Feature names are not required to have special types.
"""
@@ -56,7 +57,7 @@ def _validate(self):
# Note: This is not optimized for speed. Do not rely on this function during inference.
# TODO: Add valid_names, invalid_names arguments which override all other arguments for the features listed?
def get_features(self, valid_raw_types: list = None, valid_special_types: list = None, invalid_raw_types: list = None, invalid_special_types: list = None,
- required_special_types: list = None, required_raw_special_pairs: List[Tuple[str, List[str]]] = None, required_exact=False, required_at_least_one_special=False):
+ required_special_types: list = None, required_raw_special_pairs: List[Tuple[str, List[str]]] = None, required_exact=False, required_at_least_one_special=False) -> List[str]:
"""
Returns a list of features held within the feature metadata object after being pruned through the available parameters.
@@ -176,6 +177,41 @@ def keep_features(self, features: list, inplace=False):
features_to_remove = [feature for feature in self.get_features() if feature not in features]
return self.remove_features(features=features_to_remove, inplace=inplace)

def add_special_types(self, type_map_special: Dict[str, List[str]], inplace=False):
"""
Adds special types to features.
Parameters
----------
type_map_special : Dict[str, List[str]]
Dictionary of feature -> list of special types to add.
Features in dictionary must already exist in the FeatureMetadata object.
inplace : bool, default False
If True, updates self inplace and returns self.
If False, updates a copy of self and returns copy.
Returns
-------
:class:`FeatureMetadata` object.
Examples
--------
>>> from autogluon.core.features.feature_metadata import FeatureMetadata
>>> feature_metadata = FeatureMetadata({'FeatureA': 'int', 'FeatureB': 'object'})
>>> feature_metadata = feature_metadata.add_special_types({'FeatureA': ['MySpecialType'], 'FeatureB': ['MySpecialType', 'text']})
"""
if inplace:
metadata = self
else:
metadata = copy.deepcopy(self)
valid_features = set(self.get_features())

for feature, special_types in type_map_special.items():
if feature not in valid_features:
raise ValueError(f'"{feature}" does not exist in this FeatureMetadata object. Only existing features can be assigned special types.')
for special_type in special_types:
metadata.type_group_map_special[special_type].append(feature)
return metadata

@staticmethod
def _remove_features_from_type_group_map(d, features):
for key, features_orig in d.items():
core/src/autogluon/core/features/types.py: 3 changes (3 additions, 0 deletions)
@@ -31,6 +31,9 @@
# feature is a generated feature based off of a text feature that is an ngram.
S_TEXT_NGRAM = 'text_ngram'

+ # feature is an object type that contains a string path to an image that can be utilized in computer vision
+ S_IMAGE_PATH = 'image_path'

# feature is a generated feature based off of a ML model's prediction probabilities of the label column for the row.
# Any model which takes a stack feature as input is a stack ensemble.
S_STACK = 'stack'
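The new `image_path` special type and `FeatureMetadata.add_special_types` are designed to work together: a column of image file paths can be tagged so that image-aware models (such as the new ImagePredictorModel) can find it. A short sketch, where the `image` column is hypothetical:

```python
# Hedged sketch: 'image' is a hypothetical column holding image file paths.
from autogluon.core.features.feature_metadata import FeatureMetadata
from autogluon.core.features.types import S_IMAGE_PATH

feature_metadata = FeatureMetadata({'image': 'object', 'FeatureA': 'int'})

# Tag the path column with the new 'image_path' special type (returns a modified copy by default).
feature_metadata = feature_metadata.add_special_types({'image': [S_IMAGE_PATH]})

# Retrieve only the image-path features.
print(feature_metadata.get_features(required_special_types=[S_IMAGE_PATH]))  # ['image']
```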
core/src/autogluon/core/models/_utils.py: 32 changes (24 additions, 8 deletions)
@@ -1,14 +1,30 @@

from autogluon.core.utils.early_stopping import AdaptiveES, ES_CLASS_MAP


# TODO: Add more strategies
# - Adaptive early stopping: adjust rounds during model training
def get_early_stopping_rounds(num_rows_train, strategy='auto', min_rounds=10, max_rounds=150, min_rows=10000):
def get_early_stopping_rounds(num_rows_train, strategy='auto', min_patience=10, max_patience=150, min_rows=10000):
if isinstance(strategy, (tuple, list)):
strategy = list(strategy)
if isinstance(strategy[0], str):
if strategy[0] in ES_CLASS_MAP:
strategy[0] = ES_CLASS_MAP[strategy[0]]
else:
raise AssertionError(f'unknown early stopping strategy: {strategy}')
return strategy

"""Gets early stopping rounds"""
if strategy == 'auto':
modifier = 1 if num_rows_train <= min_rows else min_rows / num_rows_train
early_stopping_rounds = max(
round(modifier * max_rounds),
min_rounds,
)
strategy = 'simple'

modifier = 1 if num_rows_train <= min_rows else min_rows / num_rows_train
simple_early_stopping_rounds = max(
round(modifier * max_patience),
min_patience,
)
if strategy == 'simple':
return simple_early_stopping_rounds
elif strategy == 'adaptive':
return AdaptiveES, dict(adaptive_offset=min_patience, min_patience=simple_early_stopping_rounds)
else:
raise AssertionError(f'unknown early stopping strategy: {strategy}')
return early_stopping_rounds
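In the reworked `get_early_stopping_rounds`, the 'simple' strategy keeps the full `max_patience` while the training set has at most `min_rows` rows and then shrinks the patience proportionally, never dropping below `min_patience`; the 'adaptive' strategy instead returns the `AdaptiveES` class plus its kwargs for the caller to construct. A standalone restatement of the 'simple' arithmetic:

```python
# Standalone restatement of the 'simple' patience arithmetic shown above
# (not an import of the AutoGluon function itself).
def simple_early_stopping_rounds(num_rows_train, min_patience=10, max_patience=150, min_rows=10000):
    modifier = 1 if num_rows_train <= min_rows else min_rows / num_rows_train
    return max(round(modifier * max_patience), min_patience)

print(simple_early_stopping_rounds(5_000))      # 150: small data keeps the full patience
print(simple_early_stopping_rounds(100_000))    # 15: 10000 / 100000 * 150
print(simple_early_stopping_rounds(1_000_000))  # 10: floored at min_patience
```

Per the `ag.early_stop` docstring further down, a model selects between these strategies through its hyperparameters, either as an int, a preset string, or a `(class, kwargs)` tuple.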
core/src/autogluon/core/models/abstract/abstract_model.py: 15 changes (10 additions, 5 deletions)
@@ -483,7 +483,12 @@ def _convert_proba_to_unified_form(self, y_pred_proba):
For multiclass and softclass classification, keeps y_pred_proba as a 2 dimensional array of prediction probabilities for each class.
For regression, converts y_pred_proba to a 1 dimensional array of predictions.
"""
- if self.problem_type == BINARY:
+ if self.problem_type == REGRESSION:
+     if len(y_pred_proba.shape) == 1:
+         return y_pred_proba
+     else:
+         return y_pred_proba[:, 1]
+ elif self.problem_type == BINARY:
if len(y_pred_proba.shape) == 1:
return y_pred_proba
elif y_pred_proba.shape[1] > 1:
@@ -492,8 +497,8 @@ def _convert_proba_to_unified_form(self, y_pred_proba):
return y_pred_proba
elif y_pred_proba.shape[1] > 2: # Multiclass, Softclass
return y_pred_proba
- else: # Regression
-     return y_pred_proba[:, 1]
+ else: # Unknown problem type
+     raise AssertionError(f'Unknown y_pred_proba format for `problem_type="{self.problem_type}"`.')

def score(self, X, y, metric=None, sample_weight=None, **kwargs):
if metric is None:
@@ -951,7 +956,7 @@ def _get_model_params(self) -> dict:
else:
return self._get_params()

- # TODO: Add documentation for valid args for each model. Currently only `ag.es`
+ # TODO: Add documentation for valid args for each model. Currently only `ag.early_stop`
def _ag_params(self) -> set:
"""
Set of params that are not passed to self.model, but are used by the wrapper.
@@ -963,7 +968,7 @@ def _ag_params(self) -> set:
Possible params:
- ag.es : int, str, or tuple
+ ag.early_stop : int, str, or tuple
generic name for early stopping logic. Typically can be an int or a str preset/strategy.
Also possible to pass tuple of (class, kwargs) to construct a custom early stopping object.
Refer to `autogluon.core.utils.early_stopping` for examples.
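The reordered branches above make the output contract of `_convert_proba_to_unified_form` explicit: regression and binary predictions collapse to a 1-D array, multiclass/softclass stays 2-D, and anything else now raises instead of being silently treated as regression. A minimal standalone sketch of that shape handling (not the AutoGluon method itself):

```python
# Standalone sketch mirroring the branch logic above; problem-type strings are illustrative.
import numpy as np

def to_unified_form(y_pred_proba, problem_type):
    if problem_type in ('regression', 'binary'):
        # 1-D arrays pass through; 2-D arrays keep only the positive-class column.
        return y_pred_proba if y_pred_proba.ndim == 1 else y_pred_proba[:, 1]
    if y_pred_proba.ndim == 2 and y_pred_proba.shape[1] > 2:
        return y_pred_proba  # multiclass / softclass probabilities stay 2-D
    raise AssertionError(f'Unknown y_pred_proba format for problem_type="{problem_type}"')

binary_proba = np.array([[0.3, 0.7], [0.9, 0.1]])
print(to_unified_form(binary_proba, 'binary'))  # [0.7 0.1]
```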
(additional changed file; file name not shown in this view)
@@ -319,7 +319,7 @@ def _fit_single(self, X, y, model_base, use_child_oof, time_limit, **kwargs):
logger.log(15, '\t`use_child_oof` was specified for this model. It will function similarly to a bagged model, but will only fit one child model.')
time_start_predict = time.time()
if model_base._get_tags().get('valid_oof', False):
- self._oof_pred_proba = model_base.get_oof_pred_proba(X=X)
+ self._oof_pred_proba = model_base.get_oof_pred_proba(X=X, y=y)
else:
logger.warning('\tWARNING: `use_child_oof` was specified but child model does not have a dedicated `get_oof_pred_proba` method. This model may have heavily overfit validation scores.')
self._oof_pred_proba = model_base.predict_proba(X=X)
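The `use_child_oof` path above asks the child model for its own out-of-fold predictions instead of refitting it across bagged folds; the diff also passes `y`, presumably so rows that never land out-of-bag can still be filled in. For the RF/XT models this efficient OOF maps naturally onto scikit-learn's out-of-bag machinery; a hedged sketch of that underlying mechanism (not AutoGluon's wrapper):

```python
# Hedged sketch: shows the sklearn OOB mechanism that efficient OOF for RF/XT models
# can build on; this is not the AutoGluon BaggedEnsembleModel API.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# 300 estimators mirrors the commit note above; oob_score=True records out-of-bag predictions.
model = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=0)
model.fit(X, y)

# A single fit yields out-of-bag probabilities for (almost) every training row,
# so no extra k-fold refits are needed to obtain OOF-style scores.
print(model.oob_decision_function_.shape)  # (1000, 2)
```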
