Neural network based quantile regression models
Tabular: Re-order NN model priority (autogluon#1059)

Tabular: Added Adaptive Early Stopping (autogluon#1042)

* Tabular: Added AdaptiveES, default adaptive to LightGBM

* ag.es -> ag.early_stop

* addressed comments

Tabular: Upgraded CatBoost to v0.25 (autogluon#1064)

Tabular: Added extra_metrics argument to leaderboard (autogluon#1058)

* Tabular: Added extra_metrics argument to leaderboard

* addressed comments

Upgrade psutil and scipy (autogluon#1072)

Tabular: Added efficient OOF functionality to RF/XT models (autogluon#1066)

* Tabular: Added efficient OOF functionality to RF/XT models

* addressed comments, disabled RF/XT use_child_oof by default

Tabular: Adjusted per-level stack time (autogluon#1075)

* Tabular: Added efficient OOF functionality to RF/XT models

* addressed comments, disabled RF/XT use_child_oof by default

* Tabular: Adjusted stack time limit allocation

Constrained Bayesian optimization (autogluon#1034)

* Constrained Bayesian optimization

* Comments from Matthias

* Fix random_seed keyword

* constraint_attribute + other comments

* Fix import

Co-authored-by: Valerio Perrone <vperrone@amazon.com>

Refactoring of FIFOScheduler, HyperbandScheduler: self.time_out better respected by stopping jobs when they run over (autogluon#1050)

* Refactoring of FIFOScheduler, HyperbandScheduler: self.time_out better respected by stopping jobs when they run over

* Added an option and warning concerning the changed meaning of 'time_out'

* Removed code to add time_this_iter to result in reporter (buggy, and not used)

update predict_proba return (autogluon#1044)

* update predict_proba return

* non-api breaking

* bump

* update format

* update label format and predict_proba

* add test

* fix d8

* remove squeeze

* fix

* fix incorrect class mapping, force it to align with label column

* fix

* fix label

* fix sorted list

* fix

* reset labels

* fix test

* address comments

* fix test

* fix

* label

* test for custom label

Vision: Limited gluoncv version (autogluon#1081)

Tabular: RF/XT Efficient OOB (autogluon#1082)

* Tabular: Enabled efficient OOB for RF/XT

* Tabular: Removed min_samples_leaf

* 300 estimators

Tabular: Refactored evaluate/evaluate_predictions (autogluon#1080)

* Tabular: Refactored evaluate/evaluate_predictions

* minor fix

Tabular: Reorder model priority (autogluon#1084)

* Tabular: Enabled efficient OOB for RF/XT

* Tabular: Removed min_samples_leaf

* 300 estimators

* Tabular: Reordered model training priority

* added memory check before training XGBoost

* minor update

* fix xgboost

Updated to v0.2.0 (autogluon#1086)

Restricted sklearn to >=0.23.2 (autogluon#1088)

Update to 0.2.1 (autogluon#1087)

TextPredictor fails if eval_metric = 'average_precision' (autogluon#1092)

* TextPredictor fails if eval_metric = 'average_precision'
Fixes autogluon#1085

* TextPredictor fails if eval_metric = 'average_precision'
Fixes autogluon#1085

Co-authored-by: Rohit Jain <rohit@thetalake.com>

upgrade SHAP notebooks (autogluon#1089)

tell users to search closed issues (autogluon#1095)

Added tutorial / API reference table to README.md (autogluon#1093)

Tabular: Added ImagePredictorModel (autogluon#1041)

* Tabular: Added ImagePredictorModel

* Added ImagePredictorModel unittest

* revert accidental minimum_cat_count change

* addressed comments

* addressed comments

* Updated after ImagePredictor refactor

* minor fix

* Addressed comments

add `tabular_nn_torch.py`
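The headline change adds a Torch-based tabular neural network with quantile regression support (`tabular_nn_torch.py`). A minimal sketch of how such a predictor might be fit, assuming the `problem_type='quantile'` and `quantile_levels` arguments used by later AutoGluon releases (they are not shown in this diff):

```python
# Hedged sketch: problem_type='quantile' and quantile_levels are assumptions
# about how the new quantile NN models are exposed, not confirmed by this diff.
from autogluon.tabular import TabularDataset, TabularPredictor

train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
test_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')

predictor = TabularPredictor(
    label='age',                      # a numeric target, since quantiles apply to continuous values
    problem_type='quantile',          # assumed problem type for quantile regression
    quantile_levels=[0.1, 0.5, 0.9],  # assumed argument: the quantiles to estimate
).fit(train_data, time_limit=120)

# One prediction column per requested quantile.
print(predictor.predict(test_data).head())
```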
Innixma authored and taesup-aws committed May 6, 2021
1 parent b257068 commit e8998ed
Showing 79 changed files with 6,269 additions and 2,892 deletions.
Jenkinsfile: 2 changes (2 additions, 0 deletions)
@@ -143,6 +143,8 @@ stage("Unit Test") {
${install_tabular_all}
${install_mxnet}
${install_text}
+ ${install_extra}
+ ${install_vision}
cd tabular/
python3 -m pytest --junitxml=results.xml --runslow tests
README.md: 12 changes (11 additions, 1 deletion)
@@ -8,6 +8,8 @@

[![Build Status](https://ci.gluon.ai/view/all/job/autogluon/job/master/badge/icon)](https://ci.gluon.ai/view/all/job/autogluon/job/master/)
[![Pypi Version](https://img.shields.io/pypi/v/autogluon.svg)](https://pypi.org/project/autogluon/#history)
[![GitHub license](docs/static/apache2.svg)](./LICENSE)
[![Downloads](https://pepy.tech/badge/autogluon)](https://pepy.tech/project/autogluon)
![Upload Python Package](https://github.com/awslabs/autogluon/workflows/Upload%20Python%20Package/badge.svg)

AutoGluon automates machine learning tasks enabling you to easily achieve strong predictive performance in your applications. With just a few lines of code, you can train and deploy high-accuracy machine learning and deep learning models on text, image, and tabular data.
@@ -19,14 +21,22 @@ AutoGluon automates machine learning tasks enabling you to easily achieve strong
# python3 -m pip install -U pip
# python3 -m pip install -U setuptools wheel
# python3 -m pip install -U "mxnet<2.0.0"
- # python3 -m pip install autogluon # autogluon==0.1.0
+ # python3 -m pip install autogluon # autogluon==0.2.0

from autogluon.tabular import TabularDataset, TabularPredictor
train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
test_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')
predictor = TabularPredictor(label='class').fit(train_data, time_limit=120) # Fit models for 120s
leaderboard = predictor.leaderboard(test_data)
```

| AutoGluon Task | Quickstart | API |
| :--- | :---: | :---: |
| TabularPredictor | [![Quick Start](https://img.shields.io/static/v1?label=&message=tutorial&color=grey)](https://auto.gluon.ai/stable/tutorials/tabular_prediction/tabular-quickstart.html) | [![API](https://img.shields.io/badge/api-reference-blue.svg)](https://auto.gluon.ai/stable/api/autogluon.predictor.html#module-0) |
| TextPredictor | [![Quick Start](https://img.shields.io/static/v1?label=&message=tutorial&color=grey)](https://auto.gluon.ai/stable/tutorials/text_prediction/beginner.html) | [![API](https://img.shields.io/badge/api-reference-blue.svg)](https://auto.gluon.ai/stable/api/autogluon.predictor.html#module-3) |
| ImagePredictor | [![Quick Start](https://img.shields.io/static/v1?label=&message=tutorial&color=grey)](https://auto.gluon.ai/stable/tutorials/image_prediction/beginner.html) | [![API](https://img.shields.io/badge/api-reference-blue.svg)](https://auto.gluon.ai/stable/api/autogluon.predictor.html#module-1) |
| ObjectDetector | [![Quick Start](https://img.shields.io/static/v1?label=&message=tutorial&color=grey)](https://auto.gluon.ai/stable/tutorials/object_detection/beginner.html) | [![API](https://img.shields.io/badge/api-reference-blue.svg)](https://auto.gluon.ai/stable/api/autogluon.predictor.html#module-2) |

## News

**Announcement for previous users:** The AutoGluon codebase has been modularized into [namespace packages](https://packaging.python.org/guides/packaging-namespace-packages/), which means you now only need those dependencies relevant to your prediction task of interest! For example, you can now work with tabular data without having to [install](https://auto.gluon.ai/dev/install.html) dependencies required for AutoGluon's computer vision tasks (and vice versa). Unfortunately this improvement required a minor API change (eg. instead of `from autogluon import TabularPrediction`, you should now do: `from autogluon.tabular import TabularPredictor`), for all versions newer than v0.0.15. Documentation/tutorials under the old API may still be viewed [for version 0.0.15](https://auto.gluon.ai/0.0.15/index.html) which is the last released version under the old API.
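Two of the bundled changes also touch the evaluation APIs shown in the quickstart above: `leaderboard` gained an `extra_metrics` argument (autogluon#1058) and the `predict_proba` output was reworked so its columns follow the label column's class values (autogluon#1044). A hedged continuation of that quickstart, assuming string metric names are accepted:

```python
# Hedged sketch: reuses the predictor from the quickstart; metric names are assumptions.
leaderboard = predictor.leaderboard(
    test_data,
    extra_metrics=['f1', 'roc_auc'],  # autogluon#1058: report additional metrics next to the eval metric
)

# autogluon#1044: prediction probabilities come back with one column per class value.
pred_proba = predictor.predict_proba(test_data)
print(pred_proba.columns.tolist())
```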
VERSION: 2 changes (1 addition, 1 deletion)
@@ -1 +1 @@
- 0.1.1
+ 0.2.1
core/src/autogluon/core/_setup_utils.py: 8 changes (4 additions, 4 deletions)
@@ -14,11 +14,11 @@

# Only put packages here that would otherwise appear multiple times across different module's setup.py files.
DEPENDENT_PACKAGES = {
- 'numpy': '==1.19.5',
+ 'numpy': '==1.19.5', # TODO: v0.3 consider upgrading
'pandas': '>=1.0.0,<2.0',
- 'scikit-learn': '>=0.22.0,<0.25',
- 'scipy': '==1.5.4',
- 'gluoncv': '>=0.10.1,<0.12.0',
+ 'scikit-learn': '>=0.23.2,<0.25', # 0.22 crashes during efficient OOB in Tabular
+ 'scipy': '>=1.5.4,<1.7',
+ 'gluoncv': '>=0.10.1.post0,<0.11',
'tqdm': '>=4.38.0',
'Pillow': '<=8.1',
'graphviz': '<0.9.0,>=0.8.1',
core/src/autogluon/core/features/feature_metadata.py: 40 changes (38 additions, 2 deletions)
@@ -24,7 +24,8 @@ class FeatureMetadata:
type_group_map_special : Dict[str, List[str]], optional
Dictionary of special types to lists of feature names.
The keys can be anything, but it is generally recommended they be one of:
- ['binned', 'datetime_as_int', 'datetime_as_object', 'text', 'text_as_category', 'text_special', 'text_ngram', 'stack']
+ ['binned', 'datetime_as_int', 'datetime_as_object', 'text', 'text_as_category', 'text_special', 'text_ngram', 'image_path', 'stack']
+ For descriptions of each special feature-type, see: `autogluon.core.features.types`
Feature names that appear in the value lists must also be keys in type_map_raw.
Feature names are not required to have special types.
"""
@@ -56,7 +57,7 @@ def _validate(self):
# Note: This is not optimized for speed. Do not rely on this function during inference.
# TODO: Add valid_names, invalid_names arguments which override all other arguments for the features listed?
def get_features(self, valid_raw_types: list = None, valid_special_types: list = None, invalid_raw_types: list = None, invalid_special_types: list = None,
- required_special_types: list = None, required_raw_special_pairs: List[Tuple[str, List[str]]] = None, required_exact=False, required_at_least_one_special=False):
+ required_special_types: list = None, required_raw_special_pairs: List[Tuple[str, List[str]]] = None, required_exact=False, required_at_least_one_special=False) -> List[str]:
"""
Returns a list of features held within the feature metadata object after being pruned through the available parameters.
@@ -176,6 +177,41 @@ def keep_features(self, features: list, inplace=False):
features_to_remove = [feature for feature in self.get_features() if feature not in features]
return self.remove_features(features=features_to_remove, inplace=inplace)

def add_special_types(self, type_map_special: Dict[str, List[str]], inplace=False):
"""
Adds special types to features.
Parameters
----------
type_map_special : Dict[str, List[str]]
Dictionary of feature -> list of special types to add.
Features in dictionary must already exist in the FeatureMetadata object.
inplace : bool, default False
If True, updates self inplace and returns self.
If False, updates a copy of self and returns copy.
Returns
-------
:class:`FeatureMetadata` object.
Examples
--------
>>> from autogluon.core.features.feature_metadata import FeatureMetadata
>>> feature_metadata = FeatureMetadata({'FeatureA': 'int', 'FeatureB': 'object'})
>>> feature_metadata = feature_metadata.add_special_types({'FeatureA': ['MySpecialType'], 'FeatureB': ['MySpecialType', 'text']})
"""
if inplace:
metadata = self
else:
metadata = copy.deepcopy(self)
valid_features = set(self.get_features())

for feature, special_types in type_map_special.items():
if feature not in valid_features:
raise ValueError(f'"{feature}" does not exist in this FeatureMetadata object. Only existing features can be assigned special types.')
for special_type in special_types:
metadata.type_group_map_special[special_type].append(feature)
return metadata

@staticmethod
def _remove_features_from_type_group_map(d, features):
for key, features_orig in d.items():
core/src/autogluon/core/features/types.py: 3 changes (3 additions, 0 deletions)
@@ -31,6 +31,9 @@
# feature is a generated feature based off of a text feature that is an ngram.
S_TEXT_NGRAM = 'text_ngram'

+ # feature is an object type that contains a string path to an image that can be utilized in computer vision
+ S_IMAGE_PATH = 'image_path'

# feature is a generated feature based off of a ML model's prediction probabilities of the label column for the row.
# Any model which takes a stack feature as input is a stack ensemble.
S_STACK = 'stack'
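The new `image_path` special type and `FeatureMetadata.add_special_types` are designed to work together: a column of image file paths can be tagged so that image-aware models (such as the new ImagePredictorModel) can find it. A short sketch, where the `image` column is hypothetical:

```python
# Hedged sketch: 'image' is a hypothetical column holding image file paths.
from autogluon.core.features.feature_metadata import FeatureMetadata
from autogluon.core.features.types import S_IMAGE_PATH

feature_metadata = FeatureMetadata({'image': 'object', 'FeatureA': 'int'})

# Tag the path column with the new 'image_path' special type (returns a modified copy by default).
feature_metadata = feature_metadata.add_special_types({'image': [S_IMAGE_PATH]})

# Retrieve only the image-path features.
print(feature_metadata.get_features(required_special_types=[S_IMAGE_PATH]))  # ['image']
```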
core/src/autogluon/core/models/_utils.py: 32 changes (24 additions, 8 deletions)
@@ -1,14 +1,30 @@

from autogluon.core.utils.early_stopping import AdaptiveES, ES_CLASS_MAP


# TODO: Add more strategies
# - Adaptive early stopping: adjust rounds during model training
def get_early_stopping_rounds(num_rows_train, strategy='auto', min_rounds=10, max_rounds=150, min_rows=10000):
def get_early_stopping_rounds(num_rows_train, strategy='auto', min_patience=10, max_patience=150, min_rows=10000):
if isinstance(strategy, (tuple, list)):
strategy = list(strategy)
if isinstance(strategy[0], str):
if strategy[0] in ES_CLASS_MAP:
strategy[0] = ES_CLASS_MAP[strategy[0]]
else:
raise AssertionError(f'unknown early stopping strategy: {strategy}')
return strategy

"""Gets early stopping rounds"""
if strategy == 'auto':
modifier = 1 if num_rows_train <= min_rows else min_rows / num_rows_train
early_stopping_rounds = max(
round(modifier * max_rounds),
min_rounds,
)
strategy = 'simple'

modifier = 1 if num_rows_train <= min_rows else min_rows / num_rows_train
simple_early_stopping_rounds = max(
round(modifier * max_patience),
min_patience,
)
if strategy == 'simple':
return simple_early_stopping_rounds
elif strategy == 'adaptive':
return AdaptiveES, dict(adaptive_offset=min_patience, min_patience=simple_early_stopping_rounds)
else:
raise AssertionError(f'unknown early stopping strategy: {strategy}')
return early_stopping_rounds
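In the reworked `get_early_stopping_rounds`, the 'simple' strategy keeps the full `max_patience` while the training set has at most `min_rows` rows and then shrinks the patience proportionally, never dropping below `min_patience`; the 'adaptive' strategy instead returns the `AdaptiveES` class plus its kwargs for the caller to construct. A standalone restatement of the 'simple' arithmetic:

```python
# Standalone restatement of the 'simple' patience arithmetic shown above
# (not an import of the AutoGluon function itself).
def simple_early_stopping_rounds(num_rows_train, min_patience=10, max_patience=150, min_rows=10000):
    modifier = 1 if num_rows_train <= min_rows else min_rows / num_rows_train
    return max(round(modifier * max_patience), min_patience)

print(simple_early_stopping_rounds(5_000))      # 150: small data keeps the full patience
print(simple_early_stopping_rounds(100_000))    # 15: 10000 / 100000 * 150
print(simple_early_stopping_rounds(1_000_000))  # 10: floored at min_patience
```

Per the `ag.early_stop` docstring further down, a model selects between these strategies through its hyperparameters, either as an int, a preset string, or a `(class, kwargs)` tuple.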
core/src/autogluon/core/models/abstract/abstract_model.py: 15 changes (10 additions, 5 deletions)
@@ -483,7 +483,12 @@ def _convert_proba_to_unified_form(self, y_pred_proba):
For multiclass and softclass classification, keeps y_pred_proba as a 2 dimensional array of prediction probabilities for each class.
For regression, converts y_pred_proba to a 1 dimensional array of predictions.
"""
- if self.problem_type == BINARY:
+ if self.problem_type == REGRESSION:
+     if len(y_pred_proba.shape) == 1:
+         return y_pred_proba
+     else:
+         return y_pred_proba[:, 1]
+ elif self.problem_type == BINARY:
if len(y_pred_proba.shape) == 1:
return y_pred_proba
elif y_pred_proba.shape[1] > 1:
@@ -492,8 +497,8 @@ def _convert_proba_to_unified_form(self, y_pred_proba):
return y_pred_proba
elif y_pred_proba.shape[1] > 2: # Multiclass, Softclass
return y_pred_proba
- else: # Regression
-     return y_pred_proba[:, 1]
+ else: # Unknown problem type
+     raise AssertionError(f'Unknown y_pred_proba format for `problem_type="{self.problem_type}"`.')

def score(self, X, y, metric=None, sample_weight=None, **kwargs):
if metric is None:
@@ -951,7 +956,7 @@ def _get_model_params(self) -> dict:
else:
return self._get_params()

- # TODO: Add documentation for valid args for each model. Currently only `ag.es`
+ # TODO: Add documentation for valid args for each model. Currently only `ag.early_stop`
def _ag_params(self) -> set:
"""
Set of params that are not passed to self.model, but are used by the wrapper.
@@ -963,7 +968,7 @@ def _ag_params(self) -> set:
Possible params:
- ag.es : int, str, or tuple
+ ag.early_stop : int, str, or tuple
generic name for early stopping logic. Typically can be an int or a str preset/strategy.
Also possible to pass tuple of (class, kwargs) to construct a custom early stopping object.
Refer to `autogluon.core.utils.early_stopping` for examples.
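The reordered branches above make the output contract of `_convert_proba_to_unified_form` explicit: regression and binary predictions collapse to a 1-D array, multiclass/softclass stays 2-D, and anything else now raises instead of being silently treated as regression. A minimal standalone sketch of that shape handling (not the AutoGluon method itself):

```python
# Standalone sketch mirroring the branch logic above; problem-type strings are illustrative.
import numpy as np

def to_unified_form(y_pred_proba, problem_type):
    if problem_type in ('regression', 'binary'):
        # 1-D arrays pass through; 2-D arrays keep only the positive-class column.
        return y_pred_proba if y_pred_proba.ndim == 1 else y_pred_proba[:, 1]
    if y_pred_proba.ndim == 2 and y_pred_proba.shape[1] > 2:
        return y_pred_proba  # multiclass / softclass probabilities stay 2-D
    raise AssertionError(f'Unknown y_pred_proba format for problem_type="{problem_type}"')

binary_proba = np.array([[0.3, 0.7], [0.9, 0.1]])
print(to_unified_form(binary_proba, 'binary'))  # [0.7 0.1]
```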
(additional changed file; file name not shown in this view)
@@ -319,7 +319,7 @@ def _fit_single(self, X, y, model_base, use_child_oof, time_limit, **kwargs):
logger.log(15, '\t`use_child_oof` was specified for this model. It will function similarly to a bagged model, but will only fit one child model.')
time_start_predict = time.time()
if model_base._get_tags().get('valid_oof', False):
- self._oof_pred_proba = model_base.get_oof_pred_proba(X=X)
+ self._oof_pred_proba = model_base.get_oof_pred_proba(X=X, y=y)
else:
logger.warning('\tWARNING: `use_child_oof` was specified but child model does not have a dedicated `get_oof_pred_proba` method. This model may have heavily overfit validation scores.')
self._oof_pred_proba = model_base.predict_proba(X=X)
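The `use_child_oof` path above asks the child model for its own out-of-fold predictions instead of refitting it across bagged folds; the diff also passes `y`, presumably so rows that never land out-of-bag can still be filled in. For the RF/XT models this efficient OOF maps naturally onto scikit-learn's out-of-bag machinery; a hedged sketch of that underlying mechanism (not AutoGluon's wrapper):

```python
# Hedged sketch: shows the sklearn OOB mechanism that efficient OOF for RF/XT models
# can build on; this is not the AutoGluon BaggedEnsembleModel API.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# 300 estimators mirrors the commit note above; oob_score=True records out-of-bag predictions.
model = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=0)
model.fit(X, y)

# A single fit yields out-of-bag probabilities for (almost) every training row,
# so no extra k-fold refits are needed to obtain OOF-style scores.
print(model.oob_decision_function_.shape)  # (1000, 2)
```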
