GitHub - usaa/vogel: An ML project flow tool, with the primary objective of simplifying actuarial ML processes.

Vogel is an ML project flow tool whose primary objective is to simplify actuarial ML processes. It tracks and manages model development from data preparation through results analysis and visualization.

Install

Clone the Vogel repo.
From the root of the cloned repo, install Vogel in editable mode:
- pip install -e .

Features

Visualization
- One-way plots (predicted vs actual values)
- Multi-variate plots (individual feature analysis)
- Pareto charts
- Model stats comparison chart
Custom Variable Transformations
- Maintains metadata
- Multiple binning mechanisms
Model Comparison Statistics
- Available statistics vary by model type
Interfaces with multiple modeling platforms

Example

Pipelines are pandas in, pandas out: they take a DataFrame and return a DataFrame, and all metadata is carried through to the transformed data.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from IPython.display import display, HTML

import vogel.preprocessing as v_prep
import vogel.utils as v_utils
import vogel.utils.stats as v_stats
import vogel.train as v_train

# Test Data
df = pd.DataFrame({
      'a': [200., 40., 60., 100., 10., 10., 10.]
    , 'b': [100., 20., 30., np.nan, 5., 5., 5.]
    , 'c': ['texas', 'texas', 'michigan', 'colorado', 'michigan', 'michigan', 'michigan']
    , 'd': ['texas', 'texas', 'michigan', np.nan, 'michigan', 'michigan', 'michigan']
    , 'e': [1., 1., 1., 1., 1., 1., 1.]
    , 'f': [0., 10., 20, 1., 2., 20., 1000.]
})

display(df)

data_dict = {
    'grp_numeric': ['a', 'b']
  , 'grp_cat': ['c', 'd']
  , 'grp_other': ['a', 'c']
}

pipeline = v_utils.make_pipeline(
    v_prep.FeatureUnion([
        ('numeric', v_utils.make_pipeline(
            v_prep.ColumnExtractor(['grp_numeric', 'd'], data_dict, want_numeric = True),
            v_prep.NullEncoder(),
            v_prep.Imputer(),
            v_prep.Binning(bin_type='qcut', bins=3, bin_id='mean', drop='replace', feature_filter=['a'])
        )),
        ('cats', v_utils.make_pipeline(
            v_prep.ColumnExtractor(['grp_numeric', 'd'], data_dict, want_numeric = False),
            v_prep.LabelEncoder()
        ))
    ])
)

train_X = pipeline.fit_transform(df)

display(train_X)
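
For readers unfamiliar with quantile binning, here is a minimal plain-pandas sketch of what a step such as `Binning(bin_type='qcut', bins=3, bin_id='mean', drop='replace')` might do: cut a numeric column into equal-frequency bins and replace each value with the mean of its bin. This is an assumption about the transformer's behavior, not Vogel's implementation. `bin_by_quantile_mean` is a hypothetical helper, and the sample values are made up.

```python
import pandas as pd


def bin_by_quantile_mean(values: pd.Series, bins: int = 3) -> pd.Series:
    """Replace each value with the mean of its equal-frequency bin.

    Hypothetical helper for illustration; not part of Vogel.
    """
    # pandas raises on duplicate quantile edges by default; duplicates='drop'
    # merges such bins instead, so fewer than `bins` bins may come back when
    # one value repeats often.
    labels = pd.qcut(values, q=bins, duplicates='drop')
    # Group rows by bin label and broadcast each bin's mean back to its rows.
    return values.groupby(labels, observed=True).transform('mean')


sample = pd.Series([1., 2., 3., 4., 5., 6.], name='a')
binned = bin_by_quantile_mean(sample)
print(binned.tolist())
```

The result keeps the original index and name, so it can be assigned straight back into the source DataFrame.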

We can now run a few models on this transformed data. We will ignore the validation and hyperparameter tuning options for now.

train_y = df['f'] 

run_list = [
    {
        'model_type': v_train.V_SM_GLM,
        'model_name': 'simple' + '_SM_glm_tweedie',
        'model_params': {
            'family': sm.families.Gaussian()  # Gaussian family, despite 'tweedie' in the model name
        },
        'fit_params': {
        }
    }, 
    {
        'model_type': v_train.V_xgb,
        'model_name': 'simple_1' + '_xgb',
        'model_params': {
            'objective': 'reg:linear',  # deprecated alias of 'reg:squarederror' in newer XGBoost releases
            'n_estimators': 1,
            'n_jobs': -1
        },
        'fit_params': {
            'eval_set': [(train_X, train_y)],
            'verbose': False
        }
    }
    ,
    {
        'model_type': v_train.V_xgb,
        'model_name': 'simple_80' + '_xgb',
        'model_params': {
            'objective': 'reg:linear',  # deprecated alias of 'reg:squarederror' in newer XGBoost releases
            'n_estimators': 80,
            'n_jobs': -1
        },
        'fit_params': {
            'eval_set': [(train_X, train_y)],
            'verbose': False
        }
    }
]

train_data_dict = {
    'X': train_X, 
    'y': train_y
}

model_runner = v_train.ModelRunner('reg', run_list, train_data_dict,
                                   None, pipeline)

eval_set = model_runner.evaluate_models()
display(eval_set)
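
The layout of the table returned by `evaluate_models()` is not shown here. As a rough illustration of the kind of comparison it enables, the sketch below builds a small per-model statistics table with NumPy and pandas. The function `compare_regression_stats`, the choice of RMSE and MAE, and the toy predictions are all assumptions for illustration, not Vogel's API or output.

```python
import numpy as np
import pandas as pd


def compare_regression_stats(y_true, predictions: dict) -> pd.DataFrame:
    """Build one row of error statistics per model (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    rows = {}
    for name, y_pred in predictions.items():
        err = np.asarray(y_pred, dtype=float) - y_true
        rows[name] = {
            'rmse': float(np.sqrt(np.mean(err ** 2))),
            'mae': float(np.mean(np.abs(err))),
        }
    # Models become the index, statistics become the columns.
    return pd.DataFrame.from_dict(rows, orient='index')


y = [0., 10., 20., 30.]
stats = compare_regression_stats(y, {
    'always_mean': [15., 15., 15., 15.],  # toy baseline
    'off_by_two': [2., 12., 22., 32.],    # toy model with a constant bias
})
print(stats)
```

Keeping one row per model makes it easy to sort by any statistic or to pass the table to a bar chart.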

With the stats package we can visualize how well our models fit. We will choose the GLM, as it is the simplest of the best-fitting models.

v_stats.plot_compare_stats(eval_set, valid_only=False)

We can see how individual features fit in our model.

mdl_glm = model_runner.models[0]
print('b')
v_stats.plot_one_way_fit(train_X['b'], mdl_glm.predict(train_X), target=train_y, target_error=True, pad_bar_chart=True)
mdl_glm.plot_glm_one_way_fit(plot_error=False)
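
A one-way plot compares average actual and predicted values across the levels of a single feature. The sketch below computes that underlying table with pandas. It is an assumption about what such a plot summarizes, not the implementation of `plot_one_way_fit`; `one_way_table` and the toy data are hypothetical.

```python
import pandas as pd


def one_way_table(feature, actual, predicted) -> pd.DataFrame:
    """Average actual and predicted values per feature level (hypothetical helper)."""
    frame = pd.DataFrame({
        'feature': list(feature),
        'actual': list(actual),
        'predicted': list(predicted),
    })
    grouped = frame.groupby('feature')
    table = grouped[['actual', 'predicted']].mean()
    # Row counts show how much data supports each level's averages.
    table['count'] = grouped.size()
    return table


table = one_way_table(
    feature=[5., 5., 20., 20., 100.],
    actual=[1., 3., 10., 20., 50.],
    predicted=[2., 2., 12., 16., 45.],
)
print(table)
```

Levels where the two averages diverge, especially levels with a high count, are the ones worth inspecting in the plot.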

More examples
