# setup.py

from setuptools import setup

long_description = '''
Give automl-gs an input CSV file and a target field you want to predict, and get a trained, high-performing machine learning or deep learning model plus native code pipelines that let you integrate that model into any prediction workflow. No black box: you can see *exactly* how the data is processed and how the model is constructed, and you can make tweaks as necessary.

automl-gs is an AutoML tool which, unlike Microsoft's [NNI](https://github.com/Microsoft/nni), Uber's [Ludwig](https://github.com/uber/ludwig), and [TPOT](https://github.com/EpistasisLab/tpot), offers a *zero code/model definition interface* for getting an optimized model and data transformation pipeline in multiple popular ML/DL frameworks, with minimal Python dependencies (pandas + scikit-learn + your framework of choice). automl-gs is designed for citizen data scientists and engineers without a deep statistical background, under the philosophy that you don't need to know any modern data preprocessing or machine learning engineering techniques to create a powerful prediction workflow.

Nowadays, the cost of computing many different models and hyperparameters is much lower than the opportunity cost of a data scientist's time. automl-gs is a Python 3 module designed to abstract away the common approaches to transforming tabular data, architecting machine learning/deep learning models, and performing random hyperparameter searches to identify the best-performing model. This lets data scientists and researchers spend their time on optimizing model performance instead.

* Generates native Python code; no platform lock-in, and no need to use automl-gs after the model script is created.
* Train model configurations super-fast *for free* using a **TPU** in Google Colaboratory.
* Handles messy datasets that normally require manual intervention, such as datetime/categorical encoding and spaced/parenthesized column names.
* Each part of the generated model pipeline is its own function with docstrings, making it much easier to integrate into production workflows.
* Extremely detailed metrics reporting for every trial stored in a tidy CSV, allowing you to identify and visualize model strengths and weaknesses.
* Correct serialization of data pipeline encoders on disk (i.e., no pickled Python objects!)
* Retrain the generated model on new data without making any code/pipeline changes.
* Quit the hyperparameter search at any time, as the results are saved after each trial.
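
For illustration only, here is a minimal sketch of what pickle-free encoder serialization can look like, assuming a categorical column whose vocabulary is stored as a plain JSON object. The helpers `fit_vocab`, `save_vocab`, `load_vocab`, and `encode` are hypothetical and do not reflect the file format automl-gs actually writes.

```python
import json


def fit_vocab(values):
    # Map each distinct category to an integer index; sorting makes it deterministic.
    return {value: index for index, value in enumerate(sorted(set(values)))}


def save_vocab(vocab, path):
    # Plain JSON on disk: human-readable and loadable from any language,
    # unlike a pickled Python object. JSON object keys are always strings,
    # so this sketch assumes string categories.
    with open(path, 'w') as f:
        json.dump(vocab, f)


def load_vocab(path):
    with open(path) as f:
        return json.load(f)


def encode(values, vocab, unknown=-1):
    # Categories not seen during fitting map to the `unknown` sentinel.
    return [vocab.get(value, unknown) for value in values]
```

Because the vocabulary is plain JSON, it can be inspected by hand or reloaded without importing the code that produced it.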

The models generated by automl-gs are intended to give a very strong *baseline* for solving a given problem; they're not the be-all and end-all that the AutoML hype often promises, but the resulting code is easy to tweak to improve on the baseline.
'''


setup(
    name='automl_gs',
    packages=['automl_gs'],  # the importable package directory to include
    version='0.2.1',
    description=('Provide an input CSV and a target field to predict, '
                 'and generate a model + code to run it.'),
    long_description=long_description,
    long_description_content_type='text/markdown',
    author='Max Woolf',
    author_email='max@minimaxir.com',
    url='https://github.com/minimaxir/automl-gs',
    keywords=['deep learning', 'tensorflow', 'keras', 'automl', 'xgboost'],
    classifiers=[],
    license='MIT',
    entry_points={
        'console_scripts': ['automl_gs=automl_gs.automl_gs:cmd'],
    },
    python_requires='>=3.5',
    include_package_data=True,
    install_requires=[
        'pandas',
        'scikit-learn',
        'autopep8',
        'tqdm',
        'jinja2>=2.8',
        'pyyaml',
    ],
)