This is a cookiecutter template for bootstrapping your work on Kaggle competitions. It contains:
- a directory structure for organizing your notebooks, data, models, figures, tasks, and the source code you reuse across notebooks
- a conda environment file with the basic Python libraries and some extras:
  - numpy / pandas / scikit-learn / seaborn / statsmodels / plotly / jupyterlab: the classic Data Science stack
  - keras and lightgbm for prediction
  - pyspark and h2o for distributed processing
  - pandas-profiling for generating HTML reports on pandas DataFrames
  - missingno for missing data analysis
  - invoke, a replacement for Makefiles, for managing project tasks
  - nbdime for diffing and merging notebooks
  - kaggle-api, a CLI for interacting with the Kaggle API
  - path.py for working with files and paths in Python
Requirements:
- Anaconda >= 5.x
- Cookiecutter >= 1.4.0, which can be installed with pip or conda depending on how you manage your Python packages:
$ pip install cookiecutter
or
$ conda config --add channels conda-forge
$ conda install cookiecutter
In the folder where you want your project generated, run:
$ cookiecutter https://github.com/andfanilo/cookiecutter-kaggle.git
You can also clone the template into <path/to/template> and then, from the folder where you want to generate your project, run:
$ cookiecutter <path/to/template>
Cookiecutter will prompt you for the following values:
- full_name
- email
- project_name
- project_short_description
- version
Fill in the values for your project and voilà! Then follow the README inside your new project for the remaining installation steps.
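If you would rather script project generation than answer the prompts interactively, cookiecutter also exposes a Python API. The sketch below wraps cookiecutter.main.cookiecutter with no_input=True and passes your answers through extra_context; the helper name generate_project and the example values are hypothetical.

```python
from cookiecutter.main import cookiecutter

TEMPLATE = "https://github.com/andfanilo/cookiecutter-kaggle.git"


def generate_project(template: str, output_dir: str = ".", **answers: str) -> str:
    """Render `template` into `output_dir` without interactive prompts.

    Keyword arguments (full_name, email, project_name, ...) override the
    template defaults. Returns the path of the generated project.
    """
    return cookiecutter(
        template,
        no_input=True,          # skip the prompts: defaults plus overrides
        extra_context=answers,  # answers to the prompts, keyed by name
        output_dir=output_dir,
    )


# Example call with placeholder answers (clones the template from GitHub,
# so it needs network access):
# generate_project(
#     TEMPLATE,
#     full_name="Jane Doe",
#     email="jane.doe@example.com",
#     project_name="my-kaggle-project",
#     project_short_description="Baseline for a Kaggle competition",
#     version="0.1.0",
# )
```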
All contributions, bug reports, bug fixes, documentation improvements, enhancements and ideas are welcome.
This project is heavily influenced by DrivenData's Cookiecutter Data Science.
Other links that helped shape this cookiecutter: