Academy-Course-DAT31048

Exploratory Data Analysis using Pandas, Seaborn and Statsmodels

This course is a CrashProgram (short course) that introduces exploratory data analysis, using credit risk data as the use case.

Course objectives

Learn the concepts and techniques of Exploratory Data Analysis
Touch upon the issue of bias and how to mitigate it
Learn about more advanced data formats such as HDF
Learn basic exploratory data analysis using pandas
Create standard graphs using seaborn
Calculate contingency tables, Weight of Evidence (WoE) and Information Value (IV) using pandas, scipy and statsmodels
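
As a rough illustration of the last objective, the sketch below builds a contingency table with `pandas.crosstab` and derives Weight of Evidence and Information Value from it. It is not taken from the course scripts: the toy `loans` frame, the `woe_iv` helper and the coding 1 = default (bad) / 0 = non-default (good) are assumptions. WoE is taken here as ln(%good / %bad); some texts use the reciprocal, which flips the sign of every WoE but leaves IV unchanged.

```python
import numpy as np
import pandas as pd


def woe_iv(df: pd.DataFrame, feature: str, target: str):
    """Hypothetical helper: per-category WoE table and total Information Value.

    Assumes `target` is coded 1 = bad (default) and 0 = good, and uses
    WoE = ln(%good / %bad). A category with zero goods or zero bads gives an
    infinite WoE; this sketch applies no smoothing for that case.
    """
    # Contingency table: one row per category, one column per target value
    counts = pd.crosstab(df[feature], df[target])
    good_share = counts[0] / counts[0].sum()  # share of all goods in each category
    bad_share = counts[1] / counts[1].sum()   # share of all bads in each category
    woe = np.log(good_share / bad_share)
    # IV sums (%good - %bad) * WoE over all categories
    iv = float(((good_share - bad_share) * woe).sum())
    table = pd.DataFrame({"good": counts[0], "bad": counts[1], "woe": woe})
    return table, iv


# Hypothetical toy sample: a categorical attribute and a binary default flag
loans = pd.DataFrame({
    "housing": ["own", "own", "own", "rent", "rent", "free", "own", "rent", "free", "own"],
    "default": [0, 0, 1, 1, 0, 1, 0, 1, 0, 0],
})

table, iv = woe_iv(loans, "housing", "default")
print(table)
print(f"IV = {iv:.4f}")
```

A `pd.crosstab` table like the one built inside `woe_iv` can also be passed to `scipy.stats.chi2_contingency` to test independence between the attribute and the default flag. With counts as small as in this toy sample the chi-square approximation is unreliable, so the output here is illustrative only.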

The course is live at the Open Risk Academy. This repository hosts the Python scripts used in the course. The scripts can be used standalone, but documentation is minimal.

Brief Description

Step 1: Importing data using pandas
Step 2: Blindfolding data and saving it in HDF format to preserve metadata
Step 3: Univariate statistics for numerical and categorical variables
Step 4: Histograms and Barplots using Seaborn
Step 5: Identifying outliers visually and numerically
Step 6: Scatterplots, correlations and correlation heatmaps
Step 7: Contingency tables and mosaic plots
Step 8: Assessing association using Chi-Square tests and Information Value
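
Steps 2, 3 and 5 can be sketched in a few lines of pandas. This is an illustrative sketch, not the course scripts themselves: it assumes that "blindfolding" means replacing descriptive column names with neutral labels before looking at the data, the `blindfold` and `iqr_outliers` helpers and the toy values are hypothetical, and outliers are flagged with the common 1.5 × IQR (Tukey fence) rule, which may differ from the rule used in the course.

```python
import pandas as pd


def blindfold(df: pd.DataFrame):
    """Hypothetical helper: replace column names with neutral labels.

    Returns the renamed frame and the original-to-neutral name mapping,
    so the blindfolding can be undone after the analysis.
    """
    mapping = {col: f"var_{i}" for i, col in enumerate(df.columns)}
    return df.rename(columns=mapping), mapping


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey fences)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Made-up sample standing in for the credit data
raw = pd.DataFrame({
    "duration_months": [6, 12, 12, 18, 24, 24, 36, 48, 12, 72],
    "credit_amount": [1200, 2500, 1800, 3000, 4200, 3900, 5200, 7800, 2100, 18400],
    "purpose": ["car", "tv", "car", "furniture", "car", "tv", "business", "car", "tv", "business"],
})

# Step 2 (assumed meaning): hide the descriptive variable names
data, names = blindfold(raw)

# Step 3: univariate statistics, split by variable type
numeric = data.select_dtypes(include="number")
categorical = data.select_dtypes(exclude="number")
print(numeric.describe())
for col in categorical.columns:
    print(categorical[col].value_counts())

# Step 5: numerical outlier report, one count per numeric variable
outlier_counts = {col: int(iqr_outliers(numeric[col]).sum()) for col in numeric.columns}
print(outlier_counts)
```

To persist the blindfolded frame together with its dtypes, pandas offers `DataFrame.to_hdf` (for example `data.to_hdf("eda.h5", key="data")`, with a hypothetical file name), which requires the optional PyTables package; it is left out of the sketch so that it runs with pandas alone. Keeping the `names` mapping makes it possible to restore the original variable names once the exploratory judgements have been recorded.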

Where To Get Help:

If you get stuck on any issue with the course or the Academy:

If the issue is related to the course topics or material, first check the Course Forum (Chat)
Join the course discussion in the Open Risk Commons
If the issue is related to the operation of the Open Risk Academy, first check the Academy FAQ
If the issue persists contact us at info at openrisk dot eu

Academy Course Catalog

Course List and Description

Folders and files

.idea
LICENSE
README.md
Step_1_import_data.py
Step_2_blindfolding_data.py
Step_3_univariate_stats.py
Step_4_univariate_graphs.py
Step_5a_outlier_graphs.py
Step_5b_outlier_report.py
Step_6a_scatterplots.py
Step_6b_correlation_heatmap.py
Step_7_association_stats.py
Step_8_association_survey.py
categorical_variable_barplots.png
eda_plot.png
failed_scatter_plot.png
fosdem_nouns.xlsx
german_credit.csv
mosaic_association_plot.png
numerical_variable_histograms.png
outlier_plot_0.png
outlier_plot_1.png
outlier_plot_2.png
requirements.txt
standard_scatter_plot.png