Exploratory Data Analysis using Pandas, Seaborn and Statsmodels
This course is a CrashProgram (short course) that introduces exploratory data analysis, using credit risk data as the use case.
- Learn the concepts and techniques of Exploratory Data Analysis
- Touch upon the issue of bias and how to mitigate it
- Learn about more advanced data storage formats such as HDF (Hierarchical Data Format)
- Learn basic exploratory data analysis using pandas
- Create standard graphs using seaborn
- Calculate contingency tables, Weight of Evidence (WoE) and Information Value using pandas, scipy and statsmodels
The course is live at the Open Risk Academy; this repository hosts the Python scripts used in the course. The scripts can be used standalone, but documentation is minimal.
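As an illustration of the last item, here is a minimal sketch of a contingency table, WoE, Information Value and a chi-square test. It assumes a binary default flag where 1 marks a default ("bad") and 0 a non-default ("good"); the column names `grade` and `default`, the helper `woe_iv` and the toy data are hypothetical, and this is not necessarily how the course scripts compute these quantities. The WoE sign convention used here, ln(share of goods / share of bads), is one common choice; some texts use the inverse.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def woe_iv(df: pd.DataFrame, feature: str, target: str) -> tuple[pd.DataFrame, float]:
    """Weight of Evidence per category and total Information Value.

    Hypothetical helper; assumes `target` is 1 for a default ("bad"), 0 otherwise.
    """
    # Contingency table: rows are categories, columns are target values 0 and 1
    table = pd.crosstab(df[feature], df[target])
    goods = table[0]
    bads = table[1]
    # Share of all goods and of all bads falling in each category
    dist_good = goods / goods.sum()
    dist_bad = bads / bads.sum()
    # WoE = ln(%good / %bad); infinite if a category has zero goods or zero bads
    woe = np.log(dist_good / dist_bad)
    # IV = sum over categories of (%good - %bad) * WoE
    iv_terms = (dist_good - dist_bad) * woe
    result = pd.DataFrame({
        "goods": goods,
        "bads": bads,
        "dist_good": dist_good,
        "dist_bad": dist_bad,
        "woe": woe,
        "iv": iv_terms,
    })
    return result, float(iv_terms.sum())


# Hypothetical toy data: rating grade and default flag
data = pd.DataFrame({
    "grade": ["A"] * 50 + ["B"] * 30 + ["C"] * 20,
    "default": [0] * 48 + [1] * 2 + [0] * 25 + [1] * 5 + [0] * 12 + [1] * 8,
})

table, iv = woe_iv(data, "grade", "default")
print(table)
print(f"Information Value: {iv:.3f}")

# Chi-square test of independence between grade and default on the same table
chi2, p_value, dof, expected = chi2_contingency(pd.crosstab(data["grade"], data["default"]))
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")
```

Both measures start from the same `pd.crosstab` table: the chi-square test asks whether grade and default are associated at all, while WoE and IV describe the direction and strength of that association per category.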
- Step 1: Importing data using pandas
- Step 2: Blindfolding data and saving in HDF format to preserve metadata
- Step 3: Univariate statistics for numerical and categorical variables
- Step 4: Histograms and Barplots using Seaborn
- Step 5: Identifying outliers visually and numerically
- Step 6: Scatterplots, correlations and correlation heatmaps
- Step 7: Contingency tables and mosaic plots
- Step 8: Assessing association using Chi-Square tests and Information Value
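Steps 2 and 3 can be sketched as follows, under stated assumptions. Here "blindfolding" is taken to mean replacing descriptive column names with neutral labels, so that the analyst's expectations do not bias the exploration, while the mapping is kept aside to reverse it later; the course may define it differently. The helpers `blindfold` and `univariate_summary` and the sample columns are hypothetical, not taken from the course scripts. Saving the blindfolded frame with `DataFrame.to_hdf` (which requires the PyTables package) is left out to keep the sketch dependency-free.

```python
import pandas as pd


def blindfold(df: pd.DataFrame) -> tuple[pd.DataFrame, dict[str, str]]:
    """Replace column names with neutral labels (X1, X2, ...).

    Hypothetical helper; returns the renamed frame and the mapping to undo it.
    """
    mapping = {col: f"X{i + 1}" for i, col in enumerate(df.columns)}
    return df.rename(columns=mapping), mapping


def univariate_summary(df: pd.DataFrame) -> tuple[pd.DataFrame, dict[str, pd.Series]]:
    """Summary statistics for numerical columns, frequency tables for the rest."""
    numeric = df.select_dtypes(include="number")
    categorical = df.select_dtypes(exclude="number")
    # count, mean, std, min, quartiles and max, one row per numerical column
    num_stats = numeric.describe().T
    # absolute frequencies (missing values counted too) per categorical column
    cat_freqs = {col: categorical[col].value_counts(dropna=False) for col in categorical.columns}
    return num_stats, cat_freqs


# Hypothetical loan-level sample standing in for the course data
loans = pd.DataFrame({
    "income": [32000, 45000, 51000, 28000, 61000, 39000],
    "loan_amount": [5000, 12000, 8000, 4000, 20000, 7000],
    "purpose": ["car", "home", "car", "education", "home", "car"],
})

blind, mapping = blindfold(loans)
num_stats, cat_freqs = univariate_summary(blind)
print(num_stats)
print(cat_freqs["X3"])
```

Keeping `mapping` separate from the data means the original meaning of each column can be restored once the exploratory findings have been recorded.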
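For the numerical part of Step 5, one widely used rule (assumed here; the course may use a different criterion) is Tukey's fences: flag values lying more than 1.5 interquartile ranges below the first quartile or above the third. The function `iqr_outliers` and the sample amounts are hypothetical.

```python
import pandas as pd


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (s < lower) | (s > upper)


# Hypothetical loan amounts with one suspiciously large value
amounts = pd.Series([5000, 7000, 8000, 6500, 7200, 250000], name="loan_amount")
mask = iqr_outliers(amounts)
print(amounts[mask])
```

This matches the visual check: `seaborn.boxplot` draws its whiskers at 1.5 IQR by default, so the points it plots individually are the same ones this mask flags.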
If you get stuck on any issue with the course or the Academy:
- If the issue is related to the course topics / material, first check the Course Forum (Chat)
- Join the course discussion in the Open Risk Commons
- If the issue is related to the operation of the Open Risk Academy, first check the Academy FAQ
- If the issue persists contact us at info at openrisk dot eu