katonic-dev/explainit

Explainit



What is Explainit?

Explainit is a modern, enterprise-ready business intelligence web application that reuses existing frameworks to manage and serve dashboards across the machine learning project lifecycle.

Features

Explainit allows ML platform teams to:

  • Analyze drift in the existing data stack (features and targets).
  • Prepare a concise summary of production data.
  • Run data quality checks that give an overview of each feature.
  • Analyze the relationships between features and the target in depth.
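A basic version of the data quality overview listed above can be computed with plain pandas. This is a minimal sketch, not Explainit's internal implementation. The function name quality_summary and the sample column names are hypothetical.

```python
import pandas as pd


def quality_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column overview: dtype, share of missing values, number of distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing_share": df.isna().mean(),        # fraction of NaN/None per column
        "n_unique": df.nunique(dropna=True),      # distinct non-missing values
    })


# Tiny illustrative frame; the column names are hypothetical.
sample = pd.DataFrame({
    "loan_amnt": [1000.0, 2500.0, None, 4000.0],
    "grade": ["A", "B", "B", None],
})
summary = quality_summary(sample)
print(summary)
```

Running the same function on the reference and production frames side by side is a quick way to spot columns whose missing share or cardinality changed.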

Who is Explainit for?

Explainit helps ML platform teams with DevOps experience monitor productionized batch data. Explainit can also help these teams build towards an explainability/monitoring platform that improves collaboration between engineers and data scientists.

Explainit is likely not the right tool if you:

  • Are in an organization that’s just getting started with ML and is not yet sure what the business impact of ML is.
  • Rely primarily on unstructured data.

Quick Concepts on Drift

What is Model Drift?

Model Drift (also known as model decay) refers to the degradation of a model's predictive power caused by changes in the environment or in feature distributions, and therefore in the relationships between variables.

Types of Model Drift

There are three main types of model drift:

  • Concept drift
  • Data drift
  • Upstream data changes

Concept drift is a type of model drift where the relationship between the input and the target changes over time. It usually occurs when the real-world environment changes relative to the training data the model learned from. For example, customer behaviour can change over time, lowering the accuracy of a model trained on historic customer datasets.
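A toy illustration (not taken from Explainit) makes the idea concrete: the input distribution stays exactly the same, but the rule linking inputs to labels moves, so a model that learned the old rule loses accuracy. The threshold "model" and the 50/70 cutoffs are made up for illustration.

```python
import random


def accuracy(model, xs, ys):
    """Fraction of inputs where the model's prediction matches the label."""
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(xs)


def learned_model(x):
    """Hypothetical model that learned the historic rule: score above 50 means label 1."""
    return int(x > 50)


rng = random.Random(0)
xs = [rng.uniform(0, 100) for _ in range(1000)]   # same inputs in both periods

old_labels = [int(x > 50) for x in xs]   # historic relationship
new_labels = [int(x > 70) for x in xs]   # behaviour shifted: the cutoff moved to 70

print(accuracy(learned_model, xs, old_labels))  # 1.0
print(accuracy(learned_model, xs, new_labels))  # noticeably lower, roughly 0.8
```

Because the inputs never change, a check on feature distributions alone would not catch this; it shows up only when predictions are compared with fresh labels or when target behaviour is monitored.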

Data drift is a type of model drift where the statistical properties of the independent variables (the features) change. Examples include shifts caused by seasonality, changes in consumer preferences, and the addition of new products.
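This section does not say which statistical test Explainit applies to detect data drift. As an assumption, the sketch below uses the two-sample Kolmogorov-Smirnov (KS) statistic, a common choice for numerical features: the largest vertical gap between the empirical CDFs of the reference and production values. scipy.stats.ks_2samp computes the same statistic together with a p-value.

```python
from bisect import bisect_right


def ks_statistic(reference, production):
    """Two-sample KS statistic: max |F_ref(x) - F_prod(x)| over all observed x."""
    ref = sorted(reference)
    prod = sorted(production)
    n, m = len(ref), len(prod)
    d = 0.0
    # Both empirical CDFs are step functions, so the maximum gap
    # is attained at one of the observed values.
    for x in ref + prod:
        cdf_ref = bisect_right(ref, x) / n
        cdf_prod = bisect_right(prod, x) / m
        d = max(d, abs(cdf_ref - cdf_prod))
    return d


print(ks_statistic([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # 0.0
print(ks_statistic([1, 2, 3, 4, 5], [4, 5, 6, 7, 8]))  # about 0.6
```

The statistic ranges from 0 (identical samples) to 1 (no overlap); how large a value counts as drift is a threshold you have to choose.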

Upstream data changes are operational changes in the data pipeline. One example is a feature that is no longer being generated, resulting in missing values. Another is a change in measurement units (e.g., miles to kilometers).
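Both examples above leave simple fingerprints: a jump in the share of missing values, and a shift of the mean by a roughly constant factor (about 1.609 for miles to kilometers). The sketch below checks one feature for both. The function name and the thresholds are hypothetical, illustrative choices, not Explainit behaviour.

```python
import math
from statistics import fmean


def _is_missing(v):
    """Treat None and float NaN as missing."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def upstream_warnings(reference, production, missing_jump=0.2, scale_tol=0.3):
    """Return warnings for one feature; both thresholds are illustrative."""
    warnings = []
    ref_missing = sum(map(_is_missing, reference)) / len(reference)
    prod_missing = sum(map(_is_missing, production)) / len(production)
    if prod_missing - ref_missing > missing_jump:
        warnings.append(f"missing share rose from {ref_missing:.0%} to {prod_missing:.0%}")

    ref_present = [v for v in reference if not _is_missing(v)]
    prod_present = [v for v in production if not _is_missing(v)]
    if ref_present and prod_present:
        ref_mean = fmean(ref_present)
        if ref_mean != 0:
            ratio = fmean(prod_present) / ref_mean
            # A large constant-factor shift often points to a unit change.
            if abs(ratio - 1) > scale_tol:
                warnings.append(f"mean changed by a factor of {ratio:.2f}; check units")
    return warnings


print(upstream_warnings([10, 20, 30], [16.09, 32.19, 48.28]))    # unit change
print(upstream_warnings([1.0, 2.0, 3.0, 4.0], [1.0, None, None, 4.0]))  # missing values
```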

Installation guide

Install the Explainit Package:

pip install explainit

Run the App

To generate the dashboards, first import the build function:

from explainit.app import build

Next, load the data that will be passed to the application to generate the dashboards. This example uses the Default Loan dataset.

import pandas as pd

ref_data = pd.read_csv("https://raw.githubusercontent.com/katonic-dev/explainit/master/examples/data/reference_data.csv", index_col=None)
prod_data = pd.read_csv("https://raw.githubusercontent.com/katonic-dev/explainit/master/examples/data/production_data.csv", index_col=None)
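Before calling build, it can help to confirm that the two frames line up. The helper below is a hypothetical pre-flight check, not part of Explainit: it reports a missing target column, columns present in only one frame, and dtype mismatches on shared columns. The demo columns are made up; with the real data you would call check_schema(ref_data, prod_data, "bad_loan").

```python
import pandas as pd


def check_schema(reference, production, target):
    """Return a list of human-readable problems; an empty list means the frames line up."""
    problems = []
    for name, frame in (("reference", reference), ("production", production)):
        if target not in frame.columns:
            problems.append(f"target column {target!r} is missing from the {name} data")

    only_ref = sorted(set(reference.columns) - set(production.columns))
    only_prod = sorted(set(production.columns) - set(reference.columns))
    if only_ref:
        problems.append(f"columns only in reference data: {only_ref}")
    if only_prod:
        problems.append(f"columns only in production data: {only_prod}")

    # Compare dtypes on shared columns; in pandas an int column turning into
    # float often means missing values appeared.
    for col in reference.columns.intersection(production.columns):
        if reference[col].dtype != production[col].dtype:
            problems.append(
                f"dtype mismatch in {col!r}: {reference[col].dtype} vs {production[col].dtype}"
            )
    return problems


ref = pd.DataFrame({"loan_amnt": [1000, 2000], "bad_loan": [0, 1]})
prod = pd.DataFrame({"loan_amnt": [1500.0, 2500.0], "bad_loan": [1, 0], "extra": [1, 2]})
print(check_schema(ref, prod, "bad_loan"))
```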

Once you have both the reference and production datasets, pass them to build along with the target column name and the target column type (use "cat" for a categorical target and "num" for a numerical one).

build(
  reference_data=ref_data,
  production_data=prod_data,
  target_col_name="bad_loan",
  target_col_type="cat",
  host="127.0.0.1",
  port=8050
)
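If you are unsure whether to pass "cat" or "num" as target_col_type, a simple heuristic can suggest a value. This is a hypothetical helper, not Explainit's own rule: non-numeric targets are treated as categorical, and so are numeric targets with only a few distinct values, such as a 0/1 flag. The max_classes cutoff of 10 is an arbitrary assumption.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def infer_target_type(series, max_classes=10):
    """Suggest "cat" or "num" for build(target_col_type=...). Heuristic only."""
    # Strings, categoricals and other non-numeric targets are categorical.
    if not is_numeric_dtype(series):
        return "cat"
    # Numeric targets with a handful of distinct values behave like classes.
    if series.nunique(dropna=True) <= max_classes:
        return "cat"
    return "num"


print(infer_target_type(pd.Series([0, 1, 1, 0])))                   # cat
print(infer_target_type(pd.Series([x * 0.5 for x in range(50)])))   # num
# With the real data: infer_target_type(ref_data["bad_loan"])
```

Treat the result as a suggestion; a numeric target with few but meaningful levels (for example, a rating from 1 to 5) may still be better analysed as numerical.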

To run the application on a separate server rather than localhost, set the host and port arguments to that server's address and port.
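One way to avoid hard-coding these values is to read them from environment variables. The variable names EXPLAINIT_HOST and EXPLAINIT_PORT below are hypothetical; Explainit does not define them. Binding to 0.0.0.0 typically makes a server listen on all network interfaces, which is what you usually want on a remote machine.

```python
import os


def server_address(default_host="127.0.0.1", default_port=8050):
    """Read host/port from hypothetical EXPLAINIT_HOST / EXPLAINIT_PORT variables, else use defaults."""
    host = os.environ.get("EXPLAINIT_HOST", default_host)
    port = int(os.environ.get("EXPLAINIT_PORT", default_port))
    return host, port


host, port = server_address()
# On a remote server, for example: EXPLAINIT_HOST=0.0.0.0 EXPLAINIT_PORT=8050
# build(reference_data=ref_data, production_data=prod_data,
#       target_col_name="bad_loan", target_col_type="cat", host=host, port=port)
```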

App Snapshots

Below is a snapshot of the landing page of the Explainit dashboard.


Contributor Guide

Interested in contributing? Check out our CONTRIBUTING.md to find resources around contributing along with a detailed guide on how to set up a development environment.

QnA

Q. What exactly is the scope of the app?

A. The app lets users calculate dataset drift, target drift, and data quality metrics, so they can better understand their production (real-world) data relative to their training (reference) data and make informed decisions.
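For a categorical target, as with "cat" in the installation example, target drift amounts to asking whether the class proportions changed. The sketch below uses a goodness-of-fit chi-square statistic that treats the reference class proportions as the expected distribution; this choice of test is an assumption, since the section does not name Explainit's method. scipy.stats.chisquare can turn the same observed and expected counts into a p-value.

```python
from collections import Counter


def target_drift_chi2(reference, production):
    """Chi-square statistic of production class counts against reference proportions.

    Also returns the classes seen only in production, which cannot be scored
    against the reference and deserve a separate look.
    """
    ref_counts = Counter(reference)
    prod_counts = Counter(production)
    n_ref, n_prod = len(reference), len(production)
    stat = 0.0
    for cls, ref_count in ref_counts.items():
        expected = ref_count / n_ref * n_prod   # scale reference share to production size
        stat += (prod_counts[cls] - expected) ** 2 / expected
    unseen = set(prod_counts) - set(ref_counts)
    return stat, unseen


reference_target = [0] * 80 + [1] * 20
print(target_drift_chi2(reference_target, [0] * 80 + [1] * 20))  # (0.0, set())
print(target_drift_chi2(reference_target, [0] * 50 + [1] * 50))  # (56.25, set())
```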

Q. What does the input data look like?

A. The input data is simply your reference (training) data and your production (inference) data. The reference data serves as the baseline against which the distribution of the production data is compared. Both datasets should be passed as pandas DataFrames.
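If your data lives in a single time-stamped table, you first need to split it into the two DataFrames that build expects. A minimal sketch, assuming a hypothetical date column named issue_date and an arbitrary cutoff date: rows before the cutoff become the reference data, the rest become the production data.

```python
import pandas as pd


def split_by_date(df, date_col, cutoff):
    """Rows before `cutoff` become reference data; rows on or after it become production data."""
    dates = pd.to_datetime(df[date_col])
    cutoff = pd.Timestamp(cutoff)
    reference = df[dates < cutoff].reset_index(drop=True)
    production = df[dates >= cutoff].reset_index(drop=True)
    return reference, production


loans = pd.DataFrame({
    "issue_date": ["2023-01-15", "2023-02-10", "2023-03-05"],
    "loan_amnt": [1000, 2000, 3000],
})
ref_part, prod_part = split_by_date(loans, "issue_date", "2023-02-01")
print(len(ref_part), len(prod_part))  # 1 2
```

Depending on your setup you may want to drop the date column afterwards so it is not analysed as a feature.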

Q. What outputs does the app produce?

A. The app produces statistical information about the complete data (features and target) for drift analysis, distribution plots for each feature, the contribution of each feature to the target, and correlation metrics.
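To get a feel for the feature-target correlation output, a rough equivalent can be computed with pandas' DataFrame.corrwith. This sketch assumes Pearson correlation and a numeric target (such as a 0/1 flag); the exact metric Explainit reports is not specified here. Non-numeric columns are skipped, and the demo column names are hypothetical.

```python
import pandas as pd


def target_correlations(df, target):
    """Pearson correlation of each numeric feature with a numeric target, strongest first."""
    numeric = df.select_dtypes(include="number").drop(columns=[target])
    corr = numeric.corrwith(df[target])
    # Order by absolute strength so strong negative correlations also rank high.
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


demo = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [1, 1, 1, 2],
    "grade": ["A", "B", "C", "D"],   # non-numeric, ignored
    "bad_loan": [0, 0, 1, 1],
})
print(target_correlations(demo, "bad_loan"))
```

Correlation captures only linear association; a low value does not rule out a strong non-linear relationship.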

Q. What decisions can the user make by using the app?

A. Using the drift information from the app, users can decide to:

  • Look for higher-quality data for the use case.
  • Modify existing models or train new ones for production.
  • Update domain-specific concepts so that new models better reflect the real world.
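Turning per-feature results into one of these decisions needs an aggregation rule. One simple, assumed rule (not Explainit's documented behaviour) flags dataset-level drift when at least half of the features drift at a 5% significance level; both thresholds are illustrative, and the p-values can come from any per-feature test.

```python
def dataset_drift(feature_pvalues, alpha=0.05, drift_share=0.5):
    """Flag dataset drift when at least `drift_share` of features have p < alpha.

    feature_pvalues maps feature name -> p-value from any per-feature drift test.
    """
    drifted = sorted(name for name, p in feature_pvalues.items() if p < alpha)
    share = len(drifted) / len(feature_pvalues)
    return share >= drift_share, share, drifted


# Hypothetical p-values for four features.
is_drift, share, drifted = dataset_drift(
    {"loan_amnt": 0.01, "grade": 0.2, "income": 0.03, "term": 0.5}
)
print(is_drift, share, drifted)  # True 0.5 ['income', 'loan_amnt']
```

A positive flag is a prompt to investigate (better data, retraining, revisiting domain assumptions), not an automatic retraining trigger.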