Introduction

This is a sample project that demonstrates how to use Great Expectations to validate and document data quality. The example loads a sample transaction data set into a Pandas DataFrame and validates it. It automatically generates data documentation in HTML format and stores the validation results in a Postgres database.

The official documentation for Great Expectations is available on the Official website, and its terminology is explained in the Glossary.

Pre-requisites

A Postgres database in which to store the validation results.

Installation

To install the project, follow the steps below:

Clone the repository
Create a virtual environment using python -m venv venv
Activate the virtual environment using source venv/bin/activate or venv\Scripts\activate on Windows
Install the required packages using pip install -r requirements.txt
Copy .env-example to .env and update the values as per your environment.
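As a rough illustration of how the values in .env might be consumed, the sketch below builds a Postgres connection URL from environment variables. The variable names (POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_HOST, POSTGRES_PORT, POSTGRES_DB) and the helper postgres_url are assumptions made for this example; the names the project actually expects are the ones listed in .env-example. It also assumes the values have already been loaded into the process environment.

```python
import os
from typing import Mapping
from urllib.parse import quote_plus


def postgres_url(env: Mapping[str, str] = os.environ) -> str:
    """Build a Postgres connection URL from environment-style settings.

    All variable names below are hypothetical; check .env-example for
    the names this project really uses.
    """
    user = env.get("POSTGRES_USER", "postgres")
    password = env.get("POSTGRES_PASSWORD", "")
    host = env.get("POSTGRES_HOST", "localhost")
    port = env.get("POSTGRES_PORT", "5432")
    database = env.get("POSTGRES_DB", "postgres")
    # Percent-encode the credentials so characters such as "@" do not
    # break the URL structure.
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Example with explicit settings instead of the real environment.
example_env = {
    "POSTGRES_USER": "ge",
    "POSTGRES_PASSWORD": "p@ss",
    "POSTGRES_HOST": "db",
    "POSTGRES_PORT": "5433",
    "POSTGRES_DB": "quality",
}
print(postgres_url(example_env))
```

Passing the settings in as a mapping keeps the helper easy to test without touching the real environment.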

Running the project

To run the project, follow the steps below:

Initialize Great Expectations using python init.py
Run the validation using python main.py
To recreate the Great Expectations setup after init.py has been modified, run: python init.py --mode recreate
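The --mode option used above could be handled along the following lines. This is only a sketch of one plausible approach with argparse, not the project's actual init.py: the default mode name "init", the functions build_parser, initialize and main, and the dictionary standing in for the data context are all assumptions made for illustration.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Create a parser that accepts the --mode option."""
    parser = argparse.ArgumentParser(description="Initialize Great Expectations.")
    parser.add_argument(
        "--mode",
        choices=["init", "recreate"],  # "init" is a hypothetical default name
        default="init",
        help="'recreate' discards the existing setup and builds it again.",
    )
    return parser


def initialize(mode: str, existing: Optional[dict]) -> dict:
    """Return the setup to use; a dict stands in for the real data context."""
    if existing is not None and mode != "recreate":
        # Keep whatever an earlier run created.
        return existing
    # Build a fresh setup (placeholder content only).
    return {"rules": ["placeholder rule"], "created_by_mode": mode}


def main(argv: Optional[list] = None, existing: Optional[dict] = None) -> dict:
    args = build_parser().parse_args(argv)
    return initialize(args.mode, existing)


# Explicit argument lists are used here instead of the real command line.
old = {"rules": ["stale rule"], "created_by_mode": "init"}
kept = main([], existing=old)
rebuilt = main(["--mode", "recreate"], existing=old)
print(kept["rules"])
print(rebuilt["rules"])
```

Under these assumptions, a plain run leaves an existing setup untouched, while recreate mode always rebuilds it, which matches the need to rerun after editing the rules.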

Understanding the project

The project consists of two files and a data folder:

init.py: This file initializes Great Expectations and creates the data context along with various configurations and rules.
main.py: This file runs the validation, checking the sample data against the rules created by init.py.
data: This folder contains the sample data to be validated.
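To make the idea of a rule more concrete, the sketch below checks a few rows against two simple rules and reports how many values break each one. It is written in plain Python and deliberately does not use the Great Expectations API or Pandas; the column names (transaction_id, amount), the RuleResult class and both helper functions are hypothetical and are not taken from this project's data or code.

```python
from dataclasses import dataclass


@dataclass
class RuleResult:
    rule: str
    success: bool
    unexpected_count: int


def expect_values_not_null(rows, column):
    """Rule: every row must have a value in the given column."""
    unexpected = [row for row in rows if row.get(column) is None]
    return RuleResult(f"{column} is not null", not unexpected, len(unexpected))


def expect_values_between(rows, column, min_value, max_value):
    """Rule: non-missing values must lie within [min_value, max_value]."""
    unexpected = [
        row
        for row in rows
        if row.get(column) is not None
        and not (min_value <= row[column] <= max_value)
    ]
    return RuleResult(
        f"{column} between {min_value} and {max_value}",
        not unexpected,
        len(unexpected),
    )


# Made-up sample rows standing in for the transaction data set.
transactions = [
    {"transaction_id": 1, "amount": 25.0},
    {"transaction_id": 2, "amount": -3.5},
    {"transaction_id": 3, "amount": None},
]

results = [
    expect_values_not_null(transactions, "transaction_id"),
    expect_values_not_null(transactions, "amount"),
    expect_values_between(transactions, "amount", 0, 10_000),
]
for result in results:
    print(result)
```

In this sketch each rule returns a small result record rather than raising an error, so all rules can be evaluated and reported together, which is the kind of summary a validation report needs.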

Type Checking and Linting

This repo uses pre-commit hooks to run type checking and linting before code is committed.

Install pre-commit by running pip install pre-commit and then run pre-commit install to install the hooks.

Run the commands below to perform each check manually:

Type checking: mypy . --pdb
Linting: ruff check .

Testing

To run the tests, run pytest in the terminal. The test suite contains the following:

Integration test on the context.
