This is a sample project to demonstrate the use of Great Expectations
to validate and document data quality.
This example converts a sample transaction data set to a Pandas
DataFrame and then validates it. It automatically generates data documentation in HTML format and stores the validation results in a Postgres database.
The official documentation for Great Expectations can be found on the official website, and the glossary of terms can be found in the Glossary.
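To make the idea of a validation rule concrete, here is a minimal conceptual sketch in plain Python. It deliberately does not use the Great Expectations API (whose exact calls depend on the installed version) or Pandas; the column names, sample rows, and the `run_check` helper are all hypothetical and only illustrate what a rule checks and what a result records.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class CheckResult:
    """Outcome of one rule: whether it passed and how many rows violated it."""
    name: str
    success: bool
    unexpected_count: int


def run_check(
    rows: list[dict[str, Any]],
    column: str,
    name: str,
    predicate: Callable[[Any], bool],
) -> CheckResult:
    # Collect every row whose value in `column` fails the predicate.
    unexpected = [row for row in rows if not predicate(row.get(column))]
    return CheckResult(name=name, success=not unexpected, unexpected_count=len(unexpected))


# Hypothetical transaction rows; the real sample data lives in the data folder.
transactions = [
    {"transaction_id": "t1", "amount": 12.5},
    {"transaction_id": "t2", "amount": -3.0},
    {"transaction_id": "t3", "amount": None},
]

results = [
    run_check(transactions, "transaction_id", "transaction_id is not null",
              lambda v: v is not None),
    run_check(transactions, "amount", "amount is non-negative",
              lambda v: v is not None and v >= 0),
]

for result in results:
    print(result)
```

In the project itself, Great Expectations plays the role of `run_check`: it evaluates the configured rules against the DataFrame and produces the results that are rendered as HTML and stored in Postgres.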
- A Postgres database to store the validation results.
To install the project, follow the steps below:
- Clone the repository.
- Create a virtual environment: python -m venv venv
- Activate the virtual environment: source venv/bin/activate (on Windows: venv\Scripts\activate)
- Install the required packages: pip install -r requirements.txt
- Copy .env-example to .env and update the values for your environment.
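As an illustration of how the values in .env might be read, here is a minimal sketch using only the standard library. The variable names (`POSTGRES_HOST` and so on) are assumptions for the example; the actual keys are whatever .env-example defines, and the project may load them differently (for example with a dedicated dotenv package).

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Read and parse an env file; return an empty dict if it is missing."""
    env_path = Path(path)
    return parse_env(env_path.read_text()) if env_path.exists() else {}


# Hypothetical contents; the real keys come from .env-example.
sample = """
# Postgres connection (hypothetical variable names)
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DB="great_expectations"
"""

settings = parse_env(sample)
print(settings["POSTGRES_HOST"], settings["POSTGRES_PORT"])
```

Calling `load_env_file()` from the project root would return the same kind of dictionary for the real .env file.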
To run the project, follow the steps below:
- Initialize Great Expectations: python init.py
- Run the validation: python main.py
- To recreate the setup after init.py has been modified: python init.py --mode recreate
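A `--mode` option like the one above is commonly handled with `argparse`. The sketch below shows one way this could look; only the `recreate` value comes from this README, while the `init` default and the help texts are assumptions, and the real init.py may be structured differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Initialize Great Expectations (illustrative sketch)."
    )
    parser.add_argument(
        "--mode",
        choices=["init", "recreate"],  # "init" is a hypothetical default value
        default="init",
        help="Use 'recreate' to rebuild the setup after init.py changes.",
    )
    return parser


# Simulate: python init.py --mode recreate
args = build_parser().parse_args(["--mode", "recreate"])
print(args.mode)  # prints: recreate
```

Restricting `choices` makes a mistyped mode fail immediately with a usage message instead of silently falling back to the default.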
The project consists of two files and a data folder:
- init.py: Initializes Great Expectations and creates the data context along with various configurations and rules.
- main.py: Runs the validation against the configured rules.
- data: Contains the sample data to be validated.
This repo uses pre-commit hooks to run type checking and linting before each commit.
Install pre-commit by running pip install pre-commit, and then run pre-commit install to install the hooks.
To run the checks manually, use the commands below:
- Type checking: mypy . --pdb
- Linting: ruff check .
To run the tests, run pytest in the terminal.
The test suite contains the following:
- An integration test on the data context.