A Scalable Data Cleaning Library for PySpark.
Updated Apr 4, 2019 · Python
Table Enforcer is my attempt to apply a sort of "test-driven development" workflow to data cleaning and validation: a Python package that facilitates the iterative process of developing and using schema-like representations of pandas DataFrames to recode and validate instances of those data.
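To illustrate the general idea of a schema-like representation that drives both recoding and validation, here is a minimal pandas sketch. It is not Table Enforcer's API: the `SPEC` dictionary layout and the `recode` and `validate` helpers are hypothetical names chosen for this example. Each column gets one recoder, applied first, and a list of validators, applied to the recoded values.

```python
import pandas as pd

# Hypothetical column specification (NOT Table Enforcer's API).
# "recode" normalizes raw values; "validators" are checks that must each
# return a boolean Series with one entry per row.
SPEC = {
    "age": {
        "recode": lambda s: pd.to_numeric(s, errors="coerce"),  # bad text -> NaN
        "validators": [lambda s: s.notna(), lambda s: s.between(0, 120)],
    },
    "state": {
        "recode": lambda s: s.str.strip().str.upper(),
        "validators": [lambda s: s.str.len() == 2],
    },
}


def recode(df: pd.DataFrame, spec: dict) -> pd.DataFrame:
    """Return a copy of df with each specified column passed through its recoder."""
    out = df.copy()
    for col, rules in spec.items():
        out[col] = rules["recode"](out[col])
    return out


def validate(df: pd.DataFrame, spec: dict) -> pd.DataFrame:
    """Return a boolean DataFrame: True where a value passes all of its column's validators."""
    results = {}
    for col, rules in spec.items():
        ok = pd.Series(True, index=df.index)
        for check in rules["validators"]:
            ok = ok & check(df[col]).astype(bool)
        results[col] = ok
    return pd.DataFrame(results)


raw = pd.DataFrame({"age": ["34", "n/a", "150"], "state": [" ca", "NY ", "Texas"]})
clean = recode(raw, SPEC)
report = validate(clean, SPEC)
print(report.all(axis=1).tolist())  # rows that pass every check
```

In a test-driven workflow, a check such as `assert validate(recode(df, SPEC), SPEC).all().all()` would fail on the rows above until the spec or the data is fixed, which is the iterative loop the description refers to.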
Final project of the Digital Academy.
Programs I write for my Data Mining course