Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
TLDR
It looks like Pandera is just a better version of the thing that I'd slapped together. It would take another few two or three hours of cleanup to swap it in completely.
Examples
MIH fails standardized values checks:
Facilities fails to have all required columns:
How it Works
In this implementation, we read in our metadata file, and translate all the columns for a given dataset into a Pandera Check, using our custom validators (e.g. for
wkb
types). Then we feed it a dataframe to validate. Simple!Also, in this implementation, we read in all source data as a
str
type into the dataframe, then use Checks to validate when needed. Pandera Data Types seem like potentially the ideal approach for this.