Take Pandera for a Spin #859

alexrichey · 2024-05-21T19:04:38Z

TLDR

It looks like Pandera is just a better version of the thing I'd slapped together. It would take another two or three hours of cleanup to swap it in completely.

Examples

MIH fails standardized values checks:

Facilities fails to have all required columns:

How it Works

In this implementation, we read in our metadata file and translate the columns for a given dataset into Pandera Checks, using our custom validators where needed (e.g. for WKB types). Then we feed it a dataframe to validate. Simple!

Also, in this implementation, we read all source data into the dataframe as str, then use Checks to validate types where needed. Pandera Data Types could be the ideal approach for this.

Implement Pandera for package validation

8482717

alexrichey requested review from damonmcc, fvankrieken and sf-dcp May 21, 2024 19:12

alexrichey mentioned this pull request Jun 24, 2024

Opendata: Use Pandera in validations #943

Open

damonmcc removed request for damonmcc, fvankrieken and sf-dcp October 24, 2024 13:58

sf-dcp mentioned this pull request Nov 8, 2024

Data Validation Framework: Source + Product data #1241

Draft
