This repository has been archived by the owner on Jul 3, 2023. It is now read-only.
Is your feature request related to a problem? Please describe.
When we extract columns, it would be very handy to be able to run checks against those columns. pandera is a great, lightweight tool for validating dtypes, nullability, uniqueness, and any arbitrary Check callable.
Describe the solution you'd like
Ideally this would be a decorator that works similarly to extract_columns: it would ingest a DataFrame, return the same DataFrame, and expand the nodes to include a DataFrame validation node. This could be specific to pandera, or could be made more general, so something like
Describe alternatives you've considered
Certainly, you can have a splitting node where you validate the data yourself, but I think this is a common enough pattern (or it really should be common enough, and be made a first-class citizen of any dataframe manipulation) that it would benefit from being easy to plug directly into a node.
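To make the requested pattern concrete, here is a minimal, framework-agnostic sketch. It wraps a function that returns tabular data, runs a list of named checks against the result, and passes the same object through unchanged if every check passes. The names validate_output and ValidationError are hypothetical and are not part of Hamilton or pandera. The sketch also does not model Hamilton's node expansion. To stay self-contained, the demo uses a plain dict of columns as a stand-in for a DataFrame.

```python
import functools
from typing import Any, Callable, Iterable, Tuple

# A check is a (name, predicate) pair; the predicate receives the wrapped
# function's return value and returns True when the data is valid.
Check = Tuple[str, Callable[[Any], bool]]


class ValidationError(ValueError):
    """Raised when one or more checks fail (hypothetical, for illustration)."""


def validate_output(checks: Iterable[Check]) -> Callable:
    """Hypothetical decorator: run `checks` on the wrapped function's result
    and return that same result unchanged if every check passes."""
    checks = list(checks)  # materialize once so the decorator can be reused

    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            # Collect the names of all failing checks, not just the first one.
            failed = [name for name, predicate in checks if not predicate(result)]
            if failed:
                raise ValidationError(f"{fn.__name__} failed checks: {failed}")
            return result  # pass the same object through
        return wrapper

    return decorator


@validate_output([
    ("no_nulls", lambda df: all(v is not None for col in df.values() for v in col)),
    ("unique_ids", lambda df: len(set(df["id"])) == len(df["id"])),
])
def load_data() -> dict:
    # Stand-in for a DataFrame: column name -> list of values.
    return {"id": [1, 2, 3], "score": [0.5, 0.7, 0.9]}
```

With real pandas DataFrames, the predicates could be expressions such as df["id"].is_unique or df.notna().all().all(). A pandera-specific variant could instead call a DataFrameSchema's validate method on the result, which raises on failure rather than returning a boolean.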
We should see how we can connect this with #115 (#41).
I agree that at the beginning of any Hamilton DAG, data needs to be loaded, and it will usually be a dataframe that is then split into columns. I see merit in enabling a data quality check before using extract_columns.
@chrisaddy I'm closing this since we've integrated pandera support, which allows dataframe validation. Alternatively, people can skip extract_columns and instead manually write functions that pull each column from the dataframe, then apply the base validators or pandera validators to qualify them.