DEV/CLN: Drop pandera dependency? #269

Open
NickleDave opened this issue Jul 3, 2024 · 1 comment

Comments

@NickleDave
Collaborator

Using pandera to validate dataframes really adds to the number of things that get installed when you install crowsetta, largely because pandera depends on pydantic.

This makes it more likely that some change upstream will impact people who just want to use crowsetta so their own library can parse annotations; see for example kitzeslab/opensoundscape#1017 and vocalpy/vocalpy#173.

I recall looking at "pure Python" libraries for validating dataframes before. I wonder if there's one we could vendor to avoid the dependency, like typedframe maybe.

@NickleDave
Collaborator Author

NickleDave commented Jul 3, 2024

Thinking out loud: I guess most tools are pretty consistent about how they save annotations, so a corrupted file would most likely come from someone working with the exported annotations afterwards.

It would be good to know if anyone else has had cases where pandera caught some problem with annotation files, and that was helpful. I think most of the validation errors I've gotten have been because of a mistake I made (e.g. building a simple-seq annotation file "manually" with pandas from some one-off annotation format, then trying to load it with crowsetta).

Maybe we just need clearer error messages instead of strict validation.
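
A minimal sketch of what a lightweight check with a readable error message might look like, using only the standard library. Everything here is an assumption for illustration: `check_columns`, `error_message`, and the column names in `REQUIRED` are hypothetical, not crowsetta's actual API or schema.

```python
def check_columns(columns, required, optional=()):
    """Raise ValueError with a readable message if column names are wrong.

    `columns` is any iterable of column names (hypothetical helper,
    not part of crowsetta or pandera).
    """
    columns = list(columns)
    # Required names that are absent from the table
    missing = [name for name in required if name not in columns]
    # Names in the table that are neither required nor optional
    allowed = set(required) | set(optional)
    unexpected = [name for name in columns if name not in allowed]

    problems = []
    if missing:
        problems.append(f"missing required column(s): {missing}")
    if unexpected:
        problems.append(f"unexpected column(s): {unexpected}")
    if problems:
        raise ValueError(
            "Annotation table is not valid: "
            + "; ".join(problems)
            + f". Expected columns: {list(required)}"
        )


# Assumed column names, for illustration only
REQUIRED = ("onset_s", "offset_s", "label")


def error_message(columns):
    """Return the error message for `columns`, or None if they pass."""
    try:
        check_columns(columns, REQUIRED)
    except ValueError as err:
        return str(err)
    return None
```

With a pandas dataframe `df`, this could be called as `check_columns(df.columns, REQUIRED)`. It only checks column names, so it would not replace the dtype and value checks that pandera does.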
