Support field.missingValues #174

Open
3 tasks
peterdesmet opened this issue Feb 19, 2024 · 1 comment
Labels
complexity:high Likely complex to implement datapackage:v2 enhancement New feature or request function:read_resource Function read_resource()

Comments

@peterdesmet
Member

peterdesmet commented Feb 19, 2024

CHANGELOG: https://datapackage.org/overview/changelog/#fieldmissingvalues-new

Values defined in field.missingValues override anything defined in schema.missingValues (frictionlessdata/datapackage#24). This is not straightforward to support in frictionless-r, since readr doesn't support field-level missing values. But @khusmann has helpfully provided an implementation using wrappers for readr, see this comment and this implementation.
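To make the override rule concrete, here is a minimal sketch in plain Python (not frictionless-r or readr code). It assumes that a field-level list replaces the schema-level list rather than merging with it, that missingValues entries are plain strings, and that the schema-level default is `[""]`. The helper names `resolve_missing_values()` and `read_rows()` are hypothetical and exist only for this illustration.

```python
import csv
import io

# Assumed default for schema.missingValues when the property is absent.
DEFAULT_MISSING_VALUES = [""]


def resolve_missing_values(schema, field):
    """Return the missing-value strings that apply to one field.

    Assumption: a field-level list replaces the schema-level list entirely.
    """
    if "missingValues" in field:
        return field["missingValues"]
    return schema.get("missingValues", DEFAULT_MISSING_VALUES)


def read_rows(text, schema):
    """Read CSV text as strings, then set missing values to None per field."""
    missing = {
        field["name"]: set(resolve_missing_values(schema, field))
        for field in schema["fields"]
    }
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append(
            {name: (None if value in missing[name] else value)
             for name, value in row.items()}
        )
    return rows


# Hypothetical schema: "depth" declares its own missing values, "id" does not.
schema = {
    "missingValues": ["", "NA"],
    "fields": [
        {"name": "id"},
        {"name": "depth", "missingValues": ["-999"]},
    ],
}
csv_text = "id,depth\n1,-999\nNA,NA\n"
rows = read_rows(csv_text, schema)
```

In this sketch, `"NA"` in the `id` column becomes missing because `id` falls back to the schema-level list, while `"NA"` in `depth` stays a literal string because `depth` only treats `"-999"` as missing. All cells are read as strings first and interpreted afterwards, which is the kind of two-pass approach a global `na` setting forces.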

Overall, I think this feature is best implemented together with categorical types.

@peterdesmet peterdesmet added enhancement New feature or request function:read_resource Function read_resource() datapackage:v2 labels Feb 19, 2024
@khusmann
Contributor

A couple of notes on this:

I now have a more mature readr wrapper hosted here: https://kylehusmann.com/interlacer/

My implementation uses type_convert(), whose behavior is beginning to stray from that of vroom, which is now the default importer for readr: tidyverse/readr#1526 . So that's unfortunate...

The other downside is that it greedily loads all the data, instead of taking advantage of vroom's lazy-loading capabilities. There might be ways around this, but at the end of the day I think field-level missingness is something we want to see implemented in vroom proper. I've put in a feature request with vroom to this effect: tidyverse/vroom#532

It pains me to wait on vroom for this feature because I doubt it'll be high on their priority list, but I think that might actually be our limiting factor for implementing this in the most stable / predictable way... :'(

@peterdesmet peterdesmet changed the title Support field-level missingValues Support field.missingValues Jul 3, 2024
@peterdesmet peterdesmet added the complexity:high Likely complex to implement label Jul 3, 2024