Origin/fix missing features error #5318
The documentation is not available anymore as the PR was closed or merged.
Awesome, thanks!
Could you also add a test in tests/packaged_modules/test_json.py? You can copy test_json_generate_tables, rename it to something like test_json_generate_tables_with_features, and pass explicit features to Json(...).
Thanks :) I just updated the test to make sure it works even when a column is missing, and made a minor change to json.py so that the missing columns are added for the other kinds of JSON files as well (I moved the code to
Thanks for fixing this! Also cc @mariosasko, does it sound good to you as well?
Thanks, Unso! If @lhoestq is happy, then I'm also happy :D
By the time I noticed the ping, this PR had already been merged... Luckily, PyArrow's
This fixes a problem where load_dataset reads a file with "features" provided, but some read batches are missing columns that only show up later. For instance, the provided "features" require columns A, B, and C, but a batch only contains columns B and C. The fix adds column A to that batch, filled with nulls.