Empty strings not interpreted as null when reading CSV files #7797
Comments
I agree this should likely be fixed. Thank you for filing this @66OJ66
I found that arrow-csv has the same problem: it seems arrow-csv never sets a string value to null. See the linked code:
DataType::Utf8 => Ok(Arc::new(
    rows.iter()
        // every field is wrapped in Some, so empty fields never become null
        .map(|row| Some(row.get(i)))
        .collect::<StringArray>(),
Here is an example that shows the problem:
use std::fs::File;
use std::io::Seek;
use std::sync::Arc;
use arrow_csv::reader::Format;
use arrow_csv::ReaderBuilder;

fn test_init_nulls_with_inference() {
    let format = Format::default().with_header(true).with_delimiter(b',');
    let mut file = File::open("test/data/init_null_test.csv").unwrap();
    // infer the schema, then rewind so the reader starts from the beginning
    let (schema, _) = format.infer_schema(&mut file, None).unwrap();
    file.rewind().unwrap();
    let mut csv = ReaderBuilder::new(Arc::new(schema))
        .with_format(format)
        .build(file)
        .unwrap();
    let batch = csv.next().unwrap().unwrap();
    println!("{:?}", batch);
}
The printed result is:
I also found that DataFusion's schema inference differs from arrow-csv's.
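For illustration only, here is a minimal std-only sketch of the distinction at stake: wrapping every field in `Some` keeps empty strings as values, while mapping empty fields to `None` yields nulls. The helper names `field_to_option` and `parse_line` are hypothetical, the splitting ignores quoting and escaping, and this is not how arrow-csv or DataFusion implement it.

```rust
/// Hypothetical helper (not part of arrow-csv): treat an empty CSV field
/// as null (None) and any other field as a value.
fn field_to_option(field: &str) -> Option<&str> {
    if field.is_empty() {
        None
    } else {
        Some(field)
    }
}

/// Split one comma-separated line into optional fields.
/// Assumption: no quoting or escaping, which a real CSV parser must handle.
fn parse_line(line: &str) -> Vec<Option<&str>> {
    line.split(',').map(field_to_option).collect()
}

fn main() {
    // The middle field is empty, so it becomes None instead of Some("").
    let row = parse_line("a,,c");
    println!("{:?}", row); // prints [Some("a"), None, Some("c")]
}
```

Whether an empty field should be null or an empty string is a policy choice; the sketch only shows the null-producing variant that this issue expects.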
I tested the above code again using v33.0, and it's returning the expected output now. Thanks @haohuaijin for fixing the underlying issue!
Describe the bug
Initial discussion here: #7761
In short, it seems that empty strings in CSV files aren't being interpreted as null.
To Reproduce
Create a simple .csv file like this:
Run the following code:
Expected behavior
I was expecting the output to look like this:
But the full dataset is returned instead:
Additional context
I've tested this on main and v31.0.0, and the result is the same.