Column hasNulls property is actually isNullable #428
Comments
Hi! It indeed seems confusing, but at the same time it can be justified.
In real life, I want to save a nullable column (one without null values) as not nullable in Arrow, and I get an unexpected NullValueError
I agree about your change for
I still see it as a i.e. it's expected that whenever someone creates a data column, they should supply the exact type (as if
Yes, but when you're generating type schemas from, say, OpenAPI, this is not always the case. The column is nullable, since it can contain
DataColumn is created from known data and is immutable. There is no reason for it (the DataColumn instance itself!) to have a nullable type when created if there are no actual null values. You can still cast the dataframe that contains it to a schema with a nullable property. From a type system perspective it's fine.
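The position in the comment above (nullability inferred from known, immutable data, with widening still possible afterwards) can be illustrated with a small sketch. `InferredColumn` and its methods are hypothetical names invented for this illustration; they are not part of the Kotlin DataFrame API.

```kotlin
// Hypothetical immutable column whose nullability flag is derived from its data.
// This is NOT the Kotlin DataFrame DataColumn class, only an illustration.
class InferredColumn private constructor(
    val name: String,
    val values: List<Any?>,
    val nullable: Boolean,
) {
    companion object {
        // On creation, the flag is as narrow as the data allows.
        fun of(name: String, values: List<Any?>): InferredColumn =
            InferredColumn(name, values, nullable = values.any { it == null })
    }

    // Widening is always safe: every non-null value also fits a nullable type.
    fun widenToNullable(): InferredColumn = InferredColumn(name, values, nullable = true)

    // Narrowing has to inspect the data and fails if a null is present.
    fun narrowToNonNull(): InferredColumn {
        require(values.none { it == null }) { "Column '$name' contains null values" }
        return InferredColumn(name, values, nullable = false)
    }
}

fun main() {
    val ids = InferredColumn.of("id", listOf(1, 2, 3))
    println(ids.nullable)                   // false: no nulls in the data
    println(ids.widenToNullable().nullable) // true: widened to match a static schema

    val notes = InferredColumn.of("note", listOf("a", null))
    println(notes.nullable)                 // true: inferred from the null value
}
```

Under this model a statically defined schema (for example one generated from an external specification) is applied by widening after creation, rather than by the column carrying a nullable type from the start.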
@koperagen thank you for the opinion. Currently I have made a workaround in my project by rewrapping all columns with
Initially I thought that you call
Currently it is created by FYI, the current project is BPLEX (Russian only: https://bia-tech.ru/solutions/platforma-dlya-optimizacii-2/), which is actually middleware for transferring data between a business GUI or DWH and mathematical models, as well as from one model's output to another's input.
And I have a common entry point for internal data validation and saving (actually, 90% of its code is the Dataframe ArrowWriter itself)
So, do you think it can (or should) be fixed on
It should not, in my mind. Kotlin Dataframe is a pretty good library, and its default behavior looks good for data science. But it can also be used (and is actually used, at least in BPLEX :) ) for data engineering, and in this case static schemas look better. Arrow is also a data engineering tool, and it should keep its approach together with Kotlin Dataframe, IMHO. Once again, this issue is not about BPLEX (this is my problem) and not about Arrow (this is just one case) but about the mismatch between the method name and its actual behavior, regardless of data origin and sink. Please don't kill Dataframe's data engineering opportunity.
It's very cool and inspiring that dataframe is used for such tasks successfully :)
Hello
If I get a dataframe with some column and want to check whether it contains null values, I use the hasNulls property. For a column that is marked nullable but contains no null values:
expected: false
actual: true
On the other hand, a column can be marked as not nullable but still contain null values (this behavior might be a critical issue in itself)
expected: true or IllegalArgumentException
actual: false
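
To make the two cases concrete without relying on the library's exact API, here is a minimal self-contained sketch. `SketchColumn`, `isNullable`, `containsNulls` and `requireConsistent` are hypothetical names introduced only for this illustration; the sketch assumes that the reported behavior corresponds to reading nullability from the declared type rather than from the values.

```kotlin
import kotlin.reflect.KType
import kotlin.reflect.typeOf

// Hypothetical stand-in for a data column: a declared element type plus the values.
class SketchColumn(val name: String, val type: KType, val values: List<Any?>)

// Type-level question: does the declared type allow nulls?
fun isNullable(col: SketchColumn): Boolean = col.type.isMarkedNullable

// Value-level question: is there at least one null among the values?
fun containsNulls(col: SketchColumn): Boolean = col.values.any { it == null }

// Strict check: reject a column whose non-nullable type is contradicted by its data.
fun requireConsistent(col: SketchColumn) {
    require(isNullable(col) || !containsNulls(col)) {
        "Column '${col.name}' is declared non-nullable but contains null values"
    }
}

fun main() {
    // Case 1: nullable type, no null values.
    val nullableNoNulls = SketchColumn("a", typeOf<Int?>(), listOf(1, 2, 3))
    println(isNullable(nullableNoNulls))     // true
    println(containsNulls(nullableNoNulls))  // false

    // Case 2: non-nullable type, but the data contains a null.
    val nonNullableWithNulls = SketchColumn("b", typeOf<Int>(), listOf(1, null, 3))
    println(isNullable(nonNullableWithNulls))    // false
    println(containsNulls(nonNullableWithNulls)) // true
    val result = runCatching { requireConsistent(nonNullableWithNulls) }
    println(result.exceptionOrNull() is IllegalArgumentException) // true
}
```

In this sketch the two questions get separate names, so a caller asking "are there nulls?" is never answered with "could there be nulls?", and the inconsistent second case is rejected with an IllegalArgumentException instead of passing silently.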