Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is Distinct From Incorrectly Handles Masked Nulls #7332

Closed
tustvold opened this issue Aug 18, 2023 · 6 comments · Fixed by #7282
Closed

Is Distinct From Incorrectly Handles Masked Nulls #7332

tustvold opened this issue Aug 18, 2023 · 6 comments · Fixed by #7282
Assignees
Labels
bug Something isn't working

Comments

@tustvold
Copy link
Contributor

tustvold commented Aug 18, 2023

Describe the bug

is_distinct_from is computed as

left_isnull != right_isnull || left_value != right_value

This is incorrect as regardless of if left_value != right_value if both left_isnull & right_isnull the result should be false.

IsNotDistinct has the same issue

This appears to have been introduced in #4560

To Reproduce

 #[test]
    fn is_distinct_from_nulls() -> Result<()> {
        // [0, 0, NULL, 0, 0, 0]
        let left_int_array = Int32Array::new(
            vec![0, 0, 1, 3, 0, 0].into(),
            Some(NullBuffer::from(vec![true, true, false, true, true, true])),
        );
        // [0, NULL, NULL, NULL, 0, NULL]
        let right_int_array = Int32Array::new(
            vec![0; 6].into(),
            Some(NullBuffer::from(vec![
                true, false, false, false, true, false,
            ])),
        );

        assert_eq!(
            BooleanArray::from(vec![
                Some(false),
                Some(true),
                Some(false),
                Some(true),
                Some(false),
                Some(true),
            ]),
            is_distinct_from(&left_int_array, &right_int_array)?
        );
    }

Expected behavior

No response

Additional context

apache/arrow-rs#960 tracks implementing these upstream

@tustvold tustvold added the bug Something isn't working label Aug 18, 2023
@tustvold tustvold self-assigned this Aug 18, 2023
@comphead
Copy link
Contributor

@tustvold I'll take it as it was my PR which brought up the bug

@tustvold
Copy link
Contributor Author

There is already a fix upstream - apache/arrow-rs#4716

@comphead
Copy link
Contributor

I see, so that means current is_distinct_* are supposed to be removed from DF?

@tustvold
Copy link
Contributor Author

tustvold commented Aug 21, 2023

Correct, the upstream implementation should also be significantly faster

@comphead
Copy link
Contributor

I would like to try to migrate DF to new implementation once arrow-rs released

@tustvold
Copy link
Contributor Author

#7282 already has this completed - I like to verify things before cutting the release just in case 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants