
Add time dictionary coercions #6208

Merged
merged 3 commits into from
Aug 8, 2024
Merged


adriangb (Contributor) commented Aug 7, 2024

No description provided.

github-actions bot added the arrow label (Changes to the arrow crate) on Aug 7, 2024
tustvold (Contributor) commented Aug 7, 2024

I wonder if we want to reduce the codegen by coercing to the corresponding primitive array type first and then coercing that to a dictionary.

adriangb (Contributor, Author) commented Aug 8, 2024

Sorry I'm not sure I follow. Are you worried about the verbosity of handling all of these cases?

tustvold (Contributor) commented Aug 8, 2024

Rust generics are monomorphised, so if you call a generic function with lots of different type parameters, it leads to lots of additional codegen, hurting compile times and bloating your binary. I'm suggesting that rather than instantiating the cast code for every value type, we instead use the existing code to coerce to integers and then to dictionaries.
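A minimal, std-only illustration of the monomorphization point (not Arrow code; `sum_generic` and `sum_i64` are hypothetical names). `sum_generic` is compiled once for every element type it is called with, while `sum_i64` is compiled exactly once and callers widen their values to `i64` first, which loosely mirrors the suggestion to cast to integers before building a dictionary.

```rust
// Generic version: every distinct T it is called with produces a separate
// compiled copy of the function body (monomorphization).
fn sum_generic<T: Copy + Into<i64>>(values: &[T]) -> i64 {
    values.iter().map(|&v| Into::<i64>::into(v)).sum()
}

// Non-generic version: the body is compiled once. Callers widen their
// values to i64 up front, so only the small conversion is per-type.
fn sum_i64(values: &[i64]) -> i64 {
    values.iter().sum()
}

fn main() {
    let a: [u8; 3] = [1, 2, 3];
    let b: [i32; 2] = [-5, 10];

    // Two instantiations here: sum_generic::<u8> and sum_generic::<i32>.
    println!("{} {}", sum_generic(&a), sum_generic(&b));

    // One shared instantiation of sum_i64 serves both inputs.
    let wa: Vec<i64> = a.iter().map(|&v| i64::from(v)).collect();
    let wb: Vec<i64> = b.iter().map(|&v| i64::from(v)).collect();
    println!("{} {}", sum_i64(&wa), sum_i64(&wb));
}
```

Both paths compute the same sums; the difference is only in how many copies of the non-trivial body end up in the binary.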

adriangb (Contributor, Author) commented Aug 8, 2024

Oh right, that sort of codegen. Since this is all internal, could we go with the pattern in place now and punt on a larger refactor?

tustvold (Contributor) commented Aug 8, 2024

> punt on a larger refactor

I don't think you need a major refactor, just something along these lines (maybe extracted into a function to reduce the duplication):

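```rust
// `K` is the generic dictionary key type (an `ArrowDictionaryKeyType`) and
// `t` is the target timestamp's optional timezone.
// Step 1: cast the input array to Int64.
let int = cast_with_options(array, &Int64, cast_options)?;
// Step 2: dictionary-encode the Int64 values via the existing cast path.
let dict = cast_with_options(
    int.as_ref(),
    &Dictionary(Box::new(K::DATA_TYPE), Box::new(Int64)),
    cast_options,
)?;
// Step 3: cast the dictionary's Int64 values to the timestamp type.
cast_with_options(
    dict.as_ref(),
    &Dictionary(
        Box::new(K::DATA_TYPE),
        Box::new(Timestamp(TimeUnit::Nanosecond, t)),
    ),
    cast_options,
)
```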

For non-trivial arrays, performance should be completely dominated by the dictionary computation, so the additional dispatch logic is irrelevant.
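To make the intuition behind the performance remark concrete, here is a toy, std-only model of the three-step pattern. The types `DictArray` and `ValueType` and the functions `dictionary_encode` and `as_timestamp_dict` are hypothetical and are not Arrow's implementation. Encoding does a hash lookup for every input element, while relabeling the values as timestamps only changes a type tag, so for large inputs the encoding step is where the time goes. The encoder is generic only over the key type `K`, because the values are always `i64`.

```rust
use std::collections::HashMap;

// Toy stand-in for logical value types; both are stored physically as i64.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ValueType {
    Int64,
    TimestampNs,
}

// Toy dictionary array: keys of type K index into a deduplicated values buffer.
#[derive(Debug)]
struct DictArray<K> {
    keys: Vec<K>,
    values: Vec<i64>,
    value_type: ValueType,
}

// Step 2 (the expensive part): one hash lookup per input element.
// Generic only over K, so it is compiled once per key type.
// Returns None if the number of distinct values does not fit in K.
fn dictionary_encode<K: TryFrom<usize> + Copy>(values: &[i64]) -> Option<DictArray<K>> {
    let mut index: HashMap<i64, K> = HashMap::new();
    let mut dict = Vec::new();
    let mut keys = Vec::with_capacity(values.len());
    for &v in values {
        let key = match index.get(&v) {
            Some(&k) => k,
            None => {
                let k = K::try_from(dict.len()).ok()?;
                dict.push(v);
                index.insert(v, k);
                k
            }
        };
        keys.push(key);
    }
    Some(DictArray { keys, values: dict, value_type: ValueType::Int64 })
}

// Step 3: relabel the i64 values as nanosecond timestamps. Only the type tag
// changes; keys and values are reused as-is.
fn as_timestamp_dict<K>(mut dict: DictArray<K>) -> DictArray<K> {
    dict.value_type = ValueType::TimestampNs;
    dict
}

fn main() {
    // Step 1 (not modeled): assume the input was already cast to i64 nanoseconds.
    let ts: Vec<i64> = vec![1_000, 2_000, 1_000, 3_000, 2_000];
    let dict = as_timestamp_dict(dictionary_encode::<u8>(&ts).expect("fits in u8 keys"));
    println!("{:?}", dict);
}
```

In this sketch the per-element cost sits entirely in `dictionary_encode`, which is the same shape of argument as the comment above: extra dispatch around the encoding is cheap by comparison.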

> could we go with the pattern in place now

As written, this PR will fairly drastically increase the amount of codegen. Keep in mind that this code is instantiated for all 8 dictionary key types, so I think it is important that we keep this under control.
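A rough way to see the scaling, under the simplifying assumption that every (key type, value type) pair with its own dedicated dictionary-cast path yields one instantiation of the generic code. $V$, the number of value types given such a path, is left symbolic because the thread does not list them.

$$
I_{\text{direct}} = 8V, \qquad I_{\text{via Int64}} = 8
$$

Under this model, $V = 4$ would give $8 \cdot 4 = 32$ instantiations for the direct approach versus $8$ when every value type is routed through the Int64 dictionary path, and the latter count stays fixed as more value types are added.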

adriangb (Contributor, Author) commented Aug 8, 2024

Got it now! I was able to make a standalone function using your example code. It also helped reduce code duplication a lot.

adriangb (Contributor, Author) commented Aug 8, 2024

@tustvold, looks like CI passed and this is approved. Mind merging?

alamb (Contributor) left a comment


Looks good to me -- thank you @adriangb and @tustvold

alamb merged commit 3e02689 into apache:master on Aug 8, 2024 (25 checks passed)