Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid from_num_days_from_ce_opt calls in timestamp_s_to_datetime if we don't need #6746

Closed
jayzhan211 opened this issue Nov 18, 2024 · 1 comment · Fixed by #6755
Closed
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog

Comments

@jayzhan211
Copy link
Contributor

jayzhan211 commented Nov 18, 2024

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Describe the solution you'd like

In Arrow-rs, extracting the minute from a timestamp involves converting the timestamp, such as 1599563412, into a NaiveDateTime using the timestamp_s_to_datetime function. This results in a value like 2020-09-08T12:10:12.123456780. The minute() method is then called on the NaiveDateTime to retrieve the minute component, such as 10.

However, in chrono's implementation, even when only the minute is required using NaiveTime::from_num_seconds_from_midnight_opt, the function NaiveTime::from_num_days_from_ce_opt is still called. Ideally, this additional step should be avoided to improve efficiency.

    #[inline]
    #[must_use]
    pub const fn from_timestamp(secs: i64, nsecs: u32) -> Option<Self> {
        let days = secs.div_euclid(86_400) + UNIX_EPOCH_DAY;
        let secs = secs.rem_euclid(86_400);
        if days < i32::MIN as i64 || days > i32::MAX as i64 {
            return None;
        }
        let date = try_opt!(NaiveDate::from_num_days_from_ce_opt(days as i32));
        let time = try_opt!(NaiveTime::from_num_seconds_from_midnight_opt(secs as u32, nsecs));
        Some(date.and_time(time).and_utc())
    }

I propose we upstream chrono crate and find a way to get date and time separately given secs: i64, nsecs: u32. Then we call corresponding smaller function based on what we need to minimum the cost.

Describe alternatives you've considered

Additional context

This is the flamegraph for clickbench Q18 in datafusion, we can see the from_num_days_from_ce_opt spent much time for date_part function. But the query care only about minute so the compute of from_num_days_from_ce_opt is useless.
Screenshot 2024-11-18 at 5 40 45 PM

@jayzhan211 jayzhan211 added the enhancement Any new improvement worthy of a entry in the changelog label Nov 18, 2024
@alamb alamb added the arrow Changes to the arrow crate label Dec 17, 2024
@alamb
Copy link
Contributor

alamb commented Dec 17, 2024

label_issue.py automatically added labels {'arrow'} from #6755

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants