Faster timestamp parsing (~70-90% faster) #3801

Merged 7 commits on Mar 9, 2023

Conversation

Contributor

@tustvold tustvold commented Mar 3, 2023

Which issue does this PR close?

Closes #.

Rationale for this change

2020-09-08              time:   [41.412 ns 41.433 ns 41.453 ns]
                        change: [-94.894% -94.888% -94.881%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe

2020-09-08T13:42:29     time:   [45.773 ns 45.807 ns 45.840 ns]
                        change: [-91.959% -91.950% -91.941%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild

2020-09-08T13:42:29.190 time:   [46.387 ns 46.458 ns 46.534 ns]
                        change: [-91.947% -91.936% -91.925%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

2020-09-08T13:42:29.190855
                        time:   [47.343 ns 47.367 ns 47.393 ns]
                        change: [-91.397% -91.389% -91.382%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild

2020-09-08T13:42:29.190855999
                        time:   [47.425 ns 47.468 ns 47.514 ns]
                        change: [-91.861% -91.851% -91.842%] (p = 0.00 < 0.05)
                        Performance has improved.

2020-09-08T13:42:29+00:00
                        time:   [57.554 ns 57.587 ns 57.623 ns]
                        change: [-32.570% -32.405% -32.206%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  1 (1.00%) high mild
  16 (16.00%) high severe

2020-09-08T13:42:29.190+00:00
                        time:   [59.645 ns 59.678 ns 59.711 ns]
                        change: [-33.385% -33.207% -33.058%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

2020-09-08T13:42:29.190855+00:00
                        time:   [57.440 ns 57.476 ns 57.515 ns]
                        change: [-36.681% -36.590% -36.504%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

2020-09-08T13:42:29.190855999-05:00
                        time:   [57.124 ns 57.151 ns 57.179 ns]
                        change: [-37.896% -37.674% -37.515%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

2020-09-08T13:42:29.190855Z
                        time:   [28.680 ns 28.712 ns 28.749 ns]
                        change: [-67.723% -67.672% -67.616%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low mild
  7 (7.00%) high mild
  5 (5.00%) high severe
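
To read the `change` figures above as speedup factors, here is a small sketch. It assumes the usual Criterion convention that `change` is the relative difference of the new time against the previous baseline, so `new = old * (1 + change)`. The helper names `speedup` and `baseline_ns` are made up for illustration and are not part of this PR; the inputs are the middle values copied from the output above.

```rust
/// Speedup factor implied by a relative change in time.
/// `change_pct` is negative when the new code is faster (e.g. -94.888).
/// Assumes change = (new - old) / old.
fn speedup(change_pct: f64) -> f64 {
    1.0 / (1.0 + change_pct / 100.0)
}

/// Estimated previous time, given the new time and the relative change.
fn baseline_ns(new_ns: f64, change_pct: f64) -> f64 {
    new_ns / (1.0 + change_pct / 100.0)
}

fn main() {
    // (input, new time in ns, change in percent), taken from the results above.
    for (label, new_ns, change_pct) in [
        ("2020-09-08", 41.433, -94.888),
        ("2020-09-08T13:42:29+00:00", 57.587, -32.405),
        ("2020-09-08T13:42:29.190855Z", 28.712, -67.672),
    ] {
        println!(
            "{label}: ~{:.1}x faster, estimated baseline ~{:.0} ns",
            speedup(change_pct),
            baseline_ns(new_ns, change_pct)
        );
    }
}
```

Under that assumption, a change of about -95% corresponds to roughly a 20x speedup, while a change of about -32% corresponds to roughly 1.5x.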

What changes are included in this PR?

Are there any user-facing changes?

@tustvold tustvold changed the title from "Faster timezone parsing" to "Faster timezone parsing (~70-90% faster)" Mar 3, 2023
@github-actions github-actions bot added the arrow Changes to the arrow crate label Mar 3, 2023
@tustvold tustvold changed the title from "Faster timezone parsing (~70-90% faster)" to "Faster timestamp parsing (~70-90% faster)" Mar 3, 2023
@tustvold tustvold marked this pull request as ready for review March 5, 2023 17:28
@tustvold tustvold marked this pull request as draft March 5, 2023 17:41
@tustvold tustvold marked this pull request as ready for review March 6, 2023 13:44
@jhorstmann
Contributor

Impressive results!

Contributor

@alamb alamb left a comment

I went through the code carefully and it looks good (and very clever!) to me. Nicely done @tustvold

Prior to merging this, I think it needs significantly more test coverage (especially negative cases) -- previously we were relying on the chrono parser to be well tested and so we don't have exhaustive tests. However, with our own parser I think we also need to add our own coverage. Maybe we can crib (borrow?) from the chrono test cases?

I agree with @jhorstmann that the reported results are very nice 🚀
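
As a rough illustration of the negative-case coverage requested above, here is a minimal sketch. `parse_date` is a hypothetical stand-in for a fixed-width `YYYY-MM-DD` parser, written only so the tests have something to run against; it is not the parser from this PR and it does not check days per month.

```rust
/// Hypothetical stand-in parser for `YYYY-MM-DD`; not the parser from this PR.
/// Only range-checks month (1-12) and day (1-31); it does not validate the
/// day against the month or leap years.
fn parse_date(s: &str) -> Option<(u16, u8, u8)> {
    let b = s.as_bytes();
    if b.len() != 10 || b[4] != b'-' || b[7] != b'-' {
        return None;
    }
    // Convert one ASCII digit; non-digits wrap to a value above 9 and are rejected.
    let digit = |i: usize| -> Option<u16> {
        let d = b[i].wrapping_sub(b'0');
        (d <= 9).then_some(d as u16)
    };
    let year = digit(0)? * 1000 + digit(1)? * 100 + digit(2)? * 10 + digit(3)?;
    let month = (digit(5)? * 10 + digit(6)?) as u8;
    let day = (digit(8)? * 10 + digit(9)?) as u8;
    if !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }
    Some((year, month, day))
}

fn main() {
    // One positive case to anchor the expectations.
    assert_eq!(parse_date("2020-09-08"), Some((2020, 9, 8)));

    // Negative cases: each input is malformed in a different way.
    let rejected = [
        "",            // empty
        "2020-09-8",   // too short
        "2020-09-08 ", // trailing whitespace
        "2020/09/08",  // wrong separator
        "2020-13-08",  // month out of range
        "2020-00-08",  // month zero
        "2020-09-32",  // day out of range
        "2020-09-0a",  // byte above b'9'
        "20 0-09-08",  // byte below b'0'
    ];
    for input in rejected {
        assert_eq!(parse_date(input), None, "should reject {input:?}");
    }
    println!("all cases behaved as expected");
}
```

Each rejected input exercises a different failure path (length, separators, digit range, field range), which is the sort of coverage that a well-tested upstream parser would otherwise have provided.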

3 => [bytes[1], bytes[2], b'0', b'0'],
_ => return None,
};
values.iter_mut().for_each(|x| *x = x.wrapping_sub(b'0'));
Contributor

This converts the ASCII to the numeric representation, right? I am thinking that the use of wrapping sub ensures that any value like (20) that is lower than '0' will not pass this check. Is that correct?

Contributor Author

@tustvold tustvold Mar 8, 2023

Wrapping sub will just wrap around harmlessly for such values (a byte below b'0' becomes a large u8), which will then fail the check that each value is at most 9
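
The exchange above can be sketched in isolation. This is a minimal illustration of the idea, not the code from the PR: `parse_four_digits` is a hypothetical helper, and the exact form of the range check (`> 9`) is an assumption based on the discussion.

```rust
/// Hypothetical helper: converts exactly four ASCII digits to their numeric
/// values, or returns None if any byte is not an ASCII digit.
fn parse_four_digits(bytes: [u8; 4]) -> Option<[u8; 4]> {
    let mut values = bytes;
    // Subtract b'0' from every byte. Bytes below b'0' wrap around to large
    // values (b' ' is 32, so it becomes 240), and bytes above b'9' end up
    // above 9, so a single upper-bound check rejects both kinds.
    values.iter_mut().for_each(|x| *x = x.wrapping_sub(b'0'));
    if values.iter().any(|x| *x > 9) {
        return None;
    }
    Some(values)
}

fn main() {
    println!("{:?}", parse_four_digits(*b"0530")); // all digits
    println!("{:?}", parse_four_digits(*b"05 0")); // contains a byte below b'0'
    println!("{:?}", parse_four_digits(*b"05:0")); // contains a byte above b'9'
}
```

The point of the design is that one unsigned comparison covers both the "too low" and "too high" cases, so no separate lower-bound check is needed.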

arrow-cast/src/parse.rs: 4 review threads (resolved; 3 marked outdated)
@tustvold
Contributor Author

tustvold commented Mar 9, 2023

I've added some more tests, PTAL, including the chrono test cases I could find... They aren't actually all that extensive... - https://github.com/chronotope/chrono/blob/main/src/format/parse.rs#L932

Contributor

@alamb alamb left a comment

LGTM -- thank you @tustvold

@tustvold tustvold merged commit cdb042e into apache:master Mar 9, 2023
@ursabot

ursabot commented Mar 9, 2023

Benchmark runs are scheduled for baseline = de9f826 and contender = cdb042e. cdb042e is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
