Add data loading for chinese precomputed caches #4468

Merged
merged 20 commits into from
Dec 19, 2023

Conversation

Manishearth
Member

@Manishearth Manishearth commented Dec 19, 2023

Part of #3933

This ties everything together, datagenning cached data for a period of 200 years after the year 1900 (eventually can be made configurable).

With this PR, cached data can be considered done for Chinese and Dangi. Islamic and Hebrew still need to have this work done, though it will hopefully be simpler.
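To illustrate the general shape of a precomputed cache with a computed fallback, here is a minimal sketch. It is not the code from this PR: `YearInfo`, `YearCache`, and `get_or_compute` are hypothetical names, and the single field stands in for whatever per-year data is actually stored.

```rust
/// Hypothetical per-year record; the real data struct is not shown in this thread.
#[derive(Clone, Copy, Debug, PartialEq)]
struct YearInfo {
    /// Illustrative payload only (for example, an offset to the lunar new year).
    new_year_offset: u16,
}

/// Precomputed records for a contiguous run of years starting at `first_year`.
struct YearCache {
    first_year: i32,
    entries: Vec<YearInfo>,
}

impl YearCache {
    /// Returns the cached record, or `None` when `year` is outside the cached range.
    fn get(&self, year: i32) -> Option<YearInfo> {
        // A negative difference fails the conversion to `usize` and yields `None`.
        let idx = usize::try_from(year.checked_sub(self.first_year)?).ok()?;
        self.entries.get(idx).copied()
    }

    /// Uses the cache when possible and falls back to `compute` otherwise.
    fn get_or_compute(&self, year: i32, compute: impl FnOnce(i32) -> YearInfo) -> YearInfo {
        self.get(year).unwrap_or_else(|| compute(year))
    }
}
```

With `first_year = 1900` and 200 entries, lookups for 1900 through 2099 hit the cache, and everything else goes through the (slow) calculation path.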

I may want to do follow-up work investigating benchmark performance (see comment below).

This PR can be reviewed commit by commit.

@Manishearth
Member Author

Benchmark results:

     Running benches/convert.rs (/home/manishearth/dev/icu4x/target/release/deps/convert-9276c61bbc500b27)
convert/calendar/iso    time:   [4.6265 ns 4.6398 ns 4.6548 ns]
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low severe
  4 (4.00%) high mild
  1 (1.00%) high severe
convert/calendar/buddhist
                        time:   [4.7727 ns 4.8141 ns 4.8632 ns]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low severe
  2 (2.00%) high mild
convert/calendar/coptic time:   [19.947 ns 20.714 ns 21.507 ns]
Found 28 outliers among 100 measurements (28.00%)
  9 (9.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  16 (16.00%) high severe
convert/calendar/ethiopic
                        time:   [22.356 ns 22.850 ns 23.439 ns]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe
convert/calendar/indian time:   [6.9233 ns 6.9758 ns 7.0292 ns]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
convert/calendar/julian time:   [23.639 ns 23.717 ns 23.802 ns]
Found 17 outliers among 100 measurements (17.00%)
  7 (7.00%) low severe
  3 (3.00%) low mild
  2 (2.00%) high mild
  5 (5.00%) high severe
convert/calendar/chinese_calculating
                        time:   [128.99 µs 129.49 µs 129.98 µs]
convert/calendar/chinese_cached
                        time:   [83.002 ns 83.769 ns 84.552 ns]
convert/calendar/gregorian
                        time:   [4.7451 ns 4.7624 ns 4.7804 ns]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low severe
  2 (2.00%) high mild
convert/calendar/islamic/observational
                        time:   [67.040 µs 68.330 µs 70.058 µs]
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe
convert/calendar/islamic/civil
                        time:   [23.409 ns 23.457 ns 23.504 ns]
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low severe
  4 (4.00%) high mild
  2 (2.00%) high severe
convert/calendar/islamic/ummalqura
                        time:   [193.53 µs 194.91 µs 196.50 µs]
Found 24 outliers among 100 measurements (24.00%)
  4 (4.00%) low severe
  2 (2.00%) high mild
  18 (18.00%) high severe
convert/calendar/islamic/tabular
                        time:   [25.884 ns 26.250 ns 26.693 ns]
Found 12 outliers among 100 measurements (12.00%)
  6 (6.00%) high mild
  6 (6.00%) high severe

     Running benches/date.rs (/home/manishearth/dev/icu4x/target/release/deps/date-b327089746c19b13)
date/calendar/overview  time:   [161.15 ns 165.37 ns 169.93 ns]
Found 12 outliers among 100 measurements (12.00%)
  6 (6.00%) high mild
  6 (6.00%) high severe
date/calendar/buddhist  time:   [154.81 ns 156.10 ns 157.67 ns]
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild
date/calendar/coptic    time:   [1.2697 µs 1.2781 µs 1.2890 µs]
Found 12 outliers among 100 measurements (12.00%)
  2 (2.00%) high mild
  10 (10.00%) high severe
date/calendar/ethiopic  time:   [1.2692 µs 1.2878 µs 1.3099 µs]
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high severe
date/calendar/indian    time:   [338.48 ns 339.69 ns 340.88 ns]
date/calendar/persian   time:   [1.6215 µs 1.6559 µs 1.6934 µs]
Found 17 outliers among 100 measurements (17.00%)
  4 (4.00%) high mild
  13 (13.00%) high severe
date/calendar/roc       time:   [169.04 ns 170.19 ns 171.39 ns]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
date/calendar/julian    time:   [1.2987 µs 1.3128 µs 1.3301 µs]
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe
date/calendar/chinese_calculating
                        time:   [7.8037 ms 7.8461 ms 7.8969 ms]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
date/calendar/chinese_cached
                        time:   [5.7837 ms 5.7914 ms 5.8007 ms]
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe
date/calendar/dangi_calculating
                        time:   [7.7225 ms 7.7458 ms 7.7783 ms]
Found 15 outliers among 100 measurements (15.00%)
  3 (3.00%) high mild
  12 (12.00%) high severe
date/calendar/dangi_cached
                        time:   [5.8629 ms 5.9199 ms 5.9876 ms]
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe
date/calendar/hebrew    time:   [16.617 µs 16.873 µs 17.191 µs]
date/calendar/gregorian time:   [208.49 ns 210.77 ns 213.02 ns]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
date/calendar/islamic/civil
                        time:   [1.5194 µs 1.5252 µs 1.5319 µs]
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe
date/calendar/islamic/tabular
                        time:   [1.5393 µs 1.5528 µs 1.5668 µs]
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) high mild
  1 (1.00%) high severe
date/calendar/islamic/ummalqura
                        time:   [28.234 ms 29.228 ms 30.234 ms]
date/calendar/islamic/observational
                        time:   [13.859 ms 13.914 ms 13.973 ms]
Found 33 outliers among 100 measurements (33.00%)
  19 (19.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  11 (11.00%) high severe

     Running benches/datetime.rs (/home/manishearth/dev/icu4x/target/release/deps/datetime-2ad42a45c21b21ec)
datetime/calendar/overview
                        time:   [297.45 ns 308.84 ns 320.66 ns]
Found 13 outliers among 100 measurements (13.00%)
  3 (3.00%) high mild
  10 (10.00%) high severe
datetime/calendar/buddhist
                        time:   [287.48 ns 289.93 ns 293.28 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
datetime/calendar/coptic
                        time:   [1.2809 µs 1.2856 µs 1.2908 µs]
datetime/calendar/ethiopic
                        time:   [1.3332 µs 1.3474 µs 1.3641 µs]
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe
datetime/calendar/chinese_calculating
                        time:   [7.8281 ms 7.8662 ms 7.9073 ms]
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild
datetime/calendar/chinese_cached
                        time:   [5.8008 ms 5.8235 ms 5.8554 ms]
Found 18 outliers among 100 measurements (18.00%)
  18 (18.00%) high severe
datetime/calendar/gregorian
                        time:   [305.50 ns 327.00 ns 352.39 ns]
Found 14 outliers among 100 measurements (14.00%)
  2 (2.00%) high mild
  12 (12.00%) high severe
datetime/calendar/indian
                        time:   [413.97 ns 420.76 ns 430.14 ns]
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) high mild
  8 (8.00%) high severe
datetime/calendar/julian
                        time:   [1.3759 µs 1.3921 µs 1.4102 µs]
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe
datetime/calendar/islamic/civil
                        time:   [1.7730 µs 1.8500 µs 1.9175 µs]
Found 24 outliers among 100 measurements (24.00%)
  2 (2.00%) high mild
  22 (22.00%) high severe
datetime/calendar/islamic/tabular
                        time:   [1.8157 µs 1.8485 µs 1.8841 µs]
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild
datetime/calendar/islamic/ummalqura
                        time:   [22.756 ms 23.022 ms 23.366 ms]
Found 10 outliers among 100 measurements (10.00%)
  7 (7.00%) high mild
  3 (3.00%) high severe
datetime/calendar/islamic/observational
                        time:   [13.994 ms 14.057 ms 14.121 ms]
Found 41 outliers among 100 measurements (41.00%)
  18 (18.00%) low severe
  2 (2.00%) low mild
  1 (1.00%) high mild
  20 (20.00%) high severe

     Running benches/iso.rs (/home/manishearth/dev/icu4x/target/release/deps/iso-52fac05f9fd4ad22)
iso/from_minutes_since_local_unix_epoch
                        time:   [18.969 µs 19.315 µs 19.776 µs]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

Relevant snippets:

convert/calendar/chinese_calculating
                        time:   [128.99 µs 129.49 µs 129.98 µs]
convert/calendar/chinese_cached
                        time:   [83.002 ns 83.769 ns 84.552 ns]



date/calendar/chinese_calculating
                        time:   [7.8037 ms 7.8461 ms 7.8969 ms]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
date/calendar/chinese_cached
                        time:   [5.7837 ms 5.7914 ms 5.8007 ms]
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe
date/calendar/dangi_calculating
                        time:   [7.7225 ms 7.7458 ms 7.7783 ms]
Found 15 outliers among 100 measurements (15.00%)
  3 (3.00%) high mild
  12 (12.00%) high severe
date/calendar/dangi_cached
                        time:   [5.8629 ms 5.9199 ms 5.9876 ms]


datetime/calendar/chinese_calculating
                        time:   [7.8281 ms 7.8662 ms 7.9073 ms]
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild
datetime/calendar/chinese_cached
                        time:   [5.8008 ms 5.8235 ms 5.8554 ms]

It's always faster, but not as much faster as I might have hoped. The date/datetime tests are somewhat expected, since half of those fixtures are for uncached years. Convert is not. I'd like to investigate this as a followup.

@sffc
Member

sffc commented Dec 19, 2023

> Convert is not. I'd like to investigate this as a followup.

1000x faster?

convert/calendar/chinese_calculating
                        time:   [128.99 µs 129.49 µs 129.98 µs]
convert/calendar/chinese_cached
                        time:   [83.002 ns 83.769 ns 84.552 ns]

@Manishearth
Member Author

oh. I just didn't look at the unit. Oops
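For reference, the ratio can be checked by normalizing the units first. The helper below is only an illustrative sketch (`to_nanos` and `speedup` are made-up names, not part of the benchmark harness); it uses the median figures quoted above.

```rust
/// Converts a value with a Criterion-style unit suffix to nanoseconds.
/// Returns `None` for an unknown unit.
fn to_nanos(value: f64, unit: &str) -> Option<f64> {
    let scale = match unit {
        "ps" => 1e-3,
        "ns" => 1.0,
        "µs" | "us" => 1e3,
        "ms" => 1e6,
        "s" => 1e9,
        _ => return None,
    };
    Some(value * scale)
}

/// How many times faster `fast` is than `slow`, after unit normalization.
fn speedup(slow: (f64, &str), fast: (f64, &str)) -> Option<f64> {
    Some(to_nanos(slow.0, slow.1)? / to_nanos(fast.0, fast.1)?)
}
```

`speedup((129.49, "µs"), (83.769, "ns"))` comes out at roughly 1500x for convert, while `speedup((7.8461, "ms"), (5.7914, "ms"))` is about 1.35x for the date bench.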

@Manishearth
Member Author

Manishearth commented Dec 19, 2023

For date/datetime I may tweak the bench to also test "only in cached range" dates as a separate bench
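One way such a split could look, as a rough sketch: partition the fixture years by whether they fall inside the cached window, then bench each group separately. The constants mirror the 200-year window starting at 1900 described for this PR; `cached_range` and `split_fixtures` are hypothetical helpers, not existing bench code.

```rust
use std::ops::Range;

// Assumed to mirror the cached window discussed in this PR.
const ISO_START: i32 = 1900;
const YEARS: i32 = 200;

/// Half-open range of ISO years covered by the precomputed data.
fn cached_range() -> Range<i32> {
    ISO_START..ISO_START + YEARS
}

/// Splits fixture years into (inside the cached range, outside the cached range),
/// preserving the original order within each group.
fn split_fixtures(years: &[i32]) -> (Vec<i32>, Vec<i32>) {
    years
        .iter()
        .copied()
        .partition(|year| cached_range().contains(year))
}
```

Benching the two groups separately would show the cached fast path on its own, instead of an average dominated by the slow calculated years.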

use icu_provider::prelude::*;

const YEARS: i32 = 200;
const ISO_START: i32 = 1900;
Member Author

I just made these numbers up, open to other suggestions. Data size is currently 600B each.

Previously when Andrew and I were designing it we were including the related ISO year which would have limited the range to around 200 but now this range can be arbitrarily extended to whatever we want.
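600 B over 200 years works out to 3 bytes per year. As a rough illustration of how a year could fit in 3 bytes, here is a hypothetical bit layout; the field widths, names, and baseline are assumptions for this sketch and are not taken from the PR's actual data struct.

```rust
/// Hypothetical 3-byte packing of one lunisolar year (24 bits, 23 used):
/// - bits 0..13:  one bit per month, 1 = 30 days, 0 = 29 days (13 months in a leap year)
/// - bits 13..17: ordinal of the leap month, 0 = no leap month
/// - bits 17..23: new-year offset in days from some assumed baseline date
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct PackedYear([u8; 3]);

impl PackedYear {
    /// Packs the fields, returning `None` if a value does not fit its bit field.
    fn new(month_is_long: [bool; 13], leap_month: u8, ny_offset: u8) -> Option<Self> {
        if leap_month > 13 || ny_offset >= 64 {
            return None;
        }
        let mut bits: u32 = 0;
        for (i, &long) in month_is_long.iter().enumerate() {
            if long {
                bits |= 1 << i;
            }
        }
        bits |= u32::from(leap_month) << 13;
        bits |= u32::from(ny_offset) << 17;
        let b = bits.to_le_bytes();
        Some(Self([b[0], b[1], b[2]]))
    }

    /// Reassembles the 24 stored bits into a `u32`.
    fn bits(self) -> u32 {
        u32::from_le_bytes([self.0[0], self.0[1], self.0[2], 0])
    }

    fn month_is_long(self, month_index: usize) -> bool {
        month_index < 13 && (self.bits() >> month_index) & 1 == 1
    }

    fn leap_month(self) -> u8 {
        ((self.bits() >> 13) & 0xF) as u8
    }

    fn ny_offset(self) -> u8 {
        ((self.bits() >> 17) & 0x3F) as u8
    }
}
```

Because no related ISO year is stored per entry, the year is implied by the entry's position, so extending the range only costs 3 more bytes per year under this layout.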

Member

If you make it 250 years then we're right in the middle of the range.

Member

I think pre-unix epoch is less important than the future, I'd even go as far as saying 1950-2150.

Is there no cycle in this data? Or is the period just very large?

Member Author

No, there's no cycle.

Member Author

I bumped it to 250. Also open to making it 200 and 1950, but I think prior dates are actually somewhat important, especially when it comes to people's birthdays. Not that important though.

Member

Well there has to be a cycle at some point

Member Author

I don't see why there has to be? If you're suggesting that all sequences of elements from a bounded set are cyclic, that's certainly not the case, check out the digits of pi.

The functions involved are periodic but their periods are not rational multiples of one another. The system overall is quasiperiodic.
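A small numeric illustration of this, using approximate mean values for the tropical year and synodic month (textbook figures assumed here, not taken from the PR; the real calendar depends on true rather than mean astronomical positions). The continued-fraction convergents of the year-to-month ratio give ever better whole-number approximations, such as the well-known 235 months in 19 years, but each still leaves a residual drift. This is a sketch of the idea, not a proof of incommensurability.

```rust
/// Approximate mean lengths in days (assumed textbook values).
const TROPICAL_YEAR: f64 = 365.24219;
const SYNODIC_MONTH: f64 = 29.530588;

/// Returns up to `n` continued-fraction convergents of `x` as (numerator, denominator).
fn convergents(mut x: f64, n: usize) -> Vec<(u64, u64)> {
    let mut out = Vec::with_capacity(n);
    // Standard recurrences: h_i = a_i * h_{i-1} + h_{i-2}, and likewise for k.
    let (mut h_prev, mut h) = (0u64, 1u64);
    let (mut k_prev, mut k) = (1u64, 0u64);
    for _ in 0..n {
        let a = x.floor();
        let a_int = a as u64;
        let h_next = a_int * h + h_prev;
        let k_next = a_int * k + k_prev;
        h_prev = h;
        h = h_next;
        k_prev = k;
        k = k_next;
        out.push((h, k));
        let frac = x - a;
        if frac < 1e-9 {
            break; // x is (numerically) an exact ratio; nothing left to expand
        }
        x = 1.0 / frac;
    }
    out
}

/// Residual in days when `months` lunar months are matched against `years` solar years.
fn drift_days(months: u64, years: u64) -> f64 {
    months as f64 * SYNODIC_MONTH - years as f64 * TROPICAL_YEAR
}
```

`convergents(TROPICAL_YEAR / SYNODIC_MONTH, 6)` ends in `(235, 19)`, and `drift_days(235, 19)` is still on the order of a couple of hours, so the pattern does not repeat exactly after 19 years.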

Member

Living people's birthdays is also my rule of thumb.

@Manishearth
Member Author

Manishearth commented Dec 19, 2023

Still not happy with it being in the high 10s of ns, expected it to be a bit lower. But less of a big deal, still in range.

sffc
sffc previously approved these changes Dec 19, 2023
Member

@sffc sffc left a comment

Great that this works!

Did two passes, read some of the functions in detail, don't see anything wrong or that I would change.

components/calendar/src/chinese_based.rs (resolved)
components/calendar/src/chinese.rs (outdated, resolved)
components/calendar/src/dangi.rs (outdated, resolved)
@@ -445,6 +445,15 @@ mod test {
use super::*;
use crate::types::MonthCode;
use calendrical_calculations::rata_die::RataDie;
/// Run a test twice, with two calendars
fn do_twice(
Member

mmh, for-loop?

Member Author

How? Why? I don't think a for loop is cleaner than this, and this lets us tweak how we do it.

@Manishearth Manishearth merged commit ae4c162 into unicode-org:main Dec 19, 2023
29 checks passed
@Manishearth Manishearth deleted the chinese-precomputed branch December 19, 2023 17:34
3 participants