Loading cache data for lunar calendars #3933
There's a PR open for Chinese (https://github.com/unicode-org/icu4x/pulls), but Islamic and Hebrew also need similar work done.
Name of the constructor that does not load data:
Discussion:
Conclusion:
Next steps:
It would be nice to have Chinese ready with the pre-calculated caches in 1.3, but it does not need to block. LGTM: @Manishearth @sffc @robertbastian
#4051 solves this for the milestone, kicking the can to 1.4.
The plan is:
One thing I'm not sure about is whether YearInfo should ever differ from what is stored in precomputed data. Currently the in-tree YearInfo equivalent for Chinese is:

```rust
pub(crate) struct ChineseBasedCache {
    pub(crate) new_year: RataDie,
    pub(crate) next_new_year: RataDie,
    pub(crate) leap_month: Option<NonZeroU8>,
}
```

whereas the precomputed cache is:

```rust
pub(crate) struct ChineseBasedCompiledData {
    pub(crate) new_year: RataDie,
    /// last_day_of_month[12] = last_day_of_month[11] in non-leap years
    /// These days are 1-indexed: so the last day of month for a 30-day 一月 is 30
    /// The array itself is zero-indexed, be careful passing it self.0.month!
    last_day_of_month: [u16; 13],
    ///
    pub(crate) leap_month: Option<NonZeroU8>,
}
```

(This packs down to three bytes in

If we are going to store such data live, we might as well store

However, if we make that call now, we can avoid having to pass in the compiled data for everything other than the
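To make the indexing warning in that doc comment concrete, here is a minimal sketch of how month and year lengths fall out of a cumulative `last_day_of_month` array. `MonthEnds` and `month_ends_from_lengths` are hypothetical stand-ins, not the in-tree types; the only assumption taken from the struct above is that entry `i` holds the 1-indexed day of the year on which zero-indexed month `i` ends, with entry 12 repeating entry 11 in non-leap years.

```rust
/// Hypothetical stand-in for `ChineseBasedCompiledData::last_day_of_month`:
/// entry `i` is the 1-indexed day of the year on which (zero-indexed) month `i` ends.
pub struct MonthEnds(pub [u16; 13]);

impl MonthEnds {
    /// Length of the month with 1-based ordinal `ordinal` (1..=13).
    /// Converts the 1-based ordinal to the zero-indexed array position.
    pub fn days_in_month(&self, ordinal: u8) -> u16 {
        let i = usize::from(ordinal - 1);
        let start = if i == 0 { 0 } else { self.0[i - 1] };
        self.0[i] - start
    }

    /// Total days in the year. In non-leap years entry 12 repeats entry 11,
    /// so the last entry is always the year length.
    pub fn days_in_year(&self) -> u16 {
        self.0[12]
    }

    /// A year is leap iff the 13th month has a nonzero length.
    pub fn is_leap(&self) -> bool {
        self.0[12] != self.0[11]
    }
}

/// Builds the cumulative array from 12 or 13 per-month lengths.
/// Missing trailing months get length 0, reproducing the non-leap convention.
pub fn month_ends_from_lengths(lengths: &[u16]) -> MonthEnds {
    let mut ends = [0u16; 13];
    let mut total: u16 = 0;
    for (i, end) in ends.iter_mut().enumerate() {
        total += lengths.get(i).copied().unwrap_or(0);
        *end = total;
    }
    MonthEnds(ends)
}
```

Storing cumulative ends makes day-of-year lookups a single array read, at the cost of needing a subtraction (and the off-by-one care above) to get an individual month length.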
Discussed a bit with @sffc: in general, we think that YearInfo and PrecomputedData will not need to differ in the data stored per year (though they may differ in storage format). This means that ArithmeticDate only needs to worry about PrecomputedData for the offset/until functions, which is a far smaller footprint. The design becomes:
The main downside is that, when working without loaded data, calculating the YearInfo for an entire year may be slower, since we have to calculate things for multiple months at a time. This is not a cost we ever have to pay right now; on the other hand, it will probably amortize nicely. Another downside is that this will increase the size of Date whenever our largest calendar decides to cache more things. We think this price is acceptable since Date is mostly ephemeral.
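The "use loaded data if we have it, otherwise calculate" behaviour could look roughly like the following minimal sketch. The trait mirrors the `PrecomputedDataSource` idea from this thread, but the method name `load_or_compute_info`, the `YearInfo` field and `CompiledOrCalculated` are assumptions for illustration. The "expensive" calculation is replaced by the Gregorian year length purely so the example is checkable; it is not the lunar computation.

```rust
/// Hypothetical per-year info; real calendars would cache different fields.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct YearInfo {
    pub days_in_year: u16,
}

/// Sketch of a data source trait; the method name is an assumption.
pub trait PrecomputedDataSource<YI> {
    fn load_or_compute_info(&self, year: i32) -> YI;
}

/// Stand-in for the slow astronomical calculation (Gregorian rule, for illustration only).
fn compute_year_info(year: i32) -> YearInfo {
    let leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    YearInfo { days_in_year: if leap { 366 } else { 365 } }
}

/// Optional compiled table covering `first_year..first_year + table.len()`.
pub struct CompiledOrCalculated<'a> {
    pub first_year: i32,
    pub table: Option<&'a [YearInfo]>,
}

impl PrecomputedDataSource<YearInfo> for CompiledOrCalculated<'_> {
    fn load_or_compute_info(&self, year: i32) -> YearInfo {
        self.table
            .and_then(|t| {
                // Years before `first_year` give a negative index and fall through.
                let idx = usize::try_from(year - self.first_year).ok()?;
                t.get(idx).copied()
            })
            // Outside the table (or with no table at all) we pay the calculation cost.
            .unwrap_or_else(|| compute_year_info(year))
    }
}
```

With `table: None` every lookup goes through the calculation, which is the "working without loaded data" case; the cost is then paid once per year change rather than once per getter call.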
Ugh, found some gnarly stuff that will not be easy to optimize: icu4x/components/calendar/src/chinese.rs, line 311 (at commit c755786).
This needs the number of days in the previous year. We could also cache that, but it's suboptimal: it's only needed for the rather rare case of week-of-year formatting, yet it has to be computed unconditionally when formatting
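Why week-of-year needs the previous year's length: days at the start of a year can belong to the last week of the previous year, and numbering that week requires knowing where the previous year began. The following is a minimal, generic sketch of that rule, not the code in chinese.rs. It assumes weekdays are numbered 0..7 relative to the locale's first day of the week, that week 1 is the first week with at least `min_days` days in the year, and it deliberately ignores the opposite case (late days rolling into next year's week 1).

```rust
/// Week number of a 1-based `day_of_year` within its own year.
/// Returns 0 if the day falls before week 1 of this year.
/// `first_day_weekday` is the weekday (0..7, relative to the first day of the week) of day 1.
fn raw_week(day_of_year: u32, first_day_weekday: u32, min_days: u32) -> u32 {
    // Days elapsed since the start of the week containing day 1.
    let offset = day_of_year - 1 + first_day_weekday;
    let week_index = offset / 7;
    // The partial week containing day 1 counts as week 1 only if it is long enough.
    let first_week_counts = 7 - first_day_weekday >= min_days;
    if first_week_counts { week_index + 1 } else { week_index }
}

/// Returns (year_offset, week). `year_offset` is -1 when the day belongs to the
/// last week of the previous year, which is exactly where `prev_year_len` is needed.
pub fn week_of_year(
    day_of_year: u32,
    first_day_weekday: u32,
    min_days: u32,
    prev_year_len: u32,
) -> (i32, u32) {
    let week = raw_week(day_of_year, first_day_weekday, min_days);
    if week > 0 {
        return (0, week);
    }
    // Weekday of the previous year's day 1, stepping back `prev_year_len` days.
    let prev_first = (first_day_weekday + 7 - prev_year_len % 7) % 7;
    (-1, raw_week(prev_year_len, prev_first, min_days))
}
```

With ISO settings (Monday first, `min_days = 4`), 2021-01-01 is a Friday (weekday 4) and 2020 has 366 days, so `week_of_year(1, 4, 4, 366)` lands in week 53 of the previous year, matching the ISO week date.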
…dar) (#4407)

This is mostly a scaffolding PR implementing the first few steps of #3933 (comment). This:

- Adds a YearInfo type that is stored in CalendarArithmetic and needs to be computed whenever the type is created or has its offset updated
- Requires `.offset()` to be called with an `&impl PrecomputedDataSource<YearInfo>`. An intermediate state of the PR had a separate PrecomputedDataSource associated type as per the original design, but I realized it wasn't necessary, and it tended to lead to worse trait resolution issues with the ChineseBased blanket impl
- For now, all construction methods for CalendarArithmetic require YearInfo to be `()`. A PR actually using this can add the APIs it needs
- As a cleanup, I removed `new_with_lunar_ordinals()`. We weren't maintaining that distinction correctly; I considered fixing it, but that method only exists in case of future needs, and when those needs come we can easily re-add it and do the proper refactoring.

Can be reviewed commit-by-commit if desired, though I'd recommend reviewing the whole PR at once. It's probably sufficient to review calendar_arithmetic.rs and chinese_based.rs.
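The shape described in that PR (a date that carries its YearInfo, with offsetting requiring a data source so the cached info can be refreshed) can be sketched as follows. This is a simplified illustration under assumptions: `ArithmeticDate`, `offset_years`, `load_or_compute_info` and `GregorianLeap` are hypothetical names, day clamping after an offset is omitted, and the `()` impl models the "YearInfo is `()`" case for calendars that cache nothing.

```rust
/// Sketch of a per-calendar data source; the method name is an assumption.
pub trait PrecomputedDataSource<YI> {
    fn load_or_compute_info(&self, year: i32) -> YI;
}

/// Calendars that cache nothing use `()` as their YearInfo and as their data source.
impl PrecomputedDataSource<()> for () {
    fn load_or_compute_info(&self, _year: i32) {}
}

/// Hypothetical simplified date: the per-year info travels with the date,
/// so getters never need to consult data.
#[derive(Copy, Clone, Debug, PartialEq)]
pub struct ArithmeticDate<YI> {
    pub year: i32,
    pub month: u8,
    pub day: u8,
    pub year_info: YI,
}

impl<YI> ArithmeticDate<YI> {
    /// Moving to another year invalidates the cached YearInfo, so this is the
    /// one place that needs the data source. (Day clamping is omitted here.)
    pub fn offset_years(&mut self, years: i32, data: &impl PrecomputedDataSource<YI>) {
        self.year += years;
        self.year_info = data.load_or_compute_info(self.year);
    }
}

/// Hypothetical source caching only "is this a leap year" (Gregorian rule, for illustration).
pub struct GregorianLeap;

impl PrecomputedDataSource<bool> for GregorianLeap {
    fn load_or_compute_info(&self, year: i32) -> bool {
        (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
    }
}
```

The design choice here is that only mutation paths take a data-source parameter; read-only getters stay data-free because everything they need is already in `year_info`.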
Part of #3933. This does not yet add data loading, but it gets very close. This PR keeps the size of Date from bloating too much, without requiring much computation for any of the getters: they're all basic bit operations.
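As an illustration of "getters are basic bit operations", here is a minimal sketch of a three-byte packed year record. The bit layout is a hypothetical one chosen for this example, not necessarily the one in the PR: bits 0..13 flag 30-day months (29 otherwise), bits 13..17 hold the leap month ordinal (0 meaning none), and bits 17..24 hold a new-year offset in days from some per-year reference that is left unspecified here.

```rust
use std::num::NonZeroU8;

/// Hypothetical 24-bit packed year record (layout assumed for illustration).
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct PackedYearInfo([u8; 3]);

impl PackedYearInfo {
    pub fn new(month_is_long: [bool; 13], leap_month: Option<NonZeroU8>, new_year_offset: u8) -> Self {
        let mut bits: u32 = 0;
        for (i, &long) in month_is_long.iter().enumerate() {
            if long {
                bits |= 1 << i; // bit i set: zero-indexed month i has 30 days
            }
        }
        let leap = u32::from(leap_month.map_or(0, NonZeroU8::get));
        assert!(leap < 16 && new_year_offset < 128, "field out of range");
        bits |= leap << 13;
        bits |= u32::from(new_year_offset) << 17;
        let [b0, b1, b2, _] = bits.to_le_bytes();
        PackedYearInfo([b0, b1, b2])
    }

    fn bits(self) -> u32 {
        let [b0, b1, b2] = self.0;
        u32::from_le_bytes([b0, b1, b2, 0])
    }

    pub fn leap_month(self) -> Option<NonZeroU8> {
        NonZeroU8::new(((self.bits() >> 13) & 0xF) as u8)
    }

    pub fn new_year_offset(self) -> u8 {
        ((self.bits() >> 17) & 0x7F) as u8
    }

    pub fn months_in_year(self) -> u8 {
        if self.leap_month().is_some() { 13 } else { 12 }
    }

    /// Caller must ensure `ordinal <= months_in_year()`.
    pub fn days_in_month(self, ordinal: u8) -> u8 {
        if (self.bits() >> (ordinal - 1)) & 1 == 1 { 30 } else { 29 }
    }

    /// 29 days per month plus one for each long month actually in this year.
    pub fn days_in_year(self) -> u16 {
        let month_mask = (1u32 << self.months_in_year()) - 1;
        29 * u16::from(self.months_in_year()) + (self.bits() & month_mask).count_ones() as u16
    }
}
```

Every getter is a shift, a mask, or a popcount, so keeping this inside a Date costs three bytes and essentially no computation.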
Part of #3933.

This implements a module that contains efficient Hebrew calendrical calculations using the [Four Gates](https://en.wikipedia.org/wiki/Hebrew_calendar#The_four_gates) method. It is more efficient than the book code (which is a good demonstration of some algorithms but has not been mathematically simplified or cached in any way) and works off the intermediate notion of a Keviyah, a cacheable quantity that characterizes the type of year.

This doesn't yet plug this code into `icu_calendar`, but it does test that it has exactly the same behavior as the book code. I do link the sources, but I _hope_ that all the math here is explained adequately in the comments. The sources should only need to be consulted for the actual values of the constants. **Please let me know if this is not the case and I'll document further**.

A thing I'm not yet sure about (which doesn't block this PR) is what should *actually* be cached in the `icu_calendar::Hebrew` YearInfo. We basically have two options:

- Cache just the keviyah, or perhaps YearType/StartOfYear/`is_leap`. This is quite compact, but conversion to ISO will require computing the molad. `molad_details` is not a *particularly* expensive method to call, but it's not completely cheap.
- Cache the keviyah *and* `weeks_since_beharad`. This risks overflowing an i64, though we can probably optimize it by instead storing the RataDie of the beginning of the week, stuffing the index of the Keviyah into the `%7` of the value, and calculating `is_leap` from the year.

The implementation of this module makes it rather straightforward to switch between either strategy, so it's not a big deal yet. We can benchmark once we implement this.
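The second option's packing trick can be sketched like this. There are 14 keviyot, 7 for common years and 7 for leap years, so once `is_leap` is derived from the year, the index within its group fits in 0..7 and can ride in the low "mod 7" part of a week-aligned RataDie. Assumptions in this sketch: the cached week starts on a Sunday (with R.D. 1 being Monday, 1 January 1 CE proleptic Gregorian, Sundays are exactly the R.D. values divisible by 7), and the 0..7 ordering of keviyot within each group is hypothetical. The leap rule is the standard 19-year Metonic one.

```rust
/// Hypothetical packed form: a Sunday-aligned RataDie plus a keviyah index in 0..7.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct PackedWeekAndKeviyah(i64);

impl PackedWeekAndKeviyah {
    pub fn new(week_start_rd: i64, keviyah_index: u8) -> Self {
        // Sundays have R.D. ≡ 0 (mod 7), leaving the remainder free for the index.
        assert_eq!(week_start_rd.rem_euclid(7), 0, "week start must be a Sunday");
        assert!(keviyah_index < 7, "index within the leap or common group");
        Self(week_start_rd + i64::from(keviyah_index))
    }

    /// `rem_euclid` keeps this correct for negative R.D. values too.
    pub fn week_start_rd(self) -> i64 {
        self.0 - self.0.rem_euclid(7)
    }

    pub fn keviyah_index(self) -> u8 {
        self.0.rem_euclid(7) as u8
    }
}

/// Metonic rule: years 3, 6, 8, 11, 14, 17 and 19 of each 19-year cycle are leap.
/// This selects which group of 7 keviyot the packed index refers to.
pub fn is_hebrew_leap_year(year: i32) -> bool {
    (7 * i64::from(year) + 1).rem_euclid(19) < 7
}
```

Decoding is one `rem_euclid` and one subtraction, so this stays in the "basic arithmetic" budget for getters while avoiding a molad computation on ISO conversion.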
@sffc We need to make a decision based on the benchmarking results:
For background, currently
Proposal: We commit to never precomputing, add
Approval:
Does keviyah work for all dates, or are there cases where we would need to fall back to the book calculations? If keviyah is a true drop-in replacement, then yes, we can just use it all the time and make Hebrew independent of data.
All dates (in range, which is a range of ~i32 years, I think; check the code). Prior versions of the Hebrew calendar used slightly different four gates tables. Adjler documents them all, but the book does not handle them either; it implements the modern set of calculations. The only question here is whether it is fast enough that we never want to add data loading. The benchmarks convince me, but they are definitely at a level where I would not be surprised if others were not convinced.
I've started working on the Islamic ones. My current data model is to store month-length booleans and then use the remaining 4 bits to store a new year's offset from the mean synodic new year (
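A 16-bit version of that data model might look like the sketch below: 12 bits of month-length booleans plus a 4-bit new-year offset. The comment doesn't say how the offset is encoded, so this sketch assumes a signed day offset in -8..=7 stored as 4-bit two's complement, and it takes the mean synodic new year as an input rather than guessing an epoch constant. `PackedIslamicYear` is a hypothetical name.

```rust
/// Hypothetical 16-bit packed Islamic year: bits 0..12 flag 30-day months,
/// bits 12..16 hold the new-year offset (assumed signed, -8..=7).
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct PackedIslamicYear(u16);

impl PackedIslamicYear {
    pub fn new(month_is_30: [bool; 12], offset: i8) -> Self {
        assert!((-8..=7).contains(&offset), "offset must fit in 4 signed bits");
        let mut bits = 0u16;
        for (i, &long) in month_is_30.iter().enumerate() {
            if long {
                bits |= 1 << i;
            }
        }
        // Casting a negative i8 to u16 sign-extends; masking keeps the low 4 bits.
        bits |= ((offset as u16) & 0xF) << 12;
        Self(bits)
    }

    pub fn days_in_month(self, ordinal: u8) -> u8 {
        if (self.0 >> (ordinal - 1)) & 1 == 1 { 30 } else { 29 }
    }

    pub fn days_in_year(self) -> u16 {
        29 * 12 + (self.0 & 0x0FFF).count_ones() as u16
    }

    /// Decodes the 4-bit two's complement field back to -8..=7.
    pub fn new_year_offset(self) -> i8 {
        let raw = ((self.0 >> 12) & 0xF) as i8;
        if raw >= 8 { raw - 16 } else { raw }
    }

    /// Actual new year, given the mean synodic new year as a day number.
    pub fn new_year(self, mean_new_year: i64) -> i64 {
        mean_new_year + i64::from(self.new_year_offset())
    }
}
```

Whether a signed or biased encoding fits the real data depends on how far observed new years drift from the mean, which this sketch does not check.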
@Manishearth - This is 1.3-blocking because the API of a calendar that has no data differs from one that does. We could theoretically remove constructors from these types for 1.3 and only expose them through AnyCalendar, but then these APIs would be functionally useless outside of AnyCalendar.
Parts:
`new()` and deprecating `new_calculating()` (and the `try_new...with_calendar()`), if we are definitely not precomputing (otherwise add precomputing)
Deprecate Hebrew::new_always_calculating() #4532
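If we commit to never precomputing for Hebrew, the constructor side of this could look roughly like the sketch below: a data-free `new()` plus the old constructor kept as a deprecated alias so existing callers keep compiling. The unit struct and the deprecation message are illustrative assumptions, not the actual `icu_calendar` code.

```rust
/// Hypothetical data-free Hebrew calendar type.
#[derive(Copy, Clone, Debug, Default, PartialEq, Eq)]
pub struct Hebrew;

impl Hebrew {
    /// Keviyah-based calculations need no data, so construction is trivial.
    pub const fn new() -> Self {
        Hebrew
    }

    /// Kept only for compatibility; calling it emits a deprecation warning.
    #[deprecated(note = "Hebrew no longer loads data; use Hebrew::new()")]
    pub const fn new_always_calculating() -> Self {
        Self::new()
    }
}
```

Keeping the old name as a thin alias lets the deprecation land in a minor release without breaking callers.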