Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loading cache data for lunar calendars #3933

Closed
14 tasks done
sffc opened this issue Aug 24, 2023 · 10 comments · Fixed by #4785
Closed
14 tasks done

Loading cache data for lunar calendars #3933

sffc opened this issue Aug 24, 2023 · 10 comments · Fixed by #4785
Assignees
Labels
C-calendar Component: Calendars S-large Size: A few weeks (larger feature, major refactoring)

Comments

@sffc
Copy link
Member

sffc commented Aug 24, 2023

@Manishearth - This is 1.3 blocking because the API of a calendar that has no data versus one that does is different. We could theoretically remove constructors from these types for 1.3 and only do things through AnyCalendar. But then these APIs are functionally useless outside of AnyCalendar.

Parts:

@sffc sffc added discuss Discuss at a future ICU4X-SC meeting C-calendar Component: Calendars labels Aug 24, 2023
@sffc sffc added this to the 1.3 Blocking ⟨P1⟩ milestone Aug 24, 2023
@Manishearth
Copy link
Member

There's a PR open for Chinese (https://github.com/unicode-org/icu4x/pulls), but islamic/hebrew also need similar work done.

@sffc sffc removed the discuss Discuss at a future ICU4X-SC meeting label Sep 5, 2023
@sffc
Copy link
Member Author

sffc commented Sep 5, 2023

Name of the constructor that does not load data:

  • Chinese::new_without_data()
  • Chinese::new_calculating()
  • Chinese::new_without_calculated_data()
  • Chinese::new_runtime_calculating()
  • Chinese::new_always_calculating()

Discussion:

Conclusion: new_always_calculating() for the no-data constructor. new() is the constructor with compiled data.

Next steps:

  • The following calendars need work: Chinese, Dangi, all Islamic, maybe Hebrew
  • Make the following changes:
    • Change these calendars to have new_always_calculating() constructor
    • Make sure they are not Copy

It would be nice to have Chinese ready with the pre-calculated caches in 1.3 but does not need to block.

LGTM: @Manishearth @sffc @robertbastian

@Manishearth
Copy link
Member

#4051 solves this for the milestone, kicking the can to 1.4

@Manishearth
Copy link
Member

Manishearth commented Nov 28, 2023

The plan is:

  • CalendarArithmetic takes in a &PrecomputedData type parameter on all of its methods. Most calendars use () Prepare CalendarArithmetic with new PrecompiledData type parameter #4378
  • ArithmeticDate contains a YearInfo field which also defaults to (). Date mutations must update it.
  • The PrecomputedData is stored on the Calendar object, whereas the YearInfo, as a field of the ArithmeticDate, is stored on the Date. The calendar passes in PrecomputedData where necessary
    • Note: this is different from the current code; which stores both caches on the Date! This design is incorrect and duplicates the data.
  • Methods which can rely on YearInfo will do so. Methods which cannot must hit the precomputed data store to obtain year info.

One thing I'm not sure about is if YearInfo should ever differ from what is stored in precomputed data. Currently the YearInfo equivalent in tree for chinese is:

pub(crate) struct ChineseBasedCache {
    pub(crate) new_year: RataDie,
    pub(crate) next_new_year: RataDie,
    pub(crate) leap_month: Option<NonZeroU8>,
}

whereas the precomputed cache is:

pub(crate) struct ChineseBasedCompiledData {
    pub(crate) new_year: RataDie,
    /// last_day_of_month[12] = last_day_of_month[11] in non-leap years
    /// These days are 1-indexed: so the last day of month for a 30-day 一月 is 30
    /// The array itself is zero-indexed, be careful passing it self.0.month!
    last_day_of_month: [u16; 13],
    ///
    pub(crate) leap_month: Option<NonZeroU8>,
}

(This packs down to three bytes in PackedChineseBasedCompiledData)

If we are going to store such data live, we might as well store PackedChineseBasedCompiledData everywhere, which simplifies the architecture down to "load PackedChineseBasedCompiledData into YearInfo, if you can't load from precomputed data then compute on your own". In the long run this may mean we don't need to pass in PrecomputedData everywhere, though in the short run I'd like to grant us that freedom.

However, if we make that call now, we can avoid having to pass in the compiled data for everything other than the until() and offset_days() which is nice.

@Manishearth
Copy link
Member

Manishearth commented Nov 28, 2023

Discussed a bit with @sffc:

In general we think that YearInfo and PrecomputedData will not need to differ on the data stored per-year (they may differ on the data storage format).

This means that ArithmeticDate only needs to worry about PrecomputedData for the offset/until functions, which is a far lower footprint.

The design becomes:

  • ArithmeticDate gets YearInfo, PrecomputedData, load_yearinfo(data: &PrecomputedData)
  • The offset/until functions take in data: &PrecomputedData. All other ArithmeticDate methods do not
  • All CalendarArithmetic methods are expected to hit YearInfo for their stuff. Callers are expected to preserve YearInfo when mutating years, ideally loaded via load_yearinfo().

The main downside here is that when working without loaded data, it may be slower to calculate the YearInfo for an entire year since we will have to calculate things for multiple months at a time. This is not a cost we ever have to pay right now. On the other hand, it will probably amortize nicely.

Another downside is that this will increase the size of Date whenever our largest calendar decides to cache more things. This is a price we think is ok to pay since Date is mostly ephemeral.

@Manishearth
Copy link
Member

ugh, found some gnarly stuff that will not be easy to optimize

days_in_prev_year: Self::days_in_provided_year(prev_year),

This needs the number of days in the previous year. We might just also cache that but it's suboptimal, it's only needed for the rather rare case of week-of-year formatting, but it needs to be computed mandatorily when formatting

Manishearth added a commit that referenced this issue Dec 7, 2023
…dar) (#4407)

This is mostly a scaffolding PR implementing the first few steps of
#3933 (comment).

This:

- Adds a YearInfo type that is stored in CalendarArithmetic and needs to
be computed whenever the type is created or has its offset updated
- Requires `.offset()` to be called with an `&impl
PrecomputedDataSource<YearInfo>`. An intermediate state of the PR had a
separate PrecomputedDataSource associated type as per the original
design, but I realized it wasn't necessary and it had a tendency to lead
to worse trait resolution issues with the ChineseBased blanket impl
- For now, all construction methods for CalendarArithmetic require
YearInfo to be `()`. A PR actually using this can add the APIs it needs
- As a cleanup I removed `new_with_lunar_ordinals()`. We weren't
maintaining that distinction correctly; and I considered fixing it, but
that method only exists in the case of future needs and when those needs
come we can easily re-add it and do the proper refactoring.

Can be reviewed commit-by-commit if desired, though I'd recommend
reviewing the whole PR at once. It's probably sufficient to review
calendar_arithmetic.rs and chinese_based.rs.
Manishearth added a commit that referenced this issue Dec 18, 2023
Part of #3933

This does not yet add data loading, but it gets very close to it.

This PR makes the size of Date not bloat too much, without requiring
much computation for any of the getters -- they're all basic bit
operations.
Manishearth added a commit that referenced this issue Dec 28, 2023
Part of #3933

This implements a module that contains efficient Hebrew calendrical
calculations using the [Four
Gates](https://en.wikipedia.org/wiki/Hebrew_calendar#The_four_gates)
method. It is more efficient than the book code (which is a good
demonstration of some algorithms but has not been mathematically
simplified, or cached in any way) and works off of the intermediate
notion of Keviyah, which is a cacheable quantity that characterizes the
type of year.


This doesn't yet plug this code into `icu_calendar`, but it does test
that it has exactly the same behavior as the book code.

I do link the sources, but I _hope_ that all the math here is explained
adequately in the comments. The sources should only need to be consulted
for the actual values of the constants. **Please let me know if this is
not the case and I'll document further**.



A thing I'm not yet sure on (which doesn't block this PR) is what should
*actually* be cached in the `icu_calendar::Hebrew` YearInfo. We
basically have two options:

- Cache just the keviyah, or perhaps YearType/StartOfYear/`is_leap`.
This is quite compact, but conversion to ISO will require computation of
the molad. `molad_details` is not a *particularly* expensive method to
call; but it's not completely cheap.
- Cache the keviyah *and* the `weeks_since_beharad`. This has danger of
blowing the bounds of an i64, though we can probably optimize it by
instead storing the RataDie of the beginning of the week, stuffing the
index of the Keviyah in the `%7` of the value, and calculating `is_leap`
from the year.

The implementation of this module makes it rather straightforward to hop
between either strategy, so it's not a big deal yet. We can benchmark
once we implement this.
@Manishearth Manishearth added the needs-approval One or more stakeholders need to approve proposal label Jan 10, 2024
@Manishearth
Copy link
Member

Manishearth commented Jan 16, 2024

@sffc We need to make a decision based on the benchmarking results:

Consider adding a new() and deprecating new_calculating() (and the try_new...with_calendar(), if we are definitely not precomputing (otherwise add precomputing)

For background, currently Hebrew must be constructed with new_always_calculating()

Proposal: We commit to never precomputing, add ::new(), make Hebrew constructable as a singleton struct, and deprecate the old APIs. For Hebrew only.

Approval:

@sffc
Copy link
Member Author

sffc commented Jan 16, 2024

Does keviyah work over all dates or are there cases where we would need to fall back to the book calculations? If keviyah is a true drop-in replacement, then yeah, we can just use it all the time and make Hebrew independent of data.

@Manishearth
Copy link
Member

Manishearth commented Jan 16, 2024

All dates (in range, which is a range of ~i32 years I think: check the code)

Prior versions of the Hebrew calendar used slightly different four gates tables, Adjler documents them all, but the book does not handle that either, it implements the modern set of calculations.

The only question here is if it is fast enough for us to never want to add data loading. The benchmarks convince me but they are definitely at a level where I would not be surprised if others were not convinced.

@sffc sffc removed the needs-approval One or more stakeholders need to approve proposal label Mar 14, 2024
@Manishearth
Copy link
Member

I've started working on the islamic ones. My current data model is to store month length booleans and then use the remaining 4 bits to store a new years offset from the mean synodic new year (year * 12 * MEAN_SYNODIC_MONTH). It's unlikely that that number will be super large.

@Manishearth Manishearth added the S-large Size: A few weeks (larger feature, major refactoring) label Apr 4, 2024
Manishearth added a commit that referenced this issue Apr 19, 2024
)

Fixes #3933

Uses work from #4770 on the
remaining 3 calendars


عید مبارک!!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-calendar Component: Calendars S-large Size: A few weeks (larger feature, major refactoring)
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

2 participants