Create an Ideal Components Bag / Skeleton for DateTimeFormat #1317

Open
7 tasks
Tracked by #272
gregtatum opened this issue Nov 18, 2021 · 29 comments
Labels
C-datetime Component: datetime, calendars, time zones help wanted Issue needs an assignee S-epic Size: Major project (create smaller child issues) T-core Type: Required functionality

Comments

@gregtatum
Member

gregtatum commented Nov 18, 2021

This is a meta issue to track implementing the "ideal components bag" as laid out in the DateTimeFormat Deep Dive 2021-10-01 design document. Originally there was some discussion about having this replace the current components bag, but it will instead be implemented alongside the existing one. A better name can be bikeshedded if needed.

The following need to be completed.

@sffc
Member

sffc commented Jan 27, 2022

@gregtatum will provide mentorship.

@zbraniecki
Member

I'm spreading the word about this issue looking for candidates.

More details:

Description: Currently, DateTimeFormat has two ways to select the right format, and both of them are imperfect. We believe we have a balanced, novel solution that, once implemented, will become the foundational way to use DateTimeFormat.
Scope: We believe that the initial implementation should take one person 2-3 months. Hopefully in time for ICU4X 1.0.
Mentorship: This project is well staffed on the mentorship side, with @gregtatum from Mozilla, @sffc from Google, and @zbraniecki from Amazon willing to invest time in mentoring the engineer who picks it up.
How to start: If you are interested in the project, comment in this issue or join unicode-org.slack.com #icu4x and we'll get you on-ramped.

@ozghimire
Contributor

ozghimire commented Feb 10, 2022

I'm interested in working on this issue.

@zbraniecki
Member

@gregtatum are you still open to mentor?

@pdogr
Contributor

pdogr commented Feb 11, 2022

If this issue is still open, I'm definitely interested in working on this.

@gregtatum
Member Author

@ozghimire Great! How would you prefer to get started? There is a document linked above outlining the strategy which should discuss how to get things going. I would suggest starting with #1318. I will fill in more details on that issue.

@pdogr I think ozghimire is taking the first step on this to move it forward, and it's hard to parallelize this initial step, but there will probably be related issues to help out with. You could take another DateTimeFormat issue to get onboarded; I'm sure there will be opportunities to help in the short term. #1581 would be a good bug to onboard with if you want to take it.

@randomicon00

Hello @gregtatum, are you still looking for contributors?

sffc added discuss Discuss at a future ICU4X-SC meeting and removed v1 labels Apr 1, 2022
@sffc
Member

sffc commented Oct 14, 2024

How should I organize the code in the repo? I know this is banal, but I need to do something. What we currently have is a bit of a mess, and we need to move things anyway as part of the big rename, so I may as well move them into good places.

Currently, inside components/datetime/src:

| Path | Description | Visibility |
| --- | --- | --- |
| fields/*.rs | Field and its related types | Public |
| format/datetime.rs | Core formatting logic | Private |
| format/neo.rs | DateTimeNames, DateTimePatternFormatter | Private with public re-exported types |
| options/*.rs | 1.x formatter options | Datagen-only except for the HourCycle type, which we could replace with the one from icu_preferences |
| pattern/*.rs | Internal pattern types and logic | doc(hidden) |
| provider/calendar/*.rs | 1.x data structs | Mix of public and datagen |
| provider/neo/*.rs | 2.x data structs | Public |
| provider/packed_pattern.rs | Packed pattern data struct, recently landed | Public |
| provider/time_zones.rs | Time zone data structs | Public |
| raw/neo.rs | DateTimeZonePatternSelectionData and its related types | Private |
| skeleton/*.rs | Classical skeleton code | Datagen |
| calendar.rs | CldrCalendar and related traits and trait impls | Private with public re-exported types |
| error.rs | MismatchedCalendarError | Public |
| external_loaders.rs | Fixed decimal formatter and calendar data loader helpers | Private |
| helpers.rs | size_test macro | Private |
| input.rs | ExtractedInput | Private |
| lib.rs | Nothing defined, only re-exports | Public |
| neo_marker.rs | Declarations and definitions of 2.x traits; field set markers | Public |
| neo_pattern.rs | DateTimePattern | Public |
| neo_serde.rs | Serde impls for 2.x things | Private |
| neo_skeleton.rs | Enums and structs for semantic skeleta | Public |
| neo.rs | Main 2.x formatter types | Public |
| time_zone.rs | Time zone formatting | Private |
| tz_registry.rs | Time zone format registry: mapping between semantic time zones, resolved time zones, and field-based time zones | Private |

All re-exports are from the root unless otherwise specified.

What I think I want to move:

| Destination file name | Stuff to move inside | Visibility |
| --- | --- | --- |
| names.rs | DateTimeNames | Private with public re-exported types |
| dt_pattern.rs | DateTimePattern, DateTimePatternFormatter | Private with public re-exported types |
| raw.rs | What is currently raw/neo.rs | Private |
| error.rs | All error enums throughout the crate | Public/private as needed |
| fieldset.rs | Field set markers | Public |
| scaffolding/*.rs | 2.x formatting traits. Also move CldrCalendar and friends in here | Public, but nothing in this module should show up in normal usage of the API |
| skeleton_impl/*.rs | Rename of skeleton/*.rs | Datagen |
| skeleton.rs | Enums and structs for semantic skeleta | Public |
| formatter.rs | Main 2.x formatter types | Private with public re-exported types |

Note: I want everything to have exactly 1 place where it is exported.
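As a rough illustration of the "Private with public re-exported types" visibility and the one-export-location rule, here is a minimal Rust sketch. The module names follow the proposed layout; the type bodies (including the `loaded` field and the `new()` constructor) are hypothetical placeholders, not the real ICU4X definitions.

```rust
// Sketch: private modules plus root-level re-exports, so each public type
// has exactly one public path. Bodies are placeholders, not ICU4X code.

mod names {
    /// Placeholder for the real `DateTimeNames` type.
    pub struct DateTimeNames {
        // Hypothetical field, visible inside the crate only.
        pub(crate) loaded: bool,
    }

    impl DateTimeNames {
        /// Hypothetical constructor for illustration.
        pub fn new() -> Self {
            Self { loaded: false }
        }
    }
}

mod formatter {
    /// Placeholder for the main 2.x formatter type.
    pub struct DateTimeFormatter;
}

// The modules stay private, so outside the crate these types are reachable
// only through the crate root: one export location per type.
pub use formatter::DateTimeFormatter;
pub use names::DateTimeNames;
```

Keeping the modules private means the file layout can change later without changing any public path.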

Thoughts/approval? @Manishearth @robertbastian

Manishearth pushed a commit that referenced this issue Oct 14, 2024
These were useful when code was shared between 1.x and 2.x formatting
code paths, but now removing them reduces failure paths.

#1317
@Manishearth
Member

This seems fine! I think we should be organized but it's flexible and we don't have to get it perfect right now. Something vaguely sensible is enough for me!

This was referenced Oct 14, 2024
sffc added a commit that referenced this issue Oct 16, 2024
@sffc
Member

sffc commented Oct 18, 2024

Type naming discussion:

Points brought up:

  • Formatter feels too generic, and we have many formatters
  • FieldSetFormatter and SkeletonFormatter are inside baseball
  • DateTimeFormatter is fine as long as we never add a set of aliases like DateTimeFormatter, TimeFormatter, DateFormatter, etc., for runtime fieldsets

Discussion:

  • @sffc Runtime fieldsets are a power user API anyway. Most people should be using compile-time fieldsets like YearMonthFormatter. I'm okay committing to no convenient aliases for the runtime ones.
  • @Manishearth and @robertbastian agreed

Conclusion:

  • DateTimeFormatter and FixedCalendarDateTimeFormatter, with an optional GregorianDateTimeFormatter alias (TBD, after 2.0)
  • Have aliases for YearMonthDayFormatter etc, potentially post 2.0, names need bikeshed.
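A minimal sketch of how the optional GregorianDateTimeFormatter alias could sit on top of FixedCalendarDateTimeFormatter, assuming the formatter is generic over a calendar type and a field set type. `Gregorian`, `YMD`, the `new()` constructor, and the generic parameters shown here are placeholders for illustration, not the real ICU4X signatures.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for a calendar marker type.
pub struct Gregorian;

/// Hypothetical stand-in for a compile-time field set.
pub struct YMD;

/// Placeholder formatter bound to one calendar at compile time.
pub struct FixedCalendarDateTimeFormatter<C, FSet> {
    _calendar: PhantomData<C>,
    _fset: PhantomData<FSet>,
}

impl<C, FSet> FixedCalendarDateTimeFormatter<C, FSet> {
    /// Hypothetical constructor; the real one would load data.
    pub fn new() -> Self {
        Self {
            _calendar: PhantomData,
            _fset: PhantomData,
        }
    }
}

/// The optional alias: a shorter name for an existing type, not a new type.
pub type GregorianDateTimeFormatter<FSet> = FixedCalendarDateTimeFormatter<Gregorian, FSet>;
```

Because a `type` alias introduces no new type, values of both spellings are interchangeable, so the alias adds a name without adding anything to document separately.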

Agreed: @sffc @Manishearth @robertbastian

@sffc
Member

sffc commented Oct 29, 2024

Review with Zibi:

  • @zbraniecki: Make the field set fields private? I think we should end up with a macro like:

fieldset!([year, month, day])::medium() => YMD::medium()

DateTimePattern verbiage:

Original: Most clients should use DateTimeFormatter instead of directly formatting with patterns.

[DateTimePattern] forgoes most internationalization functionality of the datetime crate. It assumes that the pattern is already localized for the customer's locale. Most clients should use [DateTimeFormatter] instead of directly formatting with patterns.

Type exports:

icu::datetime::pattern::DateTimeNames
icu::datetime::pattern::DateTimePattern
icu::datetime::pattern::DateTimePatternFormatter

On the filesystem:

  • names.rs?
  • pattern/mod.rs?
  • Maybe icu::datetime::private::pattern::Pattern?
  • Maybe icu::datetime::private::pattern::ReferencePattern?

Errors:

  • @zbraniecki: Slight preference for exporting errors adjacent to the types they are used in

@Manishearth
Member

fieldset!([year, month, day])::medium() => YMD::medium()

Let's not use macros in type position; they don't work in every possible type position, and this gets annoying quickly. I think it's fine to provide such a macro, but having a fallback is good.
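To make the fallback concrete, here is a small sketch with a hypothetical `fieldset!` macro that expands to a named field set type. As far as we can tell, the `fieldset!(...)::medium()` spelling from the notes does not parse in Rust; the macro has to be wrapped in a qualified path, `<fieldset!(...)>::medium()`, while the named type works without that ceremony. `YMD`, its `length` field, and the macro itself are placeholders.

```rust
/// Hypothetical compile-time field set with a `medium()` constructor.
#[derive(Debug, PartialEq)]
pub struct YMD {
    pub length: &'static str,
}

impl YMD {
    pub fn medium() -> Self {
        YMD { length: "medium" }
    }
}

/// Hypothetical convenience macro mapping a field list to a named type.
macro_rules! fieldset {
    ([year, month, day]) => {
        YMD
    };
}

/// Fallback spelling: the named type, usable wherever a type is expected.
pub fn named() -> YMD {
    YMD::medium()
}

/// Macro spelling: valid in type position, but calling an associated
/// function requires wrapping the macro in `<...>`.
pub fn via_macro() -> fieldset!([year, month, day]) {
    <fieldset!([year, month, day])>::medium()
}
```

Both functions produce the same value; the named type is the spelling that never needs the extra wrapping.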

@sffc
Member

sffc commented Oct 31, 2024

Notes from brief discussion with @Manishearth:

The user-facing field-set-related types can be put into 4 buckets:

  1. Compile-time field sets such as struct YMD { options }
  2. Runtime field sets such as enum DateFieldSet { YMD, ... }
  3. Runtime skeletons such as struct DateSkeleton { field_set: DateFieldSet, options }
  4. The options themselves, such as Alignment or FractionalSecondDigits
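A rough Rust sketch of the four buckets, assuming simplified shapes: `Alignment` has placeholder variants, only two runtime field sets are listed, and the `to_skeleton` conversion from a compile-time field set to a runtime skeleton is a hypothetical helper, not an existing API.

```rust
/// Bucket 4: an option shared by the field sets (placeholder variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum Alignment {
    #[default]
    Auto,
    Column,
}

/// Bucket 1: a compile-time field set. The fields are fixed by the type;
/// only the options vary at runtime.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct YMD {
    pub alignment: Alignment,
}

/// Bucket 2: a runtime field set, chosen by value rather than by type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DateFieldSet {
    YMD,
    MD,
}

/// Bucket 3: a runtime skeleton pairing a runtime field set with options.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DateSkeleton {
    pub field_set: DateFieldSet,
    pub alignment: Alignment,
}

impl YMD {
    /// Hypothetical helper: lower a compile-time field set to a skeleton.
    pub fn to_skeleton(self) -> DateSkeleton {
        DateSkeleton {
            field_set: DateFieldSet::YMD,
            alignment: self.alignment,
        }
    }
}
```

In this shape, buckets 2 and 3 differ only in whether the options travel with the field set, which is why the placement questions below keep pulling them together.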

Where should these all go?

My original idea (Option 1):

  1. icu::datetime::fieldset::YMD
  2. icu::datetime::skeleton::DateFieldSet
  3. icu::datetime::skeleton::DateSkeleton
  4. icu::datetime::skeleton::Alignment

One that @Manishearth suggested (Option 2):

  1. icu::datetime::fieldset::YMD
  2. icu::datetime::fieldset::runtime::DateFieldSet
  3. icu::datetime::skeleton::DateSkeleton
  4. icu::datetime::options::Alignment

Here's another one that might be good (Option 3):

  1. icu::datetime::fieldset::YMD
  2. icu::datetime::fieldset::DateFieldSet
  3. icu::datetime::options::DateFieldSetWithOptions
  4. icu::datetime::options::Alignment

I'm not sure about options::DateFieldSetWithOptions. I could still put it at skeleton::DateSkeleton, but "skeleton" sounds more important than it is: in the ICU4X world, it's a thing that most people generally shouldn't need to use.

It also occurs to me that skeleton and scaffold are kind of similar words, but they mean different things.

@Manishearth
Member

Manishearth commented Oct 31, 2024

To add some points, personally I think the ideal situation is that the fieldsets, runtime fieldsets, skeleta are all in their own modules, containing nothing but those types, combiner types (Combo), and potentially other modules. Exactly how that is achieved can be done in multiple ways, with fieldset and fieldset::runtime or fieldset and fieldset_runtime, and with skeletons being a submodule of fieldsets or options or something. No strong opinion there.

My vision is that each of these (except for options) can have strong documentation about the usage of these things that the rest of the crate can link to.

@Manishearth
Member

  • @Manishearth I think the design is clean. We basically have compile time fieldsets, runtime fieldsets, and skeletons that combine fieldsets and options.
  • @hsivonen I don't like relying on struct/enum to distinguish types. It feels too much like insider information such that most people will find it difficult to know and understand.
  • @sffc I don't like having types that users need to use being more than one module away from the root. Why don't these fieldset types live in the same module? They both represent field sets.
  • @Manishearth In general I think the UX of opening a module with multiple "types of things" is confusing: if you open a module with five of one thing and five of another, it's unclear which things you need to look at to fully understand the module. It's fine if a module has five types called FooFieldset; it's clear that "I should just look at TimeFieldset, and once I understand that, I will understand DateFieldset". However, if a module has DateFieldset, TimeFieldset, ..., and DateSkeleton, TimeSkeleton, ..., then it becomes unclear how to slice things: should you look at one of the Date* types and then one of the Time* types (and so on), or should you look at one of the *Skeleton types and one of the *Fieldset types?
  • @robertbastian You can put them in the same module if they were called RuntimeDateFieldset, etc.
  • @sffc I see two logical ways to organize: keep field sets together, or keep compiletime/runtime together. The neo_skeleton module should come down to 2 types: skeleton structs, and the enums to represent the fieldsets.
  • @sffc The Fieldset enums can be seen as being an implementation detail of the skeleton structs.
  • @Manishearth We can reexport the symbols from the crate.
  • @sffc I don't like reexporting. It creates multiple ways of doing the same thing, and it's not necessary in our case. It also requires suppression workarounds in the FFI code.

@sffc's current thinking:

  • datetime::fieldset::YMD
  • datetime::fieldset_dynamic::DateFieldSet
  • datetime::fieldset_dynamic::DateSkeleton
  • datetime::options::Alignment

@Manishearth But then we have two dimensions in the same module:

(Date, Time, DateTime, ZonedDateTime)(Skeleton, Fieldset)

@sffc We could reduce it to one type, like this:

// mod fieldset_dynamic
enum DateFieldSet {
    YMD(fieldset::YMD),
    MD(fieldset::MD),
}

@Manishearth But currently it's nice to write:

let skeleton = NeoDateSkeleton {YearMonthDay, YearStyle::whatever};

and this gets a bit more complicated on construction? It depends on what the user patterns would look like.

  • @sffc in many cases you'll end up ... currently datetime fixture code needs this, which parses out each individual thing
  • @sffc ... why don't we consider adding a Builder type
  • @Manishearth Let's file an issue and perhaps link to it in the docs
  • @sffc I'm not happy with the deeply nested modules, but I don't have an alternative

Proposal:

  • datetime::fieldset::YMD
  • datetime::fieldset::dynamic::DateFieldSet
    • enum containing YMD(fieldset::YMD), etc
  • datetime::options::Alignment
  • Future: datetime::fieldset::dynamic::builder::DateFieldSetBuilder
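A minimal sketch of the proposed module layout, assuming the runtime enum wraps the compile-time types as in the `fieldset::dynamic::DateFieldSet` bullet. The `From` impls are a hypothetical convenience that would address part of the construction concern raised above; the field and variant names are placeholders.

```rust
pub mod options {
    /// Placeholder option type.
    #[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
    pub enum Alignment {
        #[default]
        Auto,
        Column,
    }
}

pub mod fieldset {
    use super::options::Alignment;

    /// Compile-time field sets live directly in `fieldset`.
    #[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
    pub struct YMD {
        pub alignment: Alignment,
    }

    #[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
    pub struct MD {
        pub alignment: Alignment,
    }

    pub mod dynamic {
        /// The runtime field set wraps the compile-time types, so the
        /// options travel in the variant payload instead of a separate
        /// skeleton struct.
        #[derive(Debug, Clone, Copy, PartialEq, Eq)]
        pub enum DateFieldSet {
            YMD(super::YMD),
            MD(super::MD),
        }

        // Hypothetical conversions so a compile-time field set can be
        // turned into the runtime enum with `.into()`.
        impl From<super::YMD> for DateFieldSet {
            fn from(fs: super::YMD) -> Self {
                DateFieldSet::YMD(fs)
            }
        }

        impl From<super::MD> for DateFieldSet {
            fn from(fs: super::MD) -> Self {
                DateFieldSet::MD(fs)
            }
        }
    }
}
```

With this shape, each module holds one kind of thing (compile-time field sets, runtime field sets, options), which matches the "one dimension per module" concern from the discussion.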

LGTM: @sffc @Manishearth

@sffc
Member

sffc commented Nov 5, 2024

I made a proposal to switch around how time fields are handled:

https://docs.google.com/document/d/1SkxoitlCFiQ_KGW3dmRk7lbumGd_N1rfYJLMlkiXvh4/edit?tab=t.0

I started implementing this in ICU4X. My idea:

  1. Change the API to reflect the new enum
  2. Change the time data payloads to contain 3 patterns, variants of Hour, Hour+Minute, and Hour+Minute+Second using the same mechanism that we have for distinguishing the three year styles
  3. Reduce down to 3 time data payloads: default hour cycle, h12, and h24
  4. Reduce down to a single time field, but otherwise keep the current traits working as they are
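One way to picture steps 2 and 3: each of the three time payloads (default hour cycle, h12, h24) holds one pattern per time precision, selected the same way the year-style variants are. The enum and struct names below are hypothetical, and the patterns are illustrative CLDR-syntax strings, not real locale data.

```rust
/// Hypothetical time-precision enum standing in for the new API enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TimePrecision {
    Hour,
    HourMinute,
    HourMinuteSecond,
}

/// Hypothetical time payload: one pattern per precision, analogous to the
/// per-year-style variants. Each of the three payloads would share this shape.
pub struct TimePatterns {
    pub hour: &'static str,
    pub hour_minute: &'static str,
    pub hour_minute_second: &'static str,
}

impl TimePatterns {
    /// Pick the pattern variant for the requested precision.
    pub fn select(&self, precision: TimePrecision) -> &'static str {
        match precision {
            TimePrecision::Hour => self.hour,
            TimePrecision::HourMinute => self.hour_minute,
            TimePrecision::HourMinuteSecond => self.hour_minute_second,
        }
    }
}

/// Illustrative 24-hour patterns (not real locale data).
pub const TWENTY_FOUR_HOUR: TimePatterns = TimePatterns {
    hour: "HH",
    hour_minute: "HH:mm",
    hour_minute_second: "HH:mm:ss",
};
```

Folding precision into the payload variants is what lets the number of time payloads drop to three, one per hour cycle.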

Then, I will update #5761 to switch around the dynamic field sets as discussed previously.

One small caveat: I realized that overlap patterns already use the variants for year style. However, none of our overlap patterns currently contain the year field, so I'm currently adding a debug assertion and changing the variants there to be for time precision instead.
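A small sketch of the kind of debug assertion mentioned above, assuming year fields are identified by the UTS #35 year symbols (`y`, `Y`, `u`, `U`, `r`) outside quoted literal text. Both function names are hypothetical.

```rust
/// Hypothetical check: does a CLDR-style pattern contain a year field?
/// Text between single quotes is literal and is skipped; an escaped quote
/// (`''`) toggles twice and so leaves the state unchanged.
pub fn contains_year_field(pattern: &str) -> bool {
    let mut in_quotes = false;
    for c in pattern.chars() {
        match c {
            '\'' => in_quotes = !in_quotes,
            'y' | 'Y' | 'u' | 'U' | 'r' if !in_quotes => return true,
            _ => {}
        }
    }
    false
}

/// Hypothetical guard before reusing the year-style variant slots of
/// overlap patterns for time precision: in debug builds, panic if any
/// overlap pattern unexpectedly contains a year field.
pub fn assert_no_year_fields(overlap_patterns: &[&str]) {
    for pattern in overlap_patterns {
        debug_assert!(
            !contains_year_field(pattern),
            "overlap pattern unexpectedly contains a year field: {pattern}"
        );
    }
}
```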

Does this sound okay @Manishearth?

@Manishearth
Copy link
Member

👍

10 participants