Replacing datagen options by DatagenDriver
#3861
Quick architectural review; mostly looks OK but leaving Requested Changes for some smaller things
DataExportDriver
DatagenDriver
/// are excluded. This method can be used to re-enable them.
///
/// The special string `"search*"` causes all search collation tables to be included.
pub fn with_collations(self, collations: impl IntoIterator<Item = String>) -> Self {
Should this be called `with_additional_collations` or something? Because most collations are preselected and cannot be removed.
good question and I like your suggestion; let's not block this pr on the resolution but we should decide before 1.3
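For illustration, a minimal sketch of the additive semantics behind the rename suggestion. `CollationOptions`, the preselected `standard` entry, and `big5han` as the added table are assumptions made for this example, not the actual datagen API or its defaults.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the collation part of the driver options.
#[derive(Debug)]
pub struct CollationOptions {
    selected: BTreeSet<String>,
}

impl Default for CollationOptions {
    fn default() -> Self {
        // Assumed preselected set; the real defaults are not modeled here.
        let mut selected = BTreeSet::new();
        selected.insert("standard".to_string());
        Self { selected }
    }
}

impl CollationOptions {
    /// Only ever adds to the preselected set, which is what the
    /// "additional" in the proposed name is meant to signal.
    pub fn with_additional_collations(
        mut self,
        collations: impl IntoIterator<Item = String>,
    ) -> Self {
        self.selected.extend(collations);
        self
    }

    /// Returns the selected names in sorted order.
    pub fn selected(&self) -> Vec<String> {
        self.selected.iter().cloned().collect()
    }
}

/// The preselected entry is still present after the call.
pub fn additive_demo() -> Vec<String> {
    CollationOptions::default()
        .with_additional_collations(vec!["big5han".to_string()])
        .selected()
}
```

With a plain `with_collations` name, a caller could reasonably expect the argument to replace the selection; the sketch shows why that expectation would be wrong for an add-only method.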
A few more comments
I'd like to see this merged soon; I don't like big PRs like this open for a long time.
Praise: this is a great refactoring and I really like the new API. I have some nitty comments but the overall architecture looks great!
/// [`icu_datagen::all_keys`]: crate::all_keys
/// [`icu_datagen::key`]: crate::key
/// [`icu_datagen::keys_from_bin`]: crate::keys_from_bin
pub fn with_keys(self, keys: impl IntoIterator<Item = DataKey>) -> Self {
Not sure about `impl IntoIterator`: if we need the vec or hashset then why not just take the vec or hashset? It's nice that it keeps the internal collection hidden, but if you already have a hashset then this is slower. (ok to discuss in a follow up)
I found this to be much more ergonomic. In tests I can just pass arrays, and also `keys`, `keys_from_bin` and `keys_from_file` return `Vec<DataKey>`, so otherwise you always end up writing `.into_iter().collect()` at the call site.
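To make the trade-off concrete, here is a minimal sketch using stand-in types. `Key` and `Driver` are hypothetical and only model the key set; they are not the real `DataKey` or `DatagenDriver`.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for `DataKey`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Key(pub &'static str);

/// Hypothetical stand-in for the driver; only the key set is modeled.
#[derive(Debug, Default)]
pub struct Driver {
    keys: HashSet<Key>,
}

impl Driver {
    /// Accepts anything iterable over `Key` and collects it internally,
    /// so the internal collection type stays hidden from callers.
    pub fn with_keys(mut self, keys: impl IntoIterator<Item = Key>) -> Self {
        self.keys = keys.into_iter().collect();
        self
    }

    pub fn key_count(&self) -> usize {
        self.keys.len()
    }
}

/// The three call shapes discussed in the thread.
pub fn call_site_shapes() -> (usize, usize, usize) {
    // An array literal, convenient in tests.
    let from_array = Driver::default().with_keys([Key("a"), Key("b")]);
    // A Vec, as a helper function might return; duplicates collapse in the set.
    let from_vec = Driver::default().with_keys(vec![Key("a"), Key("b"), Key("a")]);
    // An existing HashSet is accepted too, but its elements are re-hashed
    // into a new set, which is the cost raised in the review comment.
    let existing = HashSet::from([Key("x")]);
    let from_set = Driver::default().with_keys(existing);
    (
        from_array.key_count(),
        from_vec.key_count(),
        from_set.key_count(),
    )
}
```

Taking a concrete `HashSet<Key>` instead would avoid the re-collection in the third case, at the price of `.into_iter().collect()` at every call site that starts from an array or a `Vec`.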
We really should either change the bounds on the `export` function to be stable (by loading the likely subtags data some roundabout way via ExportMarker) or make the function be named `export_unstable`, but I'm okay doing that after landing the mammoth PR so long as it is tracked to block 1.3
We should load it through
@@ -147,12 +138,7 @@ impl DatagenDriver {
// Avoids multiple monomorphizations
Question: only `sink` gets a trait object; is it worth it?
idk, but provider is not object safe so it can't
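A rough sketch of the distinction, with made-up traits: `Provider`, `Sink`, `Marker`, `Hello`, and `export` below are all hypothetical and much simpler than the real datagen bounds. A trait with a generic method cannot be turned into `dyn Trait`, so the provider has to stay a generic parameter, while a sink with no generic methods can be passed as a trait object.

```rust
/// Hypothetical marker trait naming a kind of data.
pub trait Marker {
    const NAME: &'static str;
}

pub struct Hello;
impl Marker for Hello {
    const NAME: &'static str = "hello";
}

/// The generic method makes this trait not object safe:
/// `&dyn Provider` would be rejected by the compiler.
pub trait Provider {
    fn load<M: Marker>(&self) -> String;
}

/// No generic methods, so `&mut dyn Sink` is allowed.
pub trait Sink {
    fn put(&mut self, key: &str, payload: String);
}

pub struct UpperProvider;
impl Provider for UpperProvider {
    fn load<M: Marker>(&self) -> String {
        M::NAME.to_uppercase()
    }
}

#[derive(Default)]
pub struct VecSink {
    pub entries: Vec<(String, String)>,
}
impl Sink for VecSink {
    fn put(&mut self, key: &str, payload: String) {
        self.entries.push((key.to_string(), payload));
    }
}

/// Monomorphized once per provider type, but not per sink type,
/// because the sink is erased behind a trait object.
pub fn export<P: Provider>(provider: &P, sink: &mut dyn Sink) {
    sink.put(Hello::NAME, provider.load::<Hello>());
}

pub fn run_export() -> Vec<(String, String)> {
    let mut sink = VecSink::default();
    export(&UpperProvider, &mut sink);
    sink.entries
}
```

In this shape the trait object only removes the sink dimension from monomorphization; whether that saving is worth it is the open question in the thread.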
The new architecture has two main structs: `DatagenProvider` and `DatagenDriver`. `SourceData` is deprecated and put behind the `legacy_api` flag; data is now added directly to the `DatagenProvider`. `DatagenDriver` is used to configure export options, and contains the export logic such as locale selection. These two structs could in the future live in different crates.

Misc fixes:
- Moved `data` into `tests/data`. This is required as the new API does not include fallback data, so the segmentation dictionaries have to be in the correct location inside `tests/data/icuexport`. The legacy API thus pulls in data from `tests/data`, but it does so using `include_str!`, which I hope is fine.

Fixes #3795, #3800
Part of #3564