Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connect properties provider to the icu4x_datagen exporter tool #1204

Merged
merged 20 commits into from
Oct 29, 2021

Conversation

echeran
Copy link
Contributor

@echeran echeran commented Oct 21, 2021

Closes #936

@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • components/properties/src/props.rs is now changed in the branch
  • provider/uprops/src/enum_codepointtrie.rs is different
  • tools/datagen/src/bin/datagen.rs is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@echeran echeran requested a review from sffc October 26, 2021 01:51
@echeran echeran marked this pull request as ready for review October 28, 2021 21:10
@echeran echeran requested review from Manishearth and a team as code owners October 28, 2021 21:10
@aethanyc
Copy link
Contributor

I find it a bit strange that the tool generates data for CodePointMaps keys GENERAL_CATEGORY_V1 and SCRIPT_V1 under uniset/ folder such as uniset/sc@1.json and uniset/gc@1.json.

@echeran
Copy link
Contributor Author

echeran commented Oct 28, 2021

I find it a bit strange that the tool generates data for CodePointMaps keys GENERAL_CATEGORY_V1 and SCRIPT_V1 under uniset/ folder such as uniset/sc@1.json and uniset/gc@1.json.

Agreed. Weigh in on the discussion in #1196

Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Just some small things

@@ -27,6 +29,7 @@ pub enum EnumeratedProperty {
/// GeneralSubcategory only supports specific subcategories (eg `UppercaseLetter`).
/// It does not support grouped categories (eg `Letter`). For grouped categories, use [`GeneralCategory`].
#[derive(Copy, Clone, PartialEq, Eq, Debug, TryFromPrimitive, UnsafeFromPrimitive)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: why do we need these serde impls now when we didn't need them before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deserialize is needed for unit tests that need to load data as a UnicodePropertyMap data struct (which in turns carries a CodePointMap, which takes these enums as parameters). In the reverse direction, serialization happens when generating testdata.

This is the error from cargo test when I comment out these serde impls:

error[E0277]: the trait bound `GeneralSubcategory: serde::de::Deserialize<'_>` is not satisfied
  --> src/maps.rs:56:5
   |
9  |     maps::get_general_category(&provider)
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `serde::de::Deserialize<'_>` is not implemented for `GeneralSubcategory`
   |
   = note: required because of the requirements on the impl of `for<'de> serde::de::Deserialize<'_>` for `UnicodePropertyMapV1<'de, GeneralSubcategory>`
   = note: 1 redundant requirements hidden
   = note: required because of the requirements on the impl of `for<'de> serde::de::Deserialize<'de>` for `yoke::trait_hack::YokeTraitHack<UnicodePropertyMapV1<'de, GeneralSubcategory>>`
   = note: required because of the requirements on the impl of `icu_provider::data_provider::DataProvider<'_, UnicodePropertyMapV1Marker<GeneralSubcategory>>` for `icu_provider_fs::fs_data_provider::FsDataProvider`
note: required by a bound in `get_general_category`
  --> /usr/local/google/home/elango/oss/icu4x/components/properties/src/maps.rs:67:8
   |
67 |     D: DataProvider<'data, UnicodePropertyMapV1Marker<GeneralSubcategory>> + ?Sized,
   |        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ required by this bound in `get_general_category`

Similarly, this is the error from cargo make testdata:

error[E0277]: the trait bound `GeneralSubcategory: Serialize` is not satisfied
   --> provider/uprops/src/enum_codepointtrie.rs:116:1
    |
116 | / icu_provider::impl_dyn_provider!(EnumeratedPropertyCodePointTrieProvider, {
117 | |     key::GENERAL_CATEGORY_V1 => UnicodePropertyMapV1Marker<GeneralSubcategory>,
118 | |     key::SCRIPT_V1 => UnicodePropertyMapV1Marker<Script>,
119 | | }, SERDE_SE, 'data);
    | |____________________^ the trait `Serialize` is not implemented for `GeneralSubcategory`
    |
    = note: required because of the requirements on the impl of `for<'a> Serialize` for `UnicodePropertyMapV1<'a, GeneralSubcategory>`
    = note: 1 redundant requirements hidden
    = note: required because of the requirements on the impl of `for<'a> Serialize` for `&'a UnicodePropertyMapV1<'a, GeneralSubcategory>`
    = note: required because of the requirements on the impl of `UpcastDataPayload<'_, UnicodePropertyMapV1Marker<GeneralSubcategory>>` for `SerdeSeDataStructMarker`
note: required by `upcast`
   --> /usr/local/google/home/elango/oss/icu4x/provider/core/src/dynutil.rs:40:5
    |
40  | /     fn upcast(
41  | |         other: crate::prelude::DataPayload<'data, M>,
42  | |     ) -> crate::prelude::DataPayload<'data, Self>;
    | |__________________________________________________^
    = note: this error originates in the macro `$crate::impl_dyn_provider` (in Nightly builds, run with -Z macro-backtrace for more info)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so I think we do actually need the serde impls here, because human_readable serialization of the ZeroVec needs those impls on T.

tools/datagen/src/bin/datagen.rs Outdated Show resolved Hide resolved
tools/scripts/data.toml Show resolved Hide resolved
@echeran echeran requested a review from sffc October 29, 2021 02:05
@Manishearth Manishearth removed their request for review October 29, 2021 06:38
@sffc sffc added this to the ICU4X 0.4 milestone Oct 29, 2021
@echeran echeran merged commit 2313561 into unicode-org:main Oct 29, 2021
@echeran echeran deleted the uprops-datagen branch October 29, 2021 17:47
@sffc sffc linked an issue Nov 1, 2021 that may be closed by this pull request
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants