-
-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add joining-group sub-command #32
base: master
Are you sure you want to change the base?
Conversation
@@ -24,7 +24,114 @@ pub struct ArabicShaping { | |||
/// The "joining type" of this codepoint. | |||
pub joining_type: JoiningType, | |||
/// The "joining group" of this codepoint. | |||
pub joining_group: String, | |||
pub joining_group: JoiningGroup, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a breaking change. Breaking changes should be called out in the PR description.
Could we please back this out? I'd really prefer not to release a new semver incompatible version of ucd-parse
at this point. I could, but I would prefer such things be deliberate instead of a reactive "oh this PR came in with a breaking change so let's just put out an incompatible release." That's how churn happens. :-(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative here is to add a new field that is an enum, and leave the existing string field. Maybe joining_group2
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Breaking changes should be called out in the PR description.
Sorry I didn't think about the impact of the change when doing it so didn't think to call it out in the PR.
|
||
/// The Joining_Group field read from ArabicShaping.txt | ||
#[derive(Clone, Copy, Debug, Eq, PartialEq)] | ||
pub enum JoiningGroup { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just looking at the values, it kind of feels like this should be marked as #[non_exhaustive]
. I don't really have any confidence that this list won't expand in future versions of Unicode.
Honestly might be a good reason to just drop the enum altogether. Why did you make this an enum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly might be a good reason to just drop the enum altogether. Why did you make this an enum?
The addition of the joining-group
sub-command was done to as part of our work to add support for Syriac shaping to Allsorts. We need to match on the joining group to do this:
It was changed to an enum to be consistent with joining_type
, which has been used just prior to implement Arabic shaping https://github.com/yeslogic/allsorts/blob/ade4ef90dddecf63370d78b5ada58439e910faaf/src/scripts/arabic.rs#L33-L67
I guess in the context of doing one then the other, it seems like an oversight that one was an enum and the other a string. In hindsight I should have realised this was a breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we add joining_group2
that is the enum and leave joining_group
as a String
? Then add a note to the (public) docs of joining_group
that it will be dropped in the next semver bump of the crate.
What about the non_exhaustive
bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we add
joining_group2
that is the enum and leavejoining_group
as aString
? Then add a note to the (public) docs ofjoining_group
that it will be dropped in the next semver bump of the crate.
Yes that sounds fine.
What about the
non_exhaustive
bit?
Giving this some thought I'm a bit on the fence. non_exhaustive
allows backwards compatibility when new variants are added but I'm not sure this is desirable in this this case. I.e. maybe the default behaviour of reporting unhandled variants is desirable so that you get a prompt to handle the new cases versus having them silently hit a catch all case. The scenario where new variants are added coincides with new Unicode releases so perhaps it's reasonable to require updating code when adding support for newer Unicode releases. Then again if you're just a downstream consumer of some unicode related thing maybe you don't care enough about those details and would just like your code to keep compiling... 🤔
Perhaps it's worth noting that none of the other enums have it so far either and they also see new additions. E.g. I think it's beneficial that the addition of new scripts (which happens in most Unicode releases) prompts code to handle them.
Anyway I don't think I have strong enough feelings to advocate one way or another. Since you have more experience maintaining popular crates I'm happy to defer to what you think is best.
This PR adds a
joining-group
sub-command that exposes the field of the same name fromArabicShaping.txt
.There is an initial commit that changes the
joining-group
field onArabicShaping
from aString
to and enum.This field is a little weird in that it's stored in
ArabicShaping.txt
in a different form to the "formal" form inPropertyValueAliases.txt
. As a result there is a function to convert from the input ArabicShaping form to the PropertyValueAliases form. The enum and associated methods are implmented in terms of the PropertyValueAliases form.For example in this row:
TEH MARBUTA
(last field) corresponds toJoining_Group
Teh_Marbuta
.Sample output: