ICU-21592 Update cj normal/loose linebreak per CSS #1991
Force-pushed: 197b984 to f4cf01d
Notice: the branch changed across the force-push! ~ Your Friendly Jira-GitHub PR Checker Bot
Force-pushed: f4cf01d to 40cdb9e
Hooray! The files in the branch are the same across the force-push. 😃 ~ Your Friendly Jira-GitHub PR Checker Bot
Force-pushed: 4aa996a to 9203ec6
Hooray! The files in the branch are the same across the force-push. 😃 ~ Your Friendly Jira-GitHub PR Checker Bot
/azp run CI-Exhaustive
Azure Pipelines successfully started running 1 pipeline(s).
I ran the ICU4C testMonkey for 400000 iterations (compared to 100 for a normal quick test), which took about 45 min; no problems found.
@allensu05 please also take a look.
The changes to the main rules look good.
I think the LB21 changes in the line_loose_cj.txt monkey test rules can be simplified, but they're probably good enough for the moment, since you're under time pressure.
I'll play with them, and make a followup PR if I come up with something better.
@@ -200,8 +201,10 @@ LB20.09: ^(HY | HH) CM* AL;

LB21a: HL CM* (HY | BA | BAX) CM* [^CM CB]?;

LB21.1: . CM* [BA HY NS];
LB21.2: BB CM* [^CM CB];
LB21.1: [^BK CR LF NL CM ZW SP CB ID] CM* [BA BAX HY NS];
I don't think the long negated set is necessary; the hard breaks should never reach this point, having been handled by earlier rules. Monkey test rules are handled sequentially, unlike the main production rules, which are run in parallel.
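To illustrate the point about sequential handling, here is a minimal Python sketch of first-match-wins rule evaluation. It is a toy model, not the RBBIMonkeyTest implementation: the three rules, the `decide` helper, and the class list are simplified assumptions, and it covers only the hard-break classes (BK, CR, LF, NL), not the CM, ZW, SP, CB, or ID exclusions in the actual rule. Under those assumptions, excluding hard breaks on the left of the later rule does not change any outcome, because the earlier rule has already claimed those cases.

```python
# Toy model of sequential (first-match-wins) rule evaluation.
# The class names mirror line-break classes, but this rule set is a
# simplified assumption and NOT the contents of line_loose_cj.txt.

HARD_BREAKS = {"BK", "CR", "LF", "NL"}


def make_rules(lb21_left):
    """Return an ordered rule list; lb21_left tests the class before BA/HY/NS."""
    return [
        # Earlier rule: always break after a hard line break.
        ("LB4", lambda prev, nxt: prev in HARD_BREAKS, "break"),
        # Later rule: do not break before BA, HY, or NS.
        ("LB21",
         lambda prev, nxt: lb21_left(prev) and nxt in {"BA", "HY", "NS"},
         "no-break"),
        # Fallback: break everywhere else.
        ("LB31", lambda prev, nxt: True, "break"),
    ]


def decide(rules, prev, nxt):
    """Apply the rules in order; the first one that matches decides."""
    for name, matches, action in rules:
        if matches(prev, nxt):
            return name, action
    raise AssertionError("the fallback rule should always match")


def any_class(prev):
    """Left context written as '.', i.e. any class."""
    return True


def not_hard_break(prev):
    """Left context written as a negated set that excludes hard breaks."""
    return prev not in HARD_BREAKS


CLASSES = ["BK", "CR", "LF", "NL", "AL", "ID", "BA", "HY", "NS"]

# Compare both formulations of the later rule over every class pair.
same = all(
    decide(make_rules(any_class), p, n)
    == decide(make_rules(not_hard_break), p, n)
    for p in CLASSES
    for n in CLASSES
)
print("identical decisions:", same)
```

The equivalence holds only because evaluation stops at the first matching rule. If all rules were matched independently, as with the main production rules, the negated set would be needed to keep the later rule from claiming the hard-break cases.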
Thanks Andy for looking at this and for approving!
This is a followup to PR #1991, "Update cj normal/loose linebreak per CSS". The original change to the line_loose_cj rules involved splitting hyphens out of the BA (Break After) class, allowing a break when they follow an ID. This change simplifies the rules for doing that. It also fixes a problem with the original change, which had altered the behavior of BAX hyphens that followed Regional Indicators or Unattached Combining Marks.
This updates the CJ normal & loose linebreak tailorings per a recent change agreed on for CSS:
The separate rules for RBBIMonkeyTest in icu4c/source/test/testdata/break_rules/ and icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/break_rules/ needed to be updated accordingly. However they also needed one additional update for line_loose_cj.txt: They needed the following addition corresponding to something that has been in the standard rules for a while, but whose absence in the RBBIMonkeyTest did not previously cause a problem:
While the CI tests are running I will in parallel run an extended-duration version of RBBIMonkeyTest locally to check for any hard-to-find problems, and note the result below.