ICU-22941 Revert ICU-22112, untailoring root word break #3249
@markusicu, I am running into the same problem as in #3028 (comment): the documentation only tells me how to regenerate the old icudata.jar, not the ICU4J .brk files. Can you perform the same thaumaturgy you did in 169023a? (At some point it would be good to update the documentation, too…)
@markusicu post-UTW poke
@markusicu in fcd04fc I tried copying over the .brk files that get generated when I rebuild on my machine; these don't seem to work.
But genbrk.cpp does not seem to have any options for the output format, so I don't understand how I can be generating .brk files that work for ICU4C but not for ICU4J.
I just fetched your branch and ran the rbbi tests in Eclipse. 66 tests, 2 failed. So 64 tests worked :-) The code fails to load "brkitr/word_fi_sv.brk". ICUBinary.getData() tries to find it two ways but ends up returning null because it's not there, and it doesn't throw an exception because the caller didn't ask for one. This is a bug: ICU-22960. I see that you updated that .brk file, but I don't see why ICU can't find it :-(
Oh, wait, you are deleting that file...
I refreshed all of the ICU4J data on my Linux box. It still fails for me in Eclipse because it still tries to load the word_fi_sv file. I don't see where it still has that registered. Pushing my files to your branch in the hope that my Eclipse is just wedged...
If this works, then I suspect that updating the res_index.res file did the trick.
I got it to work locally. You deleted the ICU4C brkitr/fi.txt & sv.txt files, but the repo still had the ICU4J .res versions. So when asked for Finnish word breaks, it found & loaded fi.res, which referred to the deleted word_fi_sv.brk file. Hopefully this is it.
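The lookup chain that caused the two failures can be modeled in a few lines. This is a hypothetical sketch: the `DATA` map, `getData`, and `loadWordRules` are stand-ins, not ICU4J APIs, and the `required` flag only mimics the "caller didn't ask for an exception" behavior described above. The stale `fi.res` still names a rules file that no longer exists, so a lenient lookup silently yields `null`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the failure described above: a stale locale resource
// (fi.res) still points at a break-rules file (word_fi_sv.brk) that was deleted.
// None of these names are ICU4J APIs; they only illustrate the lookup chain.
public class StaleResourceDemo {
    // Simulated data package: file path -> contents.
    static final Map<String, String> DATA = new HashMap<>();

    static {
        DATA.put("brkitr/root.res", "word=word.brk");
        DATA.put("brkitr/fi.res", "word=word_fi_sv.brk"); // stale: never removed
        DATA.put("brkitr/word.brk", "<root word rules>");
        // "brkitr/word_fi_sv.brk" has been deleted.
    }

    // Lenient lookup: a missing file yields null unless the caller requires it.
    static String getData(String path, boolean required) {
        String data = DATA.get(path);
        if (data == null && required) {
            throw new IllegalStateException("missing data file: " + path);
        }
        return data;
    }

    // Locale fallback: use <locale>.res if present, else root.res,
    // then load the rules file that the resource names.
    static String loadWordRules(String locale, boolean required) {
        String res = getData("brkitr/" + locale + ".res", false);
        if (res == null) {
            res = getData("brkitr/root.res", true);
        }
        String brkName = res.substring(res.indexOf('=') + 1);
        return getData("brkitr/" + brkName, required);
    }

    public static void main(String[] args) {
        System.out.println(loadWordRules("fi", false)); // null: silent failure
        try {
            loadWordRules("fi", true);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // missing data file: brkitr/word_fi_sv.brk
        }
        DATA.remove("brkitr/fi.res"); // the fix: delete the stale .res as well
        System.out.println(loadWordRules("fi", true)); // <root word rules>
    }
}
```

Once the stale `fi.res` is gone, Finnish falls back to the root resource and its untailored rules. A strict (`required`) lookup would have surfaced the dangling reference as an error instead of a quiet `null`.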
Bingo! 🎉
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot
@markusicu As discussed over virtual tea, you might still want to flip the bytes. |
I regenerated the data and pushed the updated files. When the tests pass, please squash again. Explanation for others: We want to keep the Java data in big-endian format, so that different people generating the data don't flip-flop on no-op data changes, and wonder why what they are doing affects BreakIterator data files.
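Why a fixed byte order matters can be shown with plain `java.nio`. This is a generic illustration, not the actual .brk file layout: the same logical values serialize to different bytes in little-endian and big-endian order, so regenerating identical data on machines with different native byte order would show up as a spurious diff.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Sketch: identical logical data, different byte orders, different file bytes.
// Choosing one canonical order (big-endian here) makes regenerated output stable.
// Generic illustration only; this is not the ICU .brk format.
public class ByteOrderDemo {
    // Serialize a sequence of 32-bit values in the given byte order.
    static byte[] serialize(int[] values, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES).order(order);
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        int[] table = {0x00000001, 0x0000003A};
        byte[] le = serialize(table, ByteOrder.LITTLE_ENDIAN);
        byte[] be = serialize(table, ByteOrder.BIG_ENDIAN);
        System.out.println(Arrays.toString(le)); // [1, 0, 0, 0, 58, 0, 0, 0]
        System.out.println(Arrays.toString(be)); // [0, 0, 0, 1, 0, 0, 0, 58]
        System.out.println(Arrays.equals(le, be)); // false
    }
}
```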
…lorings for fi,sv" This reverts commit 49d192f.
Hooray! The files in the branch are the same across the force-push. 😃 ~ Your Friendly Jira-GitHub PR Checker Bot
This brings the colon back into MidLetter (with no tailoring on top of the UCD), instead of its inclusion in MidLetter being an fi & sv tailoring.
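To make the MidLetter change concrete, here is a toy segmenter that implements only rules WB5 to WB7 of UAX #29 (letter × letter, and letter × MidLetter × letter) and breaks everywhere else. It is a simplified sketch, not ICU's rule engine; the `colonIsMidLetter` flag is a hypothetical switch contrasting the UCD default restored by this PR with a root that leaves the colon out of MidLetter. The input "EU:n" is a Finnish-style abbreviation with a colon-attached case ending.

```java
import java.util.ArrayList;
import java.util.List;

// Toy word segmenter implementing only UAX #29 rules WB5-WB7; every other
// position between characters is treated as a break (WB999). Not ICU's engine.
// colonIsMidLetter is a hypothetical flag: true = UCD default (colon is MidLetter).
public class MidLetterDemo {
    enum Prop { ALETTER, MIDLETTER, OTHER }

    static Prop prop(char c, boolean colonIsMidLetter) {
        if (Character.isLetter(c)) return Prop.ALETTER;
        if (c == ':' && colonIsMidLetter) return Prop.MIDLETTER;
        return Prop.OTHER;
    }

    static List<String> words(String s, boolean colonIsMidLetter) {
        List<String> out = new ArrayList<>();
        int start = 0;
        for (int i = 1; i < s.length(); i++) {
            Prop prev = prop(s.charAt(i - 1), colonIsMidLetter);
            Prop cur = prop(s.charAt(i), colonIsMidLetter);
            boolean noBreak =
                (prev == Prop.ALETTER && cur == Prop.ALETTER)            // WB5
                || (prev == Prop.ALETTER && cur == Prop.MIDLETTER        // WB6
                    && i + 1 < s.length()
                    && prop(s.charAt(i + 1), colonIsMidLetter) == Prop.ALETTER)
                || (prev == Prop.MIDLETTER && cur == Prop.ALETTER        // WB7
                    && i >= 2
                    && prop(s.charAt(i - 2), colonIsMidLetter) == Prop.ALETTER);
            if (!noBreak) {
                out.add(s.substring(start, i));
                start = i;
            }
        }
        if (!s.isEmpty()) out.add(s.substring(start));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("EU:n", true));  // [EU:n]
        System.out.println(words("EU:n", false)); // [EU, :, n]
    }
}
```

With the colon in MidLetter for everyone, "EU:n" stays one word in root, so fi and sv no longer need a tailoring of their own for it.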