ICU-22511 collator hang in comparison #2673

Open
wants to merge 2 commits into
base: main
from

Conversation

FrankYFTang
Contributor

@FrankYFTang FrankYFTang commented Oct 17, 2023

Avoid an infinite loop by terminating and returning an error if we loop on the same state too many times.

Checklist
  • Required: Issue filed: https://unicode-org.atlassian.net/browse/ICU-22511
  • Required: The PR title must be prefixed with a JIRA Issue number.
  • Required: The PR description must include the link to the Jira Issue, for example by completing the URL in the first checklist item
  • Required: Each commit message must be prefixed with a JIRA Issue number.
  • Issue accepted (done by Technical Committee after discussion)
  • Tests included, if applicable
  • API docs and/or User Guide docs changed or added, if applicable

@FrankYFTang
Contributor Author

To test the Java change, run:
mvn verify -Dtest="com.ibm.icu.dev.test.collator.CollationTest" -Dsurefire.failIfNoSpecifiedTests=false -ff

(Googlers: make sure you have done the /etc/group trick on your machine first to make the build faster.)

@FrankYFTang FrankYFTang force-pushed the ICU22511-infinityloop branch from 83ee0ed to ce0c676 Compare October 18, 2023 21:31
@jira-pull-request-webhook

Notice: the branch changed across the force-push!

  • icu4c/source/i18n/collationcompare.cpp is now changed in the branch
  • icu4c/source/test/intltest/collationtest.cpp is different
  • icu4j/main/collate/src/main/java/com/ibm/icu/impl/coll/CollationCompare.java is now changed in the branch
  • icu4j/main/collate/src/test/java/com/ibm/icu/dev/test/collator/CollationTest.java is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@FrankYFTang FrankYFTang requested a review from markusicu October 18, 2023 21:32
@markusicu markusicu self-assigned this Oct 19, 2023
FrankYFTang added a commit to FrankYFTang/icu that referenced this pull request Nov 14, 2023
@FrankYFTang FrankYFTang force-pushed the ICU22511-infinityloop branch from ce0c676 to f86c6f7 Compare November 14, 2023 01:08
@jira-pull-request-webhook

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@FrankYFTang FrankYFTang changed the title from ICU-22511 Test "vi" locale collator hang in comparison to ICU-22511 Fix "vi" locale collator hang in comparison on Nov 14, 2023
@FrankYFTang
Contributor Author

@markusicu Friendly ping. This bug was filed Nov 2, 2023 by the fuzzer, and the PR has been ready for review for a week.

@macchiati
Member

I think both are out of office for Thanksgiving week.

@macchiati
Member

I took a quick look at the code, and I am concerned. It appears to return equal whenever a primary is equal three times in a row. But then we would get the clearly incorrect:

aaah = aaagh

Each pass through the main loop should be resetting either the right or left CEs, or both. Because getting a CE advances an internal pointer, the loop should always terminate. So something fishy is going on.

I suggest that you augment your printout to provide more readable information, since lines like the following are too hard to follow:
ce from left 0xa4000500
ce from left 0x2a00000005009c00
...

I suggest that when you print, you break down the left and right CE values into primary, secondary, and tertiary, and put tabs between the fields so that the output can be loaded into a spreadsheet.
Eg something like:
...
Lp 2a0000 Ls 0500 Lt 0042
...

I can't remember, but you may also be able to fetch a code point offset from the CE; if so, it would help to include that in the printout.
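As a rough illustration of the printout suggested above, here is a minimal Java sketch. It assumes a 64-bit CE layout with the primary weight in the upper 32 bits, the secondary in the next 16 bits, and the tertiary (including any case and quaternary bits) in the low 16 bits. The class name `CeDump` and the helper `formatCe` are hypothetical and not part of the PR.

```java
// Hypothetical debugging helper; not taken from the PR.
class CeDump {
    /**
     * Formats one 64-bit CE as tab-separated fields.
     * Assumed layout: primary = bits 63..32, secondary = bits 31..16,
     * tertiary (with case/quaternary bits) = bits 15..0.
     */
    static String formatCe(String side, long ce) {
        long primary = ce >>> 32;                      // upper 32 bits
        int secondary = (int) ((ce >>> 16) & 0xffff);  // next 16 bits
        int tertiary = (int) (ce & 0xffff);            // low 16 bits
        return String.format("%sp\t%08x\t%ss\t%04x\t%st\t%04x",
                side, primary, side, secondary, side, tertiary);
    }

    public static void main(String[] args) {
        // Values copied from the hard-to-read log lines quoted above.
        System.out.println(formatCe("L", 0xa4000500L));
        System.out.println(formatCe("L", 0x2a00000005009c00L));
    }
}
```

Putting each label and value in its own tab-separated column means a spreadsheet can line up the left and right primaries side by side.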

@FrankYFTang
Contributor Author

Sorry, I assume you looked at the Java code first. My Java change is indeed wrong: I should throw there, not return equal.
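The difference between throwing and returning equal can be sketched as follows. This is a simplified illustration, not the code in this PR: it compares plain arrays of primary weights, and the names `GuardedCompare`, `comparePrimaries`, `END`, and the `maxIterations` parameter are all assumptions made for the example, as are the weight values.

```java
// Simplified illustration only; the real comparison walks CE iterators.
class GuardedCompare {
    // Hypothetical end-of-sequence marker, lower than any real weight here.
    static final long END = -1;

    /**
     * Compares two sequences of primary weights.
     * If the loop exceeds maxIterations, it throws rather than
     * reporting the inputs as equal.
     */
    static int comparePrimaries(long[] left, long[] right, int maxIterations) {
        int li = 0;
        int ri = 0;
        for (int n = 0; n < maxIterations; n++) {
            // Each fetch advances an index, so progress is made on every pass.
            long lp = li < left.length ? left[li++] : END;
            long rp = ri < right.length ? right[ri++] : END;
            if (lp != rp) {
                return lp < rp ? -1 : 1;
            }
            if (lp == END) {
                return 0; // both sequences exhausted together
            }
        }
        throw new IllegalStateException("comparison did not finish within the iteration limit");
    }

    public static void main(String[] args) {
        // Made-up weights: a = 10, g = 16, h = 17.
        long[] aaah = {10, 10, 10, 17};
        long[] aaagh = {10, 10, 10, 16, 17};
        System.out.println(comparePrimaries(aaah, aaagh, 100));
    }
}
```

With these made-up weights, three equal primaries in a row do not end the comparison, so "aaah" and "aaagh" are still told apart at the fourth position. Only exceeding the limit raises an error.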

@FrankYFTang FrankYFTang force-pushed the ICU22511-infinityloop branch from f86c6f7 to 1c70aa0 Compare November 21, 2023 00:46
@jira-pull-request-webhook

Notice: the branch changed across the force-push!

  • icu4j/main/collate/src/main/java/com/ibm/icu/impl/coll/CollationCompare.java is different
  • icu4j/main/collate/src/test/java/com/ibm/icu/dev/test/collator/CollationTest.java is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@macchiati
Member

Thanks. I'm still a bit worried about the change, because of the aaah case, so I'd appreciate a printout for the test case along the lines I suggested.

@richgillam
Contributor

I looked at this and didn't understand what it was doing or why. I think I'll just defer to Mark here.

@FrankYFTang
Contributor Author

@markusicu ping

@FrankYFTang
Contributor Author

Ping @markusicu. This is more urgent than the other PRs because the bug can now hang V8 with a very simple script (see the bug for details).

@FrankYFTang
Contributor Author

According to the fuzzer, the bug was somehow fixed between
788b893...699fb1d

This PR is now obsolete.

@FrankYFTang
Contributor Author

FrankYFTang commented May 28, 2024

Reopening.
OK, the fuzzer only reported the bug as fixed because we increased the number of supported locales, which changed the locale under test to a different one. This means it will later find other data that calls into ICU with the same parameters and reproduces the problem. The problem is not really fixed: if we run the tests in this PR, it is still broken.

@FrankYFTang FrankYFTang reopened this May 28, 2024
@FrankYFTang FrankYFTang force-pushed the ICU22511-infinityloop branch from 1c70aa0 to 36ad138 Compare June 25, 2024 21:13
@jira-pull-request-webhook

Notice: the branch changed across the force-push!

  • icu4c/source/test/fuzzer/collator_compare_fuzzer.cpp is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@FrankYFTang FrankYFTang changed the title from ICU-22511 Fix "vi" locale collator hang in comparison to ICU-22511 collator hang in comparison on Jun 26, 2024
@FrankYFTang
Contributor Author

The fix is not good enough.

4 participants