Word segmentation is incorrect #5015

Open
robertbastian opened this issue Jun 6, 2024 · 2 comments

@robertbastian
Member

robertbastian commented Jun 6, 2024

WB3c and WB4 interact in the same way LB8a and LB9 do.
A correct implementation of that would require either duplicating every state as in #4389, or hoisting the two rules into the logic as in #5001.

The latter seems more attractive, both for data size and sanity of the maintainer; note that since rule_segmenter.rs is shared with extended grapheme cluster and sentence breaking, this will require passing a flag for that logic.
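To make the interaction concrete, here is a minimal sketch of what hoisting the two rules into the logic could look like. It is not the code in rule_segmenter.rs or in #5001: the `Prop` enum, `table_forbids_break`, `boundaries`, and the `hoist_zwj_rules` flag are all hypothetical names invented for illustration, the property set is cut down to five classes, and the "table" models only WB5. The point it shows is that WB4 makes the table look at the base character before any Extend or ZWJ, while WB3c needs the literal previous character, so the two are tracked separately.

```rust
/// Simplified word-break properties (assumption: a real segmenter has many more).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Prop {
    ALetter,
    ExtPict,
    Extend,
    Zwj,
    Other,
}

/// Stand-in for the data-driven break table: true means "do not break".
/// Only WB5 (ALetter × ALetter) is modeled here.
fn table_forbids_break(left: Prop, right: Prop) -> bool {
    matches!((left, right), (Prop::ALetter, Prop::ALetter))
}

/// Returns the indices i at which there is a break between props[i - 1] and props[i].
/// `hoist_zwj_rules` is a hypothetical flag: when set, WB3c and WB4 are handled
/// here in the logic; when unset, only the table is consulted on adjacent pairs.
pub fn boundaries(props: &[Prop], hoist_zwj_rules: bool) -> Vec<usize> {
    let mut breaks = Vec::new();
    if props.is_empty() {
        return breaks;
    }
    // `base` is what the table sees on the left (WB4: trailing Extend/ZWJ are skipped).
    // `prev` is the literal previous character, which WB3c needs.
    let mut base = props[0];
    let mut prev = props[0];
    for (i, &cur) in props.iter().enumerate().skip(1) {
        // WB4: X (Extend | ZWJ)* is treated as X, so never break before Extend or ZWJ.
        let absorbed = hoist_zwj_rules && matches!(cur, Prop::Extend | Prop::Zwj);
        // WB3c: ZWJ × Extended_Pictographic, checked against the literal previous character.
        let wb3c = hoist_zwj_rules && prev == Prop::Zwj && cur == Prop::ExtPict;
        let no_break = wb3c || absorbed || table_forbids_break(base, cur);
        if !no_break {
            breaks.push(i);
        }
        if !absorbed {
            base = cur;
        }
        prev = cur;
    }
    breaks
}
```

With the flag set, `boundaries(&[Prop::ALetter, Prop::Zwj, Prop::ExtPict], true)` yields no breaks, because WB3c still sees the ZWJ even though WB4 keeps `base` at `ALetter`. If only `base` were tracked, the table would be asked about (ALetter, ExtPict) and would report a break before the pictograph.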

@sffc
Member

sffc commented Sep 17, 2024

@eggrobin What is left on this issue?

@eggrobin
Member

> What is left on this issue?

All of it? It was created to allow us to close the specific issue reported in #4417, but word segmentation is still wrong and hasn’t changed since this was filed.
