Create extension methods for common BreakIterator operations #22

Open
NightOwl888 opened this issue Oct 12, 2019 · 1 comment
Labels: design, is:enhancement, is:feature, is:idea, pri:normal, up for grabs
Comments

@NightOwl888
Owner

While BreakIterator provides great low-level functionality for iterating forward and backward through breaks, it would be useful to have a simple way to do forward-only operations on string, StringBuilder, and char[].

IEnumerable<int> wordBreaks = theString.ToWordBreaks();
foreach (var wordBreak in wordBreaks) // "break" is a reserved C# keyword, so it cannot be the loop variable
{
    // consume
}

Or

IEnumerable<int> sentenceBreaks = theString.ToSentenceBreaks(new CultureInfo("th"));
foreach (var sentenceBreak in sentenceBreaks) // "break" is a reserved C# keyword, so it cannot be the loop variable
{
    // consume
}

We would ideally create a different extension method (with overloads for optional culture) for all 4 modes:

  1. Word
  2. Sentence
  3. Line
  4. Character
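
A minimal sketch of what the four break-offset methods might look like for string, assuming ICU4N's BreakIterator exposes GetWordInstance, GetSentenceInstance, GetLineInstance and GetCharacterInstance overloads taking a CultureInfo, plus SetText, First, Next and the Done constant. The class name BreakIteratorExtensions and the private helpers are hypothetical, and an optional parameter stands in for the separate culture overloads. StringBuilder and char[] are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using ICU4N.Text; // assumption: ICU4N package is referenced

// Hypothetical class; none of these extension methods exist in ICU4N today.
public static class BreakIteratorExtensions
{
    public static IEnumerable<int> ToWordBreaks(this string text, CultureInfo culture = null)
        => Enumerate(text, culture, c => BreakIterator.GetWordInstance(c));

    public static IEnumerable<int> ToSentenceBreaks(this string text, CultureInfo culture = null)
        => Enumerate(text, culture, c => BreakIterator.GetSentenceInstance(c));

    public static IEnumerable<int> ToLineBreaks(this string text, CultureInfo culture = null)
        => Enumerate(text, culture, c => BreakIterator.GetLineInstance(c));

    public static IEnumerable<int> ToCharacterBreaks(this string text, CultureInfo culture = null)
        => Enumerate(text, culture, c => BreakIterator.GetCharacterInstance(c));

    // Validates eagerly, then hands off to the lazy iterator method.
    private static IEnumerable<int> Enumerate(
        string text, CultureInfo culture, Func<CultureInfo, BreakIterator> factory)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        return Iterate(text, culture ?? CultureInfo.CurrentCulture, factory);
    }

    private static IEnumerable<int> Iterate(
        string text, CultureInfo culture, Func<CultureInfo, BreakIterator> factory)
    {
        // A fresh iterator per enumeration, so no state is shared between callers.
        BreakIterator iterator = factory(culture);
        iterator.SetText(text);

        // Forward-only walk: First() is the initial boundary, Next() advances until Done.
        for (int boundary = iterator.First();
             boundary != BreakIterator.Done;
             boundary = iterator.Next())
        {
            yield return boundary;
        }
    }
}
```

Because the helper uses yield return, offsets are produced lazily and the caller can stop early without scanning the rest of the text.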

We could then expand on this to do a higher level operation, such as providing an IEnumerable<string> that would tokenize the text so it can be iterated with a foreach loop.

foreach (var word in theText.ToWords(new CultureInfo("th-TH")))
{
    // consume each word
}
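
One way ToWords could be built is to slice the text between consecutive word boundaries. This is only a sketch under the same assumptions about ICU4N's BreakIterator (GetWordInstance(CultureInfo), SetText, First, Next, Done); the class name BreakIteratorTokenExtensions is hypothetical. Note that it yields every segment between boundaries, so whitespace and punctuation come back as "words" too.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using ICU4N.Text; // assumption: ICU4N package is referenced

// Hypothetical class; ToWords does not exist in ICU4N today.
public static class BreakIteratorTokenExtensions
{
    public static IEnumerable<string> ToWords(this string text, CultureInfo culture)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (culture == null) throw new ArgumentNullException(nameof(culture));
        return Iterate(text, culture);
    }

    private static IEnumerable<string> Iterate(string text, CultureInfo culture)
    {
        BreakIterator iterator = BreakIterator.GetWordInstance(culture);
        iterator.SetText(text);

        // Each pair of adjacent boundaries delimits one segment of the text.
        int start = iterator.First();
        for (int end = iterator.Next();
             end != BreakIterator.Done;
             start = end, end = iterator.Next())
        {
            // No filtering here: spaces and punctuation are returned as segments.
            yield return text.Substring(start, end - start);
        }
    }
}
```

Concatenating the yielded segments in order gives back the original text, which is a convenient sanity check for any implementation along these lines.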

Some thought needs to be given to thread safety, since BreakIterator requires a separate clone for each thread.
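
As a rough illustration of the clone-per-use idea, the sketch below keeps one prototype iterator per culture and hands out a clone for each call, so no two threads ever share iterator state. It assumes BreakIterator.Clone() and GetWordInstance(CultureInfo) behave as in ICU4J; the class BreakIteratorPrototypes and its CreateWordIterator method are hypothetical. Whether cloning is actually cheaper than calling GetWordInstance each time is not established here.

```csharp
using System;
using System.Collections.Concurrent;
using System.Globalization;
using ICU4N.Text; // assumption: ICU4N package is referenced

// Hypothetical helper; not part of ICU4N.
public static class BreakIteratorPrototypes
{
    // One prototype per culture name. Prototypes never have SetText called on them.
    private static readonly ConcurrentDictionary<string, BreakIterator> WordPrototypes =
        new ConcurrentDictionary<string, BreakIterator>();

    public static BreakIterator CreateWordIterator(CultureInfo culture)
    {
        if (culture == null) throw new ArgumentNullException(nameof(culture));

        BreakIterator prototype = WordPrototypes.GetOrAdd(
            culture.Name, _ => BreakIterator.GetWordInstance(culture));

        // Lock while cloning, as a conservative guard in case Clone() reads mutable state.
        lock (prototype)
        {
            // The caller owns the clone and may call SetText/Next on it freely.
            return (BreakIterator)prototype.Clone();
        }
    }
}
```

An extension method would then call CreateWordIterator once per enumeration instead of caching an iterator in a static field.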

NightOwl888 added the is:enhancement, up for grabs, is:idea, pri:normal, design, and is:feature labels on Oct 12, 2019
@NightOwl888
Owner Author

After an attempt at this, it turned out to be more complicated than first envisioned, because the definition of what qualifies as a "word" can vary. The approach needs to be rethought.
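
To make the ambiguity concrete: one possible direction (not something decided in this issue) is to let the caller supply the definition of a "word" as a predicate over each segment. The sketch below has no ICU4N dependency; it takes break offsets such as those a ToWordBreaks method might produce. The names WordFilterSketch, SegmentsWhere and ContainsLetterOrDigit are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper illustrating a caller-supplied definition of "word".
public static class WordFilterSketch
{
    // Slices text between consecutive break offsets and keeps segments accepted by isWord.
    public static IEnumerable<string> SegmentsWhere(
        string text, IEnumerable<int> breaks, Func<string, bool> isWord)
    {
        int? start = null;
        foreach (int end in breaks)
        {
            if (start.HasValue)
            {
                string segment = text.Substring(start.Value, end - start.Value);
                if (isWord(segment)) yield return segment;
            }
            start = end;
        }
    }

    // One possible definition: a segment is a word if it has at least one letter or digit.
    public static bool ContainsLetterOrDigit(string segment)
        => segment.Any(char.IsLetterOrDigit);
}

public static class WordFilterDemo
{
    public static void Main()
    {
        // Offsets written by hand for illustration; a break iterator would normally supply them.
        int[] breaks = { 0, 5, 6, 7, 12 };
        var words = WordFilterSketch.SegmentsWhere(
            "Hello, world", breaks, WordFilterSketch.ContainsLetterOrDigit);
        Console.WriteLine(string.Join("|", words)); // prints "Hello|world"
    }
}
```

A different predicate (for example, one that keeps numbers but drops single letters) would give a different token stream from the same boundaries, which is the variation the comment above refers to.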

NightOwl888 added this to the Future milestone on Mar 24, 2022