Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Grapheme#each_char and #each_byte #11605

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

straight-shoota
Copy link
Member

This patch adds two iteration methods to String::Grapheme. They delegate to the respective implementations of Char and String.

@asterite
Copy link
Member

I'm starting to think that instead of Grapheme, we should have maybe returned a String as each grapheme, even if this is not the most efficient thing. Maybe we could have an array of strings for ascii chars and use those so we don't create chars for those. The thing is, it looks like Grapheme should behave exactly the same as a String. In fact, in Elixir there's no grapheme concept: you always get strings back.

@straight-shoota
Copy link
Member Author

straight-shoota commented Dec 17, 2021

A table for ASCII characters won't help for non-ASCII characters. But I agree that the data format for graphemes is not entirely finalized. So I'd suggest to continue this discussion in a separate issue. And for now we can add the Experimental annotation to the grapheme API, so we can refactor it in one of the next releases if we want to (=> #11611).

@@ -19,6 +20,20 @@ describe String::Grapheme do
end.should eq "f"
end

describe "#each_char" do
it_iterates "string", ['f', 'o', 'o'], String::Grapheme.new("foo").each_char
it_iterates "string", ['🙂', '🙂'], String::Grapheme.new("🙂🙂").each_char
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a bit obsessive, but IMO it's missing two cases: an empty string, and (a) grapheme(s) with more than one codepoint.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A grapheme consisting of an empty string should be impossible to acquire. I don't think it's necessary to test that. I would see that as an invalid state with undefined behaviour.
These two highlighted examples have more than one codepoint. Maybe it's a bit confusing that they are not actual graphemes. I chose sequences of codepoints that don't form an actual grapheme for demonstration purposes. This is unrelated to the grapheme breaker algorithm, so it doesn't matter. Let me know if you think I should use actual graphemes instead.

Comment on lines 31 to 34
it_iterates "string", ['f', 'o', 'o'], String::Grapheme.new("foo").each_char
it_iterates "string", ['🙂', '🙂'], String::Grapheme.new("🙂🙂").each_char
it_iterates "char", ['f'], String::Grapheme.new('f').each_char
it_iterates "char", ['🙂'], String::Grapheme.new('🙂').each_char
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are copy-pasted from above, testing each_char

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants