Add Grapheme#each_char and #each_byte #11605
I'm starting to think that instead of
A table for ASCII characters won't help for non-ASCII characters. But I agree that the data format for graphemes is not entirely finalized. So I'd suggest continuing this discussion in a separate issue. And for now we can add the
@@ -19,6 +20,20 @@ describe String::Grapheme do
end.should eq "f"
end

describe "#each_char" do
it_iterates "string", ['f', 'o', 'o'], String::Grapheme.new("foo").each_char
it_iterates "string", ['🙂', '🙂'], String::Grapheme.new("🙂🙂").each_char
Perhaps a bit obsessive, but IMO it's missing two cases: an empty string, and (a) grapheme(s) with more than one codepoint.
A grapheme consisting of an empty string should be impossible to acquire. I don't think it's necessary to test that. I would see that as an invalid state with undefined behaviour.
These two highlighted examples have more than one codepoint. Maybe it's a bit confusing that they are not actual graphemes. I chose sequences of codepoints that don't form an actual grapheme for demonstration purposes. This is unrelated to the grapheme breaker algorithm, so it doesn't matter. Let me know if you think I should use actual graphemes instead.
spec/std/string/grapheme_spec.cr
Outdated
it_iterates "string", ['f', 'o', 'o'], String::Grapheme.new("foo").each_char
it_iterates "string", ['🙂', '🙂'], String::Grapheme.new("🙂🙂").each_char
it_iterates "char", ['f'], String::Grapheme.new('f').each_char
it_iterates "char", ['🙂'], String::Grapheme.new('🙂').each_char
These are copy-pasted from above, testing each_char
This patch adds two iteration methods to String::Grapheme. They delegate to the respective implementations of Char and String.
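To make the delegation idea concrete, here is a minimal sketch, not the actual patch. `GraphemeSketch` is a hypothetical stand-in for `String::Grapheme`; the only thing assumed from this PR is that a grapheme stores either a `Char` or a `String` (the specs construct it from both). In this sketch the `Char` branch of `each_char` simply yields the character itself.

```crystal
# Hypothetical stand-in for String::Grapheme, for illustration only.
# Assumption: the cluster is stored as either a Char or a String.
struct GraphemeSketch
  @cluster : Char | String

  def initialize(@cluster)
  end

  # Yields every codepoint in the cluster.
  def each_char(& : Char ->) : Nil
    case cluster = @cluster
    in Char
      # A single Char is its own only codepoint.
      yield cluster
    in String
      cluster.each_char { |char| yield char }
    end
  end

  # Yields every UTF-8 byte in the cluster.
  def each_byte(& : UInt8 ->) : Nil
    case cluster = @cluster
    in Char
      cluster.each_byte { |byte| yield byte }
    in String
      cluster.each_byte { |byte| yield byte }
    end
  end
end

# "e" followed by U+0301 (combining acute accent): one grapheme, two codepoints.
chars = [] of Char
GraphemeSketch.new("e\u0301").each_char { |char| chars << char }

bytes = [] of UInt8
GraphemeSketch.new("e\u0301").each_byte { |byte| bytes << byte }
```

The usage part also exercises the case raised in review: a cluster that is a real grapheme made of more than one codepoint.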