U+0670 superscript alef should be written with horizontal spacing when input after fathah #217
Thanks for the quick reply. The main use that I see in the Quran is that superscript alef is used
It seems to me that both usages will be supported if the sequence U+064E->U+0670 offsets the superscript alef horizontally without affecting letter joining. Otherwise if U+0670 is input without U+064E then it may be placed vertically on top of the letter. Calibri Arabic handles it this way as far as I can see. Unless there is other usage besides the above two which won't be handled? Thanks again. |
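As a rough, text-level illustration of the rule suggested in this comment, the Python sketch below finds the places where U+064E is immediately followed by U+0670. It only inspects code points. A font would express the same condition in its OpenType lookups, and the function name `find_fatha_dagger` is hypothetical.

```python
FATHA = "\u064E"             # ARABIC FATHA
SUPERSCRIPT_ALEF = "\u0670"  # ARABIC LETTER SUPERSCRIPT ALEF (dagger alef)


def find_fatha_dagger(text):
    """Return the indices of every U+064E that is directly followed by U+0670."""
    return [
        i
        for i in range(len(text) - 1)
        if text[i] == FATHA and text[i + 1] == SUPERSCRIPT_ALEF
    ]


# thal + fatha + dagger alef + lam + kasra + kaf
with_fatha = "\u0630\u064E\u0670\u0644\u0650\u0643"
# thal + dagger alef + lam + kaf (no fatha)
without_fatha = "\u0630\u0670\u0644\u0643"

print(find_fatha_dagger(with_fatha))
print(find_fatha_dagger(without_fatha))
```

Under the suggestion above, only the first string would get the horizontally offset dagger alef; in the second, the mark would sit directly on top of the letter.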
The problem is that depending on the presence of fatha is a hack and goes against Unicode making the small alef a combining mark. Using a character as a seat is more reliable (you can have |
Thanks again for the detailed explanation. I happily take your word on it. However, do you think it will be harmless to add the fatha+dagger alef hack (in addition to explicit input over tatweel/nbsp) as it will separate character input from glyph typesetting, which, as I understand, is an underlying tenet of Unicode philosophy. It will also make the behavior similar to how Amiri deals with inline hamza in words like خطيءة. Amiri correctly joins the ي with the ة, unlike most other fonts, which require superscript hamza over a tatweel. |
I think the use of tatweel and no-break space was proposed in L2/09-358 and there is a UTC action item 139-A60 for a formal proposal. The Unicode 13.0.0 chapter 9 doesn't mention this use of tatweel or no-break space but this is similar to the use of hamza above on tatweel. |
I already had this before (even before Calibri Arabic was designed) but I removed it for the reasons above, and I’d rather people followed a standard way to encode this sequence (with reasonable fallback for fonts that don’t handle it nicely) rather than depend on font-specific hacks. I’d have preferred a more semantic way to encode this, but Unicode seems to be reluctant (the cleanest way would be a separate character, and I encourage you to work on a proposal to Unicode if you feel strongly enough about this issue).
This is also another non-standard feature of Amiri that I wish to drop at some point for the exact same reasons.
Thanks @moyogo for the links. |
Thank you @khaledhosny and @moyogo .
There are a couple of Unicode documents by Thomas Milo discussing this issue:
There are rare cases in non-Quranic script where superscript hamzah over a tatweel character will not suffice. For example, لَءَّال la22aal (pearl-seller) will break the mandatory lam-alef ligature if written with tatweel: لَـَّٔال. It seems a complicated situation that you are definitely more qualified to address. As a user, however, semantic encoding is quite nice to have. Thanks for discussing. |
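The lam-alef point can be checked at the code point level. The sketch below is a simplification that assumes a ligature needs lam and alef to be adjacent once nonspacing marks (general category `Mn`) are ignored; real shaping engines and fonts are more involved. The helper names and the two test strings are illustrative assumptions, not taken from the thread.

```python
import unicodedata

LAM_ALEF = "\u0644\u0627"  # ARABIC LETTER LAM + ARABIC LETTER ALEF


def base_letters(text):
    """Drop nonspacing marks (category Mn) and keep everything else."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def lam_alef_adjacent(text):
    """True if lam is directly followed by alef once marks are ignored."""
    return LAM_ALEF in base_letters(text)


# lam, fatha, TATWEEL, fatha, shadda, hamza above, alef, lam
with_tatweel = "\u0644\u064E\u0640\u064E\u0651\u0654\u0627\u0644"
# lam, fatha, hamza above, fatha, shadda, alef, lam (marks only, no seat)
marks_only = "\u0644\u064E\u0654\u064E\u0651\u0627\u0644"

print(lam_alef_adjacent(with_tatweel))
print(lam_alef_adjacent(marks_only))
```

Tatweel (U+0640) is a letter-like character rather than a mark, so it sits between lam and alef in the base sequence, which is why a tatweel seat gets in the way of the ligature here.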
@khaledhosny @moyogo I hope it’s OK if I re-open this discussion a bit. I appreciate your point about not wanting to have a font-specific hack. Doing some research, I found this description of U+034F COMBINING GRAPHEME JOINER (CGJ): https://en.wikipedia.org/wiki/Combining_Grapheme_Joiner The discussion on the rendering of Hebrew diacritics seems quite relevant. Could we use CGJ in the case of dagger alef and hamza? Here is how it could potentially be used:
Dagger alef: This way, one common method can be used for both joining characters and non-joining characters (dal, thal, waw, etc.), instead of using tatweel for joining characters and NBSP for non-joining characters. Also, we are not relying on the presence of fatha to determine whether to horizontally offset the dagger. (I now appreciate your point about wanting to have هـٰذا displayed without a fatha on the heh.)
Floating hamza: The implementation for hamza is a bit muddier, since uni0621 (standalone hamza) is now expected by users to break the joining of characters. But one possible method could be to use CGJ with uni0654 “hamza above”. If CGJ comes before uni0654, then it will appear above the baseline without affecting the joining of the previous character to the next character.
If you think this idea has merit, I can try creating a formal proposal. Please let me know what you think, as I'm only a user and haven't studied Unicode development in detail. Thank you. |
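To make the comparison concrete, here is a minimal Python sketch of the two encodings being discussed: a seat character (tatweel after letters that join forward, NBSP after letters that do not) versus CGJ placed before the mark. The function names, the partial letter list, and the assumption that CGJ precedes the dagger alef in the same way as described for hamza above are all illustrative, not part of any standard.

```python
TATWEEL = "\u0640"           # ARABIC TATWEEL
NBSP = "\u00A0"              # NO-BREAK SPACE
CGJ = "\u034F"               # COMBINING GRAPHEME JOINER
SUPERSCRIPT_ALEF = "\u0670"  # ARABIC LETTER SUPERSCRIPT ALEF
HAMZA_ABOVE = "\u0654"       # ARABIC HAMZA ABOVE

# Letters that do not join to the following letter.
# Partial list for illustration only: alef, dal, thal, reh, zain, waw.
NON_FORWARD_JOINING = set("\u0627\u062F\u0630\u0631\u0632\u0648")


def seat_scheme(prev_letter, mark):
    """Seat-based encoding: the seat depends on how prev_letter joins."""
    seat = NBSP if prev_letter in NON_FORWARD_JOINING else TATWEEL
    return prev_letter + seat + mark


def cgj_scheme(prev_letter, mark):
    """CGJ-based encoding (assumed order): same sequence for every letter."""
    return prev_letter + CGJ + mark


def code_points(text):
    """Format a string as a list of U+XXXX labels for inspection."""
    return [f"U+{ord(ch):04X}" for ch in text]


heh, thal = "\u0647", "\u0630"
print(code_points(seat_scheme(heh, SUPERSCRIPT_ALEF)))
print(code_points(seat_scheme(thal, SUPERSCRIPT_ALEF)))
print(code_points(cgj_scheme(heh, SUPERSCRIPT_ALEF)))
print(code_points(cgj_scheme(thal, HAMZA_ABOVE)))
```

The seat-based function needs to know the joining behaviour of the preceding letter, while the CGJ-based one does not, which is the simplification this comment argues for.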
Using CGJ is not a bad idea. I don't personally care much which method is used; all I care about is a standardized way that can represent the text reliably. Any solution can be made to produce the same output by the font. |
I've written a draft proposal here: https://github.com/adamiturabi/arabic-inline-unicode/blob/main/index.pdf I'd appreciate it if you could take a look, and also if you could mention it to others who might be interested in this implementation and who might be able to give it some traction. Thanks. |
Looks good. A few comments:
Thank you. I've incorporated your feedback. You can see the diffs here: adamiturabi/arabic-inline-unicode@7a5179d The updated PDF is in the same location: https://github.com/adamiturabi/arabic-inline-unicode/blob/main/index.pdf Regarding your last point, I wasn't sure exactly what you meant by making a comparison. Because we are not proposing a separate encoding for breaking dagger alef. But according to the CGJ scheme, the breaking "small waw" and "small yeh" won't technically be needed any more. So I've mentioned that. Also, attempting to tag @roozbehp here. |
What is the question for me? |
@roozbehp Thanks for responding. I see that you have some prior work regarding proposing the handling of Arabic inline characters:
I've written a document on this issue and a proposed solution, matching one in L2/09-358R, here: https://github.com/adamiturabi/arabic-inline-unicode/blob/main/index.pdf It would be great if you could provide feedback and recommend how to proceed with proposing a solution to Unicode. |
When U+0670 superscript alef is input after a fathah, I believe it should be written with horizontal spacing after the fathah. This image should illustrate what I mean:
However, even in the "better image" the vertical positioning is not correct. It is too high in ذلك and too low in هذا. (I faked it with U+202F in the former and tatweel in the latter.)
Thank you for your continued support to this great typeface.