[spec] Specify Feature File encoding as UTF-8 #165
Comments
Good point on the encodings. I do recommend keeping the escaping, but also allowing UTF-8. I would also clarify that using any characters outside ASCII would imply UTF-8 (i.e. conversion to the target encoding would be up to the compiler), and that any escape sequences present would be in the platform's native encoding. If it helps implementers, mixing non-ASCII characters and escapes in one string could be disallowed.
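A minimal Python sketch of the rule suggested above. It assumes the literal's quotes are already stripped and that an escape is a backslash followed by two or four hex digits (the two escape forms in section 9.e). The name `literal_mode` is hypothetical, and this version takes the stricter option of rejecting literals that mix non-ASCII text with escapes.

```python
import re

# Assumed escape shape (section 9.e): a backslash plus 2 hex digits (Mac) or 4 (Windows).
ESCAPE = re.compile(r"\\[0-9A-Fa-f]{2,4}")


def literal_mode(literal: str) -> str:
    """Classify a FEA string literal (quotes stripped) as 'ascii', 'escaped' or 'unicode'.

    'escaped': escapes are in the platform's native encoding and are copied through.
    'unicode': the literal is UTF-8 text that the compiler must convert itself.
    """
    has_escape = ESCAPE.search(literal) is not None
    has_non_ascii = not literal.isascii()
    if has_escape and has_non_ascii:
        # The optional restriction: refuse to guess what a mixed literal means.
        raise ValueError("mixing escapes and non-ASCII characters is not allowed")
    if has_non_ascii:
        return "unicode"
    return "escaped" if has_escape else "ascii"


print(literal_mode("Joachim Müller-Lancé"))  # unicode
```

A compiler built this way would encode "unicode" literals itself and pass "escaped" ones through unchanged, which keeps existing feature files working.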
This change explicitly allows us to write comments with non-ASCII characters, which would be very useful for me 😄
Escaping might be OK for the odd accent in an otherwise pure-ASCII string, but it is a PITA if you are trying to add name entries for, say, Arabic or Indic scripts, and it goes completely against the general notion that feature files should be human-readable.
By the way, fonttools.feaLib already implements the proposal (somewhat accidentally). For example, this feature file:
gets compiled to the following TTX:

```xml
<name>
  <namerecord nameID="7" platformID="3" platEncID="1" langID="0x409">
    Joachim Müller-Lancé
  </namerecord>
  <namerecord nameID="7" platformID="1" platEncID="0" langID="0x0" unicode="True">
    Joachim Müller-Lancé
  </namerecord>
  <namerecord nameID="8" platformID="3" platEncID="1" langID="0x409">
    Joachim Müller-Lancé
  </namerecord>
  <namerecord nameID="8" platformID="1" platEncID="0" langID="0x0" unicode="True">
    Joachim Müller-Lancé
  </namerecord>
  <namerecord nameID="17" platformID="3" platEncID="1" langID="0x409">
    Jovica Veljović
  </namerecord>
  <namerecord nameID="17" platformID="1" platEncID="0" langID="0x12" unicode="True">
    Jovica Veljović
  </namerecord>
  <namerecord nameID="18" platformID="3" platEncID="1" langID="0x409">
    Jovica Veljović
  </namerecord>
  <namerecord nameID="18" platformID="1" platEncID="0" langID="0x12" unicode="True">
    Jovica Veljović
  </namerecord>
</name>
```
@readroberts, what do you think about this proposal?
I favor supporting UTF-8. The feature file syntax was developed before UTF-8 was widely supported, but that is hardly the case any more; now it is hard to find a text editor that doesn't support it. Given UTF-8 support, the spec certainly needs to document what constitutes white space.
UTF-8 support is very useful for comments. On the white-space topic of issue #191, I vote for U+0009 and U+0020 as the only valid white-space characters outside comments, with anything else throwing an error. ✨🙈✨🙉✨🙊✨
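A rough Python sketch of the check voted for above, where only U+0009 and U+0020 count as white space outside comments. It assumes that `#` starts a comment running to the end of the line unless it appears inside a double-quoted string, that LF separates lines, and that CR is tolerated so CRLF files pass. `disallowed_whitespace` is a hypothetical name, and Python's `str.isspace()` stands in for "white space".

```python
from __future__ import annotations

# U+0020 and U+0009 per the vote; "\r" is tolerated (assumption) so CRLF files pass.
ALLOWED_WHITESPACE = {" ", "\t", "\r"}


def disallowed_whitespace(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, "U+XXXX") for each other white-space character outside comments."""
    problems = []
    for line_no, line in enumerate(source.split("\n"), start=1):
        in_string = False
        for col, ch in enumerate(line, start=1):
            if ch == '"':
                in_string = not in_string  # a '#' inside a string does not start a comment
            elif ch == "#" and not in_string:
                break  # rest of the line is a comment, where any character is fine
            elif ch.isspace() and ch not in ALLOWED_WHITESPACE:
                problems.append((line_no, col, f"U+{ord(ch):04X}"))
    return problems


fea = 'feature liga {\n  sub f\u00a0i by f_i; # no\u00a0break here is fine\n} liga;\n'
print(disallowed_whitespace(fea))  # [(2, 8, 'U+00A0')]
```

Note that this literal reading of the vote also rejects, for example, a no-break space inside a name string; exempting string literals would be a one-line change.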
Has a decision been made yet? Comments in UTF-8 would be, hmm, very modern :)
I think we are all in favor of specifying that the feature file encoding should be UTF-8, for both comments and non-comment text. I think the only outstanding issue is which white-space characters should be allowed. I'd favor not restricting this, and leaving it to developers to make the choices useful to them. What are the reasons to restrict which white-space characters can be used?
Has there been any progress on this? I tried the latest version, and it still does not support non-ASCII characters. Sad.
I gave this a try, and it does not seem hard to allow UTF-8 input for Windows name entries. For Mac entries, makeotf does not convert Unicode to the legacy Mac encodings; it expects the input to already be in the legacy encoding (escaped characters are treated as legacy bytes, not as Unicode). So either direct UTF-8 input is disallowed for Mac entries (the simplest solution, since Mac name IDs are legacy and should not be needed any more), or UTF-8 to legacy Mac encoding conversion is implemented (more work, of dubious value). WDYT?
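To illustrate why the conversion option is more work: the legacy Mac encoding depends on the record's language, so the same Unicode text may fit one encoding and not another. In this Python sketch, the two-entry language-to-codec table is an assumption covering only the Mac language IDs that appear in the TTX above (0, English, and 0x12, Croatian), mapped to Python's built-in `mac_roman` and `mac_croatian` codecs; it does not reproduce makeotf's or feaLib's actual tables, and `mac_name_bytes` is a hypothetical helper.

```python
# Hypothetical two-entry table: Mac language ID -> Python codec.
# 0 = English (Mac Roman), 18 (0x12) = Croatian (Mac Croatian); a real compiler needs far more.
MAC_CODECS = {0: "mac_roman", 18: "mac_croatian"}


def mac_name_bytes(text: str, mac_lang_id: int) -> bytes:
    """Convert Unicode text to the legacy Mac encoding implied by the record's language."""
    codec = MAC_CODECS.get(mac_lang_id)
    if codec is None:
        raise ValueError(f"no legacy Mac encoding known for language {mac_lang_id}")
    # Raises UnicodeEncodeError when the text does not fit that encoding.
    return text.encode(codec)


print(mac_name_bytes("Joachim Müller-Lancé", 0))  # b'Joachim M\x9fller-Lanc\x8e'
print(mac_name_bytes("Jovica Veljović", 0x12))
```

Encoding "Jovica Veljović" for language 0 instead fails, because Mac Roman has no ć; a compiler taking this route has to know the right legacy encoding for every language it accepts.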
In my view, Unicode strings should be allowed as Unicode strings: e.g. strings that ultimately become UTF-16BE should be expressible as UTF-8 in FEA. Strings that are not Unicode in the target, however, should be expressible as escaped byte sequences.
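A minimal Python sketch of this policy (the first option discussed in this thread), assuming the section 9.e escape forms keep their current meaning. On Unicode platforms (IDs 0 and 3), plain text is re-encoded as UTF-16BE and each `\XXXX` escape is one UTF-16 code unit. On Mac (ID 1), the literal must stay ASCII and each `\XX` escape is copied through as a raw byte. `name_string_bytes` is a hypothetical helper, not part of any existing compiler.

```python
import re

# Escape forms assumed from section 9.e: \XXXX on Unicode platforms, \XX on Mac.
UNICODE_ESCAPE = re.compile(r"\\([0-9A-Fa-f]{4})")
MAC_ESCAPE = re.compile(r"\\([0-9A-Fa-f]{2})")


def name_string_bytes(literal: str, platform_id: int) -> bytes:
    """Bytes a compiler would store for a FEA name literal (quotes stripped)."""
    if platform_id in (0, 3):
        # Unicode target: text (ASCII or UTF-8 source) becomes UTF-16BE,
        # and each \XXXX escape is one UTF-16 code unit, as today.
        pattern, text_codec, unit_size = UNICODE_ESCAPE, "utf-16-be", 2
    elif platform_id == 1:
        # Legacy target: no implicit conversion, so only ASCII plus \XX byte escapes.
        if not literal.isascii():
            raise ValueError("Mac name strings must be ASCII; use \\XX escapes")
        pattern, text_codec, unit_size = MAC_ESCAPE, "ascii", 1
    else:
        raise ValueError(f"unsupported platform ID {platform_id}")

    out, pos = bytearray(), 0
    for m in pattern.finditer(literal):
        out += literal[pos:m.start()].encode(text_codec)  # plain text before the escape
        out += int(m.group(1), 16).to_bytes(unit_size, "big")  # the escaped code unit
        pos = m.end()
    out += literal[pos:].encode(text_codec)
    return bytes(out)
```

With this policy, `name_string_bytes("Joachim Müller-Lancé", 3)` and the escaped form `name_string_bytes(r"Joachim M\00fcller-Lanc\00e9", 3)` yield identical bytes, while a non-ASCII Mac literal is rejected rather than silently guessed.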
That would be the first option. To correct my previous comment, AFDKO seems to have conversion tables from Unicode to several legacy Mac encodings, but these are used for
The current Feature File Syntax does not say what encoding a feature file should have, besides restricting string literals to ASCII.
Proposal: Change the spec to require that feature files use UTF-8 encoding; allow arbitrary Unicode string literals; and change section 9.e to allow Unicode strings in nameid statements. This would preserve backwards compatibility as long as the current escaping mechanism is left unchanged. For example, the following two blocks would produce exactly the same output. The former is from the current spec, and its escapes would continue to be based on platform encodings even after the suggested specification change. However, font designers could also use the new syntax below, and it would be up to the compiler to convert it to whatever is needed for the OpenType table.
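The backwards-compatibility argument can be checked mechanically: any Unicode literal can be rewritten in the current escaped form, so the old and new syntax describe the same bytes. This Python sketch (the helper `to_escaped_literal` is hypothetical) assumes Windows escapes are UTF-16 code units, written as surrogate pairs outside the BMP, and Mac escapes are Mac Roman bytes. It also escapes `"` and `\`, since neither can appear bare in a literal, and emits lowercase hex as an arbitrary choice.

```python
def to_escaped_literal(text: str, platform_id: int) -> str:
    """Rewrite Unicode text as a FEA literal body that uses only printable ASCII and escapes."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isprintable() and ch not in '"\\':
            out.append(ch)  # plain ASCII is the same in both syntaxes
        elif platform_id == 3:
            # One \XXXX per UTF-16 code unit; non-BMP characters yield a surrogate pair.
            units = ch.encode("utf-16-be")
            for i in range(0, len(units), 2):
                out.append(f"\\{int.from_bytes(units[i:i + 2], 'big'):04x}")
        elif platform_id == 1:
            # One \XX per Mac Roman byte; raises UnicodeEncodeError if ch is not in Mac Roman.
            out.append(f"\\{ch.encode('mac_roman')[0]:02x}")
        else:
            raise ValueError(f"unsupported platform ID {platform_id}")
    return "".join(out)


print(to_escaped_literal("Joachim Müller-Lancé", 3))  # Joachim M\00fcller-Lanc\00e9
print(to_escaped_literal("Joachim Müller-Lancé", 1))  # Joachim M\9fller-Lanc\8e
```

Running the conversion in this direction is what makes the change safe: a compiler that accepts UTF-8 can always fall back to the escape semantics it already implements.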
In fonttools/fonttools#780 (comment), @twardoch suggested an extension to feaLib. Personally I like Adam’s idea (minus the complexity of arbitrary encodings; simply requiring UTF-8 would be easier). But I’d prefer not to deviate from the published feature file spec, hence filing this issue here.