[spec] Specify Feature File encoding as UTF-8 #165

brawer · 2017-02-14T09:57:46Z

The current Feature File Syntax does not say what encoding a feature file should have, besides restricting string literals to ASCII.

Proposal: Change the spec to require that feature files use UTF-8 encoding; allow for arbitrary Unicode string literals; and change section 9.e to allow for Unicode strings in nameid statements.

This would preserve backwards compatibility as long as the current escaping mechanism is left unchanged. For example, the following two blocks would produce exactly the same output. The first is from the current spec; its escapes would continue to be based on platform encodings even after the suggested specification change. However, font designers could also use the new syntax in the second block, and it would be up to the compiler to convert this to whatever is needed for the OpenType table.

  table name {
     nameid 9 "Joachim M\00fcller-Lanc\00e9";    # Windows (Unicode)
     nameid 9 1 "Joachim M\9fller-Lanc\8e";      # Macintosh (Mac Roman)
  } name;

  table name {
     nameid 9 "Joachim Müller-Lancé";    # Windows (Unicode)
     nameid 9 1 "Joachim Müller-Lancé";  # Macintosh (Mac Roman)
  } name;
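
To make the two escape forms concrete, here is a minimal Python sketch of how the strings in the first block could be decoded. It assumes, as the comments in that block indicate, that four-hex-digit escapes in Windows entries name Unicode code units and that two-hex-digit escapes in Macintosh entries name bytes in Mac Roman. The function names are hypothetical and are not taken from any existing compiler.

```python
import re


def unescape_windows(literal: str) -> str:
    """Replace each \\XXXX escape (four hex digits) with that code unit.

    Assumption: surrogate pairs are not combined in this sketch.
    """
    return re.sub(
        r"\\([0-9a-fA-F]{4})",
        lambda m: chr(int(m.group(1), 16)),
        literal,
    )


def unescape_mac(literal: str, encoding: str = "mac_roman") -> str:
    """Replace each \\XX escape (two hex digits) with a raw byte, then
    decode the whole byte string in the given legacy Mac encoding."""
    out = bytearray()
    pos = 0
    for m in re.finditer(r"\\([0-9a-fA-F]{2})", literal):
        out += literal[pos:m.start()].encode("ascii")
        out.append(int(m.group(1), 16))
        pos = m.end()
    out += literal[pos:].encode("ascii")
    return out.decode(encoding)


print(unescape_windows(r"Joachim M\00fcller-Lanc\00e9"))
print(unescape_mac(r"Joachim M\9fller-Lanc\8e"))
```

Both calls yield the same text as the plain UTF-8 literals in the second block, which is the sense in which the two blocks are equivalent.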

In fonttools/fonttools#780 (comment), @twardoch suggested an extension to feaLib. Personally I like Adam’s idea (minus the complexity of arbitrary encodings; simply requiring UTF-8 would be easier). But I’d prefer to not deviate from the published feature file spec, hence filing this issue here.

twardoch · 2017-02-14T10:19:51Z

Good point on the encodings. I do recommend keeping the escaping, but also allowing UTF-8.

I would also clarify that usage of any characters outside ASCII would imply UTF-8 (i.e. the conversion to the target encoding would be up to the compiler), and presence of any escapement would imply that the escapes are in the native encoding. If it helps implementers, mixing of non-ASCII and escapes could be disallowed.
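
A rough Python sketch of this rule, under the assumption that escapes look like a backslash followed by at least two hex digits and that mixing is rejected. The function name and the returned labels are hypothetical, chosen only for illustration.

```python
import re

# Assumption: an escape is a backslash followed by two or more hex digits.
ESCAPE = re.compile(r"\\[0-9a-fA-F]{2}")


def classify_literal(text: str) -> str:
    """Decide how a string literal should be interpreted.

    "utf-8"  : contains non-ASCII characters, the compiler converts them
    "native" : contains escapes, taken to be in the target encoding
    "ascii"  : neither, no conversion question arises
    """
    has_escape = ESCAPE.search(text) is not None
    has_non_ascii = not text.isascii()
    if has_escape and has_non_ascii:
        # The optional restriction: refuse to mix the two styles.
        raise ValueError("string mixes non-ASCII characters and escapes")
    if has_non_ascii:
        return "utf-8"
    if has_escape:
        return "native"
    return "ascii"


print(classify_literal("Joachim Müller-Lancé"))
print(classify_literal(r"Joachim M\9fller-Lanc\8e"))
```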

mashabow · 2017-02-14T11:33:44Z

This change explicitly allows us to write comments with non-ASCII characters; it is so useful for me 😄

khaledhosny · 2017-02-14T15:12:03Z

Escaping might be OK for the odd accent in an otherwise pure ASCII string, but it is a PITA if you are trying to add name entries for, say, Arabic or Indic, and it is completely against the general notion that feature files are human readable.

brawer · 2017-02-14T17:40:35Z

By the way, fonttools.feaLib already implements the proposal (somewhat accidentally). For example, this feature file:

table name {
    nameid 7 "Joachim M\00fcller-Lanc\00e9";  # Windows (Unicode)
    nameid 7 1 "Joachim M\9fller-Lanc\8e";    # Macintosh (MacRoman, English)
    nameid 8 "Joachim Müller-Lancé";          # Windows (Unicode)
    nameid 8 1 "Joachim Müller-Lancé";        # Macintosh (MacRoman, English)

    nameid 17 "Jovica Veljovi\0107";          # Windows (Unicode)
    nameid 17 1 0 18 "Jovica Veljovi\e6";     # Macintosh (MacRoman, Croatian)
    nameid 18 "Jovica Veljović";              # Windows (Unicode)
    nameid 18 1 0 18 "Jovica Veljović";       # Macintosh (MacRoman, Croatian)
} name;

gets compiled to the following TTX:

  <name>
    <namerecord nameID="7" platformID="3" platEncID="1" langID="0x409">
      Joachim Müller-Lancé
    </namerecord>
    <namerecord nameID="7" platformID="1" platEncID="0" langID="0x0" unicode="True">
      Joachim Müller-Lancé
    </namerecord>
    <namerecord nameID="8" platformID="3" platEncID="1" langID="0x409">
      Joachim Müller-Lancé
    </namerecord>
    <namerecord nameID="8" platformID="1" platEncID="0" langID="0x0" unicode="True">
      Joachim Müller-Lancé
    </namerecord>
    <namerecord nameID="17" platformID="3" platEncID="1" langID="0x409">
      Jovica Veljović
    </namerecord>
    <namerecord nameID="17" platformID="1" platEncID="0" langID="0x12" unicode="True">
      Jovica Veljović
    </namerecord>
    <namerecord nameID="18" platformID="3" platEncID="1" langID="0x409">
      Jovica Veljović
    </namerecord>
    <namerecord nameID="18" platformID="1" platEncID="0" langID="0x12" unicode="True">
      Jovica Veljović
    </namerecord>
  </name>

brawer · 2017-08-09T09:38:21Z

@readroberts, what do you think about this proposal?

readroberts · 2017-08-09T16:11:49Z

I favor supporting UTF-8. The feature file syntax was developed before UTF-8 was widely supported, but that is hardly the case any more: now it is hard to find a text editor that doesn't support it. Given UTF-8 support, the spec certainly needs to document what constitutes white space.

kenlunde · 2017-08-09T16:15:39Z

UTF-8 support is very useful for comments.

About the whitespace topic of Issue #191, I vote for U+0009 and U+0020 as valid "white space" characters in non-comments, with anything else throwing an error. ✨🙈✨🙉✨🙊✨
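
A minimal Python sketch of such a check, for one line of a feature file with its line terminator already removed. It makes simplifying assumptions that are not part of the suggestion above: everything after the first `#` is treated as a comment, and string literals are not special-cased. The function name is hypothetical.

```python
ALLOWED_WHITESPACE = {"\t", " "}  # U+0009 and U+0020


def find_invalid_whitespace(line: str) -> list:
    """Return (column, code point) pairs for disallowed white space
    found outside the comment part of a single line."""
    code = line.split("#", 1)[0]  # assumption: '#' always starts a comment
    return [
        (col, f"U+{ord(ch):04X}")
        for col, ch in enumerate(code)
        if ch.isspace() and ch not in ALLOWED_WHITESPACE
    ]


# A no-break space after "feature" is reported; the ideographic space
# inside the comment is ignored.
print(find_invalid_whitespace("feature\u00a0liga { # note\u3000here"))
```

A compiler following this vote would raise an error whenever the returned list is non-empty.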

typemytype · 2018-09-04T18:57:46Z

Is there already a decision made?

Comments with UTF-8 are, hmm, very modern :)

see adobe-type-tools/afdko#165

readroberts · 2018-09-05T17:48:47Z

I think we are all in favor of specifying that the feature file encoding should be UTF-8, for both comments and non-comment text. I think the only outstanding issue is which white-space characters should be allowed. I'd favor not restricting this, and leaving it to developers to make the choices that are useful to them. What are the reasons to restrict which white-space characters can be used?

LIXiangChen · 2019-07-11T15:24:27Z

Has there been any progress on this matter? I tried the latest version and it still does not support non-ASCII characters. Sad.

khaledhosny · 2020-05-17T00:05:59Z

I gave this a try, and it does not seem hard to allow UTF-8 input for Windows name entries. For Mac entries, makeotf does not convert Unicode to the legacy Mac encodings; it expects the input to already be in the legacy encoding (escaped characters are treated as legacy bytes, not as Unicode).

So either direct UTF-8 input for Mac entries should be disallowed (the simplest solution; Mac name IDs are legacy and should not be needed any more), or we implement conversion from UTF-8 to the legacy Mac encodings (more work, dubious value). WDYT?
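
For a sense of what the conversion option involves, here is a small Python sketch that uses Python's built-in `mac_roman` and `mac_croatian` codecs. This is only an illustration of the idea; makeotf is not implemented this way, and the helper name is hypothetical.

```python
from typing import Optional


def encode_mac_name(text: str, encoding: str = "mac_roman") -> Optional[bytes]:
    """Convert a Unicode name string to a legacy Mac encoding.

    Returns None when the text has a character the encoding lacks,
    which is the case a compiler would have to report or reject.
    """
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


print(encode_mac_name("Joachim Müller-Lancé"))             # fits Mac Roman
print(encode_mac_name("Jovica Veljović"))                  # no ć in Mac Roman
print(encode_mac_name("Jovica Veljović", "mac_croatian"))  # needs Mac Croatian
```

The conversion depends on picking the right legacy encoding per name record, which is part of why it is more work than simply disallowing UTF-8 for Mac entries.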

twardoch · 2020-05-17T02:20:02Z

In my view, Unicode strings should be allowed as Unicode strings. E.g. strings that ultimately are UTF-16BE should be expressible as UTF-8 in FEA. But strings that are not Unicode in the target should be expressible as escaped byte sequences.
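
The Unicode half of this is a plain transcoding step. A minimal Python sketch, assuming the compiler has already isolated the literal's UTF-8 bytes from the feature file (the function name is hypothetical):

```python
def to_windows_name_bytes(utf8_literal: bytes) -> bytes:
    """Transcode a UTF-8 string literal to UTF-16BE, the form in which
    Unicode name strings are ultimately stored."""
    return utf8_literal.decode("utf-8").encode("utf_16_be")


print(to_windows_name_bytes("Müller".encode("utf-8")))
```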

khaledhosny · 2020-05-17T02:26:41Z

That would be the 1st option.

To correct my previous comment, AFDKO seems to have conversion tables from Unicode to several legacy mac encodings, but these are used for cmap and not name table.

khaledhosny · 2020-05-17T03:02:44Z

#1133

brawer mentioned this issue Feb 14, 2017

fea writer should escape non-ASCII characters in nameid strings fonttools/fonttools#780

Closed

miguelsousa assigned readroberts Feb 14, 2017

miguelsousa assigned readroberts and unassigned readroberts Jul 15, 2017

jenskutilek mentioned this issue Aug 9, 2017

[spec] Valid whitespace characters in feature files? #191

Open

miguelsousa changed the title ~~Specify Feature File encoding as UTF-8~~ [spec] Specify Feature File encoding as UTF-8 Jul 3, 2018

miguelsousa added the fea spec label Jul 17, 2018

typemytype added a commit to unified-font-object/ufoLib that referenced this issue Sep 4, 2018

read fea files with encoding utf-8

cffbfee

see adobe-type-tools/afdko#165

typemytype mentioned this issue Sep 4, 2018

read fea files with encoding utf-8 unified-font-object/ufoLib#169

Merged

miguelsousa mentioned this issue Dec 15, 2018

Is there any plan to make the features file support non-ASCII chars? #696

Closed

khaledhosny mentioned this issue Jul 16, 2019

[makeotf] UnicodeDecodeError: 'ascii' codec can't decode byte #847

Closed

cjchapman unassigned readroberts Oct 2, 2019

khaledhosny mentioned this issue May 17, 2020

Allow UTF-8 input for name entries #1133

Merged

josh-hadley closed this as completed in #1133 May 20, 2020
