Encoding
For a computer to store text, it must know how to encode it, i.e. how to represent its symbols as bits and bytes. UTF-8 is a great encoding that can represent a vast number of symbols from all kinds of languages. Other encodings, like Windows code pages, are far more limited and are only suitable for a small family of languages.
USDB accepts and returns UTF-8, but unfortunately it stores text internally in the ANSI encoding (Windows code page 1252). This is sufficient for Western European languages, but causes problems with songs in other languages. The following sections describe how to avoid these problems as far as possible.
Txts from USDB are UTF-8-encoded, but can still only contain symbols from code page 1252. Symbols from other code pages are replaced with the code page 1252 symbols that share the same byte value. E.g. ł will be displayed as ³, because both are stored as byte B3. To restore them you have to convert the file to ANSI encoding, then reopen the file with the encoding set to the correct code page for the song's language. E.g. for Polish this would be code page 1250. You can then convert the txt to UTF-8 again, so UltraStar Deluxe displays all symbols correctly.
Note: The USDB Syncer does this automatically!
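The manual steps above can be sketched in Python. This is only a minimal illustration, not the USDB Syncer's actual implementation; the function name `restore_symbols` and its parameters are hypothetical, and the caller is assumed to know the right code page for the song's language.

```python
def restore_symbols(text: str, codepage: str = "cp1250") -> str:
    """Undo the code page 1252 substitution for text read from a USDB txt.

    `codepage` is the code page of the song's language,
    e.g. "cp1250" for Polish.
    """
    # Step 1: "convert to ANSI", i.e. recover the bytes as USDB stored them.
    raw = text.encode("cp1252")
    # Step 2: reinterpret those bytes with the correct code page.
    return raw.decode(codepage)


print(restore_symbols("Mi³oœæ"))  # Miłość
```

The result is an ordinary Python string, so writing it back with `open(path, "w", encoding="utf-8")` would complete the last step. If the text already contains symbols from outside code page 1252, the first step raises `UnicodeEncodeError`, which is a useful hint that the file does not need this treatment.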
This restoration does not work for songs which were uploaded to USDB as UTF-8. In these songs, symbols have been replaced with "?" and will have to be fixed manually. Also, some creators may have avoided symbols from other code pages altogether, e.g. replacing ł with l. These replacements cannot be reverted automatically, either.
If you have a txt file with symbols from outside code page 1252, you should not upload it as UTF-8, which would irreversibly replace them with "?". Instead, you should upload the file encoded with the correct local encoding, e.g. code page 1253 for Greek. The wrong symbols will show up on USDB, but these replacements are reversible after downloading.
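The difference between the two upload paths can be illustrated with a short Python sketch. Both helper functions are hypothetical, and `simulate_utf8_upload` is only a model of the behaviour described above, not USDB's real code.

```python
def simulate_utf8_upload(text: str) -> str:
    """Model of the lossy path: symbols outside code page 1252 become '?'."""
    return text.encode("cp1252", errors="replace").decode("cp1252")


def encode_for_upload(text: str, codepage: str) -> bytes:
    """Bytes to save as the txt file before uploading (local encoding)."""
    return text.encode(codepage)


greek = "Καλημέρα"

# Uploading as UTF-8: every Greek letter is lost for good.
print(simulate_utf8_upload(greek))  # ????????

# Uploading as code page 1253: the text looks wrong on USDB ...
uploaded = encode_for_upload(greek, "cp1253")
shown_on_usdb = uploaded.decode("cp1252")

# ... but the original can be recovered after downloading.
restored = shown_on_usdb.encode("cp1252").decode("cp1253")
print(restored == greek)  # True
```

To produce such a file, the bytes returned by `encode_for_upload` would be written in binary mode, or the txt saved from an editor with the local code page selected.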
A small number of symbols have no equivalent in code page 1252 and will be lost. E.g. if you compare code page 1252 and code page 1250, you can see that the bytes 8D, 8F and 9D are unassigned in the former, but represent Ť, Ź and ť in the latter. Ideally, you should replace these unrepresentable symbols with close approximations like T, Z and t before the upload.
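Finding and replacing these symbols by hand is tedious, so here is one possible way to automate it in Python. The function names are hypothetical, and stripping diacritics via Unicode decomposition is just one assumed way of choosing an approximation; it relies on Python's `cp1252` codec leaving the unassigned bytes undefined.

```python
import unicodedata


def survives_cp1252(ch: str, codepage: str) -> bool:
    """True if the byte for `ch` in `codepage` is also assigned in cp1252."""
    raw = ch.encode(codepage)
    try:
        raw.decode("cp1252")
    except UnicodeDecodeError:
        return False
    return True


def approximate(ch: str) -> str:
    """Strip combining marks, e.g. turning Ť into T."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def make_upload_safe(text: str, codepage: str = "cp1250") -> str:
    """Replace only the symbols that would get lost on USDB."""
    return "".join(
        ch if survives_cp1252(ch, codepage) else approximate(ch)
        for ch in text
    )


print(make_upload_safe("ŤŹť"))  # TZt
```

Symbols that survive the round trip, such as ł, are left untouched. A symbol that is not part of the chosen code page at all makes `ch.encode` raise `UnicodeEncodeError`, so the right code page has to be picked first.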