SWFv5 CJK text encoding samples (Shift-JIS, EUC-KR, Big5, GB2312) #8390

Open
n0samu opened this issue Oct 26, 2022 · 9 comments
Labels
bug (Something isn't working), text (Issues relating to text rendering/input)

Comments

@n0samu
Member

n0samu commented Oct 26, 2022

SWFv5 text is encoded based on the system locale instead of using UTF-8. Ruffle always decodes SWFv5 text as Windows-1252, but this is not accurate behavior for non-English SWFs. When the SWF embeds a font that provides the necessary glyph for a DefineText field, there is still no problem because the glyphs are matched to their respective characters, since both were decoded the same (incorrect) way. But if Ruffle were to support selecting and copying text, this would expose the fact that Ruffle is decoding text incorrectly.

More importantly, when a v5 SWF has a DefineText field without a corresponding font that provides the needed glyphs, Ruffle renders the text using its fake device font, and the garbled text is exposed. Similarly, non-English text in DefineEditText fields displays as garbled mojibake.
image
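The garbling described above can be reproduced outside of Ruffle. The following is a minimal Python sketch, not Ruffle's code: it encodes a few illustrative strings (assumptions, not taken from the attached SWFs) in each legacy encoding and then reads the resulting bytes as Windows-1252, which is what happens when a SWFv5 string is decoded with the wrong codepage. The helper name `decode_as_windows_1252` is hypothetical.

```python
# Illustrative sample strings; these are assumptions, not text from the attached SWFs.
SAMPLES = {
    "shift_jis": "こんにちは",
    "euc_kr": "안녕하세요",
    "big5": "你好",
    "gb2312": "你好",
}


def decode_as_windows_1252(text: str, source_encoding: str) -> str:
    """Encode text in a legacy CJK encoding, then misread the bytes as Windows-1252."""
    raw = text.encode(source_encoding)
    # errors="replace" is used because Python's cp1252 codec leaves a few
    # byte values (for example 0x81) undefined and would otherwise raise.
    return raw.decode("cp1252", errors="replace")


for encoding, text in SAMPLES.items():
    print(f"{encoding:>9}: {text} -> {decode_as_windows_1252(text, encoding)}")
```

Each line of output pairs the intended text with the mojibake that a Windows-1252 reading produces from the same bytes.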

I am well aware that this situation cannot be improved anytime soon. Even if Ruffle did decode the text correctly, its fake device font does not include the needed characters, so nothing would be displayed at all.

Let's also make a note of Adobe Flash Player's behavior. Normally Flash Player decodes all SWFv5 text using the system codepage and displays it accordingly, so on an English system the result is much the same as in Ruffle. The exception is Shift-JIS; Flash Player is able to detect Shift-JIS text and display it properly, even on an English locale.

Here are sample SWFv5 files with text encoded as Shift-JIS, EUC-KR, Big5, or GB2312:
SWFv5-CJK-samples.zip
I've also exported text from some of the files to give you something to check against.

Viewing and exporting the text
Unfortunately JPEXS Flash Decompiler does not correctly decode SWFv5 text, so exporting it the usual way makes the original text unrecoverable. But I did find a way to export text from v5 DefineEditText fields:

  1. In the normal view, find the text field that you want to export.
  2. Switch to the "Hex dump" view and find the same DefineEditText tag.
  3. Expand it and scroll all the way down to the initialText (string) field.
  4. Right-click it and click "save to file."

Once exported, you can open each file in Notepad++ and select the correct encoding from the menu. Or you can just open the file in your web browser, since browsers autodetect encoding very accurately.
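As a scripted alternative to picking the encoding by hand, the exported bytes can be decoded under each candidate encoding and the results compared by eye. This is only a sketch: the function name `try_decodings` is hypothetical, and the sample bytes stand in for the contents of a file saved from JPEXS.

```python
CANDIDATE_ENCODINGS = ("shift_jis", "euc_kr", "big5", "gb2312")


def try_decodings(data: bytes) -> dict[str, str]:
    """Return the text produced by every candidate encoding that decodes cleanly."""
    results = {}
    for encoding in CANDIDATE_ENCODINGS:
        try:
            results[encoding] = data.decode(encoding)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding; skip it.
            continue
    return results


# Stand-in for the exported initialText bytes; with a real export you would
# read the saved file instead, e.g. pathlib.Path(...).read_bytes().
exported = "こんにちは".encode("shift_jis")

for encoding, text in try_decodings(exported).items():
    print(f"{encoding}: {text}")
```

More than one encoding may decode without error, so a clean decode is not proof of the right choice; the readable result is the one to keep.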

n0samu added the "text" label (Issues relating to text rendering/input) on Oct 26, 2022
@n0samu
Member Author

n0samu commented Dec 6, 2022

Here's another Shift-JIS sample, provided by olux997 on Discord: handy.zip

@n0samu
Member Author

n0samu commented Feb 24, 2023

JPEXS has added a feature to set the interpreted charset of SWFv5 (and lower) files. Just right-click the file on the left sidebar and choose "Change charset".

@n0samu
Member Author

n0samu commented Feb 24, 2023

Another Shift-JIS sample from #9698: https://github.com/ruffle-rs/ruffle/files/10820434/Para2.zip

@mathewhodson
Contributor

> Normally Flash Player decodes all SWFv5 text using the system codepage and displays it accordingly, so on an English system the result is much the same as in Ruffle. The exception is Shift-JIS; Flash Player is able to detect Shift-JIS text and display it properly, even on an English locale.

Maybe Ruffle could use https://crates.io/crates/chardetng to detect the encoding. Unlike relying on the system codepage, this could also work for encodings other than Shift-JIS. I think that would be an improvement without breaking existing SWF files.

chardetng is used by Firefox to detect the encoding of old HTML pages that assumed a system codepage without specifying it in the document. That seems similar to the problem Ruffle needs to solve.

@Toad06
Member

Toad06 commented Apr 21, 2023

It seems UTF-8 should also work in SWFv5 (sample):

trace("âäéèÔ");
trace("-----");
trace("早上好");
trace("-----");
trace("สวัสดีตอนเช้า");

In Flash Player, these strings are displayed as you'd expect, but in Ruffle they're all broken:

âäéèÔ
-----
早上好
-----
สวัสดีตอนเช้า

Mike mentioned a TODO in #2636:

> Add option for specifying encoding for SWFv5 files. (Currently defaults to WINDOWS-1252).
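To illustrate why the UTF-8 strings break, here is a small Python sketch. It first shows what the UTF-8 bytes of the first `trace` string look like when read as Windows-1252, then tries strict UTF-8 first and falls back to a configurable legacy encoding. The function `decode_swf5_string` and its `fallback` parameter are hypothetical, and this is not confirmed to be how Flash Player decides.

```python
def decode_swf5_string(data: bytes, fallback: str = "cp1252") -> str:
    """Prefer strict UTF-8; otherwise decode with the given legacy encoding."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so treat the bytes as legacy-codepage text.
        return data.decode(fallback, errors="replace")


utf8_bytes = "âäéèÔ".encode("utf-8")

# Reading the UTF-8 bytes as Windows-1252 yields mojibake.
print(utf8_bytes.decode("cp1252"))

# Trying UTF-8 first restores the intended string.
print(decode_swf5_string(utf8_bytes))

# Bytes that are not valid UTF-8 go through the fallback encoding.
print(decode_swf5_string("你好".encode("gb2312"), fallback="gbk"))
```

One caveat with this approach: plain ASCII is always valid UTF-8, and a short legacy-encoded string can occasionally be valid UTF-8 by coincidence, so the check is a heuristic rather than a guarantee.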

@n0samu
Member Author

n0samu commented Apr 21, 2023

Huh, that's surprising. I can confirm it works even in the Flash 5 authoring software. The SWF outputs the text correctly in Flash Player 32 (though not in the player that comes with Flash 5, as seen in the Output window).
image

On the other hand, I couldn't find any way to make a text field in Flash 5 that displays UTF-8 characters correctly in Flash Player 32, even if the data in the text field is correct (it's still interpreted with the system codepage when played).

@sombraguerrero
Contributor

Circling this back to Dora DORA Janken, I recently did some experimentation where I modified master so that the encoding_for_version function just unilaterally returned UTF-8. I also changed the source fonts and edited the tags in the SWF to use a Japanese-supporting font, but to no avail. The attached version of the SWF is unaltered.
djk_0329.zip

@chenxuuu

Here is another error example: https://oldswf.com/game/71121
The text is rendered incorrectly and is invisible, but it can still be copied out.
It should be decoded as GB2312 (a subset of GBK), not Windows-1252.

ÄãÐÑÀ²£¡Äã½ñÔç´Óº£ÉÏƯµ½Õâ¸ö
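Text copied out in this state can usually be repaired by reversing the wrong decode: turn the characters back into Windows-1252 bytes, then decode those bytes as GBK. The sketch below does this for the first eight characters of the string above (shortened only to keep the example small); the function name `recover_mojibake` is hypothetical.

```python
def recover_mojibake(garbled: str, intended_encoding: str) -> str:
    """Undo a Windows-1252 misread by re-encoding and decoding as intended."""
    # Step 1: recover the original bytes that were misread as Windows-1252.
    raw = garbled.encode("cp1252")
    # Step 2: decode those bytes with the encoding the author actually used.
    return raw.decode(intended_encoding)


print(recover_mojibake("ÄãÐÑÀ²£¡", "gbk"))
```

This only works when every garbled character maps back to a single Windows-1252 byte; if the text was altered after copying, the re-encode step can fail.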

Image

@chenxuuu

I tried changing every Windows-1252 reference in the project to GBK, and it works fine for my games.
Maybe this needs a configuration parameter that the user can set?
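A user-configurable encoding could look roughly like the following Python sketch. It is not Ruffle's implementation: the name `encoding_for_version` is borrowed from the function mentioned earlier in this thread, but the `legacy_encoding` parameter and the `decode_swf_text` helper are hypothetical.

```python
def encoding_for_version(swf_version: int, legacy_encoding: str = "cp1252") -> str:
    """Pick a text encoding from the SWF version, with a user override for old files."""
    # SWF 6 and later store strings as UTF-8; older files use a legacy codepage.
    return "utf-8" if swf_version >= 6 else legacy_encoding


def decode_swf_text(data: bytes, swf_version: int, legacy_encoding: str = "cp1252") -> str:
    """Decode a SWF string, replacing any bytes the chosen encoding cannot map."""
    encoding = encoding_for_version(swf_version, legacy_encoding)
    return data.decode(encoding, errors="replace")


gbk_bytes = "你好".encode("gbk")
print(decode_swf_text(gbk_bytes, 5))                         # default: Windows-1252
print(decode_swf_text(gbk_bytes, 5, legacy_encoding="gbk"))  # user-selected: GBK
```

Keeping Windows-1252 as the default would preserve current behavior for existing files, while letting users of CJK content opt in to the right codepage.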
