SWFv5 CJK text encoding samples (Shift-JIS, EUC-KR, Big5, GB2312) #8390
Comments
Here's another Shift-JIS sample, provided by olux997 on Discord: handy.zip |
JPEXS has added a feature to set the interpreted charset of SWFv5 (and lower) files. Just right-click the file on the left sidebar and choose "Change charset". |
Another Shift-JIS sample from #9698: https://github.com/ruffle-rs/ruffle/files/10820434/Para2.zip |
Maybe Ruffle could use https://crates.io/crates/chardetng to detect the encoding. This could work even in non-Shift JIS scenarios instead of using the system codepage. I think that would be an improvement without breaking existing SWF files. chardetng is used by Firefox to detect the encoding of old HTML pages that assumed a system codepage without specifying it in the document. That seems similar to the problem Ruffle needs to solve. |
It seems UTF-8 should also work in SWFv5 (sample):
In Flash Player, these strings are displayed as you'd expect, but in Ruffle they're all broken:
Mike mentioned a TODO in #2636:
|
Circling this back to Dora DORA Janken, I recently did some experimentation where I modified master so that the encoding_for_version function just unilaterally returned UTF-8. I also changed the source fonts and edited the tags in the SWF to use a Japanese-supporting font, but to no avail. The attached version of the SWF is unaltered. |
Here is another error example: https://oldswf.com/game/71121
|
I tried changing every Windows-1252 reference in the project to GBK, and it works fine for my games. |
SWFv5 text is encoded based on the system locale instead of using UTF-8. Ruffle always decodes SWFv5 text as Windows-1252, but this is not accurate behavior for non-English SWFs. When the SWF embeds a font that provides the necessary glyph for a DefineText field, there is still no problem because the glyphs are matched to their respective characters, since both were decoded the same (incorrect) way. But if Ruffle were to support selecting and copying text, this would expose the fact that Ruffle is decoding text incorrectly.
More importantly, when a v5 SWF has a DefineText field without a corresponding font that provides the needed glyphs, Ruffle renders the text using its fake device font, and the garbled text is exposed. Similarly, non-English text in DefineEditText fields displays as garbled mojibake.
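To make the mojibake concrete, here is a minimal Python sketch (not Ruffle code) that encodes an illustrative Japanese string as Shift-JIS and then decodes the same bytes as Windows-1252. The sample string and variable names are assumptions made for the example. Python's `cp1252` codec leaves a few byte values undefined, so `errors="replace"` is used to keep the sketch from raising on other inputs.

```python
# Illustrative string (an assumption for this example); any Japanese text would do.
original = "こんにちは"

# Bytes as a Shift-JIS system would have stored them in an SWFv5 file.
raw = original.encode("shift_jis")

# Decoding those bytes as Windows-1252 turns every byte into one Western character.
garbled = raw.decode("cp1252", errors="replace")

# Decoding with the encoding the author actually used recovers the text.
recovered = raw.decode("shift_jis")

print(garbled)
print(recovered)
```

Each two-byte Shift-JIS character becomes two unrelated Windows-1252 characters, which is the garbling seen in device-font text and DefineEditText fields.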
I am well aware that this situation cannot be improved anytime soon. Even if Ruffle did decode the text correctly, its fake device font does not include the needed characters, so nothing would be displayed at all.
Let's also make a note of Adobe Flash Player's behavior. Normally Flash Player decodes all SWFv5 text using the system codepage and displays it accordingly, so on an English system the result is much the same as in Ruffle. The exception is Shift-JIS; Flash Player is able to detect Shift-JIS text and display it properly, even on an English locale.
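As a rough illustration of a "try Shift-JIS, otherwise use the codepage" fallback, here is a Python sketch. This is an assumption about one possible approach, not Flash Player's actual detection logic, which is not documented here. The function name `decode_swf5_text` and the `fallback` parameter are hypothetical.

```python
def decode_swf5_text(data: bytes, fallback: str = "cp1252") -> str:
    """Hypothetical helper: prefer Shift-JIS, else fall back to a codepage."""
    try:
        # Strict decoding raises if the bytes are not valid Shift-JIS.
        return data.decode("shift_jis")
    except UnicodeDecodeError:
        # Stand-in for the system codepage; undefined bytes become U+FFFD.
        return data.decode(fallback, errors="replace")


print(decode_swf5_text("こんにちは".encode("shift_jis")))
print(decode_swf5_text("café".encode("cp1252")))
```

A strict decode is a weak test: some Western byte sequences also happen to be valid Shift-JIS, so a real detector would need a statistical check rather than this simple validity test.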
Here are sample SWFv5 files with text encoded as Shift-JIS, EUC-KR, Big5, or GB2312:
SWFv5-CJK-samples.zip
I've also exported text from some of the files to give you something to check against.
Viewing and exporting the text
Unfortunately JPEXS Flash Decompiler does not correctly decode SWFv5 text, so exporting it the usual way makes the original text unrecoverable. But I did find a way to export text from v5 DefineEditText fields:
Once exported, you can open each file in Notepad++ and select the correct encoding from the menu. Or you can just open the file in your web browser, since browsers autodetect encoding very accurately.
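If you would rather check an exported file from a script, a small Python sketch like the following tries each of the four encodings and keeps the ones that decode without errors. The helper name `candidate_decodings` is hypothetical, and this is not how browsers detect encodings. More than one candidate may succeed, so you still need to read the results and pick the one that makes sense.

```python
CANDIDATES = ("shift_jis", "euc_kr", "big5", "gb2312")


def candidate_decodings(data: bytes, encodings=CANDIDATES) -> dict:
    """Return {encoding name: decoded text} for every encoding that decodes cleanly."""
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding; skip it.
            continue
    return results


# Stand-in for the bytes of an exported text file (illustrative sample).
sample = "한국어".encode("euc_kr")
for name, text in candidate_decodings(sample).items():
    print(name, text)
```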