Garbled textual metadata #365
Comments
Could you share a file that exhibits the problem? |
Thank you for the test case. Was the ID3 tag in the file also generated using LAME? The problem seems to be one of text encoding. ID3v1 tags nominally use ISO 8859-1, although sometimes the machine's local encoding is used instead (such as Windows-1251, which appears to be the correct encoding for this particular ID3v1 tag). ID3v2 text frames declare their own encoding: ISO 8859-1 or UTF-16 in ID3v2.3, with UTF-8 added in ID3v2.4. It seems the text in this file's ID3v2 tag is not stored in its declared encoding but rather in a different character set, most likely Windows-1251, the same as the ID3v1 tag. Take the "word" Þðèé (should be Юрий) from the
The octet values interpreted as ISO 8859-1 give Þðèé, while interpreted as Windows-1251 they give Юрий. So it seems that the text in both the ID3v1 and ID3v2 tags in this file is incorrectly encoded. |
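To make the reinterpretation concrete, here is a minimal Python sketch (not part of SFBAudioEngine or TagLib). It assumes the garbled strings came from reading Windows-1251 octets as ISO 8859-1; the helper name `reinterpret` and its parameters are hypothetical, chosen only for illustration.

```python
def reinterpret(garbled: str, assumed: str = "latin-1", actual: str = "cp1251") -> str:
    """Undo a wrong decode: recover the stored octets, then decode them again."""
    # Encoding with the wrongly assumed charset gives back the raw bytes in the tag.
    octets = garbled.encode(assumed)
    # Decoding those bytes with the charset the text was really written in.
    return octets.decode(actual)


octets = "Þðèé".encode("latin-1")
print(octets.hex(" "))      # de f0 e8 e9
print(reinterpret("Þðèé"))  # Юрий
```

The same round trip applied to the two strings from the original report gives "Сто часов" and "Юрий Лоза", which supports the Windows-1251 hypothesis for this file.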
That's strange, because Apple's native player recognizes the text without trouble, and so does Google Translate. I'll try to find similar files and let you know the result. |
That is interesting. I will take a closer look at the file's tag to make sure it is being handled correctly. I've heard of charset detection for ID3v1 tags but for ID3v2 I don't think there should be any guessing involved. |
Apple Music says it's version 3. I also ran it through several libraries and they all say that it is version 3. |
It is an ID3v2.3 tag. The
It's possible that Music runs text reported as ISO 8859-1 through a character detection library. Based on the ID3v2 tag itself, TagLib (the metadata library used by SFBAudioEngine) is interpreting the data correctly. It shouldn't be terribly hard to wrap uchardet to add the option for character set detection for ID3v1 or ID3v2 tags using ISO 8859-1, but I haven't investigated what it would entail. |
I have the following question. |
Algorithms for character set detection are something I know little to nothing about. Perhaps an educated guess is made based on a frequency analysis of octets in the input? For the file that you shared it should be possible to feed the C strings from the metadata to uchardet or a similar library and see what it comes back with, and then use iconv to convert to UTF-8. |
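The detect-then-convert flow described above can be sketched in Python as follows. This is a toy stand-in, not uchardet's algorithm and not its API: the candidate list, the scoring rule (reward lowercase Cyrillic letters, penalize capitals in the middle of a word), and the names `plausibility`, `guess_charset`, and `to_utf8` are all assumptions made for illustration. The final decode and re-encode plays the role that iconv would play in C.

```python
CANDIDATES = ("cp1251", "koi8_r", "latin-1")  # hypothetical shortlist


def plausibility(text: str) -> int:
    """Crude score for how much a decoded string looks like Cyrillic prose."""
    score = 0
    prev_is_letter = False
    for ch in text:
        is_cyrillic = "\u0400" <= ch <= "\u04ff"
        if is_cyrillic and ch.islower():
            score += 1  # lowercase Cyrillic is what real text mostly contains
        elif is_cyrillic and ch.isupper() and prev_is_letter:
            score -= 1  # a capital in the middle of a word is unusual
        prev_is_letter = ch.isalpha()
    return score


def guess_charset(octets: bytes):
    """Return the best-scoring candidate, or None if nothing looks Cyrillic."""
    best_name, best_score = None, 0
    for name in CANDIDATES:
        try:
            text = octets.decode(name)
        except UnicodeDecodeError:
            continue  # these octets are not valid in this charset
        score = plausibility(text)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


def to_utf8(octets: bytes, fallback: str = "latin-1") -> bytes:
    """Convert tag octets to UTF-8 using the guessed charset (the iconv step)."""
    charset = guess_charset(octets) or fallback
    return octets.decode(charset).encode("utf-8")


raw = bytes.fromhex("de f0 e8 e9 20 cb ee e7 e0")  # octets behind "Þðèé Ëîçà"
print(guess_charset(raw))           # cp1251
print(to_utf8(raw).decode("utf-8"))  # Юрий Лоза
```

A real detector uses far richer statistics than this, and short strings such as track titles remain hard to classify, so any such guess is best treated as an opt-in fallback rather than the default behavior.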
Hello,
My file's encoder is reported as LAME3.93.
How can I get the metadata through your library without the text coming out garbled?
I get the following data: "Ñòî ÷àñîâ" and "Þðèé Ëîçà"