Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Japanese charsets #2129

Closed
dhow opened this issue May 21, 2024 · 9 comments · Fixed by #2135
Closed

Support Japanese charsets #2129

dhow opened this issue May 21, 2024 · 9 comments · Fixed by #2135
Labels
Milestone

Comments

@dhow
Copy link

dhow commented May 21, 2024

Summary

I noticed that when AUTO_RESOLVE_TITLES is true I get th[1] error when I create short links from Japanese pages that use Japanese specific encoding such as shift_jis. I noticed that this doesn't happen on UTF-8 pages. It would be great if we can support Japanese encoding, in specific, the two Japanese traditional encodings[2].

Version: 4.1.0 / docker image: shlinkio/shlink@sha256:6d0fb5fb169bf76940d2c7c628aac78ea2dfbbd37e897c456c1ff198a6185f64
DB: MariaDB 11.3.2-MariaDB-1:11.3.2 / docker image: sha256:f0a6faee9d0e55492f238d1d11ff13d77616ea12d8c38bedf090da2ee05532be

[1] Error

$ shlink short-url:create "https://www.sony.jp/ichigan/products/SEL85F18/"

In ExceptionConverter.php line 90:
  An exception occurred while executing a query: SQLSTATE[22007]: Invalid datetime format: 1366 Incorrect string value: '\x83f\x83W\x83^...' f or column `shlink`.`short_urls`.`title` at row 1

In Exception.php line 28:
  SQLSTATE[22007]: Invalid datetime format: 1366 Incorrect string value: '\x83f\x83W\x83^...' for column `shlink`.`short_urls`.`title` at row 1

In Statement.php line 55:
  SQLSTATE[22007]: Invalid datetime format: 1366 Incorrect string value: '\x83f\x83W\x83^...' for column `shlink`.`short_urls`.`title` at row 1

[2] Two most widely used Japanese traditional encoding

  • EUC-JP
  • Shift_JIS

Use case

Many users who create links from pages using Japanese encoding will benefit from this.

@dhow
Copy link
Author

dhow commented May 21, 2024

P.S.: I was unable to trace where the logic happens when auto_resolve_titles is true, I could try to implement this if someone could show me the file / line of the fetching title code. :)

@acelaya
Copy link
Member

acelaya commented May 21, 2024

Yep, I can reproduce this, but this is more a bug than a feature request, as it should work.

The error is pretty strange. It's like it's trying to parse the vale as a date somehow? 😅

I'll investigate.

@acelaya acelaya added bug and removed feature labels May 21, 2024
@dhow
Copy link
Author

dhow commented May 21, 2024

Hello @acelaya -- you might already find the solution but I have feeling it's related to this answer.

@acelaya
Copy link
Member

acelaya commented May 22, 2024

What I can confirm is that, if you manually pass the title with the same value that's set in https://www.sony.jp/ichigan/products/SEL85F18/ title, then it works, so the problem is happening when "reading" the page.

image

@acelaya
Copy link
Member

acelaya commented May 22, 2024

Yep, the title needs to be re-encoded. If I apply mb_convert_encoding($title, 'utf8', 'shift_jis') it works, so I guess the fix would require reading the <meta charset="??" /> from the page and using it.

@acelaya acelaya added this to the 4.1.1 milestone May 22, 2024
@acelaya acelaya moved this to Todo in Shlink May 22, 2024
@dhow
Copy link
Author

dhow commented May 22, 2024

Yes @acelaya that's the same way I'd approach this problem -- reading charset in header and use that to feed mb_ functions is the way to go. If you manage to fix it, please share your commits, I'd like to learn. Thanks!!

@acelaya
Copy link
Member

acelaya commented May 22, 2024

#2135 should do the trick.

@acelaya acelaya moved this from Todo to In review in Shlink May 22, 2024
@github-project-automation github-project-automation bot moved this from In review to Done in Shlink May 22, 2024
@acelaya
Copy link
Member

acelaya commented May 23, 2024

I have just released Shlink 4.1.1 which includes this fix.

@dhow
Copy link
Author

dhow commented May 24, 2024

I just confirmed this in 4.1.1 docker -- working perfectly! Thanks so much @acelaya !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants