Cannot upload Chinese books with long title #2309

Closed
kfstorm opened this issue Feb 9, 2022 · 2 comments


kfstorm commented Feb 9, 2022

Describe the bug/problem

I see the errors below at the top of the page when I try to upload a book with a long Chinese title.

Failed to Move Cover File /media/[英]乔安·弗莱彻/埃及四千年(破解四千年王朝兴衰秘密的里程碑式巨作,BBC古埃及历史纪录片原著,《出版人周刊》《科克斯书评》《图书馆杂志》2016年度最佳图书,《华盛顿邮报》《卫报》等30家知名媒体联名推荐.)/cover.jpg: [Errno 36] File name too long: '/media/[英]乔安·弗莱彻/埃及四千年(破解四千年王朝兴衰秘密的里程碑式巨作,BBC古埃及历史纪录片原著,《出版人周刊》《科克斯书评》《图书馆杂志》2016年度最佳图书,《华盛顿邮报》《卫报》等30家知名媒体联名推荐.)/cover.jpg'

Rename title from: '/tmp/calibre_web/f55b36cf44482f28ec2c90c7b809715c' to '/media/[英]乔安·弗莱彻/埃及四千年(破解四千年王朝兴衰秘密的里程碑式巨作,BBC古埃及历史纪录片原著,《出版人周刊》《科克斯书评》《图书馆杂志》2016年度最佳图书,《华盛顿邮报》《卫报》等30家知名媒体联名推荐.) (12)' failed with error: [Errno 36] File name too long: '/media/[英]乔安·弗莱彻/埃及四千年(破解四千年王朝兴衰秘密的里程碑式巨作,BBC古埃及历史纪录片原著,《出版人周刊》《科克斯书评》《图书馆杂志》2016年度最佳图书,《华盛顿邮报》《卫报》等30家知名媒体联名推荐.) (12)'

If a book's title consists mostly of Chinese characters and is really long, the upload fails because the resulting name exceeds the file system's filename length limit.

For the ext4 file system, the filename limit is 255 bytes according to https://en.wikipedia.org/wiki/Comparison_of_file_systems#Limits. Most Linux users encode filenames in UTF-8, where a Chinese character typically takes 3 bytes. So the maximum title length allowed for a Chinese book is 255/3 = 85 characters (on ext4 and other file systems with the same limit).
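As a quick check of the arithmetic above, here is a minimal Python sketch that compares character count with UTF-8 byte count. The helper name `utf8_len` and the constant `NAME_MAX_BYTES` are made up for illustration; they are not part of calibre-web.

```python
# Hypothetical constant: the 255-byte per-filename limit cited for ext4.
NAME_MAX_BYTES = 255


def utf8_len(name: str) -> int:
    """Return the number of bytes `name` occupies when encoded as UTF-8."""
    return len(name.encode("utf-8"))


title = "埃及四千年"
print(len(title), utf8_len(title))  # character count vs. byte count
print(NAME_MAX_BYTES // 3)          # how many 3-byte characters fit in the limit
```

The file system counts bytes, so a limit expressed in characters only works if it assumes the widest encoding of a character.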

I've found that 1e04b51 limits the filename to 96 characters, but that's not short enough. In the worst case a character is encoded as 4 bytes (e.g. every character in UTF-32, or some characters in UTF-8), which means the chars parameter of get_valid_filename should be set no larger than 255/4 = 63.
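An alternative to lowering the character count is to truncate by encoded bytes instead. The following is only a sketch of that idea, assuming UTF-8 filenames; `truncate_utf8` is a hypothetical helper, not the actual get_valid_filename from calibre-web or the function calibre uses.

```python
def truncate_utf8(name: str, max_bytes: int = 255) -> str:
    """Shorten `name` so that its UTF-8 encoding fits in `max_bytes`.

    Assumption: the file system limit applies to UTF-8 encoded bytes.
    """
    encoded = name.encode("utf-8")
    if len(encoded) <= max_bytes:
        return name
    # Cutting the byte string may split a multi-byte character;
    # errors="ignore" drops the incomplete trailing sequence.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


print(len(truncate_utf8("四" * 120)))  # number of characters kept
```

A real implementation would also need to reserve room for anything appended afterwards, such as the " (12)" suffix visible in the error message above, by passing a smaller `max_bytes`.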

Furthermore, it may be better to make the limit configurable, to support other file systems with stricter limits.
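Besides a configuration option, the limit could possibly be queried from the file system at runtime. A minimal sketch, assuming a POSIX system where Python's `os.pathconf` is available; the wrapper `name_max` and its `default` fallback are hypothetical names chosen for this example.

```python
import os


def name_max(path: str, default: int = 255) -> int:
    """Return the maximum filename length in bytes for the file system at `path`.

    Falls back to `default` (an assumed value) when the limit cannot be
    determined, e.g. on platforms without os.pathconf.
    """
    try:
        limit = os.pathconf(path, "PC_NAME_MAX")
    except (AttributeError, OSError, ValueError):
        return default
    # pathconf may report -1 when the limit is indeterminate.
    return limit if limit > 0 else default


print(name_max("."))
```

Whether the reported value is reliable for every mounted file system (network shares, encrypted overlays) is not something this sketch verifies, so a user-facing override would still be useful.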

BTW, I can import the book with calibredb on ext4 with no problem. Not sure how calibredb handles this case.

To Reproduce
Steps to reproduce the behavior:

  1. Make sure the file system is ext4. (I use mac and I don't have a Linux machine. But ext4 is possible with Docker Desktop on macOS.)
  2. Edit a random epub file with ebook-edit and update the title field of content.opf to a title with 120+ Chinese characters, then save it.
  3. Upload the modified file to calibre-web; the error messages should then appear.
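The underlying file system behavior can also be checked without calibre-web. This is a rough sketch for POSIX systems only; `can_create_dir` is a hypothetical helper, and the result depends on the file system backing the temporary directory, so no output is claimed here.

```python
import errno
import os
import tempfile


def can_create_dir(parent: str, name: str) -> bool:
    """Try to create `parent/name`; return False if the name is too long."""
    try:
        os.mkdir(os.path.join(parent, name))
    except OSError as exc:
        if exc.errno == errno.ENAMETOOLONG:  # Errno 36 on Linux
            return False
        raise
    return True


with tempfile.TemporaryDirectory() as tmp:
    print(can_create_dir(tmp, "书" * 85))   # 255 bytes in UTF-8
    print(can_create_dir(tmp, "书" * 120))  # 360 bytes in UTF-8
```

On a file system with a 255-byte filename limit, the second call would be expected to hit the same Errno 36 shown in the error messages.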

Logfile
N/A

Expected behavior
No error messages, and the file uploads successfully.

Screenshots
N/A

Environment (please complete the following information):

  • OS: Linux 5.10.76-linuxkit #1 SMP Mon Nov 8 10:21:19 UTC 2021 x86_64 x86_64
  • Python version: 3.8.10 (default, Nov 26 2021, 20:14:08) [GCC 9.3.0]
  • Calibre-Web version: 0.6.17 Beta - 7c62394 - 2022-02-07T13:55:18+01:00
  • Docker container: linuxserver/calibre-web:nightly-version-7c623941

Additional context
N/A

@OzzieIsaacs
Collaborator

I'll have a look at it. I worked out the filename limit by copying the behavior of calibre itself (but I only checked non-Unicode filenames).

OzzieIsaacs added the bug label Feb 9, 2022

kfstorm commented Feb 10, 2022

Thanks. I've searched the source code of calibre. Looks like this is what we want: https://github.com/kovidgoyal/calibre/blob/d552304ba0abcc898acbd2d3a77ef486030d926e/src/calibre/utils/filenames.py#L52

OzzieIsaacs added a commit that referenced this issue Mar 29, 2022
Fix for #2309 (long unicode filenames could get to long)