Enable byte-range reads when loading PDF file #2654

Closed
echo094 opened this issue Jan 12, 2023 · 1 comment

echo094 commented Jan 12, 2023

Is your feature request related to a problem? Please describe.

PDF files can be very large, and currently a book cannot be read until the file has been fully loaded. The pdf.js plugin already supports byte-range reads, so the experience would improve if we turned that feature on.

Describe the solution you'd like

The feature can be implemented by modifying two places:

  1. The HTML template

window.addEventListener('webviewerloaded', function() {
    PDFViewerApplicationOptions.set('disableAutoFetch', true);
    PDFViewerApplicationOptions.set('disableRange', true);
    PDFViewerApplicationOptions.set('cMapUrl', "{{ url_for('static', filename='cmaps/') }}");
    PDFViewerApplicationOptions.set('sidebarViewOnLoad', 0);
    PDFViewerApplicationOptions.set('imageResourcesPath', "{{ url_for('static', filename='css/images/') }}");
    PDFViewerApplicationOptions.set('workerSrc', "{{ url_for('static', filename='js/libs/pdf.worker.js') }}");
    PDFViewerApplicationOptions.set('defaultUrl',"{{ url_for('web.serve_book', book_id=pdffile, book_format='pdf') }}")
});

  • Change disableRange to false.
  • Add two more options: disablePreferences set to true and disableStream set to true.
  2. The server side

calibre-web/cps/web.py

Lines 1150 to 1181 in ce0b3d8

@web.route("/show/<int:book_id>/<book_format>", defaults={'anyname': 'None'})
@web.route("/show/<int:book_id>/<book_format>/<anyname>")
@login_required_if_no_ano
@viewer_required
def serve_book(book_id, book_format, anyname):
    book_format = book_format.split(".")[0]
    book = calibre_db.get_book(book_id)
    data = calibre_db.get_book_format(book_id, book_format.upper())
    if not data:
        return "File not in Database"
    log.info('Serving book: %s', data.name)
    if config.config_use_google_drive:
        try:
            headers = Headers()
            headers["Content-Type"] = mimetypes.types_map.get('.' + book_format, "application/octet-stream")
            df = getFileFromEbooksFolder(book.path, data.name + "." + book_format)
            return do_gdrive_download(df, headers, (book_format.upper() == 'TXT'))
        except AttributeError as ex:
            log.error_or_exception(ex)
            return "File Not Found"
    else:
        if book_format.upper() == 'TXT':
            try:
                rawdata = open(os.path.join(config.config_calibre_dir, book.path, data.name + "." + book_format),
                               "rb").read()
                result = chardet.detect(rawdata)
                return make_response(
                    rawdata.decode(result['encoding'], 'surrogatepass').encode('utf-8', 'surrogatepass'))
            except FileNotFoundError:
                log.error("File Not Found")
                return "File Not Found"
        return send_from_directory(os.path.join(config.config_calibre_dir, book.path), data.name + "." + book_format)

A header needs to be added when sending the file to advertise range support: Accept-Ranges: bytes. I don't know how best to implement this, and adding the header manually may cause unexpected problems. According to the Flask documentation for flask.send_from_directory, passing conditional=True may help.
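To make the server-side requirement concrete, here is a minimal standard-library sketch of what answering a byte-range request involves. It is not Flask's or calibre-web's implementation: the function name serve_range and its (status, headers, body) return shape are hypothetical, and it handles only a single range per request.

```python
import re
from typing import Dict, Optional, Tuple

# Matches a single range such as "bytes=0-499", "bytes=500-" or "bytes=-500".
_RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")

Response = Tuple[int, Dict[str, str], bytes]


def serve_range(data: bytes, range_header: Optional[str]) -> Response:
    """Return (status, headers, body) for an optional single-range request."""
    total = len(data)
    # Advertise byte-range support on every response.
    headers = {"Accept-Ranges": "bytes"}

    def full() -> Response:
        headers["Content-Length"] = str(total)
        return 200, headers, data

    if not range_header:
        return full()

    match = _RANGE_RE.match(range_header.strip())
    if not match or match.group(1) == match.group(2) == "":
        # Malformed or multi-range header: this sketch falls back to the full file.
        return full()

    first, last = match.groups()
    if first == "":
        # Suffix form "bytes=-N": the final N bytes of the file.
        start, end = max(total - int(last), 0), total - 1
        if int(last) == 0:
            start = total  # a zero-length suffix cannot be satisfied
    else:
        start = int(first)
        if last and int(last) < start:
            return full()  # invalid range spec: ignore the header
        end = min(int(last), total - 1) if last else total - 1

    if start >= total:
        # The requested range lies entirely past the end of the file.
        headers["Content-Range"] = f"bytes */{total}"
        return 416, headers, b""

    body = data[start:end + 1]
    headers["Content-Range"] = f"bytes {start}-{end}/{total}"
    headers["Content-Length"] = str(len(body))
    return 206, headers, body
```

For a 100-byte payload, serve_range(data, "bytes=10-19") yields status 206 with Content-Range "bytes 10-19/100", while a request without a Range header yields a plain 200 that still carries Accept-Ranges: bytes.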

Describe alternatives you've considered

No

Additional context

The disablePreferences option is set because of a bug in the current version of pdf.js; see mozilla/pdf.js#14063.

For now, I have the feature working by modifying the HTML template and putting an nginx proxy in front of calibre-web with proxy_force_ranges set to on. If the server-side change works, the proxy can be removed.

In addition, the logging should be adjusted, since byte-range reads generate many requests per file.
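To give a sense of the request volume, the sketch below lists the Range header values a client would send to read a file chunk by chunk. The helper chunk_ranges is hypothetical, and the 65536-byte default is an assumption based on the usual pdf.js range chunk size; with disableAutoFetch enabled the viewer only requests the chunks it needs, so the result is an upper bound.

```python
from typing import List


def chunk_ranges(file_size: int, chunk_size: int = 65536) -> List[str]:
    """List the Range header values needed to read a file chunk by chunk."""
    if file_size < 0 or chunk_size <= 0:
        raise ValueError("file_size must be >= 0 and chunk_size must be > 0")
    ranges = []
    for start in range(0, file_size, chunk_size):
        # The end offset is inclusive and must not run past the last byte.
        end = min(start + chunk_size, file_size) - 1
        ranges.append(f"bytes={start}-{end}")
    return ranges
```

Under these assumptions, a 50 MiB PDF read in full takes len(chunk_ranges(50 * 1024 * 1024)) = 800 requests, each of which would produce a 'Serving book' log line with the current code.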

Contributor

UFervor commented Feb 3, 2023

After modifying the function serve_book as shown below, the program works as expected, and the logging problem is solved as well.

def serve_book(book_id, book_format, anyname):
    book_format = book_format.split(".")[0]
    book = calibre_db.get_book(book_id)
    data = calibre_db.get_book_format(book_id, book_format.upper())
    if not data:
        return "File not in Database"

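    # Log only requests without a Range header, so that the follow-up
    # byte-range requests from pdf.js do not flood the log.
    range_header = request.headers.get('Range', None)
    if not range_header:
        log.info('Serving book: %s', data.name)
        response = make_response(send_from_directory(os.path.join(config.config_calibre_dir, book.path), data.name + "." + book_format))
        # Advertise byte-range support so the viewer can switch to range reads.
        response.headers['Accept-Ranges'] = 'bytes'
        return response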

    if config.config_use_google_drive:
        try:
            headers = Headers()
            headers["Content-Type"] = mimetypes.types_map.get('.' + book_format, "application/octet-stream")
            df = getFileFromEbooksFolder(book.path, data.name + "." + book_format)
            return do_gdrive_download(df, headers, (book_format.upper() == 'TXT'))
        except AttributeError as ex:
            log.error_or_exception(ex)
            return "File Not Found"
    else:
        if book_format.upper() == 'TXT':
            try:
                rawdata = open(os.path.join(config.config_calibre_dir, book.path, data.name + "." + book_format),
                            "rb").read()
                result = chardet.detect(rawdata)
                return make_response(
                    rawdata.decode(result['encoding'], 'surrogatepass').encode('utf-8', 'surrogatepass'))
            except FileNotFoundError:
                log.error("File Not Found")
                return "File Not Found"
        return send_from_directory(os.path.join(config.config_calibre_dir, book.path), data.name + "." + book_format)
