Enable byte-range reads when loading PDF file #2654

echo094 · 2023-01-12T08:15:37Z

Is your feature request related to a problem? Please describe.

PDF files can be very large, and currently a book cannot be read until the whole file has been downloaded. The bundled pdf.js viewer already supports byte-range requests, so turning that feature on would improve the reading experience.

Describe the solution you'd like

The feature can be implemented by modifying two places:

The html template

calibre-web/cps/templates/readpdf.html

Lines 44 to 52 in 1f6eb2d

    
    window.addEventListener('webviewerloaded', function() {
        PDFViewerApplicationOptions.set('disableAutoFetch', true);
        PDFViewerApplicationOptions.set('disableRange', true);
        PDFViewerApplicationOptions.set('cMapUrl', "{{ url_for('static', filename='cmaps/') }}");
        PDFViewerApplicationOptions.set('sidebarViewOnLoad', 0);
        PDFViewerApplicationOptions.set('imageResourcesPath', "{{ url_for('static', filename='css/images/') }}");
        PDFViewerApplicationOptions.set('workerSrc', "{{ url_for('static', filename='js/libs/pdf.worker.js') }}");
        PDFViewerApplicationOptions.set('defaultUrl',"{{ url_for('web.serve_book', book_id=pdffile, book_format='pdf') }}")
    });

Change disableRange to false
Add two other options: disablePreferences=true, disableStream=true

The server side

calibre-web/cps/web.py

Lines 1150 to 1181 in ce0b3d8

    
    @web.route("/show/<int:book_id>/<book_format>", defaults={'anyname': 'None'})
    @web.route("/show/<int:book_id>/<book_format>/<anyname>")
    @login_required_if_no_ano
    @viewer_required
    def serve_book(book_id, book_format, anyname):
        book_format = book_format.split(".")[0]
        book = calibre_db.get_book(book_id)
        data = calibre_db.get_book_format(book_id, book_format.upper())
        if not data:
            return "File not in Database"
        log.info('Serving book: %s', data.name)
        if config.config_use_google_drive:
            try:
                headers = Headers()
                headers["Content-Type"] = mimetypes.types_map.get('.' + book_format, "application/octet-stream")
                df = getFileFromEbooksFolder(book.path, data.name + "." + book_format)
                return do_gdrive_download(df, headers, (book_format.upper() == 'TXT'))
            except AttributeError as ex:
                log.error_or_exception(ex)
                return "File Not Found"
        else:
            if book_format.upper() == 'TXT':
                try:
                    rawdata = open(os.path.join(config.config_calibre_dir, book.path, data.name + "." + book_format),
                                   "rb").read()
                    result = chardet.detect(rawdata)
                    return make_response(
                        rawdata.decode(result['encoding'], 'surrogatepass').encode('utf-8', 'surrogatepass'))
                except FileNotFoundError:
                    log.error("File Not Found")
                    return "File Not Found"
            return send_from_directory(os.path.join(config.config_calibre_dir, book.path), data.name + "." + book_format)

A header needs to be added when sending the file to advertise range support: Accept-Ranges: bytes. I don't know the best way to implement this, and adding the header manually may cause unexpected problems. According to the Flask documentation for flask.send_from_directory, passing conditional=True may help.
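To make concrete what "supporting byte ranges" asks of the server, here is a minimal standard-library sketch of answering a single-range request. The names resolve_range and partial_response are hypothetical and not part of calibre-web or Flask. The sketch simplifies HTTP semantics: it answers anything it cannot satisfy with 416, whereas a real server would ignore a malformed Range header and send the full file.

```python
import re

# Matches single-range headers such as "bytes=0-65535", "bytes=100-" or "bytes=-500".
_RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")


def resolve_range(header, size):
    """Return an inclusive (start, end) pair, or None if the range cannot be served."""
    match = _RANGE_RE.match(header.strip())
    if not match or size <= 0:
        return None
    first, last = match.groups()
    if not first and not last:
        return None
    if not first:
        # Suffix form: the final N bytes of the file.
        count = int(last)
        if count == 0:
            return None
        return max(size - count, 0), size - 1
    start = int(first)
    if start >= size:
        return None
    end = min(int(last), size - 1) if last else size - 1
    if end < start:
        return None
    return start, end


def partial_response(body, header):
    """Build (status, headers, payload) for a request carrying a Range header."""
    span = resolve_range(header, len(body))
    if span is None:
        # Simplification: treat every unusable range as unsatisfiable.
        return 416, {"Content-Range": f"bytes */{len(body)}"}, b""
    start, end = span
    chunk = body[start:end + 1]
    headers = {
        "Accept-Ranges": "bytes",
        "Content-Range": f"bytes {start}-{end}/{len(body)}",
        "Content-Length": str(len(chunk)),
    }
    return 206, headers, chunk
```

A viewer such as pdf.js can then fetch only the parts of the file it needs, which is why the full download is no longer a precondition for showing the first page.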

Describe alternatives you've considered

No

Additional context

The disablePreferences option is set because of a bug in the current version of pdf.js, described in this issue: mozilla/pdf.js#14063

At the moment I get this working by modifying the HTML template and putting an nginx reverse proxy in front of calibre-web that sets proxy_force_ranges on. If the server-side change works, the proxy can be removed.

In addition, the logging should be adjusted, since the feature generates many requests.


UFervor · 2023-02-03T15:21:01Z

With the function serve_book modified as below, the program works as expected. The logging problem is solved as well, because only requests without a Range header are logged.

def serve_book(book_id, book_format, anyname):
    book_format = book_format.split(".")[0]
    book = calibre_db.get_book(book_id)
    data = calibre_db.get_book_format(book_id, book_format.upper())
    if not data:
        return "File not in Database"

    # First request for a file (no Range header): log it once and advertise
    # byte-range support so that pdf.js switches to range requests.
    # Note that this branch returns before the Google Drive and TXT handling below.
    range_header = request.headers.get('Range', None)
    if not range_header:
        log.info('Serving book: %s', data.name)
        response = make_response(send_from_directory(os.path.join(config.config_calibre_dir, book.path), data.name + "." + book_format))
        response.headers['Accept-Ranges'] = 'bytes'
        return response

    # Follow-up range requests are served without logging.
    if config.config_use_google_drive:
        try:
            headers = Headers()
            headers["Content-Type"] = mimetypes.types_map.get('.' + book_format, "application/octet-stream")
            df = getFileFromEbooksFolder(book.path, data.name + "." + book_format)
            return do_gdrive_download(df, headers, (book_format.upper() == 'TXT'))
        except AttributeError as ex:
            log.error_or_exception(ex)
            return "File Not Found"
    else:
        if book_format.upper() == 'TXT':
            try:
                rawdata = open(os.path.join(config.config_calibre_dir, book.path, data.name + "." + book_format),
                            "rb").read()
                result = chardet.detect(rawdata)
                return make_response(
                    rawdata.decode(result['encoding'], 'surrogatepass').encode('utf-8', 'surrogatepass'))
            except FileNotFoundError:
                log.error("File Not Found")
                return "File Not Found"
        return send_from_directory(os.path.join(config.config_calibre_dir, book.path), data.name + "." + book_format)
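
For comparison, the conditional=True route suggested in the original post can be tried in isolation. The following is only a sketch under stated assumptions, not the code merged into calibre-web: the app, the /demo route, BOOK_DIR and sample.pdf are hypothetical stand-ins, and it assumes Flask is installed.

```python
import os
import tempfile

from flask import Flask, send_from_directory

app = Flask(__name__)

# Hypothetical stand-in for the library folder: a temp dir holding one fake PDF.
BOOK_DIR = tempfile.mkdtemp()
with open(os.path.join(BOOK_DIR, "sample.pdf"), "wb") as fh:
    fh.write(b"%PDF-1.7\n" + b"x" * 4096)


@app.route("/demo/<path:filename>")
def serve_demo(filename):
    # conditional=True asks Flask/Werkzeug to evaluate the request's Range
    # (and cache validation) headers instead of always sending the whole file.
    return send_from_directory(BOOK_DIR, filename, conditional=True)
```

Requesting /demo/sample.pdf with a header such as Range: bytes=0-3 through app.test_client() should then yield a 206 Partial Content response carrying only the requested bytes.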

UFervor mentioned this issue Feb 3, 2023

Enabled byte-range reads when loading PDF file #2682

Merged

OzzieIsaacs added the Fixed in Nightly label Feb 22, 2023

OzzieIsaacs closed this as completed Mar 27, 2023

ares1977 mentioned this issue Dec 5, 2023

How to Enable hypothes.is Annotations for PDFs? #2882

Closed
