When I trigger the download event and get the file response, an exception is thrown because of the wrong Content-Encoding #321

Closed
ma-pony opened this issue Oct 8, 2024 · 1 comment

Comments

@ma-pony
Contributor

ma-pony commented Oct 8, 2024

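When I fire the download event and receive the file response, an exception is thrown. The response carries a gzip Content-Encoding header, but its body (a PDF) is not gzip-compressed, so Scrapy's HttpCompressionMiddleware fails while trying to decompress it: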

2024-10-08 15:26:39 [scrapy.core.scraper] ERROR: Error downloading <GET http://www.yanan.gov.cn/gk/fdzdgknr/zdxm/sphzbaxx/1833020334224224258.html>
Traceback (most recent call last):
  File "/Users/rccpony/Library/Caches/pypoetry/virtualenvs/universal-spider-zsgF1vq0-py3.10/lib/python3.10/site-packages/twisted/internet/defer.py", line 1661, in _inlineCallbacks
    result = current_context.run(gen.send, result)
StopIteration: <200 http://www.yanan.gov.cn/upload/yanan/2024/09/09/202409091350252175.pdf>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/rccpony/Library/Caches/pypoetry/virtualenvs/universal-spider-zsgF1vq0-py3.10/lib/python3.10/site-packages/twisted/internet/defer.py", line 1661, in _inlineCallbacks
    result = current_context.run(gen.send, result)
  File "/Users/rccpony/Library/Caches/pypoetry/virtualenvs/universal-spider-zsgF1vq0-py3.10/lib/python3.10/site-packages/scrapy/core/downloader/middleware.py", line 68, in process_response
    method(request=request, response=response, spider=spider)
  File "/Users/rccpony/Library/Caches/pypoetry/virtualenvs/universal-spider-zsgF1vq0-py3.10/lib/python3.10/site-packages/scrapy/downloadermiddlewares/httpcompression.py", line 90, in process_response
    decoded_body = self._decode(
  File "/Users/rccpony/Library/Caches/pypoetry/virtualenvs/universal-spider-zsgF1vq0-py3.10/lib/python3.10/site-packages/scrapy/downloadermiddlewares/httpcompression.py", line 130, in _decode
    return gunzip(body, max_size=max_size)
  File "/Users/rccpony/Library/Caches/pypoetry/virtualenvs/universal-spider-zsgF1vq0-py3.10/lib/python3.10/site-packages/scrapy/utils/gz.py", line 21, in gunzip
    chunk = f.read1(_CHUNK_SIZE)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/gzip.py", line 314, in read1
    return self._buffer.read1(size)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/gzip.py", line 488, in read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/gzip.py", line 436, in _read_gzip_header
    raise BadGzipFile('Not a gzipped file (%r)' % magic)
gzip.BadGzipFile: Not a gzipped file (b'%P')
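The bytes reported in the last line, b'%P', are the first two bytes of the %PDF signature, which suggests the PDF reached the gunzip branch of HttpCompressionMiddleware uncompressed while the header still said gzip. The sketch below reproduces the same stdlib exception and shows one defensive pattern: only gunzip when both the header and the gzip magic bytes (0x1f 0x8b, per RFC 1952) agree. The helpers gunzip_error and decode_body are hypothetical illustrations using only the standard library; they are not part of Scrapy's API and not the change made in #322.

```python
import gzip
from typing import Optional

GZIP_MAGIC = b"\x1f\x8b"  # every gzip member starts with these two bytes (RFC 1952)


def gunzip_error(body: bytes) -> Optional[str]:
    """Hypothetical helper: gunzip `body` naively and return the error text, if any."""
    try:
        gzip.decompress(body)
    except gzip.BadGzipFile as exc:
        return str(exc)
    return None


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Hypothetical helper: gunzip only when the header AND the bytes say gzip.

    If a "Content-Encoding: gzip" header accompanies a body that is not
    actually gzip-compressed (for example, one already decoded upstream),
    trusting the header alone raises BadGzipFile, as in the traceback.
    """
    if (
        content_encoding is not None
        and content_encoding.strip().lower() == "gzip"
        and body.startswith(GZIP_MAGIC)
    ):
        return gzip.decompress(body)
    # Header missing, not gzip, or contradicted by the bytes: pass through.
    return body


# Placeholder bytes standing in for an uncompressed PDF download.
pdf_body = b"%PDF-1.4\n% placeholder bytes for illustration\n"

print(gunzip_error(pdf_body))                         # Not a gzipped file (b'%P')
print(decode_body(pdf_body, "gzip") == pdf_body)      # True: left untouched
print(decode_body(gzip.compress(b"hello"), "gzip"))   # b'hello'
```

The design choice here is to trust the bytes over the header: a genuinely gzip-compressed body is still decoded, while a mislabeled one is passed through instead of raising.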
@elacuesta
Member

Addressed by #322
