Add support for Gzip encoding #86

ariofrio · 2013-05-17T10:58:34Z

Some websites, like Wikipedia, send us a Gzip-encoded web page even if we don't ask for it. This usually happens when the page is very large. Try this to see what I mean:

curl -vI http://en.wikipedia.org/wiki/Spanish_language |& grep '^[<>]'

This patch lets us handle those websites, and it speeds things up, at least on Python 3.2 and later. Earlier versions don't support streaming with GzipFile, so there we read the whole file into a StringIO before decoding it with GzipFile.
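For readers following along, here is a minimal sketch of the two decoding paths described above. It is not the code from this patch: the function name `open_decoded` and the `supports_streaming` flag are hypothetical, `raw` stands in for any binary file-like response object, and the buffered path uses `io.BytesIO` because the compressed payload is bytes.

```python
import gzip
import io


def open_decoded(raw, content_encoding, supports_streaming=True):
    """Return a binary file-like object that yields the decoded body.

    `raw` is a hypothetical stand-in for an HTTP response (anything with
    a binary read() method); `content_encoding` is the value of the
    Content-Encoding response header, or None if it was absent.
    """
    if (content_encoding or '').strip().lower() != 'gzip':
        # Not gzip-encoded: hand the response back untouched.
        return raw
    if supports_streaming:
        # Decompress lazily, as the caller reads from the result.
        return gzip.GzipFile(fileobj=raw)
    # Fallback: buffer the whole compressed body in memory first.
    return gzip.GzipFile(fileobj=io.BytesIO(raw.read()))


body = b'<html>' + b'x' * 1000 + b'</html>'
compressed = gzip.compress(body)

streamed = open_decoded(io.BytesIO(compressed), 'gzip').read()
buffered = open_decoded(io.BytesIO(compressed), 'gzip',
                        supports_streaming=False).read()
untouched = open_decoded(io.BytesIO(body), None).read()
```

Checking the header rather than what was requested matters here, since the server may send gzip even when the client did not ask for it.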

SimonSapin · 2014-04-21T23:01:11Z

Hi! Thanks for contributing, and so sorry for letting this linger so long…

I took the liberty of rewriting this instead of merging your PR, in order to:

Use the byte-based io.BytesIO instead of the Unicode-based io.StringIO or the ambiguous StringIO.StringIO
When gzip streaming is not supported, just return a byte string (as is already done for data: URLs, for example). There is no point in returning a file-like object.
Add some tests
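The non-streaming case in the list above could look roughly like the sketch below. This is an illustration under assumptions, not the code from the actual commit: `gunzip_to_bytes` is a hypothetical helper name, and the round-trip check at the end only hints at the kind of test one might add.

```python
import gzip
import io


def gunzip_to_bytes(compressed):
    """Decompress a complete gzip payload and return plain bytes.

    io.BytesIO wraps the compressed bytes so GzipFile can read them;
    the caller gets a byte string back, not a file-like object.
    """
    with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as gz:
        return gz.read()


# Round trip with non-ASCII content, to show that everything stays bytes.
original = 'héllo wörld'.encode('utf-8')
result = gunzip_to_bytes(gzip.compress(original))
```

Returning bytes keeps the caller's contract simple: decoding to text is left to whoever knows the page's character encoding.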

Add support for Gzip encoding

87960b0

SimonSapin closed this in 9404375 Apr 21, 2014
