Add support for Gzip encoding #86

Closed

ariofrio
Contributor

Some websites, like Wikipedia, send us a Gzip-encoded web page even if we don't ask for it. This usually happens when the page is very large. Try this to see what I mean:

curl -vI http://en.wikipedia.org/wiki/Spanish_language |& grep '^[<>]'

This patch lets us handle those websites, and speeds things up, at least on Python 3.2 and up. Earlier versions don't support GzipFile streaming, so there we read the whole response into a StringIO before decoding it with GzipFile.
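
For illustration only, the streaming case could look roughly like the sketch below. This is not the code from the patch: `open_body`, `raw`, and `content_encoding` are hypothetical names, and `raw` stands in for whatever binary file-like object the HTTP response provides.

```python
import gzip
import io


def open_body(raw, content_encoding):
    """Return a binary file-like object that yields the decoded body.

    `raw` is a binary file-like object (hypothetical stand-in for the
    HTTP response); `content_encoding` is the value of the
    Content-Encoding header, or None if the header is absent.
    """
    if content_encoding and content_encoding.strip().lower() == "gzip":
        # GzipFile decompresses lazily as the caller reads, so the
        # whole response does not have to be buffered first.
        return gzip.GzipFile(fileobj=raw, mode="rb")
    # Not gzip-encoded: hand the response back unchanged.
    return raw


if __name__ == "__main__":
    original = b"<html>" + b"x" * 10000 + b"</html>"
    response = io.BytesIO(gzip.compress(original))
    body = open_body(response, "gzip")
    # Read in small chunks, as a parser consuming a stream would.
    chunks = []
    while True:
        chunk = body.read(1024)
        if not chunk:
            break
        chunks.append(chunk)
    print(b"".join(chunks) == original)
```

Checking the header rather than what was requested matters here, since the server may send gzip even when we did not ask for it.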

@SimonSapin
Member

Hi! Thanks for contributing, and so sorry for letting this linger so long…

I took the liberty of rewriting this instead of taking your PR, in order to:

  • Use the byte-based io.BytesIO instead of the Unicode-based io.StringIO or the ambiguous StringIO.StringIO
  • When gzip streaming is not supported, just return a byte string (as is done e.g. for data: URLs). There is no point in returning a file-like object.
  • Add some tests
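
A minimal sketch of the non-streaming fallback described in the list above, assuming the compressed response has already been read fully into memory. `gunzip_bytes` is a hypothetical name, not the function from the actual rewrite.

```python
import gzip
import io


def gunzip_bytes(compressed):
    """Decompress a complete gzip payload and return a byte string.

    The compressed data is bytes, so it is wrapped in io.BytesIO;
    io.StringIO only accepts text and would reject it.
    """
    with gzip.GzipFile(fileobj=io.BytesIO(compressed), mode="rb") as gz:
        return gz.read()


if __name__ == "__main__":
    page = "<p>Español</p>".encode("utf-8")
    result = gunzip_bytes(gzip.compress(page))
    print(isinstance(result, bytes), result == page)
```

Returning bytes directly keeps the fallback consistent with other sources that are already fully in memory, so the caller does not need to wrap and unwrap a file-like object.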
