Adding GZip support to urllib3 #704

Merged
merged 8 commits into from
Mar 13, 2018
11 changes: 11 additions & 0 deletions docs/index.rst
@@ -258,6 +258,17 @@ bodies via post::
from elasticsearch import Elasticsearch
es = Elasticsearch(send_get_body_as='POST')

Compression
~~~~~~~~~~~
On capacity-constrained (low-throughput) networks, it can help to enable
compression. This is especially useful for bulk loads or when inserting large
documents. The setting below configures compression on the *request*.
::

from elasticsearch import Elasticsearch
es = Elasticsearch(hosts, http_compress = True)
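
As a rough illustration of why request compression helps with bulk loads, the sketch below gzips a made-up bulk-style payload with the standard library and prints the sizes before and after. The index name, field names, and document count are assumptions chosen for the example; it does not call the client at all.

```python
import gzip
import json

# Hypothetical bulk payload: 1000 small, similar documents as newline-delimited JSON.
lines = []
for i in range(1000):
    lines.append(json.dumps({"index": {"_index": "logs", "_id": i}}))
    lines.append(json.dumps({"level": "info", "message": "request handled", "seq": i}))
body = ("\n".join(lines) + "\n").encode("utf-8")

# gzip.compress is available on Python 3.2 and later.
compressed = gzip.compress(body)

# Prints the uncompressed and compressed sizes in bytes.
print(len(body), len(compressed))
```

Repetitive JSON such as this tends to compress well, which is where the savings on a slow link come from.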


Running on AWS with IAM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

11 changes: 10 additions & 1 deletion elasticsearch/connection/http_urllib3.py
@@ -3,6 +3,7 @@
import urllib3
from urllib3.exceptions import ReadTimeoutError, SSLError as UrllibSSLError
import warnings
import gzip

# sentinel value for `verify_certs`.
# This is used to detect if a user is passing in a value for `verify_certs`
@@ -62,13 +63,15 @@ class Urllib3HttpConnection(Connection):
host. See https://urllib3.readthedocs.io/en/1.4/pools.html#api for more
information.
:arg headers: any custom http headers to be added to requests
:arg http_compress: Use gzip compression
"""
def __init__(self, host='localhost', port=9200, http_auth=None,
use_ssl=False, verify_certs=VERIFY_CERTS_DEFAULT, ca_certs=None, client_cert=None,
client_key=None, ssl_version=None, ssl_assert_hostname=None,
ssl_assert_fingerprint=None, maxsize=10, headers=None, ssl_context=None, **kwargs):
ssl_assert_fingerprint=None, maxsize=10, headers=None, ssl_context=None, http_compress=False, **kwargs):

super(Urllib3HttpConnection, self).__init__(host=host, port=port, use_ssl=use_ssl, **kwargs)
self.http_compress = http_compress
self.headers = urllib3.make_headers(keep_alive=True)
if http_auth is not None:
if isinstance(http_auth, (tuple, list)):
@@ -80,6 +83,10 @@ def __init__(self, host='localhost', port=9200, http_auth=None,
for k in headers:
self.headers[k.lower()] = headers[k]

if self.http_compress == True:
self.headers.update(urllib3.make_headers(accept_encoding=True))
self.headers.update({'content-encoding': 'gzip'})

self.headers.setdefault('content-type', 'application/json')
pool_class = urllib3.HTTPConnectionPool
kw = {}
@@ -152,6 +159,8 @@ def perform_request(self, method, url, params=None, body=None, timeout=None, ign

request_headers = self.headers
if headers:
if self.http_compress == True:
Contributor

this needs to be outside of the `if headers` block, otherwise it will only be used if custom headers are specified.

it should be:

if self.http_compress and body:
    body = gzip.compress(body)

Contributor Author

Derp! Updating.

Contributor

good eye @honzakral
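
The fix discussed in this thread can be sketched in isolation as follows. `prepare_request` and `gzip_body` are hypothetical helpers written for this example, not functions from the PR; they only mirror the header merging and body compression that `perform_request` does. The sketch compresses through `gzip.GzipFile` because `gzip.compress` exists only on Python 3.2 and later, which is an assumption about what the code base may need to support.

```python
import gzip
import io


def gzip_body(body):
    """Gzip-compress a bytes payload without relying on gzip.compress."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as f:
        f.write(body)
    return buf.getvalue()


def prepare_request(base_headers, http_compress, body=None, headers=None):
    """Hypothetical stand-in for the header/body handling in perform_request."""
    request_headers = base_headers
    if headers:
        # Copy first so per-request headers never leak into the shared defaults.
        request_headers = request_headers.copy()
        request_headers.update(headers)
    # Compression is decided independently of whether custom headers were passed.
    if http_compress and body:
        body = gzip_body(body)
    return request_headers, body
```

With the check placed outside the `if headers` branch, a request with no custom headers still has its body compressed, and a request with no body is left untouched.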

body = gzip.compress(body)
request_headers = request_headers.copy()
request_headers.update(headers)
response = self.pool.urlopen(method, url, body, retries=False, headers=request_headers, **kw)
4 changes: 4 additions & 0 deletions test_elasticsearch/test_connection.py
@@ -32,6 +32,10 @@ def test_ssl_context(self):
)
self.assertTrue(con.use_ssl)

def test_http_compression(self):
con = Urllib3HttpConnection(http_compress=True)
self.assertTrue(con.http_compress)
Contributor

we should also verify that con.headers are set properly, once the headers are set in __init__.
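
A header check along the lines suggested here might look like the sketch below. To keep it runnable without the client installed, it tests `build_default_headers`, a hypothetical stand-in that mirrors the header setup in the diff's `__init__`. The `accept-encoding` value is an assumption; in the real code it comes from `urllib3.make_headers(accept_encoding=True)`.

```python
import unittest


def build_default_headers(http_compress=False, headers=None):
    """Hypothetical stand-in mirroring the header setup in the diff's __init__."""
    result = {"connection": "keep-alive"}
    for k in headers or {}:
        result[k.lower()] = headers[k]
    if http_compress:
        # Assumed value; the real one is produced by urllib3.make_headers.
        result["accept-encoding"] = "gzip,deflate"
        result["content-encoding"] = "gzip"
    result.setdefault("content-type", "application/json")
    return result


class TestHttpCompressionHeaders(unittest.TestCase):
    def test_headers_set_when_enabled(self):
        h = build_default_headers(http_compress=True)
        self.assertEqual("gzip", h["content-encoding"])
        self.assertIn("gzip", h["accept-encoding"])

    def test_headers_absent_when_disabled(self):
        h = build_default_headers(http_compress=False)
        self.assertNotIn("content-encoding", h)
        self.assertNotIn("accept-encoding", h)
```

Against the real class, the same assertions would be made on `con.headers` after `con = Urllib3HttpConnection(http_compress=True)`.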


def test_timeout_set(self):
con = Urllib3HttpConnection(timeout=42)
self.assertEquals(42, con.timeout)