url: resource and query encoding should support unicode #530

Merged: 1 commit, Jun 19, 2017.
3 changes: 2 additions & 1 deletion .travis.yml
@@ -14,12 +14,13 @@ env:
   - ARCH=i686

 install:
-  - pip install urllib3 certifi pytz pyflakes
+  - pip install urllib3 certifi pytz pyflakes faker

 script:
   - pyflakes minio/*.py || true
   - python setup.py install
   - python setup.py nosetests
+  - python tests/functional/tests.py

 notifications:
   slack:
3 changes: 2 additions & 1 deletion appveyor.yml
@@ -7,7 +7,7 @@ environment:

 install:
   # We need wheel installed to build wheels
-  - "%PYTHON%\\python.exe -m pip install wheel"
+  - "%PYTHON%\\python.exe -m pip install wheel faker"
   - "%PYTHON%\\python.exe setup.py install"

 build: off
@@ -17,6 +17,7 @@ test_script:
   # the interpreter you're using - Appveyor does not do anything special
   # to put the Python evrsion you want to use on PATH.
   - "%PYTHON%\\python.exe setup.py nosetests"
+  - "%PYTHON%\\python.exe tests/functional/tests.py"

 after_test:
   # This step builds your wheels.
21 changes: 12 additions & 9 deletions minio/api.py
@@ -45,7 +45,8 @@

 # Internal imports
 from . import __title__, __version__
-from .compat import urlsplit, range, urlencode, basestring
+from .compat import (urlsplit, queryencode,
+                     range, basestring)
 from .error import (KnownResponseError, ResponseError, NoSuchBucket,
                     InvalidArgumentError, InvalidSizeError, NoSuchBucketPolicy)
 from .definitions import Object, UploadPart

Review thread attached to the changed .compat import:

@Yurmusp (Jun 6, 2017): I see an issue in your files/code. It depends on what you are targeting; this is from the standpoint of what I would target for.

I would aim for everything in your library being unicode, meaning all incoming and outgoing strings. For that, a programmer typically needs the two lines below:

# encoding: utf-8
from __future__ import unicode_literals

The first line declares that the file is UTF-8. The second makes every string literal in the file a unicode string, not just a byte string with UTF-8 bytes in it. What you have now is only the first of those two lines, so your files hold byte strings containing UTF-8, not real unicode strings everywhere. I'm not sure which you want. You can still try to go with byte strings and UTF-8, but that route tends to be more complicated, with many side effects to handle. Hope this is useful. All of this is from a Python 2.7 point of view, of course, not 3.x.

Member Author: This is a good idea @Yurmusp, we will test it out and see if there are any problems for 3.x.

@Yurmusp: Not sure what was meant. Who will test it?

Member Author: "will" means "we will", a shorter way of saying it :-)

@Yurmusp: In my understanding, the change I suggested should have no impact on 3.x, because it effectively makes everything behave more like 3.x (every string is unicode), but within Python 2.x.

Member Author: We won't be doing this for now; this PR works as is. We will take up that activity if such an issue comes in the future. We don't want to introduce subtle changes like this, which can have unexpected consequences.

@Yurmusp: As long as it works, it's fine I think.
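For context, a minimal Python 2.7 sketch of the behavior discussed in the thread above; it is illustrative only and not part of this PR (the printed type is Python 2 specific):

# encoding: utf-8
from __future__ import unicode_literals

# With unicode_literals, every bare literal below is a unicode string, so its
# length is counted in characters; without the import, the same literal would
# be a byte string holding the UTF-8 encoding of the text.
path = '/test/123/汉字'
print(type(path))   # <type 'unicode'> on Python 2.7
print(len(path))    # 12 characters, not the 16 bytes of its UTF-8 form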
@@ -841,7 +842,7 @@ def copy_object(self, bucket_name, object_name, object_source,
         if conditions:
             headers = {k: v for k, v in conditions.items()}

-        headers['X-Amz-Copy-Source'] = urlencode(object_source)
+        headers['X-Amz-Copy-Source'] = queryencode(object_source)
         response = self._url_open('PUT',
                                   bucket_name=bucket_name,
                                   object_name=object_name,
@@ -946,8 +947,9 @@ def list_objects(self, bucket_name, prefix=None, recursive=False):

         # Initialize query parameters.
         query = {
-            'max-keys': 1000
+            'max-keys': '1000'
         }
+
         # Add if prefix present.
         if prefix:
             query['prefix'] = prefix
@@ -1012,7 +1014,7 @@ def list_objects_v2(self, bucket_name, prefix=None, recursive=False):

         # Initialize query parameters.
         query = {
-            'list-type': 2
+            'list-type': '2'
         }
         # Add if prefix present.
         if prefix:
@@ -1228,7 +1230,7 @@ def _list_incomplete_uploads(self, bucket_name, prefix=None,
         # Initialize query parameters.
         query = {
             'uploads': '',
-            'max-uploads': 1000
+            'max-uploads': '1000'
         }

         if prefix:
@@ -1286,14 +1288,14 @@ def _list_object_parts(self, bucket_name, object_name, upload_id):

         query = {
             'uploadId': upload_id,
-            'max-parts': 1000
+            'max-parts': '1000'
         }

         is_truncated = True
         part_number_marker = None
         while is_truncated:
             if part_number_marker:
-                query['part-number-marker'] = part_number_marker
+                query['part-number-marker'] = str(part_number_marker)

             response = self._url_open('GET',
                                       bucket_name=bucket_name,
@@ -1558,7 +1560,8 @@ def _do_put_multipart_object(self, bucket_name, object_name, part_metadata,
                 'Invalid input data does not implement a callable read() method')

         # Convert hex representation of md5 content to base64
-        md5content_b64 = codecs.encode(codecs.decode(part_metadata.md5_hex, 'hex'), 'base64').strip()
+        md5content_b64 = codecs.encode(codecs.decode(
+            part_metadata.md5_hex, 'hex_codec'), 'base64_codec').strip()

         headers = {
             'Content-Length': part_metadata.size,
@@ -1569,7 +1572,7 @@ def _do_put_multipart_object(self, bucket_name, object_name, part_metadata,
             'PUT', bucket_name=bucket_name,
             object_name=object_name,
             query={'uploadId': upload_id,
-                   'partNumber': part_number},
+                   'partNumber': str(part_number)},
             headers=headers,
             body=data,
             content_sha256=part_metadata.sha256_hex
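The last hunk above swaps the short codec aliases for the explicit 'hex_codec'/'base64_codec' names. A small standalone sketch of the same conversion (the digest value is just a sample, not taken from the PR):

import codecs

# Decode a hex MD5 digest to its 16 raw bytes, then re-encode as base64 for a
# Content-MD5 style header. The '*_codec' spellings work on both Python 2.7
# and Python 3, while the short 'hex'/'base64' aliases are missing on some
# Python 3 releases.
md5_hex = '9e107d9d372bb6826bd81d3542a419d6'               # sample digest
md5_raw = codecs.decode(md5_hex, 'hex_codec')              # 16 raw bytes
md5_b64 = codecs.encode(md5_raw, 'base64_codec').strip()   # drop trailing newline
print(md5_b64)  # nhB9nTcrtoJr2B01QqQZ1g== (a bytes object on Python 3)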
23 changes: 21 additions & 2 deletions minio/compat.py
@@ -35,7 +35,7 @@

 if _is_py2:
     from urllib import quote
-    urlencode = quote
+    _urlencode = quote

     from urllib import unquote
     urldecode = unquote
@@ -57,7 +57,7 @@
     basestring = basestring
 elif _is_py3:
     from urllib.request import quote
-    urlencode = quote
+    _urlencode = quote

     from urllib.request import unquote
     urldecode = unquote
@@ -80,3 +80,22 @@
     str = str

     numeric_types = (int, long, float)
+
+def urlencode(resource):
+    """
+    This implementation of urlencode supports all unicode characters
+
+    :param: resource: Resource value to be url encoded.
+    """
+    if isinstance(resource, str):
+        return _urlencode(resource.encode('utf-8'))
+
+    return _urlencode(resource)
+
+def queryencode(query):
+    """
+    This implementation of queryencode supports all unicode characters
+
+    :param: query: Query value to be url encoded.
+    """
+    return urlencode(query).replace('/', '%2F')
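A quick usage sketch of the two helpers added above, mirroring the unit tests further down in this PR (it assumes the minio package from this branch is importable):

# -*- coding: utf-8 -*-
from minio.compat import urlencode, queryencode

# Resource (path) encoding keeps '/' as a path separator ...
print(urlencode('/test/123/汉字'))    # /test/123/%E6%B1%89%E5%AD%97

# ... while query encoding escapes '/' as well, so a value such as an
# X-Amz-Copy-Source survives as a single query parameter.
print(queryencode('/test/123/汉字'))  # %2Ftest%2F123%2F%E6%B1%89%E5%AD%97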
17 changes: 10 additions & 7 deletions minio/helpers.py
@@ -35,7 +35,8 @@
 import errno
 import math

-from .compat import urlsplit, urlencode, str, bytes, basestring
+from .compat import (urlsplit, urlencode, queryencode,
+                     str, bytes, basestring)
 from .error import (InvalidBucketError, InvalidEndpointError,
                     InvalidArgumentError)

@@ -226,13 +227,15 @@ def get_target_url(endpoint_url, bucket_name=None, object_name=None,
             if ordered_query[component_key] is not None:
                 if isinstance(ordered_query[component_key], list):
                     for value in ordered_query[component_key]:
-                        encoded_query = urlencode(
-                            str(value)).replace('/', '%2F')
-                        query_components.append(component_key+'='+encoded_query)
+                        query_components.append(component_key+'='+
+                                                queryencode(value))
                 else:
-                    encoded_query = urlencode(
-                        str(ordered_query[component_key])).replace('/', '%2F')
-                    query_components.append(component_key+'='+encoded_query)
+                    query_components.append(
+                        component_key+'='+
+                        queryencode(
+                            ordered_query[component_key]
+                        )
+                    )
             else:
                 query_components.append(component_key)

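For illustration, a simplified standalone sketch of what the rewritten loop produces once every value goes through queryencode; the query dict here is hypothetical, and the real function also handles list values and builds the full URL:

# -*- coding: utf-8 -*-
from collections import OrderedDict
from minio.compat import queryencode

query = OrderedDict([
    ('max-keys', '1000'),
    ('prefix', 'photos/2017/汉字'),
    ('uploads', None),            # a None value emits the bare key
])

components = []
for key, value in query.items():
    if value is not None:
        components.append(key + '=' + queryencode(value))
    else:
        components.append(key)

print('&'.join(components))
# max-keys=1000&prefix=photos%2F2017%2F%E6%B1%89%E5%AD%97&uploads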
12 changes: 5 additions & 7 deletions minio/signer.py
@@ -31,7 +31,7 @@

 from datetime import datetime
 from .error import InvalidArgumentError
-from .compat import urlsplit, urlencode
+from .compat import urlsplit, queryencode
 from .helpers import get_sha256_hexdigest
 from .fold_case_dict import FoldCaseDict

@@ -84,7 +84,7 @@ def presign_v4(method, url, access_key, secret_key, region=None,
         headers = {}

     if expires is None:
-        expires = 604800
+        expires = '604800'

     parsed_url = urlsplit(url)
     content_hash_hex = _UNSIGNED_PAYLOAD
@@ -100,7 +100,7 @@
     query['X-Amz-Credential'] = generate_credential_string(access_key,
                                                            date, region)
     query['X-Amz-Date'] = iso8601Date
-    query['X-Amz-Expires'] = expires
+    query['X-Amz-Expires'] = str(expires)
     signed_headers = get_signed_headers(headers_to_sign)
     query['X-Amz-SignedHeaders'] = ';'.join(signed_headers)

@@ -117,10 +117,8 @@
         if ordered_query[component_key] is not None:
             single_component.append('=')
             single_component.append(
-                urlencode(
-                    str(ordered_query[component_key])
-                ).replace('/',
-                          '%2F'))
+                queryencode(ordered_query[component_key])
+            )
         query_components.append(''.join(single_component))

     query_string = '&'.join(query_components)
4 changes: 0 additions & 4 deletions tests/functional/tests.py
@@ -50,10 +50,6 @@ def main():

     # Enable trace
     # client.trace_on(sys.stderr)
-
-    # Make a new bucket.
-    bucket_name = 'minio-pytest'
-
     client.make_bucket(bucket_name)

     is_s3 = client._endpoint_url.startswith("s3.amazonaws")
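The hard-coded bucket name is dropped here; with faker added as a test dependency in .travis.yml and appveyor.yml above, the name is presumably generated per run elsewhere in the test. A hypothetical sketch of that pattern, not taken from this PR:

from faker import Faker

fake = Faker()
# Hypothetical: build a random bucket name for each test run instead of
# reusing the fixed 'minio-pytest'.
bucket_name = 'minio-pytest-' + fake.uuid4()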
21 changes: 20 additions & 1 deletion tests/unit/sign_test.py
@@ -24,7 +24,7 @@
                           generate_signing_key, generate_authorization_header,
                           presign_v4)
 from minio.error import InvalidArgumentError
-from minio.compat import urlsplit
+from minio.compat import urlsplit, urlencode, queryencode
 from minio.fold_case_dict import FoldCaseDict

 empty_hash = 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'
@@ -115,3 +115,22 @@ def test_presigned_no_access_key(self):
     @raises(InvalidArgumentError)
     def test_presigned_invalid_expires(self):
         presign_v4('GET', 'http://localhost:9000/hello', None, None, region=None, headers={}, expires=0)
+
+class UnicodeEncodeTest(TestCase):
+    def test_unicode_urlencode(self):
+        eq_(urlencode('/test/123/汉字'), '/test/123/%E6%B1%89%E5%AD%97')
+
+    def test_unicode_queryencode(self):
+        eq_(queryencode('/test/123/汉字'), '%2Ftest%2F123%2F%E6%B1%89%E5%AD%97')
+
+    def test_unicode_urlencode_u(self):
+        eq_(urlencode(u'/test/123/汉字'), '/test/123/%E6%B1%89%E5%AD%97')
+
+    def test_unicode_queryencode_u(self):
+        eq_(queryencode(u'/test/123/汉字'), '%2Ftest%2F123%2F%E6%B1%89%E5%AD%97')
+
+    def test_unicode_urlencode_b(self):
+        eq_(urlencode(b'/test/123/\xe6\xb1\x89\xe5\xad\x97'), '/test/123/%E6%B1%89%E5%AD%97')
+
+    def test_unicode_queryencode_b(self):
+        eq_(queryencode(b'/test/123/\xe6\xb1\x89\xe5\xad\x97'), '%2Ftest%2F123%2F%E6%B1%89%E5%AD%97')