-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix several byte / unicode issues for Python 3 #7665
Conversation
Run Python PostCommit |
@@ -226,7 +226,7 @@ def read_meta_data_from_file(f): | |||
meta = header['meta'] | |||
|
|||
if datafile.CODEC_KEY in meta: | |||
codec = meta[datafile.CODEC_KEY] | |||
codec = meta[datafile.CODEC_KEY].decode('utf-8') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When running the avroio tests in Python 3, this results in the following error for me:
File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line 355, in _decompress_bytes
raise ValueError('Unknown codec: %r' % codec)
ValueError: Unknown codec: 'null'
I've used another approach and have fixed all but 2 Python 3 errors in avroio. see #7644.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach in #7644 makes sense to me. Once merged - @charlesccychen PTAL if you still encounter the errors and rebase the PR on top of master. Thank you.
@@ -1046,7 +1046,7 @@ def byte_array_to_json_string(raw_bytes): | |||
@staticmethod | |||
def json_string_to_byte_array(encoded_string): | |||
"""Implements org.apache.beam.sdk.util.StringUtils.jsonStringToByteArray.""" | |||
return unquote(encoded_string) | |||
return unquote(encoded_string).encode('utf-8') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR #7650 also contains a fix for this, using unquote_to_bytes()
, instead of importing
from future.moves.urllib.parse import quote
from future.moves.urllib.parse import unquote
so the code is 'Python 3 native'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your comments @RobbeSneyders. Let's use unquote_to_bytes.
This change fixes some compatibility issues with running on Python 3.
Follow this checklist to help us incorporate your contribution quickly and easily:
[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.It will help us expedite review of your Pull Request if you tag someone (e.g.
@username
) to look at it.Post-Commit Tests Status (on master branch)