Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix several byte / unicode issues for Python 3 #7665

Closed
wants to merge 1 commit into from

Conversation

charlesccychen
Copy link
Contributor

@charlesccychen charlesccychen commented Jan 29, 2019

This change fixes some compatibility issues with running on Python 3.


Follow this checklist to help us incorporate your contribution quickly and easily:

  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

It will help us expedite review of your Pull Request if you tag someone (e.g. @username) to look at it.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- --- --- --- ---
Java Build Status Build Status Build Status Build Status
Build Status
Build Status
Build Status Build Status Build Status
Python Build Status --- Build Status
Build Status
Build Status --- --- ---

@charlesccychen
Copy link
Contributor Author

Run Python PostCommit

@charlesccychen
Copy link
Contributor Author

R: @tvalentyn @aaltay @markflyhigh

@@ -226,7 +226,7 @@ def read_meta_data_from_file(f):
meta = header['meta']

if datafile.CODEC_KEY in meta:
codec = meta[datafile.CODEC_KEY]
codec = meta[datafile.CODEC_KEY].decode('utf-8')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When running the avroio tests in Python 3, this results in the following error for me:

File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line 355, in _decompress_bytes
    raise ValueError('Unknown codec: %r' % codec)
ValueError: Unknown codec: 'null'

I've used another approach and have fixed all but 2 Python 3 errors in avroio. see #7644.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach in #7644 makes sense to me. Once merged - @charlesccychen PTAL if you still encounter the errors and rebase the PR on top of master. Thank you.

@@ -1046,7 +1046,7 @@ def byte_array_to_json_string(raw_bytes):
@staticmethod
def json_string_to_byte_array(encoded_string):
"""Implements org.apache.beam.sdk.util.StringUtils.jsonStringToByteArray."""
return unquote(encoded_string)
return unquote(encoded_string).encode('utf-8')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #7650 also contains a fix for this, using unquote_to_bytes(), instead of importing

from future.moves.urllib.parse import quote
from future.moves.urllib.parse import unquote

so the code is 'Python 3 native'.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your comments @RobbeSneyders. Let's use unquote_to_bytes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants