Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure that mulitpart bodies are always bytes. #1779

Merged
merged 6 commits into from
May 12, 2016
Merged

Ensure that mulitpart bodies are always bytes. #1779

merged 6 commits into from
May 12, 2016

Conversation

tseaver
Copy link
Contributor

@tseaver tseaver commented May 6, 2016

Loosely based on @joar's gist

Fixes #1760.

@tseaver tseaver added api: storage Issues related to the Cloud Storage API. api: bigquery Issues related to the BigQuery API. labels May 6, 2016
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label May 6, 2016
@tseaver
Copy link
Contributor Author

tseaver commented May 6, 2016

Note that opening the file passed to Table.upload_from_file in text mode now always fails on Py3k (for both small and large files). We should likely document that requirement.

def _email_chunk_parser():
import six
if six.PY3: # pragma: NO COVER Python3
from email.parser import BytesParser # pylint: disable=E0611

This comment was marked as spam.

This comment was marked as spam.

@dhermes
Copy link
Contributor

dhermes commented May 6, 2016

RE: Always failing tables, couldn't we just convert to bytes? Then instead of documenting "this will fail", we'd document "if you don't give us bytes, we're going to guess your encoding and we might get it wrong"

@dhermes dhermes mentioned this pull request May 6, 2016
@tseaver
Copy link
Contributor Author

tseaver commented May 6, 2016

Do we want to be in the business of reading potentially huge files in memory and guessing their encoding? If we want to try checking for 'stream.mode != 'rb'` (assuming they opened it, rather than passing us some other file-like object) we could raise an exception right away, rather than failing with a much uglier traceback later.

craigcitro added a commit to craigcitro/apitools that referenced this pull request May 9, 2016
As discovered in
googleapis/google-cloud-python#1760, we were
mangling bytes when encoding them as part of a multipart upload request. The
fix is to switch from using `six.StringIO` to `six.BytesIO` in `transfer.py`.
The patch here is closely based on
googleapis/google-cloud-python#1779.
@tseaver
Copy link
Contributor Author

tseaver commented May 10, 2016

@dhermes I'm reluctant to get into the mess of transcoding a large input stream opened in text mode.

@jgeewax WDYT?

@jgeewax
Copy link
Contributor

jgeewax commented May 11, 2016

I agree that guessing the format is probably a bad idea (because while it might make some people happy, it might make some other people really really mad).

Can we have a super clear exception saying ... "You gave us a file opened in text mode, which means we don't know the encoding... Can you open in binary mode with the right encoding?"

@dhermes
Copy link
Contributor

dhermes commented May 11, 2016

Sorry for the delay. @tseaver I agree with you, sniffing the entire file was ill-conceived.

if six.PY3: # pragma: NO COVER Python3
# pylint: disable=no-name-in-module
from email.parser import BytesParser
# pylint: enable=no-name-in-module

This comment was marked as spam.

This comment was marked as spam.

@tseaver
Copy link
Contributor Author

tseaver commented May 11, 2016

@jgeewax

Can we have a super clear exception saying ... "You gave us a file opened in text mode, which means we don't know the encoding... Can you open in binary mode with the right encoding?"

IFF the user gives us an actual file object (created via open), we can sniff its mode attribute. If they hand us a file-ish object (e.g., an instance of io.StreamIO) it likely won't have such an attribute.

@jgeewax
Copy link
Contributor

jgeewax commented May 11, 2016

@dhermes And in that case, we just do our best here, right?

@tseaver
Copy link
Contributor Author

tseaver commented May 11, 2016

@jgeewax 4a8e952 adds a test for a text-mode file, where the exception message says "use open(filename, mode='rb')".

@tseaver
Copy link
Contributor Author

tseaver commented May 12, 2016

@dhermes, @jgeewax PTAL: I think I've addressed your concerns.

@dhermes
Copy link
Contributor

dhermes commented May 12, 2016

LGTM

@tseaver tseaver merged commit 6064270 into googleapis:master May 12, 2016
@tseaver tseaver deleted the 1760-streaming-multipart_upload_py3k branch May 12, 2016 23:01
@tseaver tseaver mentioned this pull request May 16, 2016
parthea pushed a commit that referenced this pull request Jul 6, 2023
parthea pushed a commit that referenced this pull request Oct 21, 2023
parthea pushed a commit that referenced this pull request Oct 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the BigQuery API. api: storage Issues related to the Cloud Storage API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants