BigQuery: Errno::EPIPE on loading csv file #266

Closed
vitaliel opened this issue Sep 3, 2015 · 6 comments · Fixed by #268
Labels
api: bigquery Issues related to the BigQuery API. 🚨 This issue needs some love. triage me I really want to be triaged. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

vitaliel commented Sep 3, 2015

Hi,

I'm trying to upload a 100 MB CSV file to BigQuery, but I get Errno::EPIPE errors.

Snippet:

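require "gcloud"

# project_id, key_file and table_name are set elsewhere in the uploader.
gcloud = Gcloud.new project_id, key_file
bigquery = gcloud.bigquery
dataset = bigquery.dataset 'logging'
table = dataset.table table_name
# Upload the local file in 10 MB chunks.
load_job = table.load 'site_access.csv', chunk_size: 10 * 1024 * 1024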

I get the error after 10 seconds, but if I do not pass chunk_size, it fails after 50 seconds.

Exception:

$ time ./bin/bq_uploader
/home/ana/.rvm/rubies/ruby-2.2.2/lib/ruby/2.2.0/openssl/buffering.rb:326:in `syswrite': Broken pipe (Errno::EPIPE)
    from /home/ana/.rvm/rubies/ruby-2.2.2/lib/ruby/2.2.0/openssl/buffering.rb:326:in `do_write'
    from /home/ana/.rvm/rubies/ruby-2.2.2/lib/ruby/2.2.0/openssl/buffering.rb:344:in `write'
    from /home/ana/.rvm/rubies/ruby-2.2.2/lib/ruby/2.2.0/net/http/generic_request.rb:205:in `copy_stream'
    from /home/ana/.rvm/rubies/ruby-2.2.2/lib/ruby/2.2.0/net/http/generic_request.rb:205:in `send_request_with_body_stream'
    from /home/ana/.rvm/rubies/ruby-2.2.2/lib/ruby/2.2.0/net/http/generic_request.rb:122:in `exec'
    from /home/ana/.rvm/rubies/ruby-2.2.2/lib/ruby/2.2.0/net/http.rb:1412:in `block in transport_request'
    from /home/ana/.rvm/rubies/ruby-2.2.2/lib/ruby/2.2.0/net/http.rb:1411:in `catch'
    from /home/ana/.rvm/rubies/ruby-2.2.2/lib/ruby/2.2.0/net/http.rb:1411:in `transport_request'
    from /home/ana/.rvm/rubies/ruby-2.2.2/lib/ruby/2.2.0/net/http.rb:1384:in `request'
    from /home/ana/.rvm/rubies/ruby-2.2.2/lib/ruby/2.2.0/net/http.rb:1377:in `block in request'
    from /home/ana/.rvm/rubies/ruby-2.2.2/lib/ruby/2.2.0/net/http.rb:853:in `start'
    from /home/ana/.rvm/rubies/ruby-2.2.2/lib/ruby/2.2.0/net/http.rb:1375:in `request'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/faraday-0.9.1/lib/faraday/adapter/net_http.rb:82:in `perform_request'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/faraday-0.9.1/lib/faraday/adapter/net_http.rb:40:in `block in call'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/faraday-0.9.1/lib/faraday/adapter/net_http.rb:87:in `with_net_http_connection'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/faraday-0.9.1/lib/faraday/adapter/net_http.rb:32:in `call'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/faraday-0.9.1/lib/faraday/response.rb:8:in `call'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/google-api-client-0.8.6/lib/google/api_client/request.rb:163:in `send'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/google-api-client-0.8.6/lib/google/api_client/request.rb:174:in `send'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/google-api-client-0.8.6/lib/google/api_client.rb:648:in `block (2 levels) in execute!'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/retriable-1.4.1/lib/retriable/retry.rb:27:in `perform'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/retriable-1.4.1/lib/retriable.rb:15:in `retriable'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/google-api-client-0.8.6/lib/google/api_client.rb:645:in `block in execute!'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/retriable-1.4.1/lib/retriable/retry.rb:27:in `perform'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/retriable-1.4.1/lib/retriable.rb:15:in `retriable'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/google-api-client-0.8.6/lib/google/api_client.rb:636:in `execute!'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/google-api-client-0.8.6/lib/google/api_client.rb:679:in `execute'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/gcloud-0.3.0/lib/gcloud/bigquery/connection.rb:307:in `load_resumable'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/gcloud-0.3.0/lib/gcloud/bigquery/table.rb:758:in `load_resumable'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/gcloud-0.3.0/lib/gcloud/bigquery/table.rb:750:in `load_local'
    from /home/ana/.rvm/gems/ruby-2.2.2@bq_uploader/gems/gcloud-0.3.0/lib/gcloud/bigquery/table.rb:613:in `load'
    from /home/lz/projects/assembla/bq_uploader/lib/assembla/bq_uploader.rb:41:in `initialize'
    from ./bin/bq_uploader:10:in `new'
    from ./bin/bq_uploader:10:in `<main>'
./bin/bq_uploader  1,21s user 0,09s system 12% cpu 10,104 total
@blowmage blowmage added this to the 0.3.1 milestone Sep 3, 2015
@blowmage blowmage added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. api: bigquery Issues related to the BigQuery API. labels Sep 3, 2015
Contributor

blowmage commented Sep 3, 2015

Thanks again for opening the issue. We'll get right on it.

Author

vitaliel commented Sep 3, 2015

It's strange: I succeeded only with a CSV file of size < 5_000_000 bytes and 39250 rows.

Member

quartzmo commented Sep 3, 2015

Hi @vitaliel, and thank you for reporting this!

The Broken pipe (Errno::EPIPE) error that you appear to have encountered is a known issue, and it is the root cause of three of the 39 currently open issues in google-api-ruby-client (upon which Gcloud depends). They are:

  • #67 - Batch requests broken...
  • #69 - Resumable Upload results in broken pipe
  • #106 - Google::APIClient::BatchRequest, broken pipe...
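
For context, Errno::EPIPE is the error Ruby raises when a process writes to a pipe or socket whose other end has already been closed. The sketch below reproduces it with the standard library only. It is not the gcloud upload path: IO.pipe is an assumed stand-in for the HTTPS connection, and closing the read end simulates the remote side hanging up mid-upload.

```ruby
# Stand-in for the network connection: a local pipe instead of a socket.
reader, writer = IO.pipe

# Simulate the remote end closing the connection before the upload finishes.
reader.close

begin
  # A low-level write, analogous to the syswrite call in the reported backtrace.
  writer.syswrite("chunk of csv data")
  result = :written
rescue Errno::EPIPE => e
  # The write fails because nobody is left to read from the pipe.
  result = :broken_pipe
  puts "#{e.class}: #{e.message}"
ensure
  writer.close
end
```

In the reported case the write goes through an SSL socket inside Net::HTTP, but the failure mode is the same: the client keeps sending after the connection has been closed.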

A solution, documented in two of the issues above as well as in this Stack Overflow answer, is to add this line before your code (right after requiring gcloud). You will also need to add httpclient as a dependency in your project.

Faraday.default_adapter = :httpclient
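
Put together with the snippet from the original report, the workaround might look roughly like the sketch below. It assumes the gcloud and httpclient gems are both listed in the project's Gemfile. The helper names use_httpclient_adapter! and load_csv are hypothetical, introduced only for illustration, and the 'logging' dataset comes from the report above.

```ruby
# Hypothetical helper: switch Faraday away from its default Net::HTTP adapter.
# The requires live inside the method so nothing is loaded until it is called.
def use_httpclient_adapter!
  require "gcloud"      # pulls in google-api-client, which uses Faraday
  require "faraday"     # explicit, in case it is not loaded yet
  require "httpclient"  # assumed to be declared as a project dependency
  Faraday.default_adapter = :httpclient
end

# Hypothetical wrapper around the load call from the original report.
def load_csv(project_id, key_file, table_name, path)
  use_httpclient_adapter!
  gcloud = Gcloud.new project_id, key_file
  table = gcloud.bigquery.dataset("logging").table(table_name)
  table.load path
end
```

A call such as load_csv(project_id, key_file, table_name, "site_access.csv") would then run the upload through httpclient. The important part is the ordering: the adapter is switched before the first request is made.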

Can you give this a try and let us know if it solves the problem? If so, I will add the solution to the documentation for Table#load, and close this issue.

Thank you @blowmage for providing the background story on this.

Author

vitaliel commented Sep 4, 2015

@quartzmo Thanks, it worked.

Member

quartzmo commented Sep 4, 2015

@vitaliel Great. I will document this issue in the API doc for Table#load, and in the doc for a Cloud Storage method where the same error is also possible. Then I will close this issue. Thanks again.

quartzmo added a commit to quartzmo/google-cloud-ruby that referenced this issue Sep 4, 2015
There is no fix we can make for this file upload issue, so
add documentation instead.

[closes googleapis#266]
blowmage added a commit that referenced this issue Sep 4, 2015
Update Storage and BigQuery docs with broken pipe solution

[closes #266]
Contributor

blowmage commented Sep 4, 2015

FYI, the updated docs will be included in the next point release (0.3.1), but the release after that (0.4.0) will most likely switch dependencies from Faraday to Hurley, meaning this guidance will change. Hopefully Hurley will be an improvement on Faraday and not have this issue in the default provider. :)
