
Streaming limitation confusion #347

Closed · lostpebble opened this issue Feb 6, 2019 · 4 comments

Labels: api: bigquery (Issues related to the googleapis/nodejs-bigquery API) · type: docs (Improvement to the documentation for an API)

lostpebble (Contributor) commented Feb 6, 2019

Hi there,

This is just an issue that I had with the docs which confused me a bit.

I think I have figured out the best route forward for myself, but the docs raised a bit of a false flag for me and made me take longer to realise the next step than I should have.

My use case is a decent volume of data rows coming in (1000 / hour) during some processing jobs that I'd like to push to a BigQuery table one at a time, as they happen (not batched).

In the docs for table.insert():

insert(rows, options, callback) returns Promise

Stream data into BigQuery one record at a time without running a load job.

There are more strict quota limits using this method so it is highly recommended that you load data into BigQuery using Table#load instead.

Here it says that there are "more strict quota limits" to using this method and that we should use table.load() instead.

So I went to look at https://cloud.google.com/bigquery/quotas#load_jobs, which tells me that load jobs are limited to 1000 per day - well below what my use case needs - unless I create a whole process to pre-store the data somewhere and then batch it into BigQuery at a later stage (undesired).

What's confusing there is that it also says:

The limits also apply to load jobs submitted programmatically by using the load-type jobs.insert API method.

This made me think that even table.insert() has these limitations too.

Upon further investigation I found that different limits apply to streaming (https://cloud.google.com/bigquery/quotas#streaming_inserts), which I assume are the limits that apply to this library's table.insert() too?

So, where I've landed now is that I should be using the original method, table.insert(), which gives me the much more generous streaming limits defined there.
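
For reference, a minimal sketch of what that looks like with this library (the dataset/table names, row shape, and the onEvent helper here are placeholders for my setup, not part of the API):

```js
// Stream each row into BigQuery as it arrives, via table.insert().
// No load job is involved; rows go through the streaming API.
const {BigQuery} = require('@google-cloud/bigquery');

const bigquery = new BigQuery();
// 'my_dataset' and 'events' are placeholder names.
const table = bigquery.dataset('my_dataset').table('events');

// Hypothetical per-event handler: one streaming insert per row.
async function onEvent(event) {
  await table.insert({
    timestamp: event.timestamp,
    payload: JSON.stringify(event.payload),
  });
}
```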

I think it would be very helpful for future users if the docs were a little clearer about which limitations apply in which situations, since depending on the use case one set of limits can become irrelevant (streaming allows far more insert requests but fewer rows per request; load jobs handle huge row counts but are limited in number). The way it's currently worded leans only towards the latter.

It took me much longer than I care to admit (thinking of ways to divert and batch these events with table.load()) to realise that I had landed on the correct method the first time.

tswast (Contributor) commented Feb 6, 2019

Here it says that there are "more strict quota limits" to using this method and that we should use table.load() instead.

That is the opposite of what it should be. You can stream a lot: the quotas for streaming are quite high by default, but note there are additional costs associated with the streaming API. Load jobs, on the other hand, do have a strict quota limit, both per table and per project.

tswast (Contributor) commented Feb 6, 2019

I think that "There are more strict quota limits using this method so it is highly recommended that you load data into BigQuery using Table#load instead." should just be deleted entirely.

We do want to recommend people use load jobs if they want to create a whole table from a file, for example, but there are many uses where streaming makes sense.
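
For example, the load-job path looks roughly like this (a sketch only; the file path and dataset/table names are illustrative):

```js
// Bulk-load a whole file into a table with a load job via table.load().
const {BigQuery} = require('@google-cloud/bigquery');

const bigquery = new BigQuery();

async function loadWholeFile() {
  // './events.json', 'my_dataset', and 'events' are placeholder names.
  await bigquery
    .dataset('my_dataset')
    .table('events')
    .load('./events.json', {
      sourceFormat: 'NEWLINE_DELIMITED_JSON',
      autodetect: true,
    });
  // The promise resolves once the load job completes.
}
```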

lostpebble (Contributor, Author) commented

We do want to recommend people use load jobs if they want to create a whole table from a file for example, but there are many uses where streaming makes sense.

Exactly. It's completely use-case dependent, so the docs sent me on a bit of a goose chase by being so "anti-streaming" in their wording. I think a short blurb on both methods explaining the pros/cons of each (with quota doc links) would be very helpful.

@callmehiphop added the type: docs label on Feb 6, 2019
stephenplusplus (Contributor) commented

Our table.insert() method uses the API's tabledata.insertAll method. The quota limits referenced are probably these ones: https://cloud.google.com/bigquery/quotas#streaming_inserts

The following limits apply for streaming data into BigQuery.

  • Maximum row size: 1 MB. Exceeding this value will cause invalid errors.
  • HTTP request size limit: 10 MB. Exceeding this value will cause invalid errors.
  • Maximum rows per second: 100,000 rows per second, per project. Exceeding this amount will cause quotaExceeded errors. The maximum number of rows per second per table is also 100,000.
    You can use all of this quota on one table or you can divide this quota among several tables in a project.
  • Maximum rows per request: 10,000 rows per request. We recommend a maximum of 500 rows. Batching can increase performance and throughput to a point, but at the cost of per-request latency. Too few rows per request and the overhead of each request can make ingestion inefficient. Too many rows per request and the throughput may drop.
    We recommend using about 500 rows per request, but experimentation with representative data (schema and data sizes) will help you determine the ideal batch size (see the batching sketch at the end of this comment).
  • Maximum bytes per second: 100 MB per second, per table. Exceeding this amount will cause quotaExceeded errors.

We should add that link (https://cloud.google.com/bigquery/quotas#streaming_inserts) to explain what quota we're talking about.
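
For illustration, batching against those limits could look something like this (the chunk size, names, and insertInBatches helper are illustrative, not part of the library):

```js
// Split buffered rows into ~500-row chunks, per the recommendation above.
// Each table.insert() call maps to one tabledata.insertAll request.
const {BigQuery} = require('@google-cloud/bigquery');

// Placeholder dataset/table names.
const table = new BigQuery().dataset('my_dataset').table('events');

async function insertInBatches(rows, batchSize = 500) {
  for (let i = 0; i < rows.length; i += batchSize) {
    await table.insert(rows.slice(i, i + batchSize));
  }
}
```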

@google-cloud-label-sync bot added the api: bigquery label on Jan 31, 2020