
Streaming limitation confusion #347

Closed · lostpebble opened this issue Feb 6, 2019 · 4 comments

Labels: api: bigquery (Issues related to the googleapis/nodejs-bigquery API) · type: docs (Improvement to the documentation for an API)

lostpebble (Contributor) commented Feb 6, 2019

Hi there,

This is just an issue that I had with the docs which confused me a bit.

I think I have figured out the best route forward for myself, but the docs raised a bit of a false flag for me and made me take longer to realise the next step than I should have.

My use case is a decent volume of data rows coming in (1000 / hour) during some processing jobs that I'd like to push to a BigQuery table one at a time, as they happen (not batched).

In the docs for table.insert():

insert(rows, options, callback) returns Promise

Stream data into BigQuery one record at a time without running a load job.

There are more strict quota limits using this method so it is highly recommended that you load data into BigQuery using Table#load instead.

Here it says that there are "more strict quota limits" to using this method and that we should use table.load() instead.

So I went to look at https://cloud.google.com/bigquery/quotas#load_jobs, which tells me that load jobs are limited to 1000 per day - well below what my use case needs - unless I create a whole process to pre-store the data somewhere and then batch it into BigQuery at a later stage (undesired).

What's confusing there is that it also says:

The limits also apply to load jobs submitted programmatically by using the load-type jobs.insert API method.

This made me think that even table.insert() has these limitations too.

Upon further investigation I found that different limits apply to streaming (https://cloud.google.com/bigquery/quotas#streaming_inserts), which I assume are the limits that apply to this library's table.insert() too?

So, where I've landed now is that I should be using the original method, table.insert(), which gives me the much more generous streaming limits defined there.
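
For reference, a minimal sketch of what that looks like with this library (the dataset/table names, row shape, and the onEvent helper here are placeholders for my setup, not part of the API):

```js
// Stream each row into BigQuery as it arrives, via table.insert().
// No load job is involved; rows go through the streaming API.
const {BigQuery} = require('@google-cloud/bigquery');

const bigquery = new BigQuery();
// 'my_dataset' and 'events' are placeholder names.
const table = bigquery.dataset('my_dataset').table('events');

// Hypothetical per-event handler: one streaming insert per row.
async function onEvent(event) {
  await table.insert({
    timestamp: event.timestamp,
    payload: JSON.stringify(event.payload),
  });
}
```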

I think it would be very helpful for future users if the docs were a little clearer about which limitations apply in which situations, since depending on the use case one set of limits can become irrelevant (streaming allows far more insert requests but fewer rows per request; load jobs handle huge row counts but are limited in number). The way it's currently worded leans only towards the latter.

It took me much longer than I care to admit (thinking of ways to divert and batch these events with table.load()) to realise that I had landed on the correct method the first time.

tswast (Contributor) commented Feb 6, 2019

Here it says that there are "more strict quota limits" to using this method and that we should use table.load() instead.

That is the opposite of what it should be. You can stream a lot: the quotas for streaming are quite high by default, but note there are additional costs associated with the streaming API. Load jobs, on the other hand, do have a strict quota limit, both per table and per project.

tswast (Contributor) commented Feb 6, 2019

I think that "There are more strict quota limits using this method so it is highly recommended that you load data into BigQuery using Table#load instead." should just be deleted entirely.

We do want to recommend people use load jobs if they want to create a whole table from a file, for example, but there are many uses where streaming makes sense.
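
For example, the load-job path looks roughly like this (a sketch only; the file path and dataset/table names are illustrative):

```js
// Bulk-load a whole file into a table with a load job via table.load().
const {BigQuery} = require('@google-cloud/bigquery');

const bigquery = new BigQuery();

async function loadWholeFile() {
  // './events.json', 'my_dataset', and 'events' are placeholder names.
  await bigquery
    .dataset('my_dataset')
    .table('events')
    .load('./events.json', {
      sourceFormat: 'NEWLINE_DELIMITED_JSON',
      autodetect: true,
    });
  // The promise resolves once the load job completes.
}
```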

lostpebble (Contributor, Author) commented

We do want to recommend people use load jobs if they want to create a whole table from a file for example, but there are many uses where streaming makes sense.

Exactly. It's completely use-case dependent, so the docs sent me on a bit of a goose chase by being so "anti-streaming" in their wording. I think a short blurb on both methods explaining the pros/cons of each (with quota doc links) would be very helpful.

@callmehiphop added the type: docs label on Feb 6, 2019
stephenplusplus (Contributor) commented

Our table.insert() method uses the API's tabledata.insertAll method. The quota limits referenced are probably these ones: https://cloud.google.com/bigquery/quotas#streaming_inserts

The following limits apply for streaming data into BigQuery.

  • Maximum row size: 1 MB. Exceeding this value will cause invalid errors.
  • HTTP request size limit: 10 MB. Exceeding this value will cause invalid errors.
  • Maximum rows per second: 100,000 rows per second, per project. Exceeding this amount will cause quotaExceeded errors. The maximum number of rows per second per table is also 100,000.
    You can use all of this quota on one table or you can divide this quota among several tables in a project.
  • Maximum rows per request: 10,000 rows per request. We recommend a maximum of 500 rows. Batching can increase performance and throughput to a point, but at the cost of per-request latency. Too few rows per request and the overhead of each request can make ingestion inefficient. Too many rows per request and the throughput may drop.
    We recommend using about 500 rows per request, but experimentation with representative data (schema and data sizes) will help you determine the ideal batch size (see the batching sketch at the end of this comment).
  • Maximum bytes per second: 100 MB per second, per table. Exceeding this amount will cause quotaExceeded errors.

We should add that link (https://cloud.google.com/bigquery/quotas#streaming_inserts) to explain what quota we're talking about.
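
For illustration, batching against those limits could look something like this (the chunk size, names, and insertInBatches helper are illustrative, not part of the library):

```js
// Split buffered rows into ~500-row chunks, per the recommendation above.
// Each table.insert() call maps to one tabledata.insertAll request.
const {BigQuery} = require('@google-cloud/bigquery');

// Placeholder dataset/table names.
const table = new BigQuery().dataset('my_dataset').table('events');

async function insertInBatches(rows, batchSize = 500) {
  for (let i = 0; i < rows.length; i += batchSize) {
    await table.insert(rows.slice(i, i + batchSize));
  }
}
```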

@google-cloud-label-sync bot added the api: bigquery label on Jan 31, 2020