Streaming limitation confusion #347
Comments
That is the opposite of what it should be. You can stream a lot: the quotas for streaming are quite high by default, but note there are additional costs associated with the streaming API, whereas load jobs do have a strict quota limit, both per table and per project.
I think that "There are more strict quota limits using this method so it is highly recommended that you load data into BigQuery using Table#load instead." should just be deleted entirely. We do want to recommend people use load jobs if they want to create a whole table from a file, for example, but there are many uses where streaming makes sense.
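To make that distinction concrete, here's a rough sketch of the file-based case with `Table#load` (the dataset name, table name, and file path are placeholders made up for illustration, not anything from this issue):

```js
// Hypothetical names: 'my_dataset', 'my_table', and './rows.json'
// are placeholders for illustration only.
const {BigQuery} = require('@google-cloud/bigquery');

const bigquery = new BigQuery();

async function loadWholeFile() {
  // One load job ingests the entire file as a single quota-counted
  // operation, no matter how many rows the file contains.
  const [job] = await bigquery
    .dataset('my_dataset')
    .table('my_table')
    .load('./rows.json', {sourceFormat: 'NEWLINE_DELIMITED_JSON'});
  console.log(`Load job ${job.id} completed.`);
}

loadWholeFile().catch(console.error);
```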
Exactly. It's completely use-case dependent, so the docs sent me on a bit of a goose chase by being so "anti-streaming" in their wording. I think a short blurb on both methods explaining the pros and cons of each (with quota doc links) would be very helpful.
The quotas page says: "The following limits apply for streaming data into BigQuery."
We should add that link (https://cloud.google.com/bigquery/quotas#streaming_inserts) to explain which quota we're talking about.
Hi there,
This is just an issue I had with the docs, which confused me a bit.
I think I've figured out the best route forward for myself, but the docs raised a false red flag for me, and it took me longer than it should have to realise the next step.
My use case is a decent volume of data rows coming in (1,000/hour) during some processing jobs, which I'd like to push to a BigQuery table one at a time as they happen (not batched).
In the docs for `table.insert()`, it says that there are "more strict quota limits" to using this method and that we should use `table.load()` instead. So I went to look at https://cloud.google.com/bigquery/quotas#load_jobs, which tells me that load jobs are limited to 1,000 per day. That's well below what my use case would need, unless I now create a whole process to pre-store the data somewhere and then batch it into BigQuery at a later stage (undesired).
What's confusing there is that it also says:
Which made me think that even using `table.insert()` has these limitations too. Upon further investigation, I found that different limits apply to streaming (https://cloud.google.com/bigquery/quotas#streaming_inserts), which I assume are the limits that apply to this library's `table.insert()` too?
So, where I've landed now is that I should be using the original method, `table.insert()`, which gives me the limits defined there; the sketch below shows the pattern I mean. I think it would be very helpful for future users if the docs were a little clearer about which limitations apply in which situations, because depending on the situation, one set of limitations can become irrelevant (much higher insert rates but lower row counts per request, versus huge row counts but a limited number of jobs). The current wording leans only towards the latter.
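For reference, a minimal sketch of the one-row-at-a-time streaming pattern I'm describing (the dataset name, table name, and row shape are placeholders I've made up):

```js
// Hypothetical names: 'my_dataset', 'events', and the row fields
// are placeholders for illustration only.
const {BigQuery} = require('@google-cloud/bigquery');

const bigquery = new BigQuery();
const table = bigquery.dataset('my_dataset').table('events');

async function onRow(row) {
  // Each call is a streaming insert, so it counts against the
  // streaming quotas, not the daily load-job limit.
  // table.insert() accepts a single row object or an array of rows.
  await table.insert(row);
}

onRow({timestamp: new Date().toISOString(), value: 42}).catch(console.error);
```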
It took me much longer than I care to admit (thinking of ways to divert and batch these events with `table.load()`) to realise that I had landed on the correct method the first time.