
bigquery: remove automatic insertId generation & allow specifying raw format #1068

Merged (1 commit) on Jan 19, 2016

Conversation

stephenplusplus
Contributor

Closes #1066

Breaking change included!

insertId is no longer defaulted to a value.

@vladmiller - Please take a look. I just wanted to change a few things (mostly style things) and also remove the default insertId generation. The CLA bot will say something about confirming you are the original author of this code. If you're okay with it, just leave a note that it's okay.
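
For readers unfamiliar with the raw format, here is a minimal sketch of what a row looks like once it is wrapped for BigQuery's `tabledata.insertAll` request, where each row is an object with a `json` payload and an optional `insertId`. The helper name `toRawRows` and the `batch-42-` ID scheme are hypothetical and only for illustration; how the library accepts raw rows is not shown in this thread.

```javascript
// Hypothetical helper (not part of the library): wrap plain row objects in the
// { insertId, json } shape used by rows in a tabledata.insertAll request.
function toRawRows(rows, makeInsertId) {
  return rows.map((row, index) => {
    const raw = { json: row };
    // Only attach an insertId when the caller supplies a generator; after this
    // change, nothing is generated by default.
    if (makeInsertId) {
      raw.insertId = makeInsertId(row, index);
    }
    return raw;
  });
}

const rows = [{ name: 'a', count: 1 }, { name: 'b', count: 2 }];

// No insertId at all: the caller opts out of ID-based de-duplication.
const withoutIds = toRawRows(rows);

// Caller-controlled IDs (the scheme here is an assumption for the example).
const withIds = toRawRows(rows, (row, i) => `batch-42-${i}`);
```

With no generator supplied, the rows carry no `insertId`, which matches the new default described above.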

@stephenplusplus added the api: bigquery label on Jan 18, 2016
@googlebot

We found a Contributor License Agreement for you (the sender of this pull request) and all commit authors, but as best as we can tell these commits were authored by someone else. If that's the case, please add them to this pull request and have them confirm that they're okay with these commits being contributed to Google. If we're mistaken and you did author these commits, just reply here to confirm.

@googlebot added the cla: no label on Jan 18, 2016
@vladmiller
Contributor

@stephenplusplus I think we have to leave the autogenerated insertId untouched so that old apps won't break.

@vladmiller
Contributor

Okay Google, contribute my commits.

@vladmiller
Contributor

Okay Google, I agree to contribute my commits.

@vladmiller
Contributor

Google set cla yes

@stephenplusplus
Contributor Author

We will just bump the minor version to follow semver rules for the breaking change (pre-1.0, a minor bump is equivalent to a post-1.0 major bump).

@stephenplusplus added the cla: yes label and removed the cla: no label on Jan 18, 2016
@stephenplusplus
Contributor Author

Thanks @vladmiller!

@vladmiller
Contributor

@stephenplusplus any estimate on when the package will be available on npm?

@stephenplusplus
Contributor Author

We're due for a release soon. Maybe I can get one out this week. In the meantime, you can use master:

$ npm install --save googlecloudplatform/gcloud-node

@vladmiller
Contributor

@stephenplusplus Thank you!

@vladmiller
Contributor

@stephenplusplus do you know when this PR will be merged?

@stephenplusplus
Contributor Author

Just needs a review from @callmehiphop.

This should work if you want to install from my branch for now:

$ npm install --save stephenplusplus/gcloud-node#vlad--patch-1

@callmehiphop
Contributor

@stephenplusplus Looks good to me!

stephenplusplus added a commit that referenced this pull request Jan 19, 2016
bigquery: remove automatic `insertId` generation & allow specifying raw format
@stephenplusplus merged commit 4d9f37c into googleapis:master on Jan 19, 2016
@jgeewax
Contributor

jgeewax commented May 7, 2017

@stephenplusplus I'm confused by this one. Why can't we allow control of the insert ID via overriding without keeping the auto-generation?

The request in #1066 was about being able to manually specify an insert ID, so adding the raw option works, but taking out the other part kind of screws over the people who don't necessarily care about what value the ID is so long as it is unique (effectively saying "if an API request is one I've already sent, treat it as a duplicate").

The request in #1041 is about adding multiple of the same rows in parallel (aka, same timestamp, same data), so I can see how a hash wouldn't make sense there.

What we're trying to control for is:

  • Insert rows r
  • Request times out (maybe network cable was yanked, maybe GCP had an error)
  • It's impossible to know for sure whether this batch of rows was added
  • Retry inserting rows r

I'd argue that this is not for de-duping your data (aka, only inserting unique rows), and it is not for recovering after a client-side failure, but only about server errors and auto-retries. If you want uniqueness constraints or transactions to avoid client-side problems, you should use a different storage system or de-duplicate, save to GCS, and bulk load the data afterwards.
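
The retry scenario above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: `withInsertIds` and `insertWithRetry` are hypothetical names, `send` stands in for whatever actually performs the request, and the IDs come from Node's `crypto.randomUUID()` (a random version 4 UUID, available in Node 14.17+) rather than a version 1 UUID. The point is only that IDs are generated once, before the first attempt, and reused unchanged on every retry.

```javascript
const crypto = require('crypto');

// Hypothetical helper: attach one unique ID per row, once, before any request
// is sent. The IDs carry no meaning; they only identify "this exact attempt".
function withInsertIds(rows) {
  return rows.map((row) => ({ insertId: crypto.randomUUID(), json: row }));
}

// Hypothetical retry loop. `send` is an assumed async function that performs
// the insert request and rejects on failure (for example, a timeout).
async function insertWithRetry(rawRows, send, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // The same rawRows, and therefore the same insertId values, go out on
      // every attempt, so a server that saw the first attempt can recognize
      // the retry as a duplicate.
      return await send(rawRows);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

Because the IDs are created per call to `withInsertIds`, two separate inserts of identical rows get different IDs, so this does not de-duplicate data; it only covers the "did my timed-out request land?" case.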

Can we re-open the discussion about using a UUID-1 (or similar) to generate insert ID values when none are provided (values that are only used when we automatically retry a failed request)?

@stephenplusplus
Contributor Author

Sure, let's move the convo back to the now re-opened #1041.

Labels
api: bigquery Issues related to the BigQuery API. cla: yes This human has signed the Contributor License Agreement.