
WRITE_TRUNCATE appending to table #2326

Closed
DannyLee12 opened this issue Sep 16, 2016 · 5 comments
Assignees
Labels
api: bigquery Issues related to the BigQuery API. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@DannyLee12

DannyLee12 commented Sep 16, 2016

Using the templates found here and running the following commands:

from datetime import datetime

job = client.load_table_from_storage(
    'load-from-storage' + datetime.now().strftime('%Y%m%d%H%M'),
    table, gsbucket)
job.skip_leading_rows = 0
job.writeDisposition = 'WRITE_TRUNCATE'
job.field_delimiter = ','
job.begin()

This piece of code appends to my table instead of overwriting it, as the docs say it should:

WRITE_TRUNCATE: If the table already exists, BigQuery overwrites the table data.

I know this because if I use:

table.reload()
print(table.num_rows)

after the job, the row count has increased by 2 million, which is the size of the table.

I use a workaround as follows:

if disposition == 'WRITE_TRUNCATE':
    schema = table.schema
    while table.exists():
        table.delete()
    table = dataset.table(table_name, schema=schema)
    table.create()

This seems to work fine. The while loop is just me making certain that the table is deleted, though it also worked when I tested without it.
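The same recreate-before-load pattern can be exercised without touching BigQuery. A hedged sketch, using a hypothetical in-memory `FakeTable` (not the client library's class) to show the control flow:

```python
# Sketch of the delete-and-recreate workaround using a fake in-memory
# Table, so the control flow can be run without a BigQuery connection.
class FakeTable:
    """Hypothetical stand-in for the library's Table object."""
    def __init__(self, schema):
        self.schema = schema
        self._exists = True

    def exists(self):
        return self._exists

    def delete(self):
        self._exists = False

    def create(self):
        self._exists = True

table = FakeTable(schema=['name', 'rows'])
schema = table.schema
while table.exists():        # belt-and-braces check from the report
    table.delete()
table = FakeTable(schema=schema)
table.create()
print(table.exists())  # True -- a fresh, empty table with the same schema
```
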

Anyway, the issue is that WRITE_TRUNCATE isn't doing what it says it does in the docs.

@tseaver tseaver added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. api: bigquery Issues related to the BigQuery API. backend labels Sep 16, 2016
@tseaver tseaver self-assigned this Sep 16, 2016
@tseaver
Contributor

tseaver commented Sep 16, 2016

@DannyLee12 in #2327 I tried to reproduce this issue, creating a new system test which loads the same table twice, with write_disposition = 'WRITE_TRUNCATE' on the second run. It works as expected: the rows fetched from the table after the second run aren't doubled.

Can you figure out what differs between your case and that new test?

@DannyLee12
Author

@tseaver I see you are using job.write_disposition while I have been using job.writeDisposition. Switching to the snake_case name fixed the issue; thanks for the test.

This is a quote from these documents:

Schema update options are supported in two cases: when writeDisposition is WRITE_APPEND; when writeDisposition is WRITE_TRUNCATE and the destination table is a partition of a table, specified by partition decorators. For normal tables, WRITE_TRUNCATE will always overwrite the schema.

Emphasis mine. Is it possible to update the docs referenced? Thanks again.

@daspecster
Contributor

The documentation you referenced is more general, largely language-agnostic documentation describing the architecture of the service.

I think this might be more helpful for you.

@tseaver
Contributor

tseaver commented Sep 19, 2016

@DannyLee12 those docs describe the field names the back-end requires in its JSON payloads. The property names we expose in google-cloud-python are all PEP 8-conformant, so we translate the camelCasedNames the API uses into names_with_underscores. As @daspecster notes, you need to look at this library's docs to see how things are spelled, and use the back-end docs only for concepts.
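A minimal sketch of both points, using a hypothetical LoadJob stand-in (not the real google-cloud-python class): the snake_case property names map mechanically to the API's camelCase keys, and a misspelled camelCase assignment on a plain Python object is silently accepted instead of raising an error, which is why the original typo went unnoticed:

```python
def snake_to_camel(name):
    """Map a names_with_underscores property to the camelCasedName
    the BigQuery JSON payload expects."""
    head, *rest = name.split('_')
    return head + ''.join(part.capitalize() for part in rest)

class LoadJob:
    """Hypothetical stand-in for the client library's job object."""
    def __init__(self):
        self.write_disposition = None  # the real, PEP 8-style property

job = LoadJob()
job.writeDisposition = 'WRITE_TRUNCATE'  # typo: silently creates an unused attribute

print(snake_to_camel('write_disposition'))  # writeDisposition
print(job.write_disposition)                # None -- the real setting was never changed
```
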

FWIW: @jonparrott and his team are working on exposing correct snippets for each API's language wrappers in the back-end docs. Those snippets will be tested, and will therefore match this library.

@DannyLee12
Author

@tseaver @daspecster Thanks guys, as with everything, it's obvious once you know. Thanks again, really appreciate the assistance.
Cheers.
