Google cloud spanner client fails when reading large columns #3998

Closed
paulMissault opened this issue Sep 19, 2017 · 3 comments
Labels
api: spanner Issues related to the Spanner API.

Comments

@paulMissault

Consider the following table:

CREATE TABLE big_arrays (
    key INT64 NOT NULL,
    a_big_array ARRAY<TIMESTAMP> NOT NULL,
) PRIMARY KEY (key)

It is populated with 100 rows, each containing a large array of 1000 timestamps.

from google.cloud import spanner
import datetime
import random
spanner_client = spanner.Client(project='my-project')
instance = spanner_client.instance('my-instance')
db = instance.database('my-db')

def random_timestamp():
    return datetime.datetime.utcfromtimestamp(random.uniform(1e9, 2e9))
def random_big_array(size=1000):
    # Build one array of `size` random timestamps (1000 per row, as described above).
    return [random_timestamp() for i in range(size)]

batch_size = 10
batches = 10
for batch_index in range(batches):
    with db.batch() as batch:
        batch.insert_or_update(
            table='big_arrays',
            columns=['key', 'a_big_array'],
            values = [(key, random_big_array()) for key in range(batch_index*batch_size, (batch_index+1)*batch_size)]
        )

Trying to fetch these rows with db.execute_sql('select * from big_arrays limit 100').consume_all() results in the following error and stack trace:

Traceback (most recent call last):
  File "Untitled5.py", line 60, in <module>
    db.execute_sql("SELECT * FROM big_arrays limit 100").consume_all()
  File "/Users/myname/Repos/transformations/venv/lib/python2.7/site-packages/google/cloud/spanner/streamed.py", line 159, in consume_all
    self.consume_next()
  File "/Users/myname/Repos/transformations/venv/lib/python2.7/site-packages/google/cloud/spanner/streamed.py", line 148, in consume_next
    values[0] = self._merge_chunk(values[0])
  File "/Users/myname/Repos/transformations/venv/lib/python2.7/site-packages/google/cloud/spanner/streamed.py", line 108, in _merge_chunk
    merged = _merge_by_type(self._pending_chunk, value, field.type)
  File "/Users/myname/Repos/transformations/venv/lib/python2.7/site-packages/google/cloud/spanner/streamed.py", line 272, in _merge_by_type
    return merger(lhs, rhs, type_)
  File "/Users/myname/Repos/transformations/venv/lib/python2.7/site-packages/google/cloud/spanner/streamed.py", line 233, in _merge_array
    merged = _merge_by_type(last, first, element_type)
  File "/Users/myname/Repos/transformations/venv/lib/python2.7/site-packages/google/cloud/spanner/streamed.py", line 271, in _merge_by_type
    merger = _MERGE_BY_TYPE[type_.code]
KeyError: 4

Running the above query with the gcloud CLI works as intended:
gcloud spanner databases execute-sql my-db --instance=my-instance --sql='select * from big_arrays limit 100' >/dev/null
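
For readers decoding the traceback: the `KeyError: 4` is raised by the lookup `_MERGE_BY_TYPE[type_.code]`, so `4` is a Spanner type code with no registered merger. The sketch below maps the code back to a name. The `TypeCode` enum here is a local stand-in whose values are assumed to mirror the public Spanner type proto, and `describe_missing_merger` is a hypothetical helper, not part of the client library.

```python
from enum import IntEnum


class TypeCode(IntEnum):
    """Local stand-in for the Spanner type codes (values assumed from the public proto)."""
    BOOL = 1
    INT64 = 2
    FLOAT64 = 3
    TIMESTAMP = 4
    DATE = 5
    STRING = 6
    BYTES = 7
    ARRAY = 8
    STRUCT = 9


def describe_missing_merger(key_error_code):
    """Hypothetical helper: turn the integer from the KeyError into a type name."""
    return TypeCode(key_error_code).name


print(describe_missing_merger(4))  # TIMESTAMP
```

Under that assumption, the failure is specific to TIMESTAMP elements inside an array that was split across streamed chunks, which matches the `_merge_array` frame in the traceback.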

@paulMissault

The issue seems to be fixed by adding

_MERGE_BY_TYPE.update({
    type_pb2.DATE: _merge_string,
    type_pb2.TIMESTAMP: _merge_string,
})

to google.cloud.spanner.streamed
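
A simplified, standalone sketch of why this plausibly works, assuming timestamps and dates arrive on the wire as strings, so a value split across two chunks can be rejoined by concatenation. Nothing here imports the real library: `merge_string`, `merge_array`, the integer type codes, and the two mapping dicts are illustrative stand-ins for the private helpers seen in the traceback, and the real `_merge_array` likely handles more cases (nulls, nested types).

```python
# Hypothetical local constants standing in for the Spanner type codes.
TIMESTAMP, DATE, STRING = 4, 5, 6


def merge_string(lhs, rhs):
    """Rejoin a string value that was split across two chunks."""
    return lhs + rhs


def merge_array(lhs, rhs, element_code, mergers):
    """Merge two partial arrays: the last element of lhs may be the first
    half of a value whose second half is the first element of rhs."""
    if not lhs:
        return list(rhs)
    if not rhs:
        return list(lhs)
    merged = mergers[element_code](lhs[-1], rhs[0])  # KeyError if no merger is registered
    return lhs[:-1] + [merged] + rhs[1:]


# Before the proposed fix: no merger registered for TIMESTAMP or DATE.
mergers_before = {STRING: merge_string}

# After the proposed fix: both types reuse the string merger.
mergers_after = dict(mergers_before)
mergers_after.update({DATE: merge_string, TIMESTAMP: merge_string})

# A timestamp array whose second element was split mid-value.
first_chunk = ["2017-09-19T10:00:00Z", "2017-09-19T1"]
second_chunk = ["1:30:00Z", "2017-09-19T12:00:00Z"]

try:
    merge_array(first_chunk, second_chunk, TIMESTAMP, mergers_before)
except KeyError as exc:
    print("before fix: KeyError", exc)

print(merge_array(first_chunk, second_chunk, TIMESTAMP, mergers_after))
```

With the extra entries, the split element is rejoined into `2017-09-19T11:30:00Z` and the merged array has three timestamps.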

@vkedia

vkedia commented Sep 19, 2017

cc @tseaver @lukesneeringer
Thanks for filing this. This is a duplicate of #3981.
@paulMissault Do you want to submit your proposed fix as a pull request? It looks reasonable to me.

@bjwatson

@lukesneeringer @tseaver Do you agree that this looks like a quick fix? If so, let's get it done before Beta. If it looks much more involved, then let's discuss more.

lukesneeringer pushed a commit to lukesneeringer/google-cloud-python that referenced this issue Sep 21, 2017
@tseaver tseaver added the api: spanner Issues related to the Spanner API. label Sep 21, 2017
tseaver pushed a commit that referenced this issue Sep 21, 2017
This was referenced Sep 22, 2017