Re-implement select_object_content implementation #793
PR is updated with further changes. @Praveenrajmani @sinhaashish PTAL
Breaking for this
Unicode characters should be passed as inputs in Python as:
from minio import Minio
from minio.error import ResponseError
from minio.select.options import (SelectObjectOptions, CSVInput,
                                  JSONInput, RequestProgress,
                                  ParquetInput, InputSerialization,
                                  OutputSerialization, CSVOutput,
                                  JsonOutput)
from minio.select.errors import (SelectCRCValidationError, SelectMessageError)

client = Minio('s3.amazonaws.com',
               access_key='ACCESSKEY',
               secret_key='SECRETKEY')

options = SelectObjectOptions(
    expression="select * from s3object",
    input_serialization=InputSerialization(
        compression_type="GZIP",
        csv=CSVInput(FileHeaderInfo="USE",
                     RecordDelimiter="\n",
                     FieldDelimiter=u'╦',
                     QuoteCharacter='"',
                     QuoteEscapeCharacter='"',
                     Comments="#",
                     AllowQuotedRecordDelimiter="FALSE",
                     ),
        # If input is JSON
        # json=JSONInput(Type="DOCUMENT",)
    ),
    output_serialization=OutputSerialization(
        csv=CSVOutput(QuoteFields="ASNEEDED",
                      RecordDelimiter="\n",
                      FieldDelimiter=u'╦',
                      QuoteCharacter='"',
                      QuoteEscapeCharacter='"',)
        # json = JsonOutput(
        #     RecordDelimiter="\n",
        # )
    ),
    request_progress=RequestProgress(
        enabled="False"
    )
)

try:
    data = client.select_object_content('wlk-data-wbrp', '20190612-00690-1/wlk-wbrp-part-0000.csv.gz', options)
    # Get the records
    with open('my-record-file', 'w') as record_data:
        for d in data.stream(10*1024):
            record_data.write(d)
    # Get the stats
    print(data.stats())
except SelectMessageError as err:
    print(err)
except SelectCRCValidationError as err:
    print(err)
except ResponseError as err:
    print(err)
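For context on why a delimiter like `╦` can trip up chunked reads: it encodes to three bytes in UTF-8, so a fixed-size chunk boundary can land inside it. The following is only a minimal standard-library sketch of that failure mode and one common way around it (an incremental decoder). It is not the code in this PR, and `decode_chunks` is a hypothetical helper name.

```python
import codecs


def decode_chunks(chunks, encoding="utf-8"):
    """Yield text from byte chunks, tolerating characters split across chunks.

    Hypothetical helper for illustration; not part of minio-py.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    for chunk in chunks:
        # Incomplete trailing bytes are buffered until the next chunk arrives.
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush; raises UnicodeDecodeError if the stream ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


record = "a╦b\n".encode("utf-8")      # '╦' (U+2566) is the bytes e2 95 a6
chunks = [record[:2], record[2:]]     # boundary falls inside the delimiter

# Decoding each chunk on its own fails on the first chunk.
try:
    chunks[0].decode("utf-8")
    naive_ok = True
except UnicodeDecodeError:
    naive_ok = False

result = "".join(decode_chunks(chunks))
```

With the incremental decoder the partial bytes are held back until the rest of the character arrives, so the joined text matches the original record.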
Tested with different inputs and LGTM.
Just one fix in examples/select_object_content.py:
SelectSelectCRCValidationError -> SelectCRCValidationError
LGTM
This change fixes multiple issues
- handle unicode boundaries properly for special delimiters
- handle zero payload 'Cont' event messages
- handle error messages properly
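To make the zero-payload and CRC cases concrete, here is a rough sketch of the message framing as described in the Amazon S3 SelectObjectContent response documentation: a 12-byte prelude (total length, headers length, prelude CRC), then headers, payload, and a trailing message CRC. This is an assumption-laden illustration, not the implementation in this PR. `parse_message` and `build_message` are hypothetical names, headers are treated as opaque bytes, and a plain `ValueError` stands in for the library's own error types.

```python
import struct
import zlib

PRELUDE_LEN = 12  # total length (4) + headers length (4) + prelude CRC (4)
CRC_LEN = 4       # trailing message CRC


def parse_message(buf):
    """Split one framed message into (headers_bytes, payload_bytes)."""
    total_len, headers_len, prelude_crc = struct.unpack(">III", buf[:PRELUDE_LEN])
    # The prelude CRC covers only the two length fields.
    if zlib.crc32(buf[:8]) != prelude_crc:
        raise ValueError("prelude CRC mismatch")
    (message_crc,) = struct.unpack(">I", buf[total_len - CRC_LEN:total_len])
    # The message CRC covers everything before it.
    if zlib.crc32(buf[:total_len - CRC_LEN]) != message_crc:
        raise ValueError("message CRC mismatch")
    headers_end = PRELUDE_LEN + headers_len
    headers = buf[PRELUDE_LEN:headers_end]
    payload = buf[headers_end:total_len - CRC_LEN]  # may legitimately be empty
    return headers, payload


def build_message(headers, payload=b""):
    """Frame headers and payload; used here only to produce test input."""
    total_len = PRELUDE_LEN + len(headers) + len(payload) + CRC_LEN
    prelude = struct.pack(">II", total_len, len(headers))
    body = prelude + struct.pack(">I", zlib.crc32(prelude)) + headers + payload
    return body + struct.pack(">I", zlib.crc32(body))


# A keep-alive style message: headers only, zero-length payload.
# The header bytes are an opaque placeholder, not real header encoding.
cont = build_message(b"placeholder-headers")
headers, payload = parse_message(cont)
```

A reader built along these lines would skip a message whose payload is empty and keep reading, rather than treating the empty payload as end of stream.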
d6a8826