
Serialization of Lists of Strings #30

Closed


691175002

I found the existing behavior when serializing length-one strings to be surprising.

s = Series(["One","Two","3"])
s.meta = MetaData(qtype=QSTRING_LIST)

k.sync('type each',s)
    0    10
    1    10
    2   -10
    dtype: int1

k.sync('{x like "Two"}', s)
    `type

I realize that QSTRING_LIST == QGENERIC_LIST, but as far as I can tell there is no way to serialize a list of strings without the risk of some of them ending up as characters.

This fork contains a pretty straightforward change - QWriter will now always serialize str as a QSTRING regardless of its length.

The bytes type retains the old behaviour: a longer bytestring still serializes as a QSTRING, while a length-one bytes value serializes as a QCHAR. The change is carried over to QReader, so QSTRING is converted to str and QCHAR is converted to bytes.
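To make the proposed rule concrete, here is a minimal sketch of the type choice described above. It is an illustrative model only, not the fork's QWriter/QReader code; the function names and constants are hypothetical, and the type codes 10 and -10 are taken from the `type each` output shown earlier.

```python
# Illustrative model of the proposed rule; not the actual QWriter/QReader code.
Q_STRING = 10   # q type code for a character list
Q_CHAR = -10    # q type code for a single character atom


def proposed_write_type(value):
    """Return the q type code the proposed rule would pick for `value`."""
    if isinstance(value, str):
        # str is always written as a string, whatever its length.
        return Q_STRING
    if isinstance(value, bytes):
        # bytes keeps the old behaviour: length one becomes a char.
        return Q_CHAR if len(value) == 1 else Q_STRING
    raise TypeError("unsupported type: %r" % type(value))


def proposed_read_type(qtype_code):
    """Return the Python type the proposed rule would produce on read."""
    if qtype_code == Q_STRING:
        return str
    if qtype_code == Q_CHAR:
        return bytes
    raise ValueError("unsupported q type code: %r" % qtype_code)


codes = [proposed_write_type(v) for v in ["One", "Two", "3"]]
print(codes)
```

Under this model all three str values map to type 10, while `b"3"` would still map to -10.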

I'm not sure if this is the best approach to the problem, or if other people even consider it an issue, but I figured I'd toss it out as an option.

@maciejlach
Collaborator

Thanks for pointing this out. The default encoding is based on the behavior of the q console, e.g.:

q)type each ("one";"two";"3")
10 10 -10h

I agree that this can be surprising and limiting in some cases. To minimize the impact on the existing code base, I would propose a different approach based on the parser configuration mechanism. The QConnection would accept a single_char_strings argument which would control the encoding of single-character strings:

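import pandas
from qpython import qconnection, MetaData
from qpython.qcollection import qlist
from qpython.qtype import QSTRING_LIST

# Option set on the connection: applies to every call made through q.
q = qconnection.QConnection(host='localhost', port=5000, single_char_strings=True)
q.open()

# pandas Series tagged as a list of strings
s = pandas.Series(["One", "Two", "3"])
s.meta = MetaData(qtype=QSTRING_LIST)

r = q('{[x] type each x}', s)
print(r)

# qlist, with the option also passed for this single call
s = qlist(['One', 'Two', '3'], qtype=QSTRING_LIST)
r = q('{[x] type each x}', s, single_char_strings=True)
print(r)

# plain Python list
s = ['One', 'Two', '3']
r = q('{[x] type each x}', s)
print(r)

q.close()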

Would yield:

[10 10 10]
[10 10 10]
[10 10 10]

Does this solution match your use case?
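The way such an option could interact with the default encoding can be sketched without a running q process. The class below is a hypothetical stand-in, not qPython's QConnection; it assumes the option defaults to off (so existing code keeps the q console behavior) and that a per-call keyword overrides the connection-level setting.

```python
# Hypothetical model of the option; not qPython's implementation.
Q_STRING = 10   # q type code for a character list
Q_CHAR = -10    # q type code for a single character atom


class FakeConnection:
    """Stand-in that only stores the connection-level option."""

    def __init__(self, single_char_strings=False):
        # Assumption: off by default, matching the existing behavior.
        self.single_char_strings = single_char_strings

    def type_each(self, strings, **options):
        # Assumption: a per-call option takes precedence over the connection's.
        flag = options.get("single_char_strings", self.single_char_strings)
        return [Q_STRING if (flag or len(s) != 1) else Q_CHAR for s in strings]


data = ["One", "Two", "3"]
default_result = FakeConnection().type_each(data)
connection_result = FakeConnection(single_char_strings=True).type_each(data)
per_call_result = FakeConnection().type_each(data, single_char_strings=True)
print(default_result, connection_result, per_call_result)
```

With the option off, the last element is encoded as a char (-10), as in the original report; with it on, at either level, all three elements are encoded as strings.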

@691175002
Author

That would probably be a better option. It would work well for me.

@maciejlach
Collaborator

The enhancement has been applied to the master and 1.1 branches.

Check the documentation for 1.1 for details.
