Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Serialization Logic #82

Merged
merged 3 commits into from
Sep 26, 2019
Merged

Improve Serialization Logic #82

merged 3 commits into from
Sep 26, 2019

Conversation

feanil
Copy link
Contributor

@feanil feanil commented Sep 25, 2019

As a part of the python3 conversion, we added logic to try to convert
bytestrings to unicode strings. This had a problem because not all
bytestrings that are produced as a part of the context passed into and
out of the jailed code execution is actually unicode strings. In some
cases this was resulting in a unicode decode error.

Before this code existed we used to just drop any nonstandard types and
also only keep complex types(list,dict,tuples) that were easily
jsonable. In python2 this was fine because byte strings would
automatically get coerced to unicode strings. However in python3 this
will throw errors. So if we don't explicitly try to convert all the
byte strings to unicode, we could stop passing back data that we
previously used to pass in.

The first iteration of this code assumed all byte arrays were unicode
strings. From evidence in production this is not always the case. So
we fall back to the behaviour we had before this change where if an item
can't successfully be converted to something we can serialize to json,
we just drop that item from the list of content being passed back
through the serialization straw.

As a part of the python3 conversion, we added logic to try to convert
bytestrings to unicode strings.  This had a problem because not all
bytestrings that are produced as a part of the context passed into and
out of the jailed code execution is actually unicode strings.  In some
cases this was resulting in a unicode decode error.

Before this code existed we used to just drop any nonstandard types and
also only keep complex types(list,dict,tuples) that were easily
jsonable.  In python2 this was fine because byte strings would
automatically get coerced to unicode strings.  However in python3 this
will throw errors.  So if we don't explicitly try to convert all the
byte strings to unicode, we could stop passing back data that we
previously used to pass in.

The first iteration of this code assumed all byte arrays were unicode
strings.  From evidence in production this is not always the case.  So
we fall back to the behaviour we had before this change where if an item
can't successfully be converted to something we can serialize to json,
we just drop that item from the list of content being passed back
through the serialization straw.
@feanil feanil requested review from jinder1s and jmbowman September 26, 2019 14:26
@feanil feanil changed the title See where it blows up if we don't have this. Improve Serialization Logic Sep 26, 2019
@@ -22,7 +22,7 @@
# Flags to let developers temporarily change some behavior in this file.

# Set this to True to log all the code and globals being executed.
LOG_ALL_CODE = False
LOG_ALL_CODE = True

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume this change should be reverted before merging?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

try:
new_value = filter_unserializable(v)
if jsonable(new_value):
new_dict[k] = new_value

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the key comes in as bytes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add that case.


ok_types = (type(None), int, float, str, six.text_type, list, tuple, dict)
serializable_dict = filter_unserializable(d)
#serializable_dict = d

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to keep this comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will remove.

globs
)
from pprint import pprint
pprint(globs)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this just for debugging?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

@feanil feanil requested a review from jmbowman September 26, 2019 19:53
@feanil feanil closed this Sep 26, 2019
@feanil feanil reopened this Sep 26, 2019
@feanil feanil merged commit de966ae into master Sep 26, 2019
@feanil feanil deleted the feanil/zip_debug branch September 26, 2019 20:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants