Use Uint8Array as internal repr of Bytes and fix #117 #195
Conversation
From discussion with @volovyks, I propose the following design (all alternatives I can think of are in the PR description). In summary, the impact on users is in the low-level APIs. Please comment if anything looks wrong or if there is a better design!
  let storageKey = this.keyPrefix + JSON.stringify(key)
  if (near.storageRemove(storageKey)) {
-   return JSON.parse(near.storageGetEvicted())
+   return JSON.parse(u8ArrayToLatin1(near.storageGetEvicted()))
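The u8ArrayToLatin1 helper used in this diff is not shown in the excerpt; a minimal sketch of what such a decoder could look like, assuming the latin1 convention discussed in this thread (each byte mapped to the char with the same 0-255 code):

```typescript
// Hypothetical sketch (not the PR's actual implementation): decode a
// Uint8Array to a JS string by mapping each byte to the same char code.
function u8ArrayToLatin1(array: Uint8Array): string {
  let ret = "";
  for (const byte of array) {
    ret += String.fromCharCode(byte); // 0-255 maps to U+0000..U+00FF
  }
  return ret;
}
```

Because every byte value 0-255 has a distinct char code, this mapping is lossless, which is what lets JSON.parse receive a faithful string view of the evicted value.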
Can we localize such conversions in api.ts?
It depends on a few decisions.
The complexity arises from a few places:
- should Bytes be an alias of string, an alias of Uint8Array, or a polymorphic type
- what should the storage* functions return: string or Uint8Array
- if they return Uint8Array, what should auto-deserialization do: Uint8Array -> string -> JSON.parse, or a binary-format deserialization?
We serialize state to JSON, which is problematic with unicode characters. Once we have the correct enforcement that storage keys/values are Uint8Array, unicode characters will be correctly rejected, but we must come up with a strategy in our SDK's auto-serialization to handle user objects with unicode string properties.
Why do you need to reject unicode characters if everything is being translated to utf8?
Throw error at any unicode string character, only allow latin1 chars.
Did you mean only allow utf8 chars? Why latin1? (curiosity) The changes feel like they're actually utf8.
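To make the latin1 restriction being discussed concrete: the idea is to reject any char code above 255, since such a character has no single-byte image. A hedged sketch (the function name is an assumption, not SDK code):

```typescript
// Sketch of the "only allow latin1 chars" rule discussed above: each
// char code 0-255 maps to exactly one byte; anything higher is rejected.
function latin1Encode(s: string): Uint8Array {
  const out = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) {
    const code = s.charCodeAt(i);
    if (code > 255) {
      throw new Error(`non-latin1 char code ${code} at index ${i}`);
    }
    out[i] = code;
  }
  return out;
}
```

Under this rule a string is just a byte sequence spelled with char codes, which is what makes the encoding unambiguous (unlike UTF-8, where one char may become several bytes).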
In general I think at a high level #195 (comment) makes sense
  readonly keyPrefix: string;

- constructor(keyPrefix: Bytes) {
+ constructor(keyPrefix: string) {
Why doesn't this accept Bytes input? Wouldn't it make more sense to store the prefix as Uint8Array, since it isn't really a utf16 string but bytes (or intended to be)?
Yeah, that makes the most sense. I'm experimenting with different approaches in collections; this one tries to keep backward compatibility, but the implementation then looks really awkward and not correct.
Isn't it backwards compatible if using Bytes? One of the variants would be string? Does it affect anything if the internal type changes?
Yes, it is backward compatible if Bytes is string | Uint8Array.
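A sketch of that union, with a hypothetical normalization helper (the name bytesToU8Array is an assumption): string inputs are interpreted as latin1, Uint8Array passes through, so existing string-based call sites stay source compatible.

```typescript
// Backward-compatible Bytes union as discussed: existing code keeps
// passing strings, new code can pass raw bytes directly.
type Bytes = string | Uint8Array;

// Hypothetical helper normalizing either variant to Uint8Array.
function bytesToU8Array(b: Bytes): Uint8Array {
  if (b instanceof Uint8Array) {
    return b;
  }
  const out = new Uint8Array(b.length);
  for (let i = 0; i < b.length; i++) {
    const code = b.charCodeAt(i);
    if (code > 255) {
      throw new Error("bytes: expected latin1 string or Uint8Array");
    }
    out[i] = code;
  }
  return out;
}
```

Low-level APIs could then accept Bytes and normalize once at the boundary, keeping conversions localized as suggested earlier in the thread.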
  }
  throw new Error("bytes: expected string or Uint8Array");
  return ret;
-  return ret;
+  return String.fromCharCode(...array);
And you can delete the lines above
The problem is ambiguity. A UTF-8 byte sequence and a string that was UTF-8 encoded into those same bytes will be serialized to the same thing, and you cannot know which one it was when deserializing.
I want to note that this bullet point is one of the possible approaches, but not the approach implemented in this PR. So yeah, your observation "The changes feel like it's actually utf8" is right. This alternative approach restricts input to JS strings made entirely of latin1 characters (char codes 0-255 only) to ensure correctness in deserialization.
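The ambiguity can be demonstrated directly: UTF-8-encoding a string yields bytes indistinguishable from a raw payload that happens to contain the same values, so a deserializer cannot recover which one it was given. Illustrative only (TextEncoder is standard in Node.js, though not available in quickjs):

```typescript
// Same bytes, two origins: once serialized you cannot tell whether the
// payload was a raw Uint8Array or a UTF-8-encoded string.
const fromString = new TextEncoder().encode("\u00e9"); // "é" -> [0xc3, 0xa9]
const rawBytes = new Uint8Array([0xc3, 0xa9]);         // unrelated raw payload

const indistinguishable =
  fromString.length === rawBytes.length &&
  rawBytes.every((b, i) => b === fromString[i]);
```

A latin1-only mapping avoids this because it is a bijection between byte values and char codes, so no two inputs collide.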
Thanks! Good to know it's a reasonable direction.
Hi all, what is the status on this PR? In the current form, it's pretty much impossible to use bytes in a contract without the data being mangled by UTF-16 conversion issues, making any application that involves raw bytes unusable. As a workaround I am base64-encoding everything, but this seems extremely inefficient. IMO, bytes should = Uint8Array and nothing else. I think this PR needs urgent attention, as the inability to use a raw byte array makes the JS SDK unusable for many applications.
@no2chem you are right, I'll look into it this week
superseded by #308 — basically what we agreed here #195 (comment) plus what @no2chem suggested are implemented. Let's review and further discuss in #308
builder.c code: JS Uint8Array -> C uint8_t * and vice versa works, ensured by a few passing tests (valueReturn: JS -> C, readRegister: C -> JS).

To discuss:
- Bytes for backward compatibility. Have Bytes be Uint8Array | string, or a class whose constructor can take a string and check that it's all in range, or just an alias of Uint8Array, or drop Bytes entirely and use Uint8Array for the low-level APIs.
- Uint8Array <-> string decode/encode. In Node.js this is typically done via Buffer (a subclass and enhanced version of Uint8Array) or TextEncoder/TextDecoder, but none of these is in quickjs. With basic String.charCodeAt/fromCharCode, I can make a simplest latin1 encoder/decoder (each uint8 is mapped to the same 0-255 char code). With the unicode C library shipped with quickjs, it seems possible to expose a UTF-8 and UTF-16 text encoder/decoder. I think these together are sufficiently good. Other ideas are welcome.

@volovyks @austinabell What are your thoughts? Thank you!
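To illustrate the round trip those tests cover, here is a toy in-memory stand-in for the host side (purely illustrative; the real valueReturn/readRegister cross the JS/C boundary in builder.c):

```typescript
// Toy mock of the host register: valueReturn stores bytes (as the C side
// would copy them into a register), readRegister hands them back intact.
class MockHost {
  private register: Uint8Array | null = null;

  valueReturn(value: Uint8Array): void {
    this.register = new Uint8Array(value); // defensive copy, like the C host
  }

  readRegister(): Uint8Array {
    if (this.register === null) throw new Error("register is empty");
    return new Uint8Array(this.register);
  }
}
```

A round trip through this mock preserves every byte value 0-255, which is the property the real JS <-> C tests verify.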