Fix issue #5336.
This issue is a little bit complicated, so please read through the details below:
A user wants to load a pb model in ML.NET. The input tensor, shown below, is a serialized Example object (a binary buffer, not a text string):
inputs['inputs'] tensor_info: dtype: DT_STRING shape: (-1) name: input_example_tensor:0
I found that a workable solution is to first convert the Example object to a protobuf-encoded byte array using:
example.ToByteArray()
then convert the byte array to a string (char array) using a reliable encoding (ideally Unicode or Base64):
Encoding.Unicode.GetString(example.ToByteArray())
ML.NET then converts the string back to a byte array with the same encoding and passes it to TF.NET:
Encoding.Unicode.GetBytes(((ReadOnlyMemory<char>)(object)data[i]).ToArray());
The method ML.NET uses to create the Tensor is CastDataAndReturnAsTensor. Previously we used UTF8 to convert the string back to a byte array. UTF8 is not a reliable encoding for arbitrary binary data, as I described in this comment, so I would like to change the encoding to Unicode.
Also, Xiaoyun recently upgraded our TF version in this PR and changed the code to use string[] instead of byte[][] to create the Tensor. In this case we need to use byte[][], because the input string is itself converted from a binary buffer (protobuf-encoded).
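To illustrate why the encoding matters, here is a minimal sketch that round-trips a small binary buffer through a string and back. The `EncodingRoundTrip` class and the sample payload are hypothetical stand-ins for `example.ToByteArray()`; they are not part of ML.NET or of this change.

```csharp
using System;
using System.Linq;
using System.Text;

public static class EncodingRoundTrip
{
    // Decode the bytes to a string with the given encoding, then encode back.
    public static byte[] RoundTrip(byte[] payload, Encoding encoding) =>
        encoding.GetBytes(encoding.GetString(payload));

    // Base64 maps any byte sequence to ASCII text and back without loss.
    public static byte[] RoundTripBase64(byte[] payload) =>
        Convert.FromBase64String(Convert.ToBase64String(payload));

    public static void Main()
    {
        // Hypothetical payload standing in for example.ToByteArray().
        // 0x96 and 0xFF are not valid UTF-8 sequences on their own.
        byte[] payload = { 0x08, 0x96, 0x01, 0xFF };

        // UTF8 replaces invalid bytes with U+FFFD, so the bytes change.
        Console.WriteLine(RoundTrip(payload, Encoding.UTF8).SequenceEqual(payload));    // False
        // UTF-16 (Encoding.Unicode) preserves this even-length buffer.
        Console.WriteLine(RoundTrip(payload, Encoding.Unicode).SequenceEqual(payload)); // True
        // Base64 preserves any buffer.
        Console.WriteLine(RoundTripBase64(payload).SequenceEqual(payload));             // True
    }
}
```

Note that `Encoding.Unicode` is not lossless for every input either: an odd-length buffer or bytes that decode to an unpaired surrogate are also replaced with U+FFFD. Base64 avoids those cases, at the cost of both sides agreeing on it.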