String encoding and decoding #253

lujjjh · 2020-12-29T16:26:04Z

There are 2 known issues in string encoding and decoding.

The first is an out-of-range problem. The buffer acquired by the code below is too small in an edge case:

dubbo-go-hessian2/string.go

Line 164 in e2494da

bufp := gxbytes.AcquireBytes(CHUNK_SIZE * 3)

The inner loop

dubbo-go-hessian2/string.go

Line 176 in e2494da

for charCount < CHUNK_SIZE {

can exit with charCount > CHUNK_SIZE (more precisely, charCount == CHUNK_SIZE + 1). The encoder may then write up to (CHUNK_SIZE + 1) * 3 bytes, which exceeds the CHUNK_SIZE * 3 bytes acquired above.

A simple reproducible test case:

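import (
	"strings"
	"testing"

	"github.com/stretchr/testify/assert"
)

// TestEncStringChunk round-trips a string that ends with an emoji at the chunk boundary.
func TestEncStringChunk(t *testing.T) {
	enc := NewEncoder()
	// CHUNK_SIZE-1 three-byte characters followed by one emoji.
	v := strings.Repeat("我", CHUNK_SIZE-1) + "🤣"
	assert.Nil(t, enc.Encode(v))
	dec := NewDecoder(enc.Buffer())
	s, err := dec.Decode()
	assert.Nil(t, err)
	assert.Equal(t, v, s)
}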

After a quick fix with

bufp := gxbytes.AcquireBytes((CHUNK_SIZE + 1) * 3)

I encountered the second issue with the same test case above:

    	Error:      	Not equal: 
    	            	expected: "我我我……我🤣"
    	            	actual  : "我我我……我🤣\x00\x00"

After bisecting, I believe this was introduced in dea1174, because the same test passes if I apply the quick fix on 8dcaa20, which is the parent of dea1174.

I haven't dived into the commit yet since it's a bit complicated.


AlexStocks · 2020-12-29T17:07:05Z

@wongoo As far as I remember, we have run into problems like this before. This one is not easy to fix.

wongoo · 2020-12-30T00:33:02Z

@zonghaishang please check this issue. I will also look into it sometime later.

wongoo · 2020-12-31T00:36:23Z

ref: #252

wongoo · 2021-01-03T03:31:53Z

The current chunked string decoding algorithm is complex and hard to maintain. I will try to refactor it.

wongoo · 2021-01-06T00:34:39Z

It's fixed in https://github.com/apache/dubbo-go-hessian2/releases/tag/v1.8.1

lujjjh · 2021-01-09T16:30:39Z

#254 does not actually fix this case. I've created a pull request.

Fix #253: Acquire sufficient bytes for string encoding buffers

wongoo mentioned this issue Jan 3, 2021

Fix emoji decoding error #254

Merged

wongoo closed this as completed Jan 6, 2021

lujjjh mentioned this issue Jan 9, 2021

Fix #253: Acquire sufficient bytes for string encoding buffers #255

Merged

AlexStocks added a commit that referenced this issue Jan 12, 2021

Merge pull request #255 from lujjjh/fix/enc-string-chunk

c052455

Fix #253: Acquire sufficient bytes for string encoding buffers

zhaoyunxing92 pushed a commit that referenced this issue Sep 4, 2021

Fix #253: Acquire sufficient bytes for string encoding buffers

c1644b5

zhaoyunxing92 pushed a commit that referenced this issue Sep 4, 2021

Merge pull request #255 from lujjjh/fix/enc-string-chunk

967a9e2

Fix #253: Acquire sufficient bytes for string encoding buffers
