Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String encoding and decoding #253

Closed
lujjjh opened this issue Dec 29, 2020 · 6 comments · Fixed by #255
Closed

String encoding and decoding #253

lujjjh opened this issue Dec 29, 2020 · 6 comments · Fixed by #255

Comments

@lujjjh
Copy link
Contributor

lujjjh commented Dec 29, 2020

There are 2 known issues in string encoding and decoding.

The first one is an out-of-range problem that the code below doesn't handle edge cases well:

bufp := gxbytes.AcquireBytes(CHUNK_SIZE * 3)

The inner loop

for charCount < CHUNK_SIZE {

could actually exit with charCount > CHUNK_SIZE (or more precisely, charCount == CHUNK_SIZE + 1). The maximum bytes taken could be (CHUNK_SIZE + 1) * 3.

A simple reproducible test case:

func TestEncStringChunk(t *testing.T) {
	enc := NewEncoder()
	v := strings.Repeat("我", CHUNK_SIZE-1) + "🤣"
	assert.Nil(t, enc.Encode(v))
	dec := NewDecoder(enc.Buffer())
	s, err := dec.Decode()
	assert.Nil(t, err)
	assert.Equal(t, v, s)
}

After a quick fix with

bufp := gxbytes.AcquireBytes((CHUNK_SIZE + 1) * 3)

I encountered the second issue with the same test case above:

    	Error:      	Not equal: 
    	            	expected: "我我我……我🤣"
   	            	actual  : "我我我……我🤣\x00\x00"

After bisection, I assume this was introduced in dea1174 because the same test could be passed if I apply the quick fix on 8dcaa20, which is the parent of dea1174.

I haven't dived into the commit yet since it's a bit complicated.

@AlexStocks
Copy link
Contributor

@wongoo we have met such problems in my memory. It is not so easy to fix this problem.

@wongoo
Copy link
Contributor

wongoo commented Dec 30, 2020

@zonghaishang pls check this issue, I will also go into sometime later

@wongoo
Copy link
Contributor

wongoo commented Dec 31, 2020

ref: #252

@wongoo
Copy link
Contributor

wongoo commented Jan 3, 2021

the current chunk string decoding algorithm is complex, and hard to maintain. I will try to refactor it.

@wongoo
Copy link
Contributor

wongoo commented Jan 6, 2021

@lujjjh
Copy link
Contributor Author

lujjjh commented Jan 9, 2021

#254 does not actually fix this case. I've created a pull request.

AlexStocks added a commit that referenced this issue Jan 12, 2021
Fix #253: Acquire sufficient bytes for string encoding buffers
zhaoyunxing92 pushed a commit that referenced this issue Sep 4, 2021
Fix #253: Acquire sufficient bytes for string encoding buffers
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants