Perf decode string #188
Codecov Report
@@ Coverage Diff @@
## master #188 +/- ##
==========================================
- Coverage 67.63% 66.50% -1.13%
==========================================
Files 22 22
Lines 2688 2762 +74
==========================================
+ Hits 1818 1837 +19
- Misses 665 713 +48
- Partials 205 212 +7
Continue to review full report at Codecov.
decode_test.go
Outdated
@@ -126,3 +127,16 @@ func testDecodeFrameworkFunc(t *testing.T, method string, expected func(interfac
}
expected(r)
}

func BenchmarkDecodeStringOptimized(t *testing.B) {
Place the decode string benchmark in string_test.go.
ok, fixed.
d.reader.Reset(bytes.NewReader(b))
d.typeRefs = &TypeRefs{records: map[string]bool{}}

if d.refs != nil {
The following two lines are enough; you do not need to write two if clauses.
d.refs = nil
d.classInfoList = nil
If d.refs == nil or d.classInfoList == nil already, the assignment does nothing.
agree with @AlexStocks , just reset them
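A minimal sketch of the suggestion, assuming a hypothetical `decoder` struct that models only the two fields under discussion (the field types are guesses, not the library's actual definitions). It shows that unconditional assignment is safe even when the fields are already nil:

```go
package main

import "fmt"

// decoder is a hypothetical stand-in for the real Decoder; only the two
// fields discussed in the review are modeled, with assumed types.
type decoder struct {
	refs          []interface{}
	classInfoList []string
}

// reset clears per-message state. Assigning nil to a field that is already
// nil changes nothing, so no guarding if clause is needed.
func (d *decoder) reset() {
	d.refs = nil
	d.classInfoList = nil
}

func main() {
	d := &decoder{refs: []interface{}{"a"}}
	d.reset()
	d.reset() // calling again on already-nil fields is harmless
	fmt.Println(d.refs == nil, d.classInfoList == nil) // prints "true true"
}
```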
string.go
Outdated

// quickly detect the actual number of bytes
prev := offset
for i, len := offset, offset+nread; i < len; chunkLen-- {
It looks like offset and i serve much the same purpose. Can i be removed?
optimized.
string.go
Outdated

// the expected length string has been processed.
if chunkLen <= 0 {
if last {
This if and its body can be removed, because once it is gone the continue below jumps to the same code anyway.
removed already.

if chunkLen < 0 {
Just set chunkLen to 0 directly, because the outer check is chunkLen <= 0.
if chunkLen < 0 {
chunkLen = 0
}
if charLen < 0 {
charLen should never be less than 0, should it?
charLen = 0
}

chunkLen += charLen
Just write chunkLen = charLen. From the analysis above, chunkLen must be 0 at this point.
}

// decode byte
ch, err := d.readByte()
Why decode once more at the end? Is there a corner case that the loop above does not cover? Could you explain?
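Two purposes:
- Trigger the buffer to fill with data.
- The preceding code only guarantees that chunkLen bytes have been read and that every char's UTF-8 bytes are read in full (so the bytes of a single char are never split), but it may not have processed all chunkLen characters. The subsequent reads process the remaining characters.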
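The second point can be illustrated with a minimal Go sketch. This is not the PR's code: `utf8Len` and `scan` are hypothetical helpers, and the sketch assumes each UTF-8 sequence counts as one character, which sidesteps the fact that Hessian measures string length in 16-bit units. Reading as many bytes as the expected character count either leaves characters unprocessed or cuts a multi-byte character in half, so more reads are needed in both cases:

```go
package main

import "fmt"

// utf8Len returns the sequence length implied by a UTF-8 leading byte,
// or 0 if b cannot start a sequence.
func utf8Len(b byte) int {
	switch {
	case b < 0x80:
		return 1
	case b&0xE0 == 0xC0:
		return 2
	case b&0xF0 == 0xE0:
		return 3
	case b&0xF8 == 0xF0:
		return 4
	}
	return 0
}

// scan walks buf and reports how many complete characters it holds (at most
// want), how many bytes they occupy, and how many more bytes are needed to
// finish a character that was cut off at the end of buf.
func scan(buf []byte, want int) (chars, used, missing int) {
	for chars < want && used < len(buf) {
		n := utf8Len(buf[used])
		if n == 0 {
			n = 1 // assumption: step over an invalid byte; a real decoder would report an error
		}
		if used+n > len(buf) {
			missing = used + n - len(buf)
			break
		}
		used += n
		chars++
	}
	return chars, used, missing
}

func main() {
	// "héllo" is 5 characters but 6 bytes; reading 5 bytes yields only 4 characters.
	fmt.Println(scan([]byte("héllo")[:5], 5)) // prints "4 5 0"
	// "hé" is 2 characters but 3 bytes; reading 2 bytes splits the second character.
	fmt.Println(scan([]byte("hé")[:2], 2)) // prints "1 1 1"
}
```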
This PR limits hessian to UTF-8 characters of at most three bytes, so four-byte mathematical symbols and emoji will not be supported. Moreover, a UTF-8 character could be up to six bytes long in the original UTF-8 definition (RFC 3629 now limits it to four). Worst of all, the string length in the hessian definition is the length of a 16-bit Unicode character string encoded in UTF-8.
ref
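To make the length mismatch concrete, here is a small Go sketch using only the standard library (`lengths` is a hypothetical helper name). It compares three notions of length for the same string: UTF-8 bytes, Unicode code points, and UTF-16 code units, the last being the 16-bit unit the comment above refers to:

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// lengths reports the size of s in UTF-8 bytes, Unicode code points (runes),
// and UTF-16 code units.
func lengths(s string) (bytes, runes, units int) {
	return len(s), utf8.RuneCountInString(s), len(utf16.Encode([]rune(s)))
}

func main() {
	for _, s := range []string{"abc", "中文", "emoji🤣"} {
		b, r, u := lengths(s)
		fmt.Printf("%q bytes=%d runes=%d utf16=%d\n", s, b, r, u)
	}
	// "abc"     bytes=3 runes=3 utf16=3
	// "中文"    bytes=6 runes=2 utf16=2
	// "emoji🤣" bytes=9 runes=6 utf16=7
}
```

The emoji is one rune but four UTF-8 bytes and two UTF-16 code units (a surrogate pair), so a decoder that counts characters per UTF-8 sequence and one that counts 16-bit units disagree on the length of the last string.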
Maybe you can open an issue for dubbo hessian2.
I'll look into the byte encoding of emoji in Go first; there may be a way to fix it.
Glad to hear that.
In fact, when decoding you cannot know how many bytes to read until you have analyzed all the bytes. A possible improvement is to read a chunk-length number of bytes first, then loop, analyzing what was read and reading more until the chunk length is reached. This may improve performance in most situations.
The current optimization already includes this idea.
Codecov Report
@@ Coverage Diff @@
## master #188 +/- ##
==========================================
- Coverage 67.63% 66.01% -1.63%
==========================================
Files 22 22
Lines 2688 2845 +157
==========================================
+ Hits 1818 1878 +60
- Misses 665 749 +84
- Partials 205 218 +13
Continue to review full report at Codecov.
@wongoo please take a look.
LGTM
Two votes in support; merging to master now.
func TestStringEmoji(t *testing.T) {
// see: test_hessian/src/main/java/test/TestString.java
s0 := "emoji🤣"
s0 += ",max" + string(rune(0x10FFFF))

// todo the hessian-decoded byte array is obtained correctly here, but the string is not built from runes, so the emoji displays incorrectly; modify the assert???
remove the comment
OK, I just submitted a PR to fix it: #194
@wongoo are there any other problems? If not, I think we can keep this comment. @zonghaishang, you can translate this comment into English.
What this PR does:
Which issue(s) this PR fixes:
Fixes #186
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
Optimization approach: