Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perf decode string #188

Merged
merged 8 commits into from
May 28, 2020
Merged

Perf decode string #188

merged 8 commits into from
May 28, 2020

Conversation

zonghaishang
Copy link
Member

@zonghaishang zonghaishang commented May 14, 2020

What this PR does:

Which issue(s) this PR fixes:

Fixes ##186

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

优化思路:

  1. 直接使用utf-8 byte解码,性能最高;之前先解码成 rune, 对rune解码成string,及其耗费性能
  2. 增加批量string chunk copy, 降低read调用
  3. 使用unsafe转换string

@codecov-io
Copy link

codecov-io commented May 14, 2020

Codecov Report

Merging #188 into master will decrease coverage by 1.12%.
The diff coverage is 62.96%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #188      +/-   ##
==========================================
- Coverage   67.63%   66.50%   -1.13%     
==========================================
  Files          22       22              
  Lines        2688     2762      +74     
==========================================
+ Hits         1818     1837      +19     
- Misses        665      713      +48     
- Partials      205      212       +7     
Impacted Files Coverage Δ
decode.go 66.66% <38.46%> (-3.53%) ⬇️
string.go 57.73% <66.31%> (-10.41%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bb036e6...fa6429e. Read the comment docs.

@zonghaishang zonghaishang requested a review from wongoo May 14, 2020 14:08
decode_test.go Outdated
@@ -126,3 +127,16 @@ func testDecodeFrameworkFunc(t *testing.T, method string, expected func(interfac
}
expected(r)
}

func BenchmarkDecodeStringOptimized(t *testing.B) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Place decode string benchmark to string_test.go.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, fixed.

string.go Show resolved Hide resolved
decode.go Outdated Show resolved Hide resolved
d.reader.Reset(bytes.NewReader(b))
d.typeRefs = &TypeRefs{records: map[string]bool{}}

if d.refs != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the two following lines is enough. u do not need write two if clauses.

d.refs = nil
d.classInfoList = nil

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if d.refs == nill or d.classInfoList == nil already, do nothing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree with @AlexStocks , just reset them

string_test.go Show resolved Hide resolved
string.go Outdated

// quickly detect the actual number of bytes
prev := offset
for i, len := offset, offset+nread; i < len; chunkLen-- {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

看起来offseti的作用差不多。可以把i移除掉么?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optimized.

string.go Outdated

// the expected length string has been processed.
if chunkLen <= 0 {
if last {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个ifif里面的内容可以去掉。因为去掉之后,下面continue也会跳到相同的代码

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed already.


if chunkLen < 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

直接把chunkLen设置成0就行了,因为外面的判断是chunkLen<=0.

if chunkLen < 0 {
chunkLen = 0
}
if charLen < 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

charLen应该不会小于0吧?

charLen = 0
}

chunkLen += charLen
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

直接写chunkLen=charLen就行了。因为从上面的分析看,此时的chunkLen一定等于0.

}

// decode byte
ch, err := d.readByte()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

为啥最后要decode一下?上面的循环解析cover不掉什么corner case么?能解释一下么?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

两个目的:

  1. 触发buffer fill data.
  2. 前面只能保证读取chunkLen字节数, 并且读取整个char的utf-8编码字节(保证通过char的字节不被拆分),但是可能并没有处理所有chunkLen个数的字符。后续的read可以处理完所有chunkLen字符。

Copy link
Contributor

@wongoo wongoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this PR will let the hessian only support at most three bytes length utf8, while four bytes mathematical symbols, and emoji will not supprted. even more the max length of a utf8 character can be six. And the worst is the length of string in hessian definition is the length of A 16-bit unicode character string encoded in UTF-8.
ref

d.reader.Reset(bytes.NewReader(b))
d.typeRefs = &TypeRefs{records: map[string]bool{}}

if d.refs != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree with @AlexStocks , just reset them

@zonghaishang
Copy link
Member Author

this PR will let the hessian only support at most three bytes length utf8, while four bytes mathematical symbols, and emoji will not supprted. even more the max length of a utf8 character can be six. And the worst is the length of string in hessian definition is the length of A 16-bit unicode character string encoded in UTF-8.
ref

@AlexStocks
Copy link
Contributor

this PR will let the hessian only support at most three bytes length utf8, while four bytes mathematical symbols, and emoji will not supprted. even more the max length of a utf8 character can be six. And the worst is the length of string in hessian definition is the length of A 16-bit unicode character string encoded in UTF-8.
ref

maybe u can set an issue for dubbo hessian2.

@zonghaishang
Copy link
Member Author

I'll explore the byte code for the go emoji first, and there may be a way to fix it

@AlexStocks
Copy link
Contributor

I'll explore the byte code for the go emoji first, and there may be a way to fix it

glad to hear that.

@wongoo
Copy link
Contributor

wongoo commented May 16, 2020

in fact, when decoding you can't know how many bytes you should read, until you analysis all bytes. A possible improvement may be you can first read a chunk length of bytes first, and then loop analysis and read more until reach the length of chunk. This may be useful to improve performance for most situation.

@zonghaishang
Copy link
Member Author

in fact, when decoding you can't know how many bytes you should read, until you analysis all bytes. A possible improvement may be you can first read a chunk length of bytes first, and then loop analysis and read more until reach the length of chunk. This may be useful to improve performance for most situation.

The current optimization already includes this idea.

@codecov-commenter
Copy link

Codecov Report

Merging #188 into master will decrease coverage by 1.62%.
The diff coverage is 56.38%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #188      +/-   ##
==========================================
- Coverage   67.63%   66.01%   -1.63%     
==========================================
  Files          22       22              
  Lines        2688     2845     +157     
==========================================
+ Hits         1818     1878      +60     
- Misses        665      749      +84     
- Partials      205      218      +13     
Impacted Files Coverage Δ
decode.go 66.66% <38.46%> (-3.53%) ⬇️
string.go 55.36% <57.71%> (-12.78%) ⬇️
request.go 60.57% <0.00%> (+0.57%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bb036e6...39b42f7. Read the comment docs.

@zonghaishang
Copy link
Member Author

@wongoo please take a look.

Copy link
Contributor

@fangyincheng fangyincheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zonghaishang
Copy link
Member Author

2 votes support, merge to master now.

@zonghaishang zonghaishang merged commit dea1174 into apache:master May 28, 2020
func TestStringEmoji(t *testing.T) {
// see: test_hessian/src/main/java/test/TestString.java
s0 := "emoji🤣"
s0 += ",max" + string(rune(0x10FFFF))

// todo 这里正确拿到hessian解码字节数组,但是构造string的时候,不是rune,emoji表情符号显示有点问题,修改assert???
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the comment

Copy link
Member Author

@zonghaishang zonghaishang May 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I just submitted pr to fix it: #194

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wongoo are there any other problems? If not, we can reserve this comment in my opinion. @zonghaishang , u can translate this comment into english.

zhaoyunxing92 pushed a commit that referenced this pull request Sep 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants