Perf decode string #188

zonghaishang · 2020-05-14T13:49:28Z

What this PR does:

Which issue(s) this PR fixes:

Fixes #186

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

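Optimization approach:

Decode directly from the UTF-8 bytes, which is the fastest path; previously the decoder first decoded into runes and then converted the runes to a string, which was very expensive.
Add bulk copying of string chunks to reduce the number of read calls.
Use unsafe to convert the byte slice to a string.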
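
The unsafe conversion in the last point can be sketched as follows. This is a minimal illustration, not the PR's code: the helper name bytesToString is hypothetical, and the sketch assumes the byte slice is never modified after the conversion, because the resulting string shares its memory.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString reinterprets b as a string without copying the bytes.
// Hypothetical helper: the caller must not modify b afterwards, since
// the returned string shares b's backing array.
func bytesToString(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

func main() {
	b := []byte("hessian")
	fmt.Println(bytesToString(b)) // prints hessian
}
```

The trade-off is one less allocation and copy per decoded string in exchange for giving up Go's usual guarantee that strings are immutable.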

codecov-io · 2020-05-14T13:53:27Z

Codecov Report

Merging #188 into master will decrease coverage by 1.12%.
The diff coverage is 62.96%.

@@            Coverage Diff             @@
##           master     #188      +/-   ##
==========================================
- Coverage   67.63%   66.50%   -1.13%     
==========================================
  Files          22       22              
  Lines        2688     2762      +74     
==========================================
+ Hits         1818     1837      +19     
- Misses        665      713      +48     
- Partials      205      212       +7

Impacted Files	Coverage Δ
decode.go	`66.66% <38.46%> (-3.53%)`	⬇️
string.go	`57.73% <66.31%> (-10.41%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bb036e6...fa6429e.

gaoxinge · 2020-05-15T03:17:18Z

decode_test.go

@@ -126,3 +127,16 @@ func testDecodeFrameworkFunc(t *testing.T, method string, expected func(interfac
 	}
 	expected(r)
 }
+
+func BenchmarkDecodeStringOptimized(t *testing.B) {


Move the string-decoding benchmark to string_test.go.

string.go

decode.go

AlexStocks · 2020-05-15T08:45:05Z

decode.go

+	d.reader.Reset(bytes.NewReader(b))
+	d.typeRefs = &TypeRefs{records: map[string]bool{}}
+
+	if d.refs != nil {


The following two lines are enough; you do not need to write two if clauses.

d.refs = nil
d.classInfoList = nil

If d.refs or d.classInfoList is already nil, the assignment changes nothing.

Agree with @AlexStocks, just reset them.
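
The suggestion above can be sketched as follows. The Decoder type and its field types here are hypothetical stand-ins that only mirror the two field names mentioned in the review; the real struct in decode.go may differ.

```go
package main

import "fmt"

// Decoder is a hypothetical stand-in for the decoder discussed in the
// review; only the two fields named in the comment are modelled.
type Decoder struct {
	refs          []interface{}
	classInfoList []string
}

// reset clears both fields unconditionally. Assigning nil to a field
// that is already nil has no effect, so no guarding if is needed.
func (d *Decoder) reset() {
	d.refs = nil
	d.classInfoList = nil
}

func main() {
	d := &Decoder{refs: []interface{}{1, 2}}
	d.reset()
	d.reset() // safe to call again on an already-reset decoder
	fmt.Println(d.refs == nil, d.classInfoList == nil) // prints true true
}
```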

string_test.go

gaoxinge · 2020-05-15T14:23:18Z

string.go

+
+				// quickly detect the actual number of bytes
+				prev := offset
+				for i, len := offset, offset+nread; i < len; chunkLen-- {


offset and i seem to serve almost the same purpose. Can i be removed?

gaoxinge · 2020-05-15T14:54:49Z

string.go

+
+				// the expected length string has been processed.
+				if chunkLen <= 0 {
+					if last {


This if and its body can be removed. Without it, the continue below jumps to the same code anyway.

Removed already.

gaoxinge · 2020-05-15T14:58:03Z

string.go


+					if chunkLen < 0 {


Just set chunkLen to 0 directly, since the enclosing check is chunkLen <= 0.

gaoxinge · 2020-05-15T14:59:37Z

string.go

+					if chunkLen < 0 {
+						chunkLen = 0
+					}
+					if charLen < 0 {


charLen should never be less than 0, should it?

gaoxinge · 2020-05-15T15:00:38Z

string.go

+						charLen = 0
+					}
+
+					chunkLen += charLen


Just write chunkLen = charLen. From the analysis above, chunkLen must be 0 at this point.

gaoxinge · 2020-05-15T15:05:38Z

string.go

+			}
+
+			// decode byte
+			ch, err := d.readByte()


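Why is one more decode needed at the end? Is there a corner case that the loop above cannot cover? Could you explain?

Two purposes:

It triggers the buffer to fill with data.

The code above only guarantees that chunkLen bytes have been read and that the UTF-8 bytes of each char are read in full (the bytes of a single char are never split). It may not have processed all chunkLen characters. The subsequent reads process the remaining characters.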
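
The second point can be illustrated with a small sketch that counts how many complete characters a byte buffer holds without splitting a trailing, partially read character. The names seqLen and completeChars are hypothetical and not taken from string.go, and the sketch counts one character per UTF-8 sequence, which is a simplification of how Hessian counts string length.

```go
package main

import "fmt"

// seqLen returns the UTF-8 sequence length implied by a lead byte.
// Continuation bytes and invalid lead bytes are treated as length 1.
func seqLen(lead byte) int {
	switch {
	case lead&0xE0 == 0xC0:
		return 2
	case lead&0xF0 == 0xE0:
		return 3
	case lead&0xF8 == 0xF0:
		return 4
	default:
		return 1
	}
}

// completeChars reports how many whole characters buf contains and how
// many bytes they occupy. A character cut off at the end of buf is not
// counted, so its bytes are left for a later read to complete.
func completeChars(buf []byte) (chars, consumed int) {
	for consumed < len(buf) {
		n := seqLen(buf[consumed])
		if consumed+n > len(buf) {
			break // trailing character is incomplete
		}
		consumed += n
		chars++
	}
	return chars, consumed
}

func main() {
	// "a" (1 byte), "中" (3 bytes), then only the first 2 of 3 bytes of "文".
	buf := append([]byte("a中"), []byte("文")[:2]...)
	fmt.Println(completeChars(buf)) // prints 2 4
}
```

Six bytes were read here but only two characters are complete, which is why a further read is still needed after the bulk copy.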

wongoo

This PR limits hessian to UTF-8 sequences of at most three bytes, so four-byte mathematical symbols and emoji will not be supported. Moreover, the maximum length of a UTF-8 character can be six bytes. Worst of all, the string length in the hessian definition is the length of a 16-bit Unicode character string encoded in UTF-8.
ref
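
To make the length issue concrete, the sketch below compares three ways of measuring the same string: UTF-8 bytes, Unicode code points, and 16-bit (UTF-16) code units. The function name lengths is hypothetical; the sketch only uses the Go standard library and is not code from this PR.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// lengths reports the size of s in UTF-8 bytes, Unicode code points,
// and UTF-16 code units. A character outside the Basic Multilingual
// Plane, such as an emoji, takes two UTF-16 code units.
func lengths(s string) (nbytes, runes, units int) {
	return len(s), utf8.RuneCountInString(s), len(utf16.Encode([]rune(s)))
}

func main() {
	fmt.Println(lengths("emoji🤣")) // prints 9 6 7
}
```

A decoder that is told the length in 16-bit units therefore cannot derive the number of bytes to read from the length alone.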

wongoo · 2020-05-16T03:11:00Z

decode.go

+	d.reader.Reset(bytes.NewReader(b))
+	d.typeRefs = &TypeRefs{records: map[string]bool{}}
+
+	if d.refs != nil {


agree with @AlexStocks , just reset them

zonghaishang · 2020-05-16T07:47:29Z

This PR limits hessian to UTF-8 sequences of at most three bytes, so four-byte mathematical symbols and emoji will not be supported. Moreover, the maximum length of a UTF-8 character can be six bytes. Worst of all, the string length in the hessian definition is the length of a 16-bit Unicode character string encoded in UTF-8.
ref

https://en.wikipedia.org/wiki/UTF-8

http://hessian.caucho.com/doc/hessian-serialization.html#string

Fix: emoji encode error #131
@wongoo
Any suggestions for fixing it? I've tried parsing 4-byte UTF-8 before, but Java doesn't write 4-byte UTF-8.
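
One possible way to handle this, sketched under the assumption that the Java side writes each 16-bit code unit of a surrogate pair as its own three-byte sequence: decode both units and combine them with utf16.DecodeRune. The helper names decodeUnit and decodePair are hypothetical, and this is not necessarily how the PR ended up solving it.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// decodeUnit decodes one three-byte sequence of the form
// 1110xxxx 10xxxxxx 10xxxxxx into a 16-bit code unit.
// It assumes b holds at least three well-formed bytes.
func decodeUnit(b []byte) rune {
	return rune(b[0]&0x0F)<<12 | rune(b[1]&0x3F)<<6 | rune(b[2]&0x3F)
}

// decodePair combines two three-byte sequences holding a high and a
// low surrogate into one code point. utf16.DecodeRune returns U+FFFD
// if the two units do not form a valid surrogate pair.
func decodePair(b []byte) rune {
	return utf16.DecodeRune(decodeUnit(b[0:3]), decodeUnit(b[3:6]))
}

func main() {
	// U+1F923 as the surrogate pair D83E DD23, each unit in three bytes.
	b := []byte{0xED, 0xA0, 0xBE, 0xED, 0xB4, 0xA3}
	fmt.Println(string(decodePair(b))) // prints 🤣
}
```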

AlexStocks · 2020-05-16T07:49:55Z

This PR limits hessian to UTF-8 sequences of at most three bytes, so four-byte mathematical symbols and emoji will not be supported. Moreover, the maximum length of a UTF-8 character can be six bytes. Worst of all, the string length in the hessian definition is the length of a 16-bit Unicode character string encoded in UTF-8.
ref

https://en.wikipedia.org/wiki/UTF-8

http://hessian.caucho.com/doc/hessian-serialization.html#string

Fix: emoji encode error #131
@wongoo
Any suggestions for fixing it? I've tried parsing 4-byte UTF-8 before, but Java doesn't write 4-byte UTF-8.

Maybe you can open an issue for dubbo hessian2.

zonghaishang · 2020-05-16T08:20:37Z

I'll explore the byte code for the go emoji first, and there may be a way to fix it

AlexStocks · 2020-05-16T08:29:12Z

I'll explore the byte code for the go emoji first, and there may be a way to fix it

glad to hear that.

wongoo · 2020-05-16T09:18:17Z

In fact, when decoding you can't know how many bytes to read until you have analyzed all the bytes. A possible improvement: first read chunk-length bytes, then loop, analyzing them and reading more until the chunk length is reached. This may improve performance in most situations.

zonghaishang · 2020-05-17T14:47:53Z

In fact, when decoding you can't know how many bytes to read until you have analyzed all the bytes. A possible improvement: first read chunk-length bytes, then loop, analyzing them and reading more until the chunk length is reached. This may improve performance in most situations.

The current optimization already includes this idea.

codecov-commenter · 2020-05-25T12:40:56Z

Codecov Report

Merging #188 into master will decrease coverage by 1.62%.
The diff coverage is 56.38%.

@@            Coverage Diff             @@
##           master     #188      +/-   ##
==========================================
- Coverage   67.63%   66.01%   -1.63%     
==========================================
  Files          22       22              
  Lines        2688     2845     +157     
==========================================
+ Hits         1818     1878      +60     
- Misses        665      749      +84     
- Partials      205      218      +13

Impacted Files	Coverage Δ
decode.go	`66.66% <38.46%> (-3.53%)`	⬇️
string.go	`55.36% <57.71%> (-12.78%)`	⬇️
request.go	`60.57% <0.00%> (+0.57%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bb036e6...39b42f7.

zonghaishang · 2020-05-26T10:35:44Z

@wongoo please take a look.

fangyincheng

LGTM

zonghaishang · 2020-05-28T02:03:45Z

2 votes support, merge to master now.

wongoo · 2020-05-28T04:38:06Z

string_test.go

 func TestStringEmoji(t *testing.T) {
 	// see: test_hessian/src/main/java/test/TestString.java
 	s0 := "emoji🤣"
 	s0 += ",max" + string(rune(0x10FFFF))

+	// todo 这里正确拿到hessian解码字节数组，但是构造string的时候，不是rune，emoji表情符号显示有点问题，修改assert？？？


Remove the comment.

OK, I just submitted a PR to fix it: #194

@wongoo are there any other problems? If not, I think we can keep this comment. @zonghaishang, you can translate this comment into English.

zonghaishang added 2 commits May 14, 2020 21:46

Optimize hessian string decoding performance, a 54% improvement

8eda6f1

optimize code.

f3584d9

optimize code.

b716f80

zonghaishang requested a review from wongoo May 14, 2020 14:08

gaoxinge reviewed May 15, 2020

fix code review.

747fa6e

AlexStocks requested review from fangyincheng and pantianying May 15, 2020 04:56

AlexStocks reviewed May 15, 2020

optimize codes.

84321ab

AlexStocks reviewed May 15, 2020

gaoxinge reviewed May 15, 2020

wongoo reviewed May 16, 2020

zonghaishang added 2 commits May 16, 2020 16:47

optimize cods.

3a72da5

optimize code.

fa6429e

support for decode emoji.

39b42f7

AlexStocks approved these changes May 26, 2020

fangyincheng approved these changes May 27, 2020

zonghaishang merged commit dea1174 into apache:master May 28, 2020

wongoo reviewed May 28, 2020

zhaoyunxing92 pushed a commit that referenced this pull request Sep 4, 2021

Merge pull request #188 from zonghaishang/perf_decode_string

323b16b

Perf decode string
