Performance Enhancements #69

wolfendale · 2024-07-05T15:45:54Z

Changes:

Since deploying the 2.0.3 release in production for the friends service, we noticed a bug that leaks goroutines. The fix has already been applied in production; this PR backports it.

When observing the behaviour of the friends server, we noticed it was allocating a lot of memory. Currently we allocate a 64 KB buffer every time a packet is sent to the server. That buffer is only supposed to be held temporarily, but it seems the buffers are not being collected properly. With this change, each goroutine in the fixed pool that responds to incoming packets has its own 64 KB buffer, which is reused for every incoming packet. This should greatly reduce memory allocations.

I have read and agreed to the Code of Conduct.
I have read and complied with the contributing guidelines.
What I'm implementing was an approved issue.
I have tested all of my changes.

jonbarrow · 2024-07-05T16:12:18Z

The live server also hot patched these methods to no longer spawn goroutines. This change should probably be backported as well.

We can also see that DeriveKerberosKey is producing a lot of allocations and using a lot of memory. According to pprof, DeriveKerberosKey accounts for about 37% of the application's total allocated space:

File: friends
Type: alloc_space
Time: Jul 5, 2024 at 11:52am (EDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 945832.13MB, 97.45% of 970564.26MB total
Dropped 758 nodes (cum <= 4852.82MB)
Showing top 10 nodes out of 19
      flat     flat%    sum%        cum    cum%
588871.18MB    60.67%   60.67% 592720.55MB 61.07%  github.com/PretendoNetwork/nex-go/v2.(*PRUDPServer).listenDatagram
356758.44MB    36.76%   97.43% 356758.44MB 36.76%  github.com/PretendoNetwork/nex-go/v2.DeriveKerberosKey
      137MB    0.014%   97.45% 371559.24MB 38.28%  github.com/PretendoNetwork/nex-go/v2.(*PRUDPEndPoint).processPacket
    40.51MB    0.0042%  97.45% 286847.14MB 29.55%  github.com/PretendoNetwork/nex-go/v2.(*PRUDPEndPoint).handleReliable
       18MB    0.0019%  97.45%   4935.45MB  0.51%  database/sql.(*DB).query
        5MB    0.00052% 97.45% 280457.63MB 28.90%  github.com/PretendoNetwork/nex-protocols-common-go/v2/ticket-granting.generateTicket
     1.50MB    0.00015% 97.45%  78875.56MB  8.13%  github.com/PretendoNetwork/nex-go/v2.(*PRUDPEndPoint).handleConnect
     0.50MB    5.2e-05% 97.45%  77077.36MB  7.94%  github.com/PretendoNetwork/nex-go/v2.(*PRUDPEndPoint).readKerberosTicket
          0    0%       97.45%   4935.45MB  0.51%  database/sql.(*DB).QueryContext
          0    0%       97.45%   4935.45MB  0.51%  database/sql.(*DB).QueryContext.func1

We can definitely do better, and this would be a good PR to address it.

Using an implementation like this results in significantly fewer allocations:

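// DeriveKerberosKey derives a Kerberos key by hashing repeatedly with MD5 (crypto/md5).
func DeriveKerberosKey(pid int, password []byte) []byte {
	// The iteration count depends on the PID: 65000 to 66023 rounds for non-negative PIDs.
	iterationCount := 65000 + pid%1024
	// One md5.Size (16-byte) buffer is allocated up front and reused for every round.
	// copy fills at most md5.Size bytes of it from password.
	key := make([]byte, md5.Size)
	copy(key, password)

	for i := 0; i < iterationCount; i++ {
		// md5.Sum returns a [16]byte value, so no per-round slice is allocated.
		hash := md5.Sum(key)
		copy(key, hash[:])
	}

	return key
}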

This has not been tested on a live server, though; I only ran benchmarks:

goos: linux
goarch: amd64
pkg: github.com/PretendoNetwork/nex-go/v2/kerb
cpu: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
BenchmarkOriginalDeriveKerberosKey-8         	    100	 10546657 ns/op	1040917 B/op	  65057 allocs/op
BenchmarkImprovedDeriveKerberosKey-8         	    174	  6740704 ns/op	     16 B/op	      1 allocs/op
BenchmarkOriginalDeriveKerberosKeyMemory-8   	    100	 10092520 ns/op	1040915 B/op	  65057 allocs/op
BenchmarkImprovedDeriveKerberosKeyMemory-8   	    172	  6807064 ns/op	     16 B/op	      1 allocs/op
PASS
ok  	github.com/PretendoNetwork/nex-go/v2/kerb	6.959s

This doesn't address the MD5 CPU usage issue, but it drastically reduces memory usage.

jonbarrow · 2024-07-05T16:42:21Z

LGTM 👍

timeout_manager.go


DaniElectra

LGTM

wolfendale added 3 commits July 5, 2024 16:37

fix: removed double lock from reliable ping packet handling

348c37d

chore: limit the scope of connection locking in TimeoutManager

4fefb05

chore: reduce buffer allocations when receiving packets

c214a2a

wolfendale marked this pull request as ready for review July 5, 2024 15:46

wolfendale added 2 commits July 5, 2024 17:15

chore: remove unnecessary goroutines

abfca9a

chore: update kerberos key derivation to reduce allocations

ad488be

wolfendale force-pushed the performance-tweaks branch from 63173c6 to ad488be Compare July 5, 2024 16:39

jonbarrow approved these changes Jul 5, 2024


DaniElectra reviewed Jul 5, 2024



chore: add comment to TimeoutManager

22d2698

Co-authored-by: Daniel López Guimaraes <112760654+DaniElectra@users.noreply.github.com>

DaniElectra approved these changes Jul 5, 2024


jonbarrow merged commit d633dca into PretendoNetwork:master Jul 5, 2024

wolfendale deleted the performance-tweaks branch July 5, 2024 16:55

wolfendale mentioned this pull request Jul 7, 2024

Fix bugs from recent performance enhancements #70

Merged

4 tasks
