Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core/txpool/legacypool: remove a redundant heap.Init #28910

Merged
merged 11 commits into from
Feb 15, 2024

Conversation

colinlyguo
Copy link
Contributor

@colinlyguo colinlyguo commented Feb 1, 2024

Found a redundant heap.Init(m.index) when learning the implementation of tx pool. this would save O(threshold) in each Cap() call.

Because *m.index is sorted increasingly and truncated the largest size - threshold values (same as m.Pop() for size - threshold times), or adding heap.Init(m.index) for completeness sake?

@colinlyguo colinlyguo changed the title tx_pool: remove a redundant heap.Init txpool: remove a redundant heap.Init Feb 2, 2024
@fjl fjl changed the title txpool: remove a redundant heap.Init core/txpool/legacypool: remove a redundant heap.Init Feb 5, 2024
@fjl
Copy link
Contributor

fjl commented Feb 5, 2024

This looks incorrect to me, but I'm slightly unsure. heap.Init restores the 'heap property' which is:

!h.Less(j, i) for 0 <= i < h.Len() and 2*i+1 <= j <= 2*i+2 and j < h.Len()

The actual storage of a heap slice is NOT simply a sorted slice, it's a binary tree. So I think we can't just use the sorted slice as the heap.

@colinlyguo
Copy link
Contributor Author

colinlyguo commented Feb 5, 2024

This looks incorrect to me, but I'm slightly unsure. heap.Init restores the 'heap property' (index starting from 0) which is:

!h.Less(j, i) for 0 <= i < h.Len() and 2*i+1 <= j <= 2*i+2 and j < h.Len()

The actual storage of a heap slice is NOT simply a sorted slice, it's a binary tree. So I think we can't just use the sorted slice as the heap.

Thanks for the discussion. I will demonstrate that a sorted slice satisfies the invariance of a min-heap, which can be expressed as:

!h.Less(j, i) for 0 <= i < h.Len() and 2*i+1 <= j <= 2*i+2 and j < h.Len()

This invariant means that for any valid index i within the heap, the value at index i should be less than or equal to the values at its children's indices 2*i+1 and 2*i+2, provided that these children indices are within the bounds of the heap.

In a stored slice:

for any j >= i, a[j] >= a[i]

This directly implies the heap invariance because if you pick any index i and its corresponding children indices 2*i+1 and 2*i+2, you can be assured that a[i] will be less than or equal to a[2*i+1] and a[2*i+2], as 2*i+1 and 2*i+2 are both greater than i and, thus, their values are greater than or equal to a[i] in a sorted array.

also some similar discussions here: https://stackoverflow.com/questions/3113165/is-a-sorted-array-a-min-heap-whats-the-minimum-value-of-a-max-heap

@holiman
Copy link
Contributor

holiman commented Feb 6, 2024 via email

@holiman
Copy link
Contributor

holiman commented Feb 13, 2024

I agree that the change seems correct, but otoh:

  • Not sure any testcase covers it (i tested intentionally reverse-sorting the m.index and running the tests in list_test.go)
  • The change makes the code less robust by depending on an implicit previous event -- more sensitive to refactorings.
  • Not sure how large (if any) the processing gain is

So I'm kind of leaning towards letting it be as it is... (?)

@colinlyguo
Copy link
Contributor Author

colinlyguo commented Feb 13, 2024

I agree that the change seems correct, but otoh:

  • Not sure any testcase covers it (i tested intentionally reverse-sorting the m.index and running the tests in list_test.go)
  • The change makes the code less robust by depending on an implicit previous event -- more sensitive to refactorings.
  • Not sure how large (if any) the processing gain is

So I'm kind of leaning towards letting it be as it is... (?)

Agree with point 2, adding a heap.Init is helpful for logic completeness.

btw, I added a benchmark test, removing heap.Init can decrease 4.27% op time (from 2013 ns/op to 1927 ns/op) when popping out a transaction in a heap with 32 transactions (under an account).

before removing:

colin@192 legacypool % go test -v -bench=BenchmarkListCapOneTx -run=^$ 
goos: darwin
goarch: arm64
pkg: github.com/ethereum/go-ethereum/core/txpool/legacypool
BenchmarkListCapOneTx
BenchmarkListCapOneTx-12          597567              2013 ns/op
PASS
ok      github.com/ethereum/go-ethereum/core/txpool/legacypool  38.480s

after removing:

colin@192 legacypool % go test -v -bench=BenchmarkListCapOneTx -run=^$
goos: darwin
goarch: arm64
pkg: github.com/ethereum/go-ethereum/core/txpool/legacypool
BenchmarkListCapOneTx
BenchmarkListCapOneTx-12          642597              1927 ns/op
PASS
ok      github.com/ethereum/go-ethereum/core/txpool/legacypool  41.322s

Thus if the benefit is not obvious, can close this PR.

@fjl
Copy link
Contributor

fjl commented Feb 13, 2024

I think it will have a larger effect if the transactions list is much longer, as it usually is for real world use cases. heap.Init has a complexity of O(n).

@colinlyguo
Copy link
Contributor Author

I think it will have a larger effect if the transactions list is much longer, as it usually is for real world use cases. heap.Init has a complexity of O(n).

Make sense, some logs about the exact time inside Cap():

total number 100000 threshold 99999
Check limit duration: 125ns
Sort duration: 10.84525ms
Drop duration: 4.875µs
Heap init duration: 239.75µs
Cache duration: 250ns
Total Cap function duration: 11.106458ms

The sort.Sort(*m.index) contributes the most latency. and heap.Init contributes the second.

@holiman
Copy link
Contributor

holiman commented Feb 13, 2024

So if I read it correctly, sorting is 10.8ms, and a heap-init of the already sorted list is 0.2ms ?

@colinlyguo
Copy link
Contributor Author

colinlyguo commented Feb 13, 2024

So if I read it correctly, sorting is 10.8ms, and a heap-init of the already sorted list is 0.2ms ?

Yes. It doesn't need to sort the whole heap here, I think. Thus refactoring this function again, now the performance is much better.

core/txpool/legacypool/list.go Outdated Show resolved Hide resolved
@colinlyguo
Copy link
Contributor Author

colinlyguo commented Feb 13, 2024

So if I read it correctly, sorting is 10.8ms, and a heap-init of the already sorted list is 0.2ms ?

Yeah, I think it's because the heap.Init() would just iterate all non-leaf heap nodes without changing the heap structure (although when changing the structure, the time complexity is still O(n)).

Copy link
Contributor

@holiman holiman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

colinlyguo and others added 2 commits February 15, 2024 17:20
Co-authored-by: Martin HS <martin@swende.se>
@colinlyguo
Copy link
Contributor Author

colinlyguo commented Feb 15, 2024

Faster after changing the sorting function (from 1545 ns/op to 1004 ns/op). quite thanks to @holiman

colin@192 legacypool % go test -v -bench=BenchmarkListCapOneTx -run=^$
goos: darwin
goarch: arm64
pkg: github.com/ethereum/go-ethereum/core/txpool/legacypool
BenchmarkListCapOneTx
BenchmarkListCapOneTx-12         1000000              1004 ns/op
PASS
ok      github.com/ethereum/go-ethereum/core/txpool/legacypool  50.448s

@fjl fjl added this to the 1.13.13 milestone Feb 15, 2024
@fjl
Copy link
Contributor

fjl commented Feb 15, 2024

I'm merging this because we're now running into the test timeout that has been plagueing the CI builder recently. The change itself is OK now.

@fjl fjl merged commit a193bb0 into ethereum:master Feb 15, 2024
2 of 3 checks passed
@colinlyguo colinlyguo deleted the remove-duplicated-heap-init branch February 15, 2024 18:52
jorgemmsilva pushed a commit to iotaledger/go-ethereum that referenced this pull request Jun 17, 2024
Co-authored-by: Martin HS <martin@swende.se>
Co-authored-by: Felix Lange <fjl@twurst.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants