Improve performance of endpoints deduplication #299

pleshakov · 2022-11-08T18:31:58Z

Proposed changes

We use a map as a set to deduplicate endpoints. Before deduplicating, we can count the total number of endpoints in the input and, assuming most of them are unique, pass that count as the initial capacity when creating the map. This improves performance by reducing how often the map has to grow to accommodate all the endpoints.
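A minimal sketch of the idea, not the actual code in this PR: the `Endpoint` type, the `dedupEndpoints` function, and the grouped input shape are hypothetical names chosen for illustration. It counts the endpoints first, then passes that count as the size hint to `make`, so the map does not have to grow incrementally while it is being filled.

```go
package main

// Endpoint is a hypothetical stand-in for the project's endpoint type.
// It must be comparable so it can be used as a map key.
type Endpoint struct {
	Address string
	Port    int32
}

// dedupEndpoints returns the unique endpoints from all groups,
// preserving first-seen order. The grouped input is an assumption
// made for this sketch.
func dedupEndpoints(groups [][]Endpoint) []Endpoint {
	// Count every endpoint in the input. This is an upper bound on the
	// number of unique endpoints, and close to exact if most are unique.
	total := 0
	for _, g := range groups {
		total += len(g)
	}

	// Pre-size the set with the count so it rarely needs to grow.
	seen := make(map[Endpoint]struct{}, total)
	result := make([]Endpoint, 0, total)

	for _, g := range groups {
		for _, ep := range g {
			if _, exists := seen[ep]; exists {
				continue // duplicate, already recorded
			}
			seen[ep] = struct{}{}
			result = append(result, ep)
		}
	}
	return result
}
```

The size hint is only a capacity hint: if the input contains many duplicates, the map is larger than needed, which trades some memory for fewer reallocations.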

The benchmarks are included.

The output of running the benchmarks on my machine:

BenchmarkResolve/1_endpoints-4                                   2177905               560.7 ns/op           632 B/op          4 allocs/op
BenchmarkResolve/1_endpoints_with_optimization-4                 2098209               578.2 ns/op           632 B/op          4 allocs/op
BenchmarkResolve/2_endpoints-4                                   1868185               630.5 ns/op           656 B/op          4 allocs/op
BenchmarkResolve/2_endpoints_with_optimization-4                 1840216               650.4 ns/op           656 B/op          4 allocs/op
BenchmarkResolve/5_endpoints-4                                   1457457               866.1 ns/op           736 B/op          4 allocs/op
BenchmarkResolve/5_endpoints_with_optimization-4                 1363773               861.5 ns/op           736 B/op          4 allocs/op
BenchmarkResolve/10_endpoints-4                                   659551              1714 ns/op            1268 B/op          5 allocs/op
BenchmarkResolve/10_endpoints_with_optimization-4                 731506              1489 ns/op            1060 B/op          4 allocs/op
BenchmarkResolve/25_endpoints-4                                   278122              3889 ns/op            2739 B/op          6 allocs/op
BenchmarkResolve/25_endpoints_with_optimization-4                 355548              2991 ns/op            2060 B/op          4 allocs/op
BenchmarkResolve/50_endpoints-4                                   150068              8093 ns/op            5475 B/op          9 allocs/op
BenchmarkResolve/50_endpoints_with_optimization-4                 192177              5640 ns/op            3748 B/op          5 allocs/op
BenchmarkResolve/100_endpoints-4                                   74073             15507 ns/op           11112 B/op         11 allocs/op
BenchmarkResolve/100_endpoints_with_optimization-4                109096             10806 ns/op            7269 B/op          5 allocs/op
BenchmarkResolve/500_endpoints-4                                   10000            101384 ns/op           75413 B/op         24 allocs/op
BenchmarkResolve/500_endpoints_with_optimization-4                 21944             56614 ns/op           42824 B/op          5 allocs/op
BenchmarkResolve/1000_endpoints-4                                   5320            201992 ns/op          150373 B/op         43 allocs/op
BenchmarkResolve/1000_endpoints_with_optimization-4                11083            106729 ns/op           85448 B/op          5 allocs/op

pleshakov requested a review from a team as a code owner November 8, 2022 18:32

github-actions bot added the chore Pull requests for routine tasks label Nov 8, 2022

pleshakov force-pushed the chore/imrpove-endpoints-dedup-performance branch from a372ca1 to ebaae58 Compare November 8, 2022 18:33

kate-osborn approved these changes Nov 11, 2022

f5yacobucci approved these changes Nov 14, 2022

Merge branch 'main' into chore/imrpove-endpoints-dedup-performance

c110e55

pleshakov merged commit 3507fc2 into main Dec 5, 2022

pleshakov deleted the chore/imrpove-endpoints-dedup-performance branch December 5, 2022 23:33
