See if it is possible to optimize a couch_util:reorder_results/2 #4051

nickva · 2022-06-06T22:20:48Z

Looking at

Lines 526 to 531 in 369ecc9

    
           % linear search is faster for small lists, length() is 0.5 ms for 100k list 
        
           reorder_results(Keys, SortedResults) when length(Keys) < 100 -> 
        
               [couch_util:get_value(Key, SortedResults) || Key <- Keys]; 
        
           reorder_results(Keys, SortedResults) -> 
        
               KeyDict = dict:from_list(SortedResults), 
        
               [dict:fetch(Key, KeyDict) || Key <- Keys].

With the recent map optimization in couch_key_tree on OTP 23, it be might worth double-checking if we can get some performance improvements by having a different cutoff value for linear search instead of 100, or use a map instead of a dict.

A quick informal benchmark showed a nice improvement when using a map reordering 1000 16 byte random doc IDs and values (Results are in iterations per second):

> f(Gen), Gen = fun(N) ->
    Keys = [crypto:strong_rand_bytes(16) || _ <- lists:seq(1, N)], 
    Res = lists:sort([{K, crypto:strong_rand_bytes(16)} || K <- Keys]), 
   {Keys, Res}
end.

> f(Keys), f(Res), {Keys, Res} = Gen(1000), ok.
> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 100, map]}}).
5434
> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 100, dict]}}).
1219

500 DocIDs

> f(Keys), f(Res), {Keys, Res} = Gen(500), ok.
> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 100, dict]}}).
2407
> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 100, map]}}).
11639

200 DocIDs

> f(Keys), f(Res), {Keys, Res} = Gen(200), ok.
> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 100, map]}}).
36267
> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 100, dict]}}).
6870

Stays pretty consistent around 5x.

The function is called from fabric_doc_update and couch_btree:lookup/2 which is in the hotpath of a good number of API calls (I mainly started looking at it with the eye to optimize _revs_diff implementation to speed-up replication a bit).

The text was updated successfully, but these errors were encountered:

nickva · 2022-06-06T22:44:50Z

The quick and dirty patch:

diff --git a/src/couch/src/couch_util.erl b/src/couch/src/couch_util.erl
--- a/src/couch/src/couch_util.erl
+++ b/src/couch/src/couch_util.erl
@@ -530,6 +531,17 @@ reorder_results(Keys, SortedResults) ->

+% linear search is faster for small lists, length() is 0.5 ms for 100k list
+reorder_results2(Keys, SortedResults, Cutoff, _DictType) when length(Keys) < Cutoff ->
+    [couch_util:get_value(Key, SortedResults) || Key <- Keys];
+reorder_results2(Keys, SortedResults, _Cutoff, map) ->
+    Map = maps:from_list(SortedResults),
+    [maps:get(Key, Map, undefined) || Key <- Keys];
+reorder_results2(Keys, SortedResults, _Cutoff, dict) ->
+    KeyDict = dict:from_list(SortedResults),
+    [dict:fetch(Key, KeyDict) || Key <- Keys].
+

iilyak · 2022-06-07T12:29:45Z

Another possible optimization.

If the function is always called in the context when passed results are sorted (and sorting method is compatible with erlang sorting) then we can use orddict:fetch/2 instead of couch_util:get_value/2.

In case of couch_btree:lookup/2 the sorting is guaranteed see couch_btree.erl#L272:L275.

In case of fabric_doc_update it might be not the case (quick review is not conclusive). Which means our variable name SortedResults is misleading.

I wish erlang had length_greater_than/2. The length/1 in the first clause still concerns me. However it seems like there is no option to remove it. Also IRC the keys are passed via API. It is unlikely we can get above 100k of elements.

nickva · 2022-06-07T14:03:52Z

@iilyak neat ideas!

For couch_btree.erl the issue is that we're sorting the input keys, so that's guaranteed, but the response assertion is that the results will be in the same order as the input Keys. We, for instance check if Keys is sorted already and skip the sort and final reorder.

Good point about the length guard, it is O(N). I am thinking since map already falls back to an orddict (but with =:=) comparison semantics which we want to preserve, I think we can try to simply the code and just use map always.

It seems with 2 elements in play, building the map vs fetching with couch_util is not that much different (map is even a tiny bit faster).

(node1@127.0.0.1)31> f(Keys), f(Res), {Keys, Res} = Gen(2), ok.
(node1@127.0.0.1)4> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 5, map]}}).
4494506
(node1@127.0.0.1)5> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 5, map]}}).
4467353
(node1@127.0.0.1)6> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 5, map]}}).
4216277
(node1@127.0.0.1)7> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 1, map]}}).
5699589
(node1@127.0.0.1)8> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 1, map]}}).
5796950
(node1@127.0.0.1)9> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 1, map]}}).
5816921

(There changed the cutoff switch from 1 to 5, when it's 1 it should use the map, when 5 it should use couch_util).

nickva · 2022-06-07T14:27:48Z

With just the map implementation and without the length/1 guard seems to do pretty well in most cases:

 f(Keys), f(Res), {Keys, Res} = Gen(2), ok.
ok
> erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}).
6670253
> erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}).
6441751
> erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}).
6638078

(Compared to 5816921 with the length guard)

> f(Keys), f(Res), {Keys, Res} = Gen(500), ok.
ok
> erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}).
12395
> erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}).
12508
> erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}).
12420

(Compared to 11639 with the length/1 guard)

This function is used in the hot path of _revs_diff and _bulk_docs API calls. Those could always use a bit more optimization: * In `_revs_diff` it's used when fetching all the FDIs to see which docs are missing in `couch_btree:lookup/2`. * In `_bulk_docs` it's used in the `fabric_doc_update` when finalizing the response. Using erlperf in #4051 noticed an at most 5x speedup from using a map instead of a dict. Since a map already falls back to a proplist for small sizes, skip the length guard. Some erlperf examples from #4051: 500 Keys ``` > f(Keys), f(Res), {Keys, Res} = Gen(500), ok. > erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 100, dict]}}). 2407 > erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 100, map]}}). 11639 ``` Using a map without the guard, which is the change in this this PR: ``` > f(Keys), f(Res), {Keys, Res} = Gen(500), ok. ok > erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}). 12395 > erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}). 12508 ``` As a bonus this also cleans up the code a bit, too.

nickva · 2022-06-08T17:12:01Z

Optimization PR merged. Closing issue.

This function is used in the hot path of _revs_diff and _bulk_docs API calls. Those could always use a bit more optimization: * In `_revs_diff` it's used when fetching all the FDIs to see which docs are missing in `couch_btree:lookup/2`. * In `_bulk_docs` it's used in the `fabric_doc_update` when finalizing the response. Using erlperf in apache#4051 noticed an at most 5x speedup from using a map instead of a dict. Since a map already falls back to a proplist for small sizes, skip the length guard. Some erlperf examples from apache#4051: 500 Keys ``` > f(Keys), f(Res), {Keys, Res} = Gen(500), ok. > erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 100, dict]}}). 2407 > erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 100, map]}}). 11639 ``` Using a map without the guard, which is the change in this this PR: ``` > f(Keys), f(Res), {Keys, Res} = Gen(500), ok. ok > erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}). 12395 > erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}). 12508 ``` As a bonus this also cleans up the code a bit, too.

nickva added enhancement needs-triage labels Jun 6, 2022

nickva removed the needs-triage label Jun 6, 2022

nickva mentioned this issue Jun 7, 2022

Optimize couch_util:reorder_results/2,3 #4052

Merged

nickva closed this as completed Jun 8, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

See if it is possible to optimize a couch_util:reorder_results/2 #4051

See if it is possible to optimize a couch_util:reorder_results/2 #4051

nickva commented Jun 6, 2022 •

edited

Loading

nickva commented Jun 6, 2022

iilyak commented Jun 7, 2022

nickva commented Jun 7, 2022

nickva commented Jun 7, 2022 •

edited

Loading

nickva commented Jun 8, 2022

See if it is possible to optimize a couch_util:reorder_results/2 #4051

See if it is possible to optimize a couch_util:reorder_results/2 #4051

Comments

nickva commented Jun 6, 2022 • edited Loading

nickva commented Jun 6, 2022

iilyak commented Jun 7, 2022

nickva commented Jun 7, 2022

nickva commented Jun 7, 2022 • edited Loading

nickva commented Jun 8, 2022

nickva commented Jun 6, 2022 •

edited

Loading

nickva commented Jun 7, 2022 •

edited

Loading