Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

See if it is possible to optimize a couch_util:reorder_results/2 #4051

Closed
nickva opened this issue Jun 6, 2022 · 5 comments
Closed

See if it is possible to optimize a couch_util:reorder_results/2 #4051

nickva opened this issue Jun 6, 2022 · 5 comments

Comments

@nickva
Copy link
Contributor

nickva commented Jun 6, 2022

Looking at

% linear search is faster for small lists, length() is 0.5 ms for 100k list
reorder_results(Keys, SortedResults) when length(Keys) < 100 ->
[couch_util:get_value(Key, SortedResults) || Key <- Keys];
reorder_results(Keys, SortedResults) ->
KeyDict = dict:from_list(SortedResults),
[dict:fetch(Key, KeyDict) || Key <- Keys].

With the recent map optimization in couch_key_tree on OTP 23, it be might worth double-checking if we can get some performance improvements by having a different cutoff value for linear search instead of 100, or use a map instead of a dict.

A quick informal benchmark showed a nice improvement when using a map reordering 1000 16 byte random doc IDs and values (Results are in iterations per second):

> f(Gen), Gen = fun(N) ->
    Keys = [crypto:strong_rand_bytes(16) || _ <- lists:seq(1, N)], 
    Res = lists:sort([{K, crypto:strong_rand_bytes(16)} || K <- Keys]), 
   {Keys, Res}
end.
> f(Keys), f(Res), {Keys, Res} = Gen(1000), ok.
> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 100, map]}}).
5434
> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 100, dict]}}).
1219

500 DocIDs

> f(Keys), f(Res), {Keys, Res} = Gen(500), ok.
> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 100, dict]}}).
2407
> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 100, map]}}).
11639

200 DocIDs

> f(Keys), f(Res), {Keys, Res} = Gen(200), ok.
> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 100, map]}}).
36267
> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 100, dict]}}).
6870

Stays pretty consistent around 5x.

The function is called from fabric_doc_update and couch_btree:lookup/2 which is in the hotpath of a good number of API calls (I mainly started looking at it with the eye to optimize _revs_diff implementation to speed-up replication a bit).

@nickva
Copy link
Contributor Author

nickva commented Jun 6, 2022

The quick and dirty patch:

diff --git a/src/couch/src/couch_util.erl b/src/couch/src/couch_util.erl
--- a/src/couch/src/couch_util.erl
+++ b/src/couch/src/couch_util.erl
@@ -530,6 +531,17 @@ reorder_results(Keys, SortedResults) ->

+% linear search is faster for small lists, length() is 0.5 ms for 100k list
+reorder_results2(Keys, SortedResults, Cutoff, _DictType) when length(Keys) < Cutoff ->
+    [couch_util:get_value(Key, SortedResults) || Key <- Keys];
+reorder_results2(Keys, SortedResults, _Cutoff, map) ->
+    Map = maps:from_list(SortedResults),
+    [maps:get(Key, Map, undefined) || Key <- Keys];
+reorder_results2(Keys, SortedResults, _Cutoff, dict) ->
+    KeyDict = dict:from_list(SortedResults),
+    [dict:fetch(Key, KeyDict) || Key <- Keys].
+

@iilyak
Copy link
Contributor

iilyak commented Jun 7, 2022

Another possible optimization.

If the function is always called in the context when passed results are sorted (and sorting method is compatible with erlang sorting) then we can use orddict:fetch/2 instead of couch_util:get_value/2.

In case of couch_btree:lookup/2 the sorting is guaranteed see couch_btree.erl#L272:L275.

In case of fabric_doc_update it might be not the case (quick review is not conclusive). Which means our variable name SortedResults is misleading.

I wish erlang had length_greater_than/2. The length/1 in the first clause still concerns me. However it seems like there is no option to remove it. Also IRC the keys are passed via API. It is unlikely we can get above 100k of elements.

@nickva
Copy link
Contributor Author

nickva commented Jun 7, 2022

@iilyak neat ideas!

For couch_btree.erl the issue is that we're sorting the input keys, so that's guaranteed, but the response assertion is that the results will be in the same order as the input Keys. We, for instance check if Keys is sorted already and skip the sort and final reorder.

Good point about the length guard, it is O(N). I am thinking since map already falls back to an orddict (but with =:=) comparison semantics which we want to preserve, I think we can try to simply the code and just use map always.

It seems with 2 elements in play, building the map vs fetching with couch_util is not that much different (map is even a tiny bit faster).

(node1@127.0.0.1)31> f(Keys), f(Res), {Keys, Res} = Gen(2), ok.
(node1@127.0.0.1)4> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 5, map]}}).
4494506
(node1@127.0.0.1)5> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 5, map]}}).
4467353
(node1@127.0.0.1)6> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 5, map]}}).
4216277
(node1@127.0.0.1)7> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 1, map]}}).
5699589
(node1@127.0.0.1)8> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 1, map]}}).
5796950
(node1@127.0.0.1)9> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 1, map]}}).
5816921

(There changed the cutoff switch from 1 to 5, when it's 1 it should use the map, when 5 it should use couch_util).

@nickva
Copy link
Contributor Author

nickva commented Jun 7, 2022

With just the map implementation and without the length/1 guard seems to do pretty well in most cases:

 f(Keys), f(Res), {Keys, Res} = Gen(2), ok.
ok
> erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}).
6670253
> erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}).
6441751
> erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}).
6638078

(Compared to 5816921 with the length guard)

> f(Keys), f(Res), {Keys, Res} = Gen(500), ok.
ok
> erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}).
12395
> erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}).
12508
> erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}).
12420

(Compared to 11639 with the length/1 guard)

nickva added a commit that referenced this issue Jun 7, 2022
This function is used in the hot path of _revs_diff and _bulk_docs API calls.
Those could always use a bit more optimization:

  * In `_revs_diff` it's used when fetching all the FDIs to see which docs are
    missing in `couch_btree:lookup/2`.

  * In `_bulk_docs` it's used in the `fabric_doc_update` when finalizing the
    response.

Using erlperf in #4051 noticed an at most 5x speedup from using a map instead
of a dict. Since a map already falls back to a proplist for small sizes, skip
the length guard.

Some erlperf examples from #4051:

500 Keys
```
> f(Keys), f(Res), {Keys, Res} = Gen(500), ok.

> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 100, dict]}}).
2407
> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 100, map]}}).
11639
```

Using a map without the guard, which is the change in this this PR:

```
> f(Keys), f(Res), {Keys, Res} = Gen(500), ok.
ok

> erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}).
12395
> erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}).
12508
```

As a bonus this also cleans up the code a bit, too.
nickva added a commit that referenced this issue Jun 7, 2022
This function is used in the hot path of _revs_diff and _bulk_docs API calls.
Those could always use a bit more optimization:

  * In `_revs_diff` it's used when fetching all the FDIs to see which docs are
    missing in `couch_btree:lookup/2`.

  * In `_bulk_docs` it's used in the `fabric_doc_update` when finalizing the
    response.

Using erlperf in #4051 noticed an at most 5x speedup from using a map instead
of a dict. Since a map already falls back to a proplist for small sizes, skip
the length guard.

Some erlperf examples from #4051:

500 Keys
```
> f(Keys), f(Res), {Keys, Res} = Gen(500), ok.

> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 100, dict]}}).
2407
> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 100, map]}}).
11639
```

Using a map without the guard, which is the change in this this PR:

```
> f(Keys), f(Res), {Keys, Res} = Gen(500), ok.
ok

> erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}).
12395
> erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}).
12508
```

As a bonus this also cleans up the code a bit, too.
nickva added a commit that referenced this issue Jun 7, 2022
This function is used in the hot path of _revs_diff and _bulk_docs API calls.
Those could always use a bit more optimization:

  * In `_revs_diff` it's used when fetching all the FDIs to see which docs are
    missing in `couch_btree:lookup/2`.

  * In `_bulk_docs` it's used in the `fabric_doc_update` when finalizing the
    response.

Using erlperf in #4051 noticed an at most 5x speedup from using a map instead
of a dict. Since a map already falls back to a proplist for small sizes, skip
the length guard.

Some erlperf examples from #4051:

500 Keys
```
> f(Keys), f(Res), {Keys, Res} = Gen(500), ok.

> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 100, dict]}}).
2407
> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 100, map]}}).
11639
```

Using a map without the guard, which is the change in this this PR:

```
> f(Keys), f(Res), {Keys, Res} = Gen(500), ok.
ok

> erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}).
12395
> erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}).
12508
```

As a bonus this also cleans up the code a bit, too.
nickva added a commit that referenced this issue Jun 7, 2022
This function is used in the hot path of _revs_diff and _bulk_docs API calls.
Those could always use a bit more optimization:

  * In `_revs_diff` it's used when fetching all the FDIs to see which docs are
    missing in `couch_btree:lookup/2`.

  * In `_bulk_docs` it's used in the `fabric_doc_update` when finalizing the
    response.

Using erlperf in #4051 noticed an at most 5x speedup from using a map instead
of a dict. Since a map already falls back to a proplist for small sizes, skip
the length guard.

Some erlperf examples from #4051:

500 Keys
```
> f(Keys), f(Res), {Keys, Res} = Gen(500), ok.

> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 100, dict]}}).
2407
> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 100, map]}}).
11639
```

Using a map without the guard, which is the change in this this PR:

```
> f(Keys), f(Res), {Keys, Res} = Gen(500), ok.
ok

> erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}).
12395
> erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}).
12508
```

As a bonus this also cleans up the code a bit, too.
@nickva
Copy link
Contributor Author

nickva commented Jun 8, 2022

Optimization PR merged. Closing issue.

@nickva nickva closed this as completed Jun 8, 2022
noahshaw11 pushed a commit to noahshaw11/couchdb that referenced this issue Jul 1, 2022
This function is used in the hot path of _revs_diff and _bulk_docs API calls.
Those could always use a bit more optimization:

  * In `_revs_diff` it's used when fetching all the FDIs to see which docs are
    missing in `couch_btree:lookup/2`.

  * In `_bulk_docs` it's used in the `fabric_doc_update` when finalizing the
    response.

Using erlperf in apache#4051 noticed an at most 5x speedup from using a map instead
of a dict. Since a map already falls back to a proplist for small sizes, skip
the length guard.

Some erlperf examples from apache#4051:

500 Keys
```
> f(Keys), f(Res), {Keys, Res} = Gen(500), ok.

> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 100, dict]}}).
2407
> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 100, map]}}).
11639
```

Using a map without the guard, which is the change in this this PR:

```
> f(Keys), f(Res), {Keys, Res} = Gen(500), ok.
ok

> erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}).
12395
> erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}).
12508
```

As a bonus this also cleans up the code a bit, too.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants