See if it is possible to optimize a couch_util:reorder_results/2 #4051
The quick and dirty patch:

```diff
diff --git a/src/couch/src/couch_util.erl b/src/couch/src/couch_util.erl
--- a/src/couch/src/couch_util.erl
+++ b/src/couch/src/couch_util.erl
@@ -530,6 +531,17 @@ reorder_results(Keys, SortedResults) ->
+% linear search is faster for small lists, length() is 0.5 ms for 100k list
+reorder_results2(Keys, SortedResults, Cutoff, _DictType) when length(Keys) < Cutoff ->
+    [couch_util:get_value(Key, SortedResults) || Key <- Keys];
+reorder_results2(Keys, SortedResults, _Cutoff, map) ->
+    Map = maps:from_list(SortedResults),
+    [maps:get(Key, Map, undefined) || Key <- Keys];
+reorder_results2(Keys, SortedResults, _Cutoff, dict) ->
+    KeyDict = dict:from_list(SortedResults),
+    [dict:fetch(Key, KeyDict) || Key <- Keys].
```
Another possible optimization: if the function is always called in a context where the passed results are already sorted (and the sorting method is compatible with Erlang term ordering), then we can use … In case of … In case of … I wish Erlang had …
@iilyak neat ideas! For couch_btree.erl the issue is that we're sorting the input keys, so that part is guaranteed, but the response contract is that the results come back in the same order as the input Keys. We, for instance, check if Keys is already sorted and skip the sort and the final reorder. Good point about the length guard; length/1 is O(N). Since a small map already falls back to an orddict-like flat representation (but with the =:= comparison semantics we want to preserve), I think we can try to simplify the code and just use a map always. It seems that with only a couple of elements in play, building the map vs. fetching with couch_util:get_value is not that much different (the map is even a tiny bit faster).
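A minimal sketch of the "just use a map always" simplification (assuming the existing `reorder_results/2` shape; this is illustrative, not the exact committed code):

```erlang
-module(reorder_sketch).
-export([reorder_results/2]).

%% Map-only variant: no cutoff, no dict branch. Small maps are stored
%% flat internally, so tiny inputs stay cheap as well.
reorder_results(Keys, SortedResults) ->
    Map = maps:from_list(SortedResults),
    [maps:get(Key, Map, undefined) || Key <- Keys].
```

For example, `reorder_results([c, a, b], [{a, 1}, {b, 2}, {c, 3}])` returns `[3, 1, 2]`: values come back in the order of `Keys`, with `undefined` for any key missing from the results.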
(Here I changed the cutoff from 1 to 5: with 1 it should use the map, with 5 it should use the couch_util linear search.)
With just the map implementation and without the length/1 guard, it seems to do pretty well in most cases:
(Compared to 5816921 with the length guard)
(Compared to 11639 with the length/1 guard)
This function is used in the hot path of the _revs_diff and _bulk_docs API calls. Those could always use a bit more optimization:

* In `_revs_diff` it's used when fetching all the FDIs to see which docs are missing in `couch_btree:lookup/2`.
* In `_bulk_docs` it's used in `fabric_doc_update` when finalizing the response.

Using erlperf in #4051, we noticed an up to 5x speedup from using a map instead of a dict. Since a map already falls back to a proplist-like representation for small sizes, skip the length guard.

Some erlperf examples from #4051, with 500 Keys:

```
> f(Keys), f(Res), {Keys, Res} = Gen(500), ok.
> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 100, dict]}}).
2407
> erlperf:run(#{runner => {couch_util, reorder_results2, [Keys, Res, 100, map]}}).
11639
```

Using a map without the guard, which is the change in this PR:

```
> f(Keys), f(Res), {Keys, Res} = Gen(500), ok.
ok
> erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}).
12395
> erlperf:run(#{runner => {couch_util, reorder_results, [Keys, Res]}}).
12508
```

As a bonus, this also cleans up the code a bit.
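The `Gen/1` data generator used in the erlperf snippets is not shown in the issue; a hypothetical stand-in matching the description (N random 16-byte doc IDs plus a results list sorted by key) might look like:

```erlang
%% Hypothetical test-data generator (the original Gen fun is not shown
%% in the issue). Returns {Keys, SortedResults} where Keys are N random
%% 16-byte binaries and SortedResults pairs each key with an integer
%% value, sorted by key.
gen(N) ->
    Keys = [crypto:strong_rand_bytes(16) || _ <- lists:seq(1, N)],
    Res = lists:sort(lists:zip(Keys, lists:seq(1, N))),
    {Keys, Res}.
```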
Optimization PR merged. Closing issue.
Looking at couchdb/src/couch/src/couch_util.erl, lines 526 to 531 in 369ecc9:
With the recent map optimization in couch_key_tree on OTP 23, it might be worth double-checking whether we can get some performance improvements by using a different cutoff value for the linear search instead of 100, or by using a map instead of a dict.
A quick informal benchmark showed a nice improvement when using a map to reorder 1000 16-byte random doc IDs and values (results are in iterations per second):
500 DocIDs
200 DocIDs
Stays pretty consistent around 5x.
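Without erlperf, a rough version of the same dict-vs-map comparison can be done with `timer:tc` (a sketch; absolute numbers will vary by machine and OTP version):

```erlang
%% Sketch: time one dict-based and one map-based reorder pass over the
%% same {Keys, SortedResults} data. Returns microseconds for each.
bench(Keys, SortedResults) ->
    {UsDict, _} = timer:tc(fun() ->
        D = dict:from_list(SortedResults),
        [dict:fetch(K, D) || K <- Keys]
    end),
    {UsMap, _} = timer:tc(fun() ->
        M = maps:from_list(SortedResults),
        [maps:get(K, M, undefined) || K <- Keys]
    end),
    {UsDict, UsMap}.
```

A proper benchmark should run many iterations and discard warm-up runs, which is what erlperf does; this one-shot sketch only shows the shape of the comparison.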
The function is called from fabric_doc_update and couch_btree:lookup/2, which are in the hot path of a good number of API calls. (I mainly started looking at it with an eye to optimizing the _revs_diff implementation, to speed up replication a bit.)