Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of sortBy #92

Merged
merged 1 commit into from
Apr 29, 2018
Merged

Improve performance of sortBy #92

merged 1 commit into from
Apr 29, 2018

Conversation

paldepind
Copy link
Contributor

This PR improves the performance of sortBy and thus also sort. These are the benchmark results I get when running sort on an array of 100 numbers.

┌───────────────────────┬────────────────────┬──────────────────────┬────────┬───────────┬───┐
│ suite                 │ old                │ new                  │ diff   │ change    │ α │
├───────────────────────┼────────────────────┼──────────────────────┼────────┼───────────┼───┤
│ methods.sort.Array    │ 3 Hz ±1.49% (n 12) │ 637 Hz ±0.30% (n 89) │ 099.1% │ +21464.1% │ ✓ │
├───────────────────────┼────────────────────┼──────────────────────┼────────┼───────────┼───┤
│ methods.sort.Foldable │ 4 Hz ±0.99% (n 14) │ 317 Hz ±0.45% (n 82) │ 097.7% │ +8503.4%  │ ✓ │
└───────────────────────┴────────────────────┴──────────────────────┴────────┴───────────┴───┘

In this case sorting an array got 214 times faster and sorting a foldable (using List as the benchmark subject) got 85 times faster 😄

Note however that to achieve the performance increase I had to use fewer sanctuary-type-class functions internally in the implementation of sortBy. It seems that using sanctuary-type-class functions results in some overhead. I created two benchmarks to try and measure the overhead of the functions used inside sortBy.

var test = R.range(0, 100);
// ...
'methods.test.reduceMin':     function(Z) { Z.reduce(Z.min, 0, test); },
'methods.test.append':        function(Z) { Z.reduce((a, b) => Z.append(b, a), [], [0, 1, 2, 3, 4]); },

Then I plugged in Ramda and sanctuary-type-class in the above benchmarks and got the following results:

┌───────────────────────┬──────────────────────────┬────────────────────────────┬────────┬───────────┬───┐
│ suite                 │ sanctuary-type-classes   │ Ramda                      │ diff   │ change    │ α │
├───────────────────────┼──────────────────────────┼────────────────────────────┼────────┼───────────┼───┤
│ methods.test.reduceMin│ 2,073 Hz ±0.52% (n 89)   │ 504,551 Hz ±0.37% (n 93)   │ 099.2% │ +24235.2% │ ✓ │
├───────────────────────┼──────────────────────────┼────────────────────────────┼────────┼───────────┼───┤
│ methods.test.append   │ 235,364 Hz ±1.02% (n 87) │ 1,022,981 Hz ±0.46% (n 91) │ 062.6% │ +334.6%   │ ✓ │
└───────────────────────┴──────────────────────────┴────────────────────────────┴────────┴───────────┴───┘

Getting the minimum of an array of 100 numbers seems to be 242 times as slow with sanctuary-type-classes and creating an array of size 5 by prepending is about 3.3 times slower than Ramda. Z.lte in particular seems to be be slow and it is still used inside sortBy in some cases and in these cases sortBy is still slow due to Z.lte ending up as the bottleneck.

Copy link
Member

@Avaq Avaq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It gives me a warm feeling on the inside to see my hard work on the benchmarks being put to good use! I found one potential improvement.

index.js Outdated
}
var lteFn =
typeof rs[0].prop === 'string' || typeof rs[0].prop === 'number' ?
function(a, b) { return a <= b; } :
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function can be hoisted. May slightly benefit performance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function above behaves incorrectly for NaN, as this demonstrates:

> Z.sort ([NaN, 2, NaN, 0, NaN, 1, NaN])
[NaN, NaN, 0, 1, NaN, 2, NaN]

We must either remove the special case for typeof rs[0].prop === 'number' or ensure via other means that NaN is handled appropriately.

Z.lte was only recently updated to handle NaN, in #88.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fast-path for numbers and strings is very important for performance.

I think we should consider whether it's worth the code complexity and performance to handle NaN. For instance, Array$prototype$lte doesn't handle sparse arrays because sparse arrays can be considered to not be a part of the array type we define. In the same way Number$prototype$lte could not handle NaN as NaN could be considered not part of the number type we define.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be best for these changes to be refactoring only. In a subsequent pull request you could make the case for not supporting NaN by showing the performance difference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidchambers
Copy link
Member

Amazing work, Simon!

index.js Outdated
var rs = reduce(reducer, [], foldable);
if (rs.length === 0) {
return foldable;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this special case make an appreciable difference? I would have thought that sorting an empty structure would be an efficient operation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now see that this is guarding rs[0].prop below. I would still like to remove the special case. We could define lteFn like so:

var type = rs.length > 0 ? typeof rs[0].prop : 'undefined';
var lteFn = type === 'number' || type === 'string' ? ... : lte;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

index.js Outdated
var foldableIsArray = Array.isArray(foldable);
var idx = 0;
function reducer(xs, x) {
xs.push({idx: idx, x: x, prop: f(x)});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you consider prop to be a better name than fx for the f(x) property?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm indifferent. For prop I didn't actually look at the old name. The thought behind prop is that f returns some property of x (not property as in a JS property of an object but in the more general sense of the word). But I'd have no problem in switching it back.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer fx. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

index.js Outdated
lteFn(b.prop, a.prop) ? a.idx - b.idx : -1 :
1;
});
if (foldableIsArray) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since foldableIsArray is only referenced here, let's replace this identifier with Array.isArray(foldable).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I was using it in several places earlier but that went away.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

index.js Outdated
return rs;
} else {
var result = empty(foldable.constructor);
var of = foldable.constructor['fantasy-land/of'];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this safe? Could foldable.constructor be Function?

index.js Outdated
var result = empty(foldable.constructor);
var of = foldable.constructor['fantasy-land/of'];
for (var i = 0; i < rs.length; i += 1) {
result = result['fantasy-land/concat'](of(rs[i].x));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this safe? Are we sure that Array is the only built-in type that can support sorting?

package.json Outdated
@@ -19,6 +19,9 @@
},
"devDependencies": {
"fantasy-land": "3.5.0",
"knuth-shuffle": "^1.0.8",
"list": "^2.0.7",
"ramda": "^0.25.0",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the "1.2.x" form rather than the "^1.2.3" form.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

index.js Outdated
for (var i = 0; i < this.length; i += 1) {
acc = f(acc, this[i]);
}
return acc;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this change make an appreciable difference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've measured the difference in the benchmark suite for List. I'd say the difference is quite significant.

screenshot_20180319_142156

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Significant indeed!

Now that I know we'll keep the code I'll share my formatting request. I'd like to see this:

for (var idx = 0; idx < this.length; idx += 1) acc = f(acc, this[idx]);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paldepind
Copy link
Contributor Author

@Avaq Props on the benchmark system! The output in particular is really nice 👍

@davidchambers
Copy link
Member

@paldepind, I'm happy to push tweaks to your branch. I'd love to merge this pull request soon. :)

@paldepind
Copy link
Contributor Author

@davidchambers You're very welcome to that David 👍

@paldepind
Copy link
Contributor Author

Looks good to me @davidchambers.

index.js Outdated
@@ -2132,9 +2132,7 @@
return xs;
}
var rs = reduce(reducer, [], foldable);
if (rs.length === 0) return foldable;

var type = typeof rs[0].fx;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the benefit of this? Replacing one special case with another that doesn't terminate as early?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer typeof (rs[0] && rs[0].fx) because the unsafe expression (rs[0].fx) and its guard appear very close together. My preference is not strong, though, so feel free to reinstate the rs.length === 0 check. :)

index.js Outdated
for (idx = 0; idx < rs.length; idx += 1) {
result = concat(result, of(F, rs[idx].x));
}
return result;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does seem to affect performance. On my computer, the Foldable benchmark went from 308 Hz to 233 Hz. That is, the previous is about 33% faster.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned about correctness. Representatives of built-in types do not have fantasy-land/-prefixed members, so replacing concat(x, y) with x['fantasy-land/concat'](y) is not necessarily valid. Can we be sure that result will have a fantasy-land/concat member and that F will have a fantasy-land/of member?

fantasy-land/concat implementations are provided for the following built-in types: String, Array, and Object.

fantasy-land/of implementations are provided for the following built-in types: Array and Function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are any of the build-in types, except for Array, both a foldable and a monoid?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a StrMap, Object also implements Foldable and Monoid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gabejohnson Sorry, I forgot to mention that it has to be an Applicative as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh...I missed the of. Yeah, Array is the only one that fits the bill AFAIK.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we reinstate the ['fantasy-land/concat'] and ['fantasy-land/of'] optimizations, let's add a comment explaining why these optimizations are safe.

@davidchambers
Copy link
Member

ℹ️ @paldepind, I rebased this branch.

package.json Outdated
@@ -19,6 +19,7 @@
},
"devDependencies": {
"fantasy-land": "3.5.0",
"list": "2.0.7",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Avaq, should this be listed here or in bench/package.json?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used as a library like any other (just so happens to be used in benchmarks), so it should be listed here. The bench/package.json file only exists to allow for the installation of any version of choice, of the package you're benchmarking against.

@paldepind
Copy link
Contributor Author

LGTM 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants