Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Benchmark custom maps #2573

Merged
merged 6 commits into from
Dec 10, 2021
Merged

Benchmark custom maps #2573

merged 6 commits into from
Dec 10, 2021

Conversation

lehins
Copy link
Collaborator

@lehins lehins commented Dec 3, 2021

This PR adds benchmarks, applies hlint and some renameing for consistency.

Most importantly it also improves speed of insertion into the KeyMap and fixes a bug in filterArrayWithBitmap that resulted in intersection throwing an exception on some input.

@lehins lehins requested review from nc6 and TimSheard December 3, 2021 19:59
@lehins lehins changed the title Benchmark custom maps and Benchmark custom maps Dec 3, 2021
libs/compact-map/src/Data/Compact/KeyMap.hs Outdated Show resolved Hide resolved
@lehins lehins force-pushed the lehins/benchmark-maps branch from 5282504 to 24c1564 Compare December 6, 2021 14:51
Copy link
Contributor

@TimSheard TimSheard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleans up quite a bit of loose ends, so a good job. I have some concerns about using arrays for Path in every function. keyPath computes the list lazily in in most functions, if we use a list, on most calls we will look at only the first 6-8 items in the path, while the full path may contain 32+ items. So laziness is our friend. There is 1 exception to the rule in insertWithKey! where we need to access particular locations in the path with VP.!. We might have two kinds of paths, one an array for use in insertWithKey' and one a lazy list for every where else a path is used.

We ought to talk about this.

libs/compact-map/compact-map.cabal Show resolved Hide resolved
libs/compact-map/src/Data/Compact/KeyMap.hs Show resolved Hide resolved
Comment on lines +313 to +327
twoLeaf path1 leaf1 path2 k2 v2 = go path1 path2
where
leaf2 = Leaf k2 v2
go p1 p2
| Just (i, is) <- VP.uncons p1,
Just (j, js) <- VP.uncons p2 =
if i == j
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be some tension between thinking of Path as a list or as an array. Sometimes I think a list is better (like here, where we are pattern matching) and other times an array (when we need something like (path !! n)) The 'n' is usually needed to handle the (Leaf __ ) case because unlike all the other constructors Leaf's do not track which segement they are under. So we do some adustment to the Path, to recover this lost information.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your reasoning, but I was using benchmarks as a guide, rather than logical reasoning. In the end what we think the performance should be is not necessarily the performance that we'll get. Using lists for Path makes the worst case 5 times slower than when using primitive vector.

We can certainly discuss it in more detail. If you can come up with the worst case example where vector would perform worse that would be tremendously useful, because then we could benchmark it and reason about the performance hit if any. One way or another insert is one of the most important operations in this map, and that is the one that I've optimized.

Comment on lines +351 to 361
accum !ans (k, v) = insert k v ans
{-# INLINE fromList #-}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't fold' strict in the accumulator argument (that is what the ' is used for), so is the ! on ans needed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't technically necessary to fix memory leaks, but there is a subtle difference between just foldl' and foldl' with bang on accumulator: https://www.well-typed.com/blog/2014/04/fixing-foldl/#a-postscript-which-foldl

With all honesty I added it because I was benchmarking the function and I was tapping into all possible optimizations I could find. So this change is not strictly necessary (pun intended), but it won't hurt either.

libs/compact-map/src/Data/Compact/KeyMap.hs Show resolved Hide resolved
libs/compact-map/src/Data/Compact/KeyMap.hs Show resolved Hide resolved
libs/compact-map/src/Data/Compact/KeyMap.hs Show resolved Hide resolved
libs/compact-map/src/Data/Compact/KeyMap.hs Outdated Show resolved Hide resolved
…Also:

* Manual export of KeyMap interface

* Make tests compile, rewrite histogram as debugging function with Vector and move it away from tests

* Improve maxSuccess logic
…benhcmark

* Add tests and bench to hie.

Fix regression. Better name for deleteInternal function

* Rename `wordSize` -> `segementMaxValue`

* Fix Show instance

* Rename `insertWithKey'` -> `insertWithKeyInternal`
@lehins lehins force-pushed the lehins/benchmark-maps branch from 99ff492 to 6fb198a Compare December 9, 2021 21:11
@lehins lehins merged commit e159f6b into master Dec 10, 2021
@iohk-bors iohk-bors bot deleted the lehins/benchmark-maps branch December 10, 2021 14:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants