Improve Ord instances for IntSet and IntMap #470

treeowl · 2018-01-01T02:59:04Z

The current Ord instances for IntSet and IntMap don't take advantage of the way these types are structured at all. We may therefore be leaving some performance on the table. In particular:

I don't think we should need to convert the maps to lists for comparison.
We can recognize certain special arrangements of keys, prefixes, and perhaps masks to avoid following a bunch of pointers for nothing. For example, if one map has a negative prefix and the other has a (strictly) positive prefix, we are done.


Zemyla · 2019-07-17T19:31:47Z

If nothing else, we should use lists with unpacked Ints.

data KeyList
  = LEnd
  | LCons {-# UNPACK #-} !Key KeyList
  deriving (Eq, Ord)

data KeyPairList a
  = PEnd
  | PCons {-# UNPACK #-} !Key a (KeyPairList a)
  deriving (Eq, Ord)

And then you have pre-inlined functions to turn an IntSet into a KeyList and an IntMap into a KeyPairList.

Though what would probably be even better, at least for the KeyList case, would be:

data KeyList
  = LEnd
  | LCons {-# UNPACK #-} !Prefix {-# UNPACK #-} !BitMap KeyList
  deriving (Eq)

instance Ord KeyList where
  compare LEnd LEnd = EQ
  compare LEnd _ = LT
  compare _ LEnd = GT
  compare (LCons prea bita la) (LCons preb bitb lb)
    | prea < preb = LT
    | prea > preb = GT
    | otherwise = case compare (revNat bita) (revNat bitb) of
        EQ -> compare la lb
        c  -> c

This way, there's only one entry per Tip node in each IntSet. This sort of thing would probably speed up Show IntSet and Show IntMap as well.

EDIT: However, for those, the best bet may be just rewriting it so that the fold over the container constructs the desired string directly, such as:

instance Show IntSet where
  showsPrec p is = showParen (p > 10) $ \str -> showString "fromList [" $ either id id $ foldr showOne (Left (']':str)) is where
    showOne !i r = Right $ shows i $ either id ((:) ',') r

I'm pretty sure the definition for showList @Int isn't going to change from the default any time in the near future, is it? Similarly, because the liftShowsPrec2 instance for (,) is the default as well, we can inline the Show instance for IntMap the same way.

instance Show a => Show (IntMap a) where
  showsPrec p im = showParen (p > 10) $ \s -> showString "fromList [" $ either id id $ foldrWithKey showOne (Left (']':s)) im where
    showOne !i a r = Right $ showChar '(' $ shows i $ showChar ',' $ shows a $ showChar ')' $ either id ((:) ',') r

This means that we don't have to worry about creating and then destroying intermediate lists, boxed Ints, and tuples.
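To check that the direct fold really produces the same text as the existing instance, the idea can be tried as a stand-alone function. This is a minimal sketch using only the public Data.IntSet API; the name showsIntSet is made up for illustration and is not part of containers.

```haskell
import qualified Data.IntSet as IS

-- | Hypothetical stand-alone version of the proposed Show instance.
-- The accumulator is Left for "nothing emitted after this point yet"
-- (so no comma is needed) and Right once an element has been emitted.
showsIntSet :: Int -> IS.IntSet -> ShowS
showsIntSet p is = showParen (p > 10) $ \str ->
  showString "fromList [" $ either id id $ IS.foldr showOne (Left (']' : str)) is
  where
    -- Emit the element, then a comma only if another element follows.
    showOne i r = Right $ shows i $ either id (',' :) r
```

With this definition, showsIntSet 0 s "" can be compared against show s for any IntSet s, including the empty set and sets with negative keys.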

treeowl · 2019-07-17T20:23:08Z

I'm not so sure about revNat being the right approach (it's kind of heavy). I think something like this might work:

instance Ord KeyList where
  compare LEnd LEnd = EQ
  compare LEnd _ = LT
  compare _ LEnd = GT
  compare (LCons prea bita la) (LCons preb bitb lb)
    | prea < preb = LT
    | prea > preb = GT
    | lbm <- lowestBitMask (bita `xor` bitb)
    = compare (bita .&. lbm) (bitb .&. lbm) <> compare la lb

Still, it would be nice to use the IntMap/IntSet structure a bit...
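One detail worth pinning down for either bitmap variant is what "less than" means for two bitmaps under the same prefix if the result has to agree with comparing ascending key lists. The following is a minimal sketch under the assumption that each set fits in a single 64-bit bitmap (so there is no tail list to consult); compareBitmaps and bitsAsc are hypothetical names, not functions from containers.

```haskell
import Data.Bits (complement, testBit, xor, (.&.))
import Data.Word (Word64)

-- | Reference only: the elements of a bitmap as an ascending list of bit positions.
bitsAsc :: Word64 -> [Int]
bitsAsc w = [i | i <- [0 .. 63], testBit w i]

-- | Hypothetical helper: order two bitmaps the way their ascending lists compare.
-- The side owning the lowest differing bit has the smaller next element, so it
-- is the smaller set, unless the other side has no elements left above that
-- bit, in which case the other side is a proper prefix and therefore smaller.
compareBitmaps :: Word64 -> Word64 -> Ordering
compareBitmaps a b
  | a == b         = EQ
  | a .&. low /= 0 = if b .&. above == 0 then GT else LT
  | otherwise      = if a .&. above == 0 then LT else GT
  where
    diff  = a `xor` b
    low   = diff .&. negate diff                     -- lowest bit where the sets differ
    above = complement (low - 1) .&. complement low  -- all bits strictly above it
```

For example, compareBitmaps 3 2 is LT, matching compare [0,1] [1], while a plain numeric comparison of the two bit-reversed words would give GT. In the multi-Tip KeyList setting, the "no elements left" test would also have to look at the rest of the list.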

jwaldmann · 2019-07-17T20:24:36Z

Benchmarks? Ideally, derived from actual use cases.

Where do we need Ord on Sets? When we take a Set (or Map) of sets.
As in the power-set construction, for making deterministic automata?

I have something at https://gitlab.imn.htwk-leipzig.de/autotool/all0/blob/master/fa/src/Autolib/NFA/Det.hs but it needs to be extracted before being useful.
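For readers who have not seen it, the power-set (subset) construction mentioned above can be sketched in a few lines. This is an illustrative sketch only, not the code from the linked repository; the names Delta, step and reachable are made up here. Every membership test and insertion on the Set IntSet goes through the Ord IntSet instance, which is why that instance shows up in such workloads.

```haskell
import qualified Data.IntMap.Strict as IM
import qualified Data.IntSet as IS
import qualified Data.Set as S

-- | Transitions for one input symbol: NFA state -> set of successor states.
type Delta = IM.IntMap IS.IntSet

-- | Successors of a set of NFA states under one symbol.
step :: Delta -> IS.IntSet -> IS.IntSet
step d qs = IS.unions [IM.findWithDefault IS.empty q d | q <- IS.toList qs]

-- | All DFA states (sets of NFA states) reachable from the start set.
reachable :: [Delta] -> IS.IntSet -> S.Set IS.IntSet
reachable deltas start = go (S.singleton start) [start]
  where
    go seen [] = seen
    go seen (x : todo) =
      -- Successor sets not seen before; S.fromList also removes duplicates.
      let fresh = S.fromList [y | d <- deltas, let y = step d x, y `S.notMember` seen]
      in go (S.union seen fresh) (S.toList fresh ++ todo)
```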

jwaldmann · 2019-07-18T22:47:34Z

stand-alone NFA determinisation in this branch: https://github.com/jwaldmann/containers/commits/instance-Ord-IntSet

Example profile - see below. I think it shows that indeed, Data.IntSet.Internal.compare is exercised here.

        Fri Jul 19 00:41 2019 Time and Allocation Profiling Report  (Final)

           ord-intset-benchmarks +RTS -P -RTS -q det/hard/n=16

        total time  =        7.55 secs   (7550 ticks @ 1000 us, 1 processor)
        total alloc = 13,662,565,528 bytes  (excludes profiling overheads)

COST CENTRE        MODULE                            SRC                                                %time %alloc  ticks     bytes

shiftRL            Utils.Containers.Internal.BitUtil src/Utils/Containers/Internal/BitUtil.hs:100:1-22   27.2   18.1   2055 2470760640
revNat             Data.IntSet.Internal              src/Data/IntSet/Internal.hs:(1464,1)-(1469,98)      26.1   33.4   1971 4568460432
toAscList          Data.IntSet.Internal              src/Data/IntSet/Internal.hs:1016:1-24               18.1   37.1   1365 5068974496
shiftLL            Utils.Containers.Internal.BitUtil src/Utils/Containers/Internal/BitUtil.hs:101:1-22   15.4    7.7   1162 1055309648
compare            Data.IntSet.Internal              src/Data/IntSet/Internal.hs:1160:5-57                6.4    0.0    483        16
findWithDefault.go Data.IntMap.Internal              src/Data/IntMap/Internal.hs:(625,5)-(630,16)         1.4    0.0    107         0
det.go.ts.next     Main                              benchmarks/OrdIntSet.hs:(64,26)-(66,72)              0.5    1.0     34 138936336

But - it really is never a good idea to use Set IntSet (or similar). It's at least as expensive as Set [Int].

NB: I wonder how people implement the powerset construction (from NFA to DFA). Perhaps the answer is "they don't" (and go from regular expression to DFA directly).

So, are there other applications where one needs an (optimized) instance Ord IntSet?

jwaldmann · 2019-07-20T14:38:47Z

I am trying this https://github.com/jwaldmann/containers/blob/instance-Ord-IntSet/containers-tests/benchmarks/OrdIntSet.hs#L76

It seems to be working for natural numbers, not for negative keys, since they break the assumption that in Bin p m l r, all the keys in l are smaller than all in r.

*Main> Bin p m l r = fromList [-1,0]
*Main> (l,r)
(fromList [0],fromList [-1])

But these cases should be easy to detect? I could use some ideas here (also for relateBM) @treeowl @int-e
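One way to picture the observation above, assuming the tree branches on the bits of a key read as an unsigned machine word (so the sign bit is the highest branching bit): in that reading, negative keys sort after all non-negative ones, and the signed order is recovered by moving the negative block to the front. The helper names below are hypothetical and only model the key order, not the tree itself.

```haskell
import Data.List (partition, sortOn)

-- | Keys in the order obtained by reading each Int as an unsigned Word.
unsignedOrder :: [Int] -> [Int]
unsignedOrder = sortOn (\k -> fromIntegral k :: Word)

-- | Turn an unsigned-sorted key list into ascending signed order by moving
-- the (already internally sorted) block of negative keys to the front.
signedFromUnsigned :: [Int] -> [Int]
signedFromUnsigned ks = negatives ++ nonNegatives
  where
    (negatives, nonNegatives) = partition (< 0) ks
```

Here unsignedOrder [-1, 0] gives [0, -1], mirroring the (l, r) pair in the GHCi session above.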

jwaldmann · 2019-07-21T15:34:17Z

Yes! I get a nice speed-up (a factor of around 3), and much reduced allocation.

treeowl · 2019-07-21T18:54:26Z

Lovely! In principle, the negative weirdness should be limited to the roots. If you're having trouble, I can try to help, but you likely know at least as much about it as I do!


jwaldmann · 2019-07-21T19:01:08Z

In my benchmarks (NFA -> DFA) I think the IntSet always fits into one Tip. I will spread out the numbers and see what happens there.

jwaldmann · 2019-07-22T13:56:29Z

I now have a version that I think is correct (by enumerative testing, it agrees with the original version, and for the NFA to DFA conversion, automata have expected size). Performance is roughly

for small trees (Tip only): 30 percent runtime
for large trees (each Tip is singleton): 70 percent runtime (both versions walk the tree in the same way but I don't do any allocation)

I am sure the code can be golfed and micro-optimised (using equivalences of bitwise operations). I don't have much experience with that.

treeowl · 2019-07-22T13:58:12Z

Can you put together a pull request? Or better, two: make the first one add your benchmark.


jwaldmann · 2019-07-22T14:10:05Z

actually it's three parts

implementation
test for correctness (compare s t == compare (toAscList s) (toAscList t))
benchmark
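The correctness property in the second item can be checked enumeratively with nothing but the public API. The sketch below is an assumption about what such a test might look like, not the test from the pull request; ordAgreesWithAscList and smallSets are made-up names, and the key universe is chosen to include negative keys and keys that land in different Tips.

```haskell
import qualified Data.IntSet as IS
import Data.List (subsequences)

-- | The property under test: Ord on IntSet agrees with Ord on ascending lists.
ordAgreesWithAscList :: IS.IntSet -> IS.IntSet -> Bool
ordAgreesWithAscList s t = compare s t == compare (IS.toAscList s) (IS.toAscList t)

-- | Every subset of a small universe mixing negative keys, keys sharing a
-- 64-bit block, and keys in separate blocks.
smallSets :: [IS.IntSet]
smallSets = map IS.fromList (subsequences [-65, -1, 0, 1, 63, 64, 130])

-- | Exhaustive check over all pairs of small sets.
allPairsAgree :: Bool
allPairsAgree = and [ordAgreesWithAscList s t | s <- smallSets, t <- smallSets]
```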

treeowl · 2019-07-22T14:12:08Z

I didn't realize there wasn't a correctness test yet! Ouch!


jwaldmann · 2019-07-22T16:29:39Z

Are these properties (that Eq and Ord of IntSet go via toAscList) announced anywhere? If not, that deserves a separate discussion?

jwaldmann · 2019-07-22T16:38:51Z

Oh I see now that intset-properties.hs already contains

prop_ord :: IntSet -> IntSet -> Bool
prop_ord s1 s2 = s1 `compare` s2 == toList s1 `compare` toList s2

but it's hidden deep down the file. I'd expected it to go at the top (following the order of the declarations:
data type, instances, operations).

And it really should be toAscList.

* Test that instances for Eq and Ord agree with going via toAscList
* Add benchmark for "instance Ord IntSet", using "Set IntSet"
* Improve implementation of "instance Ord IntSet" that avoids toAscList and walks the tree directly. See #470

m-renaud added the performance label Jan 1, 2018

gereeter mentioned this issue Jan 2, 2018

Rewrite Data.IntMap to be faster and use less memory #340

Open

jwaldmann pushed a commit to jwaldmann/containers that referenced this issue Jul 18, 2019

ansatz for benchmark for haskell#470

7b6ad58

jwaldmann mentioned this issue Jul 20, 2019

export Data.IntSet.Internal.lowestBitSet, highestBitSet, revNat #668

Closed

jwaldmann pushed a commit to jwaldmann/containers that referenced this issue Jul 20, 2019

for haskell#470 (does not handle negative keys correctly)

12b8e48

jwaldmann added a commit to jwaldmann/containers that referenced this issue Jul 21, 2019

for haskell#470 (works for negative keys as well)

c90862e

jwaldmann added a commit to jwaldmann/containers that referenced this issue Jul 21, 2019

use improved compare method in benchmark (for haskell#470)

aa863dd

jwaldmann added a commit to jwaldmann/containers that referenced this issue Jul 22, 2019

for haskell#470

66eaea4

jwaldmann mentioned this issue Jul 22, 2019

test that instances for Eq and Ord agree with going via toAscList #670

Merged

jwaldmann added a commit to jwaldmann/containers that referenced this issue Jul 22, 2019

improved implementation of "instance Ord IntSet"

c0fb190

that avoids toAscList and walks the tree directly. See haskell#470

jwaldmann mentioned this issue Jul 28, 2019

IntSet: reverse bitmap for faster comparison? #674

Open

sjakobi added IntMap IntSet labels Jul 15, 2020

jwaldmann mentioned this issue Jul 20, 2023

better instance Hashable IntSet? #964

Open

meooow25 mentioned this issue Aug 5, 2024

Faster Eq and Ord #1016

Open

8 tasks

meooow25 mentioned this issue Oct 31, 2024

Next release, 0.8 #1059

Open

9 tasks
