Improve performance of generating distinct intersections · upsetjs/upsetjs_r@12f306f


Improve performance of generating distinct intersections

When generating distinct intersections on data with hundreds of
thousands of elements, the computation grinds to a halt. The running
time appears to be roughly O(n^2): doubling the data makes execution
take 2^2 = 4 times as long. Profiling with profvis shows that the main
cost is a Filter() in pushCombination(), which runs a doubly nested
loop over the elements: for every element of a set, it checks that
element against the elements of every other set.
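To make the difference concrete, here is a minimal, self-contained base R sketch. The helper names `distinct_elems_old()` and `distinct_elems_new()` are hypothetical, and plain vectors stand in for `s$elems` and the `elems` of each entry in `otherSets`. The old strategy calls `%in%` once per element, and each call has to look through a whole other set, so the cost grows with the product of the set sizes; the new strategy calls `setdiff()` once per other set. Both should return the same elements in the same order:

```r
# Hypothetical helpers; `s_elems` stands in for s$elems and `other_elems`
# for the `elems` vectors of the entries in otherSets.

# Old strategy: for each element, test membership in every other set.
# Each `e %in% o` costs time linear in length(o), so the total is roughly
# length(s_elems) * (sum of the other sets' sizes).
distinct_elems_old <- function(s_elems, other_elems) {
  Filter(function(e) {
    for (o in other_elems) {
      if (e %in% o) {
        return(FALSE)
      }
    }
    TRUE
  }, s_elems)
}

# New strategy: one setdiff() per other set. setdiff() is built on
# match(), which R implements with hashing, so each call is roughly
# linear in the sizes of its two inputs.
distinct_elems_new <- function(s_elems, other_elems) {
  d <- s_elems
  for (o in other_elems) {
    d <- setdiff(d, o)
  }
  d
}

set.seed(1)
s_elems <- sample(1:10000, 2000)
other_elems <- list(sample(1:10000, 2000), sample(1:10000, 2000))

res_old <- distinct_elems_old(s_elems, other_elems)
res_new <- distinct_elems_new(s_elems, other_elems)
identical(res_old, res_new)
# [1] TRUE
```

Note that `setdiff()` also drops duplicates, which is harmless here because set elements are expected to be unique.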

Minimal benchmark, run on a fairly beefy machine (5950X, 128 GB RAM)
with Fedora Linux, R 4.1.3, and upsetjs 1.11.0 (git hash 4b375a8).
Test data is generated with:

```
generate_data <- function(n) {
  tibble::tibble(
    col_0 = sample(c(0, 1), n, replace = TRUE),
    col_1 = sample(c(0, 1), n, replace = TRUE),
    col_2 = sample(c(0, 1), n, replace = TRUE),
    col_3 = sample(c(0, 1), n, replace = TRUE),
    col_4 = sample(c(0, 1), n, replace = TRUE),
    col_5 = sample(c(0, 1), n, replace = TRUE),
    col_6 = sample(c(0, 1), n, replace = TRUE),
    col_7 = sample(c(0, 1), n, replace = TRUE),
    col_8 = sample(c(0, 1), n, replace = TRUE),
    col_9 = sample(c(0, 1), n, replace = TRUE)
  )
}
```

Before this PR:

```
> start <- Sys.time()
> upsetjs() |>
+     upsetjs:::fromDataFrame(generate_data(10000)) |>
+     upsetjs:::generateDistinctIntersections(limit = 5)
> Sys.time() - start
Time difference of 24.85004 secs
```

With this PR:

```
> start <- Sys.time()
> upsetjs() |>
+     upsetjs:::fromDataFrame(generate_data(10000)) |>
+     upsetjs:::generateDistinctIntersections(limit = 5)
> Sys.time() - start
Time difference of 0.7690187 secs
```
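For reference, the speedup implied by the two timings above (same input size, n = 10,000) is simple arithmetic; this snippet only reuses the numbers reported in this commit and is single-run, so it is a rough figure rather than a careful measurement:

```r
# Timings copied from the "Before" and "With this PR" runs (n = 10,000).
before_secs <- 24.85004
after_secs  <- 0.7690187

# How many times faster the new code is on the same input.
speedup <- before_secs / after_secs
round(speedup, 1)
# [1] 32.3
```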

Scaling is also now roughly linear, O(n) or slightly better.
With 10x the data, the runtime grows by about 7.5x:

```
> start <- Sys.time()
> upsetjs() |>
+     upsetjs:::fromDataFrame(generate_data(100000)) |>
+     upsetjs:::generateDistinctIntersections(limit = 5)
> Sys.time() - start
Time difference of 5.745839 secs
```
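One rough way to quantify "roughly linear" is to assume the runtime follows c * n^k and solve for the exponent k from the two post-change timings. This is a sketch under that power-law assumption, using only two single-run data points, so the estimate is noisy; under the old O(n^2) behaviour k would be close to 2:

```r
# Post-change timings for n = 10,000 and n = 100,000 from the runs above.
n1 <- 10000
n2 <- 100000
t1 <- 0.7690187
t2 <- 5.745839

# If t ~ c * n^k, then t2 / t1 = (n2 / n1)^k, so k = log(t2 / t1) / log(n2 / n1).
k <- log(t2 / t1) / log(n2 / n1)
round(k, 2)
# [1] 0.87
```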


halhen committed Jul 7, 2022

1 parent 4b375a8 commit 12f306f

R/data-helpers.R

```
@@ -118,14 +118,11 @@ generateCombinationsImpl <- function(sets,
         otherSets <- Filter(function(ss) {
           !(ss$name %in% s$setNames)
         }, sets)
-        dElems <- Filter(function(e) {
-          for (o in otherSets) {
-            if (e %in% o$elems) {
-              return(FALSE)
-            }
-          }
-          TRUE
-        }, s$elems)
+        dElems <- s$elems
+        for (o in otherSets) {
+          dElems <- setdiff(dElems, o$elems)
+        }
         if (s$cardinality == length(dElems)) {
           combinations <<- c(combinations, list(s))
```
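A possible further variant, which is NOT part of this commit, is to pool the elements of all other sets once and call `setdiff()` a single time. The sketch below uses made-up toy values shaped like the `s` and `otherSets` objects in the diff (lists with `name` and `elems`) and checks that both forms agree:

```r
# Toy stand-ins for the objects in generateCombinationsImpl();
# the names and values here are invented for illustration.
s <- list(elems = c("a", "b", "c", "d"))
otherSets <- list(
  list(name = "B", elems = c("b", "x")),
  list(name = "C", elems = c("d", "y"))
)

# Committed version: one setdiff() per other set.
dElems <- s$elems
for (o in otherSets) {
  dElems <- setdiff(dElems, o$elems)
}

# Hypothetical variant: pool every other set's elements first, then take
# a single setdiff(). If otherSets is empty, unlist() returns NULL and
# setdiff(s$elems, NULL) simply returns s$elems.
pooled <- unique(unlist(lapply(otherSets, function(o) o$elems), use.names = FALSE))
dElems2 <- setdiff(s$elems, pooled)

identical(dElems, dElems2)
# [1] TRUE
```

This variant has not been benchmarked against the committed loop. The loop has the advantage that `dElems` shrinks after each `setdiff()`, while the pooled form builds one larger lookup vector up front, so which is faster likely depends on how much the sets overlap.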
