Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-repeated and non-duplicated combinations from groups with varying sizes that contain overlaps #50

Open
delt87 opened this issue Jan 24, 2024 · 6 comments

Comments

@delt87
Copy link

delt87 commented Jan 24, 2024

Is it possible to achieve the results below using either comboGeneral or comboGroups? (I'm not sure which one is the right function based on the few attempts I've made so far). I would like the end result to have no repetitions and duplications from any of the sets. Using expand.grid is too slow as I have to manually adjust for repetitions and duplications and I would also like to take advantage of the FUN argument in your library to do some other things to the result. Thank you for this great library!

  set1 <- c("Paul G","Mike B","Steph D","Ron B","Ryan R")
  set2 <- c("Ron B","Ryan R","Matt B","Joe D")
  set3 <- c("Ryan R","Steven Z","Dennis D","Mike A")
  set4 <- c("Mike A","Steven Z","Luka E","Francis B","Tom A")
  set5 <- c("Ross U","Bobby A","Max Z","Flori A")
  
  all_combos <- expand.grid(set1, set1, set2, set2, set3, set3, set4, set4, set5)
  all_combos <- all_combos[1:25000,]
  all_combos <- all_combos[!duplicated(t(apply(all_combos, 1, sort))), ]
  all_combos$dups <- apply(all_combos, 1, function(i) any(duplicated(i[!is.na(i)])))
  all_combos <- all_combos[all_combos$dups==FALSE,][,-c(10)]

>   all_combos[1:2,]
        Var1   Var2   Var3  Var4     Var5   Var6     Var7   Var8   Var9
7252  Mike B Paul G Matt B Ron B Dennis D Ryan R Steven Z Mike A Ross U
7253 Steph D Paul G Matt B Ron B Dennis D Ryan R Steven Z Mike A Ross U
@delt87 delt87 changed the title Non-repeated and non-duplicated combinations from groups with varying sizes and overlaps Non-repeated and non-duplicated combinations from groups with varying sizes that contain overlaps Jan 24, 2024
@jwood000
Copy link
Owner

Hi @delt87,

What you are looking for is comboGrid.

system.time(all_combos_algos <- RcppAlgos::comboGrid(
    set1, set1, set2, set2, set3, set3, set4, set4, set5,
    repetition = FALSE
))
#>  user  system elapsed 
#> 0.001   0.000   0.001

dim(all_combos_algos)
#> [1] 2400    9

head(all_combos_algos)
#>          Var1     Var2     Var3    Var4     Var5       Var6     Var7        Var8       Var9     
#> [1,] "Mike B" "Paul G" "Joe D" "Matt B" "Dennis D" "Mike A" "Francis B" "Luka E"   "Bobby A"
#> [2,] "Mike B" "Paul G" "Joe D" "Matt B" "Dennis D" "Mike A" "Francis B" "Luka E"   "Flori A"
#> [3,] "Mike B" "Paul G" "Joe D" "Matt B" "Dennis D" "Mike A" "Francis B" "Luka E"   "Max Z"  
#> [4,] "Mike B" "Paul G" "Joe D" "Matt B" "Dennis D" "Mike A" "Francis B" "Luka E"   "Ross U" 
#> [5,] "Mike B" "Paul G" "Joe D" "Matt B" "Dennis D" "Mike A" "Francis B" "Steven Z" "Bobby A"
#> [6,] "Mike B" "Paul G" "Joe D" "Matt B" "Dennis D" "Mike A" "Francis B" "Steven Z" "Flori A"

I ran your example above without subsetting (i.e. I removed the line all_combos <- all_combos[1:25000,]) so as to see the entire output and I obtained 2400 results.

For more information check out Cartesian Product where Order does not Matter.

Hope this helps!

Regards,
Joseph

@delt87
Copy link
Author

delt87 commented Jan 24, 2024

Thank you for the assist. That's exactly what I'm looking for. Though it appears that unfortunately comboGrid does not have the lower and upper arguments that I've been using in the other combo functions for problems where the number of combinations is too large to generate all in one go. I'll have to think about how to properly do this in chunks since each of my sets can have anywhere from 20-90 values.

Edit:
It would be great if in a future release, the comboGrid function could have the lower and upper arguments like some of the other functions to make large combinations easier to generate.

@jwood000
Copy link
Owner

@delt87,

You are spot on. I have spent many a night thinking about this problem.

As you can imagine, the underlying algorithm is non-trivial. It doesn't generate them one by one, however that is the ultimate goal.

Much of the current algorithm is inspired by answers to this question: Picking unordered combinations from pools with overlap.

I won't close this for now and take this as a moment for inspiration.

Thanks for the nudge

@jwood000
Copy link
Owner

@delt87,

Just wanted to give an update.

I've been working on a sketch for this for some time and I'm confident we are getting close to having this in production.

@delt87
Copy link
Author

delt87 commented Mar 28, 2024

I'm sure whatever release you come up with will be amazing. Looking forward to it. Also, I did a quick look in your github page to see if you had a 'donate to developer' option set up anywhere but I couldn't find one. Feel free to point me in the right direction if you have something like that for the work you put into this package.

@jwood000
Copy link
Owner

@delt87,

Sorry for the late reply and thank you for the kind words.

You are right, there is nothing set up for monetary contributions, but I really do appreciate the sentiment.

... back to work on this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants