tune.spca() not reproducible #330

Closed
evaham1 opened this issue Oct 23, 2024 · 2 comments
Labels
bug Something isn't working


evaham1 (Collaborator) commented Oct 23, 2024


🐞 Describe the bug:

tune.spca() is not reproducible, even when a seed is set with set.seed().


🔍 reprex results from reproducible example including sessionInfo():

```r
library(mixOmics)

data(multidrug)
X <- multidrug$ABC.trans
set.seed(8589) # for reproducibility with this case study, remove otherwise
test.keepX <- seq(5, 25, 5) # set the number of variable values to be tested

tune.spca.res <- tune.spca(X, ncomp = 3, # generate the first 3 components
                           nrepeat = 5, # repeat the cross-validation 5 times
                           folds = 3, # use 3 folds for the cross-validation
                           test.keepX = test.keepX)
set.seed(8589) # reset the seed so both runs start from the same RNG state
tune.spca.res.2 <- tune.spca(X, ncomp = 3, # generate the first 3 components
                             nrepeat = 5, # repeat the cross-validation 5 times
                             folds = 3, # use 3 folds for the cross-validation
                             test.keepX = test.keepX)

identical(tune.spca.res$cor.comp$comp1$cor.mean, tune.spca.res.2$cor.comp$comp1$cor.mean)
```

🤔 Expected behavior:

The code above is from the sPCA multidrug case study: http://mixomics.org/case-studies/spca-multidrug-case-study/
The case study says to set a seed for reproducibility, but that doesn't seem to work. We need to be able to pass a seed, both to ensure reproducibility and to support unit testing.


💡 Possible solution:
The seed may not be passed through to the tune functions.

evaham1 added the bug (Something isn't working) label Oct 23, 2024
evaham1 self-assigned this Oct 23, 2024

evaham1 (Collaborator, Author) commented Oct 23, 2024

This issue is connected to #216. I have started updating the parallel processing in the tune functions, but can't test those changes properly without setting a seed.

evaham1 (Collaborator, Author) commented Oct 23, 2024

Update: tune.spca() is already reproducible, but the seed needs to be set in BPPARAM.

After more extensive checks, I can see that tune.spca() is reproducible if RNGseed is set in the BPPARAM argument. Calling set.seed() before running the function seems to have no effect on reproducibility. Setting RNGseed works both in serial (SerialParam) and in parallel (MulticoreParam).
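The same mechanism can be seen without mixOmics, using BiocParallel's `bplapply()` directly. This is a minimal sketch, assuming a recent BiocParallel release whose `SerialParam()` accepts `RNGseed`; the `draw()` helper is hypothetical and only for illustration. Two calls with the same `RNGseed` should return identical draws, while a different `RNGseed` gives different ones.

```r
library(BiocParallel)

# Hypothetical helper: draws one uniform number per element, with the
# random-number streams derived from the RNGseed set on the param object.
draw <- function(seed) {
  bplapply(1:4, function(i) runif(1), BPPARAM = SerialParam(RNGseed = seed))
}

res1 <- draw(5212)
res2 <- draw(5212)
res3 <- draw(321)

identical(res1, res2)  # same RNGseed: TRUE
identical(res1, res3)  # different RNGseed: FALSE
```

Swapping `SerialParam` for `MulticoreParam` (on platforms that support forking) should behave the same way, which matches the serial and multicore results reported for tune.spca().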

```r
## Code taken from test-tune.spca
library(mixOmics)
library(BiocParallel)

## set up data
data(srbct)
X <- srbct$gene[1:20, 1:200]
grid.keepX <- seq(5, 35, 10)

## with both the global seed and RNGseed set: reproducible
set.seed(5212)
object1 <- tune.spca(X, ncomp = 2, folds = 5, test.keepX = grid.keepX, nrepeat = 3,
                     BPPARAM = SerialParam(RNGseed = 5212))
set.seed(5212)
object2 <- tune.spca(X, ncomp = 2, folds = 5, test.keepX = grid.keepX, nrepeat = 3,
                     BPPARAM = SerialParam(RNGseed = 5212))
identical(object1$choice.keepX, object2$choice.keepX) # TRUE
identical(object1$cor.comp$comp1, object2$cor.comp$comp1) # TRUE
identical(object1$cor.comp$comp2, object2$cor.comp$comp2) # TRUE

## with only RNGseed set: reproducible
set.seed(NULL)
object1 <- tune.spca(X, ncomp = 2, folds = 5, test.keepX = grid.keepX, nrepeat = 3,
                     BPPARAM = SerialParam(RNGseed = 5212))
set.seed(NULL)
object2 <- tune.spca(X, ncomp = 2, folds = 5, test.keepX = grid.keepX, nrepeat = 3,
                     BPPARAM = SerialParam(RNGseed = 5212))
identical(object1$choice.keepX, object2$choice.keepX) # TRUE
identical(object1$cor.comp$comp1, object2$cor.comp$comp1) # TRUE
identical(object1$cor.comp$comp2, object2$cor.comp$comp2) # TRUE

## with only set.seed() set: NOT reproducible
set.seed(123)
object1 <- tune.spca(X, ncomp = 2, folds = 5, test.keepX = grid.keepX, nrepeat = 3,
                     BPPARAM = SerialParam(RNGseed = NULL))
set.seed(123)
object2 <- tune.spca(X, ncomp = 2, folds = 5, test.keepX = grid.keepX, nrepeat = 3,
                     BPPARAM = SerialParam(RNGseed = NULL))
identical(object1$choice.keepX, object2$choice.keepX) # FALSE
identical(object1$cor.comp$comp1, object2$cor.comp$comp1) # FALSE
identical(object1$cor.comp$comp2, object2$cor.comp$comp2) # FALSE

## setting different RNGseeds gives different results
object1 <- tune.spca(X, ncomp = 2, folds = 5, test.keepX = grid.keepX, nrepeat = 3,
                     BPPARAM = SerialParam(RNGseed = 123))
object2 <- tune.spca(X, ncomp = 2, folds = 5, test.keepX = grid.keepX, nrepeat = 3,
                     BPPARAM = SerialParam(RNGseed = 321))
identical(object1$choice.keepX, object2$choice.keepX) # TRUE
identical(object1$cor.comp$comp1, object2$cor.comp$comp1) # FALSE
identical(object1$cor.comp$comp2, object2$cor.comp$comp2) # FALSE

## RNGseed is also reproducible when run with MulticoreParam
object1 <- tune.spca(X, ncomp = 2, folds = 5, test.keepX = grid.keepX, nrepeat = 3,
                     BPPARAM = MulticoreParam(RNGseed = 123))
object2 <- tune.spca(X, ncomp = 2, folds = 5, test.keepX = grid.keepX, nrepeat = 3,
                     BPPARAM = MulticoreParam(RNGseed = 123))
identical(object1$choice.keepX, object2$choice.keepX) # TRUE
identical(object1$cor.comp$comp1, object2$cor.comp$comp1) # TRUE
identical(object1$cor.comp$comp2, object2$cor.comp$comp2) # TRUE
```

evaham1 changed the title Tune not reproducible → Tune.spca() not reproducible Oct 23, 2024
evaham1 changed the title Tune.spca() not reproducible → tune.spca() not reproducible Oct 23, 2024
evaham1 closed this as completed Oct 23, 2024