
Extending BPPARAM usage to all tune() variants #216

Closed
Max-Bladen opened this issue May 11, 2022 · 4 comments · Fixed by #333
Labels: feature-request (Can be implemented if there's enough interest), wip (work-in-progress)


Max-Bladen commented May 11, 2022

After some initial exploration as part of rectifying the problems noted in Issue #214, I noticed that BiocParallel usage is inconsistent across all the tune() variants. In exploring this wider issue, I have come to the following conclusions:

  • tune
    • this is the wrapper function which can dispatch to all of the functions below, save for tune.block.splsda for some reason. tune() still takes cpus. It handles the subsequent functions as follows:
      • tune.mint.splsda() - not passed any parallelisation parameters due to the forced use of LOOCV. Hence, it cannot be parallelised. Nothing to change here
        • if called directly: totally fine due to LOOCV
      • tune.rcc() - not passed any parallelisation parameters due to the lack of any parallelisation implementation within this function. Once tune.rcc() can make use of multiple CPUs, this should be adjusted. Therefore, this will be left for now
        • if called directly: cannot use any parallelisation
      • tune.pca() - not passed any parallelisation parameters due to this function not involving any repeated cross validation. It merely calculates the explained variance per component. Nothing to change here
        • if called directly: cannot use any parallelisation
      • tune.spca() - not passed any parallelisation parameters despite the function having BiocParallel implementation. Need changes here
        • if called directly: must use BPPARAM and not cpus
      • tune.splsda() - passed cpus directly from input. Need changes here
        • if called directly: uses cpus and has no BiocParallel implementation. Uses its own methodology, involving makePSOCKcluster()/makeForkCluster(). Need to do more exploration here
      • tune.spls() - prior to the function call, a BPPARAM object is generated using cpus to control the worker count; this BiocParallel object is then passed to the function (see the sketch after this list). Need changes here
        • if called directly: must use BPPARAM and not cpus
      • tune.splslevel() - not passed any parallelisation parameters due to a lack of parallelisation implementation. Probably not worth spending time on, but if so then need changes here
        • if called directly: cannot use any parallelisation
  • tune.block.splsda
    • this is not wrapped by tune() for some reason. It does take BPPARAM rather than cpus and applies it properly
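
For reference, here is a minimal sketch of the kind of conversion tune() performs before calling tune.spls(), mapping the legacy cpus integer onto a BiocParallel back-end. This is not mixOmics source code: the helper name .cpus_to_bpparam is hypothetical, and the choice of SnowParam on Windows is an assumption.

library(BiocParallel)

## Hypothetical helper: turn a legacy `cpus` count into a BPPARAM object.
.cpus_to_bpparam <- function(cpus = 1) {
  if (is.null(cpus) || cpus <= 1) {
    return(SerialParam())           # no parallelisation requested
  }
  if (.Platform$OS.type == "windows") {
    SnowParam(workers = cpus)       # fork-based back-ends unavailable on Windows
  } else {
    MulticoreParam(workers = cpus)  # fork-based back-end elsewhere
  }
}

## tune() could then dispatch, e.g.:
## tune.spls(X, Y, ..., BPPARAM = .cpus_to_bpparam(cpus))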

These conclusions were drawn from reading the source code as well as running these functions while observing any errors (e.g. unused argument (BPPARAM = param)) and comparing runtime when using 1 vs 14 cores.

Note to self: when creating a new branch from this issue, fork it from branch issue-214 to carry over changes to tune.spls()

@Max-Bladen Max-Bladen added the feature-request Can be implemented if there's enough interest label May 11, 2022
@Max-Bladen Max-Bladen assigned Max-Bladen and unassigned aljabadi May 11, 2022
@Max-Bladen Max-Bladen added the wip work-in-progress label May 17, 2022
@Max-Bladen Max-Bladen removed the wip work-in-progress label May 25, 2022
@Max-Bladen Max-Bladen added the wip work-in-progress label Dec 19, 2022
@plopez842

@Max-Bladen any updates? Would love to use this feature.


evaham1 commented Oct 22, 2024

Restarting work on this issue - tune.spls() parallelisation works independently and within the tune() wrapper

First, looking at tune.spls(). Testing on my local computer, comparing serial computation against parallel with 4 workers, shows that this function takes the BPPARAM arg and uses it to speed up processing, both when called directly as tune.spls() and when called via the wrapper function tune(). The BPPARAM arg is passed through to the internal bplapply() call.

-> Nothing needs to be done for tune.spls()

## Data set-up
library(mixOmics)
library(BiocParallel)
data(liver.toxicity)
X <- liver.toxicity$gene
Y <- liver.toxicity$clinic
set.seed(42)

## Comparing execution time of tune.spls in serial and parallel
serial_time <- system.time(
  tune_res_serial <- tune.spls(X, Y, ncomp = 3, 
                               test.keepX = c(5, 10, 15), 
                               test.keepY = c(3, 6, 8), 
                               folds = 5,
                               BPPARAM = SerialParam())
)
parallel_time <- system.time(
  tune_res_parallel <- tune.spls(X, Y, ncomp = 3, 
                                 test.keepX = c(5, 10, 15), 
                                 test.keepY = c(3, 6, 8), 
                                 folds = 5,
                                 BPPARAM = MulticoreParam(workers = 4))
)
print(serial_time)
# user  system elapsed 
# 10.851   0.351  11.396 
print(parallel_time)
# user  system elapsed 
# 11.993   1.787   5.060 

## Comparing execution time of tune.spls wrapped in tune in serial and parallel
serial_time <- system.time(
  tune_res_serial <- tune(method = "spls",
                          X, Y, ncomp = 3, 
                          test.keepX = c(5, 10, 15), 
                          test.keepY = c(3, 6, 8), 
                          folds = 5,
                          BPPARAM = SerialParam())
)
parallel_time <- system.time(
  tune_res_parallel <- tune(method = "spls",
                          X, Y, ncomp = 3, 
                          test.keepX = c(5, 10, 15), 
                          test.keepY = c(3, 6, 8), 
                          folds = 5,
                          BPPARAM = MulticoreParam(workers = 4))
)
print(serial_time)
# user  system elapsed 
# 10.727   0.356  11.445 
print(parallel_time)
# user  system elapsed 
# 10.235   2.424   5.719 


evaham1 commented Oct 22, 2024

tune.splsda() parallelisation works but uses the old cpus and parallel arguments rather than BiocParallel

--> Need to update this so it runs using a BPPARAM arg rather than manually creating a cluster (see the sketch after the timings below)

data(breast.tumors)
X <- breast.tumors$gene.exp
Y <- as.factor(breast.tumors$sample$treatment)

serial_time <- system.time(
  tune_res_serial <- tune.splsda(X, Y, ncomp = 5, nrepeat = 50, logratio = "none",
                   test.keepX = c(5, 10, 15), folds = 10, dist = "max.dist",
                   cpus = 1)
)
parallel_time <- system.time(
  tune_res_parallel <- tune.splsda(X, Y, ncomp = 5, nrepeat = 50, logratio = "none",
                                 test.keepX = c(5, 10, 15), folds = 10, dist = "max.dist",
                                 cpus = 4)
)
print(serial_time)
# user  system elapsed 
# 72.164   6.713  58.117 
print(parallel_time)
# user  system elapsed 
# 5.586   3.039  43.183 
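
As a rough sketch of the intended change (not actual mixOmics code; run_repeats and repeat_fun are illustrative stand-ins for the per-repeat cross-validation loop inside tune.splsda()), the manual makePSOCKcluster()/makeForkCluster() set-up would be replaced by a BPPARAM argument handed straight to bplapply():

library(BiocParallel)

## Hypothetical refactor: each CV repeat is dispatched to whatever
## back-end the caller supplies, instead of a hand-built cluster.
run_repeats <- function(nrepeat, repeat_fun, BPPARAM = SerialParam()) {
  bplapply(seq_len(nrepeat), repeat_fun, BPPARAM = BPPARAM)
}

## e.g. tune.splsda(..., BPPARAM = MulticoreParam(workers = 4))
## would replace tune.splsda(..., cpus = 4)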


evaham1 commented Oct 23, 2024

tune.spca() parallelisation works using the BPPARAM argument.
For this test data, a time difference was only found when using a high number of nrepeat (here 500)

--> Nothing needs to be done for tune.spca()

data("nutrimouse")
set.seed(42)

serial_time <- system.time(
  tune.spca.res.serial <- tune.spca(X = nutrimouse$lipid, ncomp = 2, folds = 3,
                                    test.keepX = seq(5, 15, 5), nrepeat = 500,
                                    BPPARAM = SerialParam())
)

parallel_time <- system.time(
  tune.spca.res.parallel <- tune.spca(X = nutrimouse$lipid, ncomp = 2, folds = 3,
                                      test.keepX = seq(5, 15, 5), nrepeat = 500,
                                      BPPARAM = MulticoreParam(workers = 4))
)

print(serial_time)
# user  system elapsed 
# 8.975   0.137   9.161
print(parallel_time)
# user  system elapsed 
# 9.524   1.030   3.919

@evaham1 evaham1 linked a pull request Oct 23, 2024 that will close this issue
@evaham1 evaham1 self-assigned this Oct 28, 2024