Releases: osg-bosco/GridR
Adding Debian 7 support
In this release, we add Debian 7 support (#15)
Bootstrap of condor.local service
In this release, we add a new feature to Bosco: bootstrapping for the condor.local service. This can be used when submitting to distributed systems, such as GlideinWMS. Full details of the change can be found in #14.
This addition has been tested with GlideinWMS running on the Open Science Grid.
Batch and Debian Bug Fix
Python 2.4 Bug Fix
Custom R Edition
Summary
This release contains easily the most complicated changes since the modifications began. We have made some much-requested additions:
- #3 - It is now possible for users to specify custom packages to be installed on the remote machine.
- #4 - A user may specify a custom R URL to download the R binary from.
- #7 - The R bootstrap can now download a newer version of the R binaries if they are available.
- #6 - When running a batched job, the outputs in the list will be updated asynchronously as they complete.
Updated Default Download
Since more people are using GridR, we have updated the default download URL to one that can handle more traffic.
Custom Packages
When the user initializes GridR, they are now able to specify packages that will be installed on the remote machine.
> grid.init(service="bosco.direct", localTmpDir="tmp",
+           remotePackages=c("<path to package.tar.gz>", "<package2>", "<package3>"))
The packages listed in the remotePackages argument will be sent with the jobs and installed on the remote cluster. They will automatically be available in the R environment when your processing begins.
The packages must be in source form. They will be installed with the command:
> R CMD INSTALL --build <package>
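A mistyped package path may be easier to catch locally than after the job has been submitted. The following is a minimal sketch of such a check, not part of GridR: `check_source_packages` is a hypothetical helper, and it only verifies that each file exists and has a `.tar.gz` name, which cannot prove the archive is really in source form.

```r
# Hypothetical helper (not part of GridR): sanity-check package paths
# before handing them to grid.init(remotePackages = ...).
check_source_packages <- function(paths) {
  not_found <- paths[!file.exists(paths)]
  if (length(not_found) > 0) {
    stop("Package file(s) not found: ", paste(not_found, collapse = ", "))
  }
  # Name check only; a .tar.gz name does not guarantee a source package.
  bad_name <- paths[!grepl("\\.tar\\.gz$", paths)]
  if (length(bad_name) > 0) {
    stop("Not a .tar.gz archive: ", paste(bad_name, collapse = ", "))
  }
  # Return absolute paths so they do not depend on the working directory.
  normalizePath(paths)
}

# Assumed usage, with GridR loaded and real package paths in `pkgs`:
# grid.init(service = "bosco.direct", localTmpDir = "tmp",
#           remotePackages = check_source_packages(pkgs))
```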
Custom R Installation
In addition to installing custom packages, users may also want to have a custom version of R itself. In this case, the user can specify the HTTP URL of another R binary tar ball. It will be downloaded on the worker node and the user's processing will be executed using this custom R installation.
The user passes the Rurl argument to grid.init. The Rurl value is then used to download and start R on the worker node.
For example, the user would do:
> library("GridR")
> grid.init(service="bosco.direct", localTmpDir="tmp", Rurl="http://asdf/R-new.tar.gz")
> grid.apply("x", a)
Creating the R tarball
The R tar ball needs to be created so that it includes the user's custom packages. Additionally, the R executable needs to be made portable. Documentation can be found on the Wiki and in this blog post. A working example is available from Dropbox.
Installing custom packages
Custom packages can be added to the R tar ball as follows:
- Download or create a functional, portable R tar ball.
- Install the custom package(s).
- Tar the R installation back up.
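The steps above can be sketched in R with the base `untar`, `install.packages`, and `tar` functions. This is only an illustration under several assumptions: `repack_r_tarball` and its argument names are hypothetical, the location of the library directory inside the tar ball (`lib_subdir`) depends on how the tar ball was built, and the packages are installed by the local R, so the local R is assumed to match the version and platform of the R inside the tar ball.

```r
# Hypothetical helper: unpack an R tar ball, install source packages into
# its library directory, and pack everything up again.
repack_r_tarball <- function(r_tarball, pkg_tarballs, lib_subdir, out_tarball) {
  # Resolve paths up front, because the working directory changes below.
  r_tarball <- normalizePath(r_tarball)
  pkg_tarballs <- normalizePath(pkg_tarballs)
  out_tarball <- file.path(normalizePath(dirname(out_tarball)),
                           basename(out_tarball))

  # Step 1: unpack the existing R tar ball into a scratch directory.
  work <- tempfile("r-repack-")
  dir.create(work)
  untar(r_tarball, exdir = work)

  # lib_subdir is an assumption about the tar ball layout; fail early if wrong.
  lib <- file.path(work, lib_subdir)
  if (!dir.exists(lib)) {
    stop("Library directory not found in tar ball: ", lib_subdir)
  }

  # Step 2: install the source packages into the unpacked library.
  if (length(pkg_tarballs) > 0) {
    install.packages(pkg_tarballs, lib = lib, repos = NULL, type = "source")
  }

  # Step 3: archive everything under the scratch directory.
  old_wd <- setwd(work)
  on.exit(setwd(old_wd), add = TRUE)
  tar(out_tarball, compression = "gzip")
  invisible(out_tarball)
}
```

The resulting file would then need to be uploaded somewhere reachable over HTTP so that its URL can be passed as Rurl.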
Bug Fix release 0.9.7
Batch Apply Release
This release implements the proper batching of apply statements. This means you can parallelize a function by passing a vector to the grid.apply function.
Example
In this next example, we will run a very simple function to illustrate how to use the batch apply. We will multiply a number by 2:
> a<-function(s){return(s*2)}
> library("GridR", lib.loc="/Library/Frameworks/R.framework/Versions/2.15/Resources/library")
Loading required package: codetools
> grid.init(service="bosco.direct", localTmpDir="tmp")
> grid.apply("x", a, c(1:10), batch=c(1))
In this example, we create a vector with c(1:10) that contains the numbers 1 to 10. Next, we call grid.apply to run the function a against every element in the vector. grid.apply sends each of these function calls to a separate processor on the remote Bosco-connected cluster, parallelizing the execution. After the function is complete, we can access the result in the variable x:
> x
Grid job finished, result written to variable x
[[1]]
[1] 2
[[2]]
[1] 4
[[3]]
[1] 6
[[4]]
[1] 8
[[5]]
[1] 10
...
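Before submitting a batch to the cluster, it can be useful to run the same function locally and compare the results. The sketch below assumes, based on the output shown above, that the batched result has the same shape as a plain `lapply` over the input vector. It uses only base R and does not call GridR.

```r
# Same function as in the example above.
a <- function(s) {
  return(s * 2)
}

# Local, sequential stand-in for the batched call: one list element
# per input value, in input order.
expected <- lapply(c(1:10), a)

# Inspect the local result the same way as x above.
print(expected[1:3])

# Assumed comparison once the grid job has finished and x is populated:
# all.equal(x, expected)
```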
Fixed Issues
- Fixed CAMPUS-120 - Add proper support for Apply functionality