Currently this is a barely working prototype.
Calling R functions from Nim works reasonably well, if basic Nim types are used. Both named and unnamed function arguments are supported.
The R SEXP object can be converted into all Nim types that are supported in the other direction.
Interfacing with shared libraries written in Nim works for basic types. See the tNimFromR.nim and tCallNimFromR.R files in tests for an example.
Interfacing with R from Nim works by making use of the Rembedded.h functionality, which effectively launches a silent, embedded R repl.
This repl is then fed with S expressions to be evaluated. The S expression is the basic data type on the C side of R. Essentially everything is mapped to different kinds of S expressions, be it symbols, functions, simple data types, vectors etc.
This library aims to hide both the data conversions and memory handling from the user.
This means that typically one sets up the R repl, does some calls to R and finally shuts down the R repl again:
let R = setupR()
# some or many calls to R functions
teardown(R)
The returned R object is essentially just a dummy object, which is used to help with overload resolution (we want untyped templates to allow calling an R function by ident without having to manually wrap it) and which keeps track of the state of the repl.
In order to not have to call the teardown procedure manually, there are two options:
- a withR template, which takes a block of code and injects a variable R into its calling scope. The repl will be shut down when leaving its scope
- by compiling with --gc:arc or --gc:orc. In that case we can define a proper destructor, which will be automatically called when the R variable runs out of scope and is destroyed.
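A minimal sketch of the first option, assuming withR injects R exactly as described above and that the prerequisites for the embedded repl (R_HOME, a loadable R shared library) are met. It has not been checked against a specific rnim version:

```nim
import rnim

var total = 0
withR:
  # `R` is injected by the template and only valid inside this block
  total = R.sum(@[1, 2, 3]).to(int)
# at this point the embedded repl has been shut down again
doAssert total == 6
```

Since the repl apparently cannot be restarted once it has been shut down, a program would use a single withR block around all of its R calls.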
Note two things:
- in principle there is a finalizer defined for the non ARC / ORC case, which performs the same duty. However, at least according to my understanding, it’s run whenever the GC decides to collect the R variable. This might not be very convenient.
- I don’t know whether it’s an inherent limitation of the embedded R repl, but it seems like one cannot destroy an R repl and construct a new one. If one tries, one is greeted by an "R is already initialized" message.
The above out of the way, let’s look at the basic things currently possible.
For clarity I will annotate the types even where not required.
import rnim
let R = setupR()
# perform a call to the R stdlib function `sum`, by using
# the `.()` dot call template and handing a normal Nim seq
let res: SEXP = R.sum(@[1, 2, 3])
# the result is a `SEXP`, the basic R data type. We can now
# use the `to` proc to get a Nim type from it:
doAssert res.to(int) == 6
Some functions with atypical names may not be callable via the dot call template. In that case, we can directly call the underlying macro, callEval (possibly name change incoming…):
doAssert callEval(`+`, 4.5, 10.5).to(float) == 15.0
This also showcases that functions taking multiple arguments work as expected. At the moment we’re limited to 6 arguments (there are specific C functions to construct calls with up to 6 arguments; arbitrary numbers of arguments still need to be implemented manually).
Named arguments are also supported. Let’s use the seq function as an example, the more general version of the : operator in R (e.g. 1:5):
# `toSeq` requires `import std/sequtils`
doAssert R.seq(1, 10, by = 2).to(seq[int]) == toSeq(countup(1, 10, 2))
As we can see, we can also convert SEXPs containing vectors back to Nim sequences.
Finally, we can also source from arbitrary R files. Assuming we have some R file foo.R:
hello <- function(name) {
  return(paste(c("Hello", name), sep = " ", collapse = " "))
}
From Nim we can then call it via:
import rnim
# first set up an R interpreter
let R = setupR()
# now source the file
R.source("foo.R")
# and now we can call R functions defined in the sourced file
doAssert R.hello("User").to(string) == "Hello User"
That covers the most basic functionality in place so far.
Arrays are always a special case, as they are usually the main source of computational work. Avoiding unnecessary copies of arrays is important to keep performance high.
To provide a no-copy interface to data arrays (R vectors) from R, there are two types to help: NumericVector[T] and RawVector[T].
They provide a nice Nim interface to work with such numerical data.
Any R SEXP can be converted to either of these two types. If the SEXP is not a vector, an exception will be thrown at runtime.
These types internally simply keep a pointer to the underlying data array in the SEXP instead of copying the data.
From a usability standpoint NumericVector[T] is the main type that should be used. RawVector[T] simply provides a slightly lower-level wrapper, which is however more restrictive.
A RawVector[T] can only be constructed for: cint, int32, float, cdouble. This is because the underlying R SEXPs come in only two numeric types: INTSXP and REALSXP. The former stores 32-bit integers and the latter 64-bit floats (a C double, which is 64 bits wide on all common platforms, including 32-bit ones). There is no way to treat a REALSXP vector as a RawVector[int32] for instance.
This is where NumericVector[T] comes in. It can be constructed for all numerical types larger than or equal to 32 bits in size (to avoid loss of information when constructing from a SEXP). Unsigned integers are not supported so far either.
A short example:
import rnim
let R = setupR()
let x = @[1, 2, 3]
let xR: SEXP = x.nimToR # types for clarity
var nv = initNumericVector[int](xR)
# `nv` is now a vector pointing to the same data as `xR`
# we can access individual elements:
echo nv[1] # 2
# modify elements:
nv[2] = 5
# check its length
doAssert nv.len == 3
# iterate over it
for i in 0 .. nv.high:
  echo nv[i]
for x in nv:
  echo x
for i, x in nv:
  echo "Index ", i, " contains ", x
# compare them:
doAssert nv == nv
# and print them:
echo nv # NumericVector[int](len: 3, kind: vkFloat, data: [1, 2, 5])
# as `xR` contains the same memory location, constructing another vector
# and comparing them yields `true`, even though we modified `nv`
let nv2 = initNumericVector[int](xR)
doAssert nv == nv2
# finally we can also construct a `NumericVector` straight from a Nim sequence
let nv3 = @[1.5, 2.5, 3.5].toNumericVector()
echo nv3
If you run this code, you will see a message:
Interpreting input vector of type `REALSXP` as int loses information!
This is because we first constructed a SEXP from a 64-bit integer sequence in Nim. As mentioned before, 64-bit integers do not exist in R. Therefore, the xR SEXP above is actually stored in a REALSXP. By constructing a NumericVector[int] we tell the Nim compiler we wish to convert from and to int, no matter the underlying type of the SEXP array, i.e. INTSXP or REALSXP. The message simply makes you aware that this is happening (it may be taken out in the future).
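Why a conversion between int and a 64-bit float is not lossless in general can be shown in plain Nim, without R. This is only an illustration of the two ways information can get lost (fractional parts and very large integers); which of them the message above is guarding against is my assumption, not something the library states:

```nim
# Pure Nim, no R needed: round trips between 64-bit ints and 64-bit floats.
let small = 3'i64
let big = 9007199254740993'i64   # 2^53 + 1, not representable as a float64

let smallBack = int64(float64(small))
let bigBack = int64(float64(big))

doAssert smallBack == small      # small integers survive the round trip
doAssert bigBack != big          # above 2^53 neighbouring integers collapse

# Reading a float as an int drops the fractional part.
let truncated = int(2.7)
doAssert truncated == 2
```

For the small integers used in the example above, the round trip through a REALSXP is exact.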
The fact that this conversion happens internally is the reason for the existence of RawVector, which explicitly disallows this.
Further, NumericVector is actually a variant object. Depending on the runtime type of the SEXP from which we construct a NumericVector, the correct branch of the variant object will be filled.
For extremely performance sensitive applications it may thus be preferable to have a type where variant kind checks and possible type conversions do not happen.
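To make the cost of the variant approach concrete, here is a self-contained sketch of such a variant object in plain Nim. The names SketchVector, VectorKind, vkInt and getAs are hypothetical and chosen for illustration (only vkFloat appears in the output shown earlier); the real NumericVector is not necessarily laid out like this:

```nim
type
  VectorKind = enum
    vkInt, vkFloat               # hypothetical kinds, mirroring INTSXP / REALSXP

  SketchVector = object
    len: int
    case kind: VectorKind        # the branch is chosen at runtime
    of vkInt:
      ints: seq[int32]
    of vkFloat:
      floats: seq[float64]

proc getAs[T](v: SketchVector, idx: int): T =
  ## Every access checks the kind and converts to the requested type.
  case v.kind
  of vkInt: T(v.ints[idx])
  of vkFloat: T(v.floats[idx])

let v = SketchVector(len: 3, kind: vkFloat, floats: @[1.0, 2.0, 5.0])
doAssert getAs[int](v, 2) == 5       # float storage, read as int
doAssert getAs[float](v, 0) == 1.0   # float storage, read as float
```

A type like RawVector avoids both the kind check and the conversion, because the element type is fixed at compile time.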
As mentioned in the previous section, some function names are weird and require the user to use callEval directly.
To make calling such functions a bit nicer, there is an Rctx macro, which allows for directly calling R functions with e.g. dots in their names, and also allows for assignments.
let x = @[5, 10, 15]
let y = @[2.0, 4.0, 6.0]
var df: SEXP
Rctx:
  df = data.frame(Col1 = x, Col2 = y)
  let df2 = data.frame(Col1 = x, Col2 = y)
  print("Hello from R")
where both df and df2 will then store an equivalent data frame. The last line shows that it’s also possible to use this macro to avoid the need to discard all R calls.
Nim can be used to write extensions for R. This is done by compiling a Nim file as a shared library and calling it in R using the .Call interface.
An example can be seen from the tests:
- https://github.com/SciNim/rnim/blob/master/tests/tNimFromR.nim the Nim file that is compiled to a shared library
- https://github.com/SciNim/rnim/blob/master/tests/tCallNimFromR.R the corresponding R file that wraps the shared library
In the near future the latter R file will be auto generated by the Nim code at compile time.
The basic idea is as follows. Assume you want to write an extension that adds two numbers in Nim to be called from R.
You write a Nim file with the desired procedure and attach the {.exportR.} pragma as follows:
myRmodule.nim:
import rnim
proc addNumbers*(x, y: SEXP): SEXP {.exportR.} =
  ## adds two numbers. We will treat them as floats
  let xNim = x.to(float)
  let yNim = y.to(float)
  # add the converted Nim values, then convert the sum back to a SEXP
  result = (xNim + yNim).nimToR
Note the usage of SEXP as the input and output types. In the future the conversions (and possibly non copy access) will be automated. For now we have to convert manually to and from Nim types.
This file is compiled as follows:
nim c (-d:danger) --app:lib (--gc:arc) myRmodule.nim
where the danger and ARC usage are of course optional (but ARC/ORC is recommended).
This will generate a libmyRmodule.so. The resulting shared library in principle needs to be manually loaded via dyn.load in R and each procedure in it needs to be called using the .Call interface.
Fortunately, this can be automated easily. Therefore, when compiling such a shared library, we automatically emit an R wrapper, that has the same name as the input Nim file. So the following file is generated:
myRmodule.R:
dyn.load("libmyRmodule.so")
addNumbers <- function(a, b) {
  return(.Call("addNumbers", a, b))
}
This file can now be sourced from the R interpreter (using the source function) or in an R script, and then addNumbers is usable and will execute the compiled Nim code!
Note that the autogeneration logic assumes the shared library and the generated R script will live in the same directory. If you wish to move one of them, you might have to adjust the path passed to the dyn.load command!
To try out the functionality of calling R from Nim, you need to meet a few prerequisites.
- a working R installation with a libR.so shared library
- the shell environment variable R_HOME needs to be defined and has to point to the directory which contains the full R directory structure. That is not the path where the R binary lies! Finally, the libR.so has to be findable for dynamic loading. On my machine its path by default isn’t added to ld via /etc/ld.so.conf.d (for the time being I just define LD_LIBRARY_PATH).
Setup on my machine:
which R
echo $R_HOME
echo $LD_LIBRARY_PATH
/usr/bin/R
/usr/lib/R
/usr/lib/R/lib
An easy way to set the R_HOME variable is by asking R about it:
R RHOME
returns the correct path. We can use that to set the R_HOME variable:
export R_HOME=`R RHOME`
export LD_LIBRARY_PATH=$R_HOME/lib # maybe not required on your system
- a working R installation with an R.dll shared library
- the shell environment variable R_HOME needs to be defined and has to point to the directory which contains the full R directory structure. That is not the path where the R binary lies!
Example setup:
where R.dll
set R_HOME
C:\Program Files\R\R-4.0.4\bin\x64\R.dll
R_HOME=C:\Program Files\R\R-4.0.4
Run the test file:
nim c -r tests/tRfromNim.nim