Make your R scripts faster with C code

# Make your R scripts faster with C code

Sometimes, a problem can easily be solved using loops in R, however this way may be very slow. For example, one wants to estimate a quantile by Monte Carlo of :

$S := \sum_{k=1}^{N} X_k,$

where $$N \sim \mathcal{P}(\lambda)$$ and $$X := (X_{k})_{k \geq 1}$$ is a sequence of i.i.d. random variables which follow a log-normal distribution of parameter $$\mu \in \mathbb{R}$$ and $$\sigma > 0$$.

In this case, we need to simulate $$(S_i)_{1 \leq i \leq n}$$. The following R code do the job, however it is slow because of the loop.

The returned execution time is 16.23 on my laptop. We can write a specific full R code without any loop which is faster, however it would need a lot of memory during the computation and would still be much slower than C code.

Note that we use the function rpgm.rlnorm from package our package rpgm in order to be faster in the simulation of log-normal random variables, see package rpgm. With the original rlnorm, the execution time  21.48 .

How to call C code using R library ? First of all, download and install Rtools at https://cran.r-project.org/bin/windows/Rtools/.

We will use RPGM in order to manage easily our C code, but this is not a mandatory. In a project with RPGM, use Create a new C script. You should have by default:

This header calls the definitions of the function from the R intern code. There is two ways to write C functions for R. The first one only deals with vector through the pointers in C. The second use the object from R with the definition used by R in C. The second one is more complicated, we will use the first one. How to write this function ?

Here is the definition of the function without the body.

The first keyword void says that the function does not return anything. int * n is the number of simulations for $$S$$. Note that the special character * is required (it will be always the case), because variables must be sent by pointer. N is the vector of simulations of $$(N_i)_{1 \leq i \leq n}$$, m and s the parameters of the Log-normal distribution. double * S is the variable here which will hold the result. Recall that our function does not return anything: the result is stored through the variable double * S. The variable is send by pointer, this means that if we modify it, it will be also modified outside of the function.

Inside the function, the first loop, i is the i-th simulation. It is stored inside the vector S. Then, we add to S[i] a Log-normal simulation. The Log-normal simulation is done by calling the function rlnorm from the C library for R. Note that this function only returns one value and needs the inverse.

Now, we are ready to compile. If you use RPGM, just click on the Compile button for C (on the right of the list). Else, go the command line of your system, and write R CMD SHLIB myfunction.c. It will create a file to call.

Now, we can write our function in R. For this example, it will be

Here, .C calls a C function. We write as first argument the C function called, and then the arguments of the C function. Note that .C returns a list with each argument passed, and S is the output we want so we add \$S. We added conversion since it is a common error to send double instead of an integer which will lead to a strange behavior.

Finally, before to call the function, we have to lead the C function. Use

The execution time is now 6.28. More than 2.5 times faster the initial loop, despite the fact that we used rpgm.rlnorm function instead of rlnorm, with the original rlnorm, it is 3.5 times faster.

It looks hard at first sight, and the first time it is. But, after several times, it is easy to use. Don’t use it for eveyrhing, only when execution time is a problem and when you identified where it is slow and, moreover, you cannot do more with R.