Here is my take on the well-known “R versus Python” contest, comparing the time each language needs to run a loop that generates pseudo-random numbers.

To accomplish the task, the following steps were performed in Python and R:

    loop 100k times (i is the loop index)

      draw one random integer from a pool whose size equals the current step number: 1..i in R, and 0..i in Python, where the zero-based loop index i corresponds to step i+1
      record the elapsed time at the probe steps: i (i+1 for Python) in [1,10,100,1000,5000,10000,25000,50000,75000,100000]

The results are shown in the plot below.

The following conclusions can be drawn:

  • Python is indeed faster than R when the number of iterations is below 1000. In the plotted timings, Python is roughly 5 to 13 times faster than the R "for" loop at up to 100 steps. From about 1000 steps upward, however, R with the lapply function catches up with Python and then pulls ahead: at 100k steps it needs about 0.43 s versus 0.81 s.
  • Try to avoid the "for" loop in R, especially when the number of loop steps exceeds 1000. Use the functions lapply/sapply/vapply instead.
  • The timing runaway of the R "for" loop starts at about 10k loop steps.
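The speed ratios behind these conclusions can be recomputed from the measured timings. The sketch below is only an illustration: the numbers are copied from the yDat arrays in the plot code, while the helper name `ratios` and the variable names are my own.

```python
# Elapsed times in seconds, copied from the yDat arrays in the plot code.
steps = [1, 10, 100, 1000, 5000, 10000, 25000, 50000, 75000, 100000]
python_for = [0.000655, 0.000827, 0.00184, 0.012137, 0.052789,
              0.101116, 0.248447, 0.435075, 0.622154, 0.808063]
r_for = [0.008551, 0.008749, 0.009103, 0.015438, 0.057638,
         0.151765, 0.717195, 2.663847, 6.110421, 10.980278]
r_lapply = [0.002781, 0.005598, 0.005958, 0.00921, 0.05569,
            0.075909, 0.134087, 0.231207, 0.327564, 0.426327]


def ratios(numerator, denominator):
    """Element-wise ratio of two timing series (hypothetical helper)."""
    return [n / d for n, d in zip(numerator, denominator)]


# A ratio above 1 means the R variant was slower than Python at that step.
r_for_vs_python = ratios(r_for, python_for)
r_lapply_vs_python = ratios(r_lapply, python_for)

for step, a, b in zip(steps, r_for_vs_python, r_lapply_vs_python):
    print(f"{step:>6} steps: R for / Python = {a:5.2f}, R lapply / Python = {b:5.2f}")
```

A ratio below 1 in the last column marks the probe points at which lapply was faster than the Python loop.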

The code snippets used for the task are below.

Python code:

from numpy import random as rand
import datetime as dt

# number of loop iterations
n_elements = int(1e5)
# probe points: step counts at which the elapsed time is recorded
x = [1, 10, 100, 1000, 5000, 10000, 25000, 50000, 75000, 100000]

# "for" loop
t = dt.datetime.now()  # start time
vec = []
elapsed = []

for i in range(n_elements):
    # i is zero-based, so step i+1 draws one integer from 0..i
    vec.append(rand.choice(i + 1, size=1, replace=True))
    if i + 1 in x:
        elapsed.append((dt.datetime.now() - t).total_seconds())
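The same probing scheme can be wrapped in a function. This is a minimal sketch, not the code that produced the plotted numbers: it swaps NumPy for the standard-library `random` module and uses `time.perf_counter()` instead of `datetime`, so its timings are not comparable to the ones above. The name `time_probes` is hypothetical.

```python
import random
import time


def time_probes(n_steps, probes):
    """Run the loop and return {probe step: elapsed seconds} (hypothetical helper)."""
    probe_set = set(probes)  # set lookup keeps the probe check cheap
    elapsed = {}
    values = []
    start = time.perf_counter()
    for step in range(1, n_steps + 1):
        # one draw from a pool of `step` integers (0..step-1)
        values.append(random.randrange(step))
        if step in probe_set:
            elapsed[step] = time.perf_counter() - start
    return elapsed


timings = time_probes(1000, [1, 10, 100, 1000])
for step, seconds in timings.items():
    print(f"{step:>5} steps: {seconds:.6f} s")
```

Probe points larger than `n_steps` are simply never reached, so they do not appear in the result.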

R code:

library(magrittr)
#number of the loop iterations
n_elements <- 1e5
#probe points
x <- c(1,10,100,1000,5000,10000,25000,50000,75000,100000)

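# "for" loop
t <- Sys.time()  # start time
vec <- NULL
elapsed <- NULL

for (i in seq_len(n_elements))
{
    # grow vec by one random integer drawn from 1..i
    vec <- c(vec, sample(i, size = 1, replace = TRUE))
    if (i %in% x)
        # units must be passed by name: the third positional argument of difftime is tz
        elapsed <- c(elapsed, as.numeric(difftime(Sys.time(), t, units = 'secs')))
}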

# lapply function
t <- Sys.time()  # start time
vec <- NULL
elapsed_lapply <- lapply(seq_len(n_elements), function(i) {
    # note: this assignment is local to the function, so the outer vec never grows
    vec <- c(vec, sample(i, size = 1, replace = TRUE))
    if (i %in% x)
        return(as.numeric(difftime(Sys.time(), t, units = 'secs')))
}) %>% Filter(Negate(is.null), .) %>% unlist()
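A likely contributor to the runaway of the R "for" loop is the way the result is accumulated: `vec <- c(vec, ...)` builds a new vector on every pass, so all existing elements are copied again. In the lapply version the same assignment happens inside the function and only affects a local copy, so nothing grows there. The sketch below is a simple cost model under that assumption, not a measurement; the function names are hypothetical.

```python
def element_writes_rebuilding(n_steps):
    """Element writes when every step rebuilds the whole vector, as c(vec, x) does."""
    writes = 0
    for new_length in range(1, n_steps + 1):
        # the old elements are copied and the new one is added
        writes += new_length
    return writes


def element_writes_preallocated(n_steps):
    """Element writes when each result is stored exactly once."""
    return n_steps


for n in (1000, 10000, 100000):
    print(n, element_writes_rebuilding(n), element_writes_preallocated(n))
```

Under this model the rebuilding variant needs n(n+1)/2 element writes, which grows quadratically with the number of steps, while storing each result once grows linearly.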

Plot code:

<script src="https://cdn.plot.ly/plotly-latest.min.js"></script>
<div id="r_vs_py_loops" align="center"><!-- Plotly is being drawn here --></div>
<script>
var xDat = [1,10,100,1000,5000,10000,25000,50000,75000,100000];
var yDat = [[0.000655,0.000827,0.00184,0.012137,0.052789,0.101116,0.248447,0.435075,0.622154,0.808063],
            [0.008551,0.008749,0.009103,0.015438,0.057638,0.151765,0.717195,2.663847,6.110421,10.980278],
            [0.002781,0.005598,0.005958,0.00921,0.05569,0.075909,0.134087,0.231207,0.327564,0.426327]
          ];
var labels = ['Python: "for" loop', 'R: "for" loop', 'R: lapply'];
var data = [];
for (var i = 0; i < 3; i++) {
  var trace = {
    x: xDat,
    y: yDat[i],
    mode: 'lines+markers',
    line: {
      width: 2
    },
    name: labels[i]
  };
  data.push(trace);
}

var layout = {
  showlegend: true,
  height: 600,
  width: 800,
  title: 'R v. Python: Loops, required Time Comparison',
  xaxis: {
    title: 'Number of steps in the loop',
    type: 'log',
    autorange: true
  },
  yaxis: {
    title: 'Elapsed time [s]',
    autorange: false,
    range: [0, 10]
  }
}
Plotly.newPlot('r_vs_py_loops', data, layout);
</script>