Parallel processing with map()

In this post, we once again build on the previous post, this time to understand parallel processing a bit better. Remember the last post, where we calculated confidence intervals (CIs) from a poll?

The one where the results looked like this?

Example poll
Party | # of votes | Share
C | 80 | 8.00%
KD | 59 | 5.90%
L | 37 | 3.70%
M | 211 | 21.10%
MP | 34 | 3.40%
Other/no reply | 14 | 1.40%
S | 282 | 28.20%
SD | 179 | 17.90%
V | 104 | 10.40%
Data from here

Good. In it, we calculated the CIs using purrr and a function we’d written that used infer to bootstrap and calculate the CIs. That worked well, and we managed to cut down quite a bit on the amount of code we wrote.

If you try it yourself, however, you’ll realise that the calculation takes a while. Not an absurd amount of time, especially compared to calculations that take time for real, but it works as a good intro to parallel processing. Let’s take a look at how long the process takes, using the package tictoc, which measures the elapsed (wall-clock) time it takes to go from the tic() to the toc().

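library(purrr)   # map_dfr()
library(tictoc)  # tic() and toc()
set.seed(123)    # make the bootstrap resamples reproducible
tic()
# voter and ci_calculation() come from the previous post
complete_CI <- map_dfr(unique(voter$party), ci_calculation, voter)
toc()
## 10.95 sec elapsed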
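As a side note, toc() also returns its measurements invisibly, which is handy if you want the number itself rather than the printed message. A minimal sketch, using Sys.sleep() as a stand-in for real work (the label and the variable names are just for illustration):

```r
library(tictoc)

tic("sleeping")              # optional label for the printed message
Sys.sleep(0.5)               # stand-in for a slow calculation
timing <- toc(quiet = TRUE)  # suppress printing, keep the returned list

# toc() returns the start and stop times; the difference is the elapsed seconds
elapsed <- unname(timing$toc - timing$tic)
elapsed
```

Since Sys.sleep() uses almost no processor time, the roughly half a second reported here shows that tic() and toc() report elapsed time rather than processor time.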

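Not too slow, but imagine if you had a larger dataset with 10 000 votes instead, or you wanted to run far more simulations; then the time adds up. Instead, we can use future_map_dfr() from the furrr package, a variation of the map_*() functions that runs each calculation in parallel instead of in sequence. Setting up the workers with plan() takes some extra time, so for now we leave that call outside the timer. On a small job the setup eats into the gain, but in a larger setting you’ll save time. Two things to note: plan(multiprocess) is deprecated in newer versions of the future package, so plan(multisession) is used here, and set.seed() does not reach the parallel workers, so the seed is passed through furrr_options() instead.

library(furrr)      # future_map_dfr() and furrr_options()
plan(multisession)  # replaces the deprecated plan(multiprocess)
tic()
# seed passed via furrr_options() so the workers' draws are reproducible
complete_CI_paralell <- future_map_dfr(
  unique(voter$party), ci_calculation, voter,
  .options = furrr_options(seed = 123)
)
toc()
## 4.693 sec elapsed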
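Since voter and ci_calculation() live in the previous post, here is a self-contained sketch of the same idea that you can run on its own. The function slow_draw() is a hypothetical stand-in for a slow calculation, and the choice of two workers is an assumption; adjust it to your machine. It assumes a reasonably recent furrr, where furrr_options() is available.

```r
library(future)  # plan()
library(furrr)   # future_map_dbl() and furrr_options()
library(tictoc)

# Hypothetical stand-in for a slow calculation: wait, then draw a random number
slow_draw <- function(i) {
  Sys.sleep(0.5)
  rnorm(1)
}

plan(multisession, workers = 2)  # two background R sessions

tic("parallel")
draws_a <- future_map_dbl(1:4, slow_draw, .options = furrr_options(seed = 123))
toc()

# Same seed again: the draws should match, however the work is split up
draws_b <- future_map_dbl(1:4, slow_draw, .options = furrr_options(seed = 123))
identical(draws_a, draws_b)

plan(sequential)  # go back to ordinary sequential processing
```

Passing the seed through furrr_options() rather than set.seed() is what makes the two runs comparable, since each worker is its own R session with its own random number stream.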

There we go! Less than half the time! Just for fun, let’s look at how much time we actually saved once the setup is counted too, by putting the plan() call between tic() and toc().

tic()
plan(multisession)  # worker setup is now inside the timer
# seed passed via furrr_options() so the workers' draws are reproducible
complete_CI_paralell <- future_map_dfr(
  unique(voter$party), ci_calculation, voter,
  .options = furrr_options(seed = 123)
)
toc()
## 8.596 sec elapsed

So only a bit more than two seconds saved here, but still, better! And more importantly, we’ve learned something.
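To put the three timings side by side, the arithmetic can be done directly from the numbers printed by toc() in this post. The variable names are just for illustration, and your own timings will of course differ from run to run.

```r
# Timings copied from the toc() output in this post (seconds)
sequential <- 10.95
parallel_only <- 4.693
parallel_with_setup <- 8.596

# Parallel run as a share of the sequential time
share <- round(parallel_only / sequential, 2)
share  # 0.43

# Seconds saved once the plan() setup is included
saved <- round(sequential - parallel_with_setup, 2)
saved  # 2.35
```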

Leo Carlsson

My research interests include science, statistics and politics.
