The “Happy Scientist” Seminar Series #5
A brief introduction to using R for high-performance computing

George Vega Yon Garrett Weaver
vegayon@usc.edu gmweaver@usc.edu


Department of Preventive Medicine
March 23, 2017

Agenda

  1. High-Performance Computing: An overview

  2. Parallel computing in R

  3. Examples:

    1. parallel
    2. iterators+foreach
    3. RcppArmadillo + OpenMP

High-Performance Computing: An overview

Loosely, from R’s perspective, we can think of HPC in terms of two, maybe three things:

  1. Big data: How to work with data that doesn’t fit your computer

  2. Parallel computing: How to take advantage of multi-core systems

  3. Compiled code: Write your own low-level code (if R doesn’t have it yet…)
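
To make the second item concrete, here is a minimal sketch using the `parallel` package that ships with R. The toy task `sim_mean`, the vector `sizes`, and the choice of two workers are assumptions made for illustration only; they are not taken from the seminar material.

```r
# Minimal sketch: run the same task serially and on a small local cluster.
library(parallel)

# Hypothetical toy task: the mean of n standard-normal draws
sim_mean <- function(n) mean(rnorm(n))

# Eight independent jobs of equal size (illustrative values)
sizes <- rep(1e4, 8)

# Serial baseline
serial <- vapply(sizes, sim_mean, numeric(1))

# Parallel version on a socket cluster (portable across operating systems)
n_workers <- 2L                      # assumed; detectCores() reports what is available
cl <- makeCluster(n_workers)
clusterSetRNGStream(cl, 123)         # reproducible, independent RNG streams per worker
par_res <- parSapply(cl, sizes, sim_mean)
stopCluster(cl)                      # always release the workers
```

For a task this small, the cost of starting the workers will likely outweigh any gain; the pattern tends to pay off only when each job is expensive relative to the communication overhead.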

Big Data

Parallel computing

Flynn's Classical Taxonomy ([Introduction to Parallel Computing, Blaise Barney, Lawrence Livermore National Laboratory](https://computing.llnl.gov/tutorials/parallel_comp/#Whatis))

GPU vs CPU

[NVIDIA Blog](http://www.nvidia.com/object/what-is-gpu-computing.html)

Why are we still using CPUs instead of GPUs?

GPUs have far more processor cores than CPUs, but because each GPU core runs significantly slower than a CPU core and do not have the features needed for modern operating systems, they are not appropriate for performing most of the processing in everyday computing. They are most suited to compute-intensive operations such as video processing and physics simulations. (bwDraco at superuser)

When is it a good idea?