R is the most popular data analysis language, but there is little concrete experimental data on how execution time breaks down when running R programs. In this project, we address this limitation by systematically cataloging where time is spent when running R programs.

The results from this project can be used by:

  1. Database researchers to better integrate R with database engines
  2. Architects to design micro-architectural features to improve the performance of R programs
  3. Programming language researchers to consider techniques to improve the performance of R programs

Datasets and Software

The software and datasets required for reproducing the experiments in the paper can be found below.

Dependencies and R source code changes

The dependencies and the R source code changes for running the experiments can be downloaded here.
Please begin with the README file for instructions.

R programs

The R scripts can be found here.

Datasets [compressed]

The compressed datasets can be found here.

Queries

For any questions about reproducing or running the experiments, please contact jignesh@cs.wisc.edu or shrirams@cs.wisc.edu.