R programming tips

This page is an ongoing collection of tips and suggestions I find useful (or found out through much trial and error) when using R. As a living document, it will start as a haphazard collection, but should it grow, I may re-order it.

Use a consistent coding style

I have mainly been following Hadley Wickham’s style guide, although I have not yet settled on a consistent naming scheme for variables and functions. Another good resource is Paul Johnson’s brief exposition (PDF).

Benchmark your code

There are multiple ways to time code. Personally, I use the microbenchmark package. There is also the rbenchmark package, and the tried-and-true workhorse system.time(foo). Regardless of which you use, it can be illuminating to compare slightly different implementations. Which brings us to the next suggestion…
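As a minimal sketch of comparing two implementations, the snippet below times a loop against its vectorised equivalent using base R's system.time. The functions sqrt_loop and sqrt_vectorised are hypothetical stand-ins made up for illustration, not code from my own projects. The microbenchmark call at the end runs only if that package happens to be installed.

```r
# Two hypothetical implementations of the same task: square roots of 1..n.
sqrt_loop <- function(n) {
  out <- numeric(n)                       # preallocate the result
  for (i in seq_len(n)) out[i] <- sqrt(i)
  out
}
sqrt_vectorised <- function(n) sqrt(seq_len(n))

n <- 1e6
time_loop <- system.time(res_loop <- sqrt_loop(n))
time_vec  <- system.time(res_vec  <- sqrt_vectorised(n))

# Elapsed wall-clock seconds for each version (values vary by machine).
print(c(loop = time_loop[["elapsed"]], vectorised = time_vec[["elapsed"]]))

# Optional: finer-grained, repeated timings if microbenchmark is available.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    loop       = sqrt_loop(1e4),
    vectorised = sqrt_vectorised(1e4),
    times      = 50L
  ))
}
```

A single system.time call is noisy for fast code, which is why a package that repeats the measurement many times tends to be more informative for small snippets.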

Profile slow code

Use R’s code profiling mechanisms, specifically Rprof, when dealing with slow code. Identifying the bottleneck and recoding it, or moving it into C++, can provide speed gains measured not in multiples but orders of magnitude!
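A rough sketch of a profiling session with Rprof and summaryRprof follows. The function slow_sum_of_squares is a made-up example that is slow on purpose (it grows a vector inside a loop); the sampling interval and problem size are arbitrary choices, not recommendations.

```r
# Hypothetical slow function: growing a vector with c() copies it every pass.
slow_sum_of_squares <- function(n) {
  out <- numeric(0)
  for (i in seq_len(n)) out <- c(out, i^2)
  sum(out)
}

prof_file <- tempfile(fileext = ".out")

Rprof(prof_file, interval = 0.01)   # start sampling the call stack
result <- slow_sum_of_squares(3e4)
Rprof(NULL)                         # stop profiling

prof <- summaryRprof(prof_file)
print(head(prof$by.self))           # functions ranked by time spent in themselves
```

In a run like this, the by.self table would be expected to point at c() as the place where the time goes, which suggests preallocating or vectorising before reaching for C++.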

Use R’s built-in optimized code as much as possible

This was not immediately obvious to me, but it makes sense. As an example, compare 1 - pnorm(4) with pnorm(4, lower.tail = FALSE). There is a small but measurable speed increase in the latter, probably because the subtraction happens inside the compiled C routine and not at the R level. If you need to make several billion calls, these savings can become meaningful. I’ve tested it with many of the basic distributions (normal, lognormal, gamma, etc.) and, as a rule, it seems to hold. I’ll be keeping an eye out for this one in the future.
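One way to check this on your own machine is sketched below, using only base R. The vector length and repetition count are arbitrary assumptions chosen so the timings are long enough to read; the exact numbers will differ between systems and R versions.

```r
x    <- seq(0, 6, length.out = 1e5)
reps <- 200

# Upper tail computed by subtracting at the R level.
t_subtract <- system.time(
  for (i in seq_len(reps)) p_subtract <- 1 - pnorm(x)
)

# Upper tail requested directly from the compiled routine.
t_upper <- system.time(
  for (i in seq_len(reps)) p_upper <- pnorm(x, lower.tail = FALSE)
)

print(c(subtract = t_subtract[["elapsed"]], lower.tail = t_upper[["elapsed"]]))

# Both approaches agree to within floating-point tolerance on this range.
print(all.equal(p_subtract, p_upper))
```

As a side benefit, the lower.tail = FALSE form is also more accurate far out in the tail: 1 - pnorm(10) evaluates to exactly 0, whereas pnorm(10, lower.tail = FALSE) returns a small positive number.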

Compare methods when inverting matrices

In much of my code, I have to invert a matrix (the Hessian of the negative log-likelihood function at the point of convergence, to find the variance-covariance matrix of the fitted parameters, if you really want to know). For various reasons, I used to use the QR factorization. Using a Cholesky decomposition and then chol2inv was markedly faster in my cases.
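The comparison can be sketched as follows. The matrix H here is a synthetic symmetric positive-definite matrix standing in for a Hessian (an assumption for illustration, not data from my models); the Cholesky route only applies when the matrix really is symmetric positive definite, which a Hessian at a proper minimum should be.

```r
set.seed(42)
n <- 300

# Synthetic stand-in for a Hessian: crossprod(X) is symmetric positive
# semi-definite, and adding the identity makes it safely positive definite.
X <- matrix(rnorm(2 * n * n), nrow = 2 * n, ncol = n)
H <- crossprod(X) + diag(n)

# Inverse via QR factorization (qr.solve with no right-hand side).
t_qr <- system.time(inv_qr <- qr.solve(H))

# Inverse via Cholesky factor followed by chol2inv.
t_chol <- system.time(inv_chol <- chol2inv(chol(H)))

print(c(qr = t_qr[["elapsed"]], cholesky = t_chol[["elapsed"]]))

# The two inverses should agree up to numerical error.
print(all.equal(inv_qr, inv_chol, check.attributes = FALSE))
```

For a matrix this small the timings may be too short to separate cleanly, so repeating each call or increasing n gives a clearer picture.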

Use a fast BLAS if possible

See this post and this update for more detail.
Test files used in 3.1.0 speed tests:

A.csv (16 MB)
B.csv (16 MB)

5 Responses

jot_en January 16, 2014 at 7:51 AM

Hi,
thanks for tips.
P.s. Link for Paul Johnson’s brief exposition is broken.

Avraham January 16, 2014 at 10:42 AM

My pleasure, and thanks for the heads up. It’s fixed now.

Michael January 16, 2014 at 5:54 PM

Most of your tips focus on code optimisation so my tip would be:
Focus on writing clear, readable code. Don’t worry about optimisation unless you have a performance problem.

MagicSpik3 August 10, 2015 at 9:13 AM

I ditto that. But I’m concerned with research more than commercial applications.

Reduce Dependency Hell: from testthat to tinytest | Strange Attractors January 20, 2022 at 3:41 PM

[…] intelligent packages like tinytest, easier than you may expect! I should probably add this to my list of R tips. As always, comments and suggestions are […]
