Tomas Kalibera and Richard Jones' paper on how to do benchmarking that's actually meaningful -- presenting results as confidence intervals for effect sizes, with techniques to establish i.i.d. (independent and identically distributed) results and to work out how many repetitions you need. Very nice work for a pretty short paper! (I've spent most of today chasing references from this in the interests of understanding the maths behind it...)

