Page 3 of 5
Publication: Quantifying Performance Changes with Effect Size Confidence Intervals - School of Computing - University of Kent
Tech report with more details of the statistics behind Tomas/Richard's approach. In particular, this describes how to do the same thing in either parametric or non-parametric ways, and gives some description of how badly the parametric approach performs when the underlying data isn't normally distributed (not very badly, as it turns out).
to benchmarking confidence effect-size non-parametric performance statistics ... on 26 March 2014
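The non-parametric route the report describes can be sketched with a simple percentile bootstrap: resample each set of measurements, recompute the ratio of means, and take quantiles of the resampled ratios. This is a minimal illustration of the idea, not the report's exact construction; the data and function name are made up.

```python
import random
import statistics

def bootstrap_ci_ratio(old, new, reps=10000, alpha=0.05, seed=1):
    """Percentile-bootstrap confidence interval for the effect size
    expressed as a ratio of mean execution times (new / old)."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(reps):
        # Resample each group with replacement, then recompute the ratio.
        rs_old = [rng.choice(old) for _ in old]
        rs_new = [rng.choice(new) for _ in new]
        ratios.append(statistics.fmean(rs_new) / statistics.fmean(rs_old))
    ratios.sort()
    lo = ratios[int(alpha / 2 * reps)]
    hi = ratios[int((1 - alpha / 2) * reps) - 1]
    return lo, hi

# Hypothetical timings: baseline around 100ms, modified system around 90ms.
old = [100.0 + i * 0.1 for i in range(20)]
new = [90.0 + i * 0.1 for i in range(20)]
print(bootstrap_ci_ratio(old, new))  # an interval well below 1.0, i.e. a real speedup
```

Because nothing here assumes normality, this is the kind of interval that degrades gracefully when the parametric assumptions fail.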
Rigorous Benchmarking in Reasonable Time - Kent Academic Repository
Tomas Kalibera and Richard Jones' paper on how to do benchmarking that's actually meaningful -- presenting results as confidence intervals for effect sizes, with techniques to check that results are independent and identically distributed, and to work out how many repetitions you need to do. Very nice work for a pretty short paper! (I've spent most of today chasing references from this in the interests of understanding the maths behind it...)
to benchmarking compiler confidence effect-size independence java performance reproducibility statistics vm ... on 26 March 2014
Statistically rigorous Java performance evaluation
One of the papers that inspired Tomas/Richard's rigorous benchmarking work. This is a much simpler strategy, involving looking for overlapping confidence intervals -- which is statistically pretty dubious, but common in other disciplines...
to benchmarking confidence java performance research statistics ... on 26 March 2014
Producing wrong data without doing anything obviously wrong!
Lots of examples of how environmental factors (e.g. environment variable size, room temperature, link order, ASLR...) can affect experimental results, to the tune of 20% or more. Basically: why pretty much any benchmark you've seen in a paper where the effect size isn't huge is probably nonsense.
to benchmarking compiler performance reproducibility research statistics ... on 26 March 2014
The Earth is Round (p < .05)
An extremely grumpy study of statistical significance. This is worth reading in conjunction with Susan's stats tutorial, since it gives more examples of what significance actually means (and why it's probably not what you think it means).
to ag0803 significance statistics ... on 26 March 2014
Computing and Interpreting Effect Sizes - Springer
A fairly grumpy study of effect size measurement -- this makes some good points, though.
to effect-size significance statistics ... on 26 March 2014
If we're so different, why do we keep overlapping? When 1 plus 1 doesn't make 2
Why looking for overlapping confidence intervals isn't the right way to compare two distributions (contrary to some modern benchmarking advice) -- the intervals can overlap even when the difference is statistically significant.
to confidence statistics ... on 26 March 2014
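The point is easy to see with made-up numbers: the overlap check implicitly compares the difference in means against se1 + se2, but the standard error of the difference is sqrt(se1² + se2²), which is smaller. A quick sketch (all figures hypothetical):

```python
import math

# Hypothetical summary statistics for two measured systems:
# means and standard errors, large-sample normal approximation.
m1, se1 = 0.0, 1.0
m2, se2 = 3.0, 1.0

z = 1.96  # two-sided 95% critical value

ci1 = (m1 - z * se1, m1 + z * se1)  # (-1.96, 1.96)
ci2 = (m2 - z * se2, m2 + z * se2)  # ( 1.04, 4.96)

# The two 95% intervals overlap...
overlap = ci1[1] > ci2[0] and ci2[1] > ci1[0]

# ...but the test on the difference is still significant, because
# sqrt(se1^2 + se2^2) < se1 + se2.
se_diff = math.sqrt(se1**2 + se2**2)
significant = abs(m1 - m2) / se_diff > z

print(overlap, significant)  # True True
```

So "the error bars overlap, therefore no significant difference" is simply not a valid inference.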
FAQ: Why is the Mann-Whitney significant when the medians are equal?
A nice example of why the rank-sum test *doesn't* test whether the medians are the same. (Unless the distributions are otherwise very similar.)
to median non-parametric significance statistics ... on 26 March 2014
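A pure-Python sketch of the effect, using the normal approximation to the rank-sum test with tie correction. The two samples below are invented so that both medians are 2, yet one distribution tends to produce larger values overall, which is what the test actually detects:

```python
import math
import statistics
from collections import Counter

def mann_whitney_z(x, y):
    """z-statistic for the Mann-Whitney/rank-sum test
    (normal approximation, with tie correction)."""
    combined = x + y
    n1, n2, n = len(x), len(y), len(x) + len(y)
    # Assign midranks to the combined sample.
    order = sorted(range(n), key=lambda i: combined[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and combined[order[j + 1]] == combined[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    tie_term = sum(t**3 - t for t in Counter(combined).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    return (u1 - mean_u) / math.sqrt(var_u)

# Both samples have median 2, but x's values sit higher in the combined ranking.
x = [-1, -1, 0, 2, 3, 4, 5] * 10
y = [-5, -4, -3, 2, 2.5, 2.6, 2.7] * 10
print(statistics.median(x), statistics.median(y))  # 2.0 2.0
print(mann_whitney_z(x, y))  # well past the 1.96 cutoff
```

So a significant rank-sum result says one distribution is stochastically larger than the other, which only translates into "different medians" when the two distributions have the same shape.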
Statistics with Confidence
Susan's stats tutorial (which I first saw at ICARIS 2009). Highly recommended for students who're doing performance measurement.
to ag0803 benchmarking honours performance science significance statistics ... on 19 March 2014
tasty by Adam Sampson.