Publication: Quantifying Performance Changes with Effect Size Confidence Intervals - School of Computing - University of Kent

Tech report with more detail on the statistics behind Tomas and Richard's approach. In particular, it describes how to construct the same intervals in either parametric or non-parametric ways, and measures how badly the parametric approach performs when the underlying data isn't normally distributed (not very badly, as it turns out). A rough sketch of both flavours is below.

to benchmarking confidence effect-size non-parametric performance statistics ... on 26 March 2014
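
A minimal sketch of the two flavours the report compares -- this is my illustration on made-up timings, not the report's code: a parametric interval via the textbook delta method for a ratio of means, and a non-parametric percentile-bootstrap interval for the same ratio.

```python
import math
import random
import statistics

old = [102.1, 99.8, 101.5, 100.9, 103.2, 98.7, 100.4, 101.1]  # made-up timings
new = [91.3, 93.0, 90.2, 92.5, 91.8, 94.1, 90.9, 92.2]

# Effect size: ratio of mean execution times (new / old); < 1 means a speedup.
r = statistics.mean(new) / statistics.mean(old)

# Parametric: textbook delta-method standard error for a ratio of means,
# relying on approximate normality of the sample means.
se = r * math.sqrt(
    statistics.variance(new) / (len(new) * statistics.mean(new) ** 2)
    + statistics.variance(old) / (len(old) * statistics.mean(old) ** 2)
)
print(f"ratio {r:.3f}, parametric 95% CI [{r - 1.96 * se:.3f}, {r + 1.96 * se:.3f}]")

# Non-parametric: percentile bootstrap over resampled runs, no normality assumption.
boots = sorted(
    statistics.mean(random.choices(new, k=len(new)))
    / statistics.mean(random.choices(old, k=len(old)))
    for _ in range(10_000)
)
print(f"bootstrap 95% CI [{boots[249]:.3f}, {boots[9749]:.3f}]")
```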

Rigorous Benchmarking in Reasonable Time - Kent Academic Repository

Tomas Kalibera and Richard Jones' paper on how to do benchmarking that's actually meaningful -- presenting results as confidence intervals for effect sizes, with techniques to check that measurements are i.i.d. (a crude sketch of one such check is below) and to work out how many repetitions you need. Very nice work for a pretty short paper! (I've spent most of today chasing references from this in the interests of understanding the maths behind it...)

to benchmarking compiler confidence effect-size independence java performance reproducibility statistics vm ... on 26 March 2014
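
The paper's independence checks are graphical (run-sequence and lag plots), so here's only a crude numeric stand-in of mine: lag-1 autocorrelation of the iteration times, which is strongly positive while JIT warm-up is still in the data and much closer to zero once it's dropped. Timings are invented.

```python
import statistics

def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation; values near 0 are consistent with i.i.d. data."""
    m = statistics.mean(xs)
    num = sum((xs[i] - m) * (xs[i + 1] - m) for i in range(len(xs) - 1))
    den = sum((x - m) ** 2 for x in xs)
    return num / den

# Made-up iteration times: the first four are slow (warm-up), so the
# autocorrelation is clearly positive; dropping them moves it towards zero.
times = [5.2, 4.4, 3.6, 3.3, 3.0, 3.0, 3.1, 3.1, 2.9, 2.9, 3.0, 3.0]
print(f"all iterations:  {lag1_autocorr(times):+.3f}")
print(f"warm-up dropped: {lag1_autocorr(times[4:]):+.3f}")
```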

Statistically rigorous Java performance evaluation

One of the papers that inspired Tomas and Richard's rigorous benchmarking work. This is a much simpler strategy -- checking whether confidence intervals overlap -- which is statistically pretty dubious (see the counterexample below), but common in other disciplines...

to benchmarking confidence java performance research statistics ... on 26 March 2014
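
For the record, here's why the overlap check is dubious -- a made-up example, not from the paper, in which two 95% intervals overlap even though the difference between the means is significant at the 5% level:

```python
import math

m1, se1 = 100.0, 2.0  # invented mean and standard error for system 1
m2, se2 = 106.0, 2.0  # ... and for system 2
z = 1.96              # normal approximation for a 95% interval

ci1 = (m1 - z * se1, m1 + z * se1)  # (96.08, 103.92)
ci2 = (m2 - z * se2, m2 + z * se2)  # (102.08, 109.92)
print(f"intervals overlap: {ci1[1] > ci2[0]}")  # True

# ...yet the difference itself is significant:
z_stat = (m2 - m1) / math.sqrt(se1**2 + se2**2)
print(f"z for the difference: {z_stat:.2f} (> 1.96, so significant at 5%)")
```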

Producing wrong data without doing anything obviously wrong!

Lots of examples of how environmental factors (e.g. environment variable size, room temperature, link order, ASLR...) can affect experimental results, to the tune of 20% or more (a miniature version of the environment-size experiment is sketched below). Basically: why pretty much any benchmark you've seen in a paper where the effect size isn't huge is probably nonsense.

to benchmarking compiler performance reproducibility research statistics ... on 26 March 2014
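
A miniature version of one of the paper's experiments, assuming a hypothetical benchmark binary ./bench: time the same program while varying only the size of the UNIX environment, which shifts the initial stack alignment.

```python
import os
import subprocess
import time

# "./bench" is a hypothetical benchmark binary; the only thing that changes
# between runs is the size of the environment passed to it.
for pad in range(0, 4096, 512):
    env = dict(os.environ, PAD="x" * pad)
    start = time.perf_counter()
    subprocess.run(["./bench"], env=env, check=True)
    print(f"env padding {pad:4d} bytes: {time.perf_counter() - start:.3f}s")
```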

Statistics with Confidence

Susan's stats tutorial (which I first saw at ICARIS 2009). Highly recommended for students who're doing performance measurement.

to ag0803 benchmarking honours performance science significance statistics ... on 19 March 2014

Perl, Python, Ruby, PHP, C, C++, Lua, tcl, javascript and Java comparison

Comparison of lots of languages on a fairly simple string-handling problem. Interesting for the breadth of languages. I'd take his assertions with a large yellow roadside bin of salt, though.

to benchmarking language-design ... on 14 December 2013

How not to lie with statistics: the correct way to summarize benchmark results

"Using the arithmetic mean to summarize normalized benchmark results leads to mistaken conclusions that can be avoided by using the preferred method: the geometric mean."

to ag0803 benchmarking maths statistics ... on 14 December 2013
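
Concretely, with my numbers rather than the paper's: two machines, two benchmarks. The arithmetic mean makes each machine look 25% slower than the other depending on which one you normalize to; the geometric mean gives the same answer both ways.

```python
import math
import statistics

a = [10.0, 40.0]  # machine A's times on two benchmarks (made up)
b = [20.0, 20.0]  # machine B's times on the same benchmarks

b_over_a = [y / x for x, y in zip(a, b)]  # normalized to A: [2.0, 0.5]
a_over_b = [x / y for x, y in zip(a, b)]  # normalized to B: [0.5, 2.0]

def geomean(xs):
    return math.exp(statistics.mean(math.log(x) for x in xs))

# Arithmetic mean: B looks 25% slower than A, *and* A looks 25% slower than B.
print(statistics.mean(b_over_a), statistics.mean(a_over_b))  # 1.25 and 1.25
# Geometric mean: consistent regardless of the normalization baseline.
print(geomean(b_over_a), geomean(a_over_b))                  # 1.0 and 1.0
```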

How 3.6 nearly broke PostgreSQL [LWN.net]

Bookmarked specifically because a Linux scheduler change made a big difference to PostgreSQL's performance. Has anyone tried benchmarking CCSP/TBB/etc. across multiple Linux versions?

to benchmarking concurrency linux performance ... on 13 October 2012

Programmers Need To Learn Statistics Or I Will Kill Them All

This is fantastic. My students'll be getting a link to it next year, along with Susan's presentation on a similar theme.

to benchmarking maths statistics teaching ... on 25 June 2011

IOScheduling - IA64wiki

Comparing Linux IO schedulers -- work sponsored by Google.

to benchmarking io linux ... on 09 August 2008
