NZSM Online

Spotlight

A New Approach to Statistics

Dave Saville and Graham Wood

Have you ever wondered who writes maths text books and what motivates them to do it? For two New Zealand mathematicians, Graham Wood and myself, the aim was to reintroduce and illustrate some statistical ideas which had been tied up in complicated formulae for 60 years.

Statistical tests are used by researchers to examine questions such as "are males on average taller than females?" or "does it matter whether the milk is added to the coffee before or after the water?"

In the 1920s and 30s, a man named R. A. Fisher invented a whole battery of statistical tests which are now widely applied to problems in almost every field of research, from agricultural science to social science. Fisher was very good at geometry, so he applied his geometric skills to the statistical problems being encountered by the agricultural scientists at Rothamsted, an experimental station in England. The result was modern-day statistical methods.

Unfortunately geniuses are rarely understood by their peers, and this was certainly the case with Fisher. The solutions were obvious to him -- he could mentally see the geometric picture -- but they were not at all obvious to his peers. As a result, the statistical tests he invented were converted into recipes involving algebraic formulae. Sadly, lost from our view was the elegant geometry upon which his tests were based.

This is where Graham Wood and I come into the picture. We studied together at Otago University and later worked together when Graham was setting up a new applied statistics course at the University of Canterbury. We started teaching statistics using geometry, first here, and then in the US. The approach proved very popular, but there were no text books. So we started writing down this geometry in the simplest possible manner.

The result is a paperback text book, Statistical Methods: A Geometric Primer (Springer-Verlag, 1996). The book is based around four major case studies: heights of males and females in twin pairs; their heights in the general population; selenium levels in Christchurch; and Christchurch air pollution levels in relation to the inversion effect.

The reason "everyone hates statistics" is that it's a whole lot of ad hoc recipes and the basic understanding of maths isn't taught. Our book explains the maths, so it's more real, and it makes sense of the recipes. Also it uses real life examples to clearly illustrate the usefulness of statistics.

Picturing Statistics

The simplest statistical test is the "paired samples t test". We might use this test when asking: is there proof that, on average, males (M) are taller than females (F) in sets of mixed-sex twins?

Suppose we have a random sample of two such twin pairs: Janet and John, Bob and Betty. John is taller than Janet by 6 cm, and Bob is taller than Betty by 9 cm. Can we tell from this small data set (6,9) whether, on average, M=F in height, or whether the data support the idea that M>F?

To statistically analyse this using geometric ideas, we first plot one value against the other as a point on a two-dimensional graph, then join the point to the origin to form the "data vector" (6,9).


Now if on average M=F, we would expect the data vectors to be spread all around the origin, as each data value is as likely to be negative as positive, and each vector is as likely to point in any one direction as any other direction. Measuring a second set of twin pairs might show us that Stephen was shorter than Shelagh by 2 cm, and Peter taller than Paula by 4 cm, resulting in a data vector of (-2,4), and so on around the four quadrants of the graph (Fig 1).

Figure 1

On the other hand, if on average M>F, then each data value is more likely to be positive than negative, so vectors in the first quadrant are more likely than vectors in the other three (Fig 2). If M>F then the direction of the 1:1 line is the "most likely" direction, and directions close to it are more likely than directions further away. (Incidentally, the cluster of points indicates the average height difference between the siblings.)

Figure 2
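
The contrast between Figures 1 and 2 can be imitated with a small simulation. The sketch below is only an illustration: it assumes height differences are normally distributed with a standard deviation of 5 cm, and it uses an average difference of 7.5 cm for the M>F case. Neither number is a measured value, and the function name first_quadrant_share is made up for this example.

```python
import random

def first_quadrant_share(mean_diff, sd=5.0, trials=100_000, seed=1):
    """Share of simulated data vectors (d1, d2) that land in the first quadrant.

    mean_diff and sd are assumed values in cm, not measurements.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        d1 = rng.gauss(mean_diff, sd)  # height difference in the first twin pair
        d2 = rng.gauss(mean_diff, sd)  # height difference in the second twin pair
        if d1 > 0 and d2 > 0:
            hits += 1
    return hits / trials

share_equal = first_quadrant_share(0.0)   # M=F on average
share_taller = first_quadrant_share(7.5)  # M>F on average (assumed 7.5 cm)
print(share_equal, share_taller)
```

When M=F the share should come out close to one quarter, since no quadrant is favoured. Under the assumed M>F setting, most of the vectors fall in the first quadrant.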

The above reasoning gives us the idea of using the "closeness" of the data vector to the 1:1 line as a way of measuring the strength of the evidence against the idea M=F. The measure of closeness that we use is the angle between the data vector and the 1:1 line.

If in fact M=F in average height, then all angles between 0° and 180° are equally likely. The angle corresponding to our small data set (6,9) can be shown to be 11.3°. Therefore an angle as small as or smaller than 11.3° will occur in only 11.3/180=0.063 of studies if it is in fact true that M=F in average height. In statistical jargon, 0.063 is the "p value" for a paired samples t-test of the "null hypothesis" M=F against the "alternative hypothesis" M>F.
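
The angle and the p value can be checked with a few lines of Python. This is a sketch of the arithmetic only, using the usual dot-product formula for the angle between two vectors. The function name angle_to_equal_line is made up for this illustration.

```python
import math

def angle_to_equal_line(d1, d2):
    """Angle in degrees between the data vector (d1, d2) and the 1:1 direction (1, 1)."""
    dot = d1 * 1 + d2 * 1
    lengths = math.hypot(d1, d2) * math.sqrt(2)
    cosine = max(-1.0, min(1.0, dot / lengths))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cosine))

angle = angle_to_equal_line(6, 9)
p_value = angle / 180  # all angles from 0 to 180 degrees are equally likely when M=F

print(round(angle, 1), round(p_value, 3))  # 11.3 0.063
```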

In more ordinary language, if male and female twins are, on average, equal in height, there is only a 0.063 chance (1 in 16) of obtaining data as convincing as ours in support of the idea that M>F. A chance of one in sixteen is still fairly high, so we can reasonably suspect that male twins are, on average, taller than their female counterparts, but we cannot be convinced beyond a shadow of a doubt.
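
As a cross-check, the same p value can be reached through the usual algebraic recipe. The sketch below assumes the standard paired t statistic (the mean difference divided by its standard error). It also relies on the fact that, with only two pairs, the t distribution has one degree of freedom and is the Cauchy distribution, whose tail area has a simple closed form. The variable names are illustrative only.

```python
import math
import statistics

differences = [6, 9]  # male minus female height in cm, one value per twin pair
n = len(differences)

mean_diff = statistics.mean(differences)
std_error = statistics.stdev(differences) / math.sqrt(n)
t_stat = mean_diff / std_error

# With one degree of freedom the t distribution is the Cauchy distribution,
# so the one-sided tail area beyond t_stat is 0.5 - atan(t_stat) / pi.
p_value_recipe = 0.5 - math.atan(t_stat) / math.pi

print(round(t_stat, 2), round(p_value_recipe, 3))  # 5.0 0.063
```

The recipe and the geometry agree: a t value of 5 corresponds to the 11.3° angle, and both give a p value of 0.063.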

Dave Saville is a biometrician for AgResearch at Lincoln
Graham Wood is now Professor of Mathematics at Central Queensland University in Rockhampton, Australia