Statistics
Sep. 27th, 2005 04:25 pm
If you'd told me 3 weeks ago that I'd be voluntarily learning statistics and even enjoying it, I wouldn't have believed you. I blame emperor. I think I owe leonato an apology for being horrible about his subject. Unlike when I did (small amounts of) statistics in GCSE Maths, I have data I am interested in analysing and therefore stats has a point! I'm still not convinced it's Maths though, and think that maybe it would be better taught in a subject other than Maths, like Geography when you're doing a project involving data (which is why I picked geography, as our coursework involved data!). But I suppose the problem with that is that not everyone in a geography class will necessarily have the same amount of maths.*
On the whole I'm following it, but the following has left me confused as it seems absurd. A test result has been worked out to be -1.74 (this is for a Wilcoxon signed-rank test with sufficient pairs (35) for the distribution of W (the test result) to be almost normal). The author (Butler) writes:
We can ignore the sign, which is normally, negative. We know that values of 1.96 and 1.64 are required at the 5 per cent level for a non-directional or directional test, respectively. Therefore if our test was non-directional we must conclude that no significant difference has been demonstrated. If, on the other hand, we had made a direction prediction, we could claim significance at the 5 per cent level.
As I understand it, this is saying that if we were just trying to say that there was a difference between the two groups we cannot, but that if we had predicted that one group would be bigger/have done better than the other we can. This seems completely stupid because surely it ought to be easier to demonstrate a difference than which direction it is! Saying we cannot be sure that it's different, but we can be sure it is higher, makes no sense. Where have I gone wrong? I've stared at this too much now and am confused.
There's also an irritating point where the reasoning is left out!
U1=N1N2 +(N1(N1+1)/2)-R1
and
U2=N1N2+(N2(N2+1)/2)-R2
However, it can be shown, by rather tedious but fairly elementary algebra, that
U2=N1N2-U1.
This is the Mann-Whitney U-test. R1 and R2 are the sums of the ranks of the results in each group (i.e. the results of the two groups are pooled and each result is given a rank, with ties getting the mean of the relevant ranks; so if the third, fourth and fifth results are the same, they all get the rank 4 and the next one is 6). I know that R1+R2 = ((N1+N2)(N1+N2))/2 because the sum of the ranks obviously has to add up to the triangular number for the total number of things ranked. But I can't make this be of help in working out why the simpler formula works. It's not really relevant, but I'm intrigued.
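To make the ranking rule concrete, here is a rough Python sketch of the pooling, the mean-rank treatment of ties, and the two U formulas quoted above. The function names and the sample data are made up for illustration and are not taken from Butler's book; the data are chosen so that the third, fourth and fifth pooled results tie.

```python
from itertools import groupby


def mean_ranks(values):
    """Rank values from 1 upwards; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0  # number of values already ranked
    for _, group in groupby(order, key=lambda i: values[i]):
        idx = list(group)
        # Mean of the ranks pos+1 .. pos+len(idx)
        shared = pos + (len(idx) + 1) / 2
        for i in idx:
            ranks[i] = shared
        pos += len(idx)
    return ranks


def mann_whitney_u(group1, group2):
    """Return (U1, U2) using the two long formulas from the text."""
    n1, n2 = len(group1), len(group2)
    ranks = mean_ranks(list(group1) + list(group2))  # pool, then rank
    r1 = sum(ranks[:n1])
    r2 = sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2


# Hypothetical data: the three 7s are the 3rd, 4th and 5th pooled results.
group1 = [2, 7, 7, 11]
group2 = [4, 7, 9, 13, 14]

pooled_ranks = mean_ranks(group1 + group2)
u1, u2 = mann_whitney_u(group1, group2)
print(pooled_ranks)
print(u1, u2, len(group1) * len(group2))
```

On this made-up data the tied 7s each get rank 4 and the 9 gets rank 6, and the two U values add up to N1N2, which is what the shorter formula claims.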
*With us Maths was setted from year 9, but for years 8 and 9 Geography was banded with History and Science,** so that there were three supposedly equal upper classes (a, b, c) and two supposedly equal lower classes (d, e). Thus it is likely that people from the top maths set were spread out over the groups a-c. At GCSE, I don't know about those who only took geography, I guess they were set, but 9 of us did both history and geography and so were the one geography set in the free option time so couldn't be setted!
**Due to timetabling constraints like the number of teachers available.
no subject
Date: 2005-09-27 05:12 pm (UTC)
...As I understand it, this is saying that if we were just trying to say that there was a difference between the two groups we cannot, but that if we had predicted that one group would be bigger/have done better than the other we can. This seems completely stupid because surely it ought to be easier to demonstrate a difference than which direction it is! Saying we cannot be sure that it's different, but we can be sure it is higher, makes no sense. Where have I gone wrong? I've stared at this too much now and am confused...
This is to do with the messy difference between one- and two-sided p-values.
Let's say we are testing some number X against a Normal(0,1) distribution. We want to know if X is extreme compared to N(0,1). We define this by saying X is extreme if more than 95% of all other possible samples are closer to 0 than X is.
If we know X > 0, say (a directional assumption), then we find the number Z such that 95% of N(0,1) is < Z. This is 1.64. We say X is extreme if X > 1.64.
If we don't know X > 0, so X might be negative, we find Y such that the probability of a sample from N(0,1) being less than -Y or greater than +Y is 5% (so 95% of samples lie between -Y and +Y) and say X is extreme if X < -Y or X > +Y. Y turns out to be 1.96.
The point (and aren't you glad I got to a point...) is that defining X to be positive is assuming something is known about X. We then need less additional information to show that X is extreme because negative X has been discounted a priori. Hence we can say X is extreme if > 1.64 rather than > 1.96.
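If it helps to see the numbers, here is a small Python sketch of the two cut-offs and of where the -1.74 from the post falls. It assumes, as the book does, that the test result can be treated as a draw from N(0,1); the variable names are mine, and it uses only the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1): mean 0, standard deviation 1

z = -1.74  # the test result quoted in the post

# Directional test: put the whole 5% in one tail.
one_sided_cutoff = std_normal.inv_cdf(0.95)
# Non-directional test: split the 5% as 2.5% in each tail.
two_sided_cutoff = std_normal.inv_cdf(0.975)

# Chance of a result at least this far out in the predicted tail.
p_one_sided = std_normal.cdf(-abs(z))
# Chance of a result at least this far out in either tail.
p_two_sided = 2 * p_one_sided

print(f"cut-offs: {one_sided_cutoff:.2f} (one-sided), {two_sided_cutoff:.2f} (two-sided)")
print(f"p-values: {p_one_sided:.3f} (one-sided), {p_two_sided:.3f} (two-sided)")
```

The size of 1.74 sits between the two cut-offs, so the one-sided p-value comes out just under 0.05 and the two-sided one just over it. The one-sided figure only means anything if the direction was predicted before looking at the data and the result went that way.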
Of course what this actually shows is that naive Fisherian statistics is a load of bollocks ;-)
no subject
Date: 2005-09-27 05:44 pm (UTC)
no subject
Date: 2005-09-27 05:52 pm (UTC)
There's an error in your formula for R1+R2, which should be R1+R2 = (N1+N2)(N1+N2+1)/2.
Then if you do the tedious algebra (evaluate U1+U2 and sub in the expression for R1+R2) you'll find it works. As you've written it you get stray N1/2 + N2/2 terms.
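For anyone who wants it spelled out, the tedious algebra might be written roughly as follows. This is just the substitution described above, using the post's symbols with the subscripts put back; it is not copied from the book.

```latex
\begin{align*}
U_1 + U_2 &= 2N_1N_2 + \frac{N_1(N_1+1)}{2} + \frac{N_2(N_2+1)}{2} - (R_1 + R_2) \\
R_1 + R_2 &= \frac{(N_1+N_2)(N_1+N_2+1)}{2}
           = N_1N_2 + \frac{N_1(N_1+1)}{2} + \frac{N_2(N_2+1)}{2} \\
U_1 + U_2 &= 2N_1N_2 - N_1N_2 = N_1N_2
\quad\Longrightarrow\quad U_2 = N_1N_2 - U_1
\end{align*}
```

The middle line is the only real step: expanding the triangular number for N1+N2 gives the two smaller triangular numbers plus the cross term N1N2, and those smaller ones cancel against the terms in U1 and U2.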
Not knowing anything about the Mann-Whitney test (it was GCSE when I last did stats with real numbers), I had to read a bit before I understood what you meant, and found http://faculty.vassar.edu/lowry/ch11a.html useful.
no subject
Date: 2005-09-27 07:01 pm (UTC)
Yeah, right! Unfortunately, reading the chapter on the χ² test has left me slightly more confused than before doing so (having had
There's an error in your formula for R1+R2, which should be R1+R2 = (N1+N2)(N1+N2+1)/2.
Indeed. That's what I meant to type, but got so lost in <sub></sub> that I lost it and having got a { instead of a > my proofreading was all taken up with sorting that out! (and was time limited by the library closing!)
Then if you do the tedious algebra (evaluate U1+U2 and sub in the expression for R1+R2) you'll find it works. As you've written it you get stray N1/2 + N2/2 terms.
Aaah. The trouble was I started trying to do this before I remembered the formula for R1+R2 and stuck the expressions for U1 and U2 into the simple formula too soon and then once I'd remembered the formula I was stuck because I couldn't get R1 and R2 to be positive on the same side!
no subject
Date: 2005-09-28 10:44 am (UTC)